# Visualizing Adoption and Return Trends in Sonoma Animal Data
03/15/2024

#### Rafael L.S Reis, Dalia Cabrera Hurtado, Gabe Myers

## Introduction
The Sonoma Animal Shelter dataset comprises 29,012 records detailing various attributes of animals admitted to the shelter, including demographic information, color descriptors, intake and outcome dates, and outcome types such as adoption or return to owner. This analysis seeks to answer two key questions: first, how does the number of days an animal spends in the shelter differ between those that are adopted and those that are returned to their owners; and second, is there an association between an animal's primary coat color—extracted from compound color entries—and its outcome or duration of shelter stay.

Note: This scope is tentative; we will likely narrow our questions, probably to coat color.

data downloaded from:
https://raw.githubusercontent.com/grbruns/cst383/master/sonoma-shelter-15-october-2024.csv

In [33]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Initial Data Exploration

In [36]:
df = pd.read_csv('https://raw.githubusercontent.com/grbruns/cst383/master/sonoma-shelter-15-october-2024.csv')

## Data preprocessing

In [39]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29012 entries, 0 to 29011
Data columns (total 24 columns):
Name                    21354 non-null object
Type                    29012 non-null object
Breed                   29012 non-null object
Color                   29012 non-null object
Sex                     29012 non-null object
Size                    28976 non-null object
Date Of Birth           21897 non-null object
Impound Number          29012 non-null object
Kennel Number           29004 non-null object
Animal ID               29012 non-null object
Intake Date             29012 non-null object
Outcome Date            28746 non-null object
Days in Shelter         29012 non-null int64
Intake Type             29012 non-null object
Intake Subtype          29012 non-null object
Outcome Type            28740 non-null object
Outcome Subtype         28405 non-null object
Intake Condition        29012 non-null object
Outcome Condition       28383 non-null object
Intake Jurisdictio

In [41]:
df.sample(10)

Unnamed: 0,Name,Type,Breed,Color,Sex,Size,Date Of Birth,Impound Number,Kennel Number,Animal ID,...,Intake Subtype,Outcome Type,Outcome Subtype,Intake Condition,Outcome Condition,Intake Jurisdiction,Outcome Jurisdiction,Outcome Zip Code,Location,Count
27982,OCEAN,DOG,BOXER,WHITE,Female,MED,,K19-034910,TRUCK,A395179,...,FIELD,RETURN TO OWNER,FLD_MCHIP,UNKNOWN,HEALTHY,SANTA ROSA,SANTA ROSA,95401.0,"95401(38.44366, -122.7246163)",1
21423,29102,CAT,DOMESTIC SH,ORG TABBY,Male,KITTN,02/05/2023,K23-044302,HSSC,A416541,...,PHONE,TRANSFER,HSSC,UNKNOWN,PENDING,SANTA ROSA,COUNTY,95407.0,"95407(38.4127094, -122.7412153)",1
4442,*KARI,CAT,DOMESTIC SH,TORTIE,Spayed,KITTN,06/01/2018,K18-029389,LOBBY,A375217,...,OVER THE COUNTER,ADOPTION,SPEC EVENT,TREATABLE/REHAB,HEALTHY,SANTA ROSA,COUNTY,95472.0,"95472(38.4007555, -122.8277055)",1
8685,BILLY,DOG,PIT BULL,WHITE/BR BRINDLE,Neutered,LARGE,,K18-029143,DS92,A396164,...,FLD_ARREST,RETURN TO OWNER,OVER THE COUNTER_ARREST,HEALTHY,HEALTHY,SANTA ROSA,SANTA ROSA,95401.0,"95401(38.44366, -122.7246163)",1
8578,WILLOW,DOG,LABRADOR RETR/GOLDEN RETR,GOLD,Spayed,LARGE,12/30/2015,K23-043920,DS58,A341729,...,OVER THE COUNTER,RETURN TO OWNER,OVER THE COUNTER_MCHIP,UNKNOWN,PENDING,COUNTY,COUNTY,95472.0,"95472(38.4007555, -122.8277055)",1
3422,BRUNO,DOG,LABRADOR RETR,BLACK,Neutered,MED,,K16-021191,TRUCK,A343787,...,FIELD,RETURN TO OWNER,FLD_MCHIP,HEALTHY,HEALTHY,COUNTY,COUNTY,95476.0,"95476(38.288405, -122.464525)",1
10740,EMILY,DOG,ITAL GREYHOUND,BLACK/WHITE,Female,SMALL,,K14-011893,TRUCK,A314423,...,FLD_ARREST,RETURN TO OWNER,FLD_PRVS,HEALTHY,HEALTHY,COUNTY,*ROHNERT PARK,,,1
3748,,DOG,GERM SHEPHERD,BLACK/TAN,Female,LARGE,,K19-032272,DQ113,A387165,...,FLD_STRAY,EUTHANIZE,AGGRESSIVE,UNKNOWN,HEALTHY,SANTA ROSA,SANTA ROSA,95407.0,"95407(38.4127094, -122.7412153)",1
11462,CHUY,DOG,PIT BULL,TAN/GRAY,Neutered,LARGE,05/13/2016,K22-040416,DS76,A339772,...,FIELD,RETURN TO OWNER,OVER THE COUNTER_WEB,UNKNOWN,PENDING,SANTA ROSA,SANTA ROSA,95403.0,"95403(38.4716444, -122.7398255)",1
24827,*PATRICK,CAT,DOMESTIC SH,BRN TABBY,Neutered,KITTN,07/28/2020,K20-036880,WESTFARM,A400655,...,OVER THE COUNTER,ADOPTION,WESTFARM,UNKNOWN,PENDING,COUNTY,COUNTY,95403.0,"95403(38.4716444, -122.7398255)",1


Among dogs returned to their owners, which coat colors are most frequently associated with being lost or having escaped? To explore this, we can create a simplified color category column that groups colors into broad shades.

Below are the functions to help create a new column to get the color shade of the animals

In [44]:
# Function to get primary color from a color string
def get_primary_color(color):
    if pd.isnull(color):
        return 'unknown'
    return color.split('/')[0].strip().lower()

# Helper: True if any shade name in shades_list appears as a substring of primary_color
def primary_color_in_list(primary_color, shades_list):
    return any(shade in primary_color for shade in shades_list)

# Function to categorize color into Light, Medium, Dark, or Other shades
def categorize_shade(color):
    if pd.isna(color):
        return 'Unknown'

    primary_color = get_primary_color(color)
    # mappings, might play around with these more...
    dark_shades = ['black', 'brown', 'brindle', 'blue', 'gray', 'chocolate', 'seal']
    medium_shades = ['tan', 'red', 'gold', 'fawn', 'sable', 'yellow', 'orange']
    light_shades = ['white', 'cream', 'buff']

    if primary_color_in_list(primary_color, dark_shades):
        return 'Dark'
    elif primary_color_in_list(primary_color, medium_shades):
        return 'Medium'
    elif primary_color_in_list(primary_color, light_shades):
        return 'Light'
    else:
        return 'Other'
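
As a quick sanity check of the mapping above, the sketch below runs the two functions on a handful of color strings. The functions are repeated (with the helper inlined) only so the snippet runs on its own; the sample strings are picked by hand from the `df.sample(10)` output and are not a representative sample. It suggests that abbreviated entries such as `BRN TABBY` or `ORG TABBY` do not match any shade name and end up in 'Other'.

```python
import pandas as pd

# Repeated from the cell above (helper inlined) so this snippet is self-contained.
def get_primary_color(color):
    if pd.isnull(color):
        return 'unknown'
    return color.split('/')[0].strip().lower()

def categorize_shade(color):
    if pd.isna(color):
        return 'Unknown'
    primary_color = get_primary_color(color)
    dark_shades = ['black', 'brown', 'brindle', 'blue', 'gray', 'chocolate', 'seal']
    medium_shades = ['tan', 'red', 'gold', 'fawn', 'sable', 'yellow', 'orange']
    light_shades = ['white', 'cream', 'buff']
    if any(shade in primary_color for shade in dark_shades):
        return 'Dark'
    if any(shade in primary_color for shade in medium_shades):
        return 'Medium'
    if any(shade in primary_color for shade in light_shades):
        return 'Light'
    return 'Other'

# Hand-picked examples (an assumption for illustration, not a full test set)
samples = ['BLACK/WHITE', 'WHITE/BR BRINDLE', 'GOLD', 'BRN TABBY', 'ORG TABBY', None]
shade_by_sample = {s: categorize_shade(s) for s in samples}

for sample, shade in shade_by_sample.items():
    print(f'{sample!r:>20} -> {shade}')
```

Once `dogs_returned` has its shade column, something like `dogs_returned.loc[dogs_returned['Primary Shade'] == 'Other', 'Primary Color'].value_counts()` would list which primary colors are still unmapped and could be added to the shade lists.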



Applying filter to create column

In [None]:
dogs_returned = df[
    (df['Type'] == 'DOG') &
    (df['Outcome Type'].str.upper() == 'RETURN TO OWNER')
].copy()

# Create 'Primary Color' column
dogs_returned['Primary Color'] = dogs_returned['Color'].apply(get_primary_color)

# Create 'Primary Shade' column correctly
dogs_returned['Primary Shade'] = dogs_returned['Primary Color'].apply(categorize_shade)

# just sampling to check if things look about right
print(dogs_returned[['Name', 'Color', 'Primary Color', 'Primary Shade']].head(10))

## Data exploration and visualization

Exploring data from primary color column

In [None]:
# Plot the shade distribution
shade_counts = dogs_returned['Primary Shade'].value_counts()
shade_counts.plot(kind='bar')

plt.title('Distribution of Coat Shades for Dogs Returned to Owners')
plt.xlabel('Shade Category')
plt.ylabel('Number of Dogs')
plt.show()

The plot above shows the relative frequency of each shade among returned dogs, but it doesn't give us the full picture: there may simply be more black dogs overall. Let's explore further. Out of all dogs of a given shade, what proportion is successfully returned to their owner?

In [None]:
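# Build a frame of all dogs (not only those returned), with shade and a 0/1 Returned flag
dogs_df = df[df['Type'] == 'DOG'].copy()
dogs_df['Primary Color'] = dogs_df['Color'].apply(get_primary_color)
dogs_df['Primary Shade'] = dogs_df['Primary Color'].apply(categorize_shade)
dogs_df['Returned'] = (dogs_df['Outcome Type'].str.upper() == 'RETURN TO OWNER').astype(int)

# Calculate the total count and returned count per shade
shade_summary = dogs_df.groupby('Primary Shade')['Returned'].agg(['sum', 'count'])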
shade_summary['Returned Proportion'] = shade_summary['sum'] / shade_summary['count']
shade_summary = shade_summary.sort_values('Returned Proportion')

#print(shade_summary)

#proportion of returned dogs by shade
plt.figure(figsize=(8,6))
plt.bar(shade_summary.index, shade_summary['Returned Proportion'])
plt.title('Proportion of Dogs Returned to Owner by Coat Shade')
plt.xlabel('Coat Shade')
plt.ylabel('Proportion Returned')
plt.show()

There seems to be a slight difference, but is it enough to draw any conclusions? Do we need to clean the data better so that fewer dogs fall into 'Other'?
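
One common way to judge whether a difference in proportions like this is more than noise is a chi-square test of independence on the shade-by-returned contingency table. The sketch below computes the statistic by hand with NumPy on a tiny made-up table; the numbers are invented for illustration and are not from the shelter data, and the column names simply mirror the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration only (not taken from the shelter dataset)
toy = pd.DataFrame({
    'Primary Shade': ['Dark'] * 4 + ['Light'] * 4,
    'Returned':      [1, 1, 1, 0,   1, 0, 0, 0],
})

# Observed counts: rows are shades, columns are Returned = 0 / 1
observed = pd.crosstab(toy['Primary Shade'], toy['Returned'])
obs = observed.to_numpy()

# Expected counts if shade and outcome were independent
row_totals = obs.sum(axis=1, keepdims=True)
col_totals = obs.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / obs.sum()

# Pearson chi-square statistic (no continuity correction) and degrees of freedom
chi2 = ((obs - expected) ** 2 / expected).sum()
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)

print(observed)
print(f'chi2 = {chi2:.3f}, dof = {dof}')
```

If SciPy is available, `scipy.stats.chi2_contingency(observed)` also returns a p-value. Note that by default it applies a continuity correction to 2x2 tables, so its statistic can differ from the uncorrected one computed here.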

Other exploration ideas: Time in shelter - Dogs who got returned vs not returned:

In [None]:
dogs_df['Returned'] = (dogs_df['Outcome Type'].str.upper() == 'RETURN TO OWNER').astype(int)
#print(dogs_df[['Primary Shade', 'Returned']].head())

sns.violinplot(
    data=dogs_df,
    x='Returned',
    y='Days in Shelter',
    hue='Primary Shade',
    # split=True,
    inner='quartile',
    cut=0  # so the violin doesn't extend beyond actual data
)
plt.yscale('log')  # Compress large values
plt.title('Distribution of Days in Shelter (Log Scale)')
plt.xlabel('Returned Status (0 = Not Returned, 1 = Returned)')
plt.ylabel('Days in Shelter (log scale)')
plt.legend(title='Coat Shade', loc='upper right')
plt.tight_layout()
plt.show()

We're uncertain how relevant this is; it needs further investigation and better plotting.

### Adopted vs. Returned to Owner
Trying to improve or get a different angle from the above violin plot

In [None]:
# Filter dataset to include only dogs that were either adopted or returned to owner
dogs_subset = df[
    (df['Type'] == 'DOG') &
    (df['Outcome Type'].str.upper().isin(['ADOPTION', 'RETURN TO OWNER']))
].copy()

# Standardize the Outcome Type casing to make the data cleaner (mixed case was causing issues)
dogs_subset['Outcome Type'] = dogs_subset['Outcome Type'].str.upper()

#boxplot for Days in Shelter by Outcome Type
sns.boxplot(data=dogs_subset, x='Outcome Type', y='Days in Shelter')
plt.title('Comparing Days in Shelter: Adopted vs. Returned to Owner')
plt.xlabel('Outcome Type')
plt.ylabel('Days in Shelter')
plt.tight_layout()
plt.show()

We need a better fit for this graph. While still not ideal, the violin plot seems to represent days in shelter for returned vs. not-returned dogs better than this box plot.
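
Because a few very long stays tend to squash both plot types, a numerical summary can complement them. The sketch below tabulates the quartiles of days in shelter per outcome type; it uses a small made-up frame with the same column names as `dogs_subset`, so the values are purely illustrative.

```python
import pandas as pd

# Toy data, made up for illustration only (not taken from the shelter dataset)
toy = pd.DataFrame({
    'Outcome Type': ['ADOPTION'] * 5 + ['RETURN TO OWNER'] * 5,
    'Days in Shelter': [10, 20, 30, 40, 200, 0, 1, 1, 2, 30],
})

# Quartiles per outcome type; unstack() turns the quantile level into columns
summary = (
    toy.groupby('Outcome Type')['Days in Shelter']
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(summary)
```

Replacing `toy` with `dogs_subset` should give the same table for the real data. Medians and quartiles are less sensitive to the long right tail than means, which is why they are used here.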

### Exploring colors - how long they take to get adopted

In [None]:
common_colors = dogs_df['Primary Color'].value_counts()
common_colors = common_colors[common_colors >= 30].index.tolist()
common_color_df = dogs_df[dogs_df['Primary Color'].isin(common_colors)]
filtered_df = common_color_df[common_color_df['Days in Shelter'] <= 60]

plt.figure(figsize=(12, 8))
ax = sns.boxplot(
    data=filtered_df,
    x='Primary Color',
    y='Days in Shelter',
    hue='Primary Color',
    palette='Set1'
)

plt.title('Distribution of Days in Shelter (<= 60 Days) by Primary Color\n(Common Colors Only)')
plt.xlabel('Primary Color')
plt.ylabel('Days in Shelter')
plt.xticks(rotation=45)
plt.show()

It seems dogs of certain colors (silver, gold, and yellow) spend fewer days in the shelter than average, while others (tan, blue, buff) stay longer. We might need to come up with a better way to represent colors.
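
One possible way to make the color comparison easier to read is to sort the colors by their median stay and keep the group sizes next to them. The sketch below does this on a small made-up frame (the values are invented; only the column names match the notebook).

```python
import pandas as pd

# Toy data, made up for illustration only (not taken from the shelter dataset)
toy = pd.DataFrame({
    'Primary Color': ['black', 'black', 'black', 'tan', 'tan', 'gold', 'gold', 'gold'],
    'Days in Shelter': [10, 12, 20, 30, 40, 3, 5, 4],
})

# Median stay and group size per color, shortest median first
color_summary = (
    toy.groupby('Primary Color')['Days in Shelter']
       .agg(['median', 'count'])
       .sort_values('median')
)
color_order = color_summary.index.tolist()

print(color_summary)
print(color_order)
```

The resulting `color_order` list can be passed to `sns.boxplot(..., order=color_order)` so the boxes appear from shortest to longest median stay, and the `count` column shows how much data sits behind each box.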

In [None]:
df["Intake Date"] = pd.to_datetime(df["Intake Date"])
df["Outcome Date"] = pd.to_datetime(df["Outcome Date"])
df["length_of_stay"] = (df["Outcome Date"] - df["Intake Date"]).dt.days

df["Outcome Type"] = df["Outcome Type"].str.strip().str.lower()
# .copy() so the column assignment below does not trigger SettingWithCopyWarning
df_return = df[df["Outcome Type"] == "return to owner"].copy()

df_return["Type"] = df_return["Type"].str.strip().str.lower()
df_return = df_return[df_return["Type"].isin(["cat", "dog"])]

animal_types = df_return["Type"].unique()

fig, axs = plt.subplots(1, len(animal_types), figsize=(12, 6), sharex=True, sharey=True)
if len(animal_types) == 1:
    axs = [axs]

for i, animal in enumerate(animal_types):
    data = df_return[df_return["Type"] == animal]["length_of_stay"]
    data.plot.density(ax=axs[i], label=animal)
    axs[i].set_title(f"Density for {animal.capitalize()}", fontweight='bold')
    axs[i].set_xlabel("Length of Stay (days)")
    axs[i].set_ylabel("Density")
    axs[i].set_xlim(-10, 50)
    axs[i].tick_params(axis='y', labelleft=True, labelright=False)

plt.suptitle("Density of Length of Stay for Animals Returned to Owner by Type", fontweight='bold', fontsize=20)
plt.show()

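These densities show how length of stay is distributed among animals that were returned to their owners. To test whether length of stay is associated with being returned, and so might be a useful predictor of that outcome, these distributions still need to be compared against animals with other outcomes.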
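
One way to check whether length of stay is related to being returned is to compare the return rate across stay-length bins, using animals of every outcome type. The sketch below does this on a small made-up frame; the bin edges and labels are arbitrary choices for illustration, and the values are not from the shelter data.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration only (not taken from the shelter dataset)
toy = pd.DataFrame({
    'Days in Shelter': [0, 1, 1, 3, 5, 10, 20, 45, 60, 90],
    'Outcome Type': [
        'RETURN TO OWNER', 'RETURN TO OWNER', 'ADOPTION', 'RETURN TO OWNER',
        'ADOPTION', 'ADOPTION', 'ADOPTION', 'TRANSFER', 'ADOPTION', 'EUTHANIZE',
    ],
})
toy['Returned'] = (toy['Outcome Type'] == 'RETURN TO OWNER').astype(int)

# Arbitrary, right-inclusive bins: 0-1, 2-7, 8-30, and 31+ days
bins = [-1, 1, 7, 30, np.inf]
labels = ['0-1', '2-7', '8-30', '31+']
toy['Stay Bin'] = pd.cut(toy['Days in Shelter'], bins=bins, labels=labels)

# Share of animals returned to their owner within each bin
return_rate = toy.groupby('Stay Bin', observed=True)['Returned'].mean()
print(return_rate)
```

If the return rate on the real data drops sharply as the bins get longer, that would support treating length of stay as a signal for this outcome. It would still be an association rather than a causal claim.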


In [None]:
df["Intake Date"] = pd.to_datetime(df["Intake Date"])
df["Outcome Date"] = pd.to_datetime(df["Outcome Date"])

df["length_of_stay"] = (df["Outcome Date"] - df["Intake Date"]).dt.days

df["Outcome Type"] = df["Outcome Type"].str.strip().str.lower()

# .copy() so the column assignment below does not trigger SettingWithCopyWarning
df_return = df[df["Outcome Type"] == "return to owner"].copy()

df_return["Type"] = df_return["Type"].str.strip().str.lower()
df_return = df_return[df_return["Type"].isin(["cat", "dog"])]
df_return = df_return[(df_return['Days in Shelter'] <= 60)]

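fig, ax = plt.subplots(figsize=(8, 6))
# Pass ax so the boxplot draws on this figure instead of opening a second, empty one
df_return.boxplot(column="length_of_stay", by="Type", grid=False, ax=ax)
ax.set_title("Length of Stay for Animals Returned to Owner by Animal Type", fontweight='bold', fontsize=10)
fig.suptitle("")  # remove the automatic "Boxplot grouped by Type" title
ax.set_xlabel("Animal Type")
ax.set_ylabel("Length of Stay (days)")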
plt.show()

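One possible reading is that cat owners give up on retrieving their animal sooner than dog owners, though this plot only covers animals that were eventually returned, so it cannot confirm that.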

## Conclusions

For now, we still need to explore more and improve the notebook. As it stands it is fairly messy: we wanted to explore as much as we could first and see whether anything of interest or significance turned up, rather than focus on form. As we narrow down our areas of interest, we will clean up the plots, descriptions, and organization. Lastly, given what we have seen so far, the effect of coat color looks like the more interesting question, so we may pivot to focus on it.