# Case Studies in Machine Learning Final Paper

## Topic: Predict animal length-of-stay at adoption center

## Paper Overview

1) Abstract
2) Introduction
3) Literature Review
4) Data Background and Description
5) Data Preliminary Analysis
6) Model Objective and Training
7) Results and Explanations
8) Conclusions

# Import Data from data.austintexas.gov
### AAC Outcomes: https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/about_data
### AAC Intakes: https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm/about_data

# Load the data, merge the tables, and drop duplicate columns

In [39]:
from datetime import date, datetime
import os
import pandas as pd
import numpy as np
# Date the CSV files were downloaded: November 5, 2024
download_date = date(2024, 11, 5)

# format download_date into a string with YYYYMMDD format
download_date = download_date.strftime('%Y%m%d')

# Insert date string in YYYYMMDD format into the filename
outcomes_filename = os.path.join('data', f'Austin_Animal_Center_Outcomes_{download_date}.csv')
intakes_filename = os.path.join('data', f'Austin_Animal_Center_Intakes_{download_date}.csv')

df_outcomes = pd.read_csv(outcomes_filename)
df_intakes = pd.read_csv(intakes_filename)

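# Join dataframes on Animal ID. An animal with several shelter visits has several rows
# in each table, so this join pairs every intake with every outcome for that animal.
pd.options.display.max_columns = 50
df_joined = pd.merge(df_intakes, df_outcomes, on=["Animal ID"], suffixes=('_intake', '_outcome'))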

# Drop duplicate columns: keep a single copy when the intake and outcome
# versions agree on every row (two NaNs count as agreement)
duplicate_prefixes = ["Name", "Animal Type", "Breed", "Color"]

for pref in duplicate_prefixes:
    intake_col = df_joined[pref + "_intake"]
    outcome_col = df_joined[pref + "_outcome"]
    same = (intake_col == outcome_col) | (intake_col.isna() & outcome_col.isna())
    if same.all():
        df_joined[pref] = intake_col
        df_joined = df_joined.drop(columns=[pref + "_intake", pref + "_outcome"])

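# DateTime_intake is still a string here, so this ordering is lexicographic (MM/DD/YYYY), not chronological
display(df_joined.sort_values(by="DateTime_intake", ascending=False))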


Unnamed: 0,Animal ID,DateTime_intake,MonthYear_intake,Found Location,Intake Type,Intake Condition,Sex upon Intake,Age upon Intake,DateTime_outcome,MonthYear_outcome,Date of Birth,Outcome Type,Outcome Subtype,Sex upon Outcome,Age upon Outcome,Name,Animal Type,Breed,Color
194515,A894556,12/31/2023 11:42:00 AM,December 2023,Austin (TX),Owner Surrender,Normal,Spayed Female,6 months,12/29/2023 05:27:00 PM,Dec 2023,06/29/2023,Adoption,,Spayed Female,6 months,Kanna,Dog,Shiba Inu/Chihuahua Shorthair,Brown
194516,A894556,12/31/2023 11:42:00 AM,December 2023,Austin (TX),Owner Surrender,Normal,Spayed Female,6 months,01/04/2024 11:26:00 AM,Jan 2024,06/29/2023,Adoption,,Spayed Female,6 months,Kanna,Dog,Shiba Inu/Chihuahua Shorthair,Brown
189617,A895551,12/31/2023 01:51:00 PM,December 2023,Austin (TX),Owner Surrender,Normal,Unknown,6 years,01/13/2024 11:42:00 AM,Jan 2024,12/31/2017,Transfer,Partner,Unknown,6 years,Tom,Cat,Domestic Shorthair,Blue Tabby/White
63331,A895552,12/31/2023 01:51:00 PM,December 2023,Austin (TX),Owner Surrender,Normal,Unknown,6 years,01/06/2024 12:06:00 PM,Jan 2024,12/31/2017,Transfer,Partner,Unknown,6 years,Milo,Cat,Domestic Shorthair,Blue Tabby/White
137841,A871795,12/31/2022 11:19:00 AM,December 2022,4404 Imperial Drive in Austin (TX),Stray,Normal,Intact Male,2 years,01/06/2023 12:31:00 PM,Jan 2023,12/31/2020,Transfer,Partner,Intact Male,2 years,A871795,Dog,Yorkshire Terrier,Brown/Black
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70943,A670065,01/01/2014 02:11:00 PM,January 2014,Applewood Dr in Austin (TX),Stray,Normal,Spayed Female,7 years,01/02/2014 11:52:00 AM,Jan 2014,01/02/2007,Return to Owner,,Spayed Female,7 years,Muneca,Dog,Australian Shepherd/Chow Chow,Red/White
83404,A670064,01/01/2014 01:57:00 PM,January 2014,Gunter St And Gonzalez in Austin (TX),Stray,Normal,Intact Female,3 months,01/03/2014 04:33:00 PM,Jan 2014,09/16/2013,Died,In Kennel,Intact Female,3 months,,Dog,Pit Bull/Pit Bull,Red
165275,A670061,01/01/2014 01:33:00 PM,January 2014,Austin (TX),Owner Surrender,Normal,Intact Male,2 years,01/24/2014 01:41:00 PM,Jan 2014,01/01/2012,Adoption,,Neutered Male,2 years,Koda,Dog,Chow Chow Mix,Red
42773,A670059,01/01/2014 01:31:00 PM,January 2014,11402 Robert Wooding in Austin (TX),Stray,Normal,Spayed Female,1 year,01/05/2014 02:37:00 PM,Jan 2014,01/01/2013,Adoption,,Spayed Female,1 year,,Dog,West Highland,White
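
Animal `A894556` in the output above appears twice with the same intake, and one of the paired outcomes precedes that intake. This is what a join on `Animal ID` alone produces for animals with more than one visit. The sketch below reproduces the effect on a small made-up table and shows one possible alternative, `pd.merge_asof`, which pairs each intake with the first outcome at or after it. The toy data and the names `cross`, `outcomes_keyed`, and `paired` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy data (assumed for illustration): A1 visits the shelter twice, A2 once
intakes = pd.DataFrame({
    "Animal ID": ["A1", "A1", "A2"],
    "DateTime": pd.to_datetime(["2023-01-01 10:00", "2023-06-01 09:00", "2023-03-01 12:00"]),
    "Intake Type": ["Stray", "Owner Surrender", "Stray"],
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A1", "A2"],
    "DateTime": pd.to_datetime(["2023-01-05 15:00", "2023-06-10 11:00", "2023-03-02 08:00"]),
    "Outcome Type": ["Return to Owner", "Adoption", "Transfer"],
})

# Join on Animal ID only: each of A1's 2 intakes pairs with each of its 2 outcomes
cross = pd.merge(intakes, outcomes, on="Animal ID", suffixes=("_intake", "_outcome"))
n_negative = (cross["DateTime_outcome"] < cross["DateTime_intake"]).sum()

# Keep a copy of the outcome timestamp, because merge_asof retains only the left "on" column
outcomes_keyed = outcomes.assign(DateTime_outcome=outcomes["DateTime"])

# Pair each intake with the first outcome at or after it, per animal
paired = pd.merge_asof(
    intakes.sort_values("DateTime"),
    outcomes_keyed.sort_values("DateTime"),
    on="DateTime",
    by="Animal ID",
    direction="forward",
).rename(columns={"DateTime": "DateTime_intake"})

print(len(cross), n_negative, len(paired))
```

With this approach an intake that has no later outcome keeps a missing `DateTime_outcome` instead of being dropped, so animals still in the shelter would need separate handling.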


# Feature engineering

In [40]:
# Calculate the duration between intake and outcome
df_joined["DateTime_outcome"] = pd.to_datetime(df_joined["DateTime_outcome"], format="%m/%d/%Y %I:%M:%S %p")
df_joined["DateTime_intake"] = pd.to_datetime(df_joined["DateTime_intake"], format="%m/%d/%Y %I:%M:%S %p")
df_joined["duration_in_shelter"] = df_joined["DateTime_outcome"] - df_joined["DateTime_intake"]

# Drop rows with missing values in the key columns, then keep only animals whose sex is known
df_joined = df_joined.dropna(axis=0,subset=["Sex upon Outcome","Sex upon Intake","Outcome Type","Age upon Outcome"])
df_joined = df_joined.loc[df_joined["Sex upon Intake"].str.contains("Male|Female")]

# Display Unique Values of categorical variables
categorical_columns = ["Intake Type", "Intake Condition", "Sex upon Intake", "Age upon Intake", "Outcome Type", "Outcome Subtype", "Sex upon Outcome", "Age upon Outcome", "Animal Type", "Breed"]

for col in categorical_columns:
    print(f"Unique values for {col}: {df_joined[col].unique()}")
    print("\n")

# Display the number of missing values in each column
print(df_joined.isna().sum())


# Display the rows that have an Outcome Subtype
display(df_joined.loc[df_joined['Outcome Subtype'].notna()])

Unique values for Intake Type: ['Stray' 'Public Assist' 'Owner Surrender' 'Abandoned' 'Wildlife'
 'Euthanasia Request']


Unique values for Intake Condition: ['Normal' 'Sick' 'Injured' 'Pregnant' 'Neonatal' 'Nursing' 'Aged'
 'Unknown' 'Med Attn' 'Medical' 'Other' 'Feral' 'Behavior' 'Med Urgent'
 'Parvo' 'Space' 'Agonal' 'Neurologic' 'Panleuk' 'Congenital']


Unique values for Sex upon Intake: ['Neutered Male' 'Spayed Female' 'Intact Male' 'Intact Female']


Unique values for Age upon Intake: ['2 years' '8 years' '11 months' '4 weeks' '4 years' '6 years' '6 months'
 '5 months' '1 month' '14 years' '2 weeks' '1 week' '2 months' '18 years'
 '9 years' '4 months' '1 day' '1 year' '3 years' '5 years' '15 years'
 '8 months' '6 days' '7 years' '3 months' '12 years' '3 weeks' '9 months'
 '10 years' '10 months' '7 months' '0 years' '1 weeks' '5 days' '17 years'
 '11 years' '4 days' '2 days' '3 days' '13 years' '5 weeks' '16 years'
 '19 years' '20 years' '-1 years' '-3 years' '-4 years' '22 years

Unnamed: 0,Animal ID,DateTime_intake,MonthYear_intake,Found Location,Intake Type,Intake Condition,Sex upon Intake,Age upon Intake,DateTime_outcome,MonthYear_outcome,Date of Birth,Outcome Type,Outcome Subtype,Sex upon Outcome,Age upon Outcome,Name,Animal Type,Breed,Color,duration_in_shelter
0,A786884,2019-01-03 16:19:00,January 2019,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Neutered Male,2 years,2019-01-08 15:11:00,Jan 2019,01/03/2017,Transfer,Partner,Neutered Male,2 years,*Brock,Dog,Beagle Mix,Tricolor,4 days 22:52:00
3,A665644,2013-10-21 07:59:00,October 2013,Austin (TX),Stray,Sick,Intact Female,4 weeks,2013-10-21 11:39:00,Oct 2013,09/21/2013,Transfer,Partner,Intact Female,4 weeks,,Cat,Domestic Shorthair Mix,Calico,0 days 03:40:00
4,A857105,2022-05-12 00:23:00,May 2022,4404 Sarasota Drive in Austin (TX),Public Assist,Normal,Neutered Male,2 years,2022-05-12 14:35:00,May 2022,05/12/2020,Transfer,Partner,Neutered Male,2 years,Johnny Ringo,Cat,Domestic Shorthair,Orange Tabby,0 days 14:12:00
10,A818975,2020-06-18 14:53:00,June 2020,Braker Lane And Metric in Travis (TX),Stray,Normal,Intact Male,4 weeks,2020-07-23 15:54:00,Jul 2020,05/19/2020,Adoption,Foster,Neutered Male,2 months,,Cat,Domestic Shorthair,Cream Tabby,35 days 01:01:00
11,A774147,2018-06-11 07:45:00,June 2018,6600 Elm Creek in Austin (TX),Stray,Injured,Intact Female,4 weeks,2018-06-11 00:00:00,Jun 2018,05/10/2018,Transfer,Partner,Intact Female,4 weeks,,Cat,Domestic Shorthair Mix,Black/White,-1 days +16:15:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
214826,A898037,2024-11-02 21:32:00,November 2024,2808 Sauls Dr in Travis (TX),Stray,Normal,Neutered Male,2 years,2024-11-02 22:23:00,Nov 2024,02/10/2022,Return to Owner,Field,Neutered Male,2 years,Max,Dog,German Shepherd/Border Collie,Black/White,0 days 00:51:00
214829,A917001,2024-11-02 11:35:00,November 2024,Outside Jurisdiction,Owner Surrender,Normal,Intact Male,6 months,2024-11-04 13:07:00,Nov 2024,05/02/2024,Transfer,Partner,Intact Male,6 months,Cheetoh,Cat,Siamese,Flame Point,2 days 01:32:00
214832,A917091,2024-11-04 08:53:00,November 2024,8305 Garcreek Cir in Austin (TX),Stray,Injured,Intact Female,3 months,2024-11-04 10:00:00,Nov 2024,08/04/2024,Euthanasia,Suffering,Intact Female,3 months,,Cat,Domestic Shorthair,Tortie,0 days 01:07:00
214843,A869055,2024-10-31 11:36:00,October 2024,9323 Menchaca Road in Austin (TX),Stray,Medical,Neutered Male,2 years,2022-11-17 12:14:00,Nov 2022,06/12/2022,Transfer,Partner,Neutered Male,5 months,Peanut Butter Cup,Dog,Pit Bull,Brown,-714 days +00:38:00
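
The `duration_in_shelter` column above is a Timedelta, which most regression models cannot use directly as a target. A minimal sketch of one way to convert it to fractional days follows, using three durations copied from the output above. The column name `duration_days` is a hypothetical choice, not something the notebook defines.

```python
import pandas as pd

# Three durations taken from the displayed rows above
durations = pd.Series(pd.to_timedelta(["4 days 22:52:00", "0 days 03:40:00", "35 days 01:01:00"]))

# Total seconds divided by seconds per day gives length of stay in fractional days
duration_days = durations.dt.total_seconds() / 86400

print(duration_days)
```

On the full table the same expression would be applied to `df_joined["duration_in_shelter"]`.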


In [41]:

### 1) Convert Age upon Intake from string to numeric by parsing strings of the form "X unit" (e.g. "2 years", "6 months")
age_strings = df_joined['Age upon Intake'].str.split(' ', expand=True)
age_strings.columns = ['age', 'unit']
# Cast to float up front so the in-place divisions below do not assign floats into an integer column
age_strings['age'] = pd.to_numeric(age_strings['age']).astype(float)
# Strip the plural "s" so "years" and "year" share one unit label
age_strings['unit'] = age_strings['unit'].str.rstrip('s')

# Convert to years
age_strings.loc[age_strings['unit'] == 'month', 'age'] /= 12
age_strings.loc[age_strings['unit'] == 'week', 'age'] /= 52
age_strings.loc[age_strings['unit'] == 'day', 'age'] /= 365

df_joined['age_upon_intake_years'] = age_strings['age']

### 2) Replace missing Names with 'Unknown'
df_joined['Name'] = df_joined['Name'].fillna('Unknown')

### 3) Create binary indicator flags for fixed/intact, male/female
df_joined['fixed Intake'] = df_joined['Sex upon Intake'].str.contains('Neutered|Spayed')
df_joined['fixed Outcome'] = df_joined['Sex upon Outcome'].str.contains('Neutered|Spayed')
df_joined['fixed'] = df_joined['fixed Intake'] | df_joined['fixed Outcome']
df_joined = df_joined.drop(['fixed Intake', 'fixed Outcome'], axis=1)

# True means female, False means male
df_joined['sex'] = df_joined['Sex upon Intake'].str.contains('Female')

# Count the negative durations, then drop them; these rows are a data quality issue
negative_duration = df_joined['duration_in_shelter'] < pd.Timedelta(0)
print(f"Dropping {negative_duration.sum()} rows with negative duration")
df_joined = df_joined.loc[~negative_duration]
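
The age conversion in this cell could also be written as a small reusable function. The sketch below is one possible version, not the notebook's method: it parses the value and unit with a regular expression and uses the same divisors as above (12, 52, 365). Treating negative ages such as `-1 years` as missing is an assumption made here for illustration, and `YEARS_PER_UNIT` and `age_to_years` are hypothetical names.

```python
import pandas as pd

# Same conversion factors as the cell above (assumed names)
YEARS_PER_UNIT = {"year": 1.0, "month": 1 / 12, "week": 1 / 52, "day": 1 / 365}

def age_to_years(ages: pd.Series) -> pd.Series:
    """Convert strings like '6 months' to fractional years; unparseable or negative ages become NaN."""
    parts = ages.str.extract(r"^(?P<value>-?\d+)\s+(?P<unit>day|week|month|year)s?$")
    value = pd.to_numeric(parts["value"], errors="coerce").astype(float)
    years = value * parts["unit"].map(YEARS_PER_UNIT)
    # Assumption: a negative age is a data entry error, so mark it as missing
    return years.where(years >= 0)

sample = pd.Series(["2 years", "6 months", "1 week", "3 days", "-1 years", None])
years = age_to_years(sample)
print(years)
```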




In [None]:
from geopy.geocoders import Nominatim

# Convert Found location to coordinates with geopy package

# Use bag of words model to convert breed to a feature vector

# Use
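
The bag-of-words step planned in the cell above is not implemented in the notebook. As a rough sketch of what it might look like using pandas alone, the code below builds a binary word-indicator matrix from a few breed strings taken from the outputs above. It records only whether a word is present, not how often, so it is a simplified variant of a full bag-of-words model. The name `breed_words` is hypothetical.

```python
import pandas as pd

# Breed strings copied from the displayed rows above
breeds = pd.Series(["Pit Bull/Pit Bull", "Beagle Mix", "Shiba Inu/Chihuahua Shorthair"])

# Lowercase, treat "/" as a word separator, then create one 0/1 column per distinct word
breed_words = (
    breeds.str.lower()
    .str.replace("/", " ", regex=False)
    .str.get_dummies(sep=" ")
)

print(breed_words)
```

A count-based vectorizer would instead record that "pit" occurs twice in the first breed; which representation suits the model better is left open here.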