# **CS 363M Final Project Spring 2025**

## Chenyi Wang, Bhuvan Kannaeganti, Suyog Valsangkar

### **Overview**

For the project in this class, you will participate in a machine learning competition where you’ll apply your ML skills to a real-world dataset. You may work individually or in teams of up to 3 students. 

The dataset for this competition comes from the Austin Animal Center, the largest no-kill animal shelter in the United States. It contains historical records of animals that have entered the shelter, including details such as species, breed, age, intake type, medical condition, and other attributes. Each animal in the dataset has a recorded outcome, which represents what eventually happened to the animal after entering the shelter.

Your goal in this competition is to build a machine learning model that predicts the final outcome of each animal admitted to the shelter, based on its intake characteristics. The possible outcomes are:

*Adopted* – The animal was placed into a new home.<br>
*Return to Owner* – The animal was reclaimed by its original owner.<br>
*Euthanasia* – The animal was humanely euthanized due to medical or behavioral concerns.<br>
*Died* – The animal passed away while in the shelter’s care.<br>
*Transfer* – The animal was moved to another shelter or rescue organization.<br>

By accurately predicting these outcomes, your model can help identify factors that influence an animal's journey through the shelter system and potentially aid in improving adoption and survival rates, shelter policies, or allocation of resources.


## **Code and Analysis Below:**

1. **Data Exploration**: We first go through the dataset and examine the existing features for patterns and for ways to engineer features that improve our final predictions.

In [32]:
import pandas as pd

animal_data = pd.read_csv('train.csv')
animal_data.sample(20) # sample some data

Unnamed: 0,Id,Name,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,Outcome Time,Date of Birth,Outcome Type
44852,A877222,,03/24/2023 03:39:00 PM,W Cesar Chavez St in Austin (TX),Stray,Normal,Dog,Intact Female,4 months,Catahoula,Brown Merle,03/24/2023 06:57:00 PM,10/24/2022,Transfer
15638,A819257,,06/24/2020 04:11:00 PM,Shoal Creek in Austin (TX),Stray,Normal,Cat,Intact Male,1 month,Domestic Medium Hair Mix,Orange Tabby,07/30/2020 04:31:00 PM,04/28/2020,Adoption
47770,A769095,Fezzik,03/30/2018 12:27:00 PM,Oak Hill Road And South Interstate Highway 35 ...,Stray,Normal,Dog,Intact Male,2 years,German Shepherd,Brown/Black,04/28/2018 02:45:00 PM,03/30/2016,Adoption
54499,A811073,,12/27/2019 02:05:00 PM,806 Sylvia Lane in Travis (TX),Stray,Normal,Dog,Intact Male,1 month,Australian Cattle Dog,Blue Tick/Tan,12/31/2019 04:47:00 PM,10/27/2019,Adoption
94032,A903922,406 Grams,05/04/2024 04:03:00 PM,9024 Northgate Blvd in Austin (TX),Stray,Normal,Cat,Intact Female,4 weeks,Domestic Shorthair,Brown Tabby,05/04/2024 05:51:00 PM,04/04/2024,Transfer
95835,A710980,,08/31/2015 12:44:00 PM,4905 Misty Slope Ln in Austin (TX),Stray,Normal,Cat,Intact Male,4 months,Domestic Shorthair Mix,Brown Tabby,09/01/2015 09:00:00 AM,05/01/2015,Transfer
53887,A691207,*Juno,10/31/2014 03:32:00 PM,7007 Apperson St in Del Valle (TX),Stray,Normal,Dog,Intact Female,2 years,Pit Bull Mix,Brown/White,12/05/2014 05:13:00 PM,10/31/2012,Transfer
74746,A802752,Ace,08/22/2019 11:50:00 AM,8800 Capital Drive in Austin (TX),Stray,Normal,Dog,Intact Male,3 years,Shih Tzu,Black/White,08/26/2019 01:17:00 PM,08/22/2016,Transfer
21626,A705006,Nuckles,02/10/2016 02:33:00 PM,Austin (TX),Owner Surrender,Normal,Dog,Neutered Male,5 years,Beagle Mix,Red,02/14/2016 05:16:00 PM,06/11/2010,Adoption
39105,A680322,*Alice,06/01/2014 03:01:00 PM,7315 Providence in Austin (TX),Stray,Normal,Cat,Intact Female,2 years,Domestic Shorthair Mix,Blue Tabby/White,06/18/2014 10:00:00 AM,06/01/2012,Transfer
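
Beyond eyeballing a random sample, a few summary calls make the same patterns explicit. The following is a minimal sketch, assuming a tiny stand-in frame (`sample`, a name introduced here for illustration) built from four rows of the output above. On the real data the same calls would run on `animal_data`.

```python
import pandas as pd

# Tiny stand-in for train.csv, built from four rows of the sample shown above.
sample = pd.DataFrame({
    "Name": [None, None, "Fezzik", "Ace"],
    "Animal Type": ["Dog", "Cat", "Dog", "Dog"],
    "Sex upon Intake": ["Intact Female", "Intact Male", "Intact Male", "Intact Male"],
    "Age upon Intake": ["4 months", "1 month", "2 years", "3 years"],
    "Outcome Type": ["Transfer", "Adoption", "Adoption", "Transfer"],
})

# Count missing values per column to see which features need cleaning.
missing_counts = sample.isna().sum()

# Share of each outcome class; a skewed target matters for model and metric choice.
outcome_share = sample["Outcome Type"].value_counts(normalize=True)

# Distinct time units in the age column (the last token of each string).
age_units = sample["Age upon Intake"].str.split().str[-1].unique()

print(missing_counts)
print(outcome_share)
print(age_units)
```

Even on these four rows, the checks surface the missing names and the mixed age units that the cleaning step has to handle.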


2. **Data Cleaning**: The initial sample shows that the data needs cleaning. Specifically, the name column has missing values, which we fill with "Unknown" so they don't cause issues later. The sex upon intake feature also encodes sterilization status (for example, "Intact Female"), so we split it into separate sex and sterilization status columns. Finally, the age upon intake column mixes time units (days, weeks, months, and years), so we convert it to a single unit, years.

In [33]:
# fill missing names with "Unknown"
animal_data['Name'] = animal_data['Name'].fillna('Unknown')

# separate sex and sterilization status into two new columns
split_cols = animal_data['Sex upon Intake'].str.split(' ', n=1, expand=True)

# handle cases where the split returns "Unknown"
animal_data['Sterilization Status'] = split_cols[0].map({
    'Neutered': 'Yes',
    'Spayed': 'Yes',
    'Intact': 'No',
    'Unknown': 'Unknown'
}).fillna('Unknown')  # in case of unexpected values

# assign sex; a bare "Unknown" has no second token, so fill it with "Unknown"
animal_data['Sex'] = split_cols[1].where(split_cols[1].notna(), 'Unknown')
animal_data.drop("Sex upon Intake", axis=1, inplace=True)

# helper to convert age upon intake to years
def age_to_years(age_str):
    if pd.isna(age_str):
        return None
    
    parts = age_str.split()
    if len(parts) != 2:
        return None  # unexpected format, e.g. a single token
    number, unit = parts
    number = float(number)
    
    if "year" in unit:
        return number
    elif "month" in unit:
        return number / 12
    elif "week" in unit:
        return number / 52
    elif "day" in unit:
        return number / 365
    else:
        return None  # in case of an unexpected format

# convert age to years
animal_data['Age'] = animal_data['Age upon Intake'].apply(age_to_years)
animal_data.drop('Age upon Intake', axis=1, inplace=True)

animal_data.sample(20)

Unnamed: 0,Id,Name,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Breed,Color,Outcome Time,Date of Birth,Outcome Type,Sterilization Status,Sex,Age
2175,A755179,Luna,07/30/2017 12:35:00 PM,San Felipe And Los Indios in Austin (TX),Stray,Normal,Dog,Chinese Sharpei Mix,Cream,08/01/2017 12:27:00 PM,07/30/2016,Return to Owner,No,Female,1.0
18834,A870371,Nala,12/05/2022 05:33:00 PM,5600 Brougham Way in Austin (TX),Stray,Normal,Cat,Domestic Shorthair,Blue/White,12/09/2022 03:53:00 PM,09/20/2022,Adoption,No,Female,0.166667
97589,A538585,Athena,04/16/2014 11:39:00 AM,14967 Swiss Dr in Travis (TX),Stray,Injured,Dog,Pit Bull Mix,Red/White,06/18/2014 01:04:00 PM,10/15/2008,Transfer,Yes,Female,5.0
104225,A845177,Gus,10/26/2021 04:54:00 PM,Travis (TX),Owner Surrender,Normal,Dog,Australian Cattle Dog Mix,Tricolor,12/16/2021 04:32:00 PM,10/27/2019,Transfer,No,Male,2.0
24196,A758613,Unknown,09/19/2017 12:38:00 PM,9225 South 183 in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Orange Tabby,09/19/2017 06:36:00 PM,09/12/2017,Transfer,No,Male,0.019231
9264,A757739,Nashville,09/06/2017 02:37:00 PM,Outside Jurisdiction,Public Assist,Normal,Cat,Domestic Shorthair,Brown Tabby,09/15/2017 05:26:00 PM,09/06/2016,Return to Owner,Yes,Female,1.0
94998,A821237,*Apollo,08/06/2020 05:48:00 PM,6503 Bridgewater Cove in Austin (TX),Stray,Normal,Cat,Domestic Shorthair,Orange Tabby/White,08/31/2020 06:21:00 PM,05/06/2020,Adoption,No,Male,0.25
50663,A775966,Sunny,07/06/2018 06:09:00 PM,Steck And Anderson Lane in Austin (TX),Stray,Normal,Dog,Dachshund Mix,Red/White,02/02/2019 03:42:00 PM,05/06/2018,Adoption,No,Male,0.083333
45248,A881360,Unknown,05/21/2023 01:02:00 PM,7201 Levander Loop in Austin (TX),Stray,Normal,Dog,Chihuahua Shorthair,Tricolor,05/21/2023 01:04:00 PM,05/17/2023,Transfer,Unknown,Unknown,0.010959
83583,A883349,*Reggie,06/16/2023 10:04:00 AM,6333 Florencia Ln in Austin (TX),Stray,Sick,Dog,Pit Bull Mix,Black/White,07/19/2023 02:36:00 PM,03/16/2023,Adoption,No,Male,0.25
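
To check how the cleaning logic treats edge cases, the sketch below reruns the sex and sterilization split on three hypothetical values (a normal value, a bare "Unknown", and a missing value). It also shows a vectorized variant of the age conversion. This is an alternative to the `age_to_years` helper, not the method used in the cell above, and the names `unit_in_years` and `age_years` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical edge cases: a normal value, a bare "Unknown", and a missing value.
sex_upon_intake = pd.Series(["Spayed Female", "Unknown", None])

# Same split as in the cell above: first token is the status, the rest is the sex.
parts = sex_upon_intake.str.split(" ", n=1, expand=True)
status = parts[0].map(
    {"Neutered": "Yes", "Spayed": "Yes", "Intact": "No", "Unknown": "Unknown"}
).fillna("Unknown")
sex = parts[1].fillna("Unknown")

# Vectorized age conversion: pull out the number and the unit with a regex.
ages = pd.Series(["2 years", "4 months", "4 weeks", "3 days", None])
extracted = ages.str.extract(r"(?P<number>\d+)\s+(?P<unit>[a-z]+)")

# Same divisors as the helper above, keyed by the singular unit name.
unit_in_years = {"year": 1.0, "month": 1 / 12, "week": 1 / 52, "day": 1 / 365}
factor = extracted["unit"].str.rstrip("s").map(unit_in_years)
age_years = pd.to_numeric(extracted["number"]) * factor

print(pd.DataFrame({"status": status, "sex": sex}))
print(age_years)
```

Both the bare "Unknown" and the missing value end up as "Unknown" in the two new columns, and a missing age stays missing rather than raising an error.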


3. **Feature Engineering**: After sampling the data, I observe some features that don't necessarily correlate with the outcome. For example, Id is an identifier the shelter uses to look up records in its tables, which we do not need here. I debated whether the name feature is relevant to the outcome and am keeping it for now. Other features can be combined. For example, intake time and outcome time can be merged into a single length-of-stay feature. This reduces dimensionality without losing much information (unless Austin Animal Center has changed significantly over the years).

In [34]:
animal_data.drop('Id', axis=1, inplace=True) # drop the Id column; it is only a record identifier and adds dimensionality

animal_data.sample(20)

Unnamed: 0,Name,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Breed,Color,Outcome Time,Date of Birth,Outcome Type,Sterilization Status,Sex,Age
105838,Unknown,05/10/2014 04:14:00 PM,407 Cooper Dr in Austin (TX),Stray,Normal,Dog,Chihuahua Shorthair Mix,Buff,05/24/2014 03:31:00 PM,05/10/2012,Transfer,No,Male,2.0
935,Unknown,05/22/2016 12:19:00 PM,12345 Lamplight Village in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Black,05/26/2016 07:04:00 PM,10/22/2015,Transfer,No,Male,0.583333
67605,Aladdin,12/18/2019 10:48:00 AM,Austin (TX),Stray,Normal,Dog,Labrador Retriever,Black/White,12/18/2019 07:04:00 PM,07/03/2019,Return to Owner,No,Male,0.416667
75128,*Lil Pistol,03/14/2023 11:19:00 AM,Mary Searight Park in Travis (TX),Stray,Normal,Dog,Chihuahua Shorthair Mix,Tan/White,04/14/2023 06:54:00 PM,03/14/2021,Adoption,No,Male,2.0
30332,Unknown,06/13/2015 12:44:00 PM,6314 Libyan Drive in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Brown Tabby/White,06/14/2015 09:00:00 AM,06/13/2013,Transfer,No,Male,2.0
17805,Faith Hill,12/07/2015 03:22:00 PM,Flournoy Dr And Idlewood Cv in Austin (TX),Stray,Normal,Dog,Cairn Terrier Mix,White/Tan,12/09/2015 12:48:00 PM,12/07/2010,Return to Owner,Yes,Female,5.0
77233,Cujo,06/10/2021 11:40:00 AM,Springdale in Austin (TX),Stray,Normal,Dog,Rhod Ridgeback Mix,Brown/White,06/26/2021 02:34:00 PM,12/19/2020,Return to Owner,No,Male,0.416667
82106,107 Grams,03/16/2024 10:33:00 AM,1912 E William Cannon Dr in Austin (TX),Stray,Neonatal,Cat,Domestic Shorthair,Blue Tabby,03/16/2024 02:00:00 PM,03/15/2024,Transfer,No,Male,0.00274
36614,Moxy,09/23/2017 04:55:00 PM,2101 Speedway in Austin (TX),Stray,Normal,Dog,Labrador Retriever Mix,Black,09/24/2017 12:22:00 PM,09/24/2014,Return to Owner,Yes,Female,3.0
87813,Moose,01/13/2022 01:55:00 PM,4550 Mueller in Austin (TX),Stray,Normal,Dog,German Shepherd,Black/Brown,07/10/2023 02:16:00 PM,01/13/2020,Transfer,No,Male,2.0
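
The length-of-stay idea described above can be sketched as follows. This is a minimal example on two rows copied from the output above, not necessarily the final implementation. The column name `Length of Stay` and the variable names are assumptions introduced for illustration, and the approach assumes outcome time is available wherever the feature is computed.

```python
import pandas as pd

# Two rows copied from the sample above (timestamps in the dataset's format).
stays = pd.DataFrame({
    "Intake Time": ["05/10/2014 04:14:00 PM", "12/18/2019 10:48:00 AM"],
    "Outcome Time": ["05/24/2014 03:31:00 PM", "12/18/2019 07:04:00 PM"],
})

# Parse with an explicit format so month/day order is never guessed.
time_format = "%m/%d/%Y %I:%M:%S %p"
intake = pd.to_datetime(stays["Intake Time"], format=time_format)
outcome = pd.to_datetime(stays["Outcome Time"], format=time_format)

# Length of stay in fractional days ("Length of Stay" is a hypothetical name).
stays["Length of Stay"] = (outcome - intake).dt.total_seconds() / 86400

# The two timestamp columns are replaced by the single derived feature.
stays = stays.drop(columns=["Intake Time", "Outcome Time"])
print(stays)
```

Using fractional days keeps same-day outcomes, such as the second row, distinguishable from one another instead of rounding them all to zero.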
