# **CS 363M Final Project Spring 2025**

## Chenyi Wang, Bhuvan Kannaeganti, Suyog Valsangkar

### **Overview**

For the project in this class, you will participate in a machine learning competition where you’ll apply your ML skills to a real-world dataset. You may work individually or in teams of up to 3 students. 

The dataset for this competition comes from the Austin Animal Center, the largest no-kill animal shelter in the United States. It contains historical records of animals that have entered the shelter, including details such as species, breed, age, intake type, medical condition, and other attributes. Each animal in the dataset has a recorded outcome, which represents what eventually happened to the animal after entering the shelter.

Your goal in this competition is to build a machine learning model that predicts the final outcome of each animal admitted to the shelter, based on its intake characteristics. The possible outcomes are:

- **Adopted**: The animal was placed into a new home.
- **Return to Owner**: The animal was reclaimed by its original owner.
- **Euthanasia**: The animal was humanely euthanized due to medical or behavioral concerns.
- **Died**: The animal passed away while in the shelter’s care.
- **Transfer**: The animal was moved to another shelter or rescue organization.

By accurately predicting these outcomes, your model can help identify factors that influence an animal's journey through the shelter system and potentially aid in improving adoption and survival rates, shelter policies, or allocation of resources.


## **Code and Analysis Below:**

1. We first go through the dataset and examine the existing features, looking for patterns and for ways to engineer features that improve our final predictions.

In [321]:
import pandas as pd

animal_data = pd.read_csv('train.csv')
animal_data.sample(20) # sample some data

Unnamed: 0,Id,Name,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,Outcome Time,Date of Birth,Outcome Type
36925,A722248,*Luli,03/13/2016 02:18:00 PM,Airport Blvd And Mlk Jr Blvd in Austin (TX),Stray,Normal,Dog,Intact Female,2 months,Labrador Retriever Mix,Gold/White,03/18/2016 12:03:00 PM,12/28/2015,Adoption
102883,A726535,*Tianya,05/11/2016 05:36:00 PM,10510 Wendts Way in Austin (TX),Stray,Normal,Cat,Intact Female,1 month,Domestic Shorthair Mix,Tortie,05/15/2016 05:48:00 PM,03/16/2016,Adoption
73991,A800405,,07/20/2019 01:47:00 PM,15310 Fagerquist Road in Travis (TX),Stray,Normal,Dog,Intact Female,1 month,German Shepherd,Black/White,07/24/2019 04:40:00 PM,05/29/2019,Adoption
32921,A668759,Ziggy,12/08/2013 08:57:00 PM,12034 Research Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,11 years,Golden Retriever,Gold,12/09/2013 07:14:00 PM,12/09/2002,Return to Owner
108922,A769014,,03/28/2018 06:08:00 PM,1044 Norwood Park Boulevard in Austin (TX),Stray,Normal,Dog,Intact Male,1 month,Pit Bull Mix,Black/White,04/02/2018 07:05:00 PM,01/28/2018,Adoption
7851,A733050,,08/16/2016 04:33:00 PM,14300 Tandem Blvd in Travis (TX),Public Assist,Normal,Dog,Intact Female,4 months,Chihuahua Shorthair Mix,Brown/White,08/21/2016 05:41:00 PM,04/01/2016,Adoption
4096,A753037,*Zooey,08/17/2017 01:09:00 PM,Travis (TX),Owner Surrender,Normal,Cat,Spayed Female,2 months,Siamese Mix,Lynx Point,08/16/2017 07:13:00 AM,05/30/2017,Adoption
29838,A735388,*Flint,09/23/2016 12:00:00 PM,Austin (TX),Owner Surrender,Normal,Cat,Intact Male,6 months,Domestic Shorthair Mix,Brown Tabby,11/14/2016 02:45:00 PM,03/23/2016,Transfer
42363,A768020,Nacho,03/11/2018 02:04:00 PM,Travis (TX),Owner Surrender,Normal,Dog,Neutered Male,2 months,German Shepherd Mix,Brown/Black,03/24/2018 12:09:00 PM,12/30/2017,Adoption
72405,A745994,,03/27/2017 04:27:00 PM,Austin (TX),Owner Surrender,Normal,Cat,Intact Female,1 month,Domestic Shorthair Mix,Torbie,03/29/2017 05:11:00 PM,01/27/2017,Adoption


2. **Data Cleaning**: The initial sample shows that the data needs cleaning. The Sex upon Intake feature combines sex with sterilization status, so we split it into two features. The Age upon Intake column mixes time units (days, weeks, months, years), so we convert it to a single unit. The Name feature does not exist in the test set, and we must train only on features that are available there. The same applies to Id, which does not correspond to the Id in the test set. We therefore drop Name and Id before training the model.
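
One way to enforce the "only features present in the test set" rule is to intersect the column lists before training. The sketch below is a minimal illustration: `train_like` and `test_like` are tiny hypothetical stand-ins, and the columns of `test_like` are an assumption, not the layout of the actual competition test file.

```python
import pandas as pd

# Tiny stand-ins for the real files; the test-side columns are an assumption.
train_like = pd.DataFrame({
    "Id": ["A1"], "Name": ["Ziggy"], "Intake Type": ["Stray"],
    "Animal Type": ["Dog"], "Outcome Type": ["Adoption"],
})
test_like = pd.DataFrame({
    "Id": [1], "Intake Type": ["Stray"], "Animal Type": ["Cat"],
})

target = "Outcome Type"
id_like = {"Id"}  # identifiers that should not be used as features

# Keep only columns present in both frames, minus identifiers and the target.
shared = [c for c in train_like.columns
          if c in test_like.columns and c not in id_like and c != target]
dropped = [c for c in train_like.columns if c not in shared and c != target]

print("usable features:", shared)
print("dropped from train:", dropped)
```

Running the same check against the real files would list every training column that has to be dropped or rebuilt from other fields.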

In [322]:
# drop the Name, Id, Outcome Time, and Date of Birth features entirely
animal_data.drop(columns=["Name", "Id", "Outcome Time", "Date of Birth"], inplace=True)

# split "Sex upon Intake" (e.g. "Neutered Male") into sterilization status and sex
split_cols = animal_data['Sex upon Intake'].str.split(' ', n=1, expand=True)

# map the first token to a sterilization flag; "Unknown" and any unexpected
# or missing value is assumed to be not sterilized
animal_data['Sterilized'] = split_cols[0].map({
    'Neutered': 'True',
    'Spayed': 'True',
    'Intact': 'False',
    'Unknown': 'False'
}).fillna('False')

# encode sex as a binary "Male" feature (male = True, female = False);
# note that rows with unknown or missing sex also end up as True here
animal_data['Male'] = split_cols[1].apply(lambda x: False if x == 'Female' else True)
animal_data.drop("Sex upon Intake", axis=1, inplace=True)

# helper to convert age upon intake to years
def age_to_years(age_str):
    if pd.isna(age_str):
        return None
    
    number, unit = age_str.split()
    number = float(number)
    
    if "year" in unit:
        return number
    elif "month" in unit:
        return number / 12
    elif "week" in unit:
        return number / 52
    elif "day" in unit:
        return number / 365
    else:
        return None  # in case of an unexpected format

# convert age to years
animal_data['Age'] = animal_data['Age upon Intake'].apply(age_to_years)
animal_data.drop('Age upon Intake', axis=1, inplace=True)

pd.set_option('display.max_columns', None)
animal_data.sample(20) # sample some data after transformations

Unnamed: 0,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Breed,Color,Outcome Type,Sterilized,Male,Age
4415,07/31/2020 01:27:00 PM,5017 W 290 Highway in Austin (TX),Stray,Injured,Cat,Domestic Medium Hair,Cream Tabby,Transfer,False,True,0.076923
97458,04/10/2017 08:09:00 AM,7108 Northeast in Austin (TX),Stray,Nursing,Cat,Domestic Shorthair Mix,Black/White,Transfer,False,True,0.076923
70077,11/14/2019 11:02:00 AM,208 Dry Creek Road in Travis (TX),Stray,Normal,Dog,German Shepherd,Black/White,Transfer,False,False,0.083333
107924,10/29/2015 09:54:00 AM,10505 Macmora Rd in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Orange Tabby/White,Adoption,False,True,0.166667
54249,05/11/2017 01:21:00 PM,South Congress Ave & William Cannon Dr in Aust...,Stray,Normal,Cat,Domestic Shorthair Mix,Calico,Adoption,False,False,0.083333
68773,07/15/2014 03:34:00 PM,1001 Lipan Trl in Austin (TX),Stray,Normal,Dog,Labrador Retriever/Pointer,Black/White,Adoption,False,True,0.083333
17154,01/24/2023 04:47:00 PM,N Fm 620 And Rock Harbor in Austin (TX),Stray,Normal,Cat,Domestic Shorthair,Torbie/White,Adoption,False,False,2.0
69898,06/19/2015 12:33:00 PM,3108 Windsor Rd in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Blue Tabby,Adoption,False,True,0.166667
1704,02/04/2018 06:20:00 PM,1128 Saucedo in Austin (TX),Stray,Sick,Cat,Domestic Medium Hair Mix,Black/White,Return to Owner,True,True,12.0
85413,01/02/2024 11:25:00 PM,9124 Fainwood Ln in Austin (TX),Public Assist,Normal,Cat,Domestic Shorthair,Black/Black,Return to Owner,True,False,7.0
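
As a cross-check on the age conversion, the same arithmetic can be written in vectorized form. This is a sketch under the assumption that every value looks like `<number> <unit>`; the sample strings and the names `ages`, `unit_to_years`, and `years` are illustrative and not part of the notebook.

```python
import pandas as pd

ages = pd.Series(["2 months", "11 years", "1 month", "3 weeks", "4 days", None])

# Years per unit, using the same divisors as age_to_years.
unit_to_years = {"year": 1.0, "month": 1 / 12, "week": 1 / 52, "day": 1 / 365}

# Split "<number> <unit>" and strip the optional plural "s".
parts = ages.str.extract(r"^(?P<number>-?\d+(?:\.\d+)?)\s+(?P<unit>[a-z]+?)s?$")
years = parts["number"].astype(float) * parts["unit"].map(unit_to_years)

print(years.round(6).tolist())
```

Values that do not match the pattern, including missing ones, come out as NaN, which mirrors the `None` returned by `age_to_years` for unexpected input.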


3. **Feature Engineering**: We need to one-hot encode the categorical features. However, some categorical features have many labels, several with very few data points. We can compare how similar the labels are to each other and group them to reduce dimensionality.

Let's look at Intake Condition and see all the possible labels for that class.

In [323]:
# Count the occurrences of each intake condition label
intake_condition_counts = animal_data['Intake Condition'].value_counts()

# Print the counts for each intake condition
for condition, count in intake_condition_counts.items():
    print(f"{condition}: {count}")

Normal: 95010
Injured: 6394
Sick: 4295
Nursing: 2957
Neonatal: 1240
Aged: 373
Medical: 298
Other: 247
Pregnant: 111
Feral: 104
Med Attn: 48
Behavior: 42
Unknown: 12
Neurologic: 10
Med Urgent: 7
Parvo: 5
Space: 2
Agonal: 1
Congenital: 1


The Intake Condition feature has 19 distinct values, so we can merge some of the rarer ones together.
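
To get a feel for how rare the tail labels are, the counts printed above can be screened against a frequency cutoff. The sketch below is only a complement to the outcome-based grouping used in this notebook; the `min_count` value of 100 is a hypothetical cutoff chosen for illustration.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "Normal": 95010, "Injured": 6394, "Sick": 4295, "Nursing": 2957,
    "Neonatal": 1240, "Aged": 373, "Medical": 298, "Other": 247,
    "Pregnant": 111, "Feral": 104, "Med Attn": 48, "Behavior": 42,
    "Unknown": 12, "Neurologic": 10, "Med Urgent": 7, "Parvo": 5,
    "Space": 2, "Agonal": 1, "Congenital": 1,
})

min_count = 100  # hypothetical cutoff, chosen only for illustration

# Labels below the cutoff and the share of all rows they account for.
rare_labels = counts[counts < min_count].index.tolist()
share = counts[rare_labels].sum() / counts.sum() * 100

print(rare_labels)
print(f"{len(rare_labels)} of {len(counts)} labels cover {share:.2f}% of rows")
```

A count-based cutoff says nothing about whether the merged labels behave alike, which is why the outcome percentages are inspected next.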

Let's look at the outcome percentages for each intake condition so we can group similar conditions and reduce the number of labels we need to one-hot encode, since every extra label adds a dimension.

In [324]:
# Iterate through each unique intake condition and calculate outcome percentages
intake_conditions = animal_data['Intake Condition'].unique()

for condition in intake_conditions:
    condition_data = animal_data[animal_data['Intake Condition'] == condition]
    total_count = len(condition_data)
    if total_count > 0:
        outcome_percentages = condition_data['Outcome Type'].value_counts(normalize=True) * 100
        print(f"Intake Condition: {condition}")
        print(outcome_percentages)
        print("-" * 50)
    else:
        print(f"Intake Condition: {condition} has no entries.")
        print("-" * 50)

Intake Condition: Normal
Outcome Type
Adoption           52.596569
Transfer           29.360067
Return to Owner    15.985686
Euthanasia          1.513525
Died                0.544153
Name: proportion, dtype: float64
--------------------------------------------------
Intake Condition: Injured
Outcome Type
Adoption           34.813888
Transfer           30.669378
Euthanasia         18.720676
Return to Owner    12.605568
Died                3.190491
Name: proportion, dtype: float64
--------------------------------------------------
Intake Condition: Pregnant
Outcome Type
Transfer           54.954955
Adoption           39.639640
Return to Owner     4.504505
Died                0.900901
Name: proportion, dtype: float64
--------------------------------------------------
Intake Condition: Neonatal
Outcome Type
Transfer           69.919355
Adoption           25.564516
Died                3.064516
Return to Owner     0.967742
Euthanasia          0.483871
Name: proportion, dtype: float64
-------
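
The per-condition loop can also be expressed as a single table, which makes it easier to compare conditions side by side. The following is a minimal sketch using `pd.crosstab` with row-wise normalization; the `demo` rows are made up for illustration and are not taken from `train.csv`.

```python
import pandas as pd

# Small illustrative frame; the rows are made up, not taken from train.csv.
demo = pd.DataFrame({
    "Intake Condition": ["Normal", "Normal", "Normal", "Normal", "Injured", "Injured"],
    "Outcome Type": ["Adoption", "Adoption", "Transfer", "Return to Owner",
                     "Adoption", "Euthanasia"],
})

# Rows: intake condition; columns: outcome; values: row-wise percentages.
outcome_pct = pd.crosstab(
    demo["Intake Condition"], demo["Outcome Type"], normalize="index"
) * 100

print(outcome_pct.round(1))
```

Each row sums to 100, and outcomes that never occur for a condition appear as 0 instead of being omitted, as they are in the `value_counts` output.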

Groupings based on similar outcome percentages

In [325]:
'''
    [Group]       [Categories]
    Normal        Normal
    Med_Minor     Injured, Medical (similar adoption %)
    Nursing       Nursing, Neonatal
    Med_Major     Med Attn, Med Urgent, Neurologic, Pregnant, Sick (similar transfer %)
    Behavioral    Feral, Behavior
    Rare          Agonal, Aged, Congenital, Parvo, Space, Other, Unknown (fallback group in the code below)
'''

The labels merged into each group have relatively similar outcome percentages. Grouping them reduces dimensionality and should help training while preserving the meaning of the original categories.

In [326]:
# intake condition into grouped categories
condition_map = {
    'Normal': 'Normal',
    'Injured': 'Med_Minor',
    'Sick': 'Med_Major',
    'Nursing': 'Nursing',
    'Neonatal': 'Nursing',
    'Med Attn': 'Med_Major',
    'Med Urgent': 'Med_Major',
    'Medical': 'Med_Minor',
    'Neurologic': 'Med_Major',
    'Pregnant': 'Med_Major',
    'Feral': 'Behavioral',
    'Behavior': 'Behavioral',
}

# map with a fallback to 'Rare'
animal_data['Condition'] = animal_data['Intake Condition'].map(condition_map).fillna('Rare')
animal_data = pd.get_dummies(animal_data, columns=['Condition'])

animal_data.sample(20)  # sample some data after transformation

Unnamed: 0,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Breed,Color,Outcome Type,Sterilized,Male,Age,Condition_Behavioral,Condition_Med_Major,Condition_Med_Minor,Condition_Normal,Condition_Nursing,Condition_Rare
27960,03/01/2014 05:19:00 PM,3300 Kingsworth in Travis (TX),Stray,Normal,Dog,Pit Bull Mix,Chocolate/White,Adoption,False,True,2.0,False,False,False,True,False,False
47891,10/07/2014 11:01:00 AM,Fair Oaks Dr & Buffalo Pass in Austin (TX),Stray,Normal,Cat,Domestic Longhair Mix,Black Smoke,Adoption,False,True,2.0,False,False,False,True,False,False
47864,08/20/2019 01:38:00 PM,1422 Salem Meadow Circle in Austin (TX),Stray,Normal,Cat,Domestic Shorthair,Orange Tabby,Transfer,False,False,1.0,False,False,False,True,False,False
46714,06/21/2014 03:32:00 PM,3811 Leafield Dr in Austin (TX),Stray,Normal,Dog,Boxer Mix,Brown/White,Transfer,False,True,3.0,False,False,False,True,False,False
76485,03/01/2023 12:08:00 PM,Maha in Travis (TX),Stray,Normal,Cat,Domestic Shorthair,Brown Tabby,Adoption,False,True,4.0,False,False,False,True,False,False
60302,04/23/2022 04:40:00 PM,Ih35 in Austin (TX),Stray,Normal,Dog,Great Pyrenees Mix,Tan/Black,Adoption,False,False,4.0,False,False,False,True,False,False
6771,09/08/2021 01:07:00 PM,William Cannon And Stoneleigh in Austin (TX),Stray,Neonatal,Cat,Domestic Shorthair,Blue Tabby,Transfer,False,True,0.057692,False,False,False,False,True,False
58812,12/03/2013 10:15:00 AM,6818 Hergotz Ln in Austin (TX),Stray,Normal,Dog,Pit Bull,White,Return to Owner,True,True,5.0,False,False,False,True,False,False
15814,03/08/2017 09:41:00 AM,4510 B Est Way in Austin (TX),Stray,Normal,Dog,Labrador Retriever Mix,Brown,Transfer,False,False,1.0,False,False,False,True,False,False
103446,03/22/2021 03:31:00 PM,5900 South Pleasant Valley Road in Austin (TX),Stray,Normal,Dog,Labrador Retriever/Chinese Sharpei,Brown/White,Adoption,False,False,0.333333,False,False,False,True,False,False
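
Because the test set must go through the same encoding, it helps to make sure both frames end up with identical dummy columns even when a group never appears in one of them. The sketch below shows one common way to do this with `reindex`; `train_like` and `test_like` are hypothetical stand-ins, and this step is an assumption about the later pipeline, not something shown in the notebook so far.

```python
import pandas as pd

train_like = pd.DataFrame({"Condition": ["Normal", "Nursing", "Rare", "Med_Major"]})
test_like = pd.DataFrame({"Condition": ["Normal", "Med_Major"]})  # fewer labels

train_enc = pd.get_dummies(train_like, columns=["Condition"])
test_enc = pd.get_dummies(test_like, columns=["Condition"])

# Reindex the test frame to the training columns; absent dummies become False.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=False)

print(list(test_enc.columns))
```

Without this alignment, a model trained on the training columns would receive a test matrix with missing or reordered features.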


Let's do the same for Intake Type and check the outcome percentages of each intake type.

In [327]:
# observe outcome percentages for each intake type so we can group these categoricals, reduces dimensionality
intake_types = animal_data['Intake Type'].unique()

for intake_type in intake_types:
    type_data = animal_data[animal_data['Intake Type'] == intake_type]
    total_count = len(type_data)
    if total_count > 0:
        outcome_percentages = type_data['Outcome Type'].value_counts(normalize=True) * 100
        print(f"Intake Type: {intake_type}")
        print(outcome_percentages)
        print("-" * 50)
    else:
        print(f"Intake Type: {intake_type} has no entries.")
        print("-" * 50)

Intake Type: Stray
Outcome Type
Adoption           47.774546
Transfer           34.219783
Return to Owner    13.905286
Euthanasia          3.056120
Died                1.044266
Name: proportion, dtype: float64
--------------------------------------------------
Intake Type: Public Assist
Outcome Type
Return to Owner    63.545914
Adoption           18.528788
Transfer           14.222802
Euthanasia          3.262111
Died                0.440385
Name: proportion, dtype: float64
--------------------------------------------------
Intake Type: Owner Surrender
Outcome Type
Adoption           64.671050
Transfer           26.502541
Return to Owner     5.399357
Euthanasia          2.736980
Died                0.690073
Name: proportion, dtype: float64
--------------------------------------------------
Intake Type: Abandoned
Outcome Type
Adoption           63.017032
Transfer           26.845093
Return to Owner     9.083536
Euthanasia          0.648824
Died                0.405515
Name: proportion, 

Based on the outcome percentages, these are the groupings whose members are most similar to each other.

In [328]:
'''
    [Group]          [Categories]
    Public Assist    Public Assist
    Stray            Stray, Wildlife
    Owner Initiated  Abandoned, Owner Surrender
    Euthanasia       Euthanasia Request
'''

In [329]:
# Define the mapping for grouping intake types
intake_type_map = {
    'Stray': 'Stray',
    'Public Assist': 'Public Assist',
    'Wildlife': 'Stray',
    'Abandoned': 'Owner Initiated',
    'Owner Surrender': 'Owner Initiated',
    'Euthanasia Request': 'Euthanasia'
}

# Map the intake types to their respective groups
animal_data['Intake Group'] = animal_data['Intake Type'].map(intake_type_map).fillna('Other')
animal_data = pd.get_dummies(animal_data, columns=['Intake Group'])

animal_data.sample(20)  # sample some data after transformation


Unnamed: 0,Intake Time,Found Location,Intake Type,Intake Condition,Animal Type,Breed,Color,Outcome Type,Sterilized,Male,Age,Condition_Behavioral,Condition_Med_Major,Condition_Med_Minor,Condition_Normal,Condition_Nursing,Condition_Rare,Intake Group_Euthanasia,Intake Group_Owner Initiated,Intake Group_Public Assist,Intake Group_Stray
26945,09/02/2019 03:38:00 PM,Harpers Ferry Lane And Brodie Lane in Austin (TX),Stray,Normal,Dog,Pit Bull,White/Black,Adoption,False,False,3.0,False,False,False,True,False,False,False,False,False,True
104197,09/14/2018 03:34:00 PM,5207 Two Iron B in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Brown Tabby,Transfer,False,False,0.019231,False,False,False,True,False,False,False,False,False,True
62588,05/01/2019 04:48:00 PM,Mcneil Dr And Heinemann Dr in Austin (TX),Stray,Normal,Dog,Pembroke Welsh Corgi Mix,Tan/White,Return to Owner,True,True,2.0,False,False,False,True,False,False,False,False,False,True
72834,01/25/2018 01:43:00 PM,1311 Fairbanks Unit B in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Black,Adoption,False,False,0.076923,False,False,False,True,False,False,False,False,False,True
42087,09/01/2015 11:49:00 AM,4404 E Oltorf in Austin (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Cream Tabby/White,Transfer,False,True,0.057692,False,False,False,True,False,False,False,False,False,True
43324,07/20/2016 11:53:00 AM,17944 Majestic Elm Ln in Travis (TX),Stray,Normal,Cat,Domestic Shorthair Mix,Brown Tabby/White,Transfer,False,False,0.057692,False,False,False,True,False,False,False,False,False,True
87591,02/08/2022 05:40:00 PM,5910 Johnny Morris Road in Austin (TX),Stray,Normal,Dog,Labrador Retriever Mix,Black/White,Adoption,True,True,1.0,False,False,False,True,False,False,False,False,False,True
61779,01/06/2020 02:20:00 PM,310 East Rundberg Lane in Austin (TX),Stray,Normal,Dog,German Shepherd,Black/Brown,Adoption,False,True,4.0,False,False,False,True,False,False,False,False,False,True
75443,11/18/2021 12:50:00 PM,Travis (TX),Owner Surrender,Normal,Dog,Great Pyrenees Mix,White/Tan,Transfer,False,False,0.083333,False,False,False,True,False,False,False,True,False,False
96298,07/26/2023 03:27:00 PM,Austin (TX),Stray,Normal,Cat,Domestic Shorthair,Black,Adoption,False,True,0.166667,False,False,False,True,False,False,False,False,False,True
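
The Intake Condition and Intake Type steps follow the same pattern: map raw labels to groups, send anything unmapped to a fallback, then one-hot encode. A small helper can capture that pattern. This is a sketch only; `group_and_encode` and the `demo` frame are hypothetical names introduced here, not part of the notebook.

```python
import pandas as pd

def group_and_encode(df, column, mapping, fallback, new_column):
    """Map raw labels to groups, send unmapped labels to `fallback`, then one-hot encode."""
    out = df.copy()  # leave the caller's frame untouched
    out[new_column] = out[column].map(mapping).fillna(fallback)
    return pd.get_dummies(out, columns=[new_column])

demo = pd.DataFrame({"Intake Type": ["Stray", "Wildlife", "Abandoned", "Owner Surrender"]})
encoded = group_and_encode(
    demo,
    column="Intake Type",
    mapping={"Stray": "Stray", "Wildlife": "Stray",
             "Abandoned": "Owner Initiated", "Owner Surrender": "Owner Initiated"},
    fallback="Other",
    new_column="Intake Group",
)
print(encoded)
```

Applying the same function to the test set with the same mapping keeps the two encodings consistent, although a group that is absent from one frame will still produce no column for it.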
