# Dallas Animal Shelter 2019

The data comes from two separate Dallas Animal Shelter files: one covers October 2018 through September 2019 and the other covers October 2019 through March 2020.

This analysis looks only at calendar year 2019, combining all 2019 records from both datasets into one.
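
A minimal sketch of that combination step, using small made-up frames in place of the two CSV files. Filtering on `Intake Date` (rather than `Outcome Date`) is an assumption, and the `*_demo` names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two fiscal-year files (hypothetical rows, same date format as the source).
fy_2019_demo = pd.DataFrame({
    'Animal Type': ['DOG', 'CAT', 'DOG'],
    'Intake Date': ['12/30/18', '1/5/19', '9/30/19'],
})
fy_2020_demo = pd.DataFrame({
    'Animal Type': ['CAT', 'DOG', 'DOG'],
    'Intake Date': ['10/3/19', '12/31/19', '1/2/20'],
})

# Stack both fiscal years into one frame with a fresh index.
combined = pd.concat([fy_2019_demo, fy_2020_demo], ignore_index=True)

# Parse the two-digit-year dates, then keep only intakes that fall in calendar year 2019.
intake = pd.to_datetime(combined['Intake Date'], format='%m/%d/%y')
year_2019 = combined[intake.dt.year == 2019].reset_index(drop=True)

print(year_2019)
```

Rows from late 2018 and early 2020 drop out, leaving only the 2019 intakes from both files.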

## Hypothesis Testing  

**Null Hypothesis**: cats and dogs have similar outcomes when they arrive with similar intake types and conditions.

**Alternative Hypothesis**: intake condition and type affect the outcomes of cats less than those of dogs.
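
One possible way to compare outcomes between the two species is a chi-square test of independence within a single intake type. The sketch below is only an illustration on made-up rows, not necessarily the test this analysis uses, and the counts are far too small for a real test. It computes the statistic by hand with pandas so that no extra library is assumed.

```python
import pandas as pd

# Hypothetical records; the column names match the shelter data, the rows do not.
demo = pd.DataFrame({
    'Animal Type':  ['CAT', 'CAT', 'CAT', 'CAT', 'DOG', 'DOG', 'DOG', 'DOG'],
    'Intake Type':  ['STRAY'] * 8,
    'Outcome Type': ['ADOPTION', 'ADOPTION', 'TRANSFER', 'EUTHANIZED',
                     'ADOPTION', 'RETURNED TO OWNER', 'RETURNED TO OWNER', 'TRANSFER'],
})

# Hold the intake type fixed so the comparison is between like cases.
strays = demo[demo['Intake Type'] == 'STRAY']

# Observed counts: one row per species, one column per outcome.
observed = pd.crosstab(strays['Animal Type'], strays['Outcome Type'])

# Expected counts if outcome were independent of species.
row_totals = observed.sum(axis=1).to_numpy()[:, None]
col_totals = observed.sum(axis=0).to_numpy()[None, :]
expected = row_totals * col_totals / observed.to_numpy().sum()

# Pearson chi-square statistic and its degrees of freedom.
chi2 = ((observed.to_numpy() - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(observed)
print(f'chi2 = {chi2:.3f}, dof = {dof}')
```

The statistic would then be compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value.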


## Load in data

In [84]:
import numpy as np
import pandas as pd
from datetime import datetime
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 50)

# description of fields: https://www.dallasopendata.com/City-Services/Animals-Inventory/qgg6-h4bd
# https://gis.dallascityhall.com/documents/StaticMaps/Council/2013_Council_PDFs/2013_CouncilDistrictAllA.pdf

In [86]:
fy_2020 = pd.read_csv('FY2020_Dallas_Animal_Shelter_Data.csv', low_memory=False)
fy_2019 = pd.read_csv('FY_2019_Dallas_Animal_Shelter_Data.csv', low_memory=False)

In [88]:
fy_2019['Month'].value_counts()

JUN.2019    4953
MAY.2019    4658
JUL.2019    4539
AUG.2019    4239
SEP.2019    3928
JAN.2019    3843
APR.2019    3759
MAR.2019    3681
DEC.2018    3523
OCT.2018    3219
FEB.2019    3093
NOV.2018    2974
Name: Month, dtype: int64
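
The counts above are ordered by size, which makes the fiscal-year range hard to read. A small sketch of sorting them chronologically, using four of the values shown; it assumes an English locale for the `%b` month abbreviation.

```python
import pandas as pd

# A few of the month labels shown above, with their counts.
month_counts = pd.Series({'JUN.2019': 4953, 'OCT.2018': 3219, 'JAN.2019': 3843, 'DEC.2018': 3523})

# 'JUN.2019' -> 'Jun.2019' so that the '%b' (abbreviated month name) directive matches.
parsed = pd.to_datetime(month_counts.index.str.title(), format='%b.%Y')

# Re-index by the parsed month and sort chronologically.
chronological = pd.Series(month_counts.to_numpy(), index=parsed).sort_index()
print(chronological)
```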

In [5]:
original_df = fy_2020.copy()
original_df.head()

Unnamed: 0,Animal Id,Animal Type,Animal Breed,Kennel Number,Kennel Status,Tag Type,Activity Number,Activity Sequence,Source Id,Census Tract,Council District,Intake Type,Intake Subtype,Intake Total,Reason,Staff Id,Intake Date,Intake Time,Due Out,Intake Condition,Hold Request,Outcome Type,Outcome Subtype,Outcome Date,Outcome Time,Receipt Number,Impound Number,Service Request Number,Outcome Condition,Chip Status,Animal Origin,Additional Information,Month,Year
0,A0144701,DOG,HAVANESE,VT 12,IMPOUNDED,,,1,P0098773,6301,4,OWNER SURRENDER,GENERAL,1,PERSNLISSU,CDM,11/8/19,15:48:00,11/14/19,APP SICK,,RETURNED TO OWNER,WALK IN,11/9/19,11:31:00,R19-558731,K19-486742,,APP SICK,SCAN CHIP,OVER THE COUNTER,RETURNED TO OWNER,NOV.2019,FY2020
1,A0442587,DOG,TERRIER MIX,FREEZER,IMPOUNDED,,,1,P0492284,7102,2,OWNER SURRENDER,DEAD ON ARRIVAL,1,OTHRINTAKS,CDM,11/10/19,14:18:00,11/10/19,DEAD,,DEAD ON ARRIVAL,DISPOSAL,11/10/19,0:00:00,,K19-486954,,DEAD,SCAN CHIP,OVER THE COUNTER,,NOV.2019,FY2020
2,A0458972,DOG,CATAHOULA,RECEIVING,UNAVAILABLE,,A19-195601,1,P9991718,4600,1,STRAY,AT LARGE,1,OTHER,MG1718,10/3/19,11:08:00,10/3/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,,RETURNED TO OWNER,FIELD,10/3/19,13:36:00,,K19-482022,,TREATABLE REHABILITABLE NON-CONTAGIOUS,SCAN NO CHIP,SWEEP,,OCT.2019,FY2020
3,A0525642,DOG,GERM SHEPHERD,INJD 001,IMPOUNDED,,A19-196573,1,P0903792,16605,8,OWNER SURRENDER,GENERAL,1,OTHER,RA 1549,10/11/19,9:55:00,10/17/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,EMERGENCY RESCUE,TRANSFER,MEDICAL-CONTAGIOUS,10/15/19,17:35:00,,K19-483073,,TREATABLE REHABILITABLE NON-CONTAGIOUS,SCAN CHIP,SWEEP,TAGGED,OCT.2019,FY2020
4,A0565586,DOG,SILKY TERRIER,LFD 119,UNAVAILABLE,,,1,P0890077,6900,1,STRAY,AT LARGE,1,OTHRINTAKS,JR,11/8/19,11:55:00,11/14/19,APP WNL,RESCU ONLY,RETURNED TO OWNER,WALK IN,11/9/19,12:57:00,R19-558750,K19-486694,,APP WNL,SCAN CHIP,OVER THE COUNTER,RETURNED TO OWNER,NOV.2019,FY2020


In [6]:
original_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20247 entries, 0 to 20246
Data columns (total 34 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Animal Id               20247 non-null  object 
 1   Animal Type             20247 non-null  object 
 2   Animal Breed            20247 non-null  object 
 3   Kennel Number           20247 non-null  object 
 4   Kennel Status           20247 non-null  object 
 5   Tag Type                0 non-null      float64
 6   Activity Number         10636 non-null  object 
 7   Activity Sequence       20247 non-null  int64  
 8   Source Id               20247 non-null  object 
 9   Census Tract            18146 non-null  object 
 10  Council District        18146 non-null  object 
 11  Intake Type             20247 non-null  object 
 12  Intake Subtype          20247 non-null  object 
 13  Intake Total            20247 non-null  int64  
 14  Reason                  19189 non-null

_Looked at value counts of many columns to determine which ones are necessary._
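
A compact way to do that kind of column screening is to tabulate, per column, the number of distinct values and the share of missing entries. The sketch below runs on a hypothetical miniature table; `raw_demo`, `summary`, and `uninformative` are illustrative names, and the rows are invented.

```python
import pandas as pd

# Hypothetical miniature of the raw table; 'Tag Type' is entirely empty, as in the info() output.
raw_demo = pd.DataFrame({
    'Animal Type': ['DOG', 'DOG', 'CAT', 'DOG'],
    'Tag Type': [None, None, None, None],
    'Intake Total': [1, 1, 1, 1],
    'Reason': ['OTHER', None, 'OTHRINTAKS', 'OTHER'],
})

# One row per column: how many distinct values it holds and what share is missing.
summary = pd.DataFrame({
    'n_unique': raw_demo.nunique(dropna=True),
    'pct_missing': raw_demo.isnull().mean() * 100,
})

# Columns that are all-missing or constant carry no information for the analysis.
uninformative = summary[summary['n_unique'] <= 1].index.tolist()
print(summary)
print(uninformative)
```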

## Create dataframe with necessary columns

In [23]:
dallas = original_df[['Animal Type', 'Animal Breed', 'Council District', 'Intake Type', 'Intake Subtype', 
                      'Reason', 'Intake Date', 'Intake Time', 'Due Out', 'Intake Condition', 'Outcome Type', 'Outcome Subtype',
                      'Outcome Date', 'Outcome Time', 'Outcome Condition', 'Animal Origin']]

In [24]:
dallas.head()

Unnamed: 0,Animal Type,Animal Breed,Council District,Intake Type,Intake Subtype,Reason,Intake Date,Intake Time,Due Out,Intake Condition,Outcome Type,Outcome Subtype,Outcome Date,Outcome Time,Outcome Condition,Animal Origin
0,DOG,HAVANESE,4,OWNER SURRENDER,GENERAL,PERSNLISSU,11/8/19,15:48:00,11/14/19,APP SICK,RETURNED TO OWNER,WALK IN,11/9/19,11:31:00,APP SICK,OVER THE COUNTER
1,DOG,TERRIER MIX,2,OWNER SURRENDER,DEAD ON ARRIVAL,OTHRINTAKS,11/10/19,14:18:00,11/10/19,DEAD,DEAD ON ARRIVAL,DISPOSAL,11/10/19,0:00:00,DEAD,OVER THE COUNTER
2,DOG,CATAHOULA,1,STRAY,AT LARGE,OTHER,10/3/19,11:08:00,10/3/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,RETURNED TO OWNER,FIELD,10/3/19,13:36:00,TREATABLE REHABILITABLE NON-CONTAGIOUS,SWEEP
3,DOG,GERM SHEPHERD,8,OWNER SURRENDER,GENERAL,OTHER,10/11/19,9:55:00,10/17/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,TRANSFER,MEDICAL-CONTAGIOUS,10/15/19,17:35:00,TREATABLE REHABILITABLE NON-CONTAGIOUS,SWEEP
4,DOG,SILKY TERRIER,1,STRAY,AT LARGE,OTHRINTAKS,11/8/19,11:55:00,11/14/19,APP WNL,RETURNED TO OWNER,WALK IN,11/9/19,12:57:00,APP WNL,OVER THE COUNTER


## Missing values and strange datatypes

In [25]:
dallas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20247 entries, 0 to 20246
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Animal Type        20247 non-null  object
 1   Animal Breed       20247 non-null  object
 2   Council District   18146 non-null  object
 3   Intake Type        20247 non-null  object
 4   Intake Subtype     20247 non-null  object
 5   Reason             19189 non-null  object
 6   Intake Date        20247 non-null  object
 7   Intake Time        20247 non-null  object
 8   Due Out            20247 non-null  object
 9   Intake Condition   20247 non-null  object
 10  Outcome Type       20247 non-null  object
 11  Outcome Subtype    20247 non-null  object
 12  Outcome Date       19928 non-null  object
 13  Outcome Time       20247 non-null  object
 14  Outcome Condition  18976 non-null  object
 15  Animal Origin      19189 non-null  object
dtypes: object(16)
memory usage: 2.5+ MB


In [26]:
dallas[dallas.isnull().any(axis=1)].head(5)

Unnamed: 0,Animal Type,Animal Breed,Council District,Intake Type,Intake Subtype,Reason,Intake Date,Intake Time,Due Out,Intake Condition,Outcome Type,Outcome Subtype,Outcome Date,Outcome Time,Outcome Condition,Animal Origin
43,DOG,JACK RUSS TERR,,STRAY,WEB,,1/18/20,23:25:00,1/24/20,NORMAL,LOST EXP,OTHER,2/24/20,0:00:00,,
44,DOG,BOXER,6.0,OWNER SURRENDER,DEAD ON ARRIVAL,OTHER,10/25/19,11:36:00,10/25/19,UNHEALTHY UNTREATABLE NON-CONTAGIOUS,DEAD ON ARRIVAL,DISPOSAL,10/25/19,14:19:00,,OVER THE COUNTER
105,DOG,DACHSHUND,4.0,STRAY,AT LARGE,OTHRINTAKS,10/15/19,12:05:00,10/21/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,TRANSFER,GENERAL,10/22/19,9:46:00,,OVER THE COUNTER
112,DOG,LABRADOR RETR,,OWNER SURRENDER,GENERAL,NOTRIGHTFT,2/15/20,16:58:00,2/15/20,APP WNL,ADOPTION,WALK IN,2/15/20,18:09:00,APP WNL,OVER THE COUNTER
239,CAT,DOMESTIC SH,,STRAY,WEB,,1/12/20,21:25:00,1/13/20,NORMAL,FOUND EXP,OTHER,2/13/20,0:00:00,,
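
To see at a glance which columns contain gaps, the missing entries can be counted per column. A small sketch on invented rows with the same kinds of gaps as above:

```python
import pandas as pd

# Hypothetical rows with the same kind of gaps seen in the output above.
demo = pd.DataFrame({
    'Council District': ['4', None, '6', None],
    'Reason': ['OTHER', None, 'OTHER', 'NOTRIGHTFT'],
    'Outcome Condition': ['APP WNL', None, None, 'APP WNL'],
    'Intake Type': ['STRAY', 'STRAY', 'OWNER SURRENDER', 'STRAY'],
})

# Count the missing entries per column and keep only columns that have any.
missing = demo.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```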


In [27]:
dallas[dallas['Animal Type'] == 'CAT']['Council District'].value_counts(dropna=False).sort_index()

0         1
1       341
10      111
11      111
12       51
13      151
14       85
2       206
3       269
4       245
5       171
6      1823
7       205
8       251
9       125
NaN     381
Name: Council District, dtype: int64
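
Because `Council District` is stored as strings, `sort_index()` orders it lexicographically (`1, 10, 11, ..., 2`). A sketch of converting the labels to numbers first, on made-up values:

```python
import pandas as pd

# Hypothetical district labels stored as strings, with one missing value.
districts = pd.Series(['1', '10', '2', '6', '6', None, '14'], name='Council District')

# Convert to numbers; anything that cannot be parsed becomes NaN.
numeric = pd.to_numeric(districts, errors='coerce')

# Count per district, keeping NaN, and sort by district number.
counts = numeric.value_counts(dropna=False).sort_index()
print(counts)
```

With numeric labels the districts appear in their natural order, and the missing group is listed last.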

In [28]:
dallas['Outcome Condition'].value_counts(dropna=False)

APP WNL                                   12567
TREATABLE REHABILITABLE NON-CONTAGIOUS     2972
NaN                                        1271
APP SICK                                    780
UNKNOWN                                     570
UNDERAGE                                    550
UNHEALTHY UNTREATABLE NON-CONTAGIOUS        351
APP INJ                                     337
CRITICAL                                    299
DEAD                                        178
HEALTHY                                     137
FATAL                                        71
TREATABLE MANAGEABLE NON-CONTAGIOUS          65
TREATABLE REHABILITABLE CONTAGIOUS           37
DECEASED                                     32
UNHEALTHY UNTREATABLE CONTAGIOUS             27
TREATABLE MANAGEABLE CONTAGIOUS               3
Name: Outcome Condition, dtype: int64

## Time and Dates

In [29]:
dallas[['Intake Date', 'Intake Time', 'Due Out', 'Outcome Date', 'Outcome Time']].head()

Unnamed: 0,Intake Date,Intake Time,Due Out,Outcome Date,Outcome Time
0,11/8/19,15:48:00,11/14/19,11/9/19,11:31:00
1,11/10/19,14:18:00,11/10/19,11/10/19,0:00:00
2,10/3/19,11:08:00,10/3/19,10/3/19,13:36:00
3,10/11/19,9:55:00,10/17/19,10/15/19,17:35:00
4,11/8/19,11:55:00,11/14/19,11/9/19,12:57:00


In [30]:
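# dallas was created by selecting columns from original_df, so assigning new columns
# would otherwise raise SettingWithCopyWarning; silence that warning here.
pd.options.mode.chained_assignment = None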

dallas['Intake DateTime'] = dallas['Intake Date'] + ' ' + dallas['Intake Time']
dallas['Outcome DateTime'] = dallas['Outcome Date'] + ' ' + dallas['Outcome Time']

dallas.head()

Unnamed: 0,Animal Type,Animal Breed,Council District,Intake Type,Intake Subtype,Reason,Intake Date,Intake Time,Due Out,Intake Condition,Outcome Type,Outcome Subtype,Outcome Date,Outcome Time,Outcome Condition,Animal Origin,Intake DateTime,Outcome DateTime
0,DOG,HAVANESE,4,OWNER SURRENDER,GENERAL,PERSNLISSU,11/8/19,15:48:00,11/14/19,APP SICK,RETURNED TO OWNER,WALK IN,11/9/19,11:31:00,APP SICK,OVER THE COUNTER,11/8/19 15:48:00,11/9/19 11:31:00
1,DOG,TERRIER MIX,2,OWNER SURRENDER,DEAD ON ARRIVAL,OTHRINTAKS,11/10/19,14:18:00,11/10/19,DEAD,DEAD ON ARRIVAL,DISPOSAL,11/10/19,0:00:00,DEAD,OVER THE COUNTER,11/10/19 14:18:00,11/10/19 0:00:00
2,DOG,CATAHOULA,1,STRAY,AT LARGE,OTHER,10/3/19,11:08:00,10/3/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,RETURNED TO OWNER,FIELD,10/3/19,13:36:00,TREATABLE REHABILITABLE NON-CONTAGIOUS,SWEEP,10/3/19 11:08:00,10/3/19 13:36:00
3,DOG,GERM SHEPHERD,8,OWNER SURRENDER,GENERAL,OTHER,10/11/19,9:55:00,10/17/19,TREATABLE REHABILITABLE NON-CONTAGIOUS,TRANSFER,MEDICAL-CONTAGIOUS,10/15/19,17:35:00,TREATABLE REHABILITABLE NON-CONTAGIOUS,SWEEP,10/11/19 9:55:00,10/15/19 17:35:00
4,DOG,SILKY TERRIER,1,STRAY,AT LARGE,OTHRINTAKS,11/8/19,11:55:00,11/14/19,APP WNL,RETURNED TO OWNER,WALK IN,11/9/19,12:57:00,APP WNL,OVER THE COUNTER,11/8/19 11:55:00,11/9/19 12:57:00


In [31]:
dallas['Intake DateTime'] = pd.to_datetime(dallas['Intake DateTime'], format='%m/%d/%y %H:%M:%S')
dallas['Outcome DateTime'] = pd.to_datetime(dallas['Outcome DateTime'], format='%m/%d/%y %H:%M:%S')

In [32]:
dallas.head(2)

Unnamed: 0,Animal Type,Animal Breed,Council District,Intake Type,Intake Subtype,Reason,Intake Date,Intake Time,Due Out,Intake Condition,Outcome Type,Outcome Subtype,Outcome Date,Outcome Time,Outcome Condition,Animal Origin,Intake DateTime,Outcome DateTime
0,DOG,HAVANESE,4,OWNER SURRENDER,GENERAL,PERSNLISSU,11/8/19,15:48:00,11/14/19,APP SICK,RETURNED TO OWNER,WALK IN,11/9/19,11:31:00,APP SICK,OVER THE COUNTER,2019-11-08 15:48:00,2019-11-09 11:31:00
1,DOG,TERRIER MIX,2,OWNER SURRENDER,DEAD ON ARRIVAL,OTHRINTAKS,11/10/19,14:18:00,11/10/19,DEAD,DEAD ON ARRIVAL,DISPOSAL,11/10/19,0:00:00,DEAD,OVER THE COUNTER,2019-11-10 14:18:00,2019-11-10 00:00:00
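
Rows without an `Outcome Date` pass through both steps without raising an error: the string concatenation yields a missing value, and `pd.to_datetime` turns it into `NaT`. A small sketch on two invented rows:

```python
import pandas as pd

# Hypothetical rows; the second one has no outcome date yet.
demo = pd.DataFrame({
    'Outcome Date': ['11/9/19', None],
    'Outcome Time': ['11:31:00', '0:00:00'],
})

# String concatenation with a missing value yields a missing value.
combined = demo['Outcome Date'] + ' ' + demo['Outcome Time']

# Missing inputs become NaT rather than raising an error.
parsed = pd.to_datetime(combined, format='%m/%d/%y %H:%M:%S')
print(parsed)
```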


_Check the new datetime columns to ensure there are no strange values and to identify any missing values._

In [60]:
# Intake DateTime has no missing values; all days, months, years, hours, minutes and seconds are accounted for.
# Outcome DateTime is missing its day/month/year/time components in the same number of rows (319), so the same records are likely missing all of them.

dallas['Outcome DateTime'].dt.year.value_counts(dropna=False).sort_index()

2019.0    10327
2020.0     9601
NaN         319
Name: Outcome DateTime, dtype: int64

In [83]:
dallas[dallas['Outcome DateTime'].dt.year.isnull()]['Intake DateTime']

803     2020-03-11 10:25:00
3710    2020-03-26 18:25:00
4253    2020-03-25 13:25:00
6977    2020-03-25 14:36:00
8811    2020-02-28 10:25:00
                ...        
20231   2020-03-24 13:25:00
20233   2020-03-25 11:57:00
20235   2020-03-19 02:25:00
20240   2020-03-25 12:38:00
20241   2020-03-22 20:37:00
Name: Intake DateTime, Length: 319, dtype: datetime64[ns]
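
To summarize when the records without an outcome timestamp entered the shelter, their intake dates can be grouped by month. A sketch on three invented records:

```python
import pandas as pd

# Hypothetical records: two still lack an outcome timestamp.
demo = pd.DataFrame({
    'Intake DateTime': pd.to_datetime(['2019-11-08 15:48', '2020-02-28 10:25', '2020-03-25 13:25']),
    'Outcome DateTime': pd.to_datetime(['2019-11-09 11:31', None, None]),
})

# Select the rows with no outcome timestamp.
open_rows = demo[demo['Outcome DateTime'].isnull()]

# Earliest and latest intake among them, plus a count per intake month.
print(open_rows['Intake DateTime'].min(), open_rows['Intake DateTime'].max())
per_month = open_rows['Intake DateTime'].dt.to_period('M').value_counts().sort_index()
print(per_month)
```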

## Column Values

In [33]:
dallas['Animal Type'].value_counts()

DOG          15100
CAT           4527
WILDLIFE       537
BIRD            73
LIVESTOCK       10
Name: Animal Type, dtype: int64
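
Since the hypotheses concern only cats and dogs, one likely next step is to restrict the data to those two animal types. A minimal sketch on invented rows; `cats_dogs` is a hypothetical name.

```python
import pandas as pd

# Hypothetical rows covering several animal types.
demo = pd.DataFrame({
    'Animal Type': ['DOG', 'CAT', 'WILDLIFE', 'BIRD', 'DOG'],
    'Outcome Type': ['ADOPTION', 'TRANSFER', 'WILDLIFE', 'TRANSFER', 'RETURNED TO OWNER'],
})

# Keep only the two species under comparison; copy so later edits do not touch demo.
cats_dogs = demo[demo['Animal Type'].isin(['CAT', 'DOG'])].copy()

# Share of each species in the filtered data.
shares = cats_dogs['Animal Type'].value_counts(normalize=True)
print(shares)
```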

In [34]:
print('Intake Type: ', list(dallas['Intake Type'].unique()))
print('\n')
print('Intake Condition: ', list(dallas['Intake Condition'].unique()))
print('\n')
print('Outcome Type: ', list(dallas['Outcome Type'].unique()))
print('\n')
print('Outcome Condition: ', list(dallas['Outcome Condition'].unique()))
print('\n')
print('Animal Origin: ', list(dallas['Animal Origin'].unique()))

Intake Type:  ['OWNER SURRENDER', 'STRAY', 'FOSTER', 'CONFISCATED', 'TRANSFER', 'WILDLIFE', 'TREATMENT', 'KEEPSAFE', 'DISPOS REQ']


Intake Condition:  ['APP SICK', 'DEAD', 'TREATABLE REHABILITABLE NON-CONTAGIOUS', 'APP WNL', 'UNHEALTHY UNTREATABLE NON-CONTAGIOUS', 'CRITICAL', 'NORMAL', 'APP INJ', 'UNKNOWN', 'UNHEALTHY UNTREATABLE CONTAGIOUS', 'TREATABLE MANAGEABLE NON-CONTAGIOUS', 'TREATABLE MANAGEABLE CONTAGIOUS', 'HEALTHY', 'TREATABLE REHABILITABLE CONTAGIOUS', 'UNDERAGE', 'FATAL', 'DECEASED']


Outcome Type:  ['RETURNED TO OWNER', 'DEAD ON ARRIVAL', 'TRANSFER', 'DIED', 'EUTHANIZED', 'ADOPTION', 'FOSTER', 'LOST EXP', 'WILDLIFE', 'TREATMENT', 'FOUND EXP', 'OTHER', 'MISSING', 'DISPOSAL']


Outcome Condition:  ['APP SICK', 'DEAD', 'TREATABLE REHABILITABLE NON-CONTAGIOUS', 'APP WNL', 'UNHEALTHY UNTREATABLE NON-CONTAGIOUS', 'UNKNOWN', 'HEALTHY', 'CRITICAL', nan, 'APP INJ', 'FATAL', 'UNDERAGE', 'TREATABLE MANAGEABLE NON-CONTAGIOUS', 'TREATABLE MANAGEABLE CONTAGIOUS', 'TREATABLE REHABILITA

## Exploratory Data Analysis (EDA)

## Plots and Graphs

## Map