In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Forecasting the population of the Austin Animal Center

This project uses data from the Austin Animal Center to forecast the shelter's expected population. Knowing how many animals the shelter is likely to house in the future helps with planning, budgeting and resourcing.

The data used for this project can be downloaded from [Kaggle](https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-intakes-and-outcomes).

## Data
### Initial exploration

In [2]:
aac = pd.read_csv('aac_intakes_outcomes.csv')

In [3]:
aac.shape

(79672, 41)

There are 79,672 observations in the data and 41 variables.

In [4]:
aac.loc[0]

age_upon_outcome                                                   10 years
animal_id_outcome                                                   A006100
date_of_birth                                           2007-07-09 00:00:00
outcome_subtype                                                         NaN
outcome_type                                                Return to Owner
sex_upon_outcome                                              Neutered Male
age_upon_outcome_(days)                                                3650
age_upon_outcome_(years)                                                 10
age_upon_outcome_age_group                                      (7.5, 10.0]
outcome_datetime                                        2017-12-07 14:07:00
outcome_month                                                            12
outcome_year                                                           2017
outcome_monthyear                                                   2017-12
outcome_week

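Several of the variables are derived from the time the animal arrived at or left the shelter. For example, `outcome_month`, `outcome_year` and `outcome_monthyear` all come from `outcome_datetime`.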
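As a rough illustration of how such derived columns relate to the timestamp, the sketch below rebuilds a few of them from a single datetime string. The helper name `derived_time_parts` is hypothetical and not part of the dataset or the original notebook; the input value is the `outcome_datetime` shown in the first row above.

```python
import pandas as pd

def derived_time_parts(datetimes):
    """Rebuild year, month, month-year and hour from datetime strings.

    Hypothetical helper for illustration only.
    """
    ts = pd.to_datetime(datetimes)
    return pd.DataFrame({
        'year': ts.dt.year,
        'month': ts.dt.month,
        'monthyear': ts.dt.strftime('%Y-%m'),
        'hour': ts.dt.hour,
    })

# The outcome_datetime of the first row printed above.
parts = derived_time_parts(pd.Series(['2017-12-07 14:07:00']))
print(parts)
```

The year, month and month-year produced here agree with the `outcome_year`, `outcome_month` and `outcome_monthyear` values printed for that row.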

In [5]:
aac.isnull().sum()

age_upon_outcome                  0
animal_id_outcome                 0
date_of_birth                     0
outcome_subtype               43324
outcome_type                     10
sex_upon_outcome                  1
age_upon_outcome_(days)           0
age_upon_outcome_(years)          0
age_upon_outcome_age_group        0
outcome_datetime                  0
outcome_month                     0
outcome_year                      0
outcome_monthyear                 0
outcome_weekday                   0
outcome_hour                      0
outcome_number                    0
dob_year                          0
dob_month                         0
dob_monthyear                     0
age_upon_intake                   0
animal_id_intake                  0
animal_type                       0
breed                             0
color                             0
found_location                    0
intake_condition                  0
intake_type                       0
sex_upon_intake             

There is very little data missing from the dataset. The outcome subtype is missing in 43,324 rows (just over half), but this may be because it is often not relevant (e.g. there appears to be no subtype needed if the animal was returned to its owner).
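One way to test the idea that missing subtypes cluster in particular outcome types is to compute the share of null `outcome_subtype` values per `outcome_type`. The sketch below assumes a hypothetical helper, `missing_share_by_group`, and runs it on a made-up toy frame rather than the real data.

```python
import pandas as pd

def missing_share_by_group(df, group_col, value_col):
    """Return the fraction of rows in each group where value_col is null."""
    return (
        df[value_col].isnull()
        .groupby(df[group_col])
        .mean()
        .sort_values(ascending=False)
    )

# Toy frame with made-up values, only to show the shape of the result.
toy = pd.DataFrame({
    'outcome_type': ['Return to Owner', 'Return to Owner', 'Transfer', 'Transfer'],
    'outcome_subtype': [None, None, 'Partner', None],
})
share = missing_share_by_group(toy, 'outcome_type', 'outcome_subtype')
print(share)
```

On the real data the call would be `missing_share_by_group(aac, 'outcome_type', 'outcome_subtype')`. Rows with a missing `outcome_type` are dropped by `groupby` by default, so they would not appear in the result.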

### Checking

In [63]:
# Each check counts the rows that pass; a clean column pair should total 79,672.

print(sum(aac['age_upon_outcome_(years)'] >= aac['age_upon_intake_(years)']), ': age_upon_outcome >= age_upon_intake')
print(sum(aac['dob_year'] == pd.to_datetime(aac['date_of_birth']).dt.year), ': Year in date_of_birth = dob_year')
print(sum((aac['outcome_year'] - aac['dob_year'] - aac['age_upon_outcome_(years)'].astype(int)).isin([-1, 0, 1])), ': outcome_year - dob_year - age (years) = -1, 0 or 1')
print(sum(aac['animal_id_intake'] == aac['animal_id_outcome']), ': animal_id_intake = animal_id_outcome')
print(sum(pd.to_datetime(aac['outcome_datetime']) >= pd.to_datetime(aac['intake_datetime'])), ': outcome_datetime >= intake_datetime')
print(sum((pd.to_datetime(aac['outcome_datetime']) - pd.to_datetime(aac['intake_datetime'])) == aac['time_in_shelter']), ': outcome_datetime - intake_datetime = time_in_shelter')

79518 : age_upon_outcome >= age_upon_intake
79672 : Year in date_of_birth = dob_year
79672 : outcome_year - dob_year - age (years) = -1, 0 or 1
79672 : animal_id_intake = animal_id_outcome
79672 : outcome_datetime >= intake_datetime
79672 : outcome_datetime - intake_datetime = time_in_shelter


Generally the data appears to be in good shape. In 154 records (79,672 − 79,518) the recorded age is lower at outcome than at intake, though this may be because ages are estimated when animals arrive.
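To look at the records where the age falls between intake and outcome, rather than only counting them, the boolean check can be inverted and used as a row filter. This is a minimal sketch: `failing_rows` is a hypothetical helper and the toy frame holds made-up ages, not values from the dataset.

```python
import pandas as pd

def failing_rows(df, passed, columns):
    """Return the chosen columns for rows where the boolean check did not pass."""
    return df.loc[~passed, columns]

# Toy frame with made-up ages; the second row gets younger in the shelter.
toy = pd.DataFrame({
    'age_upon_intake_(years)': [2.0, 1.0, 0.5],
    'age_upon_outcome_(years)': [2.0, 0.5, 1.0],
})
age_ok = toy['age_upon_outcome_(years)'] >= toy['age_upon_intake_(years)']
bad = failing_rows(toy, age_ok, ['age_upon_intake_(years)', 'age_upon_outcome_(years)'])
print(bad)
```

Applied to `aac` with the same comparison, this would return the inconsistent records so that the size of each discrepancy can be inspected before deciding whether to keep them.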

### Exploratory data analysis