# M1L6 Identifying & Handling Missing Data 

 We'll be working with the same Austin Animal Center Intakes dataset from the previous lecture, which contains information about animals entering the Austin Animal Center.

### **Dataset:** [Austin Animal Center Intakes](https://catalog.data.gov/dataset/austin-animal-center-intakes) (also available in your data folder)

### **Objectives:**

 1.  Identify columns with missing data
 2.  Handle missing data by dropping data and using imputation techniques 


## Step 1:  Import pandas and numpy 

In [40]:
#Import packages 

import pandas as pd
import numpy as np

## Step 2:  Load in the data and save it as `df`

In [41]:
df = pd.read_csv('animalData.csv')
df.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color
0,A521520,Nina,10/01/2013 07:51:00 AM,October 2013,Norht Ec in Austin (TX),Stray,Normal,Dog,Spayed Female,7 years,Border Terrier/Border Collie,White/Tan
1,A664235,,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White
2,A664236,,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White
3,A664237,,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White
4,A664233,Stevie,10/01/2013 08:53:00 AM,October 2013,7405 Springtime in Austin (TX),Stray,Injured,Dog,Intact Female,3 years,Pit Bull Mix,Blue/White


## Step 3:  Look at the data. Can you think of ONE method that quickly shows the column names and the number of non-null values in each column?


**Instructor: if students draw a blank, ask which method reports data types, column names, and the number of non-null values.**

In [42]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 173812 entries, 0 to 173811
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Animal ID         173812 non-null  object
 1   Name              123821 non-null  object
 2   DateTime          173812 non-null  object
 3   MonthYear         173812 non-null  object
 4   Found Location    173812 non-null  object
 5   Intake Type       173812 non-null  object
 6   Intake Condition  173812 non-null  object
 7   Animal Type       173812 non-null  object
 8   Sex upon Intake   173811 non-null  object
 9   Age upon Intake   173812 non-null  object
 10  Breed             173812 non-null  object
 11  Color             173812 non-null  object
dtypes: object(12)
memory usage: 15.9+ MB
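
`df.info()` shows non-null counts, so the missing counts have to be worked out by subtraction. As a minimal sketch, `isna().sum()` reports the number of missing values per column directly. The small `toy` DataFrame below is made-up stand-in data, not the real intake file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the intake data (values are made up for illustration)
toy = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3", "A4"],
    "Name": ["Nina", np.nan, np.nan, "Stevie"],
    "Sex upon Intake": ["Spayed Female", "Unknown", np.nan, "Intact Female"],
})

# Count missing values in each column
missing_counts = toy.isna().sum()

# Keep only the columns that have at least one missing value
cols_with_missing = missing_counts[missing_counts > 0]
print(cols_with_missing)

# Share of rows missing in each column, as a percentage
missing_pct = toy.isna().mean() * 100
print(missing_pct.round(1))
```

Run against `df`, the same calls should agree with the non-null counts in the `df.info()` output above.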


## Step 4:  Assign a missing indicator to rows that have missing animal names

In [43]:
df['Missing Animal Name Indicator'] = df['Name'].isna().astype(str)
df.head() #optional line to see changes

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,Missing Animal Name Indicator
0,A521520,Nina,10/01/2013 07:51:00 AM,October 2013,Norht Ec in Austin (TX),Stray,Normal,Dog,Spayed Female,7 years,Border Terrier/Border Collie,White/Tan,False
1,A664235,,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White,True
2,A664236,,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White,True
3,A664237,,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White,True
4,A664233,Stevie,10/01/2013 08:53:00 AM,October 2013,7405 Springtime in Austin (TX),Stray,Injured,Dog,Intact Female,3 years,Pit Bull Mix,Blue/White,False
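
The cell above stores the indicator as the strings `"True"` and `"False"` because of `.astype(str)`. One possible alternative, sketched here on made-up toy data, is to keep the boolean result of `isna()` (or convert it to 0/1), which can be summed or used as a row filter. The column name `Missing Name (0/1)` is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({"Name": ["Nina", np.nan, np.nan, "Stevie"]})

# isna() already returns booleans, so the flag can be stored directly
toy["Missing Animal Name Indicator"] = toy["Name"].isna()

# Booleans can be summed (True counts as 1) and used as a row filter
n_missing = toy["Missing Animal Name Indicator"].sum()
unnamed_rows = toy[toy["Missing Animal Name Indicator"]]

# A 0/1 integer version, if a numeric column is preferred
toy["Missing Name (0/1)"] = toy["Name"].isna().astype(int)

print(n_missing)
print(unnamed_rows)
```

Creating the indicator before filling in the names matters: once the missing names are replaced, `isna()` can no longer find them.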


## Step 5:  Many names are missing; let's fill them in with "UNKNOWN"

In [44]:
# Assign the result back: df['Name'].fillna(..., inplace=True) is chained assignment,
# which raises a FutureWarning and will stop working in pandas 3.0
df['Name'] = df['Name'].fillna("UNKNOWN")
df.head() #optional line to see changes


Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color,Missing Animal Name Indicator
0,A521520,Nina,10/01/2013 07:51:00 AM,October 2013,Norht Ec in Austin (TX),Stray,Normal,Dog,Spayed Female,7 years,Border Terrier/Border Collie,White/Tan,False
1,A664235,UNKNOWN,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White,True
2,A664236,UNKNOWN,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White,True
3,A664237,UNKNOWN,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,1 week,Domestic Shorthair Mix,Orange/White,True
4,A664233,Stevie,10/01/2013 08:53:00 AM,October 2013,7405 Springtime in Austin (TX),Stray,Injured,Dog,Intact Female,3 years,Pit Bull Mix,Blue/White,False
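
Filling with a fixed label such as "UNKNOWN" is one form of imputation. The sketch below, on made-up toy data, shows two variants: constant imputation through a `{column: value}` dictionary passed to `DataFrame.fillna`, and mode imputation, which fills a categorical column with its most frequent value. Whether mode imputation suits `Sex upon Intake` in the real data is an assumption made only for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Name": ["Nina", np.nan, "Stevie", np.nan],
    "Sex upon Intake": ["Intact Male", "Intact Male", np.nan, "Spayed Female"],
})

# Constant imputation: a {column: value} dict fills only the listed columns
filled = toy.fillna({"Name": "UNKNOWN"})

# Mode imputation: fill a categorical column with its most frequent value
most_common = toy["Sex upon Intake"].mode()[0]
filled["Sex upon Intake"] = filled["Sex upon Intake"].fillna(most_common)

print(filled)
```

Both calls return new objects, so `toy` keeps its missing values until the result is assigned back.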


## Step 6:  There is only one column with 'Age' missing; let's drop it for now

In [46]:
# drop() returns a new DataFrame; df itself is unchanged unless the result is assigned back
df.drop(['Age upon Intake'], axis=1)

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Breed,Color,Missing Animal Name Indicator
0,A521520,Nina,10/01/2013 07:51:00 AM,October 2013,Norht Ec in Austin (TX),Stray,Normal,Dog,Spayed Female,Border Terrier/Border Collie,White/Tan,False
1,A664235,UNKNOWN,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,Domestic Shorthair Mix,Orange/White,True
2,A664236,UNKNOWN,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,Domestic Shorthair Mix,Orange/White,True
3,A664237,UNKNOWN,10/01/2013 08:33:00 AM,October 2013,Abia in Austin (TX),Stray,Normal,Cat,Unknown,Domestic Shorthair Mix,Orange/White,True
4,A664233,Stevie,10/01/2013 08:53:00 AM,October 2013,7405 Springtime in Austin (TX),Stray,Injured,Dog,Intact Female,Pit Bull Mix,Blue/White,False
...,...,...,...,...,...,...,...,...,...,...,...,...
173807,A929690,UNKNOWN,05/03/2025 11:18:00 PM,May 2025,8038 Exchange Dr in Austin (TX),Stray,Injured,Dog,Intact Male,Belgian Malinois,Brown/Black,True
173808,A929717,UNKNOWN,05/04/2025 03:14:00 PM,May 2025,Austin (TX),Public Assist,Normal,Dog,Intact Male,Shih Tzu Mix,White/Blue,True
173809,A929724,UNKNOWN,05/04/2025 07:43:00 PM,May 2025,7105 Providence Ave Apt 3 in Austin (TX),Stray,Normal,Other,Unknown,Rabbit Sh,Tan/White,True
173810,A929725,Oswold,05/04/2025 10:55:00 PM,May 2025,1501 Red River St in Austin (TX),Public Assist,Normal,Dog,Intact Male,Boxer Mix,Tan/White,False
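
`drop(..., axis=1)` removes a whole column. The `df.info()` output in Step 3 reports the only remaining missing value in `Sex upon Intake` (173811 non-null out of 173812 rows), while `Age upon Intake` has no missing values. If the goal is to remove just the row with the missing value, `dropna` with `subset` is one way to do it. This is a sketch on made-up toy data, not necessarily what the lecture intends.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3"],
    "Sex upon Intake": ["Intact Male", np.nan, "Unknown"],
    "Age upon Intake": ["7 years", "1 week", "3 years"],
})

# Drop only the rows where 'Sex upon Intake' is missing
rows_dropped = toy.dropna(subset=["Sex upon Intake"])

# For comparison, drop() with axis=1 removes an entire column instead
col_dropped = toy.drop(["Age upon Intake"], axis=1)

# Neither call modifies toy; assign the result back to keep the change
print(rows_dropped)
print(col_dropped)
```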
