In [1]:
#Import Libraries
import numpy as np
import pandas as pd

While investigating the "Name" column to address null values, it became clear that a value in the name column did not necessarily mean the animal actually had a name. Because of the size and extent of the problem, names were cleaned manually in Excel. The following are examples of name entries that were found and replaced with a null:
    - Lost Dog
    - Cute Puppies
    - Boy
    - Girl
    - Miscellaneous breed names
    - No Name Yet
    - Please Name Me
    - "Save me or I'll Die"
    - Urgent home needed
    - Puppy
    - Various descriptions of puppies (happy puppy, big eyes, sad puppy, 
      bouncy puppy, etc.) 

The names that are missing are left as null values.

A similar pass was performed after further investigation of the "Description" column. The following values are examples of data that was removed and replaced with null values:

    - #NAME?
    - URGENT!!
    - ‰ªñÊòØË¢ and variations of Webdings-like characters
    - "..." and variations on quantity of dots



The cleaned files were saved under new names ('train_clean' and 'test_clean') and will be used for the rest of the data wrangling in place of the original files 'train' and 'test'.
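
The manual pass described above could also be approximated in pandas. The following is only a minimal sketch: the placeholder list and the helper `clean_names` are hypothetical and cover just a few of the examples listed, whereas the real cleanup was done by hand and caught many more cases.

```python
import pandas as pd

# Hypothetical placeholder list; the manual pass covered many more entries.
PLACEHOLDER_NAMES = {
    "lost dog", "cute puppies", "boy", "girl", "no name yet",
    "please name me", "urgent home needed", "puppy",
}

def clean_names(names: pd.Series) -> pd.Series:
    """Replace placeholder entries with a null, ignoring case and padding."""
    normalized = names.str.strip().str.lower()
    # mask() sets a value to null wherever the condition is True
    return names.mask(normalized.isin(PLACEHOLDER_NAMES))

sample = pd.Series(["Nibble", "No Name Yet", " puppy ", None, "Miko"])
cleaned = clean_names(sample)
print(cleaned)
```

An exact-match list like this will not catch free-form entries such as "happy puppy" or "big eyes", which is one reason a manual review may still be needed.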

In [2]:
#File Names
breedfile = 'data/breed_labels.csv'
colorfile = 'data/color_labels.csv'
statefile = 'data/state_labels.csv'
testfile = 'data/test/test_clean.csv'
trainfile = 'data/train_clean.csv'

#Import Files
breeds=pd.read_csv(breedfile)
color=pd.read_csv(colorfile)
state=pd.read_csv(statefile)
test_data=pd.read_csv(testfile)
train_data=pd.read_csv(trainfile)

In [3]:
#Examine Test Data Set
test_data.head()

Unnamed: 0,Type,Name,Age,Breed1,Breed2,Gender,Color1,Color2,Color3,MaturitySize,...,Sterilized,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt
0,2,Dopey & Grey,8,266,266,1,2,6,7,1,...,2,1,2,0,41326,2ece3b2573dcdcebd774e635dca15fd9,0,"Dopey Age: 8mths old Male One half of a pair, ...",e2dfc2935,2
1,2,Chi Chi,36,285,264,2,1,4,7,2,...,1,2,1,0,41326,2ece3b2573dcdcebd774e635dca15fd9,0,"Please note that Chichi has been neutered, the...",f153b465f,1
2,2,Sticky,2,265,0,1,6,7,0,2,...,2,1,1,200,41326,e59c106e9912fa30c898976278c2e834,0,"Sticky, named such because of his tendency to ...",3c90f3f54,4
3,1,Dannie & Kass [In Penang],12,307,0,2,2,5,0,2,...,1,1,2,0,41326,e59c106e9912fa30c898976278c2e834,0,Dannie and Kass are mother and daughter. We en...,e02abc8a3,5
4,2,Cuddles,12,265,0,1,2,3,7,2,...,1,1,1,0,41326,e59c106e9912fa30c898976278c2e834,0,"Extremely cuddly cat, hence the origin of his ...",09f0df7d1,5


In [4]:
test_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3972 entries, 0 to 3971
Data columns (total 23 columns):
Type            3972 non-null int64
Name            3149 non-null object
Age             3972 non-null int64
Breed1          3972 non-null int64
Breed2          3972 non-null int64
Gender          3972 non-null int64
Color1          3972 non-null int64
Color2          3972 non-null int64
Color3          3972 non-null int64
MaturitySize    3972 non-null int64
FurLength       3972 non-null int64
Vaccinated      3972 non-null int64
Dewormed        3972 non-null int64
Sterilized      3972 non-null int64
Health          3972 non-null int64
Quantity        3972 non-null int64
Fee             3972 non-null int64
State           3972 non-null int64
RescuerID       3972 non-null object
VideoAmt        3972 non-null int64
Description     3954 non-null object
PetID           3972 non-null object
PhotoAmt        3972 non-null int64
dtypes: int64(19), object(4)
memory usage: 713.8+ KB


Notice that Description has 18 null values (3972 - 3954) and Name has 823 (3972 - 3149), which is to be expected based on the cleaning done prior to importing the data sets.

There are _no_ null values in Breed1 or Breed2, which is unexpected, since a single-breed pet should have only one breed recorded. The head() output above shows Breed2 values of 0, so a missing breed appears to be coded as zero rather than as a null.
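
To put numbers on both observations, the null counts and the zero-coded counts can be computed directly. This is a small sketch on a toy frame (the values are illustrative only, not taken from the real files); the same two lines should work on test_data.

```python
import pandas as pd

# Toy frame standing in for test_data; values are illustrative only.
toy = pd.DataFrame({
    "Name": ["Sticky", None, "Cuddles"],
    "Breed1": [265, 307, 265],
    "Breed2": [0, 0, 266],
})

# True nulls per column
null_counts = toy.isnull().sum()

# Rows where a breed column holds the code 0
zero_counts = (toy[["Breed1", "Breed2"]] == 0).sum()

print(null_counts)
print(zero_counts)
```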

In [5]:
#Examine Train Data Set
train_data.head()

Unnamed: 0,Type,Name,Age,Breed1,Breed2,Gender,Color1,Color2,Color3,MaturitySize,...,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt,AdoptionSpeed
0,2,Nibble,3,299,0,1,1,7,0,1,...,1,1,100,41326,8480853f516546f6cf33aa88cd76c379,0,Nibble is a 3+ month old ball of cuteness. He ...,86e1089a3,1,2
1,2,No Name Yet,1,265,0,1,1,2,0,2,...,1,1,0,41401,3082c7125d8fb66f7dd4bff4192c8b14,0,I just found it alone yesterday near my apartm...,6296e909a,2,0
2,1,Brisco,1,307,0,1,2,7,0,2,...,1,1,0,41326,fa90fa5b1ee11c86938398b60abc32cb,0,Their pregnant mother was dumped by her irresp...,3422e4906,7,3
3,1,Miko,4,307,0,2,1,2,0,2,...,1,1,150,41401,9238e4f44c71a75282e62f7136c6b240,0,"Good guard dog, very alert, active, obedience ...",5842f1ff5,8,2
4,1,Hunter,1,307,0,1,1,0,0,2,...,1,1,0,41326,95481e953f8aed9ec3d16fc4509537e8,0,This handsome yet cute boy is up for adoption....,850a43f90,3,2


In [6]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14991 entries, 0 to 14990
Data columns (total 24 columns):
Type             14991 non-null int64
Name             12020 non-null object
Age              14991 non-null int64
Breed1           14991 non-null int64
Breed2           14991 non-null int64
Gender           14991 non-null int64
Color1           14991 non-null int64
Color2           14991 non-null int64
Color3           14991 non-null int64
MaturitySize     14991 non-null int64
FurLength        14991 non-null int64
Vaccinated       14991 non-null int64
Dewormed         14991 non-null int64
Sterilized       14991 non-null int64
Health           14991 non-null int64
Quantity         14991 non-null int64
Fee              14991 non-null int64
State            14991 non-null int64
RescuerID        14991 non-null object
VideoAmt         14991 non-null int64
Description      14922 non-null object
PetID            14991 non-null object
PhotoAmt         14991 non-null int64
AdoptionSpeed

Notice that there are again null values in Description (69) and Name (2971), but no null values in the breed columns.

In [7]:
#Combine Data sets

#Add blank 'Adoption Speed' column to test
test_data['AdoptionSpeed']=''
data=pd.concat([test_data,train_data],axis=0).reset_index()
data.head()

Unnamed: 0,index,Type,Name,Age,Breed1,Breed2,Gender,Color1,Color2,Color3,...,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt,AdoptionSpeed
0,0,2,Dopey & Grey,8,266,266,1,2,6,7,...,1,2,0,41326,2ece3b2573dcdcebd774e635dca15fd9,0,"Dopey Age: 8mths old Male One half of a pair, ...",e2dfc2935,2,
1,1,2,Chi Chi,36,285,264,2,1,4,7,...,2,1,0,41326,2ece3b2573dcdcebd774e635dca15fd9,0,"Please note that Chichi has been neutered, the...",f153b465f,1,
2,2,2,Sticky,2,265,0,1,6,7,0,...,1,1,200,41326,e59c106e9912fa30c898976278c2e834,0,"Sticky, named such because of his tendency to ...",3c90f3f54,4,
3,3,1,Dannie & Kass [In Penang],12,307,0,2,2,5,0,...,1,2,0,41326,e59c106e9912fa30c898976278c2e834,0,Dannie and Kass are mother and daughter. We en...,e02abc8a3,5,
4,4,2,Cuddles,12,265,0,1,2,3,7,...,1,1,0,41326,e59c106e9912fa30c898976278c2e834,0,"Extremely cuddly cat, hence the origin of his ...",09f0df7d1,5,
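
As an alternative to adding a blank column before combining, pd.concat can fill the missing column itself. The sketch below uses toy frames and a hypothetical `Source` column (not part of the original data) to record where each row came from; `ignore_index=True` also avoids the extra 'index' column that reset_index() creates.

```python
import pandas as pd

# Toy stand-ins for test_data and train_data
toy_test = pd.DataFrame({"PetID": ["a1", "a2"], "Type": [2, 1]})
toy_train = pd.DataFrame({"PetID": ["b1"], "Type": [1], "AdoptionSpeed": [3]})

# concat aligns on column names; a column missing from one frame becomes NaN
combined = pd.concat(
    [toy_test.assign(Source="test"), toy_train.assign(Source="train")],
    ignore_index=True,
    sort=False,
)
print(combined)
```

With this approach the test rows get real nulls in AdoptionSpeed, so the column stays numeric instead of mixing empty strings with integers.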


In [8]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18963 entries, 0 to 18962
Data columns (total 25 columns):
index            18963 non-null int64
Type             18963 non-null int64
Name             15169 non-null object
Age              18963 non-null int64
Breed1           18963 non-null int64
Breed2           18963 non-null int64
Gender           18963 non-null int64
Color1           18963 non-null int64
Color2           18963 non-null int64
Color3           18963 non-null int64
MaturitySize     18963 non-null int64
FurLength        18963 non-null int64
Vaccinated       18963 non-null int64
Dewormed         18963 non-null int64
Sterilized       18963 non-null int64
Health           18963 non-null int64
Quantity         18963 non-null int64
Fee              18963 non-null int64
State            18963 non-null int64
RescuerID        18963 non-null object
VideoAmt         18963 non-null int64
Description      18876 non-null object
PetID            18963 non-null object
PhotoAmt     

For the purposes of this problem, we're only interested in dog-specific information. So let's pull only the data related to dogs from each of the files. In this case dogs are pet type 1. 

In [9]:
dog_breeds = breeds[breeds.Type==1]
dog_data = data[data.Type==1]
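
One optional refinement, sketched here on a toy frame: taking an explicit copy of the filtered rows. In pandas versions that emit SettingWithCopyWarning, this keeps later column insertions on the subset from triggering it. The `AgeYears` column is hypothetical and only there to show a write to the subset.

```python
import pandas as pd

toy = pd.DataFrame({"Type": [1, 2, 1], "Age": [12, 8, 3]})

# .copy() makes the subset an independent frame
toy_dogs = toy[toy["Type"] == 1].copy()

# Writing to the copy leaves the original frame untouched
toy_dogs["AgeYears"] = toy_dogs["Age"] / 12
print(toy_dogs)
```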

Reinvestigate the dataframes without the cats in them.

In [10]:
dog_breeds.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 241 entries, 0 to 240
Data columns (total 3 columns):
BreedID      241 non-null int64
Type         241 non-null int64
BreedName    241 non-null object
dtypes: int64(2), object(1)
memory usage: 7.5+ KB


In [11]:
dog_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10230 entries, 3 to 18962
Data columns (total 25 columns):
index            10230 non-null int64
Type             10230 non-null int64
Name             7238 non-null object
Age              10230 non-null int64
Breed1           10230 non-null int64
Breed2           10230 non-null int64
Gender           10230 non-null int64
Color1           10230 non-null int64
Color2           10230 non-null int64
Color3           10230 non-null int64
MaturitySize     10230 non-null int64
FurLength        10230 non-null int64
Vaccinated       10230 non-null int64
Dewormed         10230 non-null int64
Sterilized       10230 non-null int64
Health           10230 non-null int64
Quantity         10230 non-null int64
Fee              10230 non-null int64
State            10230 non-null int64
RescuerID        10230 non-null object
VideoAmt         10230 non-null int64
Description      10196 non-null object
PetID            10230 non-null object
PhotoAmt      

We no longer need the type columns, so let's go ahead and drop them to save memory. 

In [12]:
dog_breeds = dog_breeds.drop(['Type'] , axis = 1)
dog_data = dog_data.drop(['Type'] , axis =1)

In [13]:
#Investigate Null Description Rows
no_descrip=dog_data[dog_data.Description.isnull()]
no_descrip

Unnamed: 0,index,Name,Age,Breed1,Breed2,Gender,Color1,Color2,Color3,MaturitySize,...,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt,AdoptionSpeed
488,488,Pat Pat,3,307,307,2,5,0,0,2,...,1,1,0,41332,88d48aeb402a9d6a891fff5c0b86a29a,0,,1f85c9142,1,
3009,3009,,1,207,0,2,1,2,0,1,...,1,1,700,41326,723d408dd3017416249d388b8232decb,0,,944c4186b,3,
3010,3010,Pure Breed Silky Terrier,1,207,0,2,1,2,0,1,...,1,1,700,41326,723d408dd3017416249d388b8232decb,0,,58af3154a,8,
3011,3011,,1,207,0,2,1,2,0,1,...,1,1,700,41326,723d408dd3017416249d388b8232decb,0,,17656ed15,3,
4011,39,,12,307,0,1,5,0,0,2,...,1,1,0,41326,6757d0b9d5b72d8b78c20e355c7fe62c,0,,4e3640544,1,3.0
4330,358,DIANA,18,307,307,2,1,0,0,2,...,1,1,0,41326,13733222f015ec6a0017c3c0527738ff,0,,3a3ae56f1,3,4.0
4408,436,,1,307,0,3,1,2,7,2,...,1,5,0,41326,eee77f12979856ef498630954461b4e3,0,,1a9e7158e,15,1.0
4964,992,Tennis,10,109,0,1,5,0,0,3,...,1,1,0,41401,7abebb25ee29f6d01a5fdae400cfbbab,0,,06bf9e676,1,1.0
5541,1569,,1,307,0,2,1,2,0,1,...,1,1,0,41401,8bb437e94e4664422536b5c3ce3682af,0,,92bf4bb1a,1,2.0
5573,1601,,24,179,0,2,2,0,0,1,...,1,1,0,41326,239b99bbff682c0bc548ce9ccbc1182c,0,,f6ae0d580,1,1.0


All other fields appear to be intact for these rows, so a missing description is unlikely to be an indicator of missing or bad data.
Leave missing descriptions as null values. No additional cleaning is necessary, since the description data was cleaned prior to import.

However, notice that some of these animals (for example #488) show no value for adoption speed, even though the column contains no null values. This will need to be investigated in a moment.

In [14]:
#From data investigation we know some of the breed values are zero. Is zero in the breeds table?
dog_breeds.head()

Unnamed: 0,BreedID,BreedName
0,1,Affenpinscher
1,2,Afghan Hound
2,3,Airedale Terrier
3,4,Akbash
4,5,Akita


There is no BreedID of zero. However, we may later merge the breed, color, and state tables with the data set, so let's create a breed table that includes an entry for a BreedID of zero.

In [15]:
#Create a placeholder row for "no breed" (BreedID 0)
index_none=['BreedID','BreedName']
nobreed=pd.DataFrame([0,np.nan],index=index_none)
nobreed=nobreed.T

#Prepend the "no breed" row to the breeds table
#(pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
dog_breeds_wnone=pd.concat([nobreed,dog_breeds],sort=True)
dog_breeds_wnone.head()

Unnamed: 0,BreedID,BreedName
0,0.0,
0,1.0,Affenpinscher
1,2.0,Afghan Hound
2,3.0,Airedale Terrier
3,4.0,Akbash
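
To see why the zero row matters, here is a small sketch of the kind of merge it is meant to support. The pet rows are made up, and the renamed lookup columns (`Breed2`, `Breed2Name`) are an assumption about how the merge might be set up later, not something done in this notebook.

```python
import numpy as np
import pandas as pd

# Lookup table with a placeholder row for "no breed"
toy_breeds = pd.DataFrame({
    "BreedID": [0, 1, 2],
    "BreedName": [np.nan, "Affenpinscher", "Afghan Hound"],
})
toy_pets = pd.DataFrame({"PetID": ["p1", "p2"], "Breed1": [1, 2], "Breed2": [0, 1]})

# Rename the lookup columns so they line up with the Breed2 key
lookup = toy_breeds.rename(columns={"BreedID": "Breed2", "BreedName": "Breed2Name"})

# With the zero row, an inner merge keeps every pet
with_sentinel = toy_pets.merge(lookup, on="Breed2", how="inner")

# Without it, pets whose Breed2 is 0 are silently dropped
without_sentinel = toy_pets.merge(lookup[lookup["Breed2"] != 0], on="Breed2", how="inner")

print(len(with_sentinel), len(without_sentinel))
```

A left merge would also keep the unmatched rows without a placeholder, so the zero row is mainly a safeguard for inner merges.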


Save data frame as csv for use later

In [16]:
dog_breeds_wnone.to_csv(r'tidy_data/dog_breeds.csv')

It may be important to know whether an animal is a mixed breed, so let's create a column to distinguish mixed from "pure" breed animals.

In [17]:
dog_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10230 entries, 3 to 18962
Data columns (total 24 columns):
index            10230 non-null int64
Name             7238 non-null object
Age              10230 non-null int64
Breed1           10230 non-null int64
Breed2           10230 non-null int64
Gender           10230 non-null int64
Color1           10230 non-null int64
Color2           10230 non-null int64
Color3           10230 non-null int64
MaturitySize     10230 non-null int64
FurLength        10230 non-null int64
Vaccinated       10230 non-null int64
Dewormed         10230 non-null int64
Sterilized       10230 non-null int64
Health           10230 non-null int64
Quantity         10230 non-null int64
Fee              10230 non-null int64
State            10230 non-null int64
RescuerID        10230 non-null object
VideoAmt         10230 non-null int64
Description      10196 non-null object
PetID            10230 non-null object
PhotoAmt         10230 non-null int64
AdoptionSpeed 

In [18]:
#Create a new column 'BreedCount' and insert it after "Breed2"

#Initialize an empty list
breed_count=[]

for index in dog_data.index:
    #Reset variable a
    a=0
    
    #If both defined breeds are the same, there is 1 breed
    if (dog_data['Breed1'][index] == dog_data['Breed2'][index]):
        a+= 1
    
    #If either defined breed is "0", or no breed, there is 1 breed
    elif (dog_data['Breed1'][index] == 0) or (dog_data['Breed2'][index] == 0):
        a += 1
   
    #If neither of the above, the dog is mixed breed, and so there are 2
    else:
        a += 2
    breed_count.append(a)

#Find index of Breed 2 position and add 1 to define position for new column
mixed_posn = dog_data.columns.get_loc('Breed2') + 1

#Insert calculated values as new column
dog_data.insert(mixed_posn, "BreedCount" , breed_count)

#Check counts
dog_data.head()

Unnamed: 0,index,Name,Age,Breed1,Breed2,BreedCount,Gender,Color1,Color2,Color3,...,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt,AdoptionSpeed
3,3,Dannie & Kass [In Penang],12,307,0,1,2,2,5,0,...,1,2,0,41326,e59c106e9912fa30c898976278c2e834,0,Dannie and Kass are mother and daughter. We en...,e02abc8a3,5,
9,9,Precious,36,76,307,2,2,7,0,0,...,1,1,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,0,"Hi, i have a dalmamation mix female dog to giv...",a3787f15e,9,
10,10,Angel,24,307,307,1,2,5,7,0,...,1,1,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,0,found a stray female dog who follows my mum ca...,0113cedff,3,
11,11,,12,307,307,1,2,2,3,0,...,1,2,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,0,both female dogs r thrown away at d food court...,0070b950a,4,
12,12,,3,218,307,2,1,1,7,0,...,1,1,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,1,"Hi, im liew here.. Im not sure how is this don...",cbe2df167,0,
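
The row-by-row loop above can also be written as a single vectorized expression, which tends to be faster on a frame of this size. This is a sketch on a toy frame using the same rules (same breed twice, or either breed coded 0, counts as one breed).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Breed1": [307, 76, 307, 0], "Breed2": [0, 307, 307, 218]})

# One breed if both codes match or either code is 0; otherwise two
single = (
    (toy["Breed1"] == toy["Breed2"])
    | (toy["Breed1"] == 0)
    | (toy["Breed2"] == 0)
)

# Insert the new column directly after Breed2
toy.insert(toy.columns.get_loc("Breed2") + 1, "BreedCount", np.where(single, 1, 2))
print(toy)
```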


Let's go back and investigate the blank adoption speed values. First, what value is actually stored in those cells, given that they are not null?

In [19]:
#Fix missing adoption values
print(dog_data.AdoptionSpeed.unique())

['' 3 2 1 4 0]


It looks like the missing adoption speeds come from the 'test' data set, where the column was filled with empty strings before the data sets were combined.

For the purposes of exploration, let's convert the missing values to nulls.

In [20]:
dog_data.AdoptionSpeed = dog_data.AdoptionSpeed.replace('',np.nan)
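
Because the column mixed empty strings with integers, it may also be worth converting it to a numeric dtype in the same step. A minimal sketch on a toy series, using pd.to_numeric with errors="coerce" so that anything unparseable (including '') becomes NaN:

```python
import pandas as pd

# Toy stand-in for the AdoptionSpeed column after combining
speeds = pd.Series(["", 3, 2, "", 0], dtype=object)

# Unparseable entries such as '' are coerced to NaN
as_numeric = pd.to_numeric(speeds, errors="coerce")
print(as_numeric)
```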

Notice that there are multiple zero values in the color columns. Let's repeat the steps used for breeds and add a row for ColorID 0 to the color table, so that a color of 0 maps to a null name.

In [21]:
#Create a placeholder row for "no color" (ColorID 0)
index_none=['ColorID','ColorName']
nocolor=pd.DataFrame([0,np.nan],index=index_none)
nocolor=nocolor.T

#Prepend the "no color" row to the color table
#(pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
colors_wnone=pd.concat([nocolor,color],sort=True)
colors_wnone

Unnamed: 0,ColorID,ColorName
0,0.0,
0,1.0,Black
1,2.0,Brown
2,3.0,Golden
3,4.0,Yellow
4,5.0,Cream
5,6.0,Gray
6,7.0,White


Save the modified colors table to a new csv file for later use

In [22]:
colors_wnone.to_csv(r'tidy_data/dog_colors.csv')

Now, we're going to need some data frames to allow us to "decode" the values for the other categorical variables, MaturitySize, FurLength, Vaccinated, Dewormed, Sterilized, and Health. 

Some investigation shows the following to correspond to the numerical values: 

MaturitySize - 
    - Size at maturity 
    - 1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified
FurLength - 
    - Fur length 
    - 1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified
Vaccinated - 
    - Pet has been vaccinated 
    - 1 = Yes, 2 = No, 3 = Not Sure
Dewormed - 
    - Pet has been dewormed
    - 1 = Yes, 2 = No, 3 = Not Sure
Sterilized - 
    - Pet has been spayed / neutered 
    - 1 = Yes, 2 = No, 3 = Not Sure
Health - 
    - Health condition 
    - 1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified
    
Let's define data frames and save them as csv files in case we need them later. 

In [23]:
#MaturitySize - 
size= {'Size': ['Small','Medium','Large','Giant','Unknown'],
        'SizeValue': [1,2,3,4,0]}
size=pd.DataFrame.from_dict(size)
size.to_csv(r'tidy_data/size.csv')
size

Unnamed: 0,Size,SizeValue
0,Small,1
1,Medium,2
2,Large,3
3,Giant,4
4,Unknown,0


In [24]:
#Fur Length - 
fur_length= {'Length': ['Short','Medium','Long','Unknown'],
        'LengthValue': [1,2,3,0]}
fur_length=pd.DataFrame.from_dict(fur_length)
fur_length.to_csv(r'tidy_data/fur_length.csv')

fur_length

Unnamed: 0,Length,LengthValue
0,Short,1
1,Medium,2
2,Long,3
3,Unknown,0


In [25]:
#Vaccinated - 
#Codes follow the listing above: 1 = Yes, 2 = No, 3 = Not Sure
vaccine= {'VaccineStatus': ['Vaccinated','Not Vaccinated','Unknown'],
        'VaccineValue': [1,2,3]}
vaccine=pd.DataFrame.from_dict(vaccine)
vaccine.to_csv(r'tidy_data/vaccine.csv')

vaccine

Unnamed: 0,VaccineStatus,VaccineValue
0,Vaccinated,1
1,Not Vaccinated,2
2,Unknown,3


In [26]:
#Dewormed - 
#Codes follow the listing above: 1 = Yes, 2 = No, 3 = Not Sure
dewormed= {'DewormStatus': ['Dewormed','Not Dewormed','Unknown'],
        'DewormValue': [1,2,3]}
dewormed=pd.DataFrame.from_dict(dewormed)
dewormed.to_csv(r'tidy_data/dewormed.csv')

dewormed

Unnamed: 0,DewormStatus,DewormValue
0,Dewormed,1
1,Not Dewormed,2
2,Unknown,3


In [27]:
#Sterilized - 
#Codes follow the listing above: 1 = Yes, 2 = No, 3 = Not Sure
sterilized= {'FixStatus': ['Fixed','Not Fixed','Unknown'],
        'FixValue': [1,2,3]}
sterilized=pd.DataFrame.from_dict(sterilized)
sterilized.to_csv(r'tidy_data/sterilized.csv')

sterilized

Unnamed: 0,FixStatus,FixValue
0,Fixed,1
1,Not Fixed,2
2,Unknown,3


In [28]:
#Health- 
health= {'HealthStatus': ['Healthy','Minor Injury','Serious Injury','Unknown'],
        'HealthValue': [1,2,3,0]}
health=pd.DataFrame.from_dict(health)
health.to_csv(r'tidy_data/health.csv')

health

Unnamed: 0,HealthStatus,HealthValue
0,Healthy,1
1,Minor Injury,2
2,Serious Injury,3
3,Unknown,0
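
Lookup tables like these can be used to "decode" a coded column with Series.map. The sketch below rebuilds the size table and applies it to a toy MaturitySize column; the `SizeLabel` column name is hypothetical.

```python
import pandas as pd

size = pd.DataFrame({
    "Size": ["Small", "Medium", "Large", "Giant", "Unknown"],
    "SizeValue": [1, 2, 3, 4, 0],
})

# Turn the table into a code -> label Series
size_lookup = size.set_index("SizeValue")["Size"]

# Map each code in the data to its label
toy = pd.DataFrame({"MaturitySize": [1, 2, 2, 0]})
toy["SizeLabel"] = toy["MaturitySize"].map(size_lookup)
print(toy)
```

Any code that is missing from the lookup table comes back as NaN, which makes gaps in a table easy to spot.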


The data looks nicely tidied! 

Now let's save the tidied up dog information to its own csv file for use later. 

In [29]:
dog_data.to_csv(r'tidy_data/dog_data.csv',index=False)
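
A note on index=False: a csv saved with the index included gains an extra "Unnamed: 0" column when it is read back in. The sketch below shows the difference using in-memory buffers instead of real files.

```python
import io
import pandas as pd

toy = pd.DataFrame({"ColorID": [1, 2], "ColorName": ["Black", "Brown"]})

# Saved with the index (the to_csv default)
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# Saved without the index
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(reloaded_with_index.columns.tolist())
print(reloaded_clean.columns.tolist())
```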

Let's now import the dataset specific to the shelter.

The data was pulled via the API, and the steps performed can be seen in the notebook found here: [Github Link](https://github.com/CJEJansson/Springboard_Projects/blob/master/Capstone%201/Data_Wrangling/BigBones%20API%20Pull.ipynb)

In [30]:
shelter_dogs=pd.read_csv('BigBonesDoggos.csv')
shelter_dogs.head()

Unnamed: 0.1,Unnamed: 0,organization_id,animal_id,animal_type,age,attributes.declawed,attributes.house_trained,attributes.shots_current,attributes.spayed_neutered,attributes.special_needs,...,organization_id.1,photos,published_at,size,species,status,status_changed_at,tags,type,url
0,0,co441,46224186,dog,Baby,,True,True,True,False,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:32:46+0000,Small,Dog,adoptable,2019-10-09T17:32:46+0000,[],Dog,https://www.petfinder.com/dog/tiko-46224186/co...
1,1,co441,46224166,dog,Adult,,True,True,True,False,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:31:22+0000,Large,Dog,adoptable,2019-10-09T17:31:22+0000,[],Dog,https://www.petfinder.com/dog/brody-46224166/c...
2,2,co441,46224128,dog,Adult,,True,True,True,False,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:30:14+0000,Large,Dog,adoptable,2019-10-09T17:30:14+0000,[],Dog,https://www.petfinder.com/dog/cooper-46224128/...
3,3,co441,46224152,dog,Adult,,True,True,True,False,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:29:33+0000,Large,Dog,adoptable,2019-10-09T17:29:33+0000,[],Dog,https://www.petfinder.com/dog/emmy-46224152/co...
4,4,co441,46201468,dog,Adult,,True,True,True,False,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-07T15:21:31+0000,Medium,Dog,adoptable,2019-10-07T15:21:31+0000,[],Dog,https://www.petfinder.com/dog/ittie-bit-462014...


In [31]:
#Remove empty columns (and the duplicated 'organization_id'). Replace later if data is obtained.
shelter_dogs=shelter_dogs.drop(['organization_id','attributes.declawed','coat',
       'colors.primary','colors.secondary','colors.tertiary','contact.address.address1',
       'contact.address.address2'],axis=1)
shelter_dogs.head()

Unnamed: 0.1,Unnamed: 0,animal_id,animal_type,age,attributes.house_trained,attributes.shots_current,attributes.spayed_neutered,attributes.special_needs,breeds.mixed,breeds.primary,...,organization_id.1,photos,published_at,size,species,status,status_changed_at,tags,type,url
0,0,46224186,dog,Baby,True,True,True,False,True,Chihuahua,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:32:46+0000,Small,Dog,adoptable,2019-10-09T17:32:46+0000,[],Dog,https://www.petfinder.com/dog/tiko-46224186/co...
1,1,46224166,dog,Adult,True,True,True,False,True,English Bulldog,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:31:22+0000,Large,Dog,adoptable,2019-10-09T17:31:22+0000,[],Dog,https://www.petfinder.com/dog/brody-46224166/c...
2,2,46224128,dog,Adult,True,True,True,False,True,Shepherd,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:30:14+0000,Large,Dog,adoptable,2019-10-09T17:30:14+0000,[],Dog,https://www.petfinder.com/dog/cooper-46224128/...
3,3,46224152,dog,Adult,True,True,True,False,True,English Bulldog,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-09T17:29:33+0000,Large,Dog,adoptable,2019-10-09T17:29:33+0000,[],Dog,https://www.petfinder.com/dog/emmy-46224152/co...
4,4,46201468,dog,Adult,True,True,True,False,True,Beagle,...,CO441,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,2019-10-07T15:21:31+0000,Medium,Dog,adoptable,2019-10-07T15:21:31+0000,[],Dog,https://www.petfinder.com/dog/ittie-bit-462014...
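
Rather than listing empty columns by hand, they can also be found and dropped programmatically. This is a sketch on a toy frame that borrows a few of the column names above; the values are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "animal_id": [46224186, 46224166],
    "attributes.declawed": [np.nan, np.nan],
    "coat": [None, None],
    "size": ["Small", "Large"],
})

# Columns in which every value is null
empty_cols = toy.columns[toy.isna().all()].tolist()

# Drop only the columns that are entirely null
trimmed = toy.dropna(axis=1, how="all")

print(empty_cols)
print(trimmed.columns.tolist())
```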


It doesn't look like the shelter data will align cleanly with the PetFinder data. It may be easier to contact the shelter and ask whether they track their data directly. Even if we're unable to study their data set, we can still make recommendations based on what we find in the PetFinder data.