# (Messy) Petfinder Analysis

In [2]:
import pandas as pd

## Project Details
* **Problem**: I plan to look at a snapshot of adoptable dogs across the United States (as posted on [Petfinder.com](https://www.petfinder.com)) to determine the predominant reported breeds of dogs within the shelter and rescue systems. In particular, I want to look at this distribution state-by-state.
* **Questions**:
    - For each state, what is the most common reported breed found in shelters?
    - What is the current average length of stay (LOS) for each reported breed?
    - For each US geographical region, what are the top 10 reported breeds?
* **Justification**: Examining which reported dog breeds most often end up homeless and offered for adoption in local shelters/rescues may help to answer questions about breed popularity and dog ownership culture in the United States, including whether there are any region-specific trends. It may also help to answer questions about how meaningful a breed identity actually is in the context of animal rescue. [This published study](https://pubmed.ncbi.nlm.nih.gov/27008213/) found that breed labels can influence a dog's perceived adoptability and LOS. Hopefully, this analysis will provide some insight into how dog breed identity in shelter/rescue systems differs by region. The results could perhaps prompt further study or reinforce preexisting conclusions.
* **Datasets**: [allDogDescriptions.csv](https://github.com/the-pudding/data/tree/master/dog-shelters), a dataset of all adoptable dogs from petfinder.com on September 20th, 2019.
* **Ethical Concerns/Considerations**:
    - The results may cause people to draw false conclusions about why certain breeds often end up in shelters and rescues.
    - The results risk reinforcing biases associated with various dog breeds.
    - The results may influence rescue/shelter intake depending on the dog's perceived adoptability.
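
The LOS question above could be approached roughly as follows. This is only a sketch under stated assumptions: it treats the number of days between a listing's `posted` timestamp and the `accessed` snapshot date as a proxy for length of stay (a dog may have been in the shelter before it was posted), and it runs on a small made-up frame rather than the real data. The helper name `mean_los_days` is hypothetical.

```python
import pandas as pd

# Made-up rows that mimic the `breed_primary`, `posted`, and `accessed` columns.
toy = pd.DataFrame({
    "breed_primary": ["Pit Bull Terrier", "Pit Bull Terrier", "Chihuahua"],
    "posted": [
        "2019-09-10T16:37:59+0000",
        "2019-09-18T08:00:00+0000",
        "2019-09-19T06:48:30+0000",
    ],
    "accessed": ["2019-09-20", "2019-09-20", "2019-09-20"],
})


def mean_los_days(frame):
    """Mean days listed per primary breed (assumed proxy for LOS)."""
    # Unparseable dates become NaT instead of raising an error.
    posted = pd.to_datetime(frame["posted"], utc=True, errors="coerce")
    accessed = pd.to_datetime(frame["accessed"], utc=True, errors="coerce")
    # Compare calendar dates only, ignoring the time of day.
    los_days = (accessed.dt.normalize() - posted.dt.normalize()).dt.days
    return los_days.groupby(frame["breed_primary"]).mean().sort_values(ascending=False)


los_by_breed = mean_los_days(toy)
print(los_by_breed)
```

On the real data, the same function could be called as `mean_los_days(df)` once the file has been read in.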

The first task is to simply read in the data.

In [3]:
df = pd.read_csv("data/allDogDescriptions.csv")
df.head()

Unnamed: 0,id,org_id,url,type.x,species,breed_primary,breed_secondary,breed_mixed,breed_unknown,color_primary,...,status,posted,contact_city,contact_state,contact_zip,contact_country,stateQ,accessed,type.y,description
0,46042150,NV163,https://www.petfinder.com/dog/harley-46042150/...,Dog,Dog,American Staffordshire Terrier,Mixed Breed,True,False,White / Cream,...,adoptable,2019-09-20T16:37:59+0000,Las Vegas,NV,89147,US,89009,2019-09-20,Dog,Harley is not sure how he wound up at shelter ...
1,46042002,NV163,https://www.petfinder.com/dog/biggie-46042002/...,Dog,Dog,Pit Bull Terrier,Mixed Breed,True,False,Brown / Chocolate,...,adoptable,2019-09-20T16:24:57+0000,Las Vegas,NV,89147,US,89009,2019-09-20,Dog,6 year old Biggie has lost his home and really...
2,46040898,NV99,https://www.petfinder.com/dog/ziggy-46040898/n...,Dog,Dog,Shepherd,,False,False,Brindle,...,adoptable,2019-09-20T14:10:11+0000,Mesquite,NV,89027,US,89009,2019-09-20,Dog,Approx 2 years old.\n Did I catch your eye? I ...
3,46039877,NV202,https://www.petfinder.com/dog/gypsy-46039877/n...,Dog,Dog,German Shepherd Dog,,False,False,,...,adoptable,2019-09-20T10:08:22+0000,Pahrump,NV,89048,US,89009,2019-09-20,Dog,
4,46039306,NV184,https://www.petfinder.com/dog/theo-46039306/nv...,Dog,Dog,Dachshund,,False,False,,...,adoptable,2019-09-20T06:48:30+0000,Henderson,NV,89052,US,89009,2019-09-20,Dog,Theo is a friendly dachshund mix who gets alon...


I start off by getting the overall breed frequencies:

In [18]:
print("OVERALL FREQUENCIES FOR PRIMARY LISTED BREED:")
print(df['breed_primary'].value_counts())
print("\n\nOVERALL FREQUENCIES FOR SECONDARY LISTED BREED:")
print(df['breed_secondary'].value_counts())
uniq_breeds = pd.concat([df['breed_primary'], df['breed_secondary']]).drop_duplicates()
print("\n\nNUMBER OF UNIQUE DOG BREEDS: ", uniq_breeds.size)

OVERALL FREQUENCIES FOR PRIMARY LISTED BREED:
Pit Bull Terrier                7890
Labrador Retriever              7198
Chihuahua                       3766
Mixed Breed                     3242
Terrier                         2641
                                ... 
Bouvier des Flandres               1
Belgian Shepherd / Laekenois       1
Kai Dog                            1
Skye Terrier                       1
Field Spaniel                      1
Name: breed_primary, Length: 216, dtype: int64


OVERALL FREQUENCIES FOR SECONDARY LISTED BREED:
Mixed Breed                    4348
Labrador Retriever             2194
Pit Bull Terrier               1365
Terrier                        1195
Hound                          1143
                               ... 
Wirehaired Pointing Griffon       1
Afghan Hound                      1
English Foxhound                  1
Standard Schnauzer                1
Beauceron                         1
Name: breed_secondary, Length: 190, dtype: int64


NUMBER OF UNIQUE DOG BREEDS:  223

The most common primary breed is the Pit Bull Terrier (7,890 listings), closely followed by the Labrador Retriever (7,198) and then, at some distance, the Chihuahua (3,766). The most common secondary breed is simply "Mixed Breed", followed again by the Labrador Retriever and the Pit Bull Terrier. In total, 223 unique breed labels appear across the primary and secondary breed fields on Petfinder.
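
One caveat about the unique count: `drop_duplicates()` keeps a missing value as its own entry, so if `breed_secondary` contains any blanks, the total may be one higher than the number of actual breed labels. The sketch below shows the difference on a small made-up frame; the helper name `count_unique_breeds` is hypothetical, and I have not checked how this affects the figure for the real data.

```python
import pandas as pd


def count_unique_breeds(frame):
    """Count distinct breed labels across both breed columns, ignoring blanks."""
    combined = pd.concat([frame["breed_primary"], frame["breed_secondary"]])
    return combined.dropna().nunique()


# Made-up rows: two dogs have no secondary breed listed.
toy = pd.DataFrame({
    "breed_primary": ["Shepherd", "Dachshund", "Pit Bull Terrier"],
    "breed_secondary": [None, None, "Mixed Breed"],
})

# The approach used in the cell above: the missing value counts as a "breed".
n_with_missing = (
    pd.concat([toy["breed_primary"], toy["breed_secondary"]]).drop_duplicates().size
)
# The alternative: missing values are dropped before counting.
n_breeds = count_unique_breeds(toy)
print(n_with_missing, n_breeds)
```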

**Question**: For each state, what is the most common breed found in shelters?
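
A minimal sketch of one way to answer this, shown on made-up rows. It assumes that a valid `contact_state` is exactly two uppercase letters, so that rows whose columns were shifted (and whose "state" is really a zip code) are filtered out. That rule is my assumption, not something documented by the dataset, and the helper name `top_breeds_by_state` is hypothetical.

```python
import pandas as pd


def top_breeds_by_state(frame, n=1):
    """Return the n most common primary breeds for each valid state."""
    # Assumption: valid states look like "NV"; zip codes such as "12220" do not.
    state = frame["contact_state"].astype(str)
    valid = frame[state.str.fullmatch(r"[A-Z]{2}")]
    # Count listings for each (state, breed) pair.
    counts = (
        valid.groupby(["contact_state", "breed_primary"])
        .size()
        .reset_index(name="n_dogs")
    )
    # Sort so the most frequent breeds come first within each state.
    counts = counts.sort_values(["contact_state", "n_dogs"], ascending=[True, False])
    return counts.groupby("contact_state").head(n).reset_index(drop=True)


# Made-up rows, including one with a zip code in the state column.
toy = pd.DataFrame({
    "contact_state": ["NV", "NV", "NV", "TX", "TX", "12220"],
    "breed_primary": [
        "Pit Bull Terrier", "Pit Bull Terrier", "Dachshund",
        "Labrador Retriever", "Labrador Retriever", "Beagle",
    ],
})

result = top_breeds_by_state(toy)
print(result)
```

Calling the function with `n=10` would instead keep up to ten breeds per state. Ties are not handled specially here.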

In [48]:
# Some rows in the original data seem to have shifted columns, which is why
# some "states" below are actually zip codes. I'm hoping to fix this at some point.
# I would also like to limit each state's listing to its top 10 breeds.

# Uncomment the options below to display the full, untruncated output:
# pd.set_option('display.max_rows', None)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.width', None)
# pd.set_option('display.max_colwidth', None)

# Count listings per primary breed within each contact_state value.
df.groupby('contact_state')['breed_primary'].value_counts()

contact_state  breed_primary                           
12220          Basset Hound                                  1
               Beagle                                        1
               Mixed Breed                                   1
12477          American Bulldog                              1
               Mixed Breed                                   1
17325          Alaskan Malamute                              2
19053          Pit Bull Terrier                              1
19063          Pit Bull Terrier                              1
20136          Maltese                                       1
20905          Fox Terrier                                   1
23112          Labrador Retriever                            1
24588          Hound                                         1
37189          Yellow Labrador Retriever                     1
38506          American Staffordshire Terrier                1
45061          Australian Shepherd                           1