# Introduction:

This exploratory data analysis (EDA) investigates how each recorded characteristic of a pet listing relates to adoption speed. 

The investigation of the PetFinder Malaysia dataset will hopefully yield insights that a local shelter can apply to its online pet profiles to increase adoptions overall. 

The datasets utilized in this notebook were cleaned and generated in the data wrangling portion of this project, details of which can be found here: [Github Link: Data_Wrangling.ipynb](https://github.com/CJEJansson/Springboard_Projects/blob/master/Capstone%201/Data_Wrangling/Data_Wrangling.ipynb)

# The Data:

In [2]:
#Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
#Import Data Sets

#File Names
adoptfile  = 'tidy_data/adoption_speed.csv'
colorfile  = 'tidy_data/color_labels.csv'
wormfile   = 'tidy_data/dewormed.csv'
breedfile  = 'tidy_data/dog_breeds.csv'
datafile   = 'tidy_data/dog_data.csv'
furfile    = 'tidy_data/fur_length.csv'
healthfile = 'tidy_data/health.csv'
sizefile   = 'tidy_data/size.csv'
statefile  = 'tidy_data/state_labels.csv'
fixedfile  = 'tidy_data/sterilized.csv'
vacfile    = 'tidy_data/vaccine.csv'

#Import Files
adptspeed  = pd.read_csv(adoptfile)
color      = pd.read_csv(colorfile)
dewormed   = pd.read_csv(wormfile)
breeds     = pd.read_csv(breedfile)
dog_data   = pd.read_csv(datafile)
fur_length = pd.read_csv(furfile)
health     = pd.read_csv(healthfile)
adult_size = pd.read_csv(sizefile)
state      = pd.read_csv(statefile)
fixed      = pd.read_csv(fixedfile)
vaccinated = pd.read_csv(vacfile)

#### The variables in the data set are the columns of `dog_data`, which can be listed with the code below:

In [4]:
dog_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10230 entries, 0 to 10229
Data columns (total 25 columns):
index            10230 non-null int64
Name             7238 non-null object
Age              10230 non-null int64
Breed1           10230 non-null int64
Breed2           10230 non-null int64
BreedCount       10230 non-null int64
Gender           10230 non-null int64
Color1           10230 non-null int64
Color2           10230 non-null int64
Color3           10230 non-null int64
MaturitySize     10230 non-null int64
FurLength        10230 non-null int64
Vaccinated       10230 non-null int64
Dewormed         10230 non-null int64
Sterilized       10230 non-null int64
Health           10230 non-null int64
Quantity         10230 non-null int64
Fee              10230 non-null int64
State            10230 non-null int64
RescuerID        10230 non-null object
VideoAmt         10230 non-null int64
Description      10196 non-null object
PetID            10230 non-null object
PhotoAmt      

In [5]:
dog_data.columns

Index(['index', 'Name', 'Age', 'Breed1', 'Breed2', 'BreedCount', 'Gender',
       'Color1', 'Color2', 'Color3', 'MaturitySize', 'FurLength', 'Vaccinated',
       'Dewormed', 'Sterilized', 'Health', 'Quantity', 'Fee', 'State',
       'RescuerID', 'VideoAmt', 'Description', 'PetID', 'PhotoAmt',
       'AdoptionSpeed'],
      dtype='object')

## The characteristics of each listing are as follows:
 
 - Animal Specific
    - Name
    - Age (in months)
    - Breed 
    - Gender (M/F or "mixed" for multipet listings)
    - Color
    - Adult size
    - Fur Length
    - Health: Vaccinations, Dewormed, Sterilized, known Injuries/Illness
    
 - Listing Specific
    - Number of pets in listing
    - Presence and number of photos/videos
    - Animal location (State)
    - Organization (Rescuer ID)
    - Adoption Fee (0=Free)
    - Description
    

# Exploratory Data Analysis:

We will begin by investigating the distribution of adoption speeds in the dataset. Once this is understood, an investigation will be performed to try and determine the impacts of the following on adoption speed: 

1.  Name
    - presence vs. absence
2.  Age
3.  Breed
    - type of breed
    - mixed vs. pure breed
4.  Gender
5.  Color
    - specific color
    - multicolor vs single color
6.  Size at maturity
7.  Fur Length
8.  General health
    - has the animal been dewormed?
    - has the animal been vaccinated?
    - has the animal been spayed/neutered?
    - does the animal have any injuries/known health problems?
9.  Number of pets per listing
10. Adoption fee
    - cost to adopt
    - paid vs free
11. Location
12. Rescue Organization
13. Media
    - number of videos
    - presence/absence of videos
    - number of photos
    - presence/absence of photos
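
Most of the comparisons listed above follow the same pattern: derive a grouping label from a raw column, then compare the adoption speed distribution across the groups. Below is a minimal sketch of that pattern for the name presence/absence case. It uses a toy frame in place of `dog_data`; the rows and the helper column `HasName` are made up for illustration, and only the column names `Name` and `AdoptionSpeed` come from the dataset.

```python
import pandas as pd

# Toy stand-in for dog_data; these rows are made up for illustration
toy = pd.DataFrame({
    'Name': ['Rex', None, 'Bella', None, 'Max', 'Luna'],
    'AdoptionSpeed': [1, 4, 2, 3, 1, 2],
})

# Derive a presence/absence label from the raw Name column
# (HasName is a hypothetical helper column, not part of the dataset)
toy['HasName'] = toy['Name'].notna().map({True: 'named', False: 'unnamed'})

# Row-normalized table: share of each adoption speed within each group
speed_by_name = pd.crosstab(toy['HasName'], toy['AdoptionSpeed'], normalize='index')
print(speed_by_name)
```

Normalizing by row makes groups of very different sizes comparable, which matters when one group (for example, unnamed dogs) is much smaller than the other.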

## Adoption Speed Distribution

There are 10,230 dogs in the provided dataset. Of those dogs, the adoption rate distribution is as follows: 
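
A distribution of this kind can be tabulated with `value_counts`. The sketch below uses a small made-up frame in place of `dog_data`, so the numbers it prints are illustrative only and are not results from the dataset. The column name `AdoptionSpeed` is taken from the column listing above.

```python
import pandas as pd

# Toy stand-in for dog_data; these values are made up for illustration
toy = pd.DataFrame({'AdoptionSpeed': [0, 1, 1, 2, 2, 2, 3, 4, 4, 4]})

# Number of listings in each adoption speed class, ordered by class
speed_counts = toy['AdoptionSpeed'].value_counts().sort_index()

# The same distribution expressed as a share of all listings
speed_share = toy['AdoptionSpeed'].value_counts(normalize=True).sort_index()

print(pd.DataFrame({'count': speed_counts, 'share': speed_share}))
```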

Age was not grouped during data wrangling, so a column capturing each dog's life stage needs to be added to the data for exploratory analysis. Typical age groupings for dogs are as follows: 

- Puppy: 0-6 months
- Adolescent: 6-18 months
- Adult: 18+ months
- Senior:
  - Small Dogs: 10-12 years (120-144 months)
  - Medium Dogs: 9-11 years (108-132 months)
  - Large Dogs: 8-10 years (96-120 months)
  - Giant Dogs: 5-7 years (60-84 months)
  
For the purposes of this analysis, the cutoffs for "senior" animals are defined as follows:

- Small: 11 years (132 months)
- Medium: 10 years (120 months)
- Large: 9 years (108 months)
- Giant: 8 years (96 months)

Let's add a column for the life stage of the animal, "LifeStage". 

In [18]:
#Senior cutoff (months) by MaturitySize code: 1=small, 2=medium, 3=large, 4=giant
senior_cutoff = {1: 132, 2: 120, 3: 108, 4: 96}

age    = dog_data['Age']
#Cutoff for each dog's size; NaN when the size code is not 1-4
cutoff = dog_data['MaturitySize'].map(senior_cutoff)

#Conditions are evaluated in order and the first match wins
conditions = [age <= 6, age <= 18, age >= cutoff, age < cutoff]
stages     = ['puppy', 'adolescent', 'senior', 'adult']

#Dogs older than 18 months with an unrecognized size code keep an empty label
lf_stg = np.select(conditions, stages, default='')

#Find index of Age position and add 1 to define position for new column
lf_stg_posn = dog_data.columns.get_loc('Age') + 1

#Insert calculated values as new column
dog_data.insert(lf_stg_posn, "LifeStage" , lf_stg)

#Preview the first rows
dog_data.head()

Unnamed: 0,index,Name,Age,LifeStage,Breed1,Breed2,BreedCount,Gender,Color1,Color2,...,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt,AdoptionSpeed
0,0,Dannie & Kass [In Penang],12,adolescent,307,0,1,2,2,5,...,1,2,0,41326,e59c106e9912fa30c898976278c2e834,0,Dannie and Kass are mother and daughter. We en...,e02abc8a3,5,
1,1,Precious,36,adult,76,307,2,2,7,0,...,1,1,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,0,"Hi, i have a dalmamation mix female dog to giv...",a3787f15e,9,
2,2,Angel,24,adult,307,307,1,2,5,7,...,1,1,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,0,found a stray female dog who follows my mum ca...,0113cedff,3,
3,3,,12,adolescent,307,307,1,2,2,3,...,1,2,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,0,both female dogs r thrown away at d food court...,0070b950a,4,
4,4,,3,puppy,218,307,2,1,1,7,...,1,1,0,41324,6f73a23fdb52bc9a30dc788fe6ccc7f6,1,"Hi, im liew here.. Im not sure how is this don...",cbe2df167,0,
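
The preview above shows only the first five rows. To check how many dogs fall in each life stage, the new column can be tallied with `value_counts`. The sketch below uses a made-up frame in place of `dog_data`, so the counts are illustrative only. A blank label would indicate a row that matched none of the size branches (for example, a `MaturitySize` code outside 1-4); whether any such rows exist in this dataset is not shown here.

```python
import pandas as pd

# Toy stand-in for dog_data after the LifeStage column is added (made-up rows)
toy = pd.DataFrame({
    'LifeStage': ['puppy', 'puppy', 'adult', 'adolescent', 'senior', ''],
})

# Number of listings in each life stage
stage_counts = toy['LifeStage'].value_counts()
print(stage_counts)

# Number of rows that were left without a label
n_blank = (toy['LifeStage'] == '').sum()
print('Unlabeled rows:', n_blank)
```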


In [19]:
#Export Data to CSV for later use, if needed
dog_data.to_csv(r'tidy_data/dog_data_WLStage.csv',index=False)
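
One caveat when reloading an exported file: by default `pd.read_csv` parses empty fields as `NaN`, so any empty-string `LifeStage` label would come back as a missing value rather than as `''`. The sketch below demonstrates this with a made-up two-row frame and an in-memory buffer instead of the real file.

```python
import io
import pandas as pd

# Toy frame with one blank LifeStage label (made-up rows)
toy = pd.DataFrame({'Age': [3, 24], 'LifeStage': ['puppy', '']})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default read: the empty field is parsed as NaN
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False preserves the empty string as written
buffer.seek(0)
kept_blank = pd.read_csv(buffer, keep_default_na=False)

print(default_read['LifeStage'].isna().sum())
print(kept_blank['LifeStage'].tolist())
```

Note that `keep_default_na=False` applies to every column, so it is only appropriate if no other column relies on empty fields being read as missing values.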