# FetchMaker

In [1]:
# Import libraries
import numpy as np
import pandas as pd

In [2]:
# Import data
dogs = pd.read_csv('dog_data.csv')

# Inspect first few rows of data
print(dogs.head())

   is_rescue  weight  tail_length  age  color  likes_children  \
0          0       6         2.25    2  black               1   
1          0       4         5.36    4  black               0   
2          0       7         3.63    3  black               0   
3          0       5         0.19    2  black               0   
4          0       5         0.37    1  black               1   

   is_hypoallergenic      name      breed  
0                  0      Huey  chihuahua  
1                  0   Cherish  chihuahua  
2                  1     Becka  chihuahua  
3                  0     Addie  chihuahua  
4                  1  Beverlee  chihuahua  


FetchMaker estimates (based on historical data for all dogs) that 8% of dogs in their system are rescues.  
They would like to know if whippets are significantly more or less likely than other dogs to be a rescue.

In [3]:
# Save the is_rescue column for whippets
whippet_rescue = dogs.is_rescue[dogs.breed == 'whippet']

In [4]:
# Calculate and print the number of whippet rescues
num_whippet_rescues = np.sum(whippet_rescue == 1)
print(num_whippet_rescues)

6


In [5]:
# Calculate and print the number of whippets
num_whippets = len(whippet_rescue)
print(num_whippets)

100


Use a hypothesis test to test the following null and alternative hypotheses:

Null: 8% of whippets are rescues  
Alternative: more or less than 8% of whippets are rescues

In [6]:
# Run a binomial test
# (binom_test was removed in SciPy 1.12; binomtest is its replacement)
from scipy.stats import binomtest
pval = binomtest(num_whippet_rescues, num_whippets, p=0.08).pvalue
print(pval)


0.5811780106238105


**Null: 8% of whippets are rescues** (p ≈ 0.58 is well above 0.05, so we fail to reject the null hypothesis)
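
To see where the two-sided p-value comes from, the sketch below recomputes it by hand from the binomial distribution. It assumes the counts printed earlier (6 rescues out of 100 whippets) and the convention of summing the probabilities of all outcomes that are no more likely than the observed one. The variable names and the small tolerance factor are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy.stats import binom

# Counts taken from the notebook output: 6 rescues among 100 whippets
n_whippets, n_rescues, p_null = 100, 6, 0.08

# Probability of every possible rescue count (0..100) under the null
outcomes = np.arange(n_whippets + 1)
pmf = binom.pmf(outcomes, n_whippets, p_null)

# Two-sided p-value: total probability of outcomes that are no more likely
# than the observed count (tiny relative tolerance guards against float ties)
observed_prob = pmf[n_rescues]
pval_manual = pmf[pmf <= observed_prob * (1 + 1e-7)].sum()
print(pval_manual)
```

The result should agree with the binomial test output to within floating-point error.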

Three of FetchMaker’s most popular mid-sized dog breeds are whippets, terriers, and pitbulls. Is there a significant difference in the average weights of these three breeds?

In [7]:
# Save the weights of whippets, terriers, and pitbulls
wt_whippets = dogs.weight[dogs.breed == 'whippet']
wt_terriers = dogs.weight[dogs.breed == 'terrier']
wt_pitbulls = dogs.weight[dogs.breed == 'pitbull']

In [8]:
# Run an ANOVA 
from scipy.stats import f_oneway
Fstat, pval = f_oneway(wt_whippets, wt_terriers, wt_pitbulls)
print(pval)

3.276415588274815e-17


Null: whippets, terriers, and pitbulls all weigh the same amount on average  
**Alternative: whippets, terriers, and pitbulls do not all weigh the same amount on average (at least one pair of breeds has differing average weights)**
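
As a rough illustration of what `f_oneway` computes, the sketch below builds the F statistic by hand as the ratio of between-group to within-group variance and compares it with SciPy's result. The three weight arrays are made-up values for illustration only; they are not taken from `dog_data.csv`.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical weights for three small groups (NOT from dog_data.csv)
groups = [
    np.array([30.0, 34.0, 38.0, 41.0]),
    np.array([22.0, 25.0, 27.0, 30.0]),
    np.array([40.0, 44.0, 47.0, 49.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

f_scipy, p_scipy = f_oneway(*groups)
print(f_manual, f_scipy, p_scipy)
```

A large F (group means far apart relative to the spread within groups) produces a small p-value, which is the pattern seen for the three breeds above.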

Run another hypothesis test to determine which of those breeds (whippets, terriers, and pitbulls) weigh different amounts on average.

In [9]:
# Subset to just whippets, terriers, and pitbulls
dogs_wtp = dogs[dogs.breed.isin(['whippet', 'terrier', 'pitbull'])]

In [10]:
# Run Tukey's Range Test
from statsmodels.stats.multicomp import pairwise_tukeyhsd
output = pairwise_tukeyhsd(dogs_wtp.weight, dogs_wtp.breed)
print(output)

  Multiple Comparison of Means - Tukey HSD, FWER=0.05  
 group1  group2 meandiff p-adj   lower    upper  reject
-------------------------------------------------------
pitbull terrier   -13.24    0.0 -16.7278 -9.7522   True
pitbull whippet    -3.34 0.0638  -6.8278  0.1478  False
terrier whippet      9.9    0.0   6.4122 13.3878   True
-------------------------------------------------------
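
Reading the table: pitbulls vs. terriers and terriers vs. whippets differ significantly in average weight, while the pitbull vs. whippet difference is not significant at the 0.05 level (p-adj = 0.0638). The short sketch below, using rows transcribed by hand from the printed output, shows that the `reject` column corresponds to whether the confidence interval for the mean difference excludes zero. The tuple layout and variable names are illustrative.

```python
# Rows transcribed from the Tukey HSD output:
# (group1, group2, meandiff, lower, upper)
tukey_rows = [
    ("pitbull", "terrier", -13.24, -16.7278, -9.7522),
    ("pitbull", "whippet", -3.34, -6.8278, 0.1478),
    ("terrier", "whippet", 9.9, 6.4122, 13.3878),
]

significant_pairs = []
for group1, group2, meandiff, lower, upper in tukey_rows:
    # A pair differs significantly when its interval does not contain 0
    excludes_zero = not (lower <= 0 <= upper)
    if excludes_zero:
        significant_pairs.append((group1, group2))
    print(f"{group1} vs. {group2}: meandiff={meandiff}, significant={excludes_zero}")
```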


FetchMaker wants to know if poodles and shihtzus come in different colors.

In [11]:
# Subset to just poodles and shihtzus
dogs_ps = dogs[dogs.breed.isin(['poodle', 'shihtzu'])]

In [12]:
# Create a contingency table of color vs. breed
Xtab = pd.crosstab(dogs_ps.color, dogs_ps.breed)
print(Xtab)

breed  poodle  shihtzu
color                 
black      17       10
brown      13       36
gold        8        6
grey       52       41
white      10        7


Run a hypothesis test for the following null and alternative hypotheses:

Null: There is no association between breed (poodle vs. shihtzu) and color.  
Alternative: There is an association between breed (poodle vs. shihtzu) and color.

In [13]:
# Run a Chi-Square Test
from scipy.stats import chi2_contingency
chi2, pval, dof, exp = chi2_contingency(Xtab)
print(pval)

0.005302408293244593


**Alternative: There is an association between breed (poodle vs. shihtzu) and color.**
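
To make the chi-square result less of a black box, the sketch below rebuilds the test by hand from the contingency table printed earlier. The expected count for each cell is (row total × column total) / grand total, and the statistic sums the squared, scaled deviations from those expected counts. The counts are copied from the printed table; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts copied from the contingency table (columns: poodle, shihtzu)
observed = np.array([
    [17, 10],  # black
    [13, 36],  # brown
    [8, 6],    # gold
    [52, 41],  # grey
    [10, 7],   # white
])

# Expected counts if color and breed were independent
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic and degrees of freedom: (rows - 1) * (columns - 1)
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution
pval_manual = stats.chi2.sf(chi2_stat, dof_manual)

# Cross-check against SciPy's built-in test
_, pval_scipy, _, _ = stats.chi2_contingency(observed)
print(chi2_stat, dof_manual, pval_manual, pval_scipy)
```

The largest deviation from the expected counts is in the brown row (13 poodles vs. 36 shihtzus, against 24.5 expected for each), which accounts for most of the statistic.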