## FetchMaker
Congratulations! You’ve just started working at the hottest new tech startup, FetchMaker. FetchMaker’s mission is to match up prospective dog owners with their perfect pet. Data on thousands of adoptable dogs are in FetchMaker’s system, and it’s your job to analyze some of that data.

If you get stuck during this project or would like to see an experienced developer work through it, click “Get Help” to see a project walkthrough video.

### Play around with the data
1.
Let’s start by including a data interface called fetchmaker that will give you access to FetchMaker’s dog data.

Use import fetchmaker at the top of your script.py file to import the fetchmaker package.

2.
The attributes that FetchMaker keeps track of are:

- weight, an integer representing how heavy a dog is in pounds
- tail_length, a float representing tail length in inches
- age, in years
- color, a string such as "brown" or "grey"
- is_rescue, a boolean stored as 0 or 1

The fetchmaker package lets you access this data for a specific breed of dog with the following format:

fetchmaker.get_weight("poodle")

This returns a Pandas DataFrame of the weights of the poodles recorded in the system. The other methods are get_tail_length, get_color, get_age, and get_is_rescue, which all take a breed as an input.

Get the tail lengths of all of the "rottweiler"s in the system, and store them in a variable called rottweiler_tl.

3.
Print out the mean of rottweiler_tl and the standard deviation of rottweiler_tl, using np.mean and np.std.
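
As a minimal sketch of this step, the snippet below uses a small made-up array in place of the real tail lengths (the values are an assumption, not FetchMaker data). It also shows that np.std divides by n by default; pass ddof=1 if you want the sample standard deviation instead.

```python
import numpy as np

# Hypothetical tail lengths in inches; the real values would come from
# fetchmaker.get_tail_length("rottweiler").
sample_tl = np.array([4.0, 5.0, 6.0, 5.0])

mean_tl = np.mean(sample_tl)
std_population = np.std(sample_tl)      # default ddof=0: divides by n
std_sample = np.std(sample_tl, ddof=1)  # ddof=1: divides by n - 1

print("Mean:", mean_tl)
print("StDev (population):", std_population)
print("StDev (sample):", std_sample)
```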

### Data to the rescue
4.
Over the years, we have seen that we expect 8% of dogs in the FetchMaker system to be rescues. We want to know if whippets are significantly more or less likely to be a rescue.

Store the is_rescue values for "whippet"s in a variable called whippet_rescue.

5.
Use np.count_nonzero to get the number of entries in whippet_rescue that are 1. Store this number in a variable called num_whippet_rescues.

6.
Get the number of samples in the whippet set by taking the np.size of whippet_rescue. Store this in a variable called num_whippets.
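
The two counting steps can be sketched on a toy array of 0/1 flags. The flags below are invented for illustration; in the project they would come from fetchmaker.get_is_rescue("whippet").

```python
import numpy as np

# Hypothetical is_rescue flags (1 = rescue, 0 = not a rescue)
flags = np.array([0, 1, 0, 0, 1, 0, 0, 0, 0, 0])

num_rescues = np.count_nonzero(flags)  # counts the entries that are not 0
num_dogs = np.size(flags)              # total number of entries

print("Rescues:", num_rescues)
print("Dogs:", num_dogs)
print("Observed rescue rate:", num_rescues / num_dogs)
```

Because the flags are only ever 0 or 1, counting the nonzero entries is the same as counting the 1s.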

7.
Use a binomial test to test the number of whippet rescues, num_whippet_rescues, against our expected percentage, 8%.

Remember to import the binomial test by using from scipy.stats import binom_test.

8.
Print out the p-value. Is your result significant?
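
Here is a hedged sketch of a two-sided binomial test against the 8% rate. The counts are made up for illustration. It uses scipy.stats.binomtest, which returns a result object with a pvalue attribute. This is an assumption about your environment: binomtest is the replacement for binom_test in newer SciPy releases, so if your installation still provides binom_test, the call in the instructions works as written.

```python
from scipy.stats import binomtest

expected_rate = 0.08  # expected share of rescues, from the text above

# Hypothetical counts, chosen only for illustration
typical = binomtest(k=6, n=100, p=expected_rate)   # close to 8%
unusual = binomtest(k=20, n=100, p=expected_rate)  # far above 8%

for label, result in [("6 of 100", typical), ("20 of 100", unusual)]:
    verdict = "significant" if result.pvalue < 0.05 else "not significant"
    print(f"{label}: p = {result.pvalue:.4f} ({verdict})")
```

The test is two-sided by default, so a small p-value tells you the rate differs from 8%, not which direction it differs in.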

### Size does matter
9.
Three of our most popular mid-sized dog breeds are whippets, terriers, and pitbulls. Is there a significant difference in the average weights of these three dog breeds? Perform a comparative numerical test to determine if there is a significant difference.
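
One common choice for comparing the means of three or more groups is a one-way ANOVA. The sketch below runs scipy.stats.f_oneway on simulated weights; the group means and spreads are assumptions made up for the example, not FetchMaker data.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Simulated weights in pounds: two similar groups and one heavier group
group_a = rng.normal(loc=30, scale=3, size=50)
group_b = rng.normal(loc=30, scale=3, size=50)
group_c = rng.normal(loc=40, scale=3, size=50)

# f_oneway returns the F statistic and the p-value, in that order
f_stat, pval = f_oneway(group_a, group_b, group_c)

print("F statistic:", f_stat)
print("p-value:", pval)
print("Significant at 0.05:", pval < 0.05)
```

A significant result only says that at least one group mean differs; it does not say which one.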

10.
Now, perform another test to determine which of the pairs of these dog breeds differ from each other.
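
A pairwise follow-up can be sketched with Tukey's range test from statsmodels, which takes one flat array of values and a matching array of group labels. The weights here are simulated under assumed means, so the output says nothing about the real breeds.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)

# Simulated weights in pounds (assumed means, for illustration only)
groups = {
    "whippet": rng.normal(loc=30, scale=3, size=50),
    "terrier": rng.normal(loc=30, scale=3, size=50),
    "pitbull": rng.normal(loc=40, scale=3, size=50),
}

# Flatten the values and build one label per value, in the same order
weights = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])

result = pairwise_tukeyhsd(weights, labels, alpha=0.05)
print(result)  # one row per pair, with a reject column
```

Each row of the printed table compares one pair of groups, and the reject column shows whether that pair differs at the chosen significance level.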

### Categorical dog test
11.
We want to see if "poodle"s and "shihtzu"s have significantly different color breakdowns.

Get the poodle colors and store them in a variable called poodle_colors.

Get the shih tzu colors and store them in a variable called shihtzu_colors.

12.
You can get the number of occurrences of brown poodles by using np.count_nonzero(poodle_colors == "brown").

Use this function to build a Chi Square contingency table, called color_table, with the following structure: 

color_table = [[x, x], [x, x], [x, x], [x, x], [x, x]]

Fill in the “x” entries with the number of each poodle or shih tzu with the specified color.
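
The table-building step can be sketched with a list comprehension instead of typing each count by hand. The color arrays and the list of five color names below are hypothetical stand-ins for the real data.

```python
import numpy as np

# Hypothetical color data for two breeds
poodle = np.array(["black", "brown", "brown", "gold", "grey", "white", "black"])
shihtzu = np.array(["brown", "white", "white", "grey", "black", "gold"])

# Assumed list of colors; one table row per color
colors = ["black", "brown", "gold", "grey", "white"]

# Each row is [poodle count, shih tzu count] for one color
color_table = [
    [np.count_nonzero(poodle == color), np.count_nonzero(shihtzu == color)]
    for color in colors
]

print(color_table)
```

Listing the colors explicitly keeps both columns aligned on the same color, even if one breed has no dogs of a given color.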

13.
Feed your color_table into SciPy’s Chi Square test, save the p-value and print it out.

Is there a significant difference?
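
As a sketch of the test itself, scipy.stats.chi2_contingency takes the table and returns four values: the test statistic, the p-value, the degrees of freedom, and the expected counts. The counts in this table are invented for illustration.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: one row per color, columns are [poodle, shih tzu]
example_table = [
    [17, 10],
    [13, 36],
    [8, 6],
    [52, 41],
    [10, 7],
]

chi2, pval, dof, expected = chi2_contingency(example_table)

print("Chi-square statistic:", chi2)
print("p-value:", pval)
print("Degrees of freedom:", dof)  # (rows - 1) * (columns - 1)
```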

### Good learner! Have a treat!
14.
Great job!

Feel free to play around with fetchmaker more and run some hypothesis tests of your own.

The breeds you can explore are "poodle", "rottweiler", "whippet", "greyhound", "terrier", "chihuahua", "shihtzu", and "pitbull".

In [2]:
import numpy as np
import pandas as pd
import fetchmaker
from scipy.stats import binom_test, f_oneway, chi2_contingency
from statsmodels.stats.multicomp import pairwise_tukeyhsd

#fetchmaker attributes are: weight, tail_length, age, color, is_rescue
# fetchmaker.get_weight('poodle')

rottweiler_tl = fetchmaker.get_tail_length('rottweiler')
#print(rottweiler_tl)
print('Mean: '+str(np.mean(rottweiler_tl)))
print('StDev: '+str(np.std(rottweiler_tl)))

# DATA TO THE RESCUE

whippet_rescue = fetchmaker.get_is_rescue('whippet')
num_whippet_rescues = np.count_nonzero(whippet_rescue)
num_whippets = np.size(whippet_rescue)

pval = binom_test(num_whippet_rescues, num_whippets, 0.08)
print(pval)
if pval < 0.05:
  # The test is two-sided, so it shows a difference, not a direction
  print ('Reject H0: the whippet rescue rate differs significantly from 8%')
else:
  print ('No significant difference from the expected 8% rescue rate')

# SIZE DOES MATTER

whippet_weight = fetchmaker.get_weight('whippet')
terrier_weight = fetchmaker.get_weight('terrier')
pitbull_weight = fetchmaker.get_weight('pitbull')

# f_oneway returns (F statistic, p-value); keep only the p-value
_, pval = f_oneway(whippet_weight, terrier_weight, pitbull_weight)
if pval < 0.05:
  print ('At least one breed has a different weight')
else:
  print ('We can\'t conclude that any breed has a different weight')

dog_weights = np.concatenate([whippet_weight, terrier_weight, pitbull_weight])
dog_labels = ['whippet'] * len(whippet_weight) + ['terrier'] * len(terrier_weight) + ['pitbull'] * len(pitbull_weight)
# Tukey's range test at a 0.05 significance level
tukey_dogs = pairwise_tukeyhsd(dog_weights, dog_labels, alpha=0.05)
print(tukey_dogs)

# CATEGORICAL DOG TEST

poodle_colors = fetchmaker.get_color('poodle').reset_index()
shihtzu_colors = fetchmaker.get_color('shihtzu').reset_index()

poodle_groups = poodle_colors.groupby('color').index.count()
#poodle_groups = poodle_groups.reset_index()
#print(poodle_groups)

shihtzu_groups = shihtzu_colors.groupby('color').index.count()
#print(shihtzu_groups)

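# Rows follow the poodle color order; shih tzu counts are looked up by
# color label so both columns stay aligned (0 if a color is missing)
color_table = []
for color in poodle_groups.index:
  color_table.append([poodle_groups[color], shihtzu_groups.get(color, 0)])
print(color_table)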

pval_chi=chi2_contingency(color_table)[1]
print(pval_chi)

ModuleNotFoundError: No module named 'fetchmaker'