## Given the following confusion matrix, evaluate (by hand) the model's performance.


|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |



In [44]:
# positive prediction is dog
# rows of the matrix are actual labels, columns are predicted labels
TP = 46  # actual dog, predicted dog
TN = 34  # actual cat, predicted cat
FP = 13  # actual cat, predicted dog
FN = 7   # actual dog, predicted cat


accuracy = (TP+TN) / (TP+TN+FP+FN)
precision = TP / (TP+FP)
recall = TP / (TP+FN)
print("The model's performance is:")
print("            accuracy:", accuracy)
print("            precision:", precision)
print("            recall:", recall)

The model's performance is:
            accuracy: 0.8
            precision: 0.7796610169491526
            recall: 0.8679245283018868


### In the context of this problem, what is a false positive?

A false positive is predicting a dog when the picture is actually a cat (13 cases in the matrix above).

### In the context of this problem, what is a false negative?

A false negative is predicting a cat (not a dog) when the picture is in fact a dog (7 cases in the matrix above).
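The mapping from matrix cells to outcomes can be checked with a few lines of plain Python. This is a minimal sketch that assumes rows are actual labels, columns are predicted labels, and dog is the positive class; the variable names are illustrative.

```python
# Rows are actual labels, columns are predicted labels, both ordered [dog, cat].
matrix = [[46, 7],
          [13, 34]]

# With "dog" as the positive class, each cell maps to one outcome.
tp = matrix[0][0]  # actual dog, predicted dog
fn = matrix[0][1]  # actual dog, predicted cat
fp = matrix[1][0]  # actual cat, predicted dog
tn = matrix[1][1]  # actual cat, predicted cat

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # share of dog predictions that are dogs
recall = tp / (tp + fn)     # share of actual dogs that were found

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

Swapping the positive class to cat would swap the roles of the two off-diagonal cells, which is why the positive class should be stated before any metric is reported.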

# You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

### Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here.

### Use the predictions dataset and pandas to help answer the following questions:

In [64]:
import env
import os
import pandas as pd
import acquire
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
import warnings
warnings.filterwarnings("ignore")

In [55]:
df = pd.read_csv('c3.csv')
df.shape
df.actual.mode()

0    No Defect
Name: actual, dtype: object

## An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

The team wants to catch as many defective ducks as possible, so the costly error is a missed defect (a false negative). Recall, the share of actual defects that the model flags, is the appropriate metric.

In [56]:
subset = df[df.actual == 'Defect']

recall_m1 = (subset.model1 == subset.actual).mean()
recall_m2 = (subset.model2 == subset.actual).mean()
recall_m3 = (subset.model3 == subset.actual).mean()

print('model 1 recall is:', recall_m1)
print('model 2 recall is:', recall_m2)
print('model 3 recall is:', recall_m3)


model 1 recall is: 0.5
model 2 recall is: 0.5625
model 3 recall is: 0.8125


Model 3 is the best fit for this use case: it has the highest recall (0.8125), which means the fewest false negatives.
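The manual calculation above can be cross-checked against scikit-learn's `recall_score`. The sketch below uses a small hypothetical DataFrame (not the real `c3.csv`) with the same column names, and passes `pos_label` explicitly because the labels are strings.

```python
import pandas as pd
from sklearn.metrics import recall_score

# Hypothetical toy data; the real c3.csv is not reproduced here.
toy = pd.DataFrame({
    "actual": ["Defect", "Defect", "Defect", "Defect", "No Defect", "No Defect"],
    "model1": ["Defect", "No Defect", "Defect", "Defect", "No Defect", "Defect"],
})

# Manual recall: keep only the actual positives, then take the share predicted correctly.
positives = toy[toy.actual == "Defect"]
manual_recall = (positives.model1 == positives.actual).mean()

# The same quantity from scikit-learn, with the positive label stated explicitly.
sk_recall = recall_score(toy.actual, toy.model1, pos_label="Defect")

print("manual recall: ", manual_recall)
print("sklearn recall:", sk_recall)
```

Both approaches should agree; the library call is mainly useful as a guard against subsetting on the wrong column.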

## Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

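In this scenario the costly error is a false positive (awarding a vacation for a duck that has no defect), so precision, the share of predicted defects that are actual defects, is the appropriate metric.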

In [62]:
subset1 = df[df.model1 == 'Defect']
m1prec = (subset1.model1 == subset1.actual).mean()

subset2 = df[df.model2 == 'Defect']
m2prec = (subset2.model2 == subset2.actual).mean()

subset3 = df[df.model3 == 'Defect']
m3prec = (subset3.model3 == subset3.actual).mean()

print('precision of model 1 is:',m1prec)
print('precision of model 2 is:',m2prec)
print('precision of model 3 is:',m3prec)

precision of model 1 is: 0.8
precision of model 2 is: 0.1
precision of model 3 is: 0.13131313131313133


Model 1 is the best fit for this use case: it has the highest precision (0.8), so it produces the fewest false positives per predicted defect.
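To see the false positives behind a precision figure, it can help to tabulate actual against predicted labels. The following sketch uses `pd.crosstab` on a small hypothetical DataFrame (not the real `c3.csv`); the column names mirror the dataset, but the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the rows are invented for illustration.
toy = pd.DataFrame({
    "actual": ["Defect", "No Defect", "No Defect", "Defect", "No Defect", "No Defect"],
    "model1": ["Defect", "Defect", "No Defect", "Defect", "No Defect", "No Defect"],
})

# Rows are actual labels, columns are predicted labels.
table = pd.crosstab(toy.actual, toy.model1)

tp = table.loc["Defect", "Defect"]     # defects correctly flagged
fp = table.loc["No Defect", "Defect"]  # vacations that would be awarded by mistake
precision = tp / (tp + fp)

print(table)
print("precision:", precision)
```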

## You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

## At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process. First, an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

Several models have already been developed with the data, and you can find their results here.

Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

- In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?
- Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend for Phase I? For Phase II?
- Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend for Phase I? For Phase II?

In [74]:
# read data from csv and save into dataframe
df = pd.read_csv('gives_you_paws.csv')
df.shape


(5000, 5)

In [75]:
df.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


In [76]:
df.actual.mode(), df.actual.value_counts()

(0    dog
 Name: actual, dtype: object,
 dog    3254
 cat    1746
 Name: actual, dtype: int64)

Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

In [77]:
# The most common class is dog, so add a baseline column that always predicts it
df['baseline'] = df.actual.mode()[0]

In [78]:
df.head()

Unnamed: 0,actual,model1,model2,model3,model4,baseline
0,cat,cat,dog,cat,dog,dog
1,dog,dog,cat,cat,dog,dog
2,dog,cat,cat,cat,dog,dog
3,dog,dog,dog,cat,dog,dog
4,cat,cat,cat,dog,dog,dog


### In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

In [79]:
cols = df.columns.tolist()
for col in cols[1:]:
    print('The accuracy of', col,'is', accuracy_score(df['actual'], df[col]))

The accuracy of model1 is 0.8074
The accuracy of model2 is 0.6304
The accuracy of model3 is 0.5096
The accuracy of model4 is 0.7426
The accuracy of baseline is 0.6508


Models 1 (0.8074) and 4 (0.7426) are more accurate than the baseline (0.6508); models 2 (0.6304) and 3 (0.5096) are less accurate.
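The same comparison can be done without a loop by comparing every prediction column to the actual column at once. This is a sketch on a hypothetical toy frame with the same layout as `gives_you_paws.csv`; the rows are invented, and `beats_baseline` is an illustrative name.

```python
import pandas as pd

# Hypothetical toy frame; only the column layout mirrors the real dataset.
toy = pd.DataFrame({
    "actual": ["dog", "dog", "cat", "dog", "cat"],
    "model1": ["dog", "cat", "cat", "dog", "cat"],
    "model2": ["cat", "cat", "cat", "dog", "dog"],
})
toy["baseline"] = toy.actual.mode()[0]  # always predict the most common class

predictions = toy.drop(columns="actual")
# Compare each prediction column to the actual column row by row, then average.
accuracy = predictions.eq(toy.actual, axis=0).mean()

# Keep only the columns whose accuracy is strictly above the baseline's.
beats_baseline = accuracy[accuracy > accuracy["baseline"]].index.tolist()

print(accuracy)
print("better than baseline:", beats_baseline)
```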

### Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend for Phase I? For Phase II?

In [80]:
cols = df.columns.tolist()
for col in cols[1:]:
    print('When positive case is dog')
    print('The accuracy of', col,'is', accuracy_score(df.actual, df[col]))
    print('The precision of', col,'is', precision_score(df.actual, df[col], pos_label='dog'))
    print('The recall score of', col,'is', recall_score(df.actual, df[col], pos_label='dog'))
    print('--------------------------------------------------')

When positive case is dog
The accuracy of model1 is 0.8074
The precision of model1 is 0.8900238338440586
The recall score of model1 is 0.803318992009834
--------------------------------------------------
When positive case is dog
The accuracy of model2 is 0.6304
The precision of model2 is 0.8931767337807607
The recall score of model2 is 0.49078057775046097
--------------------------------------------------
When positive case is dog
The accuracy of model3 is 0.5096
The precision of model3 is 0.6598883572567783
The recall score of model3 is 0.5086047940995697
--------------------------------------------------
When positive case is dog
The accuracy of model4 is 0.7426
The precision of model4 is 0.7312485304490948
The recall score of model4 is 0.9557467732022127
--------------------------------------------------
When positive case is dog
The accuracy of baseline is 0.6508
The precision of baseline is 0.6508
The recall score of baseline is 1.0
--------------------------------------------------

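Assuming Phase I, the automated first pass, should catch as many dog pictures as possible, recall is the metric to maximize, so model4 (recall 0.956) is the recommendation. The baseline's recall of 1.0 only reflects that it labels every picture a dog. Assuming Phase II, the final review before pictures reach users, should avoid showing a non-dog picture, precision is the metric to maximize, so model2 (precision 0.893) is the recommendation, with model1 (0.890) a close second.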

### Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend for Phase I? For Phase II?

In [81]:
cols = df.columns.tolist()
for col in cols[1:]:
    print('When positive case is cat')
    print('The accuracy of', col,'is', accuracy_score(df.actual, df[col]))
    print('The precision of', col,'is', precision_score(df.actual, df[col], pos_label='cat'))
    print('The recall score of', col,'is', recall_score(df.actual, df[col], pos_label='cat'))
    print('--------------------------------------------------')

When positive case is cat
The accuracy of model1 is 0.8074
The precision of model1 is 0.6897721764420747
The recall score of model1 is 0.8150057273768614
--------------------------------------------------
When positive case is cat
The accuracy of model2 is 0.6304
The precision of model2 is 0.4841220423412204
The recall score of model2 is 0.8906071019473081
--------------------------------------------------
When positive case is cat
The accuracy of model3 is 0.5096
The precision of model3 is 0.358346709470305
The recall score of model3 is 0.5114547537227949
--------------------------------------------------
When positive case is cat
The accuracy of model4 is 0.7426
The precision of model4 is 0.8072289156626506
The recall score of model4 is 0.34536082474226804
--------------------------------------------------
When positive case is cat
The accuracy of baseline is 0.6508
The precision of baseline is 0.0
The recall score of baseline is 0.0
--------------------------------------------------

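Assuming Phase I should catch as many cat pictures as possible, recall is the metric to maximize, so model2 (recall 0.891) is the recommendation. Assuming Phase II should avoid showing a non-cat picture, precision is the metric to maximize, so model4 (precision 0.807) is the recommendation. The baseline never predicts cat, so its precision and recall are both 0 for this class.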
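One way to turn the printed metrics into a recommendation is to collect them in a table and pick the maximum per column. The sketch below copies the values from the output above, rounded to four decimals and with the baseline left out; it assumes Phase I is judged on recall and Phase II on precision, which is an interpretation rather than something the problem statement fixes.

```python
import pandas as pd

# Metrics copied from the printed output above, rounded to four decimals.
scores = pd.DataFrame(
    {
        "dog_precision": [0.8900, 0.8932, 0.6599, 0.7312],
        "dog_recall": [0.8033, 0.4908, 0.5086, 0.9557],
        "cat_precision": [0.6898, 0.4841, 0.3583, 0.8072],
        "cat_recall": [0.8150, 0.8906, 0.5115, 0.3454],
    },
    index=["model1", "model2", "model3", "model4"],
)

# For each metric, the model with the highest value.
best = scores.idxmax()
print(best)
```

Under that assumption, the recall columns give the Phase I pick and the precision columns give the Phase II pick for each animal.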

### Follow the links below to read the documentation about each function, then apply those functions to the data from the previous problem.
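The linked functions are not listed in this section. Assuming they include `confusion_matrix` and `classification_report`, both of which are imported earlier in the notebook, a minimal sketch of applying them could look like the following. The toy DataFrame is hypothetical and only mirrors the column layout of the previous problem.

```python
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical toy data with the same layout as the previous problem.
toy = pd.DataFrame({
    "actual": ["dog", "dog", "cat", "dog", "cat", "dog"],
    "model1": ["dog", "cat", "cat", "dog", "dog", "dog"],
})

labels = ["dog", "cat"]

# Rows are actual labels and columns are predicted labels, in the order given by `labels`.
matrix = confusion_matrix(toy.actual, toy.model1, labels=labels)
print(pd.DataFrame(matrix, index=labels, columns=labels))

# Per-class precision, recall, and F1 in a single text table.
report = classification_report(toy.actual, toy.model1, labels=labels)
print(report)
```

Passing `labels` explicitly fixes the row and column order, which avoids misreading the matrix when scikit-learn would otherwise sort the classes alphabetically.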