#### Question 2: Given the following confusion matrix, evaluate (by hand) the model's performance.

|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |


- In the context of this problem, what is a false positive?
- In the context of this problem, what is a false negative?
- How would you describe this model?

##### Answer:

- Positive: Dog
- Negative: Cat

A false positive is predicting a dog when it is actually not a dog (i.e. a cat).
A false negative is predicting not a dog (i.e. a cat) when it is actually a dog.

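I would describe this model as making more false positives (13) than false negatives (7). Its FP rate is about 28% (13 of the 47 actual cats are predicted as dogs) and its FN rate is about 13% (7 of the 53 actual dogs are predicted as cats). This model has lower precision (46/59, about 78%) than sensitivity (46/53, about 87%).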
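The metrics for this matrix can be recomputed from its four counts. This is a small sketch that treats dog as the positive class; the variable names are just for illustration.

```python
# Counts read off the confusion matrix, with "dog" as the positive class.
tp, fn = 46, 7    # actual dog: predicted dog / predicted cat
fp, tn = 13, 34   # actual cat: predicted dog / predicted cat

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)             # of the dog predictions, how many were dogs
recall = tp / (tp + fn)                # of the actual dogs, how many were found
false_positive_rate = fp / (fp + tn)   # share of actual cats predicted as dogs
false_negative_rate = fn / (fn + tp)   # share of actual dogs predicted as cats

print(f"accuracy:  {accuracy:.1%}")              # 80.0%
print(f"precision: {precision:.1%}")             # 78.0%
print(f"recall:    {recall:.1%}")                # 86.8%
print(f"FP rate:   {false_positive_rate:.1%}")   # 27.7%
print(f"FN rate:   {false_negative_rate:.1%}")   # 13.2%
```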

### Question 3: 

You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here.

Use the predictions dataset and pandas to help answer the following questions:

An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here?

##### Answer: 
- In this scenario a Positive result would be one in which a defect is detected.  A Negative result would be one in which a defect is not detected.
- A False Positive would be detecting a defect when there is none.
- A False Negative would be not detecting a defect when there is one.
- A True Positive would be detecting a defect when there is actually one.
- A True Negative would be not detecting a defect when there is not actually one.

##### Because the team wants to identify as many of the defective ducks as possible, False Negatives (missed defects) are the costlier error and some False Positives are acceptable. Recall is therefore the most appropriate evaluation metric in this case.

In [2]:
import pandas as pd
import numpy as np

c3df = pd.read_csv("c3.csv")
c3df.head()

Unnamed: 0,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect


In [56]:
# Add the baseline column first so that the subset below includes it
c3df['baseline_prediction'] = 'No Defect'
subset = c3df[c3df.actual == 'Defect']


model1_recall = (subset.model1 == subset.actual).mean()
model2_recall = (subset.model2 == subset.actual).mean()
model3_recall = (subset.model3 == subset.actual).mean()
baseline_recall = (subset.baseline_prediction == subset.actual).mean()


print(f'  model1 recall: {model1_recall:.2%}')
print(f'  model2 recall: {model2_recall:.2%}')
print(f'  model3 recall: {model3_recall:.2%}')
print(f'baseline recall: {baseline_recall:.2%}')

  model1 recall: 50.00%
  model2 recall: 56.25%
  model3 recall: 81.25%
baseline recall: 0.00%


Which model would be the best fit for this use case?

##### Based on the code run above it seems like the best model favoring recall would be model 3.

##### Part 2.
Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you that they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

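##### Answer: In this scenario a False Positive (predicting a defect when there is none) means giving away a vacation package that was not owed, which is exactly the error the PR team wants to avoid. A False Negative (missing a real defect) is less costly for this program. The appropriate evaluation metric is therefore precision rather than recall, and the best fit is whichever model has the highest precision for the Defect class. Model 3 was chosen above for its recall, so it should not be assumed to be the best choice here without computing precision.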
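A minimal sketch of how precision for the Defect class could be computed with pandas. The `precision` helper and the `toy` frame are hypothetical and made up for illustration; the toy rows are not taken from c3.csv.

```python
import pandas as pd

def precision(df, model_col, positive, actual_col="actual"):
    """Share of rows that `model_col` predicts as `positive` which really are `positive`."""
    predicted_positive = df[df[model_col] == positive]
    if predicted_positive.empty:
        # No positive predictions at all: precision is undefined
        return float("nan")
    return (predicted_positive[actual_col] == positive).mean()

# Hypothetical rows for illustration only (NOT the real dataset)
toy = pd.DataFrame({
    "actual": ["Defect", "Defect", "No Defect", "No Defect", "No Defect"],
    "model1": ["Defect", "No Defect", "No Defect", "No Defect", "No Defect"],
    "model2": ["Defect", "Defect", "Defect", "Defect", "No Defect"],
})

for col in ["model1", "model2"]:
    print(f"{col} precision: {precision(toy, col, 'Defect'):.2%}")
```

With the real data, the same helper could be called as `precision(c3df, 'model1', 'Defect')`, and likewise for the other model columns, to pick the model with the fewest wrongly awarded vacations per positive prediction.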

### Question 4: 

You are working as a data scientist for Gives You Paws ™, a subscription based service that shows you cute pictures of dogs or cats (or both for an additional fee).

At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two step process. First an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

Several models have already been developed with the data, and you can find their results here.

Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

In [19]:
pawsdf = pd.read_csv("gives_you_paws.csv")
pawsdf.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


In [26]:
pawsdf.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

In [27]:
pawsdf['baseline_prediction'] = 'dog'

a. In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?


In [30]:
model1_accuracy = (pawsdf.model1 == pawsdf.actual).mean()
model2_accuracy = (pawsdf.model2 == pawsdf.actual).mean()
model3_accuracy = (pawsdf.model3 == pawsdf.actual).mean()
model4_accuracy = (pawsdf.model4 == pawsdf.actual).mean()
baseline_accuracy = (pawsdf.baseline_prediction == pawsdf.actual).mean()

print(f'  model1 accuracy: {model1_accuracy:.2%}')
print(f'  model2 accuracy: {model2_accuracy:.2%}')
print(f'  model3 accuracy: {model3_accuracy:.2%}')
print(f'  model4 accuracy: {model4_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

  model1 accuracy: 80.74%
  model2 accuracy: 63.04%
  model3 accuracy: 50.96%
  model4 accuracy: 74.26%
baseline accuracy: 65.08%


##### Answer: Model 3 is considerably worse than the baseline prediction model, model 2 is almost equivalent but slightly worse than the baseline model, model 4 performs better than the baseline, and model 1 performs the best of all models.

b. Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend for Phase I? For Phase II?


##### Answer Phase I: 
Working on a team that focuses only on dogs, for Phase I my goal would be to find as many dog pictures as possible even if it means having some False Positives. Thus I would need to minimize False Negatives and I should test the models for recall.

In [39]:
subset = pawsdf[pawsdf.actual == 'dog']
pawsdf['baseline_prediction'] = 'dog'

model1_recall = (subset.model1 == subset.actual).mean()
model2_recall = (subset.model2 == subset.actual).mean()
model3_recall = (subset.model3 == subset.actual).mean()
model4_recall = (subset.model4 == subset.actual).mean()
baseline_recall = (subset.baseline_prediction == subset.actual).mean()


print(f'  model1 recall: {model1_recall:.2%}')
print(f'  model2 recall: {model2_recall:.2%}')
print(f'  model3 recall: {model3_recall:.2%}')
print(f'  model4 recall: {model4_recall:.2%}')
print(f'baseline recall: {baseline_recall:.2%}')

  model1 recall: 80.33%
  model2 recall: 49.08%
  model3 recall: 50.86%
  model4 recall: 95.57%
baseline recall: 100.00%


In [60]:
pd.crosstab(pawsdf.model4, pawsdf.actual)

model4 \ actual,cat,dog
cat,603,144
dog,1143,3110


After testing for recall I see that, aside from the baseline model (which predicts that ALL pictures are of dogs), Model 4 has by far the highest recall, so I should use it for Phase I.
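The crosstab counts are enough to check model 4's dog metrics by hand. A small sketch, with dog as the positive class and the four counts copied from the table above (the variable names are my own):

```python
# Rows of the crosstab are model4's predictions, columns are the actual labels.
tp = 3110   # predicted dog, actually dog
fp = 1143   # predicted dog, actually cat
fn = 144    # predicted cat, actually dog
tn = 603    # predicted cat, actually cat

dog_recall = tp / (tp + fn)      # share of actual dogs that model4 finds
dog_precision = tp / (tp + fp)   # share of model4's dog predictions that are right

print(f"model4 dog recall:    {dog_recall:.2%}")     # 95.57%
print(f"model4 dog precision: {dog_precision:.2%}")  # 73.12%
```

High recall with noticeably lower precision is the trade-off that suits Phase I, where missing a dog is worse than letting a cat through.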

##### Answer Phase II:
For Phase II I would need to prioritize eliminating False Positives since I want to be sure about my positive predictions.  Thus I would need to focus on an evaluation metric of precision, which measures how many of the pictures predicted to be dogs really are dogs.

In [32]:
subset = pawsdf[pawsdf.model1 == 'dog']
model1_precision = (subset.model1 == subset.actual).mean()

subset = pawsdf[pawsdf.model2 == 'dog']
model2_precision = (subset.model2 == subset.actual).mean()

subset = pawsdf[pawsdf.model3 == 'dog']
model3_precision = (subset.model3 == subset.actual).mean()

subset = pawsdf[pawsdf.model4 == 'dog']
model4_precision = (subset.model4 == subset.actual).mean()

subset = pawsdf[pawsdf.baseline_prediction == 'dog']
baseline_precision = (subset.baseline_prediction == subset.actual).mean()

print(f'model1 precision: {model1_precision:.2%}')
print(f'model2 precision: {model2_precision:.2%}')
print(f'model3 precision: {model3_precision:.2%}')
print(f'model4 precision: {model4_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')

model1 precision: 89.00%
model2 precision: 89.32%
model3 precision: 65.99%
model4 precision: 73.12%
baseline precision: 65.08%


After testing for precision I see that when identifying pictures of dogs, Model 2 is slightly better than Model 1.  Either of these would probably be acceptable.

In [61]:
pd.crosstab(pawsdf.model2, pawsdf.actual)

model2 \ actual,cat,dog
cat,1555,1657
dog,191,1597


c. Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend for Phase I? For Phase II?

##### Answer Phase I: 
Working on a team that focuses only on cats, for Phase I my goal would be to find as many cat pictures as possible even if it means having some False Positives. Thus I would need to minimize False Negatives and I should test the models for recall.

In [57]:
subset = pawsdf[pawsdf.actual == 'cat']
pawsdf['baseline_prediction'] = pawsdf.actual.value_counts().idxmax()

model1_recall = (subset.model1 == subset.actual).mean()
model2_recall = (subset.model2 == subset.actual).mean()
model3_recall = (subset.model3 == subset.actual).mean()
model4_recall = (subset.model4 == subset.actual).mean()
baseline_recall = (subset.baseline_prediction == subset.actual).mean()


print(f'  model1 recall: {model1_recall:.2%}')
print(f'  model2 recall: {model2_recall:.2%}')
print(f'  model3 recall: {model3_recall:.2%}')
print(f'  model4 recall: {model4_recall:.2%}')
print(f'baseline recall: {baseline_recall:.2%}')

  model1 recall: 81.50%
  model2 recall: 89.06%
  model3 recall: 51.15%
  model4 recall: 34.54%
baseline recall: 0.00%


After testing for recall I see that the baseline model, which predicts dog for every picture, never identifies a cat (0% recall). Model 2 has the highest recall and is the best performing model I could use for Phase I when trying to identify as many cats as possible.

##### Answer Phase II:
For Phase II I would need to prioritize eliminating False Positives since I want to be sure about my positive predictions.  Thus I would need to focus on an evaluation metric of precision, which measures how many of the pictures predicted to be cats really are cats.

In [36]:
subset = pawsdf[pawsdf.model1 == 'cat']
model1_precision = (subset.model1 == subset.actual).mean()

subset = pawsdf[pawsdf.model2 == 'cat']
model2_precision = (subset.model2 == subset.actual).mean()

subset = pawsdf[pawsdf.model3 == 'cat']
model3_precision = (subset.model3 == subset.actual).mean()

subset = pawsdf[pawsdf.model4 == 'cat']
model4_precision = (subset.model4 == subset.actual).mean()

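# The most-common-class baseline always predicts 'dog', so it makes no 'cat' predictions
# and its cat precision is undefined. Use an always-'cat' reference predictor instead.
always_cat = pd.Series('cat', index=pawsdf.index)
baseline_precision = (always_cat == pawsdf.actual).mean()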

print(f'model1 precision: {model1_precision:.2%}')
print(f'model2 precision: {model2_precision:.2%}')
print(f'model3 precision: {model3_precision:.2%}')
print(f'model4 precision: {model4_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')

model1 precision: 68.98%
model2 precision: 48.41%
model3 precision: 35.83%
model4 precision: 80.72%
baseline precision: 34.92%


After testing for precision I see that for precision when identifying pictures of cats, Model 4 is significantly better than all other models.
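The per-model cells above repeat the same pattern, so they could be folded into one helper. The following is only a sketch: `evaluate` and the `toy` frame are hypothetical names and made-up rows, not part of the exercise data.

```python
import pandas as pd

def evaluate(df, positive, actual_col="actual"):
    """Return accuracy, precision and recall for every column other than `actual_col`."""
    actual_pos = df[actual_col] == positive
    rows = {}
    for col in df.columns.drop(actual_col):
        pred_pos = df[col] == positive
        tp = (pred_pos & actual_pos).sum()
        rows[col] = {
            "accuracy": (df[col] == df[actual_col]).mean(),
            # Undefined (NaN) when the model never predicts the positive class
            "precision": tp / pred_pos.sum() if pred_pos.any() else float("nan"),
            # Undefined (NaN) when the data contains no positive rows
            "recall": tp / actual_pos.sum() if actual_pos.any() else float("nan"),
        }
    return pd.DataFrame(rows).T

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "actual":     ["dog", "dog", "dog", "cat", "cat"],
    "model_a":    ["dog", "dog", "cat", "cat", "dog"],
    "always_dog": ["dog"] * 5,
})

report = evaluate(toy, positive="cat")
print(report)
```

On the real data this might be called as `evaluate(pawsdf[['actual', 'model1', 'model2', 'model3', 'model4']], positive='cat')`; selecting the columns first keeps the index column out of the comparison. A model that never predicts the positive class gets NaN precision, since there are no positive predictions to score.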

In [63]:
# Example from Ryan's explanation showing how to use sklearn for classification_report
# When the row says cat, it's when cat is the positive case, when it says dog, it's when dog is the positive case.

from sklearn.metrics import classification_report

x = classification_report(pawsdf.actual, pawsdf.model1,
                          labels = ['cat', 'dog'],
                          output_dict=True)
pd.DataFrame(x).T

Unnamed: 0,precision,recall,f1-score,support
cat,0.689772,0.815006,0.747178,1746.0
dog,0.890024,0.803319,0.844452,3254.0
accuracy,0.8074,0.8074,0.8074,0.8074
macro avg,0.789898,0.809162,0.795815,5000.0
weighted avg,0.820096,0.8074,0.810484,5000.0
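The two average rows in the report can be reproduced from the per-class rows: `macro avg` is the unweighted mean across classes, and `weighted avg` weights each class by its support. A quick check using the precision values printed above (numbers copied from the table; variable names are illustrative):

```python
# Per-class precision and support copied from the report above
precision = {"cat": 0.689772, "dog": 0.890024}
support = {"cat": 1746, "dog": 3254}

# Unweighted mean over the classes
macro_avg = sum(precision.values()) / len(precision)

# Mean weighted by how many actual rows each class has
weighted_avg = sum(precision[c] * support[c] for c in precision) / sum(support.values())

print(f"macro avg precision:    {macro_avg:.6f}")     # 0.789898
print(f"weighted avg precision: {weighted_avg:.6f}")  # 0.820096
```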


In [64]:
pawsdf.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64