In [104]:
import pandas as pd
import sklearn as sk
from sklearn.metrics import recall_score


#### Given the following confusion matrix (with dog as the positive class), evaluate the model's performance by hand.


|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |


In [105]:
true_positives = 46
true_negatives = 34
false_positives = 13
false_negatives = 7


$$ \text{Accuracy} = \frac{(TP + TN)} {(TP + TN + FP + FN)} $$  
> How often the model's predictions were correct


In [106]:
correct = true_positives + true_negatives
total = true_positives + true_negatives + false_positives + false_negatives
accuracy = (correct)/(total)
accuracy

0.8

1. Precision: out of all positive predictions, how many are correct?
2. Recall: out of all actual positive cases, how many are predicted correctly?

$$ \text{Precision} = \frac{TP} {(TP + FP)} $$  
> When we predict positive, how often are we right?  
> "Hey thing don't waste my time, but what you actually return for me.. let that actually be good." - Cassie Kozyrkov

In [107]:
predicted_positives = true_positives + false_positives
precision = true_positives / predicted_positives
precision

0.7796610169491526

$$ \text{Recall or Sensitivity} = \frac{ TP}{ (TP + FN)} $$
> Of all actual positive cases, how many did we predict correctly?  
> "You don't mind getting the duds in your bag. You want to make sure you aren't missing the diamonds." - Cassie Kozyrkov

In [108]:
actual_positives = true_positives + false_negatives
recall = true_positives / actual_positives
recall

0.8679245283018868

**In the context of this problem, what is a false positive?**
- Predicted dog, but it was cat

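**In the context of this problem, what is a false negative?**
- Predicted cat, but it was dog

**How would you describe this model?**
- Sensitive: its recall (0.87) is higher than its precision (0.78), so it catches most actual dogs, but about 22% of its "dog" predictions (13 of 59) are really cats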
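As a cross-check on the hand calculations, the sketch below expands the confusion matrix back into label lists (one entry per picture, dog as the positive class) and lets scikit-learn recompute the three metrics. The `counts` dictionary simply re-types the matrix above; the expanded lists are a reconstruction, not the original predictions.

```python
# Sketch: rebuild label lists from the hand-worked confusion matrix
# (dog = positive class) and let sklearn recompute the metrics.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# (actual, predicted) -> count, copied from the confusion matrix above
counts = {
    ('dog', 'dog'): 46,  # true positives
    ('dog', 'cat'): 7,   # false negatives
    ('cat', 'dog'): 13,  # false positives
    ('cat', 'cat'): 34,  # true negatives
}

y_true, y_pred = [], []
for (actual, predicted), n in counts.items():
    y_true += [actual] * n
    y_pred += [predicted] * n

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, pos_label='dog')  # TP / (TP + FP)
rec = recall_score(y_true, y_pred, pos_label='dog')      # TP / (TP + FN)
print(acc, prec, rec)
```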

You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here. (https://ds.codeup.com/data/c3.csv)

Use the predictions dataset and pandas to help answer the following questions:


In [109]:
df = pd.read_csv('https://ds.codeup.com/data/c3.csv')
df.head()

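|    | actual    | model1    | model2 | model3    |
|---:|:----------|:----------|:-------|:----------|
|  0 | No Defect | No Defect | Defect | No Defect |
|  1 | No Defect | No Defect | Defect | Defect    |
|  2 | No Defect | No Defect | Defect | No Defect |
|  3 | No Defect | Defect    | Defect | Defect    |
|  4 | No Defect | No Defect | Defect | No Defect |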



An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here?  


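> Recall: missing a defective duck (a false negative) is the costly mistake here, so we want to catch as many of the actual defects as possible.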

Which model would be the best fit for this use case?


In [110]:
m1 = pd.crosstab(df.actual, df.model1)
m2 = pd.crosstab(df.actual, df.model2)
m3 = pd.crosstab(df.actual, df.model3)
m3

| actual \ model3 | Defect | No Defect |
|:----------------|-------:|----------:|
| Defect          |     13 |         3 |
| No Defect       |     86 |        98 |


In [111]:
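def recall_from_crosstab(ct, pos='Defect'):
    # Recall = TP / (TP + FN): correct 'Defect' predictions over all actual defects
    actual_positives = ct.loc[pos].sum()   # row where actual == pos
    true_positives = ct.loc[pos, pos]      # actual pos, predicted pos
    return true_positives / actual_positives

m1Recall = recall_from_crosstab(m1)
m2Recall = recall_from_crosstab(m2)
m3Recall = recall_from_crosstab(m3)

m1Recall, m2Recall, m3Recall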

(0.5, 0.5625, 0.8125)

In [112]:
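# Recall by hand: among the ducks that actually have a defect,
# the share that model1 also labelled 'Defect'
subset = df[df['actual'] == 'Defect']
(subset.actual == subset.model1).mean()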

0.5

> Model 3 is the best fit for this use case: it has the highest recall (0.8125).

In [113]:
# Using sklearn (boolean labels: True means 'Defect', the positive class)
models = list(df.columns[1:])
for model in models:
    rs = recall_score(df.actual == 'Defect', df[model] == 'Defect')
    print(rs)

0.5
0.5625
0.8125
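A pandas-only alternative, sketched under the assumption that a row-normalized crosstab is acceptable: `pd.crosstab(..., normalize='index')` divides each row by its total, so the cell at actual `Defect`, predicted `Defect` is the recall. The `toy` DataFrame below is a hypothetical rebuild from model 3's crosstab counts, not the CSV itself.

```python
import pandas as pd

# Hypothetical rebuild of model3's predictions from its crosstab counts
rows = (
    [('Defect', 'Defect')] * 13
    + [('Defect', 'No Defect')] * 3
    + [('No Defect', 'Defect')] * 86
    + [('No Defect', 'No Defect')] * 98
)
toy = pd.DataFrame(rows, columns=['actual', 'model3'])

# normalize='index' divides each row by its total, so every row sums to 1.
# In the 'Defect' row, the 'Defect' column is TP / (TP + FN), i.e. recall.
rates = pd.crosstab(toy.actual, toy.model3, normalize='index')
recall_model3 = rates.loc['Defect', 'Defect']
print(recall_model3)
```

Against the real data, the same idea would be `pd.crosstab(df.actual, df[model], normalize='index').loc['Defect', 'Defect']` for each model.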



Recently, several stories in the local news have highlighted customers who received a defective rubber duck, portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but they really don't want to accidentally give out a vacation package when the duck doesn't actually have a defect.  
Which evaluation metric would be appropriate here?  


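> Precision: a false positive (sending a vacation for a duck with no defect) is the costly mistake, so the ducks we flag as defective need to really be defective.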

Which model would be the best fit for this use case?

In [114]:
def get_accuracy(cm, pos, neg):
    TP = cm.loc[pos][pos]
    TN = cm.loc[neg][neg]
    ALL = cm.sum().sum()
    return (TP + TN)/ALL

get_accuracy(m1, 'Defect','No Defect'), \
get_accuracy(m2, 'Defect','No Defect'), \
get_accuracy(m3, 'Defect','No Defect')

(0.95, 0.56, 0.555)

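> Model 1. Although these cells compare accuracy, model 1 also has by far the best precision: working back from its recall (0.5 of the 16 actual defects) and accuracy (0.95 of 200 ducks), 8 of its 10 'Defect' predictions are correct, compared with 9 of 90 for model 2 and 13 of 99 for model 3.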
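The costly error in this scenario is a false positive, which is exactly what precision, TP / (TP + FP), measures. A helper mirroring `get_accuracy` can score it straight from a crosstab; `get_precision` is a hypothetical name, and the test crosstab re-types model 3's counts from the earlier output.

```python
import pandas as pd

def get_precision(cm, pos):
    """Precision = TP / (TP + FP) from a crosstab (rows = actual, columns = predicted)."""
    tp = cm.loc[pos, pos]            # actual pos, predicted pos
    predicted_pos = cm[pos].sum()    # everything the model labelled pos
    return tp / predicted_pos

# model3's crosstab, copied from the earlier output
m3_counts = pd.DataFrame(
    [[13, 3], [86, 98]],
    index=pd.Index(['Defect', 'No Defect'], name='actual'),
    columns=pd.Index(['Defect', 'No Defect'], name='model3'),
)
print(get_precision(m3_counts, 'Defect'))  # 13 / 99
```

With the notebook's crosstabs this would be called as `get_precision(m1, 'Defect')` and so on; `sk.metrics.precision_score(df.actual == 'Defect', df[model] == 'Defect')` should agree.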

In [115]:
# Using sklearn
pos = 'Defect'
for model in models:
    print(model, sk.metrics.accuracy_score(df.actual == pos, df[model] == pos) )

model1 0.95
model2 0.56
model3 0.555


In [116]:
# drop() returns a new DataFrame; df itself is unchanged
df.drop(columns='actual')
df.actual.name

'actual'

### Ex 4

You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process. First, an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

Several models have already been developed with the data, and you can find their results here. (https://ds.codeup.com/data/gives_you_paws.csv)



Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:


In [117]:
df = pd.read_csv('https://ds.codeup.com/data/gives_you_paws.csv')
df.head()

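|    | actual | model1 | model2 | model3 | model4 |
|---:|:-------|:-------|:-------|:-------|:-------|
|  0 | cat    | cat    | dog    | cat    | dog    |
|  1 | dog    | dog    | cat    | cat    | dog    |
|  2 | dog    | cat    | cat    | cat    | dog    |
|  3 | dog    | dog    | dog    | cat    | dog    |
|  4 | cat    | cat    | cat    | dog    | dog    |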


In [129]:
df.actual.mode().tolist()

['dog']

In [118]:
df['model_baseline'] = 'dog'
df.head()

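|    | actual | model1 | model2 | model3 | model4 | model_baseline |
|---:|:-------|:-------|:-------|:-------|:-------|:---------------|
|  0 | cat    | cat    | dog    | cat    | dog    | dog            |
|  1 | dog    | dog    | cat    | cat    | dog    | dog            |
|  2 | dog    | cat    | cat    | cat    | dog    | dog            |
|  3 | dog    | dog    | dog    | cat    | dog    | dog            |
|  4 | cat    | cat    | cat    | dog    | dog    | dog            |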
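scikit-learn ships the same most-common-class baseline as `sklearn.dummy.DummyClassifier(strategy='most_frequent')`. A minimal sketch on a hypothetical five-label `y`; the feature matrix is a placeholder because this strategy ignores the features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels; in the notebook this would be df.actual
y = np.array(['dog', 'dog', 'cat', 'dog', 'cat'])
X = np.zeros((len(y), 1))  # placeholder features, ignored by 'most_frequent'

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X, y)
pred = baseline.predict(X)        # every prediction is the mode, 'dog'
acc = accuracy_score(y, pred)     # equals the share of 'dog' in y (3 of 5)
print(pred, acc)
```

The baseline's accuracy is always the share of the majority class, which is why it is the bar the real models have to clear.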



In terms of accuracy, how do the various models compare to the baseline model?  
Are any of the models better than the baseline?


In [119]:
models = df.drop(columns='actual').columns.values.tolist()
models

['model1', 'model2', 'model3', 'model4', 'model_baseline']

In [120]:
pos = 'dog'
neg = 'cat'
for model in models:
    # labels=[pos, neg] puts dog first: rows = actual, columns = predicted,
    # so the matrix is [[TP, FN], [FP, TN]]
    cm = sk.metrics.confusion_matrix(df['actual'], df[model], labels=[pos, neg])
    tp, fn, fp, tn = cm.ravel()
    accuracy = (tp + tn) / cm.sum()
    print(model, accuracy)
    print(tp, tn, fp, fn)
    print(cm)

model1 0.8074
2614 1423 323 640
[[2614  640]
 [ 323 1423]]
model2 0.6304
1597 1555 191 1657
[[1597 1657]
 [ 191 1555]]
model3 0.5096
1655 893 853 1599
[[1655 1599]
 [ 853  893]]
model4 0.7426
3110 603 1143 144
[[3110  144]
 [1143  603]]
model_baseline 0.6508
3254 0 1746 0
[[3254    0]
 [1746    0]]


In [121]:
pd.crosstab(df.actual, df.model1)

| actual \ model1 |  cat |  dog |
|:----------------|-----:|-----:|
| cat             | 1423 |  323 |
| dog             |  640 | 2614 |


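> The models do not all have the same accuracy. Model 1 (0.8074) and model 4 (0.7426) beat the baseline (0.6508), while model 2 (0.6304) and model 3 (0.5096) do worse than always predicting dog.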


Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend for Phase I? For Phase II?



Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend for Phase I? For Phase II?

#### Ex 4 using sklearn

In [122]:
pos = 'dog'
neg = 'cat'
for model in models:
    accuracy = sk.metrics.accuracy_score(df.actual, df[model])
    print(model, 'acc:', accuracy)
    precision = sk.metrics.precision_score(df.actual == pos, df[model] == pos)
    print(model, 'prec:', precision)
    recall = sk.metrics.recall_score(df.actual == pos, df[model] == pos)
    print(model, 'sens:', recall)

model1 acc: 0.8074
model1 prec: 0.8900238338440586
model1 sens: 0.803318992009834
model2 acc: 0.6304
model2 prec: 0.8931767337807607
model2 sens: 0.49078057775046097
model3 acc: 0.5096
model3 prec: 0.6598883572567783
model3 sens: 0.5086047940995697
model4 acc: 0.7426
model4 prec: 0.7312485304490948
model4 sens: 0.9557467732022127
model_baseline acc: 0.6508
model_baseline prec: 0.6508
model_baseline sens: 1.0
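The loop above only scores dog as the positive class, but the cat team's questions need cat precision and recall too. Treating cat as positive just swaps roles: cat true positives are the dog true negatives. The sketch below recomputes both classes from the `tp, tn, fp, fn` counts printed earlier. It assumes Phase I should miss as few target pictures as possible (recall) and Phase II should pass along only correct tags (precision); that is one reading of the two phases, not something the exercise states.

```python
# (tp, tn, fp, fn) with dog as the positive class, copied from the printed output
counts = {
    'model1': (2614, 1423, 323, 640),
    'model2': (1597, 1555, 191, 1657),
    'model3': (1655, 893, 853, 1599),
    'model4': (3110, 603, 1143, 144),
}

def class_metrics(tp, tn, fp, fn):
    """Precision and recall for both classes, given dog-as-positive counts."""
    return {
        'dog_precision': tp / (tp + fp),
        'dog_recall': tp / (tp + fn),
        # For cat the roles flip: predicted cat = tn + fn, actual cat = tn + fp
        'cat_precision': tn / (tn + fn),
        'cat_recall': tn / (tn + fp),
    }

for model, c in counts.items():
    scores = class_metrics(*c)
    print(model, {k: round(v, 3) for k, v in scores.items()})
```

Under that assumption, the numbers point to model 4 for Phase I and model 2 for Phase II on the dog team (model 2's dog precision, 0.893, only narrowly beats model 1's 0.890), and to model 2 for Phase I and model 4 for Phase II on the cat team.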


In [132]:
x = sk.metrics.classification_report(df.actual, df.model1, 
                                labels=['dog','cat'],
                                output_dict= True )

pd.DataFrame(x).T

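|              | precision |   recall | f1-score |  support |
|:-------------|----------:|---------:|---------:|---------:|
| dog          |  0.890024 | 0.803319 | 0.844452 |   3254.0 |
| cat          |  0.689772 | 0.815006 | 0.747178 |   1746.0 |
| accuracy     |    0.8074 |   0.8074 |   0.8074 |   0.8074 |
| macro avg    |  0.789898 | 0.809162 | 0.795815 |   5000.0 |
| weighted avg |  0.820096 |   0.8074 | 0.810484 |   5000.0 |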
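The `f1-score` column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), which for a single class simplifies to 2TP / (2TP + FP + FN). A short sketch checking the dog row against model 1's counts from the confusion matrix printed earlier:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# model1 counts with dog as the positive class, from the printed confusion matrix
tp, fp, fn = 2614, 323, 640
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(round(f1(precision, recall), 6))         # dog f1-score in the report
print(round(2 * tp / (2 * tp + fp + fn), 6))   # same value computed from counts
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.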


<ul>
<li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html"><code>sklearn.metrics.accuracy_score</code></a></li>
<li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html"><code>sklearn.metrics.precision_score</code></a></li>
<li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html"><code>sklearn.metrics.recall_score</code></a></li>
<li><a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html"><code>sklearn.metrics.classification_report</code></a></li>
</ul>