In [1]:
import pandas as pd
import numpy as np

### Given the following confusion matrix, evaluate (by hand) the model's performance.


|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |

> cat = positive class

> dog = negative class

### In the context of this problem, what is a false positive?

> False positive: we predicted a cat, but it's actually a dog

### In the context of this problem, what is a false negative?

> False negative: we predicted a dog, but it's actually a cat

### How would you describe this model?

In [2]:
# true positive: we predict a cat, and it's a cat
tp = 34

# true negative: we predict a dog, and it's a dog
tn = 46

# false positive: we predict a cat, but it's a dog
fp = 7

# false negative: we predict a dog, but it's a cat
fn = 13

In [3]:
print("Cat-classifier (where 'cat' is the positive prediction)")

print("True Positives", tp)
print("False Positives", fp)
print("False Negatives", fn)
print("True Negatives", tn)

print("-------------")

accuracy = (tp + tn) / (tp + tn + fp + fn)

precision = tp / (tp + fp)

recall = tp / (tp + fn)

print("Accuracy is", accuracy)
print("Recall is", round(recall,2))
print("Precision is", round(precision,2))

Cat-classifier (where 'cat' is the positive prediction)
True Positives 34
False Positives 7
False Negatives 13
True Negatives 46
-------------
Accuracy is 0.8
Recall is 0.72
Precision is 0.83


In [4]:
# recall is about actual positives: of all real cats, how many did we catch?
# precision is about predicted positives: of all predicted cats, how many are really cats?
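The three formulas used above can be wrapped in a small helper so the same arithmetic is reusable for any confusion matrix. This is a minimal sketch; the function name `confusion_metrics` is made up for illustration and is not part of the original exercise.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # of everything predicted positive, how much was right
    recall = tp / (tp + fn)     # of everything actually positive, how much was found
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Counts from the cat/dog confusion matrix above (cat = positive class)
metrics = confusion_metrics(tp=34, fp=7, fn=13, tn=46)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```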

## You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant. Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here. Use the predictions dataset and pandas to help answer the following questions:

### An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible.

In [7]:
df = pd.read_csv('~/Downloads/c3.csv')

In [8]:
df.head()

Unnamed: 0,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   actual  200 non-null    object
 1   model1  200 non-null    object
 2   model2  200 non-null    object
 3   model3  200 non-null    object
dtypes: object(4)
memory usage: 6.4+ KB


In [10]:
#look at values
df.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

## Which evaluation metric would be appropriate here? 

Since we are interested in defects, we will assign 'Defect' as the positive class for the classifier.
- defects = positive class

Quality Control, our internal customer, wants to identify as many defective ducks as possible.

The best metric for Quality Control is therefore recall:
- i.e. of all the real positives, how many did the model catch?
- i.e. how many of the defective ducks are actually flagged as defective (positive) by the models?
- i.e. let's minimize our false negatives

In [None]:
# recall is about real positives
# recall = tp / (tp + fn)

# false negative is when we say it's not defective, but it is 

In [14]:
# actual positives: only the ducks that really have a defect
subset = df[df.actual == 'Defect']
subset

Unnamed: 0,actual,model1,model2,model3
13,Defect,No Defect,Defect,Defect
30,Defect,Defect,No Defect,Defect
65,Defect,Defect,Defect,Defect
70,Defect,Defect,Defect,Defect
74,Defect,No Defect,No Defect,Defect
87,Defect,No Defect,Defect,Defect
118,Defect,No Defect,Defect,No Defect
135,Defect,Defect,No Defect,Defect
140,Defect,No Defect,Defect,Defect
147,Defect,Defect,No Defect,Defect


### Which model would be the best fit for this use case?

In [19]:
(subset.actual == subset.model1).mean()

0.5

In [20]:
#Model 1 recall
model_recall = (subset.actual == subset.model1).mean()

print("Model 1")
print(f"Model recall: {model_recall:.2%}")

Model 1
Model recall: 50.00%


In [21]:
# Model 2 recall
model_recall = (subset.actual == subset.model2).mean()

print("Model 2")
print(f"Model recall: {model_recall:.2%}")

Model 2
Model recall: 56.25%


In [22]:
# Model 3 recall
model_recall = (subset.actual == subset.model3).mean()

print("Model 3")
print(f"Model recall: {model_recall:.2%}")

Model 3
Model recall: 81.25%


Takeaways:

- Quality Control should select a model with higher recall (to avoid false negatives)
- Quality Control should use Model 3
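The three recall cells above repeat the same steps, so they could be folded into one helper and a loop. The sketch below runs on a tiny made-up frame (the column names `model_a` and `model_b` and all values are invented for illustration, not taken from the C3 data) and assumes the same `'Defect'` / `'No Defect'` labels.

```python
import pandas as pd

# Made-up predictions purely for illustration; not the C3 data
toy = pd.DataFrame({
    "actual":  ["Defect", "Defect", "Defect", "Defect", "No Defect", "No Defect"],
    "model_a": ["Defect", "No Defect", "Defect", "No Defect", "No Defect", "Defect"],
    "model_b": ["Defect", "Defect", "Defect", "No Defect", "Defect", "Defect"],
})


def recall(frame, model, positive="Defect"):
    """Share of actual positives that the model also labelled positive."""
    actual_positives = frame[frame["actual"] == positive]
    return (actual_positives[model] == positive).mean()


# Compute recall for every model column and pick the highest
recalls = {model: recall(toy, model) for model in ["model_a", "model_b"]}
best_model = max(recalls, key=recalls.get)

for model, value in recalls.items():
    print(f"{model} recall: {value:.2%}")
print("Highest recall:", best_model)
```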


## Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. 

### Which evaluation metric would be appropriate here? 

> positive case = defect

In [23]:
# false positive is when we say it's defective, but it's not

In [24]:
# false negative is when we say not defective, but it is 

In [25]:
# precision = tp / (tp + fp)

The PR team wants to minimize false positives (vacations given out for ducks that have no defect), so we should choose the model with the highest precision.

### Which model would be the best fit for this use case?

In [26]:
#precision is about positive predictions 

In [29]:
# choose subset of model1 where we only select 'positive predictions'
subset = df [df.model1 == 'Defect']
subset

Unnamed: 0,actual,model1,model2,model3
3,No Defect,Defect,Defect,Defect
30,Defect,Defect,No Defect,Defect
62,No Defect,Defect,No Defect,No Defect
65,Defect,Defect,Defect,Defect
70,Defect,Defect,Defect,Defect
135,Defect,Defect,No Defect,Defect
147,Defect,Defect,No Defect,Defect
163,Defect,Defect,Defect,Defect
194,Defect,Defect,No Defect,Defect
196,Defect,Defect,No Defect,No Defect


In [32]:
# calculate precision
model_precision = (subset.actual == subset.model1).mean()

print("Model 1")
print(f"Model precision: {model_precision:.2%}")

Model 1
Model precision: 80.00%


In [34]:
# choose subset for model2 where we only select 'positive predictions'
subset = df [df.model2 == 'Defect']

# calculate precision
model_precision = (subset.actual == subset.model2).mean()

print("Model 2")
print(f"Model precision: {model_precision:.2%}")

Model 2
Model precision: 10.00%


In [35]:
# choose subset for model3 where we only select 'positive predictions'
subset = df [df.model3 == 'Defect']

# calculate precision
model_precision = (subset.actual == subset.model3).mean()

print("Model 3")
print(f"Model precision: {model_precision:.2%}")

Model 3
Model precision: 13.13%


Takeaway for the PR team:

- Use model 1 since it will minimize the false positive predictions of defects
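An alternative way to arrive at precision is to build the confusion matrix first and read the true and false positives off it. The sketch below uses `pd.crosstab` on a small made-up frame (the column name `model_a` and its values are invented for illustration); rows are the actual labels and columns are the predicted labels.

```python
import pandas as pd

# Made-up predictions purely for illustration; not the C3 data
toy = pd.DataFrame({
    "actual":  ["Defect", "Defect", "No Defect", "No Defect", "No Defect", "Defect"],
    "model_a": ["Defect", "No Defect", "Defect", "No Defect", "No Defect", "Defect"],
})

# Rows = actual label, columns = predicted label
matrix = pd.crosstab(toy["actual"], toy["model_a"])
print(matrix)

tp = matrix.loc["Defect", "Defect"]     # predicted defect, really a defect
fp = matrix.loc["No Defect", "Defect"]  # predicted defect, really fine

precision = tp / (tp + fp)
print(f"Precision: {precision:.2%}")
```

Seeing the whole matrix at once also makes it easier to check how many false negatives a high-precision model produces.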

## You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

### At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process. First, an automated algorithm tags pictures as either a cat or a dog.

In [36]:
df = pd.read_csv('~/Downloads/gives_you_paws.csv')

In [37]:
df.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


### Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

In [38]:
#look at class distribution
df.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

In [40]:
#set the most common class as the baseline
df['baseline'] = df.actual.value_counts().idxmax()

In [41]:
df

Unnamed: 0,actual,model1,model2,model3,model4,baseline
0,cat,cat,dog,cat,dog,dog
1,dog,dog,cat,cat,dog,dog
2,dog,cat,cat,cat,dog,dog
3,dog,dog,dog,cat,dog,dog
4,cat,cat,cat,dog,dog,dog
...,...,...,...,...,...,...
4995,dog,dog,dog,dog,dog,dog
4996,dog,dog,cat,cat,dog,dog
4997,dog,cat,cat,dog,dog,dog
4998,cat,cat,cat,cat,dog,dog


### In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

In [45]:
#baseline accuracy 
(df.actual == df.baseline).mean()

0.6508
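The baseline accuracy is simply the share of the majority class: 3254 of the 5000 pictures are dogs, and 3254 / 5000 = 0.6508. The sketch below checks that equivalence on a short made-up series (the values are invented for illustration).

```python
import pandas as pd

# Made-up labels purely for illustration
actual = pd.Series(["dog", "dog", "cat", "dog", "cat"], name="actual")

# The baseline always predicts the most common class
majority_class = actual.value_counts().idxmax()
baseline = pd.Series(majority_class, index=actual.index)

# Accuracy of the baseline vs. the share of the majority class
baseline_accuracy = (actual == baseline).mean()
majority_share = actual.value_counts(normalize=True).max()

print(majority_class, baseline_accuracy, majority_share)
```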

In [48]:
#all models accuracy
model_acc = []

for model in df.columns[1:]:
    acc = (df.actual == df[model]).mean()
    model_acc.append([model, acc])

model_acc

[['model1', 0.8074],
 ['model2', 0.6304],
 ['model3', 0.5096],
 ['model4', 0.7426],
 ['baseline', 0.6508]]

In [50]:
#make pretty in df
pd.DataFrame(model_acc, columns=['model','accuracy'])

Unnamed: 0,model,accuracy
0,model1,0.8074
1,model2,0.6304
2,model3,0.5096
3,model4,0.7426
4,baseline,0.6508


Takeaways:

- in terms of accuracy, model 1 (0.8074) and model 4 (0.7426) perform better than the baseline (0.6508); model 2 and model 3 perform worse

## Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend? 

> dog = positive class

> cat = negative class

In [51]:
# precision = tp / (tp + fp)
# recall = tp / (tp + fn)

In [52]:
# false positive is when we say it's a dog, but it's actually a cat

In [53]:
# false negative is when we say it's a cat, but it's actually a dog

Subscribers are paying to see dog pictures, so what do we want to minimize?
- we want to minimize false positives (cat pictures shown as dogs)

Therefore, we want to maximize precision to minimize false positives

In [55]:
#calculate for all models
model_pre = []

for model in df.columns[1:]:
    
    subset = df [df[model] == 'dog']
    
    precision = (subset.actual == subset[model]).mean()

    model_pre.append([model,precision])
    
model_pre

[['model1', 0.8900238338440586],
 ['model2', 0.8931767337807607],
 ['model3', 0.6598883572567783],
 ['model4', 0.7312485304490948],
 ['baseline', 0.6508]]

In [56]:
#make pretty in dataframe
pd.DataFrame(model_pre, columns=['model','precision'])

Unnamed: 0,model,precision
0,model1,0.890024
1,model2,0.893177
2,model3,0.659888
3,model4,0.731249
4,baseline,0.6508


Takeaway for Dog team:
- we had to maximize precision to minimize the false positives
- model2 has the highest precision (0.893), narrowly ahead of model1 (0.890)
- since model1 and model2 are so similar, we could break the tie with an additional metric such as recall
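One possible additional metric is the F1 score, the harmonic mean of precision and recall. As a rough sketch, the snippet below computes it from the dog-class precision and recall that `classification_report` shows for model1 and model2 further down; the helper name `f1` is made up for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Dog-class precision and recall as reported by classification_report
model1_f1 = f1(precision=0.890024, recall=0.803319)
model2_f1 = f1(precision=0.893177, recall=0.490781)

print(f"model1 dog F1: {model1_f1:.4f}")
print(f"model2 dog F1: {model2_f1:.4f}")
```

The two models are nearly tied on precision, but model1 finds far more of the actual dogs, which shows up as a clearly higher F1.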

## Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend?

> cat = positive class

> dog = negative class

In [57]:
# false positive is when we say it's a cat, but it's a dog
# false negative is when we say it's a dog, but it's a cat

We want to minimize the false positives, therefore, we will use precision again. 

In [58]:
#calculate for all models
model_pre = []

for model in df.columns[1:]:
    
    subset = df [df[model] == 'cat']
    
    precision = (subset.actual == subset[model]).mean()

    model_pre.append([model,precision])
    
model_pre

[['model1', 0.6897721764420747],
 ['model2', 0.4841220423412204],
 ['model3', 0.358346709470305],
 ['model4', 0.8072289156626506],
 ['baseline', nan]]

In [59]:
#make pretty in df
pd.DataFrame(model_pre, columns=['model','precision'])

Unnamed: 0,model,precision
0,model1,0.689772
1,model2,0.484122
2,model3,0.358347
3,model4,0.807229
4,baseline,


Takeaway for Cat team:
- we had to maximize precision to minimize the false positives
- therefore, we should use model 4
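The `nan` in the baseline row is expected: the baseline never predicts `cat`, so there are no positive predictions and precision (tp / (tp + fp)) is undefined. A hedged sketch of a helper that makes this case explicit is shown below; the function name `precision` and the toy values are invented for illustration.

```python
import math

import pandas as pd

# Made-up data purely for illustration: the baseline always says "dog"
toy = pd.DataFrame({
    "actual":   ["cat", "dog", "dog", "cat"],
    "baseline": ["dog", "dog", "dog", "dog"],
})


def precision(frame, model, positive):
    """Share of positive predictions that are correct; NaN if there are none."""
    predicted_positive = frame[frame[model] == positive]
    if predicted_positive.empty:
        # No positive predictions at all, so precision is undefined
        return float("nan")
    return (predicted_positive["actual"] == positive).mean()


cat_precision = precision(toy, "baseline", "cat")
dog_precision = precision(toy, "baseline", "dog")

print("cat precision undefined:", math.isnan(cat_precision))
print("dog precision:", dog_precision)
```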

## Follow the links below to read the documentation about each function, then apply those functions to the data from the previous problem.

- sklearn.metrics.accuracy_score
- sklearn.metrics.precision_score
- sklearn.metrics.recall_score
- sklearn.metrics.classification_report

In [60]:
from sklearn.metrics import classification_report

In [62]:
print("Model 1")
pd.DataFrame(classification_report(df.actual, df.model1, 
                      labels=['cat','dog'],
                      output_dict=True))

Model 1


Unnamed: 0,cat,dog,accuracy,macro avg,weighted avg
precision,0.689772,0.890024,0.8074,0.789898,0.820096
recall,0.815006,0.803319,0.8074,0.809162,0.8074
f1-score,0.747178,0.844452,0.8074,0.795815,0.810484
support,1746.0,3254.0,0.8074,5000.0,5000.0


In [63]:
print("Model 2")
pd.DataFrame(classification_report(df.actual, df.model2, 
                      labels=['cat','dog'],
                      output_dict=True))

Model 2


Unnamed: 0,cat,dog,accuracy,macro avg,weighted avg
precision,0.484122,0.893177,0.6304,0.688649,0.750335
recall,0.890607,0.490781,0.6304,0.690694,0.6304
f1-score,0.627269,0.633479,0.6304,0.630374,0.63131
support,1746.0,3254.0,0.6304,5000.0,5000.0


In [64]:
print("Model 3")
pd.DataFrame(classification_report(df.actual, df.model3, 
                      labels=['cat','dog'],
                      output_dict=True))

Model 3


Unnamed: 0,cat,dog,accuracy,macro avg,weighted avg
precision,0.358347,0.659888,0.5096,0.509118,0.55459
recall,0.511455,0.508605,0.5096,0.51003,0.5096
f1-score,0.421425,0.574453,0.5096,0.497939,0.521016
support,1746.0,3254.0,0.5096,5000.0,5000.0


In [65]:
print("Model 4")
pd.DataFrame(classification_report(df.actual, df.model4, 
                      labels=['cat','dog'],
                      output_dict=True))

Model 4


Unnamed: 0,cat,dog,accuracy,macro avg,weighted avg
precision,0.807229,0.731249,0.7426,0.769239,0.757781
recall,0.345361,0.955747,0.7426,0.650554,0.7426
f1-score,0.483755,0.82856,0.7426,0.656157,0.708154
support,1746.0,3254.0,0.7426,5000.0,5000.0
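The other three functions listed in the exercise can be applied in a similar way. The sketch below runs them on two short made-up label lists (invented for illustration, not the Gives You Paws data); `pos_label` tells scikit-learn which string label to treat as the positive class.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels purely for illustration
actual    = ["cat", "dog", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "cat", "dog", "dog", "dog"]

accuracy = accuracy_score(actual, predicted)

# Treat "dog" as the positive class
dog_precision = precision_score(actual, predicted, pos_label="dog")
dog_recall = recall_score(actual, predicted, pos_label="dog")

print(f"accuracy:      {accuracy:.2f}")
print(f"dog precision: {dog_precision:.2f}")
print(f"dog recall:    {dog_recall:.2f}")
```

With the notebook's data, the same calls would presumably take `df.actual` and a model column such as `df.model1` in place of the two lists.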
