|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |


In the context of this problem, what is a false positive?

What counts as a FP depends on which class is set as the positive case.  If dog is the positive case, the FP is the
lower left quadrant: dog predicted, but it was actually a cat.  If cat is the positive case, the FP is the upper
right quadrant: cat predicted, but it was actually a dog.

In the context of this problem, what is a false negative?

The same reasoning applies.  If dog is the positive case, the FN is the upper right quadrant: cat predicted, but it
was actually a dog.  If cat is the positive case, the FN is the lower left quadrant: dog predicted, but it was
actually a cat.

How would you describe this model?

Assuming dog is the positive case, the model is reasonably accurate (0.80), with a precision of about 0.78 and a
higher recall of about 0.87.

Hand calculations, assuming dog is the positive case

tp = 46
tn = 34
fp = 13
fn = 7 

accuracy = (46+34)/(46+34+13+7) = 80/100 = .8
precision = (46)/(46+13) = 46/59 = .7797
recall = 46/(46+7) = 46/53 = .8679
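
The hand calculation can be double-checked with a few lines of Python. This is only a sketch: the counts are copied from the table above, dog is assumed to be the positive case, and the variable names are illustrative.

```python
# Counts copied from the confusion matrix above, assuming dog is the positive case
tp = 46  # actual dog, predicted dog
fn = 7   # actual dog, predicted cat
fp = 13  # actual cat, predicted dog
tn = 34  # actual cat, predicted cat

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f'accuracy  = {accuracy:.4f}')
print(f'precision = {precision:.4f}')
print(f'recall    = {recall:.4f}')
```

Swapping the roles of the four counts (tp with tn, fp with fn) gives the same metrics with cat as the positive case.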

In [2]:
import pandas as pd
import os
from pydataset import data
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns


from sklearn.model_selection import train_test_split


In [53]:

# relative file path (file in same directory)
file_path = 'c3_use.csv'

# load into a DataFrame; with the default settings the header came in as a data row,
# so header=1 takes the column names from the second line of the file
df = pd.read_csv(file_path, header=1)



In [54]:
df.head()

Unnamed: 0,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect


An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to 
identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? 
Which model would be the best fit for this use case?

Recall would be the appropriate evaluation metric, with Defect as the positive class.  The team wants to catch as many of the actual defects as possible, so the error to minimize is the false negative (a defective duck predicted as No Defect).  Also, given that it is a manufacturing process, there are likely many measurements that indicate defects with a high degree of confidence.



#Model Fit check
df.actual.mode()
df.actual.value_counts()
#look through value counts of each model
df.model1.value_counts()

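Model1 has the highest accuracy (0.9500), but model3 has the highest recall (0.8125, against 0.5000 for model1 and 0.5625 for model2).  So model3 is the best fit for this use case.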

In [55]:
model1_crosstab = pd.crosstab(df.actual, df.model1)
model1_crosstab

model1     Defect  No Defect
actual                      
Defect          8          8
No Defect       2        182


In [56]:
def model_evaluation(actual, predicted):
    
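    '''
    Print the crosstab, accuracy, precision, and recall for one model.

    Pass actual first and predicted second. pd.crosstab then puts the actual
    values on the rows and the predictions on the columns, and the label that
    sorts first (Defect, cat) is treated as the positive class:
    TP = ULC (actual positive, predicted positive)
    TN = LRC (actual negative, predicted negative)
    FP = LLC (actual negative, predicted positive)
    FN = URC (actual positive, predicted negative)
    '''
    # NOTE: the axis names below are swapped relative to the data. The printed
    # rows (labelled 'Predicted') are the actual values and the printed columns
    # (labelled 'Actual') are the predictions.
    crosstab = pd.crosstab(actual, predicted, rownames=['Predicted'], colnames=['Actual'])
    # Extract true positive (TP), true negative (TN), false positive (FP), and false negative (FN)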
    tp = crosstab.iloc[0, 0]
    tn = crosstab.iloc[1, 1]
    fp = crosstab.iloc[1, 0]
    fn = crosstab.iloc[0, 1]

    # model check measurements
    accuracy = (tp + tn) / (tp + tn + fp + fn)

    precision = tp / (tp + fp)

    recall = tp / (tp + fn)

    # Print the results
    print(crosstab)
    print()
    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')

model_evaluation(df.actual,df.model1)


Actual     Defect  No Defect
Predicted                   
Defect          8          8
No Defect       2        182

Accuracy: 0.9500
Precision: 0.8000
Recall: 0.5000
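
`model_evaluation` picks its cells by position, so the positive class is whichever label sorts first. A variant that names the positive class explicitly could look like the sketch below. The function name `evaluate_for_class`, its `positive` parameter, and the toy data are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

def evaluate_for_class(actual, predicted, positive):
    '''Return accuracy, precision, and recall, treating `positive` as the positive class.'''
    actual_pos = actual == positive
    predicted_pos = predicted == positive

    # Count each confusion-matrix cell with boolean masks
    tp = (actual_pos & predicted_pos).sum()
    fp = (~actual_pos & predicted_pos).sum()
    fn = (actual_pos & ~predicted_pos).sum()
    tn = (~actual_pos & ~predicted_pos).sum()

    return {
        'accuracy': (tp + tn) / len(actual),
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
    }

# Toy data (hypothetical, not taken from c3_use.csv)
toy_actual = pd.Series(['Defect', 'Defect', 'No Defect', 'No Defect', 'No Defect'])
toy_predicted = pd.Series(['Defect', 'No Defect', 'Defect', 'No Defect', 'No Defect'])

toy_scores = evaluate_for_class(toy_actual, toy_predicted, positive='Defect')
print(toy_scores)
```

Because the positive label is passed in, the same function can score a model from either class's point of view without depending on the sort order of the labels.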


In [57]:
model_evaluation(df.actual, df.model2)

Actual     Defect  No Defect
Predicted                   
Defect          9          7
No Defect      81        103

Accuracy: 0.5600
Precision: 0.1000
Recall: 0.5625


In [58]:
model_evaluation(df.actual, df.model3)

Actual     Defect  No Defect
Predicted                   
Defect         13          3
No Defect      86         98

Accuracy: 0.5550
Precision: 0.1313
Recall: 0.8125
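
The three tables above can be compared side by side without re-reading the CSV. The sketch below copies the counts from the printed output, with Defect as the positive class; the dictionary layout and helper names are assumptions for illustration.

```python
# (tp, fn, fp, tn) copied from the three tables above, with Defect as the positive class
counts = {
    'model1': (8, 8, 2, 182),
    'model2': (9, 7, 81, 103),
    'model3': (13, 3, 86, 98),
}

def recall(tp, fn, fp, tn):
    # Share of actual defects that the model caught
    return tp / (tp + fn)

def precision(tp, fn, fp, tn):
    # Share of predicted defects that really were defects
    return tp / (tp + fp)

recalls = {name: recall(*c) for name, c in counts.items()}
precisions = {name: precision(*c) for name, c in counts.items()}

best_recall = max(recalls, key=recalls.get)
best_precision = max(precisions, key=precisions.get)

print(f'highest recall:    {best_recall} ({recalls[best_recall]:.4f})')
print(f'highest precision: {best_precision} ({precisions[best_precision]:.4f})')
```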


Recently several stories in the local news have come out highlighting customers who received a rubber duck with a 
defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a 
defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you that they 
really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which 
evaluation metric would be appropriate here? Which model would be the best fit for this use case?

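Precision is the appropriate metric, because a false positive (predicting a defect when there is none) gives away a vacation by mistake.  Model1 has by far the highest precision (0.8000, against 0.1000 for model2 and 0.1313 for model3), so model1 is the best fit.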

You are working as a data scientist for Gives You Paws ™, a subscription based service that shows you cute 
pictures of dogs or cats (or both for an additional fee).

At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two step 
process. First an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have 
been initially identified are put through another round of review, possibly with some human oversight, before 
being presented to the users (Phase II).

In [59]:
# relative file path (file in same directory)
file_path = 'gives_you_paws.csv'

# load into a DataFrame (the default header handling works for this file)
df = pd.read_csv(file_path)

In [60]:
df.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


In [61]:
model_evaluation(df.actual, df.model1)

Actual      cat   dog
Predicted            
cat        1423   323
dog         640  2614

Accuracy: 0.8074
Precision: 0.6898
Recall: 0.8150


In [62]:
model_evaluation(df.actual,df.model2)

Actual      cat   dog
Predicted            
cat        1555   191
dog        1657  1597

Accuracy: 0.6304
Precision: 0.4841
Recall: 0.8906


In [63]:
model_evaluation(df.actual,df.model3)

Actual      cat   dog
Predicted            
cat         893   853
dog        1599  1655

Accuracy: 0.5096
Precision: 0.3583
Recall: 0.5115


Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) 
and answer the following questions:

In [76]:
model_evaluation(df.actual,df.model4)

Actual     cat   dog
Predicted           
cat        603  1143
dog        144  3110

Accuracy: 0.7426
Precision: 0.8072
Recall: 0.3454


In [75]:
(df.actual == 'dog').mean()

0.6508
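
One way to build the baseline explicitly is to add a column that always predicts the most common class and score it like any other model. The sketch below uses a small made-up DataFrame in place of `gives_you_paws.csv`; the `baseline` column name and the toy values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the `actual` column of gives_you_paws.csv
toy = pd.DataFrame({'actual': ['dog', 'cat', 'dog', 'dog', 'cat', 'dog']})

# The baseline always predicts the most frequent class
most_common = toy.actual.value_counts().idxmax()
toy['baseline'] = most_common

# Baseline accuracy equals the share of the most common class
baseline_accuracy = (toy.baseline == toy.actual).mean()
print(most_common, round(baseline_accuracy, 4))
```

On the real data dog is the most common class, so the same steps reproduce the 0.6508 computed above.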

In [None]:
# evaluate every model column in one pass
for model in ['model1', 'model2', 'model3', 'model4']:
    print(model)
    model_evaluation(df.actual, df[model])

In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than 
the baseline?

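The baseline (always predicting dog, the most common class) has an accuracy of 0.6508.  Model1 (0.8074) and model4 (0.7426) beat the baseline; model2 (0.6304) and model3 (0.5096) fall below it.  With cat as the positive class, model2 has the highest recall (0.8906) and model4 has the highest precision (0.8072).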

Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend?

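The metrics printed above treat cat as the positive class, so a dog team needs them recomputed with dog as positive.  From the tables, model4 has the highest dog recall (3110/3254 ≈ 0.9557) and model2 has the highest dog precision (1597/1788 ≈ 0.8932).  Recommend model4 if the priority is capturing as many dog pictures as possible, or model2 if the priority is that pictures tagged dog really are dogs.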
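
Since `model_evaluation` treats cat as the positive class, the dog-side metrics have to be derived separately. The sketch below recomputes precision and recall with dog as positive, using counts copied from the four tables above; the dictionary layout and variable names are assumptions for illustration.

```python
# Counts copied from the four tables above, in the order:
# (actual cat & predicted cat, actual cat & predicted dog,
#  actual dog & predicted cat, actual dog & predicted dog)
tables = {
    'model1': (1423, 323, 640, 2614),
    'model2': (1555, 191, 1657, 1597),
    'model3': (893, 853, 1599, 1655),
    'model4': (603, 1143, 144, 3110),
}

dog_metrics = {}
for name, (cat_cat, cat_dog, dog_cat, dog_dog) in tables.items():
    tp = dog_dog  # actual dog, predicted dog
    fp = cat_dog  # actual cat, predicted dog
    fn = dog_cat  # actual dog, predicted cat
    dog_metrics[name] = {
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
    }

for name, m in dog_metrics.items():
    print(f"{name}: precision={m['precision']:.4f} recall={m['recall']:.4f}")
```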

Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend?

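Recall: model2 has the highest recall for cats (0.8906).  If precision matters more, model4 has the highest cat precision (0.8072).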