### Evaluation Exercises

### 1. Create a new file named ```model_evaluation.py``` or ```model_evaluation.ipynb``` for these exercises.

In [2]:
import pandas as pd
import numpy as np
import os
from pydataset import data

import seaborn as sns
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'

from sklearn.metrics import classification_report

from env import get_db_url

# data('mpg', show_doc=True) # view the documentation for the dataset

**POSITIVE (+)** = insert Positive statement here  
**NEGATIVE (-)** = insert Negative statement here  


**RECALL**    
TP / (TP + FN)  
Use to reduce **Type II** errors, when a **FN** is the worst outcome  
Maximize **RECALL** if the cost of a **FN** > the cost of a **FP**

**PRECISION**    
TP / (TP + FP)  
Use to reduce **Type I** errors, when a **FP** is the worst outcome  
Maximize **PRECISION** if the cost of a **FP** > the cost of a **FN**

**ACCURACY**    
(TP + TN) / (TP + TN + FP + FN)  
number of correct predictions / total number of predictions  
Maximize **ACCURACY** if neither **RECALL** nor **PRECISION** outweighs the other
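
The three formulas above can be sketched as small Python functions. The function names and the example counts are assumptions made up for illustration; they are not part of the exercise data.

```python
def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, chosen only to exercise the formulas
tp, tn, fp, fn = 40, 45, 5, 10

print(f"recall:    {recall(tp, fn):.2f}")
print(f"precision: {precision(tp, fp):.2f}")
print(f"accuracy:  {accuracy(tp, tn, fp, fn):.2f}")
```

With these made-up counts recall is lower than precision because there are more FN (10) than FP (5).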


* **Classification Confusion Matrix** (actual_col, prediction_row)(Positive_first, Negative_second)    

|                     | actual Positive (+) | actual Negative(-) |
|---------------------|---------------------|--------------------|
|  pred Positive (+)  |     TP              |     FP (Type I)    |
|  pred Negative (-)  |     FN (Type II)    |     TN             |


* <b>sklearn Confusion Matrix</b> (prediction_col, actual_row)(Negative_first, Positive_second)  
                                        
|                     | pred Negative(-) | pred Positive (+) |
|---------------------|------------------|-------------------|
| actual Negative (-) |        TN        |    FP (Type I)    |
| actual Positive (+) |   FN (Type II)   |         TP        |  
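
The sklearn layout above can be checked with `sklearn.metrics.confusion_matrix`. This is a minimal sketch: the `"neg"`/`"pos"` labels and the six sample predictions are hypothetical, and passing `labels=` is one way to force the negative class into the first row and column.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, made up for illustration only
actual    = ["neg", "neg", "neg", "pos", "pos", "pos"]
predicted = ["neg", "pos", "neg", "pos", "neg", "pos"]

# labels= fixes the row/column order: negative first, positive second
cm = confusion_matrix(actual, predicted, labels=["neg", "pos"])

# Rows are actual classes, columns are predicted classes,
# so flattening row by row yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```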


**FP**: We **predicted** it was a **POSITIVE** when it was **actually** a **NEGATIVE**  
*    FP = We **FALSE**LY predicted it was **POSITIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE I** error!  

**FN**: We **predicted** it was a **NEGATIVE** when it was **actually** a **POSITIVE**  
*    FN = We **FALSE**LY predicted it was **NEGATIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE II** error!  

**TP**: We **predicted** it was a **POSITIVE** and it was **actually** a **POSITIVE**  
*   TP = We **TRUE**LY predicted it was **POSITIVE**  
* True = Our prediction was True, it was actually the same as our prediction  

**TN**: We **predicted** it was a **NEGATIVE** and it was **actually** a **NEGATIVE**  
*   TN = We **TRUE**LY predicted it was **NEGATIVE**  
* True = Our prediction was True, it was actually the same as our prediction  
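
The four definitions above reduce to two questions: was the prediction correct (True/False), and what did we predict (Positive/Negative)? A small sketch of that rule follows; the function name `outcome` and the `"yes"`/`"no"` labels are hypothetical.

```python
def outcome(actual, predicted, positive):
    """Return 'TP', 'TN', 'FP', or 'FN' for a single prediction."""
    # First letter: was the prediction right?
    correct = "T" if predicted == actual else "F"
    # Second letter: which class did we predict?
    sign = "P" if predicted == positive else "N"
    return correct + sign


# (actual, predicted) pairs covering all four cases
pairs = [("yes", "yes"), ("no", "no"), ("no", "yes"), ("yes", "no")]
for actual, predicted in pairs:
    print(actual, predicted, outcome(actual, predicted, positive="yes"))
```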

### 2. Given the following confusion matrix, evaluate (by hand) the model's performance.
|              | pred dog  | pred cat  |
|------------  |-----------|-----------|
| actual dog   |        46 |        7  |
| actual cat   |        13 |        34 |

* <b>sklearn Confusion Matrix</b> (prediction_col, actual_row)(Negative_first, Positive_second)  

**NEGATIVE (-)** = dog  
**POSITIVE (+)** = cat

**FP**: We **predicted** it was a **cat** when it was **actually** a **dog**  
*    FP = We **FALSE**LY predicted it was **POSITIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE I** error!  

**FN**: We **predicted** it was a **dog** when it was **actually** a **cat**  
*    FN = We **FALSE**LY predicted it was **NEGATIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE II** error!  

**TP**: We **predicted** it was a **cat** and it was **actually** a **cat**  
*   TP = We **TRUE**LY predicted it was **POSITIVE**  
* True = Our prediction was True, it was actually the same as our prediction  

**TN**: We **predicted** it was a **dog** and it was **actually** a **dog**  
*   TN = We **TRUE**LY predicted it was **NEGATIVE**  
* True = Our prediction was True, it was actually the same as our prediction  

### 2.1 In the context of this problem, what is a false positive?

**FP**: We **predicted** it was a **cat** when it was **actually** a **dog**  
*    FP = We **FALSE**LY predicted it was **POSITIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE I** error!  

### 2.2 In the context of this problem, what is a false negative?

**FN**: We **predicted** it was a **dog** when it was **actually** a **cat**  
*    FN = We **FALSE**LY predicted it was **NEGATIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE II** error!  

### 2.3 How would you describe this model?

* <b>sklearn Confusion Matrix</b> (prediction_col, actual_row)(Negative_first, Positive_second)  

**NEGATIVE (-)** = dog  
**POSITIVE (+)** = cat

* FP (7) is the smallest of the four counts, so I would describe the model as having been tuned for Precision in order to reduce Type I errors.
* Alternatively, it may have been tuned for Accuracy, since the correct predictions (TN = 46, TP = 34) far outnumber the errors (FP = 7, FN = 13).
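
To back the description with numbers, the by-hand evaluation can be written out as follows. The counts come straight from the confusion matrix in this exercise, with cat as the positive class; the variable names are my own.

```python
# Counts read from the confusion matrix above (positive = cat, negative = dog)
tn, fp = 46, 7    # actual dog: predicted dog, predicted cat
fn, tp = 13, 34   # actual cat: predicted dog, predicted cat

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy:  {accuracy:.3f}")   # 0.800
print(f"precision: {precision:.3f}")  # 0.829
print(f"recall:    {recall:.3f}")     # 0.723
```

Precision (about 0.83) is higher than recall (about 0.72), which fits a model that makes fewer Type I than Type II errors.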

### 3. You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here (https://ds.codeup.com/data/c3.csv).

Use the predictions dataset and pandas to help answer the following questions:

In [31]:
c3_df = pd.read_csv('c3.csv')
c3_df.head()

Unnamed: 0,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect


In [32]:
c3_df.shape

(200, 4)

In [168]:
c3_df.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

In [33]:
c3_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   actual  200 non-null    object
 1   model1  200 non-null    object
 2   model2  200 non-null    object
 3   model3  200 non-null    object
dtypes: object(4)
memory usage: 6.4+ KB


In [34]:
c3_df.describe()

Unnamed: 0,actual,model1,model2,model3
count,200,200,200,200
unique,2,2,2,2
top,No Defect,No Defect,No Defect,No Defect
freq,184,190,110,101


In [35]:
c3_df.actual.value_counts(), c3_df.model1.value_counts(), c3_df.model2.value_counts(), c3_df.model3.value_counts()

(No Defect    184
 Defect        16
 Name: actual, dtype: int64,
 No Defect    190
 Defect        10
 Name: model1, dtype: int64,
 No Defect    110
 Defect        90
 Name: model2, dtype: int64,
 No Defect    101
 Defect        99
 Name: model3, dtype: int64)

In [167]:
c3_df.value_counts()

actual     model1     model2     model3     baseline
No Defect  No Defect  No Defect  Defect     Defect      52
                                 No Defect  Defect      50
                      Defect     No Defect  Defect      47
                                 Defect     Defect      33
Defect     No Defect  Defect     Defect     Defect       5
           Defect     No Defect  Defect     Defect       4
                      Defect     Defect     Defect       3
                      No Defect  No Defect  Defect       1
           No Defect  Defect     No Defect  Defect       1
                      No Defect  Defect     Defect       1
                                 No Defect  Defect       1
No Defect  Defect     Defect     Defect     Defect       1
                      No Defect  No Defect  Defect       1
dtype: int64

# <b>MODEL 1</b>

In [36]:
pd.crosstab(c3_df.model1, c3_df.actual)

| model1 (pred) | actual Defect | actual No Defect |
|---------------|---------------|------------------|
| Defect        | 8             | 2                |
| No Defect     | 8             | 182              |


In [49]:
# MANUAL Accuracy: (TP + TN) / (TP + TN + FP + FN)
(8 + 182) / (8 + 182 + 2 + 8)

0.95

In [60]:
#model accuracy
(c3_df.actual == c3_df.model1).mean()

0.95

In [50]:
# MANUAL Precision: TP / (TP + FP)
8 / (8 + 2)
# favor precision when FP is more costly than FN

0.8

In [74]:
# model precision
precision_1 = c3_df[c3_df.model1 == 'Defect']
precision_1

Unnamed: 0,actual,model1,model2,model3,baseline
3,No Defect,Defect,Defect,Defect,Defect
30,Defect,Defect,No Defect,Defect,Defect
62,No Defect,Defect,No Defect,No Defect,Defect
65,Defect,Defect,Defect,Defect,Defect
70,Defect,Defect,Defect,Defect,Defect
135,Defect,Defect,No Defect,Defect,Defect
147,Defect,Defect,No Defect,Defect,Defect
163,Defect,Defect,Defect,Defect,Defect
194,Defect,Defect,No Defect,Defect,Defect
196,Defect,Defect,No Defect,No Defect,Defect


In [75]:
(precision_1.model1 == precision_1.actual).mean()

0.8

In [76]:
# MANUAL Recall: TP / (TP + FN)
8 / (8 + 8)
# favor recall when FN is more costly than FP

0.5

In [77]:
#model recall
recall_1 = c3_df[c3_df.actual == 'Defect']
recall_1

Unnamed: 0,actual,model1,model2,model3,baseline
13,Defect,No Defect,Defect,Defect,Defect
30,Defect,Defect,No Defect,Defect,Defect
65,Defect,Defect,Defect,Defect,Defect
70,Defect,Defect,Defect,Defect,Defect
74,Defect,No Defect,No Defect,Defect,Defect
87,Defect,No Defect,Defect,Defect,Defect
118,Defect,No Defect,Defect,No Defect,Defect
135,Defect,Defect,No Defect,Defect,Defect
140,Defect,No Defect,Defect,Defect,Defect
147,Defect,Defect,No Defect,Defect,Defect


In [78]:
(recall_1.model1 == recall_1.actual).mean()

0.5

In [94]:
print(classification_report(c3_df.actual, c3_df.model1))

              precision    recall  f1-score   support

      Defect       0.80      0.50      0.62        16
   No Defect       0.96      0.99      0.97       184

    accuracy                           0.95       200
   macro avg       0.88      0.74      0.79       200
weighted avg       0.95      0.95      0.94       200
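
The `f1-score`, `macro avg`, and `weighted avg` entries in the report can be reproduced by hand from the Model 1 crosstab. This is a sketch using the standard definitions (F1 as the harmonic mean of precision and recall, macro as the unweighted mean, weighted as the support-weighted mean); the variable names are my own.

```python
# Counts from the Model 1 crosstab above (rows = predicted, columns = actual)
tp, fp = 8, 2      # predicted Defect
fn, tn = 8, 182    # predicted No Defect

# Per-class precision and recall, treating each class as "positive" in turn
p_defect, r_defect = tp / (tp + fp), tp / (tp + fn)
p_ok, r_ok = tn / (tn + fn), tn / (tn + fp)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Support = number of actual rows in each class (16 and 184)
n_defect, n_ok = tp + fn, tn + fp

macro_precision = (p_defect + p_ok) / 2
weighted_precision = (n_defect * p_defect + n_ok * p_ok) / (n_defect + n_ok)

print(f"f1 Defect:          {f1(p_defect, r_defect):.2f}")
print(f"f1 No Defect:       {f1(p_ok, r_ok):.2f}")
print(f"macro precision:    {macro_precision:.2f}")
print(f"weighted precision: {weighted_precision:.2f}")
```

The weighted average sits close to the No Defect score because that class makes up 184 of the 200 rows.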



# <b>MODEL 2</b>

In [37]:
pd.crosstab(c3_df.model2, c3_df.actual)

| model2 (pred) | actual Defect | actual No Defect |
|---------------|---------------|------------------|
| Defect        | 9             | 81               |
| No Defect     | 7             | 103              |


In [52]:
# MANUAL Accuracy: (TP + TN) / (TP + TN + FP + FN)
(9 + 103) / (9 + 103 + 81 + 7)

0.56

In [61]:
#model accuracy
(c3_df.actual == c3_df.model2).mean()

0.56

In [53]:
# MANUAL Precision: TP / (TP + FP)
9 / (9 + 81)
# favor precision when FP is more costly than FN

0.1

In [72]:
# model precision
precision_2 = c3_df[c3_df.model2 == 'Defect']
precision_2

Unnamed: 0,actual,model1,model2,model3,baseline
0,No Defect,No Defect,Defect,No Defect,Defect
1,No Defect,No Defect,Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect,Defect
3,No Defect,Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect,Defect
7,No Defect,No Defect,Defect,No Defect,Defect
10,No Defect,No Defect,Defect,No Defect,Defect
13,Defect,No Defect,Defect,Defect,Defect
15,No Defect,No Defect,Defect,No Defect,Defect
19,No Defect,No Defect,Defect,Defect,Defect


In [73]:
(precision_2.model2 == precision_2.actual).mean()

0.1

In [82]:
# MANUAL Recall: TP / (TP + FN)
9 / (9 + 7)
# favor recall when FN is more costly than FP

0.5625

In [83]:
#model recall
recall_2 = c3_df[c3_df.actual == 'Defect']
recall_2

Unnamed: 0,actual,model1,model2,model3,baseline
13,Defect,No Defect,Defect,Defect,Defect
30,Defect,Defect,No Defect,Defect,Defect
65,Defect,Defect,Defect,Defect,Defect
70,Defect,Defect,Defect,Defect,Defect
74,Defect,No Defect,No Defect,Defect,Defect
87,Defect,No Defect,Defect,Defect,Defect
118,Defect,No Defect,Defect,No Defect,Defect
135,Defect,Defect,No Defect,Defect,Defect
140,Defect,No Defect,Defect,Defect,Defect
147,Defect,Defect,No Defect,Defect,Defect


In [84]:
(recall_2.model2 == recall_2.actual).mean()

0.5625

In [93]:
print(classification_report(c3_df.actual, c3_df.model2))

              precision    recall  f1-score   support

      Defect       0.10      0.56      0.17        16
   No Defect       0.94      0.56      0.70       184

    accuracy                           0.56       200
   macro avg       0.52      0.56      0.44       200
weighted avg       0.87      0.56      0.66       200



# <b>MODEL 3</b>

In [38]:
pd.crosstab(c3_df.model3, c3_df.actual)

| model3 (pred) | actual Defect | actual No Defect |
|---------------|---------------|------------------|
| Defect        | 13            | 86               |
| No Defect     | 3             | 98               |


In [55]:
# MANUAL Accuracy: (TP + TN) / (TP + TN + FP + FN)
(13 + 98) / (13 + 98 + 86 + 3)

0.555

In [62]:
#model accuracy
(c3_df.actual == c3_df.model3).mean()

0.555

In [56]:
# MANUAL Precision: TP / (TP + FP)
13 / (13 + 86)
# favor precision when FP is more costly than FN

0.13131313131313133

In [87]:
# model precision
precision_3 = c3_df[c3_df.model3 == 'Defect']
precision_3

Unnamed: 0,actual,model1,model2,model3,baseline
1,No Defect,No Defect,Defect,Defect,Defect
3,No Defect,Defect,Defect,Defect,Defect
5,No Defect,No Defect,No Defect,Defect,Defect
9,No Defect,No Defect,No Defect,Defect,Defect
13,Defect,No Defect,Defect,Defect,Defect
16,No Defect,No Defect,No Defect,Defect,Defect
18,No Defect,No Defect,No Defect,Defect,Defect
19,No Defect,No Defect,Defect,Defect,Defect
23,No Defect,No Defect,No Defect,Defect,Defect
26,No Defect,No Defect,No Defect,Defect,Defect


In [88]:
(precision_3.model3 == precision_3.actual).mean()

0.13131313131313133

In [57]:
# MANUAL Recall: TP / (TP + FN)
13 / (13 + 3)
# favor recall when FN is more costly than FP

0.8125

In [85]:
#model recall
recall_3 = c3_df[c3_df.actual == 'Defect']
recall_3

Unnamed: 0,actual,model1,model2,model3,baseline
13,Defect,No Defect,Defect,Defect,Defect
30,Defect,Defect,No Defect,Defect,Defect
65,Defect,Defect,Defect,Defect,Defect
70,Defect,Defect,Defect,Defect,Defect
74,Defect,No Defect,No Defect,Defect,Defect
87,Defect,No Defect,Defect,Defect,Defect
118,Defect,No Defect,Defect,No Defect,Defect
135,Defect,Defect,No Defect,Defect,Defect
140,Defect,No Defect,Defect,Defect,Defect
147,Defect,Defect,No Defect,Defect,Defect


In [86]:
(recall_3.model3 == recall_3.actual).mean()

0.8125

In [91]:
print(classification_report(c3_df.actual, c3_df.model3))

              precision    recall  f1-score   support

      Defect       0.13      0.81      0.23        16
   No Defect       0.97      0.53      0.69       184

    accuracy                           0.56       200
   macro avg       0.55      0.67      0.46       200
weighted avg       0.90      0.56      0.65       200



# <b>BASELINE</b>

In [59]:
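# always-positive baseline: predicts 'Defect' for every duck,
# even though 'No Defect' is the most common class (184 of 200)
c3_df['baseline'] = 'Defect'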
c3_df

Unnamed: 0,actual,model1,model2,model3,baseline
0,No Defect,No Defect,Defect,No Defect,Defect
1,No Defect,No Defect,Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect,Defect
3,No Defect,Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect,Defect
...,...,...,...,...,...
195,No Defect,No Defect,Defect,Defect,Defect
196,Defect,Defect,No Defect,No Defect,Defect
197,No Defect,No Defect,No Defect,No Defect,Defect
198,No Defect,No Defect,Defect,Defect,Defect


In [63]:
#baseline accuracy
(c3_df.actual == c3_df.baseline).mean()

0.08

In [177]:
# turn off pink warning boxes
import warnings
warnings.filterwarnings("ignore")

print(classification_report(c3_df.actual, c3_df.baseline))

              precision    recall  f1-score   support

      Defect       0.08      1.00      0.15        16
   No Defect       0.00      0.00      0.00       184

    accuracy                           0.08       200
   macro avg       0.04      0.50      0.07       200
weighted avg       0.01      0.08      0.01       200
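
For comparison with the always-`'Defect'` baseline above, a most-common-class baseline (the definition given in exercise 4) can be sketched without pandas. The class counts are taken from `c3_df.actual.value_counts()`; rebuilding the column as a plain list is an assumption made so the snippet runs without the CSV.

```python
from collections import Counter

# Class counts taken from c3_df.actual.value_counts() above
actual = ["No Defect"] * 184 + ["Defect"] * 16

# The baseline predicts the single most frequent class for every row
most_common_class, count = Counter(actual).most_common(1)[0]
baseline = [most_common_class] * len(actual)

baseline_accuracy = sum(a == b for a, b in zip(actual, baseline)) / len(actual)
print(most_common_class, baseline_accuracy)
```

On a pandas column the same class could be found with `c3_df.actual.value_counts().idxmax()`.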



* **Classification Confusion Matrix** (actual_col, prediction_row)(Positive_first, Negative_second)    

|                           | actual Has Defects (+) | actual Has NO Defects (-) |
|---------------------------|------------------------|---------------------------|
|  pred Has Defects (+)     |       TP               |          FP (Type I)      |
|  pred Has NO Defects (-)  |       FN (Type II)     |          TN               |

**POSITIVE (+)** = Has Defects  
**NEGATIVE (-)** = Has NO Defects  

**ACCURACY**    
(TP + TN) / (TP + TN + FP + FN)  
number of correct predictions / total number of predictions  

**PRECISION**    
TP / (TP + FP)  
Use to reduce **Type I** errors, when a **FP** is the worst outcome  

**RECALL**    
TP / (TP + FN)  
Use to reduce **Type II** errors, when a **FN** is the worst outcome    


**FP**: We **predicted** it **Has Defects** when it **actually** **Has NO Defects**  
*    FP = We **FALSE**LY predicted it was **POSITIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE I** error!  

**FN**: We **predicted** it **Has NO Defects** when it **actually** **Has Defects**  
*    FN = We **FALSE**LY predicted it was **NEGATIVE**  
* False = Our prediction was False, it was actually the opposite of our prediction  
* Oops... **TYPE II** error!  

**TP**: We **predicted** it **Has Defects** and it **actually** **Has Defects**  
*   TP = We **TRUE**LY predicted it was **POSITIVE**  
* True = Our prediction was True, it was actually the same as our prediction  

**TN**: We **predicted** it **Has NO Defects** and it **actually** **Has NO Defects**  
*   TN = We **TRUE**LY predicted it was **NEGATIVE**  
* True = Our prediction was True, it was actually the same as our prediction  

### 3.1 An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

* **Recall** would be the best evaluation metric.  
  A **FN (Type II)** error would be our worst outcome: it is better to overpredict defects and sometimes be wrong (**FP**)  
  than to underpredict and miss **actual** defects.


* We have been asked to identify as many defective ducks as possible.  
  We have set **Has Defects** as our **POSITIVE**.  


* **FN**: We **predicted** it **Has NO Defects** when it **actually** **Has Defects**  
  **FP**: We **predicted** it **Has Defects** when it **actually** **Has NO Defects**  
  
* **Model 3** would be the best fit because it has the highest recall (13 / 16 ≈ 0.81) and the fewest FN (3).  
  It also has the most FP (86), but for this use case being wrong with a harmless outcome  
  is preferable to being wrong with a costly one.  
  Erring on the side of caution ensures that we:  
  * "identify as many of the ducks that have a defect as possible"
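
The recall comparison behind this choice can be sketched from the crosstab counts, with Defect as the positive class. The dictionary layout is my own; the numbers come from the three crosstabs above.

```python
# (TP, FN) per model, read from the crosstabs above with Defect as positive
counts = {"model1": (8, 8), "model2": (9, 7), "model3": (13, 3)}

# recall = TP / (TP + FN)
recalls = {model: tp / (tp + fn) for model, (tp, fn) in counts.items()}
best = max(recalls, key=recalls.get)

for model, value in recalls.items():
    print(f"{model}: recall = {value:.4f}")
print("highest recall:", best)
```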

### 3.2 Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

* **Precision** would be the best evaluation metric.  
  A **FP (Type I)** error would be our worst outcome because overpredicting defects means paying for vacations to Hawaii that were not owed.

* **FN**: We **predicted** it **Has NO Defects** when it **actually** **Has Defects**  
  **FP**: We **predicted** it **Has Defects** when it **actually** **Has NO Defects**  
  
* **Model 1** would be the best fit because it has the highest precision (8 / 10 = 0.80) and the fewest FP (2), even though it has the most FN (8).  
  Fewer wrong defect predictions keep us from:  
  * "accidentally giving out vacation packages when the duck really doesn't have a defect"
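
For the vacation program, each FP is a vacation given out for a duck that was not defective. A sketch of that comparison, using the crosstab counts with Defect as the positive class (the dictionary layout is my own):

```python
# (TP, FP) per model, read from the crosstabs above with Defect as positive
counts = {"model1": (8, 2), "model2": (9, 81), "model3": (13, 86)}

for model, (tp, fp) in counts.items():
    precision = tp / (tp + fp)
    # FP = vacations awarded for ducks that had no defect
    print(f"{model}: precision = {precision:.2f}, unearned vacations = {fp}")

best = max(counts, key=lambda m: counts[m][0] / sum(counts[m]))
print("highest precision:", best)
```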

### 4. You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process. First, an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

Several models have already been developed with the data, and you can find their results here (https://ds.codeup.com/data/gives_you_paws.csv).

Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

In [30]:
petpics_df = pd.read_csv('gives_you_paws.csv')
petpics_df.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


In [39]:
petpics_df.shape

(5000, 5)

In [40]:
petpics_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   actual  5000 non-null   object
 1   model1  5000 non-null   object
 2   model2  5000 non-null   object
 3   model3  5000 non-null   object
 4   model4  5000 non-null   object
dtypes: object(5)
memory usage: 195.4+ KB


In [41]:
petpics_df.describe()

Unnamed: 0,actual,model1,model2,model3,model4
count,5000,5000,5000,5000,5000
unique,2,2,2,2,2
top,dog,dog,cat,dog,dog
freq,3254,2937,3212,2508,4253


# <b>MODEL 1</b>

In [102]:
pd.crosstab(petpics_df.model1, petpics_df.actual)

| model1 (pred) | actual cat | actual dog |
|---------------|------------|------------|
| cat           | 1423       | 640        |
| dog           | 323        | 2614       |


In [103]:
#model accuracy
(petpics_df.actual == petpics_df.model1).mean()

0.8074

In [109]:
# model precision
precision_1 = petpics_df[petpics_df.model1 == 'cat']

In [106]:
(precision_1.model1 == precision_1.actual).mean()

0.6897721764420747

In [108]:
#model recall
recall_1 = petpics_df[petpics_df.actual == 'cat']

In [110]:
(recall_1.model1 == recall_1.actual).mean()

0.8150057273768614

In [111]:
print(classification_report(petpics_df.actual, petpics_df.model1))

              precision    recall  f1-score   support

         cat       0.69      0.82      0.75      1746
         dog       0.89      0.80      0.84      3254

    accuracy                           0.81      5000
   macro avg       0.79      0.81      0.80      5000
weighted avg       0.82      0.81      0.81      5000



# <b>MODEL 2</b>

In [112]:
pd.crosstab(petpics_df.model2, petpics_df.actual)

| model2 (pred) | actual cat | actual dog |
|---------------|------------|------------|
| cat           | 1555       | 1657       |
| dog           | 191        | 1597       |


In [113]:
#model accuracy
(petpics_df.actual == petpics_df.model2).mean()

0.6304

In [118]:
# model precision
precision_2 = petpics_df[petpics_df.model2 == 'cat']

In [119]:
(precision_2.model2 == precision_2.actual).mean()

0.4841220423412204

In [120]:
#model recall
recall_2 = petpics_df[petpics_df.actual == 'cat']

In [121]:
(recall_2.model2 == recall_2.actual).mean()

0.8906071019473081

In [122]:
print(classification_report(petpics_df.actual, petpics_df.model2))

              precision    recall  f1-score   support

         cat       0.48      0.89      0.63      1746
         dog       0.89      0.49      0.63      3254

    accuracy                           0.63      5000
   macro avg       0.69      0.69      0.63      5000
weighted avg       0.75      0.63      0.63      5000



# <b>MODEL 3</b>

In [123]:
pd.crosstab(petpics_df.model3, petpics_df.actual)

| model3 (pred) | actual cat | actual dog |
|---------------|------------|------------|
| cat           | 893        | 1599       |
| dog           | 853        | 1655       |


In [124]:
#model accuracy
(petpics_df.actual == petpics_df.model3).mean()

0.5096

In [125]:
# model precision
precision_3 = petpics_df[petpics_df.model3 == 'cat']

In [126]:
(precision_3.model3 == precision_3.actual).mean()

0.358346709470305

In [127]:
#model recall
recall_3 = petpics_df[petpics_df.actual == 'cat']

In [128]:
# recall for model3 (893 / 1746), rounded to 4 decimal places
round((recall_3.model3 == recall_3.actual).mean(), 4)

0.5115

In [129]:
print(classification_report(petpics_df.actual, petpics_df.model3))

              precision    recall  f1-score   support

         cat       0.36      0.51      0.42      1746
         dog       0.66      0.51      0.57      3254

    accuracy                           0.51      5000
   macro avg       0.51      0.51      0.50      5000
weighted avg       0.55      0.51      0.52      5000



# <b>MODEL 4</b>

In [130]:
pd.crosstab(petpics_df.model4, petpics_df.actual)

| model4 (pred) | actual cat | actual dog |
|---------------|------------|------------|
| cat           | 603        | 144        |
| dog           | 1143       | 3110       |


In [131]:
#model accuracy
(petpics_df.actual == petpics_df.model4).mean()

0.7426

In [132]:
# model precision
precision_4 = petpics_df[petpics_df.model4 == 'cat']

In [133]:
(precision_4.model4 == precision_4.actual).mean()

0.8072289156626506

In [134]:
#model recall
recall_4 = petpics_df[petpics_df.actual == 'cat']

In [135]:
(recall_4.model4 == recall_4.actual).mean()

0.34536082474226804

In [136]:
print(classification_report(petpics_df.actual, petpics_df.model4))

              precision    recall  f1-score   support

         cat       0.81      0.35      0.48      1746
         dog       0.73      0.96      0.83      3254

    accuracy                           0.74      5000
   macro avg       0.77      0.65      0.66      5000
weighted avg       0.76      0.74      0.71      5000



# <b>BASELINE</b>

In [170]:
petpics_df.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

In [171]:
petpics_df['baseline'] = 'dog'

In [172]:
#baseline accuracy
(petpics_df.actual == petpics_df.baseline).mean()

0.6508

In [173]:
print(classification_report(petpics_df.actual, petpics_df.baseline))

              precision    recall  f1-score   support

         cat       0.00      0.00      0.00      1746
         dog       0.65      1.00      0.79      3254

    accuracy                           0.65      5000
   macro avg       0.33      0.50      0.39      5000
weighted avg       0.42      0.65      0.51      5000





### a. In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

In [174]:
print(f'Baseline: {(petpics_df.actual == petpics_df.baseline).mean()} model_1 accuracy: {(petpics_df.actual == petpics_df.model1).mean()} model_2 accuracy: {(petpics_df.actual == petpics_df.model2).mean()} model_3 accuracy: {(petpics_df.actual == petpics_df.model3).mean()} model_4 accuracy: {(petpics_df.actual == petpics_df.model4).mean()}')

Baseline: 0.6508 model_1 accuracy: 0.8074 model_2 accuracy: 0.6304 model_3 accuracy: 0.5096 model_4 accuracy: 0.7426


In [175]:
accuracy_list =[(petpics_df.actual == petpics_df.baseline).mean(),
                (petpics_df.actual == petpics_df.model1).mean(),
                (petpics_df.actual == petpics_df.model2).mean(),
                (petpics_df.actual == petpics_df.model3).mean(),
                (petpics_df.actual == petpics_df.model4).mean()]
accuracy_list

[0.6508, 0.8074, 0.6304, 0.5096, 0.7426]

In [176]:
accuracy_library = {'Baseline': (petpics_df.actual == petpics_df.baseline).mean(), 
                   'Model_1 accuracy': (petpics_df.actual == petpics_df.model1).mean(),
                   'Model_2 accuracy': (petpics_df.actual == petpics_df.model2).mean(),
                   'Model_3 accuracy': (petpics_df.actual == petpics_df.model3).mean(),
                   'Model_4 accuracy': (petpics_df.actual == petpics_df.model4).mean()}
accuracy_library

{'Baseline': 0.6508,
 'Model_1 accuracy': 0.8074,
 'Model_2 accuracy': 0.6304,
 'Model_3 accuracy': 0.5096,
 'Model_4 accuracy': 0.7426}
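
To answer the second half of the question directly, the accuracies can be filtered against the baseline. This sketch copies the values from the dictionary output above instead of recomputing them from `petpics_df`.

```python
# Accuracies copied from the accuracy_library output above
accuracies = {
    "Baseline": 0.6508,
    "Model_1 accuracy": 0.8074,
    "Model_2 accuracy": 0.6304,
    "Model_3 accuracy": 0.5096,
    "Model_4 accuracy": 0.7426,
}

baseline = accuracies["Baseline"]

# Keep only the models whose accuracy exceeds the baseline
better = {name: acc for name, acc in accuracies.items() if acc > baseline}

for name, acc in better.items():
    print(f"{name}: {acc:.4f} (+{acc - baseline:.4f} over baseline)")
```

Only Model 1 and Model 4 beat the baseline on accuracy; Model 2 and Model 3 fall below it.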

### b. Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend?

In [166]:
pd.crosstab(petpics_df.baseline, petpics_df.actual),
pd.crosstab(petpics_df.model1, petpics_df.actual),
pd.crosstab(petpics_df.model2, petpics_df.actual),
pd.crosstab(petpics_df.model3, petpics_df.actual),
pd.crosstab(petpics_df.model4, petpics_df.actual)

| model4 (pred) | actual cat | actual dog |
|---------------|------------|------------|
| cat           | 603        | 144        |
| dog           | 1143       | 3110       |


In [None]:
# calculate for all models
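
One way to calculate this for all models is to treat each class as the positive class in turn and compute precision and recall from the crosstab counts. The sketch below hard-codes the four crosstabs from above so it runs without the CSV; the nested-dictionary layout and the helper `precision_recall` are my own.

```python
# Crosstab counts from above: crosstabs[model][predicted][actual]
crosstabs = {
    "model1": {"cat": {"cat": 1423, "dog": 640}, "dog": {"cat": 323, "dog": 2614}},
    "model2": {"cat": {"cat": 1555, "dog": 1657}, "dog": {"cat": 191, "dog": 1597}},
    "model3": {"cat": {"cat": 893, "dog": 1599}, "dog": {"cat": 853, "dog": 1655}},
    "model4": {"cat": {"cat": 603, "dog": 144}, "dog": {"cat": 1143, "dog": 3110}},
}


def precision_recall(table, positive):
    """Precision and recall for one model with `positive` as the positive class."""
    negative = "dog" if positive == "cat" else "cat"
    tp = table[positive][positive]
    fp = table[positive][negative]   # predicted positive, actually negative
    fn = table[negative][positive]   # predicted negative, actually positive
    return tp / (tp + fp), tp / (tp + fn)


for positive in ("dog", "cat"):
    print(f"positive = {positive}")
    for model, table in crosstabs.items():
        p, r = precision_recall(table, positive)
        print(f"  {model}: precision = {p:.2f}  recall = {r:.2f}")
```

The printed values should match the per-class rows of the classification reports above. Which model to recommend then depends on whether the team cares more about precision or recall for its class.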

### c. Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend?

### 5. Follow the links below to read the documentation about each function, then apply those functions to the data from the previous problem.
* sklearn.metrics.accuracy_score 
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html)
* sklearn.metrics.precision_score
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html)
* sklearn.metrics.recall_score
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html)
* sklearn.metrics.classification_report
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)
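
A minimal sketch of the first three functions follows. The six labels are hypothetical stand-ins for `petpics_df.actual` and one model column, so the snippet runs without the CSV. With string labels, `pos_label` tells `precision_score` and `recall_score` which class to treat as positive.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels standing in for petpics_df.actual and a model column
actual    = ["cat", "dog", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "cat", "dog", "dog", "dog"]

acc = accuracy_score(actual, predicted)

# pos_label chooses which class counts as positive
cat_precision = precision_score(actual, predicted, pos_label="cat")
cat_recall = recall_score(actual, predicted, pos_label="cat")

print(f"accuracy:        {acc:.2f}")
print(f"precision (cat): {cat_precision:.2f}")
print(f"recall (cat):    {cat_recall:.2f}")
```

On the real data the same calls would take `petpics_df.actual` and, for example, `petpics_df.model1` as their first two arguments.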