# Model Evaluation
<hr style="border:2px solid red">

### 2. Given the following confusion matrix, evaluate (by hand) the model's performance.
|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |

- In the context of this problem, what is a false positive?
    - Assuming positive is a cat and negative is a dog
    - False Positive: The photo is of a dog, but the prediction is a cat
- In the context of this problem, what is a false negative?
- How would you describe this model?
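
The by-hand evaluation can be double-checked with a few lines of arithmetic. This is a minimal sketch that reads the four cells off the matrix above, keeping the assumption that cat is the positive class. The variable names are my own.

```python
# Cells of the confusion matrix, with "cat" as the positive class
tp = 34  # actual cat, predicted cat
fn = 13  # actual cat, predicted dog (a false negative)
fp = 7   # actual dog, predicted cat (a false positive)
tn = 46  # actual dog, predicted dog

total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # share of all predictions that are right
precision = tp / (tp + fp)     # of the predicted cats, how many are cats
recall = tp / (tp + fn)        # of the actual cats, how many were found

print(f'accuracy:  {accuracy:.2%}')
print(f'precision: {precision:.2%}')
print(f'recall:    {recall:.2%}')
```

Under that assumption, accuracy is 80.00%, precision is 82.93%, and recall is 72.34%, so the model is more likely to miss a cat than to call a dog a cat.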

In [1]:
from pydataset import data
import numpy as np
import seaborn as sns
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import math

# import splitting and imputing functions
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

# turn off pink boxes for demo
import warnings
warnings.filterwarnings("ignore")

# import our own acquire module
import acquire

# Remove limits on viewing dataframes
pd.set_option('display.max_columns', None)

### 3. You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

### Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found in "c3.csv".

### Use the predictions dataset and pandas to help answer the following questions:

#### An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

In [2]:
ducks = pd.read_csv("c3.csv")
ducks

| Unnamed: 0 | actual    | model1    | model2    | model3    |
|-----------:|:----------|:----------|:----------|:----------|
|          0 | No Defect | No Defect | Defect    | No Defect |
|          1 | No Defect | No Defect | Defect    | Defect    |
|          2 | No Defect | No Defect | Defect    | No Defect |
|          3 | No Defect | Defect    | Defect    | Defect    |
|          4 | No Defect | No Defect | Defect    | No Defect |
|        ... | ...       | ...       | ...       | ...       |
|        195 | No Defect | No Defect | Defect    | Defect    |
|        196 | Defect    | Defect    | No Defect | No Defect |
|        197 | No Defect | No Defect | No Defect | No Defect |
|        198 | No Defect | No Defect | Defect    | Defect    |


### They tell you that they want to identify as many of the ducks that have a defect as possible. 

#### Thinking through the problem...
- What are the positive and negative cases?
    - Positive: The model predicts 'No Defect', so the toy duck makes it to the store
    - Negative: The model predicts 'Defect', so the toy duck does not make it to the store
    
- What are the possible outcomes?
    - True Positive: A toy duck has no defects and makes it to the store
    - True Negative: A toy duck has a defect and does not make it to the store
    - False Positive: A toy duck has a defect and makes it to the store
    - False Negative: A toy duck has no defects and does not make it to the store

### Which evaluation metric would be appropriate here? 
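- Codeup Cody Creator would rather over-identify defects than under-identify them. With 'No Defect' as the positive case, a False Positive (a defective duck that makes it to the store) is more costly than a False Negative, so I think Precision would be best. Framed the other way, with 'Defect' as the positive case, the same goal is usually described as maximizing Recall.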
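
To make the metric choice concrete, here is a minimal sketch of precision and recall written as small pandas helpers. The function names and the toy data are my own, not part of the exercise or of `c3.csv`. The point is only that the number you get depends on which label you treat as positive.

```python
import pandas as pd

def precision(actual, predicted, positive):
    """Of the rows predicted as `positive`, the share that really are."""
    predicted_positive = predicted == positive
    return (actual[predicted_positive] == positive).mean()

def recall(actual, predicted, positive):
    """Of the rows that really are `positive`, the share the model found."""
    actual_positive = actual == positive
    return (predicted[actual_positive] == positive).mean()

# Made-up toy data, NOT taken from c3.csv
toy = pd.DataFrame({
    'actual':    ['Defect', 'Defect', 'No Defect', 'No Defect', 'No Defect'],
    'predicted': ['Defect', 'No Defect', 'No Defect', 'Defect', 'No Defect'],
})

# Precision when 'No Defect' is the positive case
no_defect_precision = precision(toy.actual, toy.predicted, 'No Defect')

# Recall when 'Defect' is the positive case
defect_recall = recall(toy.actual, toy.predicted, 'Defect')

print(f"'No Defect' precision: {no_defect_precision:.2%}")
print(f"'Defect' recall: {defect_recall:.2%}")
```

Both numbers fall when a defective duck is predicted as 'No Defect', but they are normalized differently: precision divides by the ducks predicted 'No Defect', recall by the ducks that actually have a defect.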

### Which model would be the best fit for this use case?





In [3]:
# Alright, time for some legwork, or in this case, a lot of code!
# Which label (actual) appears most frequently in my dataset?
ducks.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

In [4]:
# Model and baseline accuracy:
# First I'll create a new column called 'baseline_prediction',
# which will be given the most frequent label from actual (in this case 'No Defect')
ducks['baseline_prediction'] = 'No Defect'

# The data already has 3 columns dedicated to model predictions, so I'll check all three for accuracy:
model1_accuracy = (ducks.actual == ducks.model1).mean()
model2_accuracy = (ducks.actual == ducks.model2).mean()
model3_accuracy = (ducks.actual == ducks.model3).mean()

# And get a baseline accuracy for comparison:
baseline_accuracy = (ducks.actual == ducks.baseline_prediction).mean()

print(f'model1 accuracy: {model1_accuracy:.2%}')
print(f'model2 accuracy: {model2_accuracy:.2%}')
print(f'model3 accuracy: {model3_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

model1 accuracy: 95.00%
model2 accuracy: 56.00%
model3 accuracy: 55.50%
baseline accuracy: 92.00%
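
Because always predicting 'No Defect' already scores 92%, accuracy alone does not show which kinds of mistakes a model makes. One way to see them is a confusion matrix built with `pd.crosstab`. The sketch below uses a made-up toy frame with a single hypothetical `model` column rather than the real `ducks` data.

```python
import pandas as pd

# Made-up toy predictions, NOT taken from c3.csv
toy = pd.DataFrame({
    'actual': ['Defect', 'No Defect', 'No Defect', 'No Defect', 'Defect', 'No Defect'],
    'model':  ['Defect', 'No Defect', 'Defect', 'No Defect', 'No Defect', 'No Defect'],
})

# Rows are the true labels, columns are the model's predictions
confusion = pd.crosstab(toy.actual, toy.model)
print(confusion)

# Pull out individual cells by (actual, predicted) label
missed_defects = confusion.loc['Defect', 'No Defect']
false_alarms = confusion.loc['No Defect', 'Defect']

print(f'missed defects: {missed_defects}')
print(f'false alarms: {false_alarms}')
```

The same call on the real data would be `pd.crosstab(ducks.actual, ducks.model1)`, and likewise for the other two model columns.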


In [6]:
# Precision: here I only look at the subset of rows where the model made a positive prediction
# (i.e. model prediction == 'No Defect')

# Model One
model_subset = ducks[ducks.model1 == 'No Defect']
model_precision = (model_subset.model1 == model_subset.actual).mean()

baseline_subset = ducks[ducks.baseline_prediction == 'No Defect']
baseline_precision = (baseline_subset.baseline_prediction == baseline_subset.actual).mean()

print(f'model precision: {model_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')

model precision: 95.79%
baseline precision: 92.00%


In [7]:
# Model Two
model_subset = ducks[ducks.model2 == 'No Defect']
model_precision = (model_subset.model2 == model_subset.actual).mean()

baseline_subset = ducks[ducks.baseline_prediction == 'No Defect']
baseline_precision = (baseline_subset.baseline_prediction == baseline_subset.actual).mean()

print(f'model precision: {model_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')

model precision: 93.64%
baseline precision: 92.00%


In [8]:
# Model Three
model_subset = ducks[ducks.model3 == 'No Defect']
model_precision = (model_subset.model3 == model_subset.actual).mean()

baseline_subset = ducks[ducks.baseline_prediction == 'No Defect']
baseline_precision = (baseline_subset.baseline_prediction == baseline_subset.actual).mean()

print(f'model precision: {model_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')

model precision: 97.03%
baseline precision: 92.00%
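
The three cells above repeat the same calculation, so it can also be written as one loop. This is a sketch on a made-up stand-in frame (`toy_ducks` is hypothetical and its values are not from `c3.csv`). On the real data the loop would run over `ducks` instead.

```python
import pandas as pd

# Made-up stand-in for the ducks dataframe, NOT the real c3.csv data
toy_ducks = pd.DataFrame({
    'actual': ['No Defect', 'No Defect', 'Defect', 'No Defect', 'Defect', 'No Defect'],
    'model1': ['No Defect', 'No Defect', 'No Defect', 'No Defect', 'Defect', 'No Defect'],
    'model2': ['Defect', 'No Defect', 'Defect', 'Defect', 'No Defect', 'No Defect'],
    'model3': ['No Defect', 'Defect', 'Defect', 'No Defect', 'Defect', 'Defect'],
})

positive = 'No Defect'
precisions = {}

for model in ['model1', 'model2', 'model3']:
    # Keep only the rows where this model made a positive prediction
    subset = toy_ducks[toy_ducks[model] == positive]
    # Share of those positive predictions that match the actual label
    precisions[model] = (subset[model] == subset.actual).mean()
    print(f'{model} precision: {precisions[model]:.2%}')

# Model with the highest precision on the toy data
best_model = max(precisions, key=precisions.get)
print(f'best model by precision: {best_model}')
```

Collecting the scores in a dictionary makes the comparison explicit and avoids reusing `model_precision` for three different models.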


#### Recently, several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?