# Evaluation Model Lesson
#### Corey Solitaire
#### 9/14/2020

## 1. Given the following confusion matrix, evaluate (by hand) the model's performance.

|               | actual cat | actual dog |
|:------------  |-----------:|-----------:|
| predicted cat |         34 |          7 |
| predicted dog |         13 |         46 |

   - In the context of this problem, what is a false positive?
   - In the context of this problem, what is a false negative?
   - How would you describe this model?


##  True = Predict Cat         | TP =Cat/Cat          | FP = Cat/Dog
## False= Predict Dog        | FN = Dog/ Cat       | TN = Dog/Dog

#### total observations = 100 pets
#### TP + TN = 80
#### accuracy = 80/100 or 80%

## 2. You are working as a datascientist working for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

#### Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here.

#### Use the predictions dataset and pandas to help answer the following questions:

- An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

- Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you the really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

# Preliminary Findings:
#### - Baseline = No Defect

        No Defect       Defect

True    TP (no /no)    FP (no/defect)  

False   FN (defect/no) TN (defect/defect)

# Predict Ducks with defects
#### - We want to id as many defects as possible, so reduce type II erros (False Negatives)

#### - Use should use Specificity Method (with model 1) because it indicates how good is our model when the actual value is negative. Like Recall for the negative class

# Predict Hawaii Trip defects
#### - We want to id as many defects a possible to reduce tyep II errors (False Negatives)

#### - We would want to use the False Positive Rate Method (with model 1) because it tells us how likely is it we get a false positive when the actual value is negative?

In [1]:
import pandas as pd
df = pd.read_csv('c3.csv')
df

Unnamed: 0,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect
...,...,...,...,...
195,No Defect,No Defect,Defect,Defect
196,Defect,Defect,No Defect,No Defect
197,No Defect,No Defect,No Defect,No Defect
198,No Defect,No Defect,Defect,Defect


In [3]:
# Actual
df.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

In [4]:
# Model #1
df.model1.value_counts()

No Defect    190
Defect        10
Name: model1, dtype: int64

In [5]:
# Model #2
df.model2.value_counts()

No Defect    110
Defect        90
Name: model2, dtype: int64

In [6]:
# Model #3
df.model3.value_counts()

No Defect    101
Defect        99
Name: model3, dtype: int64

In [48]:
# Confusion Matrtix Model 1
from sklearn.metrics import confusion_matrix
y_true = df.actual
y_pred = df.model1
confusion_matrix(y_true, y_pred)

array([[1423,  323],
       [ 640, 2614]])

- You could expect about 640 False Negatives with model 1

## 3. You are working as a data scientist for Gives You Paws ™, a subscription based service that shows you cute pictures of dogs or cats (or both for an additional fee).

#### At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two step process. First an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

#### Several models have already been developed with the data, and you can find their results here.

In [8]:
import pandas as pd
df = pd.read_csv('gives_you_paws.csv')
df

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog
...,...,...,...,...,...
4995,dog,dog,dog,dog,dog
4996,dog,dog,cat,cat,dog
4997,dog,cat,cat,dog,dog
4998,cat,cat,cat,cat,dog


## Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

- A. In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

In [9]:
# Actual
df.actual.value_counts()
accuracy = df.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

In [18]:
# Baseline Model

df.actual.value_counts()

df['baseline_prediction'] = 'dog'

model_accuracy = (df.actual == df.actual).mean()
baseline_accuracy = (df.baseline_prediction == df.actual).mean()

print(f'   model accuracy: {model_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

   model accuracy: 100.00%
baseline accuracy: 65.08%


### MODEL ONE

In [14]:
#Model 1
df.model1.value_counts()

df['baseline_prediction'] = 'dog'

model_accuracy = (df.model1 == df.actual).mean()
baseline_accuracy = (df.baseline_prediction == df.actual).mean()

print(f'   model accuracy: {model_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')


   model accuracy: 80.74%
baseline accuracy: 65.08%


In [21]:
#Model 1 Accuracy Score

from sklearn.metrics import accuracy_score
y_pred = df.model1
y_true = df.actual
accuracy_score(y_true, y_pred)

accuracy_score(y_true, y_pred)


0.8074

In [28]:
#Model 1 Precision Score

from sklearn.metrics import precision_score
y_pred = df.model1
y_true = df.actual
# provides a range of values
precision_score(y_true, y_pred, average='macro'), precision_score(y_true, y_pred, average='micro'), precision_score(y_true, y_pred, average='weighted')


(0.7898980051430666, 0.8074, 0.8200959550792857)

In [32]:
# Model 1 Recall Score

from sklearn.metrics import recall_score
y_pred = df.model1
y_true = df.actual
recall_score(y_true, y_pred, average='macro'), recall_score(y_true, y_pred, average='micro'), recall_score(y_true, y_pred, average='weighted')

(0.8091623596933477, 0.8074, 0.8074)

In [37]:
# Model 1 Classification Report

from sklearn.metrics import classification_report
y_pred = df.model1
y_true = df.actual
target_names = ['actual', 'model1']
print(classification_report(y_true, y_pred, target_names=target_names))

              precision    recall  f1-score   support

      actual       0.69      0.82      0.75      1746
      model1       0.89      0.80      0.84      3254

    accuracy                           0.81      5000
   macro avg       0.79      0.81      0.80      5000
weighted avg       0.82      0.81      0.81      5000



### MODEL TWO

In [15]:
#Model 2
df.model2.value_counts()

df['baseline_prediction'] = 'dog'

model_accuracy = (df.model2 == df.actual).mean()
baseline_accuracy = (df.baseline_prediction == df.actual).mean()

print(f'   model accuracy: {model_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

   model accuracy: 63.04%
baseline accuracy: 65.08%


In [23]:
#Model 2 Accuracy Score

from sklearn.metrics import accuracy_score
y_pred = df.model2
y_true = df.actual
accuracy_score(y_true, y_pred)

accuracy_score(y_true, y_pred)

0.6304

In [29]:
#Model 2 Precision Score

from sklearn.metrics import precision_score
y_pred = df.model2
y_true = df.actual
# provides a range of values
precision_score(y_true, y_pred, average='macro'), precision_score(y_true, y_pred, average='micro'), precision_score(y_true, y_pred, average='weighted')


(0.6886493880609905, 0.6304, 0.7503348355300732)

In [33]:
# Model 2 Recall Score

from sklearn.metrics import recall_score
y_pred = df.model2
y_true = df.actual
recall_score(y_true, y_pred, average='macro'), recall_score(y_true, y_pred, average='micro'), recall_score(y_true, y_pred, average='weighted')

(0.6906938398488845, 0.6304, 0.6304)

In [38]:
# Model 2 Classification Report

from sklearn.metrics import classification_report
y_pred = df.model2
y_true = df.actual
target_names = ['actual', 'model2']
print(classification_report(y_true, y_pred, target_names=target_names))

              precision    recall  f1-score   support

      actual       0.48      0.89      0.63      1746
      model2       0.89      0.49      0.63      3254

    accuracy                           0.63      5000
   macro avg       0.69      0.69      0.63      5000
weighted avg       0.75      0.63      0.63      5000



### MODEL THREE

In [16]:
#model 3
df.model3.value_counts()

df['baseline_prediction'] = 'dog'

model_accuracy = (df.model3 == df.actual).mean()
baseline_accuracy = (df.baseline_prediction == df.actual).mean()

print(f'   model accuracy: {model_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

   model accuracy: 50.96%
baseline accuracy: 65.08%


In [24]:
#Model 3 Accuracy Score

from sklearn.metrics import accuracy_score
y_pred = df.model3
y_true = df.actual
accuracy_score(y_true, y_pred)

accuracy_score(y_true, y_pred)

0.5096

In [30]:
#Model 3 Precision Score

from sklearn.metrics import precision_score
y_pred = df.model3
y_true = df.actual
# provides a range of values
precision_score(y_true, y_pred, average='macro'), precision_score(y_true, y_pred, average='micro'), precision_score(y_true, y_pred, average='weighted')


(0.5091175333635416, 0.5096, 0.5545900138497418)

In [34]:
# Model 3 Recall Score

from sklearn.metrics import recall_score
y_pred = df.model3
y_true = df.actual
recall_score(y_true, y_pred, average='macro'), recall_score(y_true, y_pred, average='micro'), recall_score(y_true, y_pred, average='weighted')

(0.5100297739111823, 0.5096, 0.5095999999999999)

In [39]:
# Model 3 Classification Report

from sklearn.metrics import classification_report
y_pred = df.model3
y_true = df.actual
target_names = ['actual', 'model3']
print(classification_report(y_true, y_pred, target_names=target_names))

              precision    recall  f1-score   support

      actual       0.36      0.51      0.42      1746
      model3       0.66      0.51      0.57      3254

    accuracy                           0.51      5000
   macro avg       0.51      0.51      0.50      5000
weighted avg       0.55      0.51      0.52      5000



### MODEL FOUR

In [22]:
#model 4
df.model4.value_counts()

df['baseline_prediction'] = 'dog'

model_accuracy = (df.model4 == df.actual).mean()
baseline_accuracy = (df.baseline_prediction == df.actual).mean()

print(f'   model accuracy: {model_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

   model accuracy: 74.26%
baseline accuracy: 65.08%


In [25]:
#Model 4 Accuracy Score

from sklearn.metrics import accuracy_score
y_pred = df.model4
y_true = df.actual
accuracy_score(y_true, y_pred)

accuracy_score(y_true, y_pred)

0.7426

In [31]:
#Model 4 Precision Score

from sklearn.metrics import precision_score
y_pred = df.model1
y_true = df.actual
# provides a range of values
precision_score(y_true, y_pred, average='macro'), precision_score(y_true, y_pred, average='micro'), precision_score(y_true, y_pred, average='weighted')

(0.7898980051430666, 0.8074, 0.8200959550792857)

In [35]:
# Model 4 Recall Score

from sklearn.metrics import recall_score
y_pred = df.model4
y_true = df.actual
recall_score(y_true, y_pred, average='macro'), recall_score(y_true, y_pred, average='micro'), recall_score(y_true, y_pred, average='weighted')

(0.6505537989722403, 0.7426, 0.7426)

In [40]:
# Model 4 Classification Report

from sklearn.metrics import classification_report
y_pred = df.model4
y_true = df.actual
target_names = ['actual', 'model4']
print(classification_report(y_true, y_pred, target_names=target_names))

              precision    recall  f1-score   support

      actual       0.81      0.35      0.48      1746
      model4       0.73      0.96      0.83      3254

    accuracy                           0.74      5000
   macro avg       0.77      0.65      0.66      5000
weighted avg       0.76      0.74      0.71      5000



### - Model #1 and #4 have better accuracies then the baseline model

- B. Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recomend for Phase I? For Phase II?

### - Model # 1 is the most accurate at predicting 'dog'. If i worked on the dog team that is the model that I would want to move forward with.

- C. Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recomend for Phase I? For Phase II?

### - Model # 3 is the least accurate at predicting 'dog'. If i worked on the cat team that is the model that I would want to move forward with.

## 4. Follow the links below to read the documentation about each function, then apply those functions to the data from the previous problem.

   - sklearn.metrics.accuracy_score
   - sklearn.metrics.precision_score
   - sklearn.metrics.recall_score
   - sklearn.metrics.classification_report
