In [1]:
import pandas as pd
import numpy as np

### 1. Given the following confusion matrix, evaluate (by hand) the model's performance

In [2]:
confusion_matrix = pd.DataFrame([[34, 7], [13, 46]],
                                 index=['predicted cat', 'predicted dog'], 
                                 columns=['actual cat', 'actual dog'])
confusion_matrix

,actual cat,actual dog
predicted cat,34,7
predicted dog,13,46


### Assuming we are predicting dog, "It is a dog" is the positive prediction.

### 1-a. In the context of this problem, what is a false positive?
A false positive is when the model predicts a dog but the animal is actually a cat (13 cases in the matrix above).

### 1-b. In the context of this problem, what is a false negative?
A false negative is when the model predicts a cat but the animal is actually a dog (7 cases in the matrix above).

### 1-c. How would you describe this model? 

In [20]:
print(f"The accuracy of the model is: {(34+46)/100}")
print(f"The precision of the model is: {46/(46+13)}")
print(f"The recall of the model is: {46/(46+7)}")
print(f"The specificity of the model is: {34/(34+13)}")

The accuracy of the model is: 0.8
The precision of the model is: 0.7796610169491526
The recall of the model is: 0.8679245283018868
The specificity of the model is: 0.723404255319149
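
The four figures above all come from the same four cells of the confusion matrix. As a minimal sketch, assuming dog is the positive class as stated earlier, the helper below makes the formulas explicit. The name `binary_metrics` is hypothetical and not part of the original exercise.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return common metrics from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),    # share of positive predictions that were right
        "recall": tp / (tp + fn),       # share of actual positives that were found
        "specificity": tn / (tn + fp),  # share of actual negatives that were found
    }

# Cells read from the matrix above, with dog as the positive class
metrics = binary_metrics(tp=46, fp=13, fn=7, tn=34)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Read this way, the model is right 80% of the time, finds about 87% of the actual dogs, and about 78% of its dog predictions are correct.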


### 2. You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant. Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here. Use the predictions dataset and pandas to help answer the following questions.

In [24]:
df = pd.read_csv('c3.csv')
df.head()

,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect


In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   actual  200 non-null    object
 1   model1  200 non-null    object
 2   model2  200 non-null    object
 3   model3  200 non-null    object
dtypes: object(4)
memory usage: 6.4+ KB


In [26]:
df.isnull().any()

actual    False
model1    False
model2    False
model3    False
dtype: bool

In [29]:
# Baseline model

df.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

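**Takeaways**:
1. The most common class in `actual` is 'No Defect', so the baseline model always predicts 'No Defect' (accuracy 184/200 = 0.92).
2. 'No Defect' is treated as the positive prediction.
3. 'Defect' is treated as the negative prediction.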

### 2-a. An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

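* The goal is to catch as many of the actually defective ducks as possible. With 'Defect' as the negative class, that is `specificity`: TN / (TN + FP), i.e. correctly predicted defects over all actual defects. This is the same quantity as recall when 'Defect' is treated as the positive class.
* The model with the highest `specificity` will be the best fit.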

In [30]:
cross_tab_model_1 = pd.crosstab(df.actual, df.model1)
cross_tab_model_1

actual \ model1,Defect,No Defect
Defect,8,8
No Defect,2,182


In [31]:
cross_tab_model_2 = pd.crosstab(df.actual, df.model2)
cross_tab_model_2

actual \ model2,Defect,No Defect
Defect,9,7
No Defect,81,103


In [32]:
cross_tab_model_3 = pd.crosstab(df.actual, df.model3)
cross_tab_model_3

actual \ model3,Defect,No Defect
Defect,13,3
No Defect,86,98


In [36]:
specificity_model_1 = 8/16
specificity_model_2 = 9/16
specificity_model_3 = 13/16

specificity_model_1, specificity_model_2, specificity_model_3

(0.5, 0.5625, 0.8125)

**Takeaways**:
Model 3 has the highest specificity (13/16 ≈ 0.81), so it is the best fit: it catches the most defective ducks.
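
The same ratios can be read off the crosstabs instead of typed in by hand. The sketch below rebuilds the three tables from the counts shown above and computes, for each model, the share of actually defective ducks that were labeled 'Defect'. The helper name `defect_catch_rate` is an illustrative assumption, not part of the original notebook.

```python
import pandas as pd

labels = ["Defect", "No Defect"]

# Counts copied from the three crosstabs above (rows = actual, columns = predicted)
crosstabs = {
    "model1": pd.DataFrame([[8, 8], [2, 182]], index=labels, columns=labels),
    "model2": pd.DataFrame([[9, 7], [81, 103]], index=labels, columns=labels),
    "model3": pd.DataFrame([[13, 3], [86, 98]], index=labels, columns=labels),
}

def defect_catch_rate(ct):
    """Share of actually defective ducks that the model also labels 'Defect'."""
    return float(ct.loc["Defect", "Defect"] / ct.loc["Defect"].sum())

rates = {name: defect_catch_rate(ct) for name, ct in crosstabs.items()}
best = max(rates, key=rates.get)
print(rates, best)
```

With the real data, `pd.crosstab(df.actual, df.model1)` could be passed to the helper directly.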

### 2-b. Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

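* A vacation goes to every duck predicted as 'Defect', so the costly error is predicting 'Defect' when the duck actually has no defect. With 'No Defect' as the positive class (as above), that error is a false negative, and the metric to maximize is the share of 'Defect' predictions that are correct: TN / (TN + FN). This is the same quantity as `precision` when 'Defect' is treated as the positive class.
* The model with the highest `precision` for the 'Defect' class will be the best fit.

In [35]:
# Precision for the 'Defect' class: correct 'Defect' predictions / all 'Defect' predictions
precision_model_1 = 8/(8+2)
precision_model_2 = 9/(9+81)
precision_model_3 = 13/(13+86)

round(precision_model_1, 4), round(precision_model_2, 4), round(precision_model_3, 4)

(0.8, 0.1, 0.1313)

**Takeaways**:
Model 1 has the highest precision for the 'Defect' class, so it is the best fit for this use case.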

### 3. You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process. 

1. First an automated algorithm tags pictures as either a cat or a dog (Phase I). 
2. Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

Several models have already been developed with the data, and you can find their results here. Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions.

In [38]:
df = pd.read_csv('gives_you_paws.csv')
df.head()

,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


In [39]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   actual  5000 non-null   object
 1   model1  5000 non-null   object
 2   model2  5000 non-null   object
 3   model3  5000 non-null   object
 4   model4  5000 non-null   object
dtypes: object(5)
memory usage: 195.4+ KB


In [40]:
df.isnull().any()

actual    False
model1    False
model2    False
model3    False
model4    False
dtype: bool

In [41]:
# Create the baseline model according to the actual observations

df.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

### 3-a. In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

Accuracy = (TP + TN) / All
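
Accuracy can also be computed directly as the share of rows where a prediction equals the actual label, with the baseline added as one more column of predictions. The sketch below uses only the five rows shown by `df.head()` above, so its numbers are illustrative and differ from the full 5000-row results. The names `sample` and `baseline` are assumptions introduced for this example.

```python
import pandas as pd

# The five rows shown by df.head() above; the full file has 5000 rows
sample = pd.DataFrame({
    "actual": ["cat", "dog", "dog", "dog", "cat"],
    "model1": ["cat", "dog", "cat", "dog", "cat"],
    "model2": ["dog", "cat", "cat", "dog", "cat"],
    "model3": ["cat", "cat", "cat", "cat", "dog"],
    "model4": ["dog", "dog", "dog", "dog", "dog"],
})

# Baseline: always predict the most common actual class
sample["baseline"] = sample["actual"].value_counts().idxmax()

# Accuracy = share of rows where the prediction equals the actual label
predictions = sample.drop(columns="actual")
accuracy = predictions.eq(sample["actual"], axis=0).mean()
print(accuracy)
```

Applied to the full `df`, the same three statements would give one accuracy per model plus the baseline in a single Series.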

In [42]:
cross_table_model_1 = pd.crosstab(df.actual, df.model1)
cross_table_model_1

actual \ model1,cat,dog
cat,1423,323
dog,640,2614


In [43]:
cross_table_model_2 = pd.crosstab(df.actual, df.model2)
cross_table_model_2

actual \ model2,cat,dog
cat,1555,191
dog,1657,1597


In [44]:
cross_table_model_3 = pd.crosstab(df.actual, df.model3)
cross_table_model_3

actual \ model3,cat,dog
cat,893,853
dog,1599,1655


In [45]:
cross_table_model_4 = pd.crosstab(df.actual, df.model4)
cross_table_model_4

actual \ model4,cat,dog
cat,603,1143
dog,144,3110


In [47]:
baseline_model = 3254/5000
accuracy_model_1 = (1423+2614)/5000
accuracy_model_2 = (1555+1597)/5000
accuracy_model_3 = (893+1655)/5000
accuracy_model_4 = (603+3110)/5000

baseline_model, accuracy_model_1, accuracy_model_2, accuracy_model_3, accuracy_model_4

(0.6508, 0.8074, 0.6304, 0.5096, 0.7426)

**Takeaways**:
1. Model 1 (0.8074) and Model 4 (0.7426) are better than the baseline (0.6508) in terms of accuracy.
2. Model 2 (0.6304) and Model 3 (0.5096) are worse than the baseline in terms of accuracy.

**Takeaways** (baseline model):
1. The most common class is dog, so the baseline always predicts dog.
2. Predicting dog is the positive direction.
3. For the baseline, TP = 3254, TN = 0, FP = 1746, FN = 0.

### 3-b. Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend for Phase I? For Phase II?
1. The team only wants dog pictures, so dog is the positive class.
2. A false negative (a dog tagged as a cat) is more tolerable than a false positive (a cat tagged as a dog).
3. Precision is therefore the evaluation metric for both Phase I and Phase II.

In [49]:
precision_model_1 = 2614/(2614+323)
precision_model_2 = 1597/(1597+191)
precision_model_3 = 1655/(1655+853)
precision_model_4 = 3110/(3110+1143)

precision_model_1**2, precision_model_2**2, precision_model_3**2, precision_model_4**2

(0.7921424248104764,
 0.7977646777672678,
 0.43545264404304945,
 0.5347244132839607)

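**Takeaways**

Model 2 has the highest precision for dog pictures, narrowly ahead of Model 1 (squaring the values does not change the ranking), so I recommend Model 2 for both phases.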
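
As a cross-check, the per-model precision can be derived from the crosstabs rather than typed in. The sketch below rebuilds the four tables from the counts shown earlier and computes dog precision from column totals. It also reports dog recall for comparison, which is an addition for illustration and not part of the original answer.

```python
import pandas as pd

labels = ["cat", "dog"]

# Counts copied from the four crosstabs above (rows = actual, columns = predicted)
crosstabs = {
    "model1": pd.DataFrame([[1423, 323], [640, 2614]], index=labels, columns=labels),
    "model2": pd.DataFrame([[1555, 191], [1657, 1597]], index=labels, columns=labels),
    "model3": pd.DataFrame([[893, 853], [1599, 1655]], index=labels, columns=labels),
    "model4": pd.DataFrame([[603, 1143], [144, 3110]], index=labels, columns=labels),
}

summary = pd.DataFrame({
    name: {
        # correct dog predictions / all dog predictions (column total)
        "dog_precision": ct.loc["dog", "dog"] / ct["dog"].sum(),
        # correct dog predictions / all actual dogs (row total)
        "dog_recall": ct.loc["dog", "dog"] / ct.loc["dog"].sum(),
    }
    for name, ct in crosstabs.items()
}).T
print(summary.round(4))
```

Under these counts Model 2 has the highest dog precision and Model 4 the highest dog recall, so the recommendation depends on which error the team considers more costly in each phase.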

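### 3-c. Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend for Phase I? For Phase II?

1. The team only wants cat pictures.
2. Specificity is the evaluation metric for both Phase I and Phase II. With dog as the positive class, specificity is the share of actual cat pictures that are tagged as cat: TN / (TN + FP).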

In [50]:
specificity_model_1 = 1423/(1423+323)
specificity_model_2 = 1555/(1555+191)
specificity_model_3 = 893/(893+853)
specificity_model_4 = 603/(603+1143)

specificity_model_1**2, specificity_model_2**2, specificity_model_3**2, specificity_model_4**2

(0.6642343356570869,
 0.7931810100389828,
 0.2615859651056448,
 0.11927409926665958)

**Takeaways**

Model 2 has the highest specificity, so I recommend Model 2 for both phases. 
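
Row-wise rates like the specificity above can also be obtained from `pd.crosstab` with `normalize="index"`, which divides each row by its total. The sketch below rebuilds the Model 2 predictions from the counts in its crosstab, since the CSV itself is not reproduced here. The names `cells` and `pairs` are assumptions for this example.

```python
import pandas as pd

# Counts copied from the model2 crosstab above: (actual, predicted) -> number of rows
cells = {
    ("cat", "cat"): 1555,
    ("cat", "dog"): 191,
    ("dog", "cat"): 1657,
    ("dog", "dog"): 1597,
}

# Expand the counts back into one row per picture
rows = [pair for pair, n in cells.items() for _ in range(n)]
pairs = pd.DataFrame(rows, columns=["actual", "model2"])

# Each row of the result sums to 1: the share of that actual class per prediction
rates = pd.crosstab(pairs.actual, pairs.model2, normalize="index")
cat_rate = rates.loc["cat", "cat"]  # share of actual cats tagged as cat
print(rates.round(4))
```

With the real data, `pd.crosstab(df.actual, df.model2, normalize="index")` would produce the same table without the rebuilding step.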