## Prediction Analysis

## Import Models and Data

In [18]:
import joblib
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

In [9]:
dt_model = joblib.load('../models/best_decision_tree_model.joblib')
knn_model = joblib.load('../models/best_knn_model.joblib')
best_rf_model = joblib.load('../models/best_random_forest_model.joblib')

In [10]:
x_train = np.load('../data/x_train.npy')
x_test = np.load('../data/x_test.npy')
y_train = np.load('../data/y_train.npy')
y_test = np.load('../data/y_test.npy')

## Approach

- Take the best model.
- Run the steps below on the training data, then on the testing data.
- Inspect the distribution of predicted classes (how many were predicted 1, 2, and 3?).
- Isolate the incorrect predictions.
- Look for patterns in the wrong predictions (e.g., when the actual class was 1, the model always predicted 2).
- Create custom error rates.
- Keep building custom metrics that expose the model's weaknesses.
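
The steps in this list can be sketched as one helper that is applied to each split in turn. This is only a minimal sketch: the name `error_breakdown` and the toy labels are hypothetical and not part of the notebook, and it assumes the labels are the floats 1.0, 2.0, and 3.0 as stored in the `.npy` files.

```python
from collections import Counter


def error_breakdown(y_true, y_pred):
    """Summarize predictions for one data split (hypothetical helper)."""
    # How often each class was predicted.
    pred_distribution = Counter(y_pred)
    # Count each (predicted, true) pair where the two disagree.
    errors = Counter((p, t) for p, t in zip(y_pred, y_true) if p != t)
    # Overall share of wrong predictions in this split.
    error_rate = sum(errors.values()) / len(y_true)
    return pred_distribution, errors, error_rate


# Toy labels for illustration only; in the notebook these would be
# y_train / model.predict(x_train) and then y_test / model.predict(x_test).
toy_true = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
toy_pred = [1.0, 2.0, 1.0, 2.0, 3.0, 2.0]

dist, errors, rate = error_breakdown(toy_true, toy_pred)
print(dist)
print(errors)
print(f"Error rate: {rate:.2f}")
```

Calling the same helper on both splits makes the training and testing breakdowns directly comparable.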

In [12]:
model = best_rf_model
y_pred = model.predict(x_test)
print(y_pred[:20])
print(y_test[:20])

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1.]




In [31]:
print(f"Prediction Distribution:\n{pd.Series(y_pred).value_counts()}")
print(f"\nActual Distribution:\n{pd.Series(y_test).value_counts()}")

Prediction Distribution:
1.0    338
2.0     55
3.0     33
Name: count, dtype: int64

Actual Distribution:
1.0    334
2.0     57
3.0     35
Name: count, dtype: int64


### Observation
The model predicts class 1.0 slightly more often than it occurs (338 predicted vs. 334 actual) and classes 2.0 and 3.0 slightly less often (55 vs. 57 and 33 vs. 35). So at least some instances of class 2.0 or 3.0 are being classified as 1.0. In other words, fetuses that are suspect or pathological (unhealthy) are being classified as healthy. These are false negatives, which are especially dangerous in this setting.

As discussed in `modeling.ipynb`, a false negative endangers the fetus: an unhealthy fetus that needs medical intervention goes unrecognized.

Let's see where the classification is going wrong.

In [40]:
error_count = 0
type_of_error = {}  # maps "Prediction: p | True: t" -> number of occurrences
for pred, true in zip(y_pred, y_test):
    if pred != true:
        key = f"Prediction: {pred} | True: {true}"
        # Tally each distinct (prediction, truth) pair.
        type_of_error[key] = type_of_error.get(key, 0) + 1
        error_count += 1

print(f"# Misclassifications: {error_count}")
print(type_of_error)
# The key with the largest count is the most common misclassification.
key_with_max_value = max(type_of_error, key=type_of_error.get)
print(f"Most prevalent misclassification: {key_with_max_value}")

# Misclassifications: 24
{'Prediction: 2.0 | True: 1.0': 8, 'Prediction: 1.0 | True: 2.0': 12, 'Prediction: 2.0 | True: 3.0': 2, 'Prediction: 3.0 | True: 1.0': 1, 'Prediction: 1.0 | True: 3.0': 1}
Most prevalent misclassification: Prediction: 1.0 | True: 2.0
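
The same counts can also be viewed as a confusion matrix. The sketch below rebuilds label arrays from the numbers printed above rather than from `y_test` and `y_pred`; the diagonal entries are an assumption derived as each class's actual count minus its misclassified rows (334 - 9, 57 - 12, 35 - 3). In the notebook itself, passing `y_test` and `y_pred` to `pd.crosstab` in the same way should give the same table.

```python
import numpy as np
import pandas as pd

# (true, predicted) -> count. Off-diagonal values come from the printed
# misclassification counts; diagonal values are derived (see the text above).
counts = {
    (1, 1): 325, (1, 2): 8, (1, 3): 1,
    (2, 1): 12, (2, 2): 45, (2, 3): 0,
    (3, 1): 1, (3, 2): 2, (3, 3): 32,
}

# Expand the counts back into one label per sample.
repeats = list(counts.values())
true_labels = np.repeat([t for t, _ in counts], repeats)
pred_labels = np.repeat([p for _, p in counts], repeats)

# Rows are true classes, columns are predicted classes.
confusion = pd.crosstab(
    pd.Series(true_labels, name="True"),
    pd.Series(pred_labels, name="Predicted"),
)
print(confusion)
```

Reading across a row shows where the samples of one true class ended up, so the row for class 2 makes the 12 SUSPECT cases predicted as class 1 easy to spot.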


### Observation
The types of misclassifications are:

Type 1: Predicting 1 when truly 2 => 12 occurrences

Type 2: Predicting 2 when truly 1 => 8 occurrences

Type 3: Predicting 2 when truly 3 => 2 occurrences

Type 4: Predicting 3 when truly 1 => 1 occurrence

Type 5: Predicting 1 when truly 3 => 1 occurrence
<br><br>

This can be translated to:

Type 1: Predicting HEALTHY when SUSPECT => 12 occurrences (***)

Type 2: Predicting SUSPECT when HEALTHY => 8 occurrences

Type 3: Predicting SUSPECT when PATHOLOGICAL => 2 occurrences

Type 4: Predicting PATHOLOGICAL when HEALTHY => 1 occurrence

Type 5: Predicting HEALTHY when PATHOLOGICAL => 1 occurrence (***)

_(***) indicates dangerous misclassifications_
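
One possible custom error rate is the share of unhealthy cases (SUSPECT or PATHOLOGICAL) that were predicted HEALTHY. The sketch below computes it from the counts reported above; the names `dangerous_miss_rate` and `per_class_miss_rate` are hypothetical, and this is only one of several reasonable definitions.

```python
# Actual class counts in the test set, copied from the output above.
actual_counts = {1: 334, 2: 57, 3: 35}
# Dangerous errors: predicted HEALTHY (1) when truly SUSPECT (2)
# or PATHOLOGICAL (3), i.e. Type 1 and Type 5.
dangerous_errors = {2: 12, 3: 1}

# Share of all unhealthy cases that were predicted HEALTHY.
unhealthy_total = actual_counts[2] + actual_counts[3]
dangerous_miss_rate = sum(dangerous_errors.values()) / unhealthy_total

# The same rate broken down by true class.
per_class_miss_rate = {
    cls: dangerous_errors[cls] / actual_counts[cls] for cls in dangerous_errors
}

print(f"Dangerous miss rate: {dangerous_miss_rate:.3f}")  # 13 / 92
for cls, rate in per_class_miss_rate.items():
    print(f"Class {cls} predicted as 1: {rate:.3f}")
```

With these counts, 13 of the 92 unhealthy cases (about 14%) are predicted HEALTHY, and most of them are SUSPECT cases (12 of 57, about 21%).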
<br><br>

Let's look into Type 1 misclassifications.

In [None]:
# Modify the previous code to record the index of each misclassification along with its predicted and true values (y_pred and y_test).
# Then pull those rows from x_test and look for patterns in Type 1 misclassifications.
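
A minimal sketch of how this cell might be filled in, using a boolean mask to select the Type 1 rows. The arrays `x_demo`, `y_true_demo`, and `y_pred_demo` are synthetic stand-ins for `x_test`, `y_test`, and `y_pred`, and comparing feature means against correctly classified SUSPECT rows is just one assumed way to look for patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins; in the notebook these would be x_test, y_test, y_pred.
x_demo = rng.normal(size=(8, 3))
y_true_demo = np.array([1.0, 2.0, 2.0, 3.0, 1.0, 2.0, 1.0, 3.0])
y_pred_demo = np.array([1.0, 1.0, 2.0, 3.0, 2.0, 1.0, 1.0, 1.0])

# Type 1: predicted HEALTHY (1) when truly SUSPECT (2).
type1_mask = (y_pred_demo == 1.0) & (y_true_demo == 2.0)
type1_idx = np.flatnonzero(type1_mask)  # positions of the Type 1 rows
type1_rows = x_demo[type1_mask]         # their feature vectors

# Compare against SUSPECT rows that were classified correctly.
correct_suspect = x_demo[(y_pred_demo == 2.0) & (y_true_demo == 2.0)]
mean_gap = type1_rows.mean(axis=0) - correct_suspect.mean(axis=0)

print(f"Type 1 indices: {type1_idx}")
print(f"Per-feature mean gap: {mean_gap}")
```

Features with a large gap would be candidates for a closer look, for example by plotting their distributions for the two groups.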