---
# Validation of best model
---
In this notebook, we validate three models and compare their results.

The three models are:
1. The best model found, which is a Random Forest model, based on the accuracy score.
2. The best model based on the balanced accuracy score, which is a Decision Tree Classifier model.
3. The client's actual model from which the Churn Score is calculated.


## Results


---

### Importing the necessary libraries

In [109]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import cdist
from scipy.spatial import distance

import joblib

---

<center>
    
## Creating new data

</center>

---

### Columns to create in new dataset

In [31]:
cols = [
    'Photochemical Ozone Formation',
    'Fine Particles',
    'Ecotoxicity for Freshwater Aquatic Ecosystems',
    'Land Use',
    'Water Resource Depletion',
    'Energy Resource Depletion',
    'Mineral Resource Depletion',
    'Climate Change',
    'Toxicological Effects',
    'Water Eutrophication',
]

### Creating fake data for the dataset

In [129]:
low    = [0.01, 4e-08, 25.,  50.00, 2., 35.0, 0.00002, 5.0,  8e-08, 0.002]  # Trying to get category 1 or 4
medium = [0.10, 4e-06, 25.,  50.00, 2., 150., 0.00002, 10.0, 8e-08, 0.002]  # Trying to get category 2
high   = [0.08, 6e-06, 200., 5000., 2., 200., 0.00005, 45.0, 8e-07, 0.002]  # Trying to get category 6

### Converting the data to a DataFrame

In [130]:
X = pd.DataFrame([low, medium, high], columns=cols)

---

<center>
    
## Hierarchical Clustering Model

</center>

---

### Loading the model's scaler

In [131]:
# Load the scaler that was saved alongside the model
scaler = joblib.load('../model/AC_scaler.joblib')

### Loading the model's Cluster Centroid information file

In [132]:
cluster_centroid = pd.read_csv('ClusterCentroid.csv').set_index('Cluster')
cluster_centroid

Cluster,0,1,2,3,4,5,6,7,8,9
0,0.464376,2.034674,2.206861,2.084632,0.00642,1.048484,0.633946,2.599758,2.12896,1.252387
1,-0.2038,-0.239374,-0.189214,-0.0868,-0.031428,-0.166742,-0.031002,-0.162803,-0.074419,-0.141245
2,3.619942,1.438369,-0.033701,-0.427224,-0.148684,1.569809,1.185246,0.581186,-0.099992,0.77961
3,0.737668,0.318566,2.626519,-0.160563,15.059754,2.326025,0.971979,1.167362,0.782797,1.985677
4,-0.421125,-0.529051,-0.664455,-0.393064,-0.157448,-0.534639,-0.428574,-0.55794,-0.469326,-0.463279
5,-0.075959,0.240185,1.564974,0.073882,0.265416,0.324282,0.23466,0.146094,0.061934,0.221851
6,1.034506,5.30109,2.017912,6.158613,0.167817,0.897869,0.808724,4.774582,5.096465,2.845932
7,0.84216,0.13188,0.909814,-0.186531,-0.060347,2.217114,0.535102,0.872535,1.327204,11.160533
8,0.431819,0.15982,-0.095927,-0.401756,0.070228,6.253648,1.169243,0.629467,0.235013,-0.198343
9,8.256341,3.411375,3.026491,-0.344657,-0.073471,8.898319,22.341896,3.960796,4.155964,2.299269


### Applying model's scaling to the new dataset

In [133]:
X_scaled = scaler.transform(X)
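
For intuition, here is a small sketch of what the transform computes, assuming the loaded scaler is a standard scaler (subtract the training mean, divide by the training standard deviation). The `mean` and `scale` arrays stand in for the scaler's fitted statistics; all numbers are invented for the example.

```python
import numpy as np

# Hypothetical per-feature training statistics (stand-ins for the fitted values)
mean = np.array([10.0, 0.5, 100.0])
scale = np.array([2.0, 0.25, 50.0])

# Two made-up samples with three features each
X_new = np.array([[12.0, 0.50, 50.0],
                  [8.0, 0.75, 200.0]])

# Standardization: z = (x - mean) / scale, applied column by column
X_new_scaled = (X_new - mean) / scale
print(X_new_scaled)
```

The new data is scaled with the statistics learned during training, not with its own mean and standard deviation, so it lands in the same space as the cluster centroids.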

### Finding the closest cluster centroid for each new data point
Each new data point is assigned to the cluster whose centroid is nearest in Euclidean distance, measured in the scaled feature space.

In [139]:
new_cluster_labels = np.argmin(cdist(X_scaled, cluster_centroid, metric='euclidean'), axis=1)
new_cluster_labels

array([1, 2, 6])
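
Note that `np.argmin` returns a row position, which equals the cluster label here only because the centroid rows are ordered 0 to 9. The sketch below shows the same nearest-centroid assignment in plain NumPy with an explicit mapping from position to label. The centroids, labels, and points are toy values, not the notebook's.

```python
import numpy as np

# Toy centroids in a 2-feature scaled space; labels deliberately differ from row positions
centroid_labels = np.array([3, 7, 9])
centroids = np.array([[0.0, 0.0],
                      [5.0, 5.0],
                      [-3.0, 4.0]])

# Three made-up scaled samples
points = np.array([[0.5, -0.2],
                   [4.0, 6.0],
                   [-2.5, 3.0]])

# Pairwise Euclidean distances, shape (n_points, n_centroids)
diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
distances = np.sqrt((diff ** 2).sum(axis=2))

# Position of the nearest centroid for each point, then its actual label
nearest_pos = distances.argmin(axis=1)
assigned = centroid_labels[nearest_pos]
print(nearest_pos, assigned)
```

With a DataFrame of centroids, the equivalent mapping would be to index the DataFrame's index with the `argmin` positions.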

---

<center>
    
## Results

</center>

---