
<br>
<center>
<font size='7' style="color:#0D47A1">  <b>Exploratory Analysis & <br><br> Feature Selection</b> </font>
</center>

<hr style= "height:3px;">
<br>


<hr style= "height:1px;">
<font size='6' style="color:#000000">  <b>Content</b> </font>
<a name="content"></a>
<br>
<br>

1. [Abstract](#abstract) 
<br>

2. [Setup](#setup)
<br>

3. [Loading Data](#loading)
<br>

4. [Explore the Data](#exploration)
<br>

5. [Data Cleaning](#cleaning)
<br>

6. [Feature Selection](#features)
<br>

7. [Results & Discussion](#results)
<br>

8. [Perspectives](#perspectives)
<br>

9. [References](#references)
<br>

<hr style= "height:1px;">

<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>1. Abstract</b> </font>
<a name="abstract"></a>

[[ Back to Top ]](#content)
<br>
<br>

<font size='4'>

In this notebook we perform basic data exploration, cleaning, analysis, and feature selection for Delaney's solubility dataset. After removing anomalous data points from the dataset and performing feature selection, we save the new dataset as a CSV file for use in ML models. 

</font> 

<br>
<br>
<font size='5' style="color:#4CAF50">  <b>Purpose</b></font>

<br>
<font size='4'>
    
We will perform data cleaning, data analysis and feature selection of the raw dataset to obtain more accurate ML models. 

</font> 


<br>
<br>


<br>
<br>
<font size='5' style="color:#4CAF50">  <b>Goals</b></font>


 - Remove anomalies from the dataset.
 - Perform a feature importance ranking.
 - Add relevant molecular descriptors as features.
 - Perform a multicollinearity analysis of the features.
 - Obtain a new processed dataset for the ML models.
 

<br>
<br>
<font size='5' style="color:#4CAF50">  <b>Methodology/Plan</b></font>

1. Data Cleaning.
 - Remove anomalies.
2. Feature Selection.
 - Ranking of the original features.
 - Add relevant molecular descriptors.
 - Ranking of new features.
3. Generate new pre-processed dataset.
 



<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>2. Setup</b> </font>
<a name="setup"></a>

[[ Back to Top ]](#content)
<br>
<br>

The following imports are divided by sections according to their role in the notebook. 



In [None]:

# Data Science 
# ==============================================================================
import pandas as pd
import numpy as np


# Sklearn Basic imports
# ==============================================================================
from sklearn import metrics
from sklearn.model_selection import train_test_split


# Sklearn ML model related imports
# ==============================================================================
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor


# Sklearn anomaly detection and feature selection
# ==============================================================================
from sklearn.ensemble import IsolationForest
from sklearn.inspection import permutation_importance


# Rdkit import for molecular features
# ==============================================================================
!pip install rdkit-pypi
import rdkit
import rdkit.Chem
import rdkit.Chem.Fragments
import rdkit.Chem.Descriptors
import rdkit.Chem.rdchem
from rdkit.Chem import Draw


# Multicollinearity analysis
# ==============================================================================
from scipy.stats import spearmanr
from scipy.cluster import hierarchy


# Plotting 
# ==============================================================================
import matplotlib.pyplot as plt
import seaborn as sns


# Image processing
# ==============================================================================
from PIL import Image as PILImage  # aliased because ipywidgets.Image is imported below under the name Image
import io


# HTML Widgets
# ==============================================================================
import plotly.graph_objs as go
from ipywidgets import HTML
from ipywidgets import Image, Layout
from ipywidgets import HBox, VBox


In [None]:
import bokeh

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>3. Loading Data</b> </font>
<a name="loading"></a>

[[ Back to Top ]](#content)

<br>
<br>
<br>


In [None]:

# Load the raw data
# ==============================================================================
df_raw = pd.read_csv('https://raw.githubusercontent.com/LilianaArguello/RIIA_test/main/data/delaney-processed.csv')
df_raw.head()


In [None]:

# Quick overview of the data statistics
# ==============================================================================
df_raw.describe()


In [None]:

# We should never modify the raw external data, so here we copy the columns we need into our own
# dataframe for further analysis (.copy() avoids pandas' SettingWithCopyWarning later on)
# ==============================================================================
df = df_raw[['Molecular Weight','Minimum Degree','Number of H-Bond Donors','Number of Rotatable Bonds',
             'Polar Surface Area','Number of Rings','smiles','measured log solubility in mols per litre']].copy()
df.head()


In [None]:

# Let's take a look at the molecules' statistics grouping them by their number of rings
# Having rings is an important molecular descriptor 
# ==============================================================================
df.groupby('Number of Rings').describe()['Molecular Weight']


In [None]:

# Generate the RDKit Mol objects from the SMILES strings
# ==============================================================================
df_raw['mol'] = df_raw['smiles'].apply(lambda x: rdkit.Chem.MolFromSmiles(x))


In [None]:

# Generate the images for the molecules
# ==============================================================================
df_raw['img'] = df_raw['mol'].apply(lambda m: Draw.MolToImage(m))


<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>4. Explore the Data</b> </font>
<a name="exploration"></a>

[[ Back to Top ]](#content)

<br>
<br>
<br>


In [None]:

# It is important to know the distribution of the target variable
# ==============================================================================
sol_hist = sns.displot(x='measured log solubility in mols per litre', hue='Number of Rings', 
           palette='rainbow', height=7, data=df, multiple="stack")

sol_hist.fig.set_figwidth(15)
plt.xlim(-12, 2)
plt.show()


In [None]:

# We should also know how our independent variables are distributed
# ==============================================================================
mw_hist = sns.displot(x='Molecular Weight', hue='Number of Rings', palette='rainbow', height=7,
           data=df, multiple="stack")

mw_hist.fig.set_figwidth(15)
plt.xlim(0,800)
plt.show()


In [None]:

# Here we show the distribution of both the solubility and the molecular weight.
# Both variables grouped by the number of rings in the molecules.
# ==============================================================================
jointplot_mw = sns.jointplot(x='Molecular Weight', y='measured log solubility in mols per litre', 
               hue='Number of Rings', 
               palette='rainbow', height=7,
               data=df)

jointplot_mw.fig.set_figwidth(15)
plt.ylim(-12,2)
plt.show()


In [None]:

# Same analysis as before for the polar surface area
# ==============================================================================
sol_hist = sns.displot(x='Polar Surface Area', hue='Number of Rings', palette='rainbow', height=7,
           data=df, multiple="stack")

sol_hist.fig.set_figwidth(15)
plt.xlim(0,280)
plt.show()


In [None]:

# This violin plot shows the distribution of the measured solubilities as a function of the number of rings
# Molecules are again grouped by their number of rings
# ==============================================================================
plt.figure(figsize=(15,8))

mw_boxplot = sns.violinplot(x='Number of Rings', y='measured log solubility in mols per litre',  
             palette='rainbow',
             data=df)


plt.show()


In [None]:

# Here we show the relationship that exists between our target variable and the independent variables
# ==============================================================================
sns.set_context("paper")
pairplot = sns.pairplot(df,
             x_vars=['Molecular Weight','Number of H-Bond Donors',
                     'Number of Rotatable Bonds','Polar Surface Area'],
             y_vars=['measured log solubility in mols per litre'],
             hue='Number of Rings',
             palette='rainbow')

pairplot.fig.set_figheight(5)
pairplot.fig.set_figwidth(15)


plt.show()

In [None]:
df_raw

In [None]:
from rdkit.Chem import PandasTools

In [None]:

# Single molecule drawing
# ==============================================================================
df_raw.iloc[0]['img']

In [None]:

# Single molecule drawing
# ==============================================================================
df_raw.iloc[1]['img']

In [None]:

# Single molecule drawing
# ==============================================================================
df_raw.iloc[2]['img']

In [None]:

# Generate the interactive figure
# ==============================================================================
fig = go.FigureWidget(
    data=[
        dict(
            type='scattergl',
            x=df_raw['Molecular Weight'],
            y=df_raw['ESOL predicted log solubility in mols per litre'],
            mode='markers',
        )
    ],
)

In [None]:

# See the scatter portion of the figure
# ==============================================================================
scatter = fig.data[0]


In [None]:

# Look at the scatter parameters
# ==============================================================================
scatter


In [None]:

# Titles, size and font
# ==============================================================================
fig.layout.title = 'Solubility'
fig.layout.title.font.size = 22
fig.layout.title.font.family = 'Rockwell'
fig.layout.xaxis.title = 'Molecular Weight'
fig.layout.yaxis.title = 'ESOL predicted log solubility in mols per litre'


In [None]:

# Avoid overlaps
# ==============================================================================
N = len(df_raw)
scatter.x = scatter.x + np.random.rand(N) * 10
scatter.y = scatter.y + np.random.rand(N) * 1


In [None]:

# Set the opacity
# ==============================================================================
scatter.marker.opacity = 0.5


In [None]:

# Hovering mode
# ==============================================================================
fig.layout.hovermode = 'closest'


In [None]:

# Convert the PIL image produced by rdkit into PNG bytes for the image widget
# ==============================================================================
def image_to_byte_array(image):
    imgByteArr = io.BytesIO()
    image.save(imgByteArr, format='PNG')
    
    return imgByteArr.getvalue()


In [None]:

# Hovering function
# ==============================================================================
def hover_fn(trace, points, state):

    ind = points.point_inds[0]

    # Update details HTML widget
    details.value = df_raw[['Molecular Weight','smiles']].iloc[ind].to_frame().to_html()

    # Update image widget
    molecule = df_raw['img'].iloc[ind]
    image_widget.value = image_to_byte_array(molecule)
    

In [None]:

# Pass the hovering function to the scatter plot
# ==============================================================================
scatter.on_hover(hover_fn)


In [None]:

# Pass molecular details to an HTML Box
# ==============================================================================
details = HTML()
details


In [None]:

# Color hue and color scale for the markers
# ==============================================================================
scatter.marker.color      = df_raw['Number of Rings']
scatter.marker.colorscale = 'rainbow'


In [None]:

# Initialize the image widget
# ==============================================================================
image_widget = Image(
    value    = image_to_byte_array(df_raw['img'][0]),
    layout=Layout(height='500px', width='500px')
)

image_widget

In [None]:

# Condense everything into a single HTML widget
# ==============================================================================
VBox([fig, HBox([image_widget, details])])


<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>5. Data Cleaning</b> </font>
<a name="cleaning"></a>

[[ Back to Top ]](#content)

<br>
<br>
<br>



# Multivariate Outlier Analysis: Anomaly Detection


In [None]:

# Here we separate the independent variables for their analysis
# ==============================================================================
X = df[['Molecular Weight','Minimum Degree','Number of H-Bond Donors','Number of Rotatable Bonds',
             'Polar Surface Area','Number of Rings']].copy()
X.head()


In [None]:

# Definition and training of the IsolationForest model for anomaly detection
# Please note that this is an unsupervised model, so there is no ground truth to tune it against
# The following is a naive set of parameters
# ==============================================================================
modelo_isof = IsolationForest(
                n_estimators  = 1000,
                max_samples   ='auto',
                contamination = 0.1,
                random_state  = 0)

modelo_isof.fit(X)


In [None]:

# Prediction from the Anomaly Detection Model
# ==============================================================================
X['anomaly'] = modelo_isof.predict(X)    # Anomaly prediction| 1:Ok | -1:Anomaly
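
To make the role of `contamination` more concrete, here is a minimal sketch on synthetic data. The arrays and the `*_iso_demo` names are illustrative assumptions, not part of the Delaney dataset. It suggests that the fraction of points labelled `-1` tracks `contamination`, and that `decision_function` exposes the underlying score, with negative values corresponding to predicted anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the feature matrix: 190 ordinary points plus 10 far-away ones
rng = np.random.RandomState(0)
X_iso_demo = np.vstack([rng.normal(0, 1, size=(190, 2)),
                        rng.normal(8, 1, size=(10, 2))])

iso_demo = IsolationForest(n_estimators=100, contamination=0.1, random_state=0)
iso_demo.fit(X_iso_demo)

labels_iso_demo = iso_demo.predict(X_iso_demo)            # 1: inlier, -1: anomaly
scores_iso_demo = iso_demo.decision_function(X_iso_demo)  # negative score <=> predicted anomaly

flagged_fraction = (labels_iso_demo == -1).mean()
print(f"Fraction flagged as anomalous: {flagged_fraction:.3f}")
print(f"Lowest (most anomalous) score: {scores_iso_demo.min():.3f}")
```

Because `contamination` sets the score threshold, roughly that fraction of the training points is flagged regardless of how many points are truly unusual. This is one reason the value used above should be read as a naive choice.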


In [None]:

# Number of anomalies predicted by the number of rings in the molecule
# ==============================================================================
anomaly       = X.loc[X['anomaly']==-1]
anomaly_index = list(anomaly.index)
anomaly.groupby('Number of Rings').describe()['Molecular Weight']



In [None]:

# Dataset free of anomalies
# ==============================================================================
df_clean = df.loc[X['anomaly']==1].copy()
df_clean_index = list(df_clean.index)
df_clean.groupby('Number of Rings').describe()['Molecular Weight']


<br>
<br>
<br>

# Comparison of the raw and clean features



In [None]:

# Anomaly-free and raw solubility
# ==============================================================================
sol_hist_clean = sns.displot(x='measured log solubility in mols per litre', hue='Number of Rings', palette='rainbow', height=7,
           data=df_clean, multiple="stack")

sol_hist_clean.fig.set_figwidth(15)

# ==============================================================================

sol_hist = sns.displot(x='measured log solubility in mols per litre', hue='Number of Rings', palette='rainbow', height=7,
           data=df, multiple="stack")

sol_hist.fig.set_figwidth(15)


plt.xlim(-12, 2)
plt.show()


In [None]:

# Anomaly-free and raw molecular weight
# ==============================================================================
mw_hist_clean = sns.displot(x='Molecular Weight', hue='Number of Rings', palette='rainbow', height=7,
           data=df_clean, multiple="stack")

mw_hist_clean.fig.set_figwidth(15)

# ==============================================================================

mw_hist = sns.displot(x='Molecular Weight', hue='Number of Rings', palette='rainbow', height=7,
           data=df, multiple="stack")

mw_hist.fig.set_figwidth(15)
plt.show()


In [None]:

# Anomaly-free and raw molecular distribution of solubilities vs molecular weights
# ==============================================================================
jointplot_mw_clean = sns.jointplot(x='Molecular Weight', y='measured log solubility in mols per litre', hue='Number of Rings', 
           palette='rainbow', height=7,
           data=df_clean)

jointplot_mw_clean.fig.set_figwidth(15)

# ==============================================================================

jointplot_mw = sns.jointplot(x='Molecular Weight', y='measured log solubility in mols per litre', hue='Number of Rings', 
           palette='rainbow', height=7,
           data=df)

jointplot_mw.fig.set_figwidth(15)


plt.ylim(-12,2)
plt.show()


In [None]:

# Anomaly-free and raw molecular polar surface area
# ==============================================================================
sol_hist_clean = sns.displot(x='Polar Surface Area', hue='Number of Rings', palette='rainbow', height=7,
           data=df_clean, multiple="stack")

sol_hist_clean.fig.set_figwidth(15)

# ==============================================================================

sol_hist = sns.displot(x='Polar Surface Area', hue='Number of Rings', palette='rainbow', height=7,
           data=df, multiple="stack")

sol_hist.fig.set_figwidth(15)
plt.show()


In [None]:

# Anomaly-free and raw violin plot analysis (side by side, sharing the y-axis)
# ==============================================================================
fig_violin, (ax_clean, ax_raw) = plt.subplots(1, 2, figsize=(15,8), sharey=True)

mw_boxplot_clean = sns.violinplot(x='Number of Rings', y='measured log solubility in mols per litre',  
             palette='rainbow',
             data=df_clean, ax=ax_clean)
ax_clean.set_title('Anomaly-free')

# ==============================================================================

mw_boxplot = sns.violinplot(x='Number of Rings', y='measured log solubility in mols per litre',  
             palette='rainbow',
             data=df, ax=ax_raw)
ax_raw.set_title('Raw')

plt.show()


In [None]:

# Anomaly-free and raw feature distribution analysis
# ==============================================================================
sns.set_context("paper")
pairplot_clean = sns.pairplot(df_clean,
             x_vars=['Molecular Weight','Number of H-Bond Donors',
                     'Number of Rotatable Bonds','Polar Surface Area'],
             y_vars=['measured log solubility in mols per litre'],
             hue='Number of Rings',
             palette='rainbow')

pairplot_clean.fig.set_figheight(5)
pairplot_clean.fig.set_figwidth(15)

# ==============================================================================

pairplot = sns.pairplot(df,
             x_vars=['Molecular Weight','Number of H-Bond Donors',
                     'Number of Rotatable Bonds','Polar Surface Area'],
             y_vars=['measured log solubility in mols per litre'],
             hue='Number of Rings',
             palette='rainbow')

pairplot.fig.set_figheight(5)
pairplot.fig.set_figwidth(15)
plt.show()


<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>6. Feature Selection</b> </font>
<a name="features"></a>

[[ Back to Top ]](#content)

<br>
<br>
<br>



# Feature Importance for the Original Features


In [None]:

# Here we pick the features after data cleaning
# ==============================================================================
X_clean = X.loc[X['anomaly']==1].drop(columns='anomaly')


In [None]:

# Anomaly-free target variable
# ==============================================================================
y_clean = df_clean['measured log solubility in mols per litre']


In [None]:

# Train-Test split for the regressor model
# ==============================================================================
X_train, X_test, y_train, y_test = train_test_split(X_clean, y_clean, test_size=0.2, random_state=0)


In [None]:

# Pre-processing of the features
# ==============================================================================
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
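
The scaler above is fitted on the training split only and then reused on the test split. A minimal sketch of why, using synthetic arrays (all `*_demo` names are illustrative assumptions): the training features end up with zero mean and unit variance, while the test features are transformed with the training statistics, so no information from the test split leaks into the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
train_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
test_demo = rng.normal(loc=50.0, scale=10.0, size=(25, 3))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train_demo)  # learns mean/std from the training split only
test_scaled = scaler_demo.transform(test_demo)        # reuses the training mean/std

# The same transformation written out by hand with the training statistics
manual_test_scaled = (test_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)

print("Train means after scaling:", train_scaled.mean(axis=0).round(6))
print("Test means after scaling: ", test_scaled.mean(axis=0).round(3))
```

Tree-based models such as random forests are largely insensitive to feature scaling, so this step mostly matters for keeping the preprocessing consistent with other models that may be trained on the same data later.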


In [None]:

# Setup and training of a simple regressor model
# ==============================================================================
regressor = RandomForestRegressor(n_estimators=1000, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)


In [None]:

# Performance metrics for the regressor model
# ==============================================================================
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
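
For reference, the three metrics printed above can be written out by hand. The sketch below uses a tiny made-up pair of arrays (placeholder values, not results from this notebook) and compares the manual formulas with the `sklearn.metrics` functions.

```python
import numpy as np
from sklearn import metrics

# Placeholder values chosen only to illustrate the formulas
y_true_demo = np.array([1.0, 2.0, 3.0])
y_pred_demo = np.array([1.5, 2.0, 2.0])

errors_demo = y_true_demo - y_pred_demo
mae_demo = np.abs(errors_demo).mean()     # mean absolute error
mse_demo = (errors_demo ** 2).mean()      # mean squared error
rmse_demo = np.sqrt(mse_demo)             # root mean squared error, same units as the target

print("MAE :", mae_demo, metrics.mean_absolute_error(y_true_demo, y_pred_demo))
print("MSE :", mse_demo, metrics.mean_squared_error(y_true_demo, y_pred_demo))
print("RMSE:", rmse_demo)
```

MAE and RMSE are both expressed in the units of the target (log solubility here), but RMSE penalizes large individual errors more heavily.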


In [None]:

# Feature ranking using Permutation Feature Importance
# ==============================================================================
feature_ranking = permutation_importance(regressor, X_train, y_train,
                  n_repeats=100, random_state=0)
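
Permutation feature importance shuffles one column at a time and records how much the model score drops. The sketch below illustrates the idea on synthetic data where only the first column carries signal (the data and all `*_pfi_demo` names are assumptions made for illustration). Unlike the cell above, it evaluates the importance on a held-out split, which is a common alternative when the goal is to see which features help generalization.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X_pfi_demo = rng.normal(size=(300, 3))
y_pfi_demo = 3.0 * X_pfi_demo[:, 0] + rng.normal(scale=0.1, size=300)  # only column 0 matters

X_tr, X_te, y_tr, y_te = train_test_split(X_pfi_demo, y_pfi_demo,
                                          test_size=0.2, random_state=0)
rf_pfi_demo = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Each feature is shuffled n_repeats times; importances has shape (n_features, n_repeats)
pfi_demo = permutation_importance(rf_pfi_demo, X_te, y_te, n_repeats=10, random_state=0)
ranking_pfi_demo = pfi_demo.importances_mean.argsort()[::-1]

print("Features ordered from most to least important:", ranking_pfi_demo)
print("Mean importances:", pfi_demo.importances_mean.round(3))
```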


In [None]:

# Ordering the features by their relevance
# ==============================================================================
perm_sorted_idx = feature_ranking.importances_mean.argsort()


In [None]:

# Feature ranking using the simple Random Forest Regressor
# ==============================================================================
plt.figure(figsize=(15,8))
plt.barh(X_clean.columns, regressor.feature_importances_)
plt.show()


In [None]:

# Feature ranking using the Permutation Feature Importance 
# ==============================================================================
plt.figure(figsize=(15,8))
plt.boxplot(feature_ranking.importances[perm_sorted_idx].T, vert=False,
            labels=X_clean.columns[perm_sorted_idx])
plt.show()


<br>
<br>
<br>

# Exercise: Creating New Features


In [None]:

# Adding molecular objects to the clean dataframe 
# ==============================================================================
df_clean['mol'] = df_clean['smiles'].apply(rdkit.Chem.MolFromSmiles)
df_clean.head()


In [None]:

# Adding the number of valence electrons
# ==============================================================================
df_clean['Number of Valence Electrons'] = df_clean['mol'].apply(rdkit.Chem.Descriptors.NumValenceElectrons)
df_clean.head()


In [None]:

# Function to calculate the number of aromatic atoms in a molecule
# ==============================================================================
def number_Aromatic_Atoms(mol):
    return sum([1 for _ in mol.GetAromaticAtoms()])


In [None]:

# Adding the number of aromatic atoms for each molecule
# ==============================================================================
df_clean['Number of Aromatic Atoms'] = df_clean['mol'].apply(number_Aromatic_Atoms)
df_clean.head()


In [None]:

# Function to calculate the number of conjugated bonds that are not part of a ring
# ==============================================================================
def number_Conjugated_bonds(mol):
    return sum([1 for bond in mol.GetBonds() if (bond.GetIsConjugated() and not bond.IsInRing())])


In [None]:

# Adding the number of non-ring conjugated bonds for each molecule
# ==============================================================================
df_clean['Number of Conjugated Bonds'] = df_clean['mol'].apply(number_Conjugated_bonds)
df_clean


<br>

## From here on we fit a new model with the new features

In [None]:

# Selecting the set of features
# ==============================================================================
features = df_clean[['Molecular Weight','Polar Surface Area','Number of Rings','Number of Valence Electrons','Number of Aromatic Atoms','Number of Conjugated Bonds']]
features.head()


In [None]:

# Define the target variable
# ==============================================================================
target = df_clean['measured log solubility in mols per litre']


In [None]:

# Train-Test split for the new model
# ==============================================================================
features_train, features_test, target_train, target_test = train_test_split(features, target, 
                                                           test_size=0.2, random_state=0)


In [None]:

# Pre-processing of the new features
# ==============================================================================
features_train = sc.fit_transform(features_train)
features_test = sc.transform(features_test)


In [None]:

# Training and predictions of the new model
# ==============================================================================
new_regressor = RandomForestRegressor(n_estimators=1000, random_state=0)
new_regressor.fit(features_train, target_train)
target_pred = new_regressor.predict(features_test)


In [None]:

# Performance of the new model
# ==============================================================================
print('Mean Absolute Error:', metrics.mean_absolute_error(target_test, target_pred))
print('Mean Squared Error:', metrics.mean_squared_error(target_test, target_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(target_test, target_pred)))


In [None]:

# Ranking of the new model using PFI
# ==============================================================================
new_feature_ranking = permutation_importance(new_regressor, features_train, target_train,
                  n_repeats=100, random_state=0)


In [None]:

# Sorting the PFI results
# ==============================================================================
new_perm_sorted_idx = new_feature_ranking.importances_mean.argsort()


In [None]:

# Ranking of the regressor model
# ==============================================================================
plt.figure(figsize=(15,8))
plt.barh(features.columns, new_regressor.feature_importances_)
plt.show()


In [None]:

# New Model PFI ranking
# ==============================================================================
plt.figure(figsize=(15,8))
plt.boxplot(new_feature_ranking.importances[new_perm_sorted_idx].T, vert=False,
            labels=features.columns[new_perm_sorted_idx])
plt.show()


<br>
<br>
<br>

# Regarding Multicollinearity

In [None]:


# Correlation and collinearity analysis between the new features
# ==============================================================================

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))

corr = spearmanr(features).correlation
corr_linkage = hierarchy.ward(corr)

dendro = hierarchy.dendrogram(
    corr_linkage, labels=features.columns, ax=ax1, leaf_rotation=90)

dendro_idx = np.arange(0, len(dendro['ivl']))

ax2.imshow(corr[dendro['leaves'], :][:, dendro['leaves']], cmap='jet_r', )
ax2.set_xticks(dendro_idx)
ax2.set_yticks(dendro_idx)
ax2.set_xticklabels(dendro['ivl'], rotation='vertical')
ax2.set_yticklabels(dendro['ivl'])
fig.tight_layout()
plt.show()
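
If correlated features were to be pruned later, one possible approach is to cut the dendrogram at a chosen height and keep one feature per cluster. The sketch below shows this on synthetic data; the columns, the threshold `t=0.5`, and all `*_demo` names are assumptions for illustration. Note that it first converts the Spearman correlations into distances (`1 - |corr|`) before computing the Ward linkage, which differs from the cell above.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

rng = np.random.RandomState(0)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.05, size=200)   # nearly a copy of a
c = rng.normal(size=200)                   # independent of a and b
X_corr_demo = np.column_stack([a, b, c])

corr_demo = spearmanr(X_corr_demo).correlation
corr_demo = (corr_demo + corr_demo.T) / 2   # enforce exact symmetry
np.fill_diagonal(corr_demo, 1.0)

# Highly correlated features are close together in this distance
distance_demo = 1.0 - np.abs(corr_demo)
linkage_demo = hierarchy.ward(squareform(distance_demo))

# Cut the tree at height 0.5 and keep the first feature of each cluster
cluster_ids_demo = hierarchy.fcluster(linkage_demo, t=0.5, criterion="distance")
kept_demo = [int(np.where(cluster_ids_demo == cid)[0][0])
             for cid in np.unique(cluster_ids_demo)]

print("Cluster id per feature:", cluster_ids_demo)
print("Indices of features kept:", sorted(kept_demo))
```

The threshold controls how aggressively features are merged, so it would need to be chosen by inspecting the dendrogram for the actual feature set.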


In [None]:

# Definition of the final dataframe
# ==============================================================================
final_df = df_clean[['Molecular Weight','Polar Surface Area','Number of Rings',
                     'Number of Valence Electrons','Number of Aromatic Atoms','Number of Conjugated Bonds',
                     'measured log solubility in mols per litre']]
final_df.describe()
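
The abstract mentions saving the processed dataset as a CSV file. A minimal sketch of that step follows, using a small placeholder dataframe and a hypothetical filename (both are assumptions; in the notebook the same `to_csv` call would be applied to `final_df` with whatever path suits the project).

```python
import os
import tempfile
import pandas as pd

# Placeholder rows standing in for the processed dataset
final_demo = pd.DataFrame({
    "Molecular Weight": [100.5, 200.25],
    "Number of Rings": [0, 1],
    "measured log solubility in mols per litre": [-1.0, -2.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "delaney_preprocessed_demo.csv")  # hypothetical filename
    final_demo.to_csv(csv_path, index=False)   # index=False avoids writing an extra index column
    reloaded_demo = pd.read_csv(csv_path)

print(reloaded_demo.head())
```

Columns holding RDKit objects or images (such as `mol` or `img`) do not serialize meaningfully to CSV, so it is usually safer to save only numeric descriptors, the SMILES string, and the target.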


<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>7. Results & Discussion</b> </font>
<a name="results"></a>

[[ Back to Top ]](#content)

<br>
<br>
<br>


<font size='4'>

The anomalous data points removed from the dataset were indeed molecules with extreme values across one or more dimensions of the original dataset. The most obvious example is the identification as anomalies of the two molecules with a molecular weight of ~800 that contain 8 rings. Such data points are highly anomalous and would only induce errors in the training of any ML model. The reader is encouraged to explore other points detected as anomalous and see how they present extreme values across different dimensions. 

After ranking the features with a simple Random Forest regressor and permutation feature importance, we removed 3 of the 6 original features and added 3 new ones. The new dataset performs better than the original dataset, as measured by MAE.

Finally, some collinearity was found in the final dataset; however, no further feature elimination was carried out. We will explore other feature selection strategies once we build more ML models.


</font> 




<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

<font size='6' style="color:#00A6D6">  <b>8. Perspectives</b> </font>
<a name="perspectives"></a>

[[ Back to Top ]](#content)

<br>
<br>
<br>



    i.   The SMILES used in this work were not sanitized. Add sanitization in future work.
    ii.  The Isolation Forest model used standard parameters. Optimize this model. 
    iii. Explore other molecular descriptors.
    iv.  Expand the analysis to topological representations and descriptors. 

<br>
<br>
<br>
<br>
<br>
<br>
<font size='6' style="color:#00A6D6">  <b>9. References</b> </font>
<a name="references"></a>

[[ Back to Top ]](#content)

[1] <a href=https://blog.paperspace.com/anomaly-detection-isolation-forest/>Anomaly Detection Using Isolation Forest in Python</a>

[2] <a href=https://christophm.github.io/interpretable-ml-book/feature-importance.html>Permutation Feature Importance</a>

[3] <a href=http://web.vu.lt/mif/a.buteikis/wp-content/uploads/PE_Book/4-5-Multiple-collinearity.html>Multicollinearity</a>