# Model Reproducibility

In this notebook I will reproduce one of the examples from the publication associated with the model and check that the Ersilia Model Hub implementation gives the same results.
The test is explained in the README file.

In [3]:
# In this codeblock I will import the necessary packages and specify the paths to relevant folders

import pandas as pd
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, balanced_accuracy_score, confusion_matrix, cohen_kappa_score
import matplotlib.pyplot as plt


prediction_folder_path= r"C:\Users\HP\Desktop\Ersilia_ModelValidation\Data\Output"


In [4]:
# In this codeblock I will load the predictions obtained from the original author's code implementation from the /data folder

author_prediction= pd.read_csv(r"C:\Users\HP\Desktop\Ersilia_ModelValidation\Data\Output\ADME@NCATS_predictions.csv")

In [5]:
author_prediction.head()

| Unnamed: 0 | smiles | solubility (µg/mL) | solubility_original_class | Predicted Class | Probability |
|---|---|---|---|---|---|
| 0 | `CCOC(=O)N[C@@H]1CC[C@@H]2[C@@H](C1)C[C@H]1C(=O...` | <1 | 1 | 1 | 1.0 |
| 1 | `Clc1cc(Cl)c(OCC#CI)cc1Cl` | <1 | 1 | 1 | 1.0 |
| 2 | `c1ccc(-c2ccc(C(c3ccccc3)n3ccnc3)cc2)cc1` | <1 | 1 | 1 | 1.0 |
| 3 | `Cc1cc(/C=C/C#N)cc(C)c1Nc1ccnc(Nc2ccc(C#N)cc2)n1` | <1 | 1 | 1 | 1.0 |
| 4 | `CN(C/C=C/C#CC(C)(C)C)Cc1cccc2ccccc12` | <1 | 1 | 1 | 1.0 |


## Evaluation Metrics

The model performance was assessed using several statistical measures. A receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate across classification thresholds, was used to estimate the predictive power of the classification model.

The area under the ROC curve (AUC-ROC) is a value between 0 and 1: the higher the value, the better the model separates the two classes, and 0.5 corresponds to random guessing.
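In this notebook the AUC-ROC is computed from the predicted class labels rather than from the predicted probabilities. With hard 0/1 predictions the ROC "curve" has a single operating point, so its area reduces to (sensitivity + specificity) / 2, which is exactly the balanced accuracy; this is why the AUC-ROC and BACC scores in the results tables of this notebook are identical. The sketch below illustrates the difference on a small set of made-up labels and scores (hypothetical data, not the NPC subset) and assumes a 0.5 cut-off for turning scores into classes. With the real files, continuous scores such as the `Probability` or `prediction` columns would be passed to `roc_curve` and `roc_auc_score` instead.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score, roc_curve

# Hypothetical toy data: true classes and continuous model scores (e.g. probabilities)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.45, 0.95])

# Hard class labels from an assumed 0.5 threshold
y_pred = (y_score >= 0.5).astype(int)

# ROC curve from continuous scores: one (FPR, TPR) point per retained threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc_from_scores = roc_auc_score(y_true, y_score)

# ROC "curve" from hard labels: a single operating point, so AUC equals balanced accuracy
auc_from_labels = roc_auc_score(y_true, y_pred)
bacc = balanced_accuracy_score(y_true, y_pred)

print(f"AUC from scores: {auc_from_scores:.3f}")
print(f"AUC from labels: {auc_from_labels:.3f}  BACC: {bacc:.3f}")
```

Here the score-based AUC (0.875) rewards the correct ranking of the positive scored 0.45 above most negatives, which the thresholded labels cannot capture (0.75).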

Sensitivity is the proportion of actual positives that the model correctly predicts as positive.

Specificity is the proportion of actual negatives that the model correctly predicts as negative.

Balanced accuracy (BACC) is the arithmetic mean of sensitivity and specificity.

Cohen’s kappa, also used in this study, measures the agreement between the actual classes and the classes predicted by the classifier, corrected for the agreement expected by chance.

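The metrics are defined from the confusion-matrix counts as:

$$
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{BACC} = \frac{\text{Sensitivity} + \text{Specificity}}{2}
$$

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \frac{TP + TN}{N}, \qquad p_e = \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{N^2}
$$

where TP: true positives; FN: false negatives; TN: true negatives; FP: false positives; and N = TP + TN + FP + FN is the number of compounds.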
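The definitions above can be checked directly against scikit-learn. This sketch computes sensitivity, specificity, BACC and Cohen's kappa by hand from the four confusion-matrix counts, using the standard chance-agreement formula for kappa, and compares the results with `balanced_accuracy_score` and `cohen_kappa_score`. The labels are made up for illustration and assume the positive class is encoded as 1.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score, confusion_matrix

# Hypothetical labels for illustration (1 = positive class, 0 = negative class)
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0])

# For labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
n = tn + fp + fn + tp

sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
bacc = (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement p_o corrected for chance agreement p_e
p_o = (tp + tn) / n
p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
kappa = (p_o - p_e) / (1 - p_e)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"BACC={bacc:.3f} (sklearn {balanced_accuracy_score(y_true, y_pred):.3f})")
print(f"kappa={kappa:.3f} (sklearn {cohen_kappa_score(y_true, y_pred):.3f})")
```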

## Author's model evaluation

In [7]:
# Compute the five metrics from the author's predicted class labels
# (AUC-ROC is computed from hard labels here, not from the probabilities)
auc_roc = roc_auc_score(author_prediction['solubility_original_class'], author_prediction['Predicted Class'])
bacc = balanced_accuracy_score(author_prediction['solubility_original_class'], author_prediction['Predicted Class'])
tn, fp, fn, tp = confusion_matrix(author_prediction['solubility_original_class'], author_prediction['Predicted Class']).ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
kappa = cohen_kappa_score(author_prediction['solubility_original_class'], author_prediction['Predicted Class'])
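The same metric code is repeated almost verbatim for the eos74bo predictions. One option is to wrap it in a small helper; the function name `binary_metrics` below is hypothetical, and the sketch assumes binary class labels encoded as 0 and 1. Passing `labels=[0, 1]` to `confusion_matrix` keeps the 2x2 layout, so the `tn, fp, fn, tp` unpacking still works if one class happens to be missing from a subset.

```python
import pandas as pd
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             confusion_matrix, roc_auc_score)


def binary_metrics(y_true, y_pred):
    """Return the five metrics used in this notebook as a Metric/Score DataFrame.

    Hypothetical helper: assumes binary class labels encoded as 0 and 1.
    """
    # labels=[0, 1] fixes the matrix layout so ravel() always yields TN, FP, FN, TP
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return pd.DataFrame({
        "Metric": ["AUC-ROC", "BACC", "sensitivity", "specificity", "Cohen's Kappa"],
        "Score": [
            roc_auc_score(y_true, y_pred),
            balanced_accuracy_score(y_true, y_pred),
            tp / (tp + fn),
            tn / (tn + fp),
            cohen_kappa_score(y_true, y_pred),
        ],
    })


# Usage with made-up labels; in this notebook it could be called as
# binary_metrics(author_prediction['solubility_original_class'], author_prediction['Predicted Class'])
example = binary_metrics([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(example)
```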

In [10]:
# Create a DataFrame to store the metric results
metric_results_adme_ncats = pd.DataFrame({
    'Metric': ['AUC-ROC', 'BACC', 'sensitivity', 'specificity', "Cohen's Kappa"],
    'Score': [auc_roc, bacc, sensitivity, specificity, kappa]
})

print(metric_results_adme_ncats.T)



               0         1            2            3              4
Metric   AUC-ROC      BACC  sensitivity  specificity  Cohen's Kappa
Score   0.836467  0.836467     0.804878     0.868056       0.614017


## Ersilia eos74bo model evaluation

In [11]:
# In this codeblock I will load the predictions obtained from the Ersilia Model Hub implementation saved in the /data folder
ersilia_prediction= pd.read_csv(r"C:\Users\HP\Desktop\Ersilia_ModelValidation\Data\Output\ersilia_eos74bo_prediction.csv")

In [13]:
ersilia_prediction.head()

| Unnamed: 0.1 | Unnamed: 0 | smiles | solubility (µg/mL) | solubility_original_class | prediction | predicted_class |
|---|---|---|---|---|---|---|
| 0 | 0 | `CCOC(=O)N[C@@H]1CC[C@@H]2[C@@H](C1)C[C@H]1C(=O...` | <1 | 1 | 0.997 | 1 |
| 1 | 1 | `Clc1cc(Cl)c(OCC#CI)cc1Cl` | <1 | 1 | 1.0 | 1 |
| 2 | 2 | `c1ccc(-c2ccc(C(c3ccccc3)n3ccnc3)cc2)cc1` | <1 | 1 | 0.996 | 1 |
| 3 | 3 | `Cc1cc(/C=C/C#N)cc(C)c1Nc1ccnc(Nc2ccc(C#N)cc2)n1` | <1 | 1 | 1.0 | 1 |
| 4 | 4 | `CN(C/C=C/C#CC(C)(C)C)Cc1cccc2ccccc12` | <1 | 1 | 0.996 | 1 |


In [15]:
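# Compute the same five metrics from the class labels predicted by the Ersilia eos74bo implementation
auc_roc = roc_auc_score(ersilia_prediction['solubility_original_class'], ersilia_prediction['predicted_class'])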
bacc = balanced_accuracy_score(ersilia_prediction['solubility_original_class'], ersilia_prediction['predicted_class'])
tn, fp, fn, tp = confusion_matrix(ersilia_prediction['solubility_original_class'], ersilia_prediction['predicted_class']).ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
kappa = cohen_kappa_score(ersilia_prediction['solubility_original_class'], ersilia_prediction['predicted_class'])

# Create a DataFrame to store the metric results
metric_results_eos74bo = pd.DataFrame({
    'Metric': ['AUC-ROC', 'BACC', 'sensitivity', 'specificity', 'Kappa'],
    'Score': [auc_roc, bacc, sensitivity, specificity, kappa]
})

print(metric_results_eos74bo.T)

               0         1            2            3         4
Metric   AUC-ROC      BACC  sensitivity  specificity     Kappa
Score   0.836467  0.836467     0.804878     0.868056  0.614017
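Identical aggregate metrics are strong evidence of reproducibility, but they do not by themselves guarantee that every molecule receives the same class: the first rows shown above already differ slightly in probability (for example 1.0 versus 0.997) while the classes agree. A stricter check compares the predictions molecule by molecule. The sketch below merges two prediction tables on the `smiles` column and reports the class agreement and the largest probability difference (`Probability` in the author's file, `prediction` in the Ersilia file). It uses two tiny made-up DataFrames with the notebook's column names and assumes each SMILES string is written identically in both files and appears only once in each.

```python
import pandas as pd

# Hypothetical stand-ins for author_prediction and ersilia_prediction (same column names)
author = pd.DataFrame({
    "smiles": ["CCO", "c1ccccc1", "CC(=O)O"],
    "Predicted Class": [0, 1, 0],
    "Probability": [0.10, 0.98, 0.30],
})
ersilia = pd.DataFrame({
    "smiles": ["CCO", "c1ccccc1", "CC(=O)O"],
    "predicted_class": [0, 1, 0],
    "prediction": [0.12, 0.97, 0.30],
})

# Align the tables molecule by molecule; validate="one_to_one" raises if a SMILES is duplicated
merged = author.merge(ersilia, on="smiles", how="inner", validate="one_to_one")

class_agreement = (merged["Predicted Class"] == merged["predicted_class"]).mean()
max_prob_diff = (merged["Probability"] - merged["prediction"]).abs().max()

print(f"Molecules compared: {len(merged)}")
print(f"Class agreement: {class_agreement:.1%}")
print(f"Largest probability difference: {max_prob_diff:.3f}")
```

With the real files, selecting `["smiles", "Predicted Class", "Probability"]` from `author_prediction` and `["smiles", "predicted_class", "prediction"]` from `ersilia_prediction` before merging avoids clashes between the overlapping `Unnamed` and solubility columns.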


In [24]:
# In this codeblock I will compare the original implementation with the Ersilia Model Hub (EMH) result

# Add model names to each DataFrame
metric_results_adme_ncats['Model'] = 'ADME@NCATS'
metric_results_eos74bo['Model'] = 'eos74bo'

# Stack the two DataFrames vertically (row-wise)
comparison_table = pd.concat([metric_results_adme_ncats, metric_results_eos74bo], axis=0)

# Round the values to 2 decimal places
comparison_table_rounded = comparison_table.round(2)

# Transpose the comparison table
comparison_table_transposed = comparison_table_rounded.T

# Print the transposed comparison table
print(comparison_table_transposed)

                 0           1            2            3              4  \
Metric     AUC-ROC        BACC  sensitivity  specificity  Cohen's Kappa   
Score         0.84        0.84          0.8         0.87           0.61   
Model   ADME@NCATS  ADME@NCATS   ADME@NCATS   ADME@NCATS     ADME@NCATS   

              0        1            2            3        4  
Metric  AUC-ROC     BACC  sensitivity  specificity    Kappa  
Score      0.84     0.84          0.8         0.87     0.61  
Model   eos74bo  eos74bo      eos74bo      eos74bo  eos74bo  
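The transposed output above splits the two models into separate blocks, and the kappa row is labelled "Cohen's Kappa" in one table and "Kappa" in the other. One alternative layout harmonises the metric names and places the two models side by side as columns, with an absolute-difference column that makes any discrepancy explicit. The sketch uses the scores copied from the printed outputs (rounded to six decimals); in the notebook, `metric_results_adme_ncats` and `metric_results_eos74bo` (without their `Model` column) would take the place of the two literal tables.

```python
import pandas as pd

# Scores copied from the printed outputs above (rounded to six decimals)
metrics = ["AUC-ROC", "BACC", "sensitivity", "specificity", "Cohen's Kappa"]
adme_ncats = pd.DataFrame({"Metric": metrics,
                           "Score": [0.836467, 0.836467, 0.804878, 0.868056, 0.614017]})
eos74bo = pd.DataFrame({"Metric": ["AUC-ROC", "BACC", "sensitivity", "specificity", "Kappa"],
                        "Score": [0.836467, 0.836467, 0.804878, 0.868056, 0.614017]})

# Harmonise the metric label that differs between the two tables
eos74bo["Metric"] = eos74bo["Metric"].replace({"Kappa": "Cohen's Kappa"})

# One row per metric, one column per model
comparison = pd.merge(adme_ncats, eos74bo, on="Metric", suffixes=("_adme", "_eos"))
comparison = comparison.set_index("Metric")
comparison.columns = ["ADME@NCATS", "eos74bo"]
comparison["abs. difference"] = (comparison["ADME@NCATS"] - comparison["eos74bo"]).abs()

print(comparison.round(2))
```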


## Conclusion

The original ADME@NCATS implementation and the Ersilia eos74bo implementation achieved the same results when validated on a subset of 185 NPC marketed drugs: all five metrics agree to six decimal places.