<a href="https://colab.research.google.com/github/ersilia-os/event-fund-ai-drug-discovery/blob/main/notebooks/session4_skills.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title 🔀 Connect to Google Drive
#@markdown This implementation of the Ersilia Model Hub uses Google Drive as its storage system.
#@markdown The molecules you want to predict must be uploaded to Drive as a .csv file, and the output will also be stored in Drive.

from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title 🔍 Import Python Packages
#@markdown Run this cell to install and import the Python packages needed to run the notebook.
%%capture

!pip install standardiser
!pip install rdkit

from standardiser import standardise
from rdkit import Chem

import pandas as pd
import matplotlib.pyplot as plt

# 📩 Input Data


In [None]:
#@markdown ✍ Add the path to your input file in Google Drive
file_path = "drive/MyDrive/ersilia_hub/chembl3882128.csv" #@param {type:"string"}


In [None]:
#@markdown Run this cell to get a visualisation of your data table!

df = pd.read_csv(file_path)
df.head()

In [None]:
#@markdown ✍ Specify the name of the SMILES column. Remember that Python requires exact matching of letters, including lower and upper cases.
smiles_column = "Smiles" #@param {type:"string"}
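
Because the column name must match exactly, a quick check can catch a case mismatch before the standardisation step. The sketch below is only an illustration: `resolve_column` is a hypothetical helper (not part of pandas or Ersilia), and the column names in the example are made up. In this notebook you could pass it `list(df.columns)` and `smiles_column`.

```python
def resolve_column(columns, requested):
    """Return the column name that matches `requested`.

    An exact match wins; otherwise a single case-insensitive match is
    accepted. A KeyError listing the available columns is raised when
    nothing (or more than one column) matches.
    """
    columns = list(columns)
    if requested in columns:
        return requested
    # Case-insensitive fallback, e.g. "smiles" still finds "Smiles"
    matches = [c for c in columns if str(c).lower() == str(requested).lower()]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(f"Column {requested!r} not found. Available columns: {columns}")


# Example with hypothetical column names
example_columns = ["Molecule ChEMBL ID", "Smiles", "Standard Value"]
resolved = resolve_column(example_columns, "smiles")
print(resolved)
```

The fallback is deliberately strict: if two columns differ only by case, the helper raises an error rather than guessing.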

In [None]:
#@title ♻ Standardise molecules
#@markdown By running this cell you will standardise your molecules according to the rules defined by ChEMBL's [standardiser](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/standardiser.pdf)

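# Parse each SMILES string into an RDKit molecule
mols = [Chem.MolFromSmiles(smi) for smi in df[smiles_column].tolist()]
# RDKit returns None for SMILES it cannot parse; skip those molecules
mols = [mol for mol in mols if mol is not None]
# Standardise each molecule and convert it back to SMILES
st_mols = [standardise.run(mol) for mol in mols]
st_smiles = [Chem.MolToSmiles(st_mol) for st_mol in st_mols]
print(st_smiles)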

In [None]:
#@markdown ✍ Please specify where you want to save the list of standardised SMILES. Give the file a new name so that you do not overwrite your other files.
#@markdown By running this cell, you will save the molecules in a .csv file ready to be used as input to the Ersilia Model Hub!
standard_data = "drive/MyDrive/ersilia_hub/chembl3882128_smiles.csv" #@param {type:"string"}
stdf = pd.DataFrame()
stdf["smiles"] = st_smiles
stdf.to_csv(standard_data, index=False)
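
One way to guard against accidental overwrites is to check whether the target path already exists before writing. The sketch below uses only the Python standard library; `unused_path` is a hypothetical helper introduced for illustration, not part of pandas or Ersilia.

```python
from pathlib import Path


def unused_path(path):
    """Return `path` if no file exists there yet.

    Otherwise append _1, _2, ... before the file extension until an
    unused name is found.
    """
    original = Path(path)
    candidate = original
    counter = 1
    while candidate.exists():
        candidate = original.with_name(f"{original.stem}_{counter}{original.suffix}")
        counter += 1
    return str(candidate)


# Prints the same path if it is free, or a numbered variant if it is taken
print(unused_path("drive/MyDrive/ersilia_hub/chembl3882128_smiles.csv"))
```

In this notebook, the returned path could be passed to `stdf.to_csv(...)` in place of `standard_data`.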

# 💻 The Ersilia Model Hub

The Ersilia Model Hub is a Python library developed for UNIX environments (macOS, Linux). It can be installed and used from the command line (CLI) natively on Linux and macOS computers, and on Windows through the Windows Subsystem for Linux.

This notebook implements a selection of models from the Ersilia Model Hub in Colab.

In [None]:
#@title Click on the play button to install Ersilia in this Colab notebook.

%%capture
%env MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_4.12.0-Linux-x86_64.sh
%env MINICONDA_PREFIX=/usr/local
%env PYTHONPATH={PYTHONPATH}:/usr/local/lib/python3.7/site-packages
%env CONDA_PREFIX=/usr/local
%env CONDA_PREFIX_1=/usr/local
%env CONDA_DIR=/usr/local
%env CONDA_DEFAULT_ENV=base
!wget https://repo.anaconda.com/miniconda/$MINICONDA_INSTALLER_SCRIPT
!chmod +x $MINICONDA_INSTALLER_SCRIPT
!./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX
!python -m pip install git+https://github.com/ersilia-os/ersilia.git
!python -m pip install requests --upgrade
import sys
_ = (sys.path.append("/usr/local/lib/python3.7/site-packages"))

In [None]:
#@markdown ✍ Select the model of interest from the [Ersilia Model Hub](https://ersilia.io/model-hub)
model = "eos96ia" #@param {type:"string"}


In [None]:
#@markdown ✍ Insert the path to the input SMILES file (a .csv with a SMILES column) and the path to the desired output file.
input_smiles = "drive/MyDrive/ersilia_hub/chembl3882128_smiles.csv" #@param {type:"string"}
output_file = "drive/MyDrive/ersilia_hub/chembl3882128_eos96ia.csv" #@param {type:"string"}


In [None]:
#@title ⏬ Fetch the model
import os
os.environ["EOS_ID"] = model
!ersilia fetch $EOS_ID

In [None]:
#@title ⚡ Serve the model
!ersilia serve $EOS_ID

In [None]:
#@title 🚀 Run predictions and store results in the specified Drive folder
os.environ["INPUT"] = input_smiles
os.environ["OUTPUT"] = output_file
!ersilia api -i $INPUT -o $OUTPUT

## 📋 Results interpretation

The .csv output of the Ersilia Model Hub always contains:

*   key: InChIKey representation of the input molecules
*   input: canonical SMILES of the input molecules
*   column 3: results column, which contains the predicted values, probabilities, etc. The name of this column depends on the model.

*Some models provide more than one output, which you will find in the subsequent columns (4, 5, ...).*
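
The column layout described above can be sketched with the standard library alone. The example is illustrative only: `split_output_columns` is a hypothetical helper, and the column name `score` and every value in `example_csv` are placeholders, not real model output.

```python
import csv
import io


def split_output_columns(header):
    """Split a header into the two identifier columns and the result columns."""
    return header[:2], header[2:]


# Placeholder stand-in for an output file (made-up keys and values)
example_csv = "key,input,score\nKEY-A,CCO,0.12\nKEY-B,CCN,0.87\n"

reader = csv.reader(io.StringIO(example_csv))
header = next(reader)
id_columns, result_columns = split_output_columns(header)

# The third column (index 2, counting from 0) holds the first result
values = [float(row[2]) for row in reader]
print(id_columns, result_columns, values)
```

For a model with several outputs, `result_columns` would simply hold more than one name.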



In [None]:
#@title Check the model output
#@markdown This cell will print the first 5 rows of the Ersilia Model Hub output.

df = pd.read_csv(output_file)
df.head()

In [None]:
#@title Distribution of predictions
#@markdown This cell takes the values in the THIRD column (index 2, because Python counts from 0) and plots a simple histogram. To plot a different column, change the column index in the code below.

plt.hist(df.iloc[:,2], color="#50285a")
plt.show()

# ❗ If something went wrong:

If you encounter an error when running a specific model, please open an issue on the [Ersilia Model Hub GitHub](https://github.com/ersilia-os/ersilia/issues) and we will respond as soon as possible.

*Note that you need a GitHub account to post issues.*