# ADMET Analysis

Estimating the Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties of the docked compounds is crucial for assessing their chances of success or failure in clinical trials. To predict these properties, I use [ADMET AI](https://admet.ai.greenstonebio.com). The results include a radial plot and a `.csv` file with the predicted properties.

Load the results into a DataFrame:

```python
import pandas as pd

df = pd.read_csv('admet_ai_output.csv')
# Shift to a 1-based index so that row labels serve as compound numbers
df.index = df.index + 1
```
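
Before scoring, it can help to confirm that the exported CSV contains every column used below, since a renamed or missing column would otherwise only surface as a `KeyError` partway through. This is a minimal sketch: the helper `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical names, and the column list simply collects the ADMET AI column names used later in this section.

```python
import pandas as pd

# Columns referenced later in this section (collected from the cells below)
REQUIRED_COLUMNS = [
    "smiles", "Solubility_AqSolDB", "tpsa", "Caco2_Wang", "QED",
    "HIA_Hou", "Bioavailability_Ma",
    "CYP1A2_Veith", "CYP2C19_Veith", "CYP2C9_Veith", "CYP2D6_Veith", "CYP3A4_Veith",
    "hERG", "AMES", "DILI", "ClinTox", "Carcinogens_Lagunin", "Skin_Reaction",
]


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Hypothetical helper: return the required columns absent from df, in list order."""
    return [col for col in required if col not in df.columns]


# Usage with the DataFrame loaded above:
# missing = missing_columns(df)
# if missing:
#     raise KeyError(f"ADMET AI output is missing columns: {missing}")
```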

ADMET AI returns many properties. We select the most important ones and combine them into a lead score, defined as the mean of the selected scores. Most properties are probabilities already on a 0-1 scale, so they can be averaged directly. Others, such as solubility or polar surface area, are raw values; for these, heuristic rules usually specify a range of desirable values.

To normalize these types of metrics to a 0-1 scale, we define the following functions:
* `linear_score`: Maps values between a minimum and a maximum linearly onto 0 to 1. It returns 0 at or below the minimum and 1 at or above the maximum.
* `trapezoid_score`: Returns 1 inside a range of optimal values, falls linearly to 0 towards a lower and an upper bound, and returns 0 beyond those bounds.
* `inverse_prob`: Some properties, such as toxicity, are probabilities of an undesirable outcome, so we invert them (1 - p) to make higher values better.
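
Written out as formulas, the three normalizations described above are the following piecewise definitions. The symbols are introduced here only for illustration: $a$ and $b$ stand for `min_val` and `max_val` of `linear_score`, and $a < l \le h < b$ stand for `min_val`, `low_opt`, `high_opt` and `max_val` of `trapezoid_score`.

$$
s_{\mathrm{lin}}(x; a, b) =
\begin{cases}
0 & x \le a \\
\dfrac{x - a}{b - a} & a < x < b \\
1 & x \ge b
\end{cases}
\qquad
s_{\mathrm{trap}}(x; a, l, h, b) =
\begin{cases}
s_{\mathrm{lin}}(x; a, l) & x \le l \\
1 & l < x < h \\
1 - s_{\mathrm{lin}}(x; h, b) & x \ge h
\end{cases}
\qquad
s_{\mathrm{inv}}(p) = 1 - p
$$

Both branches of $s_{\mathrm{trap}}$ equal 1 at the edges of the optimal range ($x = l$ and $x = h$), so the score is continuous.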

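```python
def linear_score(x, min_val, max_val):
    """Map x linearly to [0, 1]: 0 at or below min_val, 1 at or above max_val."""
    if x <= min_val:
        return 0.0
    if x >= max_val:
        return 1.0
    return (x - min_val) / (max_val - min_val)


def trapezoid_score(x, min_val, low_opt, high_opt, max_val):
    """Return 1 on [low_opt, high_opt], ramping linearly to 0 at min_val and max_val."""
    if x <= low_opt:
        # Rising edge: 0 at min_val, 1 at low_opt
        return linear_score(x, min_val, low_opt)
    if x >= high_opt:
        # Falling edge: 1 at high_opt, 0 at max_val
        return 1.0 - linear_score(x, high_opt, max_val)
    return 1.0


def inverse_prob(x):
    """Convert the probability of an undesirable outcome into a higher-is-better score."""
    return 1.0 - x
```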
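
The scalar functions are applied row by row with `.apply`. As an alternative, NumPy's `np.interp` performs the same piecewise-linear mapping on a whole column at once, because it clamps inputs outside the breakpoints to the first and last output values. This is a sketch of that swap, not what the analysis here uses; the `_vec` function names are hypothetical.

```python
import numpy as np


def linear_score_vec(x, min_val, max_val):
    """Vectorized linear score: 0 at/below min_val, 1 at/above max_val."""
    # np.interp clamps inputs outside [min_val, max_val] to the end values 0 and 1
    return np.interp(x, [min_val, max_val], [0.0, 1.0])


def trapezoid_score_vec(x, min_val, low_opt, high_opt, max_val):
    """Vectorized trapezoid: 1 on [low_opt, high_opt], linear ramps to 0 at min_val and max_val."""
    return np.interp(x, [min_val, low_opt, high_opt, max_val], [0.0, 1.0, 1.0, 0.0])


# Example: TPSA values scored with the 20/60/120/160 breakpoints used in this section
tpsa = np.array([10.0, 40.0, 90.0, 140.0, 200.0])
print(trapezoid_score_vec(tpsa, 20, 60, 120, 160))  # → 0, 0.5, 1, 0.5, 0
```

`np.interp` also accepts a pandas Series, so a column could be scored in one call, e.g. `trapezoid_score_vec(df["tpsa"], 20, 60, 120, 160)`.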

Permeability, bioavailability and solubility scores are calculated as follows:
* Linear scores for logS (between -8 and -2) and Caco-2 permeability (between -6 and -4.7).
* A trapezoid score for TPSA: values between 60 and 120 Å² receive the full score, while values in the ranges 20-60 and 120-160 Å² receive a partial score.

The ranges for these values are taken from papers cited in the main text.

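```python
# Aqueous solubility (logS): scored linearly from -8 (score 0) to -2 (score 1)
df["score_logS"] = df["Solubility_AqSolDB"].apply(lambda x: linear_score(x, -8, -2))
# TPSA: full score for 60-120 Å², partial score for 20-60 and 120-160 Å²
df["score_tpsa"] = df["tpsa"].apply(lambda x: trapezoid_score(x, 20, 60, 120, 160))
# Caco-2 permeability: scored linearly from -6.0 (score 0) to -4.7 (score 1)
df["Caco2_norm"] = df["Caco2_Wang"].apply(lambda x: linear_score(x, -6.0, -4.7))
```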
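
As a worked example, take a hypothetical compound (values invented for illustration, not taken from the results) with logS = -5, TPSA = 130 Å² and a Caco-2 value of -4.5. Its three scores follow directly from the ranges above: the TPSA value sits on the upper shoulder of the trapezoid, and the Caco-2 value lies above the -4.7 cap, so it is clamped to 1.

$$
\begin{aligned}
s_{\log S}(-5) &= \frac{-5 - (-8)}{-2 - (-8)} = \frac{3}{6} = 0.5 \\
s_{\mathrm{TPSA}}(130) &= 1 - \frac{130 - 120}{160 - 120} = 1 - 0.25 = 0.75 \\
s_{\mathrm{Caco2}}(-4.5) &= 1 \quad (\text{since } -4.5 \ge -4.7)
\end{aligned}
$$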

ADMET AI reports inhibition probabilities for several cytochrome P450 (CYP) isoforms. To score metabolism, we invert the probabilities for the five isoforms below and take their mean.

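```python
cyp_inhib_cols = ["CYP1A2_Veith", "CYP2C19_Veith", "CYP2C9_Veith", "CYP2D6_Veith", "CYP3A4_Veith"]

# Invert each inhibition probability, then average across the five isoforms.
# DataFrame.map requires pandas >= 2.1 (older versions call it applymap).
df["score_CYP_inhibition_mean"] = df[cyp_inhib_cols].map(inverse_prob).mean(axis=1)
```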
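
Because the mean is linear, averaging the inverted probabilities is the same as inverting the mean probability: mean(1 - p) = 1 - mean(p). The sketch below checks this on made-up inhibition probabilities for two hypothetical compounds, using the vectorized form `1.0 - frame`, which also avoids the element-wise `map` call and its pandas version requirement.

```python
import pandas as pd

cyp_inhib_cols = ["CYP1A2_Veith", "CYP2C19_Veith", "CYP2C9_Veith", "CYP2D6_Veith", "CYP3A4_Veith"]

# Made-up inhibition probabilities for two hypothetical compounds
toy = pd.DataFrame(
    [[0.10, 0.20, 0.05, 0.40, 0.25],
     [0.80, 0.60, 0.70, 0.10, 0.90]],
    columns=cyp_inhib_cols,
)

# Vectorized inversion of the whole frame, then the row mean
score_vec = (1.0 - toy[cyp_inhib_cols]).mean(axis=1)
# Equivalent: 1 minus the mean inhibition probability
score_alt = 1.0 - toy[cyp_inhib_cols].mean(axis=1)

print(score_vec.round(2).tolist())  # [0.8, 0.38]
```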

ADMET AI also reports probabilities for several toxicity endpoints. We invert these probabilities and take their mean.

```python
tox_cols = ["hERG", "AMES", "DILI", "ClinTox", "Carcinogens_Lagunin", "Skin_Reaction"]

# Invert each toxicity probability, then average across the endpoints
df["score_toxicity_mean"] = df[tox_cols].map(inverse_prob).mean(axis=1)
```
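
One property of an unweighted mean worth keeping in mind is that a single high-risk endpoint is diluted by the others. The sketch uses made-up probabilities for one hypothetical compound with a strong hERG signal; the stricter `min` aggregation is shown only for comparison and is not what this analysis uses.

```python
tox_cols = ["hERG", "AMES", "DILI", "ClinTox", "Carcinogens_Lagunin", "Skin_Reaction"]

# Made-up toxicity probabilities: high hERG risk, low risk elsewhere
probs = {"hERG": 0.90, "AMES": 0.05, "DILI": 0.05, "ClinTox": 0.05,
         "Carcinogens_Lagunin": 0.05, "Skin_Reaction": 0.05}

# Invert so that higher means safer
inverted = [1.0 - probs[c] for c in tox_cols]

mean_score = sum(inverted) / len(inverted)  # aggregation used in this section
min_score = min(inverted)                   # stricter alternative, for comparison only

print(round(mean_score, 3), round(min_score, 3))  # 0.808 0.1
```

With the mean, this compound still receives a toxicity score above 0.8 despite a 90% predicted hERG risk, so individual endpoint columns may be worth inspecting for top-ranked compounds.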

The lead score is the unweighted mean of these scores together with the bioavailability metrics `HIA_Hou` and `Bioavailability_Ma` and the drug-likeness score `QED`, which are already on a 0-1 scale and need no normalization.
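
As a quick illustration of the equal weighting, here is the lead score for one hypothetical compound with invented component scores. Each of the eight components contributes 1/8 of the total, so raising any single component by 0.08 raises the lead score by 0.01.

```python
import pandas as pd

# Invented component scores for one hypothetical compound (all on a 0-1 scale)
components = pd.DataFrame([{
    "score_tpsa": 1.0,
    "score_logS": 0.5,
    "QED": 0.8,
    "HIA_Hou": 0.9,
    "Bioavailability_Ma": 0.7,
    "Caco2_norm": 0.6,
    "score_CYP_inhibition_mean": 0.8,
    "score_toxicity_mean": 0.9,
}])

# Equal weights: the lead score is the plain row mean of the eight components
lead = components.mean(axis=1, skipna=False)
print(round(lead.iloc[0], 4))  # 0.775
```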

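```python
# Lead score: unweighted mean of eight component scores, each on a 0-1 scale
score_cols = [
    "score_tpsa",
    "score_logS",
    "QED",
    "HIA_Hou",
    "Bioavailability_Ma",
    "Caco2_norm",
    "score_CYP_inhibition_mean",
    "score_toxicity_mean",
]
# skipna=False: a missing component yields NaN instead of being silently skipped
df["lead_score"] = df[score_cols].mean(axis=1, skipna=False)

df_sorted = df.sort_values("lead_score", ascending=False)
df_sorted[["smiles", "lead_score"]]
```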

| Compound | SMILES | Lead score |
|---|---|---|
| 1 | `C[C@@H]1Oc2ccccc2O[C@H]1C(=O)N1CCC[C@@H](N2CCN...` | 0.918034 |
| 5 | `O=C(Cc1ccco1)N1CC[C@@]2(C[C@H](Nc3ncccn3)CCO2)C1` | 0.897855 |
| 2 | `Cc1nc([C@@H]2CCCN(C(=O)CCc3cccnc3)C2)cc(=O)[nH]1` | 0.896795 |
| 4 | `Cc1ccc(CNC(=O)N[C@@H]2CCCN(c3ncccn3)C2)cn1` | 0.877063 |
| 8 | `Cc1nc(C)n([C@H]2CCCN(C(=O)c3cccc4c[nH]nc34)C2)n1` | 0.845674 |
| 3 | `C[C@@]1(C(=O)N2CCC(c3nc4cc(F)ccc4[nH]3)CC2)CCCCO1` | 0.844407 |
| 7 | `CO[C@@H](CNc1ncnc(N[C@@H]2CCC[NH2+]C2)n1)c1ccc...` | 0.790751 |
| 9 | `Cc1cccc(Nc2nc(N)nc(C[N@@H+]3C[C@@H]4CC(=O)N[C@...` | 0.698908 |
| 10 | `CN(C)c1n[nH]c(-c2cccc(C(=O)NCC[NH+]3CCCCC3)c2)n1` | 0.695348 |
| 6 | `Cc1cccc(Nc2nc(C[N@@H+]3C[C@@H]4CC(=O)N[C@@H]4C...` | 0.596646 |


The five top-ranked compounds are 1, 5, 2, 4 and 8, with lead scores between about 0.85 and 0.92. These are good candidates for lead optimization.
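
The top five can also be selected programmatically with `DataFrame.nlargest`. The sketch rebuilds a small frame from the lead scores reported in the results table (SMILES omitted) and picks the five highest; on the full notebook DataFrame, `df.nlargest(5, "lead_score")` or `df_sorted.head(5)` would do the same.

```python
import pandas as pd

# Lead scores as reported in the results table, indexed by compound number 1-10
scores = pd.DataFrame(
    {"lead_score": [0.918034, 0.896795, 0.844407, 0.877063, 0.897855,
                    0.596646, 0.790751, 0.845674, 0.698908, 0.695348]},
    index=range(1, 11),
)

top5 = scores.nlargest(5, "lead_score")
print(top5.index.tolist())  # [1, 5, 2, 4, 8]
```

Note that compound 8 (0.845674) and compound 3 (0.844407) differ by only about 0.001, so the fifth place is close to a tie.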