# Clearance, AstraZeneca

### Dataset Description: Drug clearance is defined as the volume of plasma cleared of a drug over a specified time period, and it measures the rate at which the active drug is removed from the body. This dataset, curated from the ChEMBL database, contains experimental intrinsic clearance results deposited by AstraZeneca. It covers two experiment types: hepatocyte and microsome assays. Because many studies [2] have shown that these two assay types give different clearance outcomes, we treat them as separate datasets.
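In microsome and hepatocyte assays, intrinsic clearance is commonly derived from the first-order depletion half-life of the compound: CLint = (ln 2 / t½) × (incubation volume / amount of enzyme source). The result is typically given in µL/min/mg protein for microsomes or µL/min/10^6 cells for hepatocytes. The sketch below assumes this standard scaling. It is not necessarily the protocol AstraZeneca used, and the incubation numbers are hypothetical.

```python
import math

def intrinsic_clearance(half_life_min: float, incubation_volume_ul: float, amount: float) -> float:
    """Scale a first-order in vitro depletion half-life to intrinsic clearance.

    CLint = (ln 2 / t_half) * (incubation volume / amount of enzyme source),
    where `amount` is mg of microsomal protein or millions of hepatocytes.
    (Hypothetical helper for illustration.)
    """
    if half_life_min <= 0 or amount <= 0:
        raise ValueError("half-life and amount must be positive")
    k_elim = math.log(2) / half_life_min  # first-order depletion rate constant, 1/min
    return k_elim * incubation_volume_ul / amount

# Hypothetical microsomal incubation: t1/2 = 30 min, 500 uL, 0.25 mg protein
clint = intrinsic_clearance(30.0, 500.0, 0.25)
print(f"{clint:.1f} uL/min/mg protein")  # 46.2 uL/min/mg protein
```

A shorter half-life means faster depletion and therefore a higher intrinsic clearance.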

### Task Description: Regression. Given a drug SMILES string, predict the drug's intrinsic clearance value.

### Dataset Statistics: 1,102/1,020 drugs for microsome/hepatocyte clearance.

### Metric: Spearman
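Spearman's rank correlation is the Pearson correlation computed on ranks. It therefore scores how well a model orders compounds by clearance, not how close the predicted values are to the measured ones. The minimal stdlib sketch below gives tied values their average rank, which is also the default in `scipy.stats.spearmanr`. The helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical measured vs. predicted clearance values for five compounds
y_true = [3.0, 10.5, 25.0, 7.2, 60.0]
y_pred = [5.1, 8.0, 30.2, 9.9, 41.0]
print(round(spearman(y_true, y_pred), 3))  # 0.9
```

Only the middle two compounds swap places in the predicted order, so rho stays high even though several predicted values are far from the measurements.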

## Leaderboard

| Rank | Model                       | Contact      | Link          | #Params   | Spearman      |
|------|-----------------------------|--------------|---------------|-----------|---------------|
| 1    | MapLight + GNN              | Jim Notwell  | GitHub, Paper | N/A       | 0.498 ± 0.009 |
| 2    | MapLight                    | Jim Notwell  | GitHub, Paper | N/A       | 0.466 ± 0.012 |
| 3    | Basic ML                    | Nilavo Boral | GitHub, Paper | N/A       | 0.440 ± 0.003 |
| 4    | ContextPred                 | Kexin Huang  | GitHub, Paper | 2,067,053 | 0.439 ± 0.026 |
| 5    | Chemprop                    | Kyle Swanson | GitHub, Paper | N/A       | 0.431 ± 0.006 |
| 6    | Chemprop-RDKit              | Kyle Swanson | GitHub, Paper | N/A       | 0.430 ± 0.021 |
| 7    | Euclia ML model             | Euclia       | GitHub, Paper | 50        | 0.424 ± 0.008 |
| 8    | AttrMasking                 | Kexin Huang  | GitHub, Paper | 2,067,053 | 0.413 ± 0.028 |
| 9    | NeuralFP                    | Kexin Huang  | GitHub, Paper | 480,193   | 0.401 ± 0.037 |
| 10   | RDKit2D + MLP (DeepPurpose) | Kexin Huang  | GitHub, Paper | 633,409   | 0.382 ± 0.007 |
| 11   | GCN                         | Kexin Huang  | GitHub, Paper | 191,810   | 0.366 ± 0.063 |
| 12   | AttentiveFP                 | Kexin Huang  | GitHub, Paper | 300,806   | 0.289 ± 0.022 |
| 13   | Morgan + MLP (DeepPurpose)  | Kexin Huang  | GitHub, Paper | 1,477,185 | 0.272 ± 0.068 |
| 14   | CNN (DeepPurpose)           | Kexin Huang  | GitHub, Paper | 226,625   | 0.235 ± 0.021 |
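Each leaderboard entry is reported as mean ± standard deviation over repeated runs. The TDC ADMET benchmark group typically uses five seeded splits. The sketch below assumes a population standard deviation (NumPy's default `ddof=0`) and rounding to three decimals. The per-seed scores and the `summarize` helper are hypothetical.

```python
import statistics

def summarize(scores, ndigits=3):
    """Return (mean, population std) of per-seed scores, rounded like the leaderboard."""
    return round(statistics.mean(scores), ndigits), round(statistics.pstdev(scores), ndigits)

# Hypothetical per-seed Spearman scores for one model over five splits
per_seed = [0.41, 0.43, 0.40, 0.42, 0.44]
mean, std = summarize(per_seed)
print(f"{mean:.3f} ± {std:.3f}")  # 0.420 ± 0.014
```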

In [1]:
```python
import pandas as pd
from deepmol.pipeline import Pipeline
```



In [2]:
```python
from deepmol.pipeline import VotingPipeline

voting_pipeline = VotingPipeline.load("clearance_hepatocyte/voting_pipeline")
```

```
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
```
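The voting pipeline combines the predictions of its member pipelines. For a regression task, a natural combination is a plain or weighted mean. The sketch below assumes that behaviour; `average_vote` is a hypothetical helper, not DeepMol's API, and the predictions are made up.

```python
def average_vote(member_predictions, weights=None):
    """Combine regression predictions from several pipelines by (weighted) averaging.

    member_predictions: one list of predictions per pipeline, all the same length.
    """
    n_members = len(member_predictions)
    if weights is None:
        weights = [1.0] * n_members  # unweighted mean
    if len(weights) != n_members:
        raise ValueError("need one weight per pipeline")
    total = sum(weights)
    n_samples = len(member_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, member_predictions)) / total
        for i in range(n_samples)
    ]

# Hypothetical predictions from three pipelines for two molecules
preds = [[10.0, 40.0], [14.0, 50.0], [12.0, 45.0]]
print(average_vote(preds))  # [12.0, 45.0]
```

Averaging tends to smooth out the errors of individual members. Weights let a stronger pipeline count for more.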


In [9]:
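```python
# print the underlying model of every step in each member pipeline,
# or the step object itself when it has no model
for pipeline in voting_pipeline.pipelines:
    for step in pipeline.steps:
        try:
            print(step[1].model)
        except AttributeError:
            # standardizers, featurizers and transformers have no `.model` attribute
            print(step[1])
```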


```
<deepmol.standardizer.chembl_standardizer.ChEMBLStandardizer object at 0x7fad6a3fb910>
<deepmol.compound_featurization.deepchem_featurizers.ConvMolFeat object at 0x7fad6a3fbc70>
<deepchem.models.graph_models.GraphConvModel object at 0x7f87b7f51ea0>
<deepmol.base.transformer.PassThroughTransformer object at 0x7fad6a578940>
<deepmol.compound_featurization.deepchem_featurizers.MolGraphConvFeat object at 0x7f87b7f6c5b0>
<deepchem.models.torch_models.gcn.GCNModel object at 0x7f87b7f6c6d0>
<deepmol.standardizer.chembl_standardizer.ChEMBLStandardizer object at 0x7f87b7f6c4f0>
<deepmol.compound_featurization.deepchem_featurizers.DMPNNFeat object at 0x7f87b7f6cf70>
<deepchem.models.torch_models.dmpnn.DMPNNModel object at 0x7f87b7f6d030>
<deepmol.standardizer.chembl_standardizer.ChEMBLStandardizer object at 0x7f87b7f6cdc0>
<deepmol.compound_featurization.deepchem_featurizers.DMPNNFeat object at 0x7f87b7f6d6c0>
<deepchem.models.torch_models.dmpnn.DMPNNModel object at 0x7f87b7f6d780>
<deepmol.stan
```

In [2]:
```python
# read the test-set results (one row per trial: trial id, mean score, std)
results = pd.read_csv('clearance_hepatocyte/tdc_test_set_results.txt', sep=',', header=None, dtype={0: int, 1: float, 2: float})
# name the columns
results.columns = ['trial_id', 'mean', 'std']
results
```

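|   | trial_id | mean   | std   |
|---|----------|--------|-------|
| 0 | 1        | 0.007  | 0.0   |
| 1 | 3        | NaN    | NaN   |
| 2 | 4        | -0.041 | 0.001 |
| 3 | 5        | 0.382  | 0.023 |
| 4 | 6        | 0.068  | 0.008 |
| 5 | 7        | NaN    | NaN   |
| 6 | 10       | -0.049 | 0.009 |
| 7 | 12       | 0.371  | 0.016 |
| 8 | 13       | 0.383  | 0.023 |
| 9 | 14       | 0.373  | 0.024 |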
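Trials that failed leave the mean and std fields empty, and pandas parses those fields as NaN. The sketch below reads a few rows copied from the output above in the same headerless layout. It passes the column names directly via `names=` and drops the failed trials so they cannot be mistaken for scores.

```python
import io
import pandas as pd

# A few rows in the same headerless "trial_id,mean,std" layout as the results file;
# failed trials have empty mean/std fields.
raw = io.StringIO("1,0.007,0.0\n3,,\n5,0.382,0.023\n7,,\n13,0.383,0.023\n")
results = pd.read_csv(raw, header=None, names=["trial_id", "mean", "std"],
                      dtype={"trial_id": int, "mean": float, "std": float})

# Empty fields become NaN; keep only trials that produced a score.
completed = results.dropna(subset=["mean"]).reset_index(drop=True)
print(completed)
```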


In [3]:
```python
# order results by mean (descending); break ties by std (ascending, so the more stable trial comes first)
results = results.sort_values(by=['mean', 'std'], ascending=[False, True])
results
```

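|    | trial_id | mean  | std   |
|----|----------|-------|-------|
| 8  | 13       | 0.383 | 0.023 |
| 3  | 5        | 0.382 | 0.023 |
| 9  | 14       | 0.373 | 0.024 |
| 7  | 12       | 0.371 | 0.016 |
| 51 | 95       | 0.314 | 0.018 |
| 10 | 15       | 0.299 | 0.063 |
| 15 | 24       | 0.292 | 0.018 |
| 47 | 84       | 0.28  | 0.017 |
| 27 | 45       | 0.265 | 0.014 |
| 30 | 50       | 0.254 | 0.018 |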
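When sorting by two keys, `sort_values` accepts a list for `ascending`, so each key can sort in its own direction. With `ascending=False` alone, a tie on the mean would put the trial with the *larger* std first. The sketch below uses hypothetical trial ids and scores to show the intended order. Rows with a missing mean go last because of pandas' default `na_position="last"`.

```python
import pandas as pd

# Hypothetical results in which two trials tie on the mean score.
results = pd.DataFrame({
    "trial_id": [101, 102, 103, 104],
    "mean": [0.371, 0.371, 0.383, float("nan")],
    "std": [0.030, 0.016, 0.023, float("nan")],
})

# Highest mean first; on a tie, prefer the lower (more stable) std.
ranked = results.sort_values(by=["mean", "std"], ascending=[False, True])
print(ranked["trial_id"].tolist())  # [103, 102, 101, 104]
```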


In [11]:
```python
# load the best trial's pipeline (its mean Spearman of 0.383 would place 10th on the leaderboard)
best_trial_id = int(results.iloc[0]['trial_id'])
pipeline = Pipeline.load(f"clearance_hepatocyte/trial_{best_trial_id}/")
```

```
NameError: name 'results' is not defined
```
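The best trial's mean Spearman is 0.383. The sketch below checks where that score would sit among the leaderboard means at the top of this page. It compares means only and ignores the ± std. `leaderboard_rank` is a hypothetical helper.

```python
# Spearman means from the leaderboard table, best first.
leaderboard = [0.498, 0.466, 0.440, 0.439, 0.431, 0.430, 0.424,
               0.413, 0.401, 0.382, 0.366, 0.289, 0.272, 0.235]

def leaderboard_rank(score, board):
    """1-based position a new mean score would take; only strictly higher scores rank above it."""
    return 1 + sum(existing > score for existing in board)

print(leaderboard_rank(0.383, leaderboard))  # 10
```

Nine entries score above 0.383, so the trial lands just ahead of RDKit2D + MLP (0.382). The overlapping standard deviations mean the two are not clearly separated.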