# Clearance, AstraZeneca

### Dataset Description: Drug clearance is defined as the volume of plasma cleared of a drug per unit time; it measures the rate at which the active drug is removed from the body. This dataset, curated from the ChEMBL database, contains experimental intrinsic clearance results deposited by AstraZeneca. It covers two experiment types, hepatocyte and microsome. Because studies [2] have shown that clearance outcomes differ between these two types, they are kept as separate datasets.

### Task Description: Regression. Given a drug SMILES string, predict its clearance.

### Dataset Statistics: 1,102/1,020 drugs for microsome/hepatocyte clearance.

### Metric: Spearman
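The Spearman metric scores how well predictions preserve the ordering of the measured values, not how close they are in absolute terms. The sketch below is a minimal pure-Python version for illustration, using average ranks for ties; the helper names and the sample values are made up here and are not part of the benchmark tooling. In practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


# Hypothetical clearance values, for illustration only.
measured = [12.0, 45.5, 3.2, 150.0, 27.1]
predicted = [20.0, 40.0, 10.0, 90.0, 35.0]
rho = spearman(measured, predicted)
print(round(rho, 3))
```

Because only the ranks matter, the predictions in this example score perfectly even though none of them matches the measured value.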

## Leaderboard

| Rank | Model                       | Contact           | Link          | #Params   | Spearman      |
|------|-----------------------------|-------------------|---------------|-----------|---------------|
| 1    | MapLight + GNN              | Jim Notwell       | GitHub, Paper | N/A       | 0.630 ± 0.010 |
| 2    | MapLight                    | Jim Notwell       | GitHub, Paper | N/A       | 0.626 ± 0.008 |
| 3    | RFStacker                   | Andrew Li         | GitHub, Paper | 1,858,225 | 0.625 ± 0.002 |
| 4    | Chemprop-RDKit              | Kyle Swanson      | GitHub, Paper | N/A       | 0.599 ± 0.025 |
| 5    | SimGCN                      | Suman Kalyan Bera | GitHub, Paper | 1,103,000 | 0.597 ± 0.025 |
| 6    | RDKit2D + MLP (DeepPurpose) | Kexin Huang       | GitHub, Paper | 633,409   | 0.586 ± 0.014 |
| 7    | AttrMasking                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.585 ± 0.034 |
| 8    | ContextPred                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.578 ± 0.007 |
| 9    | Euclia ML model             | Euclia            | GitHub, Paper | 50        | 0.572 ± 0.010 |
| 10   | Chemprop                    | Kyle Swanson      | GitHub, Paper | N/A       | 0.555 ± 0.022 |
| 11   | GCN                         | Kexin Huang       | GitHub, Paper | 191,810   | 0.532 ± 0.033 |
| 12   | NeuralFP                    | Kexin Huang       | GitHub, Paper | 480,193   | 0.529 ± 0.015 |
| 13   | Basic ML                    | Nilavo Boral      | GitHub, Paper | N/A       | 0.518 ± 0.005 |
| 14   | Morgan + MLP (DeepPurpose)  | Kexin Huang       | GitHub, Paper | 1,477,185 | 0.492 ± 0.020 |
| 15   | AttentiveFP                 | Kexin Huang       | GitHub, Paper | 300,806   | 0.365 ± 0.055 |
| 16   | CNN (DeepPurpose)           | Kexin Huang       | GitHub, Paper | 226,625   | 0.252 ± 0.116 |

In [1]:
import pandas as pd
from deepmol.pipeline import Pipeline


In [2]:
# read results
results = pd.read_csv('results_test_set/clearance_microsome_test_set.csv')
# set columns
results.columns = ['trial_id', 'mean', 'std']
results

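|   | trial_id | mean  | std   |
|---|----------|-------|-------|
| 0 | 41.0     | 0.528 | 0.018 |
| 1 | 31.0     | 0.508 | 0.016 |
| 2 | 61.0     | 0.522 | 0.009 |
| 3 | 52.0     | 0.539 | 0.018 |
| 4 | 42.0     | 0.541 | 0.025 |
| 5 | 82.0     | 0.531 | 0.01  |
| 6 | 66.0     | 0.468 | 0.022 |
| 7 | 88.0     | 0.513 | 0.015 |
| 8 | 35.0     | 0.504 | 0.013 |
| 9 | 70.0     | 0.516 | 0.015 |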


In [3]:
# order results by mean (std in case of tie)
results = results.sort_values(by=['mean', 'std'], ascending=False)
results

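|    | trial_id        | mean  | std   |
|----|-----------------|-------|-------|
| 10 | voting_pipeline | 0.553 | 0.013 |
| 4  | 42.0            | 0.541 | 0.025 |
| 3  | 52.0            | 0.539 | 0.018 |
| 5  | 82.0            | 0.531 | 0.01  |
| 0  | 41.0            | 0.528 | 0.018 |
| 2  | 61.0            | 0.522 | 0.009 |
| 9  | 70.0            | 0.516 | 0.015 |
| 7  | 88.0            | 0.513 | 0.015 |
| 1  | 31.0            | 0.508 | 0.016 |
| 8  | 35.0            | 0.504 | 0.013 |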
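To see where the best score in the sorted results would sit on the leaderboard, one can count how many existing entries beat it. The sketch below copies the mean Spearman values from the leaderboard table; it assumes that table refers to the same microsome task and ignores the standard deviations. The function name `leaderboard_rank` is made up for this illustration.

```python
# Mean Spearman scores copied from the leaderboard table, ranks 1 to 16.
leaderboard = [
    0.630, 0.626, 0.625, 0.599, 0.597, 0.586, 0.585, 0.578,
    0.572, 0.555, 0.532, 0.529, 0.518, 0.492, 0.365, 0.252,
]


def leaderboard_rank(score, existing_scores):
    """Rank a new score: one plus the number of entries that beat it."""
    return 1 + sum(1 for s in existing_scores if s > score)


voting_score = 0.553  # mean of the voting_pipeline row in the results
rank = leaderboard_rank(voting_score, leaderboard)
print(rank)
```

Under these assumptions the voting pipeline falls between Chemprop (0.555) and GCN (0.532).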


In [4]:
# load the best pipeline from the results above (rank #11 on the leaderboard)
from deepmol.pipeline import VotingPipeline

pipeline = VotingPipeline.load("clearance_microsome/voting_pipeline/")


The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
The model was not restored. The model was probably not trained.
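For a regression task, a voting ensemble usually combines its members by averaging their predictions, optionally with weights. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not a reproduction of how DeepMol's `VotingPipeline` is implemented, and `average_predictions` is a hypothetical helper.

```python
def average_predictions(per_model_predictions, weights=None):
    """Combine regression outputs from several models by (weighted) mean.

    per_model_predictions: list of equal-length lists, one list per model.
    weights: optional list with one non-negative weight per model.
    """
    n_models = len(per_model_predictions)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(per_model_predictions[0])
    combined = []
    for j in range(n_samples):
        # Weighted sum of every model's prediction for sample j.
        weighted = sum(w * preds[j] for w, preds in zip(weights, per_model_predictions))
        combined.append(weighted / total)
    return combined


# Hypothetical predictions from two models on two molecules.
model_a = [1.0, 2.0]
model_b = [3.0, 4.0]
plain = average_predictions([model_a, model_b])
weighted = average_predictions([model_a, model_b], weights=[1.0, 3.0])
print(plain, weighted)
```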


In [6]:
pipeline.pipelines[0].steps

[('standardizer',
  <deepmol.standardizer.chembl_standardizer.ChEMBLStandardizer at 0x7fe950718970>),
 ('padder', <deepmol.base.transformer.DatasetTransformer at 0x7fe84e5879d0>),
 ('model', <deepmol.models.deepchem_models.DeepChemModel at 0x7fe84e43e920>)]

In [10]:
pipeline.pipelines[4].steps

[('standardizer',
  <deepmol.standardizer.chembl_standardizer.ChEMBLStandardizer at 0x7fe6cff956f0>),
 ('padder', <deepmol.base.transformer.DatasetTransformer at 0x7fe6b2b0cc10>),
 ('model', <deepmol.models.deepchem_models.DeepChemModel at 0x7fe6b2b0cbb0>)]

In [11]:
data = pd.DataFrame()
data['standardizer'] = [str(pipeline.pipelines[i].steps[0][1]).split('.')[-1] for i in range(5)]
data['featurizer'] = [str(pipeline.pipelines[i].steps[1][1]).split('.')[-1] for i in range(5)]
#data['scaler'] = [str(pipeline.pipelines[i].steps[2][1]).split('.')[-1] for i in range(5)]
#data['feature_selector'] = [str(pipeline.pipelines[i].steps[3][1]).split('.')[-1] for i in range(5)]
data['model'] = [pipeline.pipelines[i].steps[2][1].model for i in range(5)]
data

|   | standardizer | featurizer | model |
|---|--------------|------------|-------|
| 0 | `ChEMBLStandardizer object at 0x7fe950718970>` | `DatasetTransformer object at 0x7fe84e5879d0>` | `<deepchem.models.text_cnn.TextCNNModel object ...` |
| 1 | `ChEMBLStandardizer object at 0x7fe6d418ff70>` | `DatasetTransformer object at 0x7fe6d418e620>` | `<deepchem.models.text_cnn.TextCNNModel object ...` |
| 2 | `ChEMBLStandardizer object at 0x7fe6d418ea40>` | `DatasetTransformer object at 0x7fe6cff0b550>` | `<deepchem.models.text_cnn.TextCNNModel object ...` |
| 3 | `ChEMBLStandardizer object at 0x7fe6cff0b3d0>` | `DatasetTransformer object at 0x7fe6cff94550>` | `<deepchem.models.text_cnn.TextCNNModel object ...` |
| 4 | `ChEMBLStandardizer object at 0x7fe6cff956f0>` | `DatasetTransformer object at 0x7fe6b2b0cc10>` | `<deepchem.models.text_cnn.TextCNNModel object ...` |
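The summary above carries memory addresses because `str(obj).split('.')[-1]` keeps the tail of the default object repr. If only the class names are wanted, `type(obj).__name__` gives them directly. The sketch below shows the difference with empty stand-in classes that merely borrow the names seen in the output; they are placeholders, not the real DeepMol or DeepChem classes.

```python
# Stand-in classes: placeholders that only reuse the names from the output.
class ChEMBLStandardizer:
    pass


class DatasetTransformer:
    pass


class TextCNNModel:
    pass


# Same (name, object) layout as the `steps` list shown earlier.
steps = [
    ("standardizer", ChEMBLStandardizer()),
    ("padder", DatasetTransformer()),
    ("model", TextCNNModel()),
]

# Splitting the repr keeps the "object at 0x...>" suffix.
raw = str(steps[0][1]).split(".")[-1]

# Asking the type for its name returns just the class name.
row = {step_name: type(obj).__name__ for step_name, obj in steps}
print(raw)
print(row)
```

Applied to the real pipeline, the same expression would replace each `str(...).split('.')[-1]` call, for example `type(pipeline.pipelines[i].steps[0][1]).__name__`.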
