# Clearance, AstraZeneca

### Dataset Description: Drug clearance is the volume of plasma cleared of a drug per unit of time; it measures the rate at which the active drug is removed from the body. This dataset is curated from the ChEMBL database and contains experimental intrinsic clearance results deposited by AstraZeneca. It covers two experiment types, hepatocyte and microsome. Because many studies [2] have reported different clearance outcomes for the two types, they are kept as separate datasets.

### Task Description: Regression. Given a drug SMILES string, predict its clearance.

### Dataset Statistics: 1,102/1,020 drugs for microsome/hepatocyte clearance.

### Metric: Spearman
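
Spearman correlation compares the ordering of predictions with the ordering of measurements rather than their raw values. The following is a minimal, standard-library sketch of that idea (Pearson correlation computed on average ranks). The helper names and the sample values are illustrative assumptions, not part of the benchmark code; in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every element tied with order[start].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Note: a constant input makes a variance zero and this division fail.
    return cov / sqrt(var_x * var_y)

# Hypothetical clearance values, for illustration only.
measured = [12.0, 45.5, 3.2, 150.0, 27.1]
predicted = [10.0, 20.0, 5.0, 90.0, 60.0]
print(round(spearman(measured, predicted), 3))  # 0.9
```

Only two of the five compounds swap places in the predicted ordering, so the score stays close to 1 even though the predicted magnitudes are far from the measured ones.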

## Leaderboard

| Rank | Model                       | Contact           | Link          | #Params   | Spearman      |
|------|-----------------------------|-------------------|---------------|-----------|---------------|
| 1    | MapLight + GNN              | Jim Notwell       | GitHub, Paper | N/A       | 0.630 ± 0.010 |
| 2    | MapLight                    | Jim Notwell       | GitHub, Paper | N/A       | 0.626 ± 0.008 |
| 3    | RFStacker                   | Andrew Li         | GitHub, Paper | 1,858,225 | 0.625 ± 0.002 |
| 4    | Chemprop-RDKit              | Kyle Swanson      | GitHub, Paper | N/A       | 0.599 ± 0.025 |
| 5    | SimGCN                      | Suman Kalyan Bera | GitHub, Paper | 1,103,000 | 0.597 ± 0.025 |
| 6    | RDKit2D + MLP (DeepPurpose) | Kexin Huang       | GitHub, Paper | 633,409   | 0.586 ± 0.014 |
| 7    | AttrMasking                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.585 ± 0.034 |
| 8    | ContextPred                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.578 ± 0.007 |
| 9    | Euclia ML model             | Euclia            | GitHub, Paper | 50        | 0.572 ± 0.010 |
| 10   | Chemprop                    | Kyle Swanson      | GitHub, Paper | N/A       | 0.555 ± 0.022 |
| 11   | GCN                         | Kexin Huang       | GitHub, Paper | 191,810   | 0.532 ± 0.033 |
| 12   | NeuralFP                    | Kexin Huang       | GitHub, Paper | 480,193   | 0.529 ± 0.015 |
| 13   | Basic ML                    | Nilavo Boral      | GitHub, Paper | N/A       | 0.518 ± 0.005 |
| 14   | Morgan + MLP (DeepPurpose)  | Kexin Huang       | GitHub, Paper | 1,477,185 | 0.492 ± 0.020 |
| 15   | AttentiveFP                 | Kexin Huang       | GitHub, Paper | 300,806   | 0.365 ± 0.055 |
| 16   | CNN (DeepPurpose)           | Kexin Huang       | GitHub, Paper | 226,625   | 0.252 ± 0.116 |
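
Each Spearman entry in the table is reported as a mean ± standard deviation, presumably aggregated over repeated runs. A small sketch of how such a summary could be produced from per-run scores is shown here. The scores are hypothetical, and the choice of the population standard deviation (`pstdev`) rather than the sample one is an assumption, since the table does not say which was used.

```python
from statistics import mean, pstdev

def summarize(scores):
    """Format per-run scores as 'mean ± std' with three decimals."""
    return f"{mean(scores):.3f} ± {pstdev(scores):.3f}"

# Hypothetical Spearman scores from five runs; not taken from any leaderboard entry.
seed_scores = [0.58, 0.59, 0.60, 0.57, 0.61]
print(summarize(seed_scores))  # 0.590 ± 0.014
```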

In [1]:
import pandas as pd
from deepmol.pipeline import Pipeline


In [2]:
# read the per-trial test-set results (the file has no header row)
results = pd.read_csv('clearance_microsome/tdc_test_set_results.txt', sep=',', header=None, dtype={0: int, 1: float, 2: float})
# name the columns
results.columns = ['trial_id', 'mean', 'std']
results

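|   | trial_id | mean  | std   |
|---|----------|-------|-------|
| 0 | 1        | 0.258 | 0.085 |
| 1 | 3        | NaN   | NaN   |
| 2 | 4        | 0.019 | 0.009 |
| 3 | 5        | 0.585 | 0.007 |
| 4 | 6        | 0.045 | 0.013 |
| 5 | 7        | NaN   | NaN   |
| 6 | 10       | 0.046 | 0.029 |
| 7 | 12       | 0.587 | 0.009 |
| 8 | 13       | 0.579 | 0.01  |
| 9 | 14       | 0.58  | 0.012 |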


In [3]:
# order results by mean, best first; break ties in favor of the lower std
results = results.sort_values(by=['mean', 'std'], ascending=[False, True])
results

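|    | trial_id | mean  | std   |
|----|----------|-------|-------|
| 7  | 12       | 0.587 | 0.009 |
| 3  | 5        | 0.585 | 0.007 |
| 9  | 14       | 0.58  | 0.012 |
| 8  | 13       | 0.579 | 0.01  |
| 17 | 25       | 0.54  | 0.019 |
| 0  | 1        | 0.258 | 0.085 |
| 16 | 23       | 0.077 | 0.105 |
| 6  | 10       | 0.046 | 0.029 |
| 4  | 6        | 0.045 | 0.013 |
| 2  | 4        | 0.019 | 0.009 |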


In [11]:
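# load the best trial's pipeline (its mean of 0.587 would place it at rank #6 on the leaderboard)
# note: `results` must already be defined by running the cells above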
best_trial_id = int(results.iloc[0]['trial_id'])
pipeline = Pipeline.load(f"clearance_microsome/trial_{best_trial_id}/")

NameError: name 'results' is not defined
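
The "rank #6" note in the last cell can be checked against the leaderboard with a short sketch. The helper `would_rank` is a hypothetical name introduced for illustration. It compares mean scores only and ignores the standard deviations.

```python
# Spearman means copied from the leaderboard table, best first.
leaderboard = [0.630, 0.626, 0.625, 0.599, 0.597, 0.586, 0.585, 0.578,
               0.572, 0.555, 0.532, 0.529, 0.518, 0.492, 0.365, 0.252]

def would_rank(score, board):
    """Return the 1-based rank a new score would take (higher is better)."""
    return 1 + sum(1 for existing in board if existing > score)

best_mean = 0.587  # mean of trial 12, the top row of the sorted results
print(would_rank(best_mean, leaderboard))  # 6
```

Five existing entries score above 0.587, so the best trial would slot in just ahead of RDKit2D + MLP (0.586).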