# Bioavailability, Ma et al.

### Dataset Description: Oral bioavailability is defined as “the rate and extent to which the active ingredient or active moiety is absorbed from a drug product and becomes available at the site of action”.

### Task Description: Binary classification. Given a drug SMILES string, predict its oral bioavailability label (active or inactive).

### Dataset Statistics: 640 drugs.

### Metric: AUROC
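
As a reminder of what the metric measures, AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. The function name `auroc`, the toy labels and scores, and the reading of label 1 as "bioavailable" are illustrative assumptions, not part of the benchmark code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example with made-up values (assumed: 1 = bioavailable, 0 = not).
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic pairwise loop is kept only for clarity; a library routine would normally be used on real data.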

## Leaderboard

| Rank | Model                       | Contact           | Link          | #Params   | AUROC         |
|------|-----------------------------|-------------------|---------------|-----------|---------------|
| 1    | SimGCN                      | Suman Kalyan Bera | GitHub, Paper | 1,103,000 | 0.748 ± 0.033 |
| 2    | MapLight + GNN              | Jim Notwell       | GitHub, Paper | N/A       | 0.742 ± 0.010 |
| 3    | MapLight                    | Jim Notwell       | GitHub, Paper | N/A       | 0.730 ± 0.010 |
| 4    | ZairaChem                   | Gemma Turon       | GitHub, Paper | N/A       | 0.706 ± 0.031 |
| 5    | RDKit2D + MLP (DeepPurpose) | Kexin Huang       | GitHub, Paper | 633,409   | 0.672 ± 0.021 |
| 6    | ContextPred                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.671 ± 0.026 |
| 7    | Chemprop-RDKit              | Kyle Swanson      | GitHub, Paper | N/A       | 0.667 ± 0.068 |
| 8    | AttentiveFP                 | Kexin Huang       | GitHub, Paper | 300,806   | 0.632 ± 0.039 |
| 9    | NeuralFP                    | Kexin Huang       | GitHub, Paper | 480,193   | 0.632 ± 0.036 |
| 10   | Euclia ML model             | Euclia            | GitHub, Paper | 50        | 0.613 ± 0.015 |
| 11   | CNN (DeepPurpose)           | Kexin Huang       | GitHub, Paper | 226,625   | 0.613 ± 0.013 |
| 12   | Chemprop                    | Kyle Swanson      | GitHub, Paper | N/A       | 0.581 ± 0.024 |
| 13   | Morgan + MLP (DeepPurpose)  | Kexin Huang       | GitHub, Paper | 1,477,185 | 0.581 ± 0.086 |
| 14   | AttrMasking                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.577 ± 0.087 |
| 15   | GCN                         | Kexin Huang       | GitHub, Paper | 191,810   | 0.566 ± 0.115 |
| 16   | Basic ML                    | Nilavo Boral      | GitHub, Paper | N/A       | 0.523 ± 0.011 |

In [1]:
import pandas as pd
from deepmol.pipeline import Pipeline

Skipped loading modules with pytorch-lightning dependency, missing a dependency. No module named 'pytorch_lightning'
Skipped loading some Jax models, missing a dependency. jax requires jaxlib to be installed. See https://github.com/google/jax#installation for installation instructions.


In [2]:
# Read the per-trial test-set results (the file has no header row)
results = pd.read_csv('bioavailability/tdc_test_set_results.txt', sep=',', header=None, dtype={0: int, 1: float, 2: float})
# Name the columns: trial ID, mean score, and standard deviation
results.columns = ['trial_id', 'mean', 'std']
results

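|    | trial_id | mean  | std   |
|----|----------|-------|-------|
| 0  | 0        | 0.532 | 0.0   |
| 1  | 2        | 0.5   | 0.0   |
| 2  | 8        | 0.571 | 0.018 |
| 3  | 11       | 0.514 | 0.018 |
| 4  | 12       | 0.519 | 0.016 |
| 5  | 13       | 0.53  | 0.004 |
| 6  | 14       | 0.49  | 0.03  |
| 7  | 15       | 0.498 | 0.009 |
| 8  | 16       | 0.621 | 0.023 |
| 9  | 18       | 0.503 | 0.02  |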


In [3]:
# Sort by mean, descending; ties in mean fall back to std (also descending)
results = results.sort_values(by=['mean', 'std'], ascending=False)
results

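|    | trial_id | mean  | std   |
|----|----------|-------|-------|
| 39 | 69       | 0.645 | 0.027 |
| 18 | 35       | 0.632 | 0.012 |
| 8  | 16       | 0.621 | 0.023 |
| 27 | 48       | 0.608 | 0.021 |
| 26 | 44       | 0.604 | 0.02  |
| 50 | 89       | 0.584 | 0.052 |
| 28 | 49       | 0.577 | 0.014 |
| 34 | 58       | 0.576 | 0.02  |
| 52 | 91       | 0.573 | 0.017 |
| 2  | 8        | 0.571 | 0.018 |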
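
The sort in this cell uses `ascending=False` for both keys, so trials with an equal mean are ordered with the larger standard deviation first. If one would rather prefer the steadier trial on a tie, a possible variant is to sort the mean descending and the std ascending. The sketch below shows that ordering on plain tuples; the `(trial_id, mean, std)` rows with IDs 101 to 103 are hypothetical values made up to force a tie and are not taken from the results file.

```python
# Hypothetical (trial_id, mean, std) rows; trials 101 and 102 tie on the mean.
rows = [
    (101, 0.600, 0.030),
    (102, 0.600, 0.010),
    (103, 0.550, 0.005),
]

# Mean descending, then std ascending: the steadier trial wins a tie.
ranked = sorted(rows, key=lambda r: (-r[1], r[2]))
preferred_trial_id = ranked[0][0]
```

In pandas, the same ordering can be requested with `results.sort_values(by=['mean', 'std'], ascending=[False, True])`. None of the ten rows displayed for this cell tie on the mean, so the choice does not change the output shown here.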


In [4]:
# Load the best trial's pipeline (its mean of 0.645 would rank #8 on the leaderboard above)
best_trial_id = int(results.iloc[0]['trial_id'])
pipeline = Pipeline.load(f"bioavailability/trial_{best_trial_id}/")

[21:05:33] Initializing Normalizer
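
The rank quoted in the comment of this cell can be checked against the leaderboard: a score's position is one plus the number of entries with a strictly higher mean AUROC. The sketch below does this for the best trial's mean of 0.645. The list holds the mean AUROC column of the leaderboard table, and the helper name `leaderboard_rank` is an illustrative assumption.

```python
# Mean AUROC values copied from the leaderboard table, in rank order.
leaderboard_means = [
    0.748, 0.742, 0.730, 0.706, 0.672, 0.671, 0.667, 0.632,
    0.632, 0.613, 0.613, 0.581, 0.581, 0.577, 0.566, 0.523,
]


def leaderboard_rank(score, means):
    """Return the 1-based rank a score would take among the existing means."""
    return 1 + sum(1 for m in means if m > score)


best_trial_rank = leaderboard_rank(0.645, leaderboard_means)
```

Seven entries score above 0.645 (down to Chemprop-RDKit at 0.667), which places the trial just ahead of the two models tied at 0.632.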


In [5]:
pipeline.steps

[('standardizer',
  <deepmol.standardizer.custom_standardizer.CustomStandardizer at 0x7f6f3697fc70>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.MorganFingerprint at 0x7f6f369be740>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f6f36847670>),
 ('feature_selector',
  <deepmol.feature_selection.base_feature_selector.BorutaAlgorithm at 0x7f6f3686b100>),
 ('model',
  SklearnModel(model=MultinomialNB(alpha=0.46480582874453724),
               model_dir='bioavailability/trial_69/model/model.pkl'))]
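
The final step is a multinomial naive Bayes classifier, which treats the selected fingerprint bits as feature counts and applies additive smoothing `alpha` so that bits never seen in a class do not yield a zero probability. The following is a minimal plain-Python sketch of that idea, not DeepMol's or scikit-learn's implementation. The function names, the 4-bit fingerprints, and the labels are made up for illustration; only the `alpha` value is taken from the output above.

```python
import math


def fit_multinomial_nb(X, y, alpha):
    """Fit class log-priors and smoothed per-feature log-likelihoods."""
    n_features = len(X[0])
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        # Per-feature totals for this class, with additive smoothing alpha.
        counts = [sum(row[j] for row in rows) + alpha for j in range(n_features)]
        total = sum(counts)
        log_likelihood[c] = [math.log(v / total) for v in counts]
    return log_prior, log_likelihood


def predict(x, log_prior, log_likelihood):
    """Return the class with the highest posterior log-score for one sample."""
    scores = {
        c: log_prior[c] + sum(xj * ll for xj, ll in zip(x, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Made-up 4-bit fingerprints with toy labels (not real molecules).
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
y = [1, 1, 0, 0]
prior, likelihood = fit_multinomial_nb(X, y, alpha=0.46480582874453724)
predicted = predict([1, 1, 0, 0], prior, likelihood)
```

With `alpha` greater than zero, every smoothed count is positive, so the logarithms are always defined even for bits that never occur in a class.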