# CYP2D6 Substrate, Carbon-Mangels et al.

### Dataset Description: CYP2D6 is primarily expressed in the liver. It is also highly expressed in areas of the central nervous system, including the substantia nigra. TDC used a dataset from [1], which merged information on substrates and nonsubstrates from six publications.

### Task Description: Binary Classification. Given a drug SMILES string, predict whether it is a substrate of the enzyme.

### Dataset Statistics: 664 drugs.

### Metric: AUPRC
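
As a rough illustration of the metric, the sketch below computes the area under the precision-recall curve with the step-wise "average precision" estimator, using only the standard library. This is one common estimator and is an assumption here; the exact estimator used by the benchmark's evaluator is not stated in this document. The function name `average_precision` and the toy labels and scores are hypothetical.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.

    y_true  : list of 0/1 labels (1 = substrate)
    y_score : list of predicted scores, higher = more likely positive
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from highest to lowest score. Tied scores keep their
    # input order, which may differ from threshold-based implementations.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)

    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            # Precision at this rank, accumulated once per positive found.
            total += true_positives / rank
    return total / n_pos


# Toy example with made-up labels and scores.
score = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(score, 3))
```

Unlike ROC-AUC, this value depends on the fraction of positives in the data, so a random ranking scores roughly the positive rate rather than 0.5.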

## Leaderboard

| Rank | Model                       | Contact      | Link          | #Params   | AUPRC         |
|------|-----------------------------|--------------|---------------|-----------|---------------|
| 1    | ContextPred                 | Kexin Huang  | GitHub, Paper | 2,067,053 | 0.736 ± 0.024 |
| 2    | MapLight + GNN              | Jim Notwell  | GitHub, Paper | N/A       | 0.720 ± 0.002 |
| 3    | MapLight                    | Jim Notwell  | GitHub, Paper | N/A       | 0.713 ± 0.009 |
| 4    | AttrMasking                 | Kexin Huang  | GitHub, Paper | 2,067,053 | 0.704 ± 0.028 |
| 5    | Chemprop-RDKit              | Kyle Swanson | GitHub, Paper | N/A       | 0.686 ± 0.031 |
| 6    | ZairaChem                   | Gemma Turon  | GitHub, Paper | N/A       | 0.685 ± 0.029 |
| 7    | RDKit2D + MLP (DeepPurpose) | Kexin Huang  | GitHub, Paper | 633,409   | 0.677 ± 0.047 |
| 8    | Morgan + MLP (DeepPurpose)  | Kexin Huang  | GitHub, Paper | 1,477,185 | 0.671 ± 0.066 |
| 9    | Chemprop                    | Kyle Swanson | GitHub, Paper | N/A       | 0.632 ± 0.037 |
| 10   | GCN                         | Kexin Huang  | GitHub, Paper | 191,810   | 0.617 ± 0.039 |
| 11   | AttentiveFP                 | Kexin Huang  | GitHub, Paper | 300,806   | 0.574 ± 0.030 |
| 12   | NeuralFP                    | Kexin Huang  | GitHub, Paper | 480,193   | 0.572 ± 0.062 |
| 13   | Euclia ML model             | Euclia       | GitHub, Paper | 50        | 0.498 ± 0.015 |
| 14   | CNN (DeepPurpose)           | Kexin Huang  | GitHub, Paper | 226,625   | 0.485 ± 0.037 |
| 15   | Basic ML                    | Nilavo Boral | GitHub, Paper | N/A       | 0.478 ± 0.018 |
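
Each AUPRC cell in the table has the form `mean ± std`. A minimal sketch of how such a cell could be produced from repeated runs is shown below. It assumes the value summarizes several runs with different random seeds and that a population standard deviation is used; neither detail is stated in this document. The per-seed scores and the helper `summarize` are hypothetical.

```python
from statistics import mean, pstdev


def summarize(scores, digits=3):
    """Format a list of per-run scores as 'mean ± std'."""
    m = mean(scores)
    s = pstdev(scores)  # population std; swap in statistics.stdev for the sample std
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Made-up AUPRC values from five hypothetical seeds.
per_seed_auprc = [0.70, 0.72, 0.74, 0.76, 0.73]
print(summarize(per_seed_auprc))
```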

In [2]:
import pandas as pd
from deepmol.pipeline import Pipeline


In [3]:
# read results
results = pd.read_csv('results_test_set/cyp2d6_substrate_test_set.csv')
# set columns
results.columns = ['trial_id', 'mean', 'std']
results
# order res

Unnamed: 0,trial_id,mean,std
0,53.0,0.498,0.003
1,68.0,0.652,0.008
2,55.0,0.592,0.014
3,60.0,0.591,0.01
4,63.0,0.588,0.004
5,56.0,0.569,0.008
6,91.0,0.6,0.008
7,62.0,0.581,0.01
8,70.0,0.731,0.037
9,61.0,0.577,0.004


In [4]:
# sort by mean, highest first (ties are broken by std, also highest first)
results = results.sort_values(by=['mean', 'std'], ascending=False)
results

Unnamed: 0,trial_id,mean,std
8,70.0,0.731,0.037
1,68.0,0.652,0.008
10,voting_pipeline,0.639,0.007
6,91.0,0.6,0.008
2,55.0,0.592,0.014
3,60.0,0.591,0.01
4,63.0,0.588,0.004
7,62.0,0.581,0.01
9,61.0,0.577,0.004
5,56.0,0.569,0.008
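
To see where these scores would sit among the leaderboard entries, the sketch below counts how many leaderboard means are strictly higher than a given score. The leaderboard means are copied from the table above; the helper `placement` is hypothetical. The comparison assumes the trial scores are AUPRC values obtained under the same split and protocol as the leaderboard, which this notebook does not confirm, and it ignores the standard deviations.

```python
# AUPRC means copied from the leaderboard table, best first.
LEADERBOARD_MEANS = [
    0.736, 0.720, 0.713, 0.704, 0.686, 0.685, 0.677, 0.671,
    0.632, 0.617, 0.574, 0.572, 0.498, 0.485, 0.478,
]


def placement(score, board=LEADERBOARD_MEANS):
    """Return the 1-based rank `score` would take on `board`.

    The rank is one plus the number of entries that are strictly better.
    """
    return 1 + sum(1 for m in board if m > score)


best_trial_rank = placement(0.731)  # mean of trial 70 in the sorted results
voting_rank = placement(0.639)      # mean of the voting_pipeline row
print(best_trial_rank, voting_rank)
```

With the means listed, trial 70 would place second and the voting pipeline ninth. Since trial 70's interval (0.731 ± 0.037) overlaps that of the top entry (0.736 ± 0.024), the ordering by mean alone should be read with caution.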


In [6]:
# load best trial pipeline (rank #...)
pipeline = Pipeline.load("cyp2d6_substrate/trial_70/")
pipeline.steps

[('standardizer',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f29267e1360>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_descriptors.TwoDimensionDescriptors at 0x7f29267e10c0>),
 ('scaler', <deepmol.scalers.sklearn_scalers.Binarizer at 0x7f29267e1060>),
 ('feature_selector',
  <deepmol.feature_selection.base_feature_selector.PercentilFS at 0x7f27c3bf2dd0>),
 ('model',
  SklearnModel(model=GradientBoostingClassifier(criterion='squared_error',
                                                learning_rate=0.6277616525761669,
                                                loss='deviance',
                                                max_features='log2',
                                                n_estimators=200),
               model_dir='cyp2d6_substrate/trial_70/model/model.pkl'))]