# CYP P450 2D6 Inhibition, Veith et al.

### Dataset Description: The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. Specifically, CYP2D6 is primarily expressed in the liver. It is also highly expressed in areas of the central nervous system, including the substantia nigra.

### Task Description: Binary Classification. Given a drug SMILES string, predict CYP2D6 inhibition.

### Dataset Statistics: 13,130 drugs.

### Metric: AUPRC
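
AUPRC (area under the precision-recall curve) is commonly estimated as average precision, which is what scikit-learn's `average_precision_score` computes. The sketch below is a minimal pure-Python version of that idea, not the benchmark's own scoring code. The function name `average_precision` and the toy labels and scores are illustrative assumptions, and the function assumes there are no tied scores.

```python
def average_precision(y_true, y_score):
    """Mean of precision@k taken at the rank k of each positive example.

    Illustrative sketch: assumes binary labels (0/1) and no tied scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    hits = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k  # precision among the top-k predictions
    return total / n_pos


# Toy example (hypothetical data): 1 = inhibitor, 0 = non-inhibitor.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
ap = average_precision(y_true, y_score)
print(round(ap, 3))  # 0.833
```

Unlike ROC-AUC, this score depends on the fraction of positives, so a random classifier scores roughly the positive rate rather than 0.5.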

## Leaderboard

| Rank | Model                       | Contact      | Link          | #Params   | AUPRC         |
|------|-----------------------------|--------------|---------------|-----------|---------------|
| 1    | MapLight + GNN              | Jim Notwell  | GitHub, Paper | N/A       | 0.790 ± 0.001 |
| 2    | ContextPred                 | Kexin Huang  | GitHub, Paper | 2,067,053 | 0.739 ± 0.005 |
| 3    | MapLight                    | Jim Notwell  | GitHub, Paper | N/A       | 0.723 ± 0.003 |
| 4    | AttrMasking                 | Kexin Huang  | GitHub, Paper | 2,067,053 | 0.721 ± 0.009 |
| 5    | Chemprop-RDKit              | Kyle Swanson | GitHub, Paper | N/A       | 0.673 ± 0.007 |
| 6    | Chemprop                    | Kyle Swanson | GitHub, Paper | N/A       | 0.649 ± 0.016 |
| 7    | AttentiveFP                 | Kexin Huang  | GitHub, Paper | 300,806   | 0.646 ± 0.014 |
| 8    | ZairaChem                   | Gemma Turon  | GitHub, Paper | N/A       | 0.644 ± 0.085 |
| 9    | NeuralFP                    | Kexin Huang  | GitHub, Paper | 480,193   | 0.627 ± 0.009 |
| 10   | GCN                         | Kexin Huang  | GitHub, Paper | 191,810   | 0.616 ± 0.020 |
| 11   | RDKit2D + MLP (DeepPurpose) | Kexin Huang  | GitHub, Paper | 633,409   | 0.616 ± 0.007 |
| 12   | Morgan + MLP (DeepPurpose)  | Kexin Huang  | GitHub, Paper | 1,477,185 | 0.587 ± 0.011 |
| 13   | CNN (DeepPurpose)           | Kexin Huang  | GitHub, Paper | 226,625   | 0.544 ± 0.053 |
| 14   | Basic ML                    | Nilavo Boral | GitHub, Paper | N/A       | 0.358 ± 0.000 |
| 15   | Euclia ML model             | Euclia       | GitHub, Paper | 50        | 0.348 ± 0.004 |

In [1]:
import pandas as pd
from deepmol.pipeline import Pipeline


In [2]:
# Read the test-set results
results = pd.read_csv('results_test_set/cyp2d6_test_set.csv')
# Rename the columns
results.columns = ['trial_id', 'mean', 'std']
results

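|    | trial_id | mean  | std   |
|----|----------|-------|-------|
| 0  | 14.0     | 0.685 | 0.0   |
| 1  | 52.0     | 0.406 | 0.002 |
| 2  | 80.0     | 0.654 | 0.003 |
| 3  | 41.0     | 0.402 | 0.009 |
| 4  | 17.0     | 0.663 | 0.001 |
| 5  | 74.0     | 0.658 | 0.001 |
| 6  | 15.0     | 0.424 | 0.003 |
| 7  | 79.0     | 0.659 | 0.002 |
| 8  | 81.0     | 0.654 | 0.001 |
| 9  | 85.0     | 0.658 | 0.003 |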


In [3]:
# order results by mean (std in case of tie)
results = results.sort_values(by=['mean', 'std'], ascending=False)
results

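|    | trial_id        | mean  | std   |
|----|-----------------|-------|-------|
| 0  | 14.0            | 0.685 | 0.0   |
| 10 | voting_pipeline | 0.681 | 0.001 |
| 4  | 17.0            | 0.663 | 0.001 |
| 7  | 79.0            | 0.659 | 0.002 |
| 9  | 85.0            | 0.658 | 0.003 |
| 5  | 74.0            | 0.658 | 0.001 |
| 2  | 80.0            | 0.654 | 0.003 |
| 8  | 81.0            | 0.654 | 0.001 |
| 6  | 15.0            | 0.424 | 0.003 |
| 1  | 52.0            | 0.406 | 0.002 |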
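
To see where these trials would sit on the leaderboard, the mean AUPRC values can be compared directly against the leaderboard column. The sketch below assumes the test-set scores are directly comparable to the leaderboard metric and ignores the standard deviations. The helper name `would_be_rank` is hypothetical.

```python
# Mean AUPRC values copied from the leaderboard table, best first.
leaderboard = [
    0.790, 0.739, 0.723, 0.721, 0.673, 0.649, 0.646, 0.644,
    0.627, 0.616, 0.616, 0.587, 0.544, 0.358, 0.348,
]


def would_be_rank(score, board):
    """1-based rank a new score would take: one more than the number of strictly better entries."""
    return 1 + sum(1 for existing in board if existing > score)


best_trial_rank = would_be_rank(0.685, leaderboard)  # trial 14
voting_rank = would_be_rank(0.681, leaderboard)      # voting_pipeline
print(best_trial_rank, voting_rank)  # 5 5
```

Both scores fall between AttrMasking (0.721) and Chemprop-RDKit (0.673), so either would enter at rank 5.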


In [4]:
# Load the best trial's pipeline (trial 14; its 0.685 AUPRC would place it at rank #5 on the leaderboard)
pipeline = Pipeline.load("cyp2d6/trial_14/")
pipeline.steps

[11:39:17] Initializing Normalizer


[('standardizer',
  <deepmol.standardizer.custom_standardizer.CustomStandardizer at 0x7f213a823460>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.RDKFingerprint at 0x7f213a9a2e60>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f213a859660>),
 ('feature_selector',
  <deepmol.feature_selection.base_feature_selector.BorutaAlgorithm at 0x7f213a891060>),
 ('model',
  SklearnModel(model=StackingClassifier(estimators=[('lr', LogisticRegression()),
                                                    ('svc', SVC()),
                                                    ('rfr',
                                                     RandomForestClassifier()),
                                                    ('gbr',
                                                     GradientBoostingClassifier())],
                                        final_estimator=MLPClassifier()),
               model_dir='cyp2d6/trial_14/model/model.pkl'))]
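
The final step of this pipeline is a scikit-learn `StackingClassifier`: four base classifiers whose cross-validated predictions feed an `MLPClassifier` meta-learner. The sketch below rebuilds only that model step on synthetic data to show how the pieces fit together. The synthetic dataset, the random seeds, and the reduced `n_estimators` and raised `max_iter` settings are assumptions made to keep the example fast. The saved pipeline printed above uses the estimators' default arguments and RDKit fingerprint features selected by Boruta, neither of which is reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the fingerprint feature matrix (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Same estimator names and types as in the printed pipeline step.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svc", SVC()),
        ("rfr", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("gbr", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ],
    final_estimator=MLPClassifier(max_iter=500, random_state=0),
)
stack.fit(X, y)

# Probability of the positive class, the kind of score AUPRC is computed from.
proba = stack.predict_proba(X)[:, 1]
```

With the default `stack_method="auto"`, `SVC` contributes its `decision_function` output because it exposes no `predict_proba` unless `probability=True` is set.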