# CYP P450 2C9 Inhibition, Veith et al.

### Dataset Description: The cytochrome P450 (CYP) genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. CYP2C9 in particular plays a major role in the oxidation of both xenobiotic and endogenous compounds.

### Task Description: Binary classification. Given a drug SMILES string, predict whether the drug inhibits CYP2C9.

### Dataset Statistics: 12,092 drugs.

### Metric: AUPRC (area under the precision-recall curve; higher is better)
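To make the metric concrete, here is a minimal pure-Python sketch of AUPRC computed as average precision: the sum, over score thresholds, of precision weighted by the increase in recall. The function name `average_precision` and the toy labels and scores are illustrative assumptions, not part of the benchmark code; in practice a library routine such as scikit-learn's `average_precision_score` would normally be used instead.

```python
def average_precision(y_true, y_score):
    """Average precision for binary labels (1 = positive) and real-valued scores.

    Samples that share a score are treated as one threshold.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")

    # Walk through the samples from the highest score to the lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    ap = 0.0
    true_positives = 0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        # Consume every sample tied at the current score.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            true_positives += pairs[j][1]
            j += 1
        recall = true_positives / n_pos
        precision = true_positives / j  # j samples are predicted positive so far
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy example (made-up labels and scores, not benchmark data).
toy_ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# A model that gives every sample the same score.
constant_ap = average_precision([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(toy_ap, constant_ap)
```

In the second call every sample gets the same score, so the result equals the fraction of positives (0.25 here). This is why the AUPRC of an uninformative model depends on the class balance of the data rather than being fixed at 0.5.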

## Leaderboard

| Rank | Model                                    | Contact         | Link          | #Params   | AUPRC         |
|------|------------------------------------------|-----------------|---------------|-----------|---------------|
| 1    | MapLight + GNN                           | Jim Notwell     | GitHub, Paper | N/A       | 0.859 ± 0.001 |
| 2    | ContextPred                              | Kexin Huang     | GitHub, Paper | 2,067,053 | 0.839 ± 0.003 |
| 3    | AttrMasking                              | Kexin Huang     | GitHub, Paper | 2,067,053 | 0.829 ± 0.003 |
| 4    | ZairaChem                                | Gemma Turon     | GitHub, Paper | N/A       | 0.786 ± 0.004 |
| 5    | MapLight                                 | Jim Notwell     | GitHub, Paper | N/A       | 0.783 ± 0.002 |
| 6    | Chemprop-RDKit                           | Kyle Swanson    | GitHub, Paper | N/A       | 0.777 ± 0.003 |
| 7    | ColorRefinement + Weighted Ensemble LGBM | Parker Burchett | GitHub, Paper | 68        | 0.767 ± 0.003 |
| 8    | Chemprop                                 | Kyle Swanson    | GitHub, Paper | N/A       | 0.754 ± 0.002 |
| 9    | AttentiveFP                              | Kexin Huang     | GitHub, Paper | 300,806   | 0.749 ± 0.004 |
| 10   | RDKit2D + MLP (DeepPurpose)              | Kexin Huang     | GitHub, Paper | 633,409   | 0.742 ± 0.006 |
| 11   | NeuralFP                                 | Kexin Huang     | GitHub, Paper | 480,193   | 0.739 ± 0.010 |
| 12   | GCN                                      | Kexin Huang     | GitHub, Paper | 191,810   | 0.735 ± 0.004 |
| 13   | Morgan + MLP (DeepPurpose)               | Kexin Huang     | GitHub, Paper | 1,477,185 | 0.715 ± 0.004 |
| 14   | CNN (DeepPurpose)                        | Kexin Huang     | GitHub, Paper | 226,625   | 0.713 ± 0.006 |
| 15   | Basic ML                                 | Nilavo Boral    | GitHub, Paper | N/A       | 0.556 ± 0.000 |
| 16   | Euclia ML model                          | Euclia          | GitHub, Paper | 50        | 0.536 ± 0.003 |

In [1]:
import pandas as pd
from deepmol.pipeline import Pipeline


In [2]:
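# read the test-set results (one row per trial; scores are AUPRC)
results = pd.read_csv('results_test_set/cyp2c9_test_set.csv')
# rename the columns: trial identifier, mean score, and its standard deviation
results.columns = ['trial_id', 'mean', 'std']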
results

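|   | trial_id | mean  | std   |
|---|----------|-------|-------|
| 0 | 83.0     | 0.59  | 0.0   |
| 1 | 13.0     | 0.712 | 0.004 |
| 2 | 75.0     | 0.698 | 0.006 |
| 3 | 52.0     | 0.538 | 0.0   |
| 4 | 0.0      | 0.71  | 0.003 |
| 5 | 91.0     | 0.536 | 0.0   |
| 6 | 74.0     | 0.537 | 0.0   |
| 7 | 61.0     | 0.537 | 0.0   |
| 8 | 71.0     | 0.536 | 0.0   |
| 9 | 70.0     | 0.536 | 0.0   |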


In [3]:
# sort by mean score, highest first (ties broken by std, also descending)
results = results.sort_values(by=['mean', 'std'], ascending=False)
results

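|    | trial_id        | mean  | std   |
|----|-----------------|-------|-------|
| 10 | voting_pipeline | 0.758 | 0.002 |
| 1  | 13.0            | 0.712 | 0.004 |
| 4  | 0.0             | 0.71  | 0.003 |
| 2  | 75.0            | 0.698 | 0.006 |
| 0  | 83.0            | 0.59  | 0.0   |
| 3  | 52.0            | 0.538 | 0.0   |
| 6  | 74.0            | 0.537 | 0.0   |
| 7  | 61.0            | 0.537 | 0.0   |
| 5  | 91.0            | 0.536 | 0.0   |
| 8  | 71.0            | 0.536 | 0.0   |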
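The sorted results can be placed against the leaderboard with a few lines of Python. This is a small sketch: the helper name `leaderboard_rank` is hypothetical, the scores are copied from the leaderboard table, and the comparison uses the means only, ignoring the reported standard deviations.

```python
# Mean AUPRC values copied from the leaderboard table, in rank order.
leaderboard = [
    0.859, 0.839, 0.829, 0.786, 0.783, 0.777, 0.767, 0.754,
    0.749, 0.742, 0.739, 0.735, 0.715, 0.713, 0.556, 0.536,
]


def leaderboard_rank(score, board):
    """Return the 1-based rank a score would take: one more than the number of strictly higher entries."""
    return 1 + sum(1 for s in board if s > score)


voting_rank = leaderboard_rank(0.758, leaderboard)        # voting_pipeline
best_trial_rank = leaderboard_rank(0.712, leaderboard)    # best single trial (trial 13)
print(voting_rank, best_trial_rank)
```

On these numbers the voting pipeline would land in 8th place, between ColorRefinement + Weighted Ensemble LGBM (0.767) and Chemprop (0.754), while the best single trial would land in 15th.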


In [4]:
# load the best pipeline: the voting ensemble, whose test AUPRC of 0.758 would rank #8 on the leaderboard above
from deepmol.pipeline import VotingPipeline

pipeline = VotingPipeline.load("cyp2c9/voting_pipeline/")

[11:28:17] Initializing Normalizer
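A voting ensemble combines the outputs of its member pipelines into one prediction. The notebook does not show which scheme `VotingPipeline` uses, so the following is only a generic sketch of the two common options, hard (majority) voting and soft (probability-averaging) voting. The function names and toy inputs are assumptions for illustration and are not DeepMol's API.

```python
def hard_vote(predictions):
    """Majority vote over per-model lists of 0/1 labels (a tie goes to class 1)."""
    n_models = len(predictions)
    # zip(*predictions) groups the votes of all models for each sample.
    return [1 if 2 * sum(votes) >= n_models else 0 for votes in zip(*predictions)]


def soft_vote(probabilities):
    """Average the per-model positive-class probabilities for each sample."""
    n_models = len(probabilities)
    return [sum(probs) / n_models for probs in zip(*probabilities)]


# Three hypothetical models, three samples each.
labels = hard_vote([[1, 0, 1],
                    [1, 0, 0],
                    [0, 1, 1]])

# Three hypothetical models, two samples each.
scores = soft_vote([[0.9, 0.2],
                    [0.6, 0.4],
                    [0.3, 0.6]])
print(labels, scores)
```

With an odd number of members, hard voting on binary labels cannot tie. Soft voting yields a continuous score per sample, which is the kind of output a ranking metric such as AUPRC can use most fully.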


In [5]:
pipeline.pipelines[0].steps

[('standardizer',
  <deepmol.standardizer.basic_standardizer.BasicStandardizer at 0x7fa4f20603d0>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.MorganFingerprint at 0x7fa4f21e9d80>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa4f209d5d0>),
 ('feature_selector',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa4f20d9030>),
 ('model',
  SklearnModel(model=SVC(C=9.821880599799835, degree=5, gamma=0.0392617789396249),
               model_dir='cyp2c9/voting_pipeline/trial_83/model/model.pkl'))]

In [6]:
pipeline.pipelines[1].steps

[('standardizer',
  <deepmol.standardizer.custom_standardizer.CustomStandardizer at 0x7fa4f21eb640>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.AtomPairFingerprint at 0x7fa3933eded0>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa3933edf60>),
 ('feature_selector',
  <deepmol.feature_selection.base_feature_selector.LowVarianceFS at 0x7fa3933ee050>),
 ('model',
  SklearnModel(model=GradientBoostingClassifier(criterion='squared_error',
                                                learning_rate=0.1637515741456934,
                                                loss='deviance',
                                                max_features='sqrt',
                                                n_estimators=150),
               model_dir='cyp2c9/voting_pipeline/trial_13/model/model.pkl'))]
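This pipeline is the first one with an active feature selector, `LowVarianceFS`. As the name suggests, low-variance selection drops fingerprint columns that barely change across the dataset. The sketch below illustrates the idea in plain Python; the function names, the toy fingerprint matrix, and the thresholds are assumptions, and the threshold actually used by this trial is not shown in the output above.

```python
def low_variance_mask(rows, threshold=0.0):
    """Return one flag per column: True if its population variance exceeds the threshold."""
    n_rows = len(rows)
    mask = []
    for column in zip(*rows):
        mean = sum(column) / n_rows
        variance = sum((x - mean) ** 2 for x in column) / n_rows
        mask.append(variance > threshold)
    return mask


def select_columns(rows, mask):
    """Keep only the columns flagged True in the mask."""
    return [[x for x, keep in zip(row, mask) if keep] for row in rows]


# Toy binary fingerprints: 4 molecules x 4 bits (made-up values).
fingerprints = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

mask_zero = low_variance_mask(fingerprints)          # drops constant bits only
mask_strict = low_variance_mask(fingerprints, 0.2)   # also drops the rarely set bit
reduced = select_columns(fingerprints, mask_strict)
print(mask_zero, mask_strict, reduced)
```

Bits 0 and 3 are constant and are removed at any threshold. Bit 1 is set in only one molecule (variance 0.1875), so it survives the default threshold but not the stricter one.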

In [7]:
pipeline.pipelines[2].steps

[('standardizer',
  <deepmol.standardizer.basic_standardizer.BasicStandardizer at 0x7fa393696d10>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.MorganFingerprint at 0x7fa3901c5900>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa3901c5a20>),
 ('feature_selector',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa3901c5a80>),
 ('model',
  SklearnModel(model=GradientBoostingClassifier(criterion='squared_error',
                                                learning_rate=0.3738315737081547,
                                                loss='deviance',
                                                max_features='auto',
                                                n_estimators=200),
               model_dir='cyp2c9/voting_pipeline/trial_75/model/model.pkl'))]

In [8]:
pipeline.pipelines[3].steps

[('standardizer',
  <deepmol.standardizer.basic_standardizer.BasicStandardizer at 0x7fa3901c57e0>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.MorganFingerprint at 0x7fa39020d660>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa39020d780>),
 ('feature_selector',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa39020d7e0>),
 ('model',
  SklearnModel(model=RidgeClassifierCV(alphas=6.12470421788726),
               model_dir='cyp2c9/voting_pipeline/trial_52/model/model.pkl'))]

In [9]:
pipeline.pipelines[4].steps

[('standardizer',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa39020d540>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.AtomPairFingerprint at 0x7fa39020d990>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7fa39020dab0>),
 ('feature_selector',
  <deepmol.feature_selection.base_feature_selector.LowVarianceFS at 0x7fa39020db10>),
 ('model',
  SklearnModel(model=GradientBoostingClassifier(criterion='squared_error',
                                                learning_rate=0.2028992286878416,
                                                loss='deviance',
                                                max_features='sqrt',
                                                n_estimators=150),
               model_dir='cyp2c9/voting_pipeline/trial_0/model/model.pkl'))]

In [14]:
# summarize the five member pipelines: one row per pipeline, one column per step
# each step is a (name, object) pair; type(...).__name__ gives the class name without the memory address
data = pd.DataFrame()
data['standardizer'] = [type(pipeline.pipelines[i].steps[0][1]).__name__ for i in range(5)]
data['featurizer'] = [type(pipeline.pipelines[i].steps[1][1]).__name__ for i in range(5)]
data['scaler'] = [type(pipeline.pipelines[i].steps[2][1]).__name__ for i in range(5)]
data['feature_selector'] = [type(pipeline.pipelines[i].steps[3][1]).__name__ for i in range(5)]
# keep the full repr for the model so that its hyperparameters stay visible
data['model'] = [str(pipeline.pipelines[i].steps[4][1]) for i in range(5)]
data

|   | standardizer           | featurizer          | scaler                 | feature_selector       | model                                              |
|---|------------------------|---------------------|------------------------|------------------------|----------------------------------------------------|
| 0 | BasicStandardizer      | MorganFingerprint   | PassThroughTransformer | PassThroughTransformer | SklearnModel(model=SVC(C=9.821880599799835, de...  |
| 1 | CustomStandardizer     | AtomPairFingerprint | PassThroughTransformer | LowVarianceFS          | SklearnModel(model=GradientBoostingClassifier(...  |
| 2 | BasicStandardizer      | MorganFingerprint   | PassThroughTransformer | PassThroughTransformer | SklearnModel(model=GradientBoostingClassifier(...  |
| 3 | BasicStandardizer      | MorganFingerprint   | PassThroughTransformer | PassThroughTransformer | SklearnModel(model=RidgeClassifierCV(alphas=6....  |
| 4 | PassThroughTransformer | AtomPairFingerprint | PassThroughTransformer | LowVarianceFS          | SklearnModel(model=GradientBoostingClassifier(...  |
