# Pgp (P-glycoprotein) Inhibition, Broccatelli et al.

### Dataset Description: P-glycoprotein (Pgp) is an ABC transporter protein involved in intestinal absorption, drug metabolism, and brain penetration, and its inhibition can seriously alter a drug's bioavailability and safety. In addition, inhibitors of Pgp can be used to overcome multidrug resistance.

### Task Description: Binary classification. Given a drug SMILES string, predict whether the drug inhibits Pgp.

### Dataset Statistics: 1,212 drugs.

### Metric: AUROC
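AUROC can be read as the probability that a randomly chosen inhibitor receives a higher score than a randomly chosen non-inhibitor. The sketch below computes it directly from that definition in plain Python. The labels and scores are made-up illustration values, not taken from this dataset, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up labels (1 = inhibitor) and predicted scores, for illustration only.
y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.2, 0.8]
print(auroc(y_true, y_score))
```

The pairwise loop is quadratic in the number of samples, which is fine for a dataset of this size but is chosen here for readability rather than speed.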

## Leaderboard

| Rank | Model                       | Contact           | Link          | #Params   | AUROC         |
|------|-----------------------------|-------------------|---------------|-----------|---------------|
| 1    | MapLight + GNN              | Jim Notwell       | GitHub, Paper | N/A       | 0.938 ± 0.002 |
| 2    | ZairaChem                   | Gemma Turon       | GitHub, Paper | N/A       | 0.935 ± 0.006 |
| 3    | MapLight                    | Jim Notwell       | GitHub, Paper | N/A       | 0.930 ± 0.002 |
| 4    | SimGCN                      | Suman Kalyan Bera | GitHub, Paper | 1,103,000 | 0.929 ± 0.010 |
| 5    | AttrMasking                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.929 ± 0.006 |
| 6    | ContextPred                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.923 ± 0.005 |
| 7    | RDKit2D + MLP (DeepPurpose) | Kexin Huang       | GitHub, Paper | 633,409   | 0.918 ± 0.007 |
| 8    | CNN (DeepPurpose)           | Kexin Huang       | GitHub, Paper | 226,625   | 0.908 ± 0.012 |
| 9    | NeuralFP                    | Kexin Huang       | GitHub, Paper | 480,193   | 0.902 ± 0.020 |
| 10   | GCN                         | Kexin Huang       | GitHub, Paper | 191,810   | 0.895 ± 0.021 |
| 11   | AttentiveFP                 | Kexin Huang       | GitHub, Paper | 300,806   | 0.892 ± 0.012 |
| 12   | Chemprop-RDKit              | Kyle Swanson      | GitHub, Paper | N/A       | 0.886 ± 0.016 |
| 13   | Morgan + MLP (DeepPurpose)  | Kexin Huang       | GitHub, Paper | 1,477,185 | 0.880 ± 0.006 |
| 14   | Chemprop                    | Kyle Swanson      | GitHub, Paper | N/A       | 0.860 ± 0.036 |
| 15   | Euclia ML model             | Euclia            | GitHub, Paper | 50        | 0.845 ± 0.003 |
| 16   | Basic ML                    | Nilavo Boral      | GitHub, Paper | N/A       | 0.818 ± 0.000 |

In [1]:
import pandas as pd
from deepmol.pipeline import Pipeline


In [2]:
pipeline = Pipeline.load("pgp/trial_36")

[08:59:14] Initializing Normalizer


In [3]:
pipeline.steps

[('standardizer',
  <deepmol.standardizer.custom_standardizer.CustomStandardizer at 0x7f8ff9661420>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.MorganFingerprint at 0x7f8e51154f40>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f8e511ca3b0>),
 ('feature_selector',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f8e50d59e10>),
 ('model',
  SklearnModel(model=LogisticRegression(C=0.03619332830645828),
               model_dir='pgp/trial_36/model/model.pkl'))]
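The repr above is easier to scan when reduced to step names and class names. The sketch below does that for any list of `(name, object)` pairs shaped like `pipeline.steps`. The stand-in classes are hypothetical placeholders that only mirror the class names in the output, so the snippet runs without DeepMol installed. Treating `PassThroughTransformer` as a no-op stage is an assumption based on its name.

```python
def summarize_steps(steps):
    """Return (step name, class name) pairs for a list shaped like pipeline.steps."""
    return [(name, type(obj).__name__) for name, obj in steps]


def active_steps(steps, passthrough_name="PassThroughTransformer"):
    """Drop steps whose class name marks them as pass-through stages (assumed no-ops)."""
    return [(n, c) for n, c in summarize_steps(steps) if c != passthrough_name]


# Hypothetical stand-ins that only mirror the class names printed above.
class CustomStandardizer:
    pass


class MorganFingerprint:
    pass


class PassThroughTransformer:
    pass


class SklearnModel:
    pass


demo_steps = [
    ("standardizer", CustomStandardizer()),
    ("featurizer", MorganFingerprint()),
    ("scaler", PassThroughTransformer()),
    ("feature_selector", PassThroughTransformer()),
    ("model", SklearnModel()),
]

for name, cls in active_steps(demo_steps):
    print(f"{name}: {cls}")
```

With the real object, `active_steps(pipeline.steps)` would be the equivalent call. For trial 36 this would leave the standardizer, the Morgan fingerprint featurizer, and the logistic regression model.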

In [2]:
# read the test-set results (no header row; columns are trial id, mean, std)
results = pd.read_csv('pgp/tdc_test_set_results.txt', sep=',', header=None, dtype={0: int, 1: float, 2: float})
# name the columns
results.columns = ['trial_id', 'mean', 'std']
results

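|     | trial_id | mean  | std   |
|-----|----------|-------|-------|
| 0   | 0        | 0.704 | 0.023 |
| 1   | 2        | 0.682 | 0.034 |
| 2   | 5        | 0.583 | 0.006 |
| 3   | 6        | 0.536 | 0.072 |
| 4   | 8        | 0.760 | 0.047 |
| ... | ...      | ...   | ...   |
| 62  | 94       | 0.725 | 0.009 |
| 63  | 95       | 0.504 | 0.007 |
| 64  | 96       | 0.577 | 0.009 |
| 65  | 98       | 0.803 | 0.020 |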


In [3]:
# sort by mean, descending (ties broken by std, also descending)
results = results.sort_values(by=['mean', 'std'], ascending=False)
results

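|     | trial_id | mean  | std   |
|-----|----------|-------|-------|
| 21  | 43       | 0.846 | 0.014 |
| 20  | 42       | 0.844 | 0.011 |
| 36  | 64       | 0.839 | 0.014 |
| 35  | 63       | 0.832 | 0.017 |
| 51  | 82       | 0.832 | 0.008 |
| ... | ...      | ...   | ...   |
| 64  | 96       | 0.577 | 0.009 |
| 3   | 6        | 0.536 | 0.072 |
| 63  | 95       | 0.504 | 0.007 |
| 10  | 18       | 0.500 | 0.000 |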
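The best mean in this table (0.846, trial 43) can be placed against the leaderboard at the top of the notebook by counting how many entries score strictly higher. The sketch below does that with the AUROC means copied from the leaderboard table. It assumes the two sets of numbers are directly comparable, that is, the same metric on the same test split.

```python
# AUROC means copied from the leaderboard table, best first.
LEADERBOARD = [0.938, 0.935, 0.930, 0.929, 0.929, 0.923, 0.918, 0.908,
               0.902, 0.895, 0.892, 0.886, 0.880, 0.860, 0.845, 0.818]


def leaderboard_rank(score, board=LEADERBOARD):
    """Rank a new score would take: 1 + number of entries strictly above it."""
    return 1 + sum(1 for s in board if s > score)


print(leaderboard_rank(0.846))  # best trial in the sorted results; prints 15
```

Fourteen entries score above 0.846, so the trial would slot in at rank 15, just ahead of the current rank-15 entry (0.845).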


In [4]:
# load best trial pipeline (rank #15)
best_trial_id = int(results.iloc[0]['trial_id'])
pipeline = Pipeline.load(f"pgp/trial_{best_trial_id}/")

[14:49:28] Initializing Normalizer


FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpr5ctodq7/model.pkl'
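The traceback does not show why `model.pkl` is missing. One possibility, which is an assumption here, is that the saved directory for the best trial is incomplete. The sketch below walks the trials in ranked order and returns the first one whose model file exists on disk, so loading can fall back to the next-best trial. The layout `pgp/trial_<id>/model/model.pkl` is inferred from the `model_dir` shown for trial 36, and `first_loadable_trial` is a hypothetical helper, not part of DeepMol.

```python
import tempfile
from pathlib import Path


def first_loadable_trial(trial_ids, root="pgp", model_relpath="model/model.pkl"):
    """Return the first trial id whose model file exists under root, else None.

    Assumes the layout <root>/trial_<id>/model/model.pkl, inferred from the
    model_dir printed for trial 36 earlier in this notebook.
    """
    for trial_id in trial_ids:
        if (Path(root) / f"trial_{trial_id}" / model_relpath).is_file():
            return trial_id
    return None


# Demo on a throwaway directory: only trial 42 has a saved model file.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "trial_42" / "model" / "model.pkl"
    model_path.parent.mkdir(parents=True)
    model_path.touch()
    print(first_loadable_trial([43, 42, 64], root=tmp))  # prints 42
```

In the notebook, the ranked ids would come from the sorted frame, for example `[int(t) for t in results['trial_id']]`, and the returned id would then be passed to `Pipeline.load`. This check only confirms the file exists. It does not guarantee that the pipeline loads successfully.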