# hERG blockers

### Dataset Description: Human ether-à-go-go related gene (hERG) is crucial for the coordination of the heart's beating. Thus, if a drug blocks the hERG, it could lead to severe adverse effects. Therefore, reliable prediction of hERG liability in the early stages of drug design is quite important to reduce the risk of cardiotoxicity-related attritions in the later development stages.

### Task Description: Binary classification. Given a drug SMILES string, predict whether it blocks (1) or not blocks (0).

### Dataset Statistics: 648 drugs.

### Metric: AUROC

## Leaderboard

| Rank | Model                       | Contact           | Link          | #Params   | AUROC         |
|------|-----------------------------|-------------------|---------------|-----------|---------------|
| 1    | MapLight + GNN              | Jim Notwell       | GitHub, Paper | N/A       | 0.880 ± 0.002 |
| 2    | SimGCN                      | Suman Kalyan Bera | GitHub, Paper | 1,103,000 | 0.874 ± 0.014 |
| 3    | MapLight                    | Jim Notwell       | GitHub, Paper | N/A       | 0.871 ± 0.004 |
| 4    | ZairaChem                   | Gemma Turon       | GitHub, Paper | N/A       | 0.856 ± 0.009 |
| 5    | RDKit2D + MLP (DeepPurpose) | Kexin Huang       | GitHub, Paper | 633,409   | 0.841 ± 0.020 |
| 6    | Chemprop-RDKit              | Kyle Swanson      | GitHub, Paper | N/A       | 0.840 ± 0.007 |
| 7    | AttentiveFP                 | Kexin Huang       | GitHub, Paper | 300,806   | 0.825 ± 0.007 |
| 8    | AttrMasking                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.778 ± 0.046 |
| 9    | ContextPred                 | Kexin Huang       | GitHub, Paper | 2,067,053 | 0.756 ± 0.023 |
| 10   | CNN (DeepPurpose)           | Kexin Huang       | GitHub, Paper | 226,625   | 0.754 ± 0.037 |
| 11   | Euclia ML model             | Euclia            | GitHub, Paper | 50        | 0.749 ± 0.032 |
| 12   | GCN                         | Kexin Huang       | GitHub, Paper | 191,810   | 0.738 ± 0.038 |
| 13   | Morgan + MLP (DeepPurpose)  | Kexin Huang       | GitHub, Paper | 1,477,185 | 0.736 ± 0.023 |
| 14   | NeuralFP                    | Kexin Huang       | GitHub, Paper | 480,193   | 0.722 ± 0.034 |
| 15   | Chemprop                    | Kyle Swanson      | GitHub, Paper | N/A       | 0.721 ± 0.045 |
| 16   | Basic ML                    | Nilavo Boral      | GitHub, Paper | N/A       | 0.715 ± 0.011 |

In [2]:
import pandas as pd
from deepmol.pipeline import Pipeline

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
2024-01-24 13:51:21.430261: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-24 13:51:21.503248: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-24 13:51:21.503298: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-24 13:51:21.5

In [3]:
# read results
results = pd.read_csv('herg_test_set.csv', sep=',')
# set columns
results
# order res

Unnamed: 0,Trial,Average,Std
0,89.0,0.763,0.015
1,36.0,0.648,0.012
2,32.0,0.658,0.008
3,33.0,0.665,0.003
4,91.0,0.746,0.019
5,26.0,0.638,0.007
6,77.0,0.726,0.016
7,35.0,0.671,0.013
8,84.0,0.63,0.018
9,80.0,0.692,0.022


In [4]:
pipeline = Pipeline.load('herg/trial_89')

2024-01-24 13:51:55.067171: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


In [7]:
pipeline.steps[-1][1].model

<deepchem.models.graph_models.GraphConvModel at 0x7f67790dfac0>

In [4]:
# load best trial pipeline (rank #20)
best_trial_id = int(results.iloc[0]['trial_id'])
pipeline = Pipeline.load(f"herg/trial_{best_trial_id}/")

[21:04:06] Initializing Normalizer
2023-11-11 21:04:06.125985: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2023-11-11 21:04:06.126034: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: JOAOPC
2023-11-11 21:04:06.126094: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: JOAOPC
2023-11-11 21:04:06.126213: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 525.147.5
2023-11-11 21:04:06.126235: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 525.147.5
2023-11-11 21:04:06.126240: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:309] kernel version seems to match DSO: 525.147.5


In [5]:
pipeline.steps

[('label_encoder',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f97dd83e740>),
 ('standardizer',
  <deepmol.standardizer.custom_standardizer.CustomStandardizer at 0x7f97dd83e440>),
 ('featurizer',
  <deepmol.compound_featurization.rdkit_fingerprints.RDKFingerprint at 0x7f97dd83e860>),
 ('scaler',
  <deepmol.base.transformer.PassThroughTransformer at 0x7f97dda3d3f0>),
 ('model',
  KerasModel(model_builder=<function keras_1d_cnn_model_builder at 0x7f970fc90d30>,
             model_dir='/tmp/tmp1rk748t7'))]