# MoleculeNet Benchmarking
By [Nathan C. Frey](https://ncfrey.github.io/) | [Twitter](https://twitter.com/nc_frey)
 
This notebook shows how to benchmark model performance on a dataset from MoleculeNet and collect validation and test set metrics for the [MoleculeNet Leaderboard](https://github.com/deepchem/moleculenet).

As an example, we use Bayesian optimization to tune the hyperparameters of a random forest and a GraphConv model to predict solubilities of small molecules from the [Delaney dataset](https://pubs.acs.org/doi/10.1021/ci034243x).

### Install condacolab
`condacolab` installs `mamba` or `miniconda` and automatically restarts the kernel. You'll see a "session crashed unexpectedly" message, but you can safely ignore this.

In [1]:
!pip install -q condacolab
import condacolab
condacolab.install_mambaforge()  #miniconda()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:34
🔁 Restarting kernel...


### Set up Colab environment to run DeepChem and MoleculeNet
Install any additional dependencies required by your model or dataset.

In [1]:
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [2]:
!conda --version

conda 4.9.2


In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0


Install DeepChem and dependencies. Add any dependencies that are unique to your model or dataset.

In [None]:
# !mamba install openmm pdbfixer -q  # dependencies for AtomicConvs and PDBbind datasets

In [None]:
!conda install -c conda-forge rdkit
!pip install tensorflow~=2.4 

# Dependencies for benchmarking
!conda install -c dglteam dgl-cuda11.0 dgllife
!pip install hyperopt torch
!pip install --pre deepchem

In [5]:
import rdkit
import deepchem as dc

### Clone the `moleculenet` repo

Clone the `moleculenet` repo, or your own fork and branch of it, to access the benchmarking scripts.

In [6]:
%cd /content

/content


In [7]:
!git clone https://github.com/deepchem/moleculenet.git
# !git clone --single-branch --branch <branch-name> https://github.com/<username>/moleculenet.git
# !(cd moleculenet && git pull)

Cloning into 'moleculenet'...
remote: Enumerating objects: 366, done.
remote: Counting objects: 100% (96/96), done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 366 (delta 47), reused 49 (delta 18), pack-reused 270
Receiving objects: 100% (366/366), 59.19 KiB | 11.84 MiB/s, done.
Resolving deltas: 100% (189/189), done.


In [8]:
%cd /content/moleculenet/examples

/content/moleculenet/examples


### Run benchmarking script with hyperparameter search
The `--help` option displays the possible arguments and default values. For a benchmarking run, perform a hyperparameter search with Bayesian optimization by specifying `-hs` and setting the relevant parameters, such as `-r` (number of runs) and `-nt` (number of trials). The moleculenet scripts use the [hyperopt library](http://hyperopt.github.io/hyperopt/) for hyperparameter optimization. The hyperparameter search space can be modified in the moleculenet Python script if desired.
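
As a rough sketch of how these flags combine, the helper below assembles an argument list for one of the benchmarking scripts. The function name `build_benchmark_command` and its defaults are hypothetical; only the flag names (`-d`, `-hs`, `-r`, `-nt`, `-p`) are taken from the scripts' `--help` output.

```python
import shlex


def build_benchmark_command(script, dataset, hyperparameter_search=True,
                            num_runs=None, num_trials=None, result_path=None):
    """Assemble the argument list for a MoleculeNet example script.

    Hypothetical helper: only the flag names come from the --help text.
    """
    cmd = ["python", script, "-d", dataset]
    if hyperparameter_search:
        cmd.append("-hs")          # enable the Bayesian hyperparameter search
    if num_runs is not None:
        cmd += ["-r", str(num_runs)]      # NUM_RUNS
    if num_trials is not None:
        cmd += ["-nt", str(num_trials)]   # NUM_TRIALS
    if result_path is not None:
        cmd += ["-p", result_path]        # RESULT_PATH
    return cmd


cmd = build_benchmark_command("fingerprint.py", "Delaney",
                              num_runs=3, num_trials=16)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside `moleculenet/examples`, which is equivalent to running the same line with `!` in a notebook cell.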

If you don't specify the result path argument (`-p`), the results will be saved in `moleculenet/examples/results`. A folder is created for each trial of the hyperparameter search, and inside you will see `configure.json` (with the hyperparams for that run) and `eval.txt` with the validation and test set metrics.
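
To compare trials after a search, something like the following sketch could gather each trial's hyperparameters and metrics into one list. It assumes each trial folder holds a `configure.json` and an `eval.txt`, and that `eval.txt` uses the `label: mean +- std` line format printed later in this notebook; the function names `parse_eval_text` and `collect_trials` are hypothetical.

```python
import json
import re
from pathlib import Path

# Matches lines such as "Test rmse: 1.6901 +- 0.0010" (assumed format)
METRIC_LINE = re.compile(
    r"^(?P<name>.+?):\s*(?P<mean>-?\d+(?:\.\d+)?)\s*\+-\s*(?P<std>\d+(?:\.\d+)?)\s*$"
)


def parse_eval_text(text):
    """Return {label: (mean, std)} for every 'label: mean +- std' line."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match["name"]] = (float(match["mean"]), float(match["std"]))
    return metrics


def collect_trials(results_dir):
    """Read configure.json and eval.txt from each trial folder."""
    trials = []
    # Sort by (length, name) so that "2" comes before "10"
    for trial_dir in sorted(Path(results_dir).iterdir(),
                            key=lambda p: (len(p.name), p.name)):
        config_file = trial_dir / "configure.json"
        eval_file = trial_dir / "eval.txt"
        if not (trial_dir.is_dir() and config_file.exists() and eval_file.exists()):
            continue  # skip stray files and incomplete trials
        trials.append({
            "trial": trial_dir.name,
            "hyperparams": json.loads(config_file.read_text()),
            "metrics": parse_eval_text(eval_file.read_text()),
        })
    return trials
```

Calling `collect_trials("results")` from `moleculenet/examples` would then return one dictionary per completed trial, which is convenient for sorting by validation metric.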

In [9]:
!python fingerprint.py --help

2021-04-27 19:24:42.042724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
usage: Examples for MoleculeNet with fingerprint [-h]
                                                 [-d {BACE_classification,BACE_regression,BBBP,ClinTox,Delaney,HOPV,SIDER,Lipo}]
                                                 [-m {RF}] [-f {ECFP}]
                                                 [-p RESULT_PATH]
                                                 [-r NUM_RUNS] [-hs]
                                                 [-nt NUM_TRIALS]

optional arguments:
  -h, --help            show this help message and exit
  -d {BACE_classification,BACE_regression,BBBP,ClinTox,Delaney,HOPV,SIDER,Lipo}, --dataset {BACE_classification,BACE_regression,BBBP,ClinTox,Delaney,HOPV,SIDER,Lipo}
                        Dataset to use
  -m {RF}, --model {RF}
                        Options include 1) random forest (RF) (default: RF)
  -f {ECFP}, --fe

Let's run some simple benchmarks on the [Delaney drug solubility dataset](https://pubs.acs.org/doi/10.1021/ci034243x), where the task is to predict log solubility in mol/L, and see how two different methods perform: 1) a random forest trained on 1024-bit circular fingerprints, and 2) a [GraphConv](https://arxiv.org/abs/1609.02907) model, i.e. a graph convolutional network (`GCN` in the script options).

In [10]:
!python fingerprint.py -d Delaney -hs
!cat results/eval.txt

2021-04-27 19:26:25.286420: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Created directory results
Start hyperparameter search with Bayesian optimization for 16 trials
Created directory results/1
Created directory results/2
Created directory results/3
Created directory results/4
Created directory results/5
Created directory results/6
Created directory results/7
Created directory results/8
Created directory results/9
Created directory results/10
Created directory results/11
Created directory results/12
Created directory results/13
Created directory results/14
Created directory results/15
Created directory results/16
100% 16/16 [21:50<00:00, 81.92s/trial, best loss: 1.7806047351134193]
Best val rmse: 1.7806 +- 0.0080
Test rmse: 1.6901 +- 0.0010


After 16 trials of Bayesian optimization, each training the model 3 times to collect statistics, we have the best set of model hyperparameters found by the search (saved in `examples/configure.json`). This gives a validation RMSE of 1.78 $\pm$ 0.008 and a test RMSE of 1.69 $\pm$ 0.001.
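
As a small illustration of what the `+-` figures summarize, the sketch below reduces the scores from repeated runs to a mean and a standard deviation. The RMSE values are made up for illustration, and the use of the population standard deviation is an assumption; the benchmarking script may compute its spread differently.

```python
from statistics import mean, pstdev


def summarize_runs(scores):
    """Return (mean, population standard deviation) over repeated runs."""
    return mean(scores), pstdev(scores)


# Hypothetical test RMSE values from three runs of one configuration
test_rmse_runs = [1.70, 1.72, 1.75]
avg, std = summarize_runs(test_rmse_runs)
print(f"Test rmse: {avg:.4f} +- {std:.4f}")
```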

Next, let's look at the same dataset with GraphConv models.

In [11]:
!python gnn.py --help

2021-04-27 19:52:57.121764: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
usage: Examples for MoleculeNet with GNN [-h]
                                         [-d {BACE_classification,BACE_regression,BBBP,ClinTox,Delaney,HOPV,SIDER,Lipo}]
                                         [-m {GCN}] [-f {GC}] [-p RESULT_PATH]
                                         [-r NUM_RUNS] [-pa PATIENCE] [-hs]
                                         [-nt NUM_TRIALS]

optional arguments:
  -h, --help            show this help message and exit
  -d {BACE_classification,BACE_regression,BBBP,ClinTox,Delaney,HOPV,SIDER,Lipo}, --dataset {BACE_classification,BACE_regression,BBBP,ClinTox,Delaney,HOPV,SIDER,Lipo}
                        Dataset to use
  -m {GCN}, --model {GCN}
                        Options include 1) Graph Convolutional Network (GCN)
                        (default: GCN)
  -f {GC}, --featurizer {GC}
                     

In [10]:
!rm -rf results

In [11]:
!python gnn.py -d Delaney -hs
!cat results/eval.txt

2021-04-27 20:07:32.956194: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Created directory results
Start hyperparameter search with Bayesian optimization for 16 trials
Created directory results/1
  0% 0/16 [00:00<?, ?trial/s, best loss=?]Failed to featurize datapoint 934, C. Appending empty array

Exception message: zero-size array to reduction operation maximum which has no identity

  return array(a, dtype, copy=False, order=order)

DGL backend not selected or invalid.  Assuming PyTorch for now.
Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable.  Valid options are: pytorch, mxnet, tensorflow (all lowercase)
  0% 0/16 [00:05<?, ?trial/s, best loss=?]Using backend: pytorch
Created directory results/2
  6% 1/16 [01:49<27:27, 109.81s/trial, best loss: 1.5391837572656486]Failed to featurize datapoint 934, C. Appending empty array



The GraphConv model is considerably more sophisticated than a random forest, so it's good to see that it performs better at predicting solubility: its first trial already reaches a validation loss of about 1.54, compared with a best validation RMSE of 1.78 for the random forest. And unlike random forests, which have comparatively simple hyperparameters to tune, here we tuned the dropout, the learning rate, the number of hidden layers, and the number of features within each layer, so Bayesian optimization is critical for a fair evaluation of the model's performance.

### Run benchmark with `configure.json`
Alternatively, you can add a `<DATASET_NAME>.json` file that specifies the hyperparameters to a folder called `configures/<MODEL_NAME>_<FEATURIZER_NAME>`, and then run the benchmark without a hyperparameter search.
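
A minimal sketch of creating such a file is shown below, assuming the layout is `configures/<MODEL_NAME>_<FEATURIZER_NAME>/<DATASET_NAME>.json`. The helper `write_config` is hypothetical, and so are the hyperparameter name and value in the example; check the benchmarking script for the keys it actually expects.

```python
import json
import tempfile
from pathlib import Path


def write_config(examples_dir, model, featurizer, dataset, hyperparams):
    """Write hyperparameters to configures/<model>_<featurizer>/<dataset>.json."""
    config_dir = Path(examples_dir) / "configures" / f"{model}_{featurizer}"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_file = config_dir / f"{dataset}.json"
    config_file.write_text(json.dumps(hyperparams, indent=2))
    return config_file


# Demonstrate in a scratch directory so no existing config is overwritten.
# The hyperparameter name and value are placeholders, not the script's schema.
with tempfile.TemporaryDirectory() as scratch:
    path = write_config(scratch, "RF", "ECFP", "Delaney", {"n_estimators": 100})
    print(path.relative_to(scratch))
```

To use it for a real run, pass the path of `moleculenet/examples` as `examples_dir` instead of the scratch directory.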

In [12]:
!python fingerprint.py -d Delaney 

2021-04-27 20:58:48.902857: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Directory results already exists.
Use the manually specified hyperparameters
Val metric for 3 runs: 1.7685 +- 0.0115
Test metric for 3 runs: 1.7255 +- 0.0266
