In [None]:
!pip install deepchem
!pip install 'deepchem[torch]'
!pip install 'deepchem[tensorflow]'
!pip install pandas
!pip install hyperopt

In [None]:
import deepchem as dc
import pandas as pd
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer='GraphConv', reload=False)
train_dataset, valid_dataset, test_dataset = datasets

Okay, at the outset, you (or future me) may be wondering what a "featurizer" is. Per the DeepChem documentation:

> "\[A] 'featurizer' is chunk of code which transforms raw input data into a processed form suitable for machine learning. Machine learning methods often need data to be pre-chewed for them to process. Think of this like a mama penguin chewing up food so the baby penguin can digest it easily."

Fair enough.

This wonderful forum post [here](https://forum.deepchem.io/t/what-is-a-featurizer/833#:~:text=In%20computer%20vision%2C%20your,That%E2%80%99s%20what%20featurization%20is) also puts it in layman's terms: "Before you can apply machine learning to molecules, you need to decide how to represent them. That’s what featurization is."
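
To make the idea concrete, here is a deliberately tiny featurizer in plain Python. It is a toy of my own, not DeepChem's `ConvMolFeaturizer`: the `ELEMENTS` vocabulary and the `toy_featurize` name are assumptions made up for this sketch, and it only counts single-letter element symbols in a SMILES string.

```python
from collections import Counter

# Hypothetical fixed vocabulary of single-letter element symbols (an assumption
# for this sketch; real featurizers encode far more chemistry than this).
ELEMENTS = ["C", "N", "O", "S"]

def toy_featurize(smiles):
    """Count how often each element in ELEMENTS appears in a SMILES string.

    Limitation: two-letter symbols such as "Cl" are miscounted, because every
    letter is treated as its own atom.
    """
    counts = Counter(ch.upper() for ch in smiles if ch.isalpha())
    return [counts[element] for element in ELEMENTS]

ethanol = toy_featurize("CCO")        # two carbons, one oxygen
acetamide = toy_featurize("CC(=O)N")  # two carbons, one nitrogen, one oxygen
print(ethanol)    # [2, 0, 1, 0]
print(acetamide)  # [2, 1, 1, 0]
```

The point is only that a molecule goes in and a fixed-length list of numbers comes out. The graph featurizer used here keeps per-atom features and connectivity instead of a single count vector.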

In the context of the code above, `featurizer='GraphConv'` indicates that we will be using the `ConvMolFeaturizer`, which interoperates with graph convolution models that inherit from `KerasModel`.

DeepChem provides a classification model [`GraphConvModel`](https://github.com/deepchem/deepchem/blob/master/deepchem/models/graph_models.py) that works out of the box. 

Here's a high-level rundown of a few default hyperparameters:
- width of channels for the Graph Convolution Layers = 64
- number of tasks = 12
- number of atom features = 75
- dropout = 0.0
- mode = classification
- batch size = 100

Further, it looks like we're using cross-entropy loss (`SoftmaxCrossEntropy`) and the ADAM optimizer (`tf.keras.optimizers.Adam`).
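
For future me, a rough sketch of what a softmax cross-entropy loss computes for one example with two classes. This is a plain-Python illustration of the formula, not DeepChem's `SoftmaxCrossEntropy` implementation; the function names and the example logits are made up.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, true_class):
    """Negative log-probability that the model assigns to the correct class."""
    return -math.log(softmax(logits)[true_class])

# Same logits, different ground truth: the loss is small when the confident
# guess is right and large when it is wrong.
confident_right = softmax_cross_entropy([0.0, 4.0], true_class=1)
confident_wrong = softmax_cross_entropy([0.0, 4.0], true_class=0)
undecided = softmax_cross_entropy([0.0, 0.0], true_class=1)  # equals ln(2)
print(confident_right, undecided, confident_wrong)
```

The loss only looks at the probability given to the true class, so a confident wrong answer is punished much harder than an undecided one.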

In [None]:
n_tasks = len(tasks)
model = dc.models.GraphConvModel(n_tasks, mode='classification')
model.fit(train_dataset, nb_epoch=50)

In [None]:
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print('Training set score:', model.evaluate(train_dataset, [metric], transformers))
print('Test set score:', model.evaluate(test_dataset, [metric], transformers))

It's pretty accurate for 6 lines of code, but where's the fun in calling functions?
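
As a reminder of what the score means: ROC AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it by brute force over all pairs for a single task. It is an illustration in plain Python with made-up labels and scores, not the routine DeepChem calls.

```python
def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is the scale on which to read the numbers printed above.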

Let's see if we can build on top of this model by integrating **hyperparameter optimization**.

To define the search space, I decided to consult experts in the field (and by that I mean look at existing literature) to see what hyperparameters other groups have utilized.

[Fout et al. (2017)](https://proceedings.neurips.cc/paper_files/paper/2017/file/f507783927f2ec2737ba40afbd17efb5-Paper.pdf) used the following search space during the validation stage when using GCNs for protein interface prediction:
- Edge distance feature RBF kernel standard deviation (2 to 32)
- Negative to positive example ratio (1:1 to 20:1)
- Number of convolutional layers (1 to 6)
- Number of filters (8 to 2000)
- Neighborhood size (2 to 26)
- Pairwise residue representation (elementwise sum/product vs concatenation)
- Number of dense layers after merging (0 to 4)
- Optimization algorithm (stochastic gradient descent, RMSProp, ADAM, Momentum)
- Learning rate (0.01 to 1)
- Dropout probability (0.3 to 0.8)
- Minibatch size (64 or 128 examples)
- Number of epochs (50 to 1000) 

Below, I've tried to replicate the search space for hyperparameters relevant to our project. Note that I have fixed the optimizer to be ADAM. Future steps will likely involve testing different optimization algorithms.

The discrete options for many hyperparameters (particularly `graph_conv_layers`) are essentially arbitrary (powers of two!); thus, yet another future step is to incorporate options that are *not* subject to the whims of the human mind. 
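
One possible way to get away from hand-picked powers of two is to sample layer widths from a log-uniform range instead. hyperopt has quantized log-uniform distributions (for example `hp.qloguniform`) for this; the sketch below only mimics the idea in plain Python so it runs on its own. The helper name, the bounds, and the step size are all assumptions of mine, not values taken from this notebook.

```python
import math
import random

def sample_layer_width(rng, low=16, high=256, step=8):
    """Draw a width log-uniformly from [low, high], rounded to a multiple of step."""
    raw = math.exp(rng.uniform(math.log(low), math.log(high)))
    rounded = int(round(raw / step)) * step
    return min(max(rounded, low), high)  # clamp in case rounding leaves the range

rng = random.Random(0)  # fixed seed so the draws are repeatable
widths = [sample_layer_width(rng) for _ in range(5)]
print(widths)
```

Sampling on a log scale spreads the draws evenly across orders of magnitude, so small widths are not crowded out by large ones.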


Another interesting finding is that "Automatic model selection as in Bergstra et al. failed to outperform the best manual search results." Perhaps hyperopt may not be the *absolute* best option here, but for a proof of concept, let us proceed.

In [None]:
from hyperopt import hp, fmin, tpe, Trials
search_space = {
    'graph_conv_layers': hp.choice('graph_conv_layers', [[64], [64, 64], [128, 128], [16, 64, 128, 64, 16]]),
    'dense_layer_size': hp.choice('dense_layer_sizes', [64, 128]),
    'dropouts': hp.uniform('dropout', low=0.1, high=0.5),
    'batch_sizes': hp.choice('batch_sizes', [64, 100, 128]),
    'epochs': hp.choice('epochs', [50, 250, 750, 1000])
}

We then declare the function that hyperopt will minimize, based on the structure provided by [DeepChem](https://github.com/deepchem/deepchem/blob/master/examples/tutorials/Advanced_model_training_using_hyperopt.ipynb).

For clarity, all instances of `MultitaskClassifier` in the example have been replaced with `GraphConvModel` as the latter is relevant to this project. 
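
A note on the sign convention: hyperopt's `fmin` minimizes, while we want the highest ROC AUC, so the objective returns the negated score. The sketch below shows that pattern with plain random search standing in for TPE and a made-up scoring formula standing in for model training. Every name and number in it is hypothetical.

```python
import random

# Hypothetical stand-in for "train a model and return its validation ROC AUC".
# The formula is invented purely so the loop has something to optimize.
def fake_validation_score(params):
    return 0.8 - (params["dropout"] - 0.3) ** 2 - 0.0001 * abs(params["batch_size"] - 100)

def objective(params):
    # Minimizers look for the smallest value, so flip the sign of the score.
    return -fake_validation_score(params)

def random_search(n_trials, seed=0):
    """Random search (not TPE): sample, evaluate, and keep the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            "dropout": rng.uniform(0.1, 0.5),
            "batch_size": rng.choice([64, 100, 128]),
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search(n_trials=20)
print(best_params, -best_loss)  # negate again to read the result as a score
```

TPE differs in how it proposes the next candidate (it uses the results of earlier trials), but the contract with the objective is the same: lower is better.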

In [None]:
import tempfile
# tempfile supplies a scratch directory where the best checkpoint is saved later on.

metric = dc.metrics.Metric(dc.metrics.roc_auc_score)

def fm(args):
  save_dir = tempfile.mkdtemp()
  model = dc.models.GraphConvModel(
    n_tasks=len(tasks),
    graph_conv_layers=args['graph_conv_layers'],
    dense_layer_size=args['dense_layer_size'],
    dropouts=args['dropouts'],
    mode="classification",
    number_atom_features=75,
    n_classes=2,
    batch_size=args['batch_sizes'],
    batch_normalize=True,
    uncertainty=False,
  )
  
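  # Validation callback: every 1000 training steps, score the model on the
  # validation set and checkpoint it if the score is a new maximum.
  validation = dc.models.ValidationCallback(
    valid_dataset, 1000, [metric],
    save_dir=save_dir, transformers=transformers, save_on_minimum=False)

  # Train for the sampled number of epochs so that the 'epochs' entry in the
  # search space actually affects the trial.
  model.fit(train_dataset, nb_epoch=args['epochs'], callbacks=[validation])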

  # Restore the best checkpoint and return its negated validation score, since fmin minimizes.
  model.restore(model_dir=save_dir)
  valid_score = model.evaluate(valid_dataset, [metric], transformers)

  return -1*valid_score['roc_auc_score']

In [None]:
trials = Trials()
best = fmin(fm,
            space=search_space,
            algo=tpe.suggest,
            max_evals=10,
            trials=trials)

FINALLY! Drumroll please. The best hyperparameters found by hyperopt are...

In [None]:
print(f"Best: {best}")

Anticlimactic. Regardless, let's throw these parameters back into our model and see if we notice any improvements at all.
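
One wrinkle when reading `best`: for every `hp.choice` entry, `fmin` reports the *index* of the chosen option rather than the option itself, keyed by the label passed to `hp.choice`. hyperopt's `space_eval(search_space, best)` does the translation for you. The plain-Python sketch below spells out the same lookup by hand. The `decode_best` helper is hypothetical, and the example dict is made up to be consistent with the settings used in the next cell; it is not real output from a run.

```python
# Option lists copied from the search space above, keyed by hyperopt label.
CHOICES = {
    "graph_conv_layers": [[64], [64, 64], [128, 128], [16, 64, 128, 64, 16]],
    "dense_layer_sizes": [64, 128],
    "batch_sizes": [64, 100, 128],
    "epochs": [50, 250, 750, 1000],
}

def decode_best(best):
    """Replace hp.choice indices with the options they point to."""
    decoded = {}
    for label, value in best.items():
        if label in CHOICES:
            decoded[label] = CHOICES[label][int(value)]
        else:
            decoded[label] = value  # continuous values (e.g. dropout) pass through
    return decoded

# Hypothetical `best` dict for illustration only, not real output from a run.
example_best = {"batch_sizes": 1, "dense_layer_sizes": 0, "dropout": 0.39,
                "epochs": 2, "graph_conv_layers": 2}
decoded = decode_best(example_best)
print(decoded)
```

So a printed `'epochs': 2` would mean the third option in the list (750 epochs), not two epochs.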

In [None]:
model = dc.models.GraphConvModel(
    n_tasks=len(tasks),
    graph_conv_layers=[128, 128],
    dense_layer_size=64,
    dropouts=0.39,
    mode="classification",
    number_atom_features=75,
    n_classes=2,
    batch_size=100,
    batch_normalize=True,
    uncertainty=False,
)
model.fit(train_dataset, nb_epoch=750)

And the stats:

In [None]:
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print('Training set score:', model.evaluate(train_dataset, [metric], transformers))
print('Test set score:', model.evaluate(test_dataset, [metric], transformers))

Despite adding dropout, it looks like our optimizations have overfit the data :(. This is certainly something to keep an eye out for. Let's proceed.
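
One common guard against this (not something this notebook does) is early stopping: watch the validation score per epoch and stop once it has not improved for a while, keeping the best epoch. A minimal sketch of that bookkeeping follows. The function name, the patience value, and the score history are all invented for illustration.

```python
def best_epoch_with_patience(valid_scores, patience=3):
    """Return (epoch_index, score) of the best validation score seen before stopping.

    Training "stops" once `patience` consecutive epochs fail to beat the best score.
    """
    best_epoch, best_score, stale = 0, float("-inf"), 0
    for epoch, score in enumerate(valid_scores):
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # give up: no improvement for `patience` epochs in a row
    return best_epoch, best_score

# Made-up validation scores per epoch: they improve, then slip as the model overfits.
history = [0.70, 0.76, 0.80, 0.79, 0.78, 0.77, 0.85]
stop = best_epoch_with_patience(history, patience=3)
print(stop)  # (2, 0.8): the late 0.85 is never reached because training stopped
```

The trade-off shows up in the example: a short patience saves training time but can miss a late improvement.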

Optimization has no doubt been fun, even if the results may be, uh, unexpected. We've definitely now learned how to optimize hyperparameters (in theory), so we can add it to our arsenal.

But there's nothing like creating your own GNN. Let's delve into the depths of GNNs.

Because Robert Frost is on my mind, why not take the road not taken and write our model in PyTorch rather than TensorFlow? This may require us to change which featurizers we use, among other things, but the headache *should* be worth it. I think.

Jump to `graph_classification.ipynb` for next steps.