# MoleculeACE - ChEMBL cliff training - MLP based methods

Once the desired encoders have been pre-trained with the accompanying `encoder_pretraining` notebook and placed in the [name of corresponding folder], we train on the datasets included in MoleculeACE [1], in preparation for evaluation.  
The following ChEMBL datasets were chosen according to the criteria specified in the accompanying thesis [2].  

* ChEMBL234 - Dopamine D3 receptor
* ChEMBL4203 - Dual specificity protein kinase
* ChEMBL2047 - Farnesoid X receptor
* ChEMBL4616 - Ghrelin receptor
* ChEMBL264 - Histamine H3 receptor
* ChEMBL2835 - Janus kinase 1
* ChEMBL4792 - Orexin receptor 2

## Setup

In [1]:
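import glob
import os

# On Colab, mount Google Drive and work from the copy stored there;
# otherwise fall back to the local home directory.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    _home = 'drive/MyDrive/tlacamr'
except ImportError:
    _home = os.path.expanduser('~')

project_root = os.path.join(_home, 'tlacamr')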

print(project_root)
%cd $project_root

Mounted at /content/drive
drive/MyDrive/tlacamr/tlacamr
/content/drive/MyDrive/tlacamr/tlacamr


In [2]:
%%capture
!pip install .
### install statement should look like this once repo is public
###!pip install git+https://github.com/my-user/my-repo

In [3]:
## optional wandb login
import wandb
wandb.login()

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

True

## Model imports




# Training

## No Pretraining

### Classification

#### MLP 2048

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/classification/mlp_based=MLP_2048

#### MLP 1024

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/classification/mlp_based=MLP_1024

#### MLP 256

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/classification/mlp_based=MLP_256

#### Halfstep MLP 2048

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/classification/mlp_based=halfstepMLP_2048

#### Halfstep MLP 1024

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/classification/mlp_based=halfstepMLP_1024

### Regression

#### MLP 2048

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/regression/mlp_based=MLP_2048

#### MLP 1024

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/regression/mlp_based=MLP_1024

#### MLP 256

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/regression/mlp_based=MLP_256

#### Halfstep MLP 2048

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/regression/mlp_based=halfstepMLP_2048

#### Halfstep MLP 1024

In [None]:
%%capture
!python3 src/train.py +experiment/property_prediction/train/regression/mlp_based=halfstepMLP_1024
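
The ten runs above differ only in the task and the config name. As a minimal sketch, the same commands can be generated from two lists; `build_command` is a hypothetical helper (not part of the repository), and the task and config names are copied from the cells above.

```python
from itertools import product

# Task and config names copied from the cells above.
TASKS = ["classification", "regression"]
CONFIGS = ["MLP_2048", "MLP_1024", "MLP_256", "halfstepMLP_2048", "halfstepMLP_1024"]


def build_command(task: str, config: str) -> str:
    """Return the train.py call for one experiment config (hypothetical helper)."""
    return (
        "python3 src/train.py "
        f"+experiment/property_prediction/train/{task}/mlp_based={config}"
    )


# One command per (task, config) pair: all classification runs, then all regression runs.
commands = [build_command(task, config) for task, config in product(TASKS, CONFIGS)]

for command in commands:
    print(command)
```

Each printed line can be pasted into a cell after `!`, or passed to `subprocess.run(command.split(), check=True)` from the project root.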

## Pretrained

In [6]:
pt_encoder_dir = os.path.join('src', 'models', 'pretrained', 'encoders')
print(pt_encoder_dir)

src/models/pretrained/encoders


In [8]:
## Set directories
classification_pt_dir = os.path.join(pt_encoder_dir, 'classification', 'halfstep_single', 'checkpoints')
classification_pt_path = glob.glob(os.path.join(classification_pt_dir, 'last.ckpt'))[0]
print(classification_pt_path)

reconstruction_pt_dir = os.path.join(pt_encoder_dir, 'reconstruction', 'halfstep_single_AE', 'checkpoints')
reconstruction_pt_path = glob.glob(os.path.join(reconstruction_pt_dir, 'last.ckpt'))[0]
print(reconstruction_pt_path)

src/models/pretrained/encoders/classification/halfstep_single/checkpoints/last.ckpt
src/models/pretrained/encoders/reconstruction/halfstep_single_AE/checkpoints/last.ckpt
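
Indexing the `glob.glob` result with `[0]` raises a bare `IndexError` when the checkpoint is missing, for example if pretraining has not been run yet. A possible sketch of a more explicit lookup follows; `find_checkpoint` is a hypothetical helper, not something the repository provides.

```python
import glob
import os


def find_checkpoint(checkpoint_dir: str, pattern: str = "last.ckpt") -> str:
    """Return the first file in checkpoint_dir matching pattern (hypothetical helper).

    Raises FileNotFoundError with a readable message instead of IndexError.
    """
    matches = sorted(glob.glob(os.path.join(checkpoint_dir, pattern)))
    if not matches:
        raise FileNotFoundError(
            f"No checkpoint matching {pattern!r} in {checkpoint_dir!r}; "
            "has the encoder been pre-trained and copied here?"
        )
    return matches[0]
```

With this helper, the cell above would read `classification_pt_path = find_checkpoint(classification_pt_dir)`, and likewise for the reconstruction checkpoint.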


### Classification

#### MLP - Halfstep 2048 single

In [9]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/mlp_based=pt_HSE_2048 \
  ++model.net.pretrained_encoder_ckpt=$classification_pt_path

#### MLP - Halfstep 2048 single layernorm


In [10]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/mlp_based=pt_HSE_ln_2048 \
  ++model.net.pretrained_encoder_ckpt=$classification_pt_path

#### Autoencoder - Halfstep 2048 single

In [13]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/mlp_based=pt_recon_HSE_2048 \
  ++model.net.pretrained_encoder_ckpt=$reconstruction_pt_path

#### Halfstep Autoencoder 2048 single + layernorm

In [14]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/mlp_based=pt_recon_ln_HSE_2048 \
  ++model.net.pretrained_encoder_ckpt=$reconstruction_pt_path

### Regression

#### Halfstep 2048 single

In [11]:
%%capture
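# NOTE: `regression_pt_path` is never defined in this notebook; this assumes the
# classification-pretrained encoder checkpoint was intended.
!python3 src/train.py \
  +experiment/property_prediction/train/regression/mlp_based=pt_HSE_2048 \
  ++model.net.pretrained_encoder_ckpt=$classification_pt_path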

#### Halfstep 2048 single layernorm

In [12]:
%%capture
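# NOTE: `regression_pt_path` is never defined in this notebook; this assumes the
# classification-pretrained encoder checkpoint was intended.
!python3 src/train.py \
  +experiment/property_prediction/train/regression/mlp_based=pt_HSE_ln_2048 \
  ++model.net.pretrained_encoder_ckpt=$classification_pt_path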

#### Halfstep Autoencoder 2048 single


In [15]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/mlp_based=pt_recon_HSE_2048 \
  ++model.net.pretrained_encoder_ckpt=$reconstruction_pt_path

#### Halfstep Autoencoder 2048 single + layernorm

In [16]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/mlp_based=pt_recon_ln_HSE_2048 \
  ++model.net.pretrained_encoder_ckpt=$reconstruction_pt_path
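
The pretrained runs pair each config with one of the two encoder checkpoints. The sketch below makes that pairing explicit so each run can be checked at a glance. `CONFIG_TO_ENCODER` and `build_args` are hypothetical names, and the checkpoint paths are the ones printed by the "Set directories" cell. It assumes the `pt_HSE` configs load the classification-pretrained encoder for both tasks, which the notebook does not confirm for the regression runs. It also assumes `src/train.py` accepts Hydra-style overrides, as the `+` and `++` prefixes suggest.

```python
import shlex

# Checkpoint paths as printed by the "Set directories" cell.
CHECKPOINTS = {
    "classification": "src/models/pretrained/encoders/classification/halfstep_single/checkpoints/last.ckpt",
    "reconstruction": "src/models/pretrained/encoders/reconstruction/halfstep_single_AE/checkpoints/last.ckpt",
}

# Which pretrained encoder each config loads.
# ASSUMPTION: the pt_HSE configs use the classification-pretrained encoder for both tasks.
CONFIG_TO_ENCODER = {
    "pt_HSE_2048": "classification",
    "pt_HSE_ln_2048": "classification",
    "pt_recon_HSE_2048": "reconstruction",
    "pt_recon_ln_HSE_2048": "reconstruction",
}

TASKS = ["classification", "regression"]


def build_args(task: str, config: str) -> list[str]:
    """Return the argument list for one pretrained-encoder run (hypothetical helper)."""
    checkpoint = CHECKPOINTS[CONFIG_TO_ENCODER[config]]
    return [
        "python3",
        "src/train.py",
        f"+experiment/property_prediction/train/{task}/mlp_based={config}",
        f"++model.net.pretrained_encoder_ckpt={checkpoint}",
    ]


# Eight runs in total: four configs for each of the two tasks.
runs = [build_args(task, config) for task in TASKS for config in CONFIG_TO_ENCODER]

for args in runs:
    print(shlex.join(args))
```

Keeping the arguments as a list lets the same runs be launched with `subprocess.run(args, check=True)` without shell quoting issues.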

# References

[1] Derek van Tilborg, Alisa Alenicheva, and Francesca Grisoni. “Exposing the Limitations of Molecular Machine Learning with Activity Cliffs”. In: Journal of Chemical Information and Modeling 62.23 (Dec. 2022), pp. 5938–5951. DOI: 10.1021/acs.jcim.2c01073. URL: https://doi.org/10.1021/acs.jcim.2c01073.  
[2] César Miguel Valdez Córdova. Towards learning activity cliff-aware molecular representations. Publication pending.