# MoleculeACE - ChEMBL cliff training - Activity Cliff aware methods

Once the desired encoders have been pre-trained with the accompanying `encoder_pretraining` and placed in the [name of corresponding folder], we train on the datasets included in MoleculeACE [1], in preparation for evaluation.  
The following ChEMBL datasets were chosen according to the criteria specified in the accompanying thesis [2].  

* ChEMBL234 - Dopamine D3 receptor
* ChEMBL4203 - Dual specificity protein kinase
* ChEMBL2047 - Farnesoid X receptor
* ChEMBL4616 - Ghrelin receptor
* ChEMBL264 - Histamine H3 receptor
* ChEMBL2835 - Janus kinase 1
* ChEMBL4792 - Orexin receptor 2

## Setup

In [None]:
import os.path

try:
    from google.colab import drive
    drive.mount('/content/drive')
    _home = 'drive/MyDrive/tlacamr'
except ImportError:
    _home = '~'
finally:
    project_root = os.path.join(_home, 'tlacamr')

print(project_root)
%cd $project_root

Mounted at /content/drive
drive/MyDrive/tlacamr/tlacamr
/content/drive/MyDrive/tlacamr/tlacamr


In [None]:
%%capture
!pip install .
### install statement should look like this once repo is public
###!pip install git+https://github.com/my-user/my-repo

In [None]:
## optional wandb login
import wandb
wandb.login()

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

True

## Model imports




In [None]:
import os
import glob
pt_encoder_dir = os.path.join('src', 'models', 'pretrained', 'encoders')
print(pt_encoder_dir)

## Set checkpoint directories and paths
# Halfstep Joint classification
classification_joint_dir = os.path.join(pt_encoder_dir, 'classification', 'halfstep_joint', 'checkpoints')
classification_joint_path = glob.glob(os.path.join(classification_joint_dir, 'last.ckpt'))[0]
print(classification_joint_path)

# Halfstep Joint AE
halfstep_joint_ae_dir = os.path.join(pt_encoder_dir, 'reconstruction', 'halfstep_joint_AE', 'checkpoints')
halfstep_joint_ae_path = glob.glob(os.path.join(halfstep_joint_ae_dir, 'last.ckpt'))[0]
print(halfstep_joint_ae_path)

## siamese classification, latent distances
# Manhattan
manhattan_dir = os.path.join(pt_encoder_dir, 'classification', 'siamese_manhattan', 'checkpoints')
manhattan_path = glob.glob(os.path.join(manhattan_dir, 'last.ckpt'))[0]
print(manhattan_path)

# Hadamard
hadamard_dir = os.path.join(pt_encoder_dir, 'classification', 'siamese_hadamard', 'checkpoints')
hadamard_path = glob.glob(os.path.join(hadamard_dir, 'last.ckpt'))[0]
print(hadamard_path)

# Euclidean
euclidean_dir = os.path.join(pt_encoder_dir, 'classification', 'siamese_euclidean', 'checkpoints')
euclidean_path = glob.glob(os.path.join(euclidean_dir, 'last.ckpt'))[0]
print(euclidean_path)

# Cosine
cosine_dir = os.path.join(pt_encoder_dir, 'classification', 'siamese_cosine', 'checkpoints')
cosine_path = glob.glob(os.path.join(cosine_dir, 'last.ckpt'))[0]
print(cosine_path)

# vanilla recon Halfstep Siamese AE
siamese_ae_dir = os.path.join(pt_encoder_dir, 'reconstruction', 'halfstep_siamese_AE', 'checkpoints')
siamese_ae_path = glob.glob(os.path.join(siamese_ae_dir, 'last.ckpt'))[0]
print(siamese_ae_path)

## "custom" loss
siamese_ae_siamac = os.path.join(pt_encoder_dir, 'semi_supervision', 'siamese_AE_siamac', 'checkpoints')
siamese_ae_siamac_path = glob.glob(os.path.join(siamese_ae_siamac, 'last.ckpt'))[0]
print(siamese_ae_siamac_path)

## Self supervised methods
siamese_ae_ncs = os.path.join(pt_encoder_dir, 'self_supervision', 'siamese_AE_ncs', 'checkpoints')
siamese_ae_ncs_path = glob.glob(os.path.join(siamese_ae_ncs, 'last.ckpt'))[0]
print(siamese_ae_ncs_path)

simsiam_dir = os.path.join(pt_encoder_dir, 'self_supervision', 'simsiam', 'checkpoints')
simsiam_path = glob.glob(os.path.join(simsiam_dir, 'last.ckpt'))[0]
print(simsiam_path)

## positive sets
siamese_ae_ncs_pos = os.path.join(pt_encoder_dir, 'self_supervision', 'siamese_AE_ncs_positives', 'checkpoints')
siamese_ae_ncs_pos_path = glob.glob(os.path.join(siamese_ae_ncs_pos, 'last.ckpt'))[0]
print(siamese_ae_ncs_pos_path)

siamese_ae_siamac_pos = os.path.join(pt_encoder_dir, 'semi_supervision', 'siamese_AE_siamac_positives', 'checkpoints')
siamese_ae_siamac_pos_path = glob.glob(os.path.join(siamese_ae_siamac_pos, 'last.ckpt'))[0]
print(siamese_ae_siamac_pos_path)

simsiam_pos_dir = os.path.join(pt_encoder_dir, 'self_supervision', 'simsiam_positives', 'checkpoints')
simsiam_pos_path = glob.glob(os.path.join(simsiam_pos_dir, 'last.ckpt'))[0]
print(simsiam_pos_path)

src/models/pretrained/encoders
src/models/pretrained/encoders/classification/halfstep_joint/checkpoints/last.ckpt
src/models/pretrained/encoders/reconstruction/halfstep_joint_AE/checkpoints/last.ckpt
src/models/pretrained/encoders/classification/siamese_manhattan/checkpoints/last.ckpt
src/models/pretrained/encoders/classification/siamese_hadamard/checkpoints/last.ckpt
src/models/pretrained/encoders/classification/siamese_euclidean/checkpoints/last.ckpt
src/models/pretrained/encoders/classification/siamese_cosine/checkpoints/last.ckpt
src/models/pretrained/encoders/reconstruction/halfstep_siamese_AE/checkpoints/last.ckpt
src/models/pretrained/encoders/semi_supervision/siamese_AE_siamac/checkpoints/last.ckpt
src/models/pretrained/encoders/self_supervision/siamese_AE_ncs/checkpoints/last.ckpt
src/models/pretrained/encoders/self_supervision/simsiam/checkpoints/last.ckpt
src/models/pretrained/encoders/self_supervision/siamese_AE_ncs_positives/checkpoints/last.ckpt
src/models/pretrained/enco
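
Each lookup above indexes `glob.glob(...)[0]`, which raises a bare `IndexError` when a checkpoint is missing. As a minimal sketch, a small wrapper could report which encoder still needs pre-training. The name `find_checkpoint` is hypothetical and not part of the repository.

```python
import glob
import os


def find_checkpoint(checkpoint_dir, pattern="last.ckpt"):
    """Return the first checkpoint in `checkpoint_dir` matching `pattern`.

    Hypothetical helper (not part of the repository): same lookup as the
    cell above, but with an explicit error when nothing is found.
    """
    matches = sorted(glob.glob(os.path.join(checkpoint_dir, pattern)))
    if not matches:
        raise FileNotFoundError(
            f"No checkpoint matching {pattern!r} in {checkpoint_dir!r}; "
            "pre-train the corresponding encoder first."
        )
    return matches[0]
```

With this helper, a line such as `manhattan_path = find_checkpoint(manhattan_dir)` would replace the direct `glob.glob(...)[0]` call.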

## Classification

### Joint Classification 2048

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_joint_1024 \
  ++model.net.pretrained_encoder_ckpt=$classification_joint_path

### Joint AutoEncoder 2048

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_joint_recon_1024 \
  ++model.net.pretrained_encoder_ckpt=$halfstep_joint_ae_path

### Siamese classification - Manhattan distance

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_manhattan \
  ++model.net.pretrained_encoder_ckpt=$manhattan_path

### Siamese classification - Hadamard product

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_hadamard \
  ++model.net.pretrained_encoder_ckpt=$hadamard_path

### Siamese classification - Euclidean distance

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_euclidean \
  ++model.net.pretrained_encoder_ckpt=$euclidean_path

### Siamese classification - cosine similarity

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_cosine \
  ++model.net.pretrained_encoder_ckpt=$cosine_path

### Siamese AutoEncoder - naive two pass BCE, symmetric loss

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_ae_naive \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_path

### Siamese autoencoder - negative cosine similarity

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_ae_ncs \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_ncs_path

### Siamese autoencoder - NCS all positives

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_ae_ncs_allpos \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_ncs_pos_path

### Siamese autoencoder - SiamACLoss

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_ae_siamac \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_siamac_path

### Siamese autoencoder - SiamACLoss all positives

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_siamese_ae_siamac_allpos \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_siamac_pos_path

### SimSiam

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_simsiam \
  ++model.net.pretrained_encoder_ckpt=$simsiam_path

### SimSiam - all positives

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/classification/ac_based=pt_simsiam_allpos \
  ++model.net.pretrained_encoder_ckpt=$simsiam_pos_path

## Regression

### Joint classification 2048

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_joint_1024 \
  ++model.net.pretrained_encoder_ckpt=$classification_joint_path

### Joint AutoEncoder 2048

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_joint_recon_1024 \
  ++model.net.pretrained_encoder_ckpt=$halfstep_joint_ae_path

### Siamese classification - Manhattan distance

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_manhattan \
  ++model.net.pretrained_encoder_ckpt=$manhattan_path

### Siamese classification - Hadamard product

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_hadamard \
  ++model.net.pretrained_encoder_ckpt=$hadamard_path

### Siamese classification - Euclidean distance

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_euclidean \
  ++model.net.pretrained_encoder_ckpt=$euclidean_path

### Siamese classification - Cosine similarity

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_cosine \
  ++model.net.pretrained_encoder_ckpt=$cosine_path

### Siamese AutoEncoder - Naive two pass BCE, symmetric loss

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_ae_naive \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_path

### Siamese autoencoder - negative cosine similarity

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_ae_ncs \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_ncs_path

### Siamese autoencoder - NCS all positives

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_ae_ncs_allpos \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_ncs_pos_path

### Siamese autoencoder - SiamACLoss

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_ae_siamac \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_siamac_path

### Siamese autoencoder - SiamACLoss all positives

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_siamese_ae_siamac_allpos \
  ++model.net.pretrained_encoder_ckpt=$siamese_ae_siamac_pos_path

### SimSiam

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_simsiam \
  ++model.net.pretrained_encoder_ckpt=$simsiam_path

### SimSiam - all positives

In [None]:
%%capture
!python3 src/train.py \
  +experiment/property_prediction/train/regression/ac_based=pt_simsiam_allpos \
  ++model.net.pretrained_encoder_ckpt=$simsiam_pos_path
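
The classification and regression cells above differ only in the task name, the experiment config and the checkpoint path. As a rough sketch, the same `src/train.py` call could be assembled programmatically. The function `build_train_command` and the placeholder checkpoint paths are assumptions for illustration; only the override strings are taken from the cells above.

```python
import shlex


def build_train_command(task, experiment, ckpt_path):
    """Assemble the train.py call used in this notebook as an argument list.

    Hypothetical helper: `task` is 'classification' or 'regression',
    `experiment` is one of the ac_based config names used above.
    """
    if task not in ("classification", "regression"):
        raise ValueError(f"Unknown task: {task!r}")
    return [
        "python3",
        "src/train.py",
        f"+experiment/property_prediction/train/{task}/ac_based={experiment}",
        f"++model.net.pretrained_encoder_ckpt={ckpt_path}",
    ]


# Two of the experiment names used above; the checkpoint paths are placeholders.
runs = {
    "pt_siamese_manhattan": "path/to/siamese_manhattan/last.ckpt",
    "pt_simsiam": "path/to/simsiam/last.ckpt",
}

commands = [
    build_train_command(task, name, ckpt)
    for task in ("classification", "regression")
    for name, ckpt in runs.items()
]

# Print the commands for inspection instead of running them.
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another.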

## Kill box: run after training to release Colab resources

In [None]:
#from google.colab import runtime
#runtime.unassign()
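
A guarded variant of the kill box, sketched under the assumption that the notebook may also run outside Colab (as the setup cell allows). The function name `release_colab_runtime` is hypothetical. It calls `runtime.unassign()` only when `google.colab` can be imported.

```python
def release_colab_runtime():
    """Release the Colab runtime if running on Colab.

    Hypothetical helper: returns False outside Colab, where
    `google.colab` is not importable and nothing is released.
    """
    try:
        from google.colab import runtime
    except ImportError:
        return False
    runtime.unassign()  # disconnects and frees the Colab VM
    return True
```

Calling `release_colab_runtime()` as the last cell frees the runtime once all training runs have finished.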

## Refs

[1] Derek van Tilborg, Alisa Alenicheva, and Francesca Grisoni. “Exposing the Limitations of Molecular Machine Learning with Activity Cliffs”. In: Journal of Chemical Information and Modeling 62.23 (Dec. 2022), pp. 5938–5951. DOI: 10.1021/acs.jcim.2c01073. URL: https://doi.org/10.1021/acs.jcim.2c01073.  
[2] César Miguel Valdez Córdova. Towards learning activity cliff-aware molecular representations. Publication pending.