Once the desired encoders have been pre-trained using the accompanying `encoder_pretraining` and placed in the [name of corresponding folder], we proceed to evaluate them with MoleculeACE [ref].  
The following ChEMBL datasets were selected according to the criteria specified in the accompanying thesis publication [tlacamr ref]:  

* ChEMBL234 - Dopamine D3 receptor
* ChEMBL4203 - Dual specificity protein kinase
* ChEMBL2047 - Farnesoid X receptor
* ChEMBL4616 - Ghrelin receptor
* ChEMBL264 - Histamine H3 receptor
* ChEMBL2835 - Janus kinase 1
* ChEMBL4792 - Orexin receptor 2
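
For reference, the targets above can be paired with the MoleculeACE dataset identifiers that appear in the `datasets` tuple of the deprecated section at the end of this notebook. The sketch below only restates that pairing; the dictionary name `TARGET_DATASETS` is illustrative and not part of the project.

```python
# Target name -> MoleculeACE dataset identifier (ChEMBL ID plus endpoint).
# The identifiers are copied from the `datasets` tuple later in this notebook.
TARGET_DATASETS = {
    "Dopamine D3 receptor": "CHEMBL234_Ki",
    "Dual specificity protein kinase": "CHEMBL4203_Ki",
    "Farnesoid X receptor": "CHEMBL2047_EC50",
    "Ghrelin receptor": "CHEMBL4616_EC50",
    "Histamine H3 receptor": "CHEMBL264_Ki",
    "Janus kinase 1": "CHEMBL2835_Ki",
    "Orexin receptor 2": "CHEMBL4792_Ki",
}

# Print one row per target: ChEMBL ID, measured endpoint, target name.
for target, dataset in TARGET_DATASETS.items():
    chembl_id, endpoint = dataset.split("_")
    print(f"{chembl_id:<12}{endpoint:<6}{target}")
```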

## Setup

In [None]:
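import os.path

try:
    from google.colab import drive
except ImportError:
    # Not running on Colab: fall back to the user's home directory.
    _home = os.path.expanduser('~')
else:
    # Running on Colab: mount Google Drive and work from there.
    drive.mount('/content/drive')
    _home = 'drive/MyDrive/tlacamr'

project_root = os.path.join(_home, 'tlacamr')
print(project_root)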

In [None]:
%cd $project_root
!pip install .
# Once the repo is public, the install statement should look like this:
# !pip install git+https://github.com/my-user/my-repo

## Evaluation

### Classification

#### MLP 256

In [None]:
!HYDRA_FULL_ERROR=1 python3 src/train.py experiment=property_prediction/jointautoencoder/classification/ChEMBL234 ++trainer.accelerator=gpu

#### MLP 2048

### Regression

#### MLP 256

In [None]:
!HYDRA_FULL_ERROR=1 python3 src/train.py experiment=property_prediction/jointautoencoder/regression/ChEMBL234 ++trainer.accelerator=gpu
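
The classification and regression commands in this notebook cover only ChEMBL234. As a minimal sketch, the same command can be generated for every target listed at the top. This assumes that an experiment config following the same naming pattern exists for each target, which this notebook does not confirm; `build_command` is a hypothetical helper, not part of the project.

```python
# Targets listed at the top of this notebook.
TARGETS = ["ChEMBL234", "ChEMBL4203", "ChEMBL2047", "ChEMBL4616",
           "ChEMBL264", "ChEMBL2835", "ChEMBL4792"]
TASKS = ["classification", "regression"]


def build_command(task, target, accelerator="gpu"):
    """Return the training command for one task/target pair (hypothetical helper)."""
    # Assumption: every target has a config under the same path pattern
    # as the ChEMBL234 experiments used in this notebook.
    experiment = f"property_prediction/jointautoencoder/{task}/{target}"
    return (
        "HYDRA_FULL_ERROR=1 python3 src/train.py "
        f"experiment={experiment} ++trainer.accelerator={accelerator}"
    )


# One command per task/target combination, classification first.
commands = [build_command(task, target) for task in TASKS for target in TARGETS]
for command in commands:
    print(command)
```

Each printed line can be pasted into a cell with a leading `!` to run it.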

#### MLP 2048

## Deprecated

In [None]:
from MoleculeACE import MLP, Data, Descriptors, calc_rmse, calc_cliff_rmse, get_benchmark_config

import datamol as dm
import torch
from molfeat.calc import FP_FUNCS, FPCalculator
from molfeat.trans.concat import FeatConcat
from molfeat.trans import MoleculeTransformer

In [None]:
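# MoleculeACE dataset names for the targets listed at the top of this notebook
datasets = ('CHEMBL234_Ki', 'CHEMBL4203_Ki', 'CHEMBL2047_EC50', 'CHEMBL4616_EC50',
            'CHEMBL264_Ki', 'CHEMBL2835_Ki', 'CHEMBL4792_Ki')
algorithm = MLP
descriptor = Descriptors.ECFP

# Load data
dataset = 'CHEMBL4203_Ki'
data = Data(dataset)

# Get the already optimized hyperparameters
hyperparameters = get_benchmark_config(dataset, algorithm, descriptor)

# Featurize the SMILES strings with the chosen descriptor;
# this populates data.x_train and data.x_test, which are used for training below
data(descriptor)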

In [None]:
train_smiles = data.smiles_train
test_smiles = data.smiles_test
# ECFP fingerprints (2048 bits, radius 4) computed with molfeat
featurizer = MoleculeTransformer(FPCalculator('ecfp', length=2048, radius=4))
featurized_train = torch.as_tensor(featurizer(train_smiles), dtype=torch.float32)
featurized_test = torch.as_tensor(featurizer(test_smiles), dtype=torch.float32)

In [None]:
# Train and use a model for prediction
model = algorithm(**hyperparameters)

model.train(data.x_train, data.y_train)
y_hat = model.predict(data.x_test)

# Evaluate your model on activity cliff compounds
rmse = calc_rmse(data.y_test, y_hat)
rmse_cliff = calc_cliff_rmse(y_test_pred=y_hat, y_test=data.y_test, cliff_mols_test=data.cliff_mols_test)

print(f"rmse: {rmse}")
print(f"rmse_cliff: {rmse_cliff}")
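
To make the two reported metrics concrete, here is a plain-Python sketch of what they measure: the RMSE over all test compounds, and the same RMSE restricted to compounds flagged as activity cliffs. It assumes that `cliff_mols_test` holds one 0/1 flag per test compound. The functions `rmse` and `cliff_rmse` are illustrative stand-ins and not MoleculeACE's own implementation, and the values are invented toy numbers.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error over all compounds."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def cliff_rmse(y_true, y_pred, cliff_flags):
    """RMSE restricted to compounds flagged as activity cliffs (flag == 1)."""
    pairs = [(t, p) for t, p, flag in zip(y_true, y_pred, cliff_flags) if flag == 1]
    cliff_true, cliff_pred = zip(*pairs)
    return rmse(cliff_true, cliff_pred)


# Toy values, invented purely for illustration.
y_true = [6.0, 7.0, 8.0, 5.0]
y_pred = [6.0, 6.0, 8.0, 7.0]
cliff_flags = [0, 1, 0, 1]  # assumption: 1 marks an activity-cliff compound

print(f"rmse: {rmse(y_true, y_pred)}")
print(f"rmse_cliff: {cliff_rmse(y_true, y_pred, cliff_flags)}")
```

In this toy case both mispredicted compounds are cliff compounds, so the cliff RMSE comes out higher than the overall RMSE. A gap in that direction is what the comparison is meant to expose.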