In this project we compare a number of molecular representations (vectorizers) to determine the most suitable way to represent a molecule as a vector when intermolecular interactions are of primary interest. We use the solvation energy as the target value, with the solvent and solute molecules as input. The data are taken from the Minnesota Solvation Database (MNSol).
Please read the following for reproducibility, model availability and other comments.
Training outputs are written to the Runs folder, and the results are stored in Run_results (because of the large file sizes, Run_results is available for manual download from Yandex Disk). The results include the loss plots, normalization parameters, run_log and comments. The links to each result folder are presented below.
All training scripts are in Training_files and are named in the format Solvent_Solute_NN. Examples of loading the best validation models for LinNet and ResNet are given below.
```python
# LinNet
import torch
from my_nets.LinearNet import LinearNet3
from my_nets.net_func import load_ckp

in_feat = 207  # input vector length; set it to the length of your feature vectors
model = LinearNet3(in_features=in_feat)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
# Load the best-validation checkpoint; any further return values are collected in `other`
best_model, *other = load_ckp('Examples/data/best_Class_Morgan_Lin1', model, optimizer)
```
```python
# ResNet
import torch
from my_nets.ResNET import ResNet1D
from my_nets.net_func import load_ckp

# 1D ResNet hyperparameters; n_classes=1 gives a single output value (the solvation energy)
Res_Dict = {'base_filters': 2, 'kernel_size': 3, 'stride': 2, 'groups': 1, 'n_block': 3,
            'n_classes': 1, 'use_bn': True, 'use_do': True, 'verbose': False}
model = ResNet1D(in_channels=1, **Res_Dict)  # the feature vector is fed as a single channel
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
best_model, *other = load_ckp('Run_Results/ResNet/Class_Morgan_2_124_Res1/best/best_val_model.pt', model, optimizer)
```
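The Solvent_Solute_NN naming used for the training files and result folders (for example `Class_Morgan_Lin1` or `Class_Morgan_2_124_Res1`) can be unpacked programmatically. The helper below is a hypothetical sketch, not part of the repository: it assumes the first two `_`-separated tokens are the solvent and solute vectorizers and that the last token is a model tag followed by a run number. The meaning of any tokens in between is not documented here, so they are kept verbatim.

```python
import re
from typing import NamedTuple


class RunName(NamedTuple):
    solvent: str   # solvent vectorizer, e.g. "Class"
    solute: str    # solute vectorizer, e.g. "Morgan"
    extra: tuple   # undocumented middle tokens, kept as-is
    model: str     # model tag, e.g. "Lin", "Res", "KRR"
    index: int     # run number appended to the model tag


def parse_run_name(name: str) -> RunName:
    """Hypothetical helper: split a Solvent_Solute_..._NN run name."""
    tokens = name.split("_")
    if len(tokens) < 3:
        raise ValueError(f"expected at least 3 '_'-separated tokens: {name!r}")
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", tokens[-1])
    if match is None:
        raise ValueError(f"last token should look like 'Lin1' or 'Res1': {tokens[-1]!r}")
    return RunName(tokens[0], tokens[1], tuple(tokens[2:-1]),
                   match.group(1), int(match.group(2)))


print(parse_run_name("Class_Morgan_Lin1"))
print(parse_run_name("Class_Morgan_2_124_Res1"))
```

Splitting on the model tag at the end (rather than on a fixed token count) keeps the parser working for both the short LinNet names and the longer ResNet names.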
KRR_training - all KRR experiments are run sequentially in this file. The results are available on Google Drive.
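```python
# KRR
import pickle as pkl

# project_path: the project's helper that builds paths from the repository root
# (assumed to come from the utility file listed below)
with open(project_path('/Run_results/KRR/Class_Morgan_KRR1/best_models.pkl'), 'rb') as f:
    KRR_Class_Morgan_kernels = pkl.load(f)  # dictionary mapping kernel names to fitted models
KRR_Class_Morgan = KRR_Class_Morgan_kernels['laplacian']
```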
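For readers unfamiliar with kernel ridge regression, here is a minimal NumPy sketch of KRR with a Laplacian kernel, `k(x, y) = exp(-gamma * ||x - y||_1)`, solved in closed form as `(K + alpha * I) c = y`. It is an illustration only, not the code in KRR_training: the `gamma` and `alpha` values and the toy data are arbitrary, and `LaplacianKRR` is a hypothetical name. scikit-learn's `KernelRidge(kernel='laplacian')` fits the same kind of model.

```python
import numpy as np


def laplacian_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """k(a, b) = exp(-gamma * ||a - b||_1) for every pair of rows of A and B."""
    l1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-gamma * l1)


class LaplacianKRR:
    """Kernel ridge regression solved in closed form (illustrative sketch)."""

    def __init__(self, gamma: float = 1.0, alpha: float = 1e-3):
        self.gamma = gamma  # kernel width (arbitrary default)
        self.alpha = alpha  # ridge regularization strength (arbitrary default)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LaplacianKRR":
        self.X_train = X
        K = laplacian_kernel(X, X, self.gamma)
        # Dual coefficients: solve (K + alpha * I) c = y
        self.coef_ = np.linalg.solve(K + self.alpha * np.eye(len(X)), y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Prediction is a kernel-weighted sum over the training points
        return laplacian_kernel(X, self.X_train, self.gamma) @ self.coef_


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # toy features, not MNSol data
y = X.sum(axis=1)             # toy target
model = LaplacianKRR(gamma=1.0, alpha=1e-8).fit(X, y)
print(np.abs(model.predict(X) - y).max())  # small training error for a tiny alpha
```

Larger `alpha` trades training-set fit for smoother predictions; in practice both `gamma` and `alpha` are chosen on a validation set.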
Experiments on other datasets (Acree and FreeSolv), the Gsolv distribution and the feature permutation importance are presented in the following Jupyter Notebook.
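Permutation feature importance measures how much a fitted model's error grows when a single feature column is shuffled, which breaks that feature's link to the target while leaving its distribution unchanged. The sketch below shows the idea for any model exposing a prediction function. The MAE metric, the number of repeats and the toy linear model are assumptions for illustration, not the notebook's settings.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MAE when each column of X is shuffled (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    base = np.mean(np.abs(predict(X) - y))  # reference error on unshuffled data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j only
            scores.append(np.mean(np.abs(predict(X_perm) - y)))
        importances[j] = np.mean(scores) - base
    return importances


# Toy check: y depends strongly on feature 0, weakly on 1 and not at all on 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w = np.array([3.0, 0.5, 0.0])
y = X @ w
print(permutation_importance(lambda A: A @ w, X, y))
```

Because the model is not retrained, the method is cheap and works the same way for the linear networks, the ResNet and KRR.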
- A file with useful functions used throughout the project.
- A folder of .py files, each of which trains a network with a particular set of parameters. All KRR training is in a single file.
- A package of .py files for creating and training networks:
  - Create_dataset - A file with functions to create a dataset using the given vectorizers
  - net_func - A file with functions to train networks, plus other useful functions
  - LinearNet - A file with the linear network used for training
  - ResNET - A file with the 1D ResNet used for training. The model is adapted from hsd1503
- vectorizers.py - A file with the vectorizer functions used in this project
- A folder with tables used by various functions and vectorizers
- A folder with files used to prepare the data (tables, dicts, ...)
A zero tensor of length one, used to train models without any information on either the solvent or the solute.
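As an illustration of how a blank input can stand in for a missing molecule, the sketch below assumes (the README does not state it; it only gives the total input length `in_feat`) that the model input is the solvent vector concatenated with the solute vector. `blank_vector` and `build_input` are hypothetical helpers, and the 7- and 10-element vectors are placeholders.

```python
import numpy as np


def blank_vector() -> np.ndarray:
    """Zero tensor of length one: carries no information about the molecule."""
    return np.zeros(1, dtype=np.float32)


def build_input(solvent_vec: np.ndarray, solute_vec: np.ndarray) -> np.ndarray:
    """Hypothetical: concatenate solvent and solute features into one input vector."""
    return np.concatenate([solvent_vec, solute_vec]).astype(np.float32)


solute = np.arange(10, dtype=np.float32)  # placeholder solute features
solvent = np.ones(7, dtype=np.float32)    # placeholder solvent features
x_blank_solvent = build_input(blank_vector(), solute)
x_full = build_input(solvent, solute)
print(x_blank_solvent.shape, x_full.shape)  # (11,) (17,)
```

Under this assumption a Blank model sees a constant first entry, so any skill it shows must come from the other molecule alone.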
A three-level classification, described in the MNSol Database.
The calculated Total Exposed Surface Area (TESA), taken from the MNSol Database. More information is available in the MNSol Database.
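Solvent properties: nD, alpha, beta, gamma, epsilon, phi, psi (refractive index, Abraham hydrogen-bond acidity and basicity, macroscopic surface tension, dielectric constant, aromaticity and electronegative halogenicity). These are sometimes called Abraham descriptors.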
Morgan fingerprint bit vectors, calculated as described here.
If you have trouble installing RDKit, try:

```bash
pip install rdkit-pypi
```

Newer RDKit releases are published on PyPI as `rdkit` (`pip install rdkit`).
Bag of Bonds.
SciPy installation problems are addressed here: https://stackoverflow.com/a/69710042/13835675
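A minimal NumPy sketch of the standard Bag of Bonds construction, for orientation only: Coulomb-matrix terms `Z_i * Z_j / |R_i - R_j|` (plus the usual `0.5 * Z**2.4` self terms) are grouped into bags by element pair, each bag is sorted in descending order and zero-padded to its largest size over the dataset, and the bags are concatenated in a fixed order. This is not the repository's vectorizers.py implementation, whose details (units, bag order, padding) may differ; `molecule_bags` and `bag_of_bonds` are hypothetical names, and the geometries are approximate toy inputs.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def molecule_bags(Z, R):
    """Group the Coulomb-matrix entries of one molecule into bags keyed by element(s)."""
    bags = defaultdict(list)
    for z in Z:
        bags[(z,)].append(0.5 * z ** 2.4)  # self-interaction (diagonal) term
    for i, j in combinations(range(len(Z)), 2):
        key = tuple(sorted((Z[i], Z[j])))
        bags[key].append(Z[i] * Z[j] / np.linalg.norm(R[i] - R[j]))
    return bags


def bag_of_bonds(molecules):
    """Fixed-length BoB vectors for a list of (Z, R) molecules."""
    per_mol = [molecule_bags(Z, np.asarray(R, float)) for Z, R in molecules]
    # Bag size = largest number of entries in that bag over the whole dataset
    sizes = defaultdict(int)
    for bags in per_mol:
        for key, vals in bags.items():
            sizes[key] = max(sizes[key], len(vals))
    keys = sorted(sizes)  # fixed bag order shared by all molecules
    out = np.zeros((len(molecules), sum(sizes[k] for k in keys)))
    for m, bags in enumerate(per_mol):
        pos = 0
        for k in keys:
            vals = sorted(bags.get(k, []), reverse=True)  # descending within the bag
            out[m, pos:pos + len(vals)] = vals
            pos += sizes[k]  # remaining slots stay zero (padding)
    return out, keys


h2 = ([1, 1], [[0, 0, 0], [0, 0, 0.74]])
water = ([8, 1, 1], [[0, 0, 0], [0.757, 0.586, 0], [-0.757, 0.586, 0]])
X, keys = bag_of_bonds([h2, water])
print(keys, X.shape)
```

Sorting within each bag makes the vector invariant to atom ordering, and the shared padding gives every molecule the same vector length.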
Bag of Bonds for bonded atoms only
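One way to restrict Bag of Bonds to bonded atoms is to perceive the bonds first and keep only those pairs. The sketch below assumes a simple distance criterion (two atoms are bonded if they are closer than the sum of their covalent radii plus a 0.4 Å tolerance) and a small table of approximate covalent radii; the repository may instead take bonds from RDKit or another source. `bonded_pairs` and `just_bonds_bags` are hypothetical names.

```python
from collections import defaultdict

import numpy as np

# Approximate single-bond covalent radii in angstrom (illustrative subset: H, C, N, O)
COVALENT_RADII = {1: 0.31, 6: 0.76, 7: 0.71, 8: 0.66}


def bonded_pairs(Z, R, tolerance=0.4):
    """Assumed bond perception: bonded if distance < r_i + r_j + tolerance."""
    R = np.asarray(R, float)
    pairs = []
    for i in range(len(Z)):
        for j in range(i + 1, len(Z)):
            cutoff = COVALENT_RADII[Z[i]] + COVALENT_RADII[Z[j]] + tolerance
            if np.linalg.norm(R[i] - R[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def just_bonds_bags(Z, R):
    """Coulomb terms Z_i*Z_j/r_ij for bonded pairs only, bagged by element pair."""
    R = np.asarray(R, float)
    bags = defaultdict(list)
    for i, j in bonded_pairs(Z, R):
        key = tuple(sorted((Z[i], Z[j])))
        bags[key].append(Z[i] * Z[j] / np.linalg.norm(R[i] - R[j]))
    return {k: sorted(v, reverse=True) for k, v in bags.items()}


water = ([8, 1, 1], [[0, 0, 0], [0.757, 0.586, 0], [-0.757, 0.586, 0]])
print(bonded_pairs(*water))     # the two O-H bonds; the H...H pair is dropped
print(just_bonds_bags(*water))
```

The resulting bags would then be sorted, zero-padded and concatenated as in ordinary Bag of Bonds; dropping non-bonded pairs makes the vector much shorter for large molecules.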
Bag of Bonds extended with bond angles and torsion angles between bonded atoms.
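The extra geometric terms can be computed directly from Cartesian coordinates. The NumPy sketch below computes a bond angle and a torsion (dihedral) angle. How the repository turns these angles into vector entries (bagging by atom types, units, scaling) is not described here, so only the geometry is shown; `bond_angle` and `torsion_angle` are hypothetical names.

```python
import numpy as np


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    a, b, c = (np.asarray(p, float) for p in (a, b, c))
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding


def torsion_angle(a, b, c, d):
    """Signed dihedral a-b-c-d in degrees, in the range [-180, 180]."""
    a, b, c, d = (np.asarray(p, float) for p in (a, b, c, d))
    b0, b1, b2 = a - b, c - b, d - c
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the b-c axis
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


print(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))                # right angle
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

Using `arctan2` rather than `arccos` keeps the sign of the torsion, which distinguishes mirror-image conformations.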
Smooth Overlap of Atomic Positions, thoroughly described here

Solvent➡️ ⬇️Solute | Blank | Class | Macro | Morgan | JustBonds | BoB | BAT | SOAP
---|---|---|---|---|---|---|---|---
Blank | | Class Blank | Macro Blank | Morgan Blank | JB Blank | BoB Blank | BAT Blank | SOAP Blank
Class | Blank Class | Class Class | Macro Class | Morgan Class | JB Class | BoB Class | BAT Class | SOAP Class
TESA | Blank TESA | Class TESA | Macro TESA | Morgan TESA | JB TESA | BoB TESA | BAT TESA | SOAP TESA
Morgan | Blank Morgan | Class Morgan | Macro Morgan | Morgan Morgan | JB Morgan | BoB Morgan | BAT Morgan | SOAP Morgan
JustBonds | Blank JB | Class JB | Macro JB | Morgan JB | JB JB | BoB JB | BAT JB | SOAP JB
BoB | Blank BoB | Class BoB | Macro BoB | Morgan BoB | JB BoB | BoB BoB | BAT BoB | SOAP BoB
BAT | Blank BAT | Class BAT | Macro BAT | Morgan BAT | JB BAT | BoB BAT | BAT BAT | SOAP BAT
SOAP | Blank SOAP | Class SOAP | Macro SOAP | Morgan SOAP | JB SOAP | BoB SOAP | BAT SOAP | SOAP SOAP

Solvent➡️ ⬇️Solute | Blank | Class | Macro | Morgan | JustBonds | BoB | BAT | SOAP
---|---|---|---|---|---|---|---|---
Blank | | Class Blank | Macro Blank | Morgan Blank | JB Blank | BoB Blank | BAT Blank | SOAP Blank
Class | Blank Class | Class Class | Macro Class | Morgan Class | JB Class | BoB Class | BAT Class | SOAP Class
TESA | Blank TESA | Class TESA | Macro TESA | Morgan TESA | JB TESA | BoB TESA | BAT TESA | SOAP TESA
Morgan | Blank Morgan | Class Morgan | Macro Morgan | Morgan Morgan | JB Morgan | BoB Morgan | BAT Morgan | SOAP Morgan
JustBonds | Blank JB | Class JB | Macro JB | Morgan JB | JB JB | BoB JB | BAT JB | SOAP JB
BoB | Blank BoB | Class BoB | Macro BoB | Morgan BoB | JB BoB | BoB BoB | BAT BoB | SOAP BoB
BAT | Blank BAT | Class BAT | Macro BAT | Morgan BAT | JB BAT | BoB BAT | BAT BAT | SOAP BAT
SOAP | Blank SOAP | Class SOAP | Macro SOAP | Morgan SOAP | JB SOAP | BoB SOAP | BAT SOAP | SOAP SOAP

Solvent➡️ ⬇️Solute | Blank | Class | Macro | Morgan | JustBonds | BoB | BAT | SOAP
---|---|---|---|---|---|---|---|---
Blank | | Class Blank | Macro Blank | Morgan Blank | JB Blank | BoB Blank | BAT Blank | SOAP Blank
Class | Blank Class | Class Class | Macro Class | Morgan Class | JB Class | BoB Class | BAT Class | SOAP Class
TESA | Blank TESA | Class TESA | Macro TESA | Morgan TESA | JB TESA | BoB TESA | BAT TESA | SOAP TESA
Morgan | Blank Morgan | Class Morgan | Macro Morgan | Morgan Morgan | JB Morgan | BoB Morgan | BAT Morgan | SOAP Morgan
JustBonds | Blank JB | Class JB | Macro JB | Morgan JB | JB JB | BoB JB | BAT JB | SOAP JB
BoB | Blank BoB | Class BoB | Macro BoB | Morgan BoB | JB BoB | BoB BoB | BAT BoB | SOAP BoB
BAT | Blank BAT | Class BAT | Macro BAT | Morgan BAT | JB BAT | BoB BAT | BAT BAT | SOAP BAT
SOAP | Blank SOAP | Class SOAP | Macro SOAP | Morgan SOAP | JB SOAP | BoB SOAP | BAT SOAP | SOAP SOAP