EduNetArchive/Isaev_NN-Solvation

Comparative study of various molecular representations for intermolecular interaction predictions

In this project we compare a number of molecular representations (vectorizers) to determine which is the most suitable way to represent a molecule as a vector when intermolecular interactions are of primary interest. We use solvation energy as the target value and the solvent and solute molecules as input. The data is obtained from the MNSol Database.

Please read the following file for notes on reproducibility, model availability, and other comments.

Training

The training data is written to the Runs folder and the results are stored in Run_results, including the loss plots, normalization parameters, run_log, and comments. Due to large file sizes, Run_results is available for manual download from Yandex Disk. The links to each result folder are presented below.

Neural Networks

All training files are located in Training_files and are named in the format Solvent_Solute_NN. Examples of loading the best validation models for LinNet and ResNet are shown below.

# LinNet

from my_nets.LinearNet import LinearNet3
from my_nets.net_func import load_ckp
import torch

in_feat = 207  # length of the input feature vector
model = LinearNet3(in_features=in_feat)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
best_model, *other = load_ckp('Examples/data/best_Class_Morgan_Lin1', model, optimizer)

# ResNet

from my_nets.ResNET import ResNet1D
from my_nets.net_func import load_ckp
import torch

Res_Dict = {'base_filters':2, 'kernel_size':3, 'stride':2, 'groups':1, 'n_block':3, 'n_classes':1, 'use_bn':True, 'use_do':True, 'verbose':False}
model = ResNet1D(in_channels=1, **Res_Dict)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
best_model, *other = load_ckp('Run_results/ResNet/Class_Morgan_2_124_Res1/best/best_val_model.pt', model, optimizer)

Kernel Ridge Regression

KRR_training - all KRR experiments are carried out sequentially in this file. The results are available on Google Drive.

# KRR

import pickle as pkl

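# project_path is not defined in this snippet; it is assumed to be a path helper from the repository's utility file
with open(project_path('/Run_results/KRR/Class_Morgan_KRR1/best_models.pkl'), 'rb') as f: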
    KRR_Class_Morgan_kernels = pkl.load(f)  # dictionary of kernel names and the models
KRR_Class_Morgan = KRR_Class_Morgan_kernels['laplacian']
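
The `'laplacian'` key above selects the model trained with a Laplacian kernel. As a reminder of what such a kernel computes, here is a minimal pure-Python sketch, assuming the common definition k(x, y) = exp(-gamma * ||x - y||_1). The function name and the `gamma` default are illustrative and are not taken from the repository.

```python
import math


def laplacian_kernel(x, y, gamma=1.0):
    """Laplacian kernel: exp(-gamma * L1 distance between x and y).

    Hypothetical helper for illustration; the repository's KRR models
    may rely on a library implementation instead.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1_distance)


# Similarity between two toy feature vectors (values are made up).
similarity = laplacian_kernel([0.0, 1.0, 0.0], [1.0, 1.0, 0.0], gamma=0.5)
```

The kernel equals 1 for identical vectors and decays exponentially with the L1 distance, so `gamma` controls how quickly two molecules stop being treated as similar.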

Other experiments

Experiments on other datasets (Acree and FreeSolv), the Gsolv distribution, and feature permutation importance are presented in the following Jupyter Notebook.
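
For readers unfamiliar with feature permutation importance, the idea can be sketched as follows. This is a generic illustration under stated assumptions, not the notebook's code: the function name, the choice of mean absolute error as the metric, and the toy data are all hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MAE when each feature column is shuffled.

    predict: callable mapping one feature row (list) to a prediction.
    X: list of feature rows (lists of floats); y: list of targets.
    """
    rng = random.Random(seed)

    def mae(rows):
        return sum(abs(predict(r) - t) for r, t in zip(rows, y)) / len(y)

    baseline = mae(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, keeping every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that only looks at the first feature.
toy_X = [[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
toy_y = [2.0, 4.0, 6.0, 8.0]
toy_importances = permutation_importance(lambda r: 2.0 * r[0], toy_X, toy_y)
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model depends on increases the error.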

Repository structure

A file with some useful functions used throughout the project.

A folder with .py files, each of which trains a network with a given set of parameters. All KRR training is in one file.

A package with some .py files to create and train networks

    Create_dataset - A file that contains functions to create a dataset using the given vectorizers

    net_func - A file that contains functions to train the network and other useful functions

    LinearNet - A file that contains the Linear Network used for training

    ResNET - A file that contains the 1D ResNET used for training. The model is adapted from hsd1503

A package vectorizers.py that contains the vectorizer functions used in this project

A folder with tables used by various functions and vectorizers

A folder with some files used to prepare data (tables, dicts, ...)

Vectorizers

Blank

A zero tensor of length one, used to train models without any information about either the solvent or the solute.
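
A minimal sketch of how a Blank vector can stand in for one of the two molecules is shown below. It assumes the model input is the solvent vector followed by the solute vector; that ordering, and the helper names `blank` and `make_input`, are assumptions for illustration and are not taken from Create_dataset.

```python
def blank():
    """Blank vectorizer: a single zero that carries no molecular information."""
    return [0.0]


def make_input(solvent_vec, solute_vec):
    """Concatenate solvent and solute vectors (assumed order: solvent first)."""
    return list(solvent_vec) + list(solute_vec)


# A "Blank solvent" experiment: the model sees only the solute features.
solute_features = [0.3, 1.2, 0.0]  # made-up values
solute_only_input = make_input(blank(), solute_features)
```

Training with Blank on one side gives a baseline showing how much of the solvation energy can be predicted from the other molecule alone.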

Classification

Three-layer classification, as described in the MNSol Database.

Solute_TESA

The calculated Total Exposed Surface Area (TESA) parameter, taken from the MNSol Database. More information is available in the MNSol Database.

Solvent_Macro_props

Properties of the solvent: nD, alpha, beta, gamma, epsilon, phi, psi. Sometimes called Abraham descriptors.

MorganFingerprints

Morgan fingerprints calculated as a bit vector, described here

If you have trouble installing RDKit, try

pip install rdkit-pypi

BoB

Bag of Bonds.

SciPy installation problems are addressed here: https://stackoverflow.com/a/69710042/13835675
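
To make the Bag of Bonds idea concrete, here is a simplified pure-Python sketch. It assumes the common pairwise term Z_i * Z_j / r_ij grouped by element pair and sorted in descending order; it omits the single-atom bags and is not the implementation in vectorizers.py. The function names and the `bag_sizes` argument are hypothetical.

```python
import math
from collections import defaultdict


def bag_of_bonds(symbols, charges, coords):
    """Group Coulomb-like pair terms Z_i * Z_j / r_ij by element pair."""
    bags = defaultdict(list)
    n_atoms = len(symbols)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            distance = math.dist(coords[i], coords[j])
            key = tuple(sorted((symbols[i], symbols[j])))
            bags[key].append(charges[i] * charges[j] / distance)
    # Sorting each bag makes the result independent of atom ordering.
    return {key: sorted(values, reverse=True) for key, values in bags.items()}


def flatten(bags, bag_sizes):
    """Zero-pad each bag to a fixed size and concatenate in key order.

    bag_sizes maps element pairs to the bag length used for the whole
    dataset; each bag is assumed to fit within its size.
    """
    vector = []
    for key in sorted(bag_sizes):
        values = bags.get(key, [])
        vector.extend(values + [0.0] * (bag_sizes[key] - len(values)))
    return vector


# Toy three-atom geometry (coordinates are made up, not a real structure).
toy_bags = bag_of_bonds(
    ["O", "H", "H"],
    [8, 1, 1],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
)
toy_vector = flatten(toy_bags, {("H", "H"): 1, ("H", "O"): 3, ("C", "H"): 2})
```

Padding every bag to a dataset-wide size is what lets molecules with different numbers of atoms share one fixed-length input vector.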

JustBonds (JB)

Bag of Bonds for bonded atoms only

BAT

Bag of Bonds with the addition of angles and torsion angles between bonded atoms

SOAP

Smooth Overlap of Atomic Positions, thoroughly described here

Table with Solvent-Solute Experiment links

Kernel Ridge Regression

| ⬇️Solute / Solvent➡️ | Blank | Class | Macro | Morgan | JustBonds | BoB | BAT | SOAP |
|---|---|---|---|---|---|---|---|---|
| Blank | | Class Blank | Macro Blank | Morgan Blank | JB Blank | BoB Blank | BAT_Blank | SOAP_Blank |
| Class | Blank Class | Class Class | Macro Class | Morgan Class | JB Class | BoB Class | BAT_Class | SOAP_Class |
| TESA | Blank TESA | Class TESA | Macro TESA | Morgan TESA | JB TESA | BoB TESA | BAT_TESA | SOAP_TESA |
| Morgan | Blank Morgan | Class Morgan | Macro Morgan | Morgan Morgan | JB Morgan | BoB Morgan | BAT_Morgan | SOAP_Morgan |
| JustBonds | Blank JB | Class JB | Macro JB | Morgan JB | JB JB | BoB JB | BAT JB | SOAP JB |
| BoB | Blank BoB | Class BoB | Macro BoB | Morgan BoB | JB BoB | BoB BoB | BAT_BoB | SOAP_BoB |
| BAT | Blank BAT | Class_BAT | Macro_BAT | Morgan_BAT | JB BAT | BoB_BAT | BAT_BAT | SOAP_BAT |
| SOAP | Blank SOAP | Class_SOAP | Macro_SOAP | Morgan_SOAP | JB SOAP | BoB_SOAP | BAT_SOAP | SOAP_SOAP |

Linear

| ⬇️Solute / Solvent➡️ | Blank | Class | Macro | Morgan | JustBonds | BoB | BAT | SOAP |
|---|---|---|---|---|---|---|---|---|
| Blank | | Class Blank | Macro Blank | Morgan Blank | JB Blank | BoB Blank | BAT_Blank | SOAP_Blank |
| Class | Blank Class | Class Class | Macro Class | Morgan Class | JB Class | BoB Class | BAT_Class | SOAP_Class |
| TESA | Blank TESA | Class TESA | Macro TESA | Morgan TESA | JB TESA | BoB TESA | BAT_TESA | SOAP_TESA |
| Morgan | Blank Morgan | Class Morgan | Macro Morgan | Morgan Morgan | JB Morgan | BoB Morgan | BAT_Morgan | SOAP_Morgan |
| JustBonds | Blank JB | Class JB | Macro JB | Morgan JB | JB JB | BoB JB | BAT JB | SOAP JB |
| BoB | Blank BoB | Class BoB | Macro BoB | Morgan BoB | JB BoB | BoB BoB | BAT_BoB | SOAP_BoB |
| BAT | Blank BAT | Class_BAT | Macro_BAT | Morgan_BAT | JB BAT | BoB_BAT | BAT_BAT | SOAP_BAT |
| SOAP | Blank SOAP | Class_SOAP | Macro_SOAP | Morgan_SOAP | JB SOAP | BoB_SOAP | BAT_SOAP | SOAP_SOAP |

ResNET

| ⬇️Solute / Solvent➡️ | Blank | Class | Macro | Morgan | JustBonds | BoB | BAT | SOAP |
|---|---|---|---|---|---|---|---|---|
| Blank | | Class Blank | Macro Blank | Morgan Blank | JB Blank | BoB Blank | BAT Blank | SOAP Blank |
| Class | Blank Class | Class Class | Macro Class | Morgan Class | JB Class | BoB Class | BAT Class | SOAP Class |
| TESA | Blank TESA | Class TESA | Macro TESA | Morgan TESA | JB TESA | BoB TESA | BAT TESA | SOAP TESA |
| Morgan | Blank Morgan | Class Morgan | Macro Morgan | Morgan Morgan | JB Morgan | BoB Morgan | BAT Morgan | SOAP Morgan |
| JustBonds | Blank JB | Class JB | Macro JB | Morgan JB | JB JB | BoB JB | BAT JB | SOAP JB |
| BoB | Blank BoB | Class BoB | Macro BoB | Morgan BoB | JB BoB | BoB BoB | BAT BoB | SOAP BoB |
| BAT | Blank BAT | Class BAT | Macro_BAT | Morgan_BAT | JB BAT | BoB BAT | BAT BAT | SOAP BAT |
| SOAP | Blank SOAP | Class SOAP | Macro_SOAP | Morgan_SOAP | JB SOAP | BoB SOAP | BAT SOAP | SOAP SOAP |

The End
