# "Predicting the Synthesizability of Drug Molecules with Deep Learning", Andrey Ageev

Task:

Use a variational autoencoder to generate molecules; start by studying the repository https://github.com/aspuru-guzik-group/chemical_vae.

Encode, decode, and predict the properties of the following molecules:

1. Cc1ccc(S2(=O)=NC(=O)Nc3ccccc32)cc1

2. CN(Cc1ccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C

3. COC(=O)C1CCC(Oc2ccc(NC(=O)C(=O)NN)cn2)CC1

Import the required packages

In [1]:
import warnings
warnings.filterwarnings('ignore')
# tensorflow backend
from os import environ
environ['KERAS_BACKEND'] = 'tensorflow'
# vae stuff
from chemvae.vae_utils import VAEUtils
from chemvae import mol_utils as mu
# import scientific py
import numpy as np
import pandas as pd
# rdkit stuff
from rdkit.Chem import AllChem as Chem
from rdkit.Chem import PandasTools
# plotting stuff
import matplotlib.pyplot as plt
import matplotlib as mpl
from IPython.display import SVG, display
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

Using TensorFlow backend.


Load the model

In [2]:
vae = VAEUtils(directory='chemical_vae-master/models/zinc_properties')

Using standarized functions? True
Standarization: estimating mu and std values ...done!


## Decode/Encode 

The basic pipeline looks like this:

smiles → x → z → x_r → smiles_r
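The first and last arrows of this pipeline (string to one-hot matrix and back) can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names `toy_smiles_to_hot` and `toy_hot_to_smiles`, the character set `CHARSET`, the length `MAX_LEN`, and the use of spaces as padding are all assumptions made for illustration. The real model uses its own character set and maximum length.

```python
# Toy illustration of the "smiles -> x" and "x_r -> smiles_r" steps.
# CHARSET and MAX_LEN are hypothetical; the trained model defines its own.
CHARSET = [" ", "C", "N", "O", "c", "n", "(", ")", "=", "1", "2"]
MAX_LEN = 12


def toy_smiles_to_hot(smiles, charset=CHARSET, max_len=MAX_LEN):
    """Pad the string with spaces and return a max_len x len(charset) one-hot matrix."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    index = {ch: i for i, ch in enumerate(charset)}
    padded = smiles.ljust(max_len)
    return [
        [1 if index[ch] == j else 0 for j in range(len(charset))]
        for ch in padded
    ]


def toy_hot_to_smiles(matrix, charset=CHARSET, strip=True):
    """Take the arg-max of every row and map it back to a character."""
    chars = [charset[max(range(len(row)), key=row.__getitem__)] for row in matrix]
    text = "".join(chars)
    return text.strip() if strip else text


x = toy_smiles_to_hot("CC(=O)NC")
print(len(x), len(x[0]))      # rows = MAX_LEN, columns = len(CHARSET)
print(toy_hot_to_smiles(x))   # round trip without a model in between
```

In the notebook the encoder and decoder sit between these two steps, so the decoded matrix holds per-character probabilities rather than exact zeros and ones. The arg-max is what turns it back into a string.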

In [3]:
smiles_1 = mu.canon_smiles('Cc1ccc(S2(=O)=NC(=O)Nc3ccccc32)cc1')

X_1 = vae.smiles_to_hot(smiles_1,canonize_smiles=True)
z_1 = vae.encode(X_1)
X_r_1= vae.decode(z_1)

print('{:20s} : {}'.format('Input',smiles_1))
print('{:20s} : {}'.format('Reconstruction',vae.hot_to_smiles(X_r_1,strip=True)[0]))

print('{:20s} : {} with norm {:.3f}'.format('Z representation',z_1.shape, np.linalg.norm(z_1)))

Input                : Cc1ccc(S2(=O)=NC(=O)Nc3ccccc32)cc1
Reconstruction       : C(cccc(C[n+]2=NC(=O)Nc3ccccc32)cc1
Z representation     : (1, 196) with norm 10.280


In [4]:
smiles_2 = mu.canon_smiles('CN(Cc1ccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C')

X_2 = vae.smiles_to_hot(smiles_2,canonize_smiles=True)
z_2 = vae.encode(X_2)
X_r_2= vae.decode(z_2)

print('{:20s} : {}'.format('Input',smiles_2))
print('{:20s} : {}'.format('Reconstruction',vae.hot_to_smiles(X_r_2,strip=True)[0]))

print('{:20s} : {} with norm {:.3f}'.format('Z representation',z_2.shape, np.linalg.norm(z_2)))

Input                : CN(Cc1ccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C
Reconstruction       : CN(C)cccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C
Z representation     : (1, 196) with norm 13.437


In [5]:
smiles_3 = mu.canon_smiles('COC(=O)C1CCC(Oc2ccc(NC(=O)C(=O)NN)cn2)CC1')

X_3 = vae.smiles_to_hot(smiles_3,canonize_smiles=True)
z_3 = vae.encode(X_3)
X_r_3= vae.decode(z_3)

print('{:20s} : {}'.format('Input',smiles_3))
print('{:20s} : {}'.format('Reconstruction',vae.hot_to_smiles(X_r_3,strip=True)[0]))

print('{:20s} : {} with norm {:.3f}'.format('Z representation',z_3.shape, np.linalg.norm(z_3)))

Input                : COC(=O)C1CCC(Oc2ccc(NC(=O)C(=O)NN)cn2)CC1
Reconstruction       : COC(=O)C1CCC(Oc2ccc(NC(=O)C(=O)N=Ocn2)CC1
Z representation     : (1, 196) with norm 13.116
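None of the three reconstructions matches its input exactly, so it is worth checking whether the decoded strings are even syntactically plausible. The sketch below applies a deliberately crude test, balanced round parentheses, to the strings printed above. The helper `parens_balanced` is a hypothetical name introduced here. Passing the check is necessary but not sufficient for a valid SMILES string; a full check could parse each string with RDKit's `Chem.MolFromSmiles`, which returns `None` when parsing fails.

```python
def parens_balanced(smiles):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a branch is closed before it was opened
                return False
    return depth == 0


# (input, reconstruction) pairs copied from the cell outputs above
pairs = [
    ("Cc1ccc(S2(=O)=NC(=O)Nc3ccccc32)cc1",
     "C(cccc(C[n+]2=NC(=O)Nc3ccccc32)cc1"),
    ("CN(Cc1ccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C",
     "CN(C)cccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C"),
    ("COC(=O)C1CCC(Oc2ccc(NC(=O)C(=O)NN)cn2)CC1",
     "COC(=O)C1CCC(Oc2ccc(NC(=O)C(=O)N=Ocn2)CC1"),
]

for original, reconstructed in pairs:
    # prints: input balanced?, reconstruction balanced?, exact match?
    print(parens_balanced(original),
          parens_balanced(reconstructed),
          original == reconstructed)
```

Each line prints `True False False`: the inputs are balanced, while every reconstruction has an unmatched parenthesis and therefore cannot be parsed as a molecule.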


## Property predictor


We predict the following properties:

Water-octanol partition coefficient (logP);

Synthetic accessibility score (SAS);

Quantitative Estimate of Drug-likeness (QED).
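The predictor returns these values as a bare vector; judging by the label printed in the cells below, the order is (qed, SAS, logP). The following sketch wraps the encode and predict steps into one helper that returns the values by name. The helper `predict_properties` and the constant `PROPERTY_NAMES` are hypothetical names introduced here; the method calls are the ones already used in this notebook, and the ordering is an assumption taken from the print label.

```python
# Assumed order, taken from the 'Properties (qed,SAS,logP):' print label.
PROPERTY_NAMES = ("qed", "SAS", "logP")


def predict_properties(vae, smiles):
    """Encode one SMILES string and return its predicted properties by name."""
    x = vae.smiles_to_hot(smiles, canonize_smiles=True)  # smiles -> one-hot
    z = vae.encode(x)                                    # one-hot -> latent vector
    y = vae.predict_prop_Z(z)[0]                         # latent vector -> properties
    return {name: float(value) for name, value in zip(PROPERTY_NAMES, y)}


# Usage, assuming `vae` was loaded as in cell In [2]:
# for s in ['Cc1ccc(S2(=O)=NC(=O)Nc3ccccc32)cc1',
#           'CN(Cc1ccc2c(c1)C(=O)CC2)C(=O)OC(C)(C)C']:
#     print(s, predict_properties(vae, s))
```

Returning a dictionary avoids having to remember which position in the vector holds which property.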

In [6]:
print('Properties (qed,SAS,logP):')
y_1 = vae.predict_prop_Z(z_1)[0]
print(y_1)

Properties (qed,SAS,logP):
[0.72313255 2.4103725  3.1467233 ]


In [7]:
print('Properties (qed,SAS,logP):')
y_2 = vae.predict_prop_Z(z_2)[0]
print(y_2)

Properties (qed,SAS,logP):
[0.81158835 2.2198553  2.4382758 ]


In [8]:
print('Properties (qed,SAS,logP):')
y_3 = vae.predict_prop_Z(z_3)[0]
print(y_3)

Properties (qed,SAS,logP):
[0.75315255 2.4784982  0.05034626]
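To compare the three molecules side by side, the printed vectors can be collected into one small table. The sketch below copies the values from the outputs above and assumes the (qed, SAS, logP) order given by the print label. It also picks the molecule with the lowest predicted SAS, since lower synthetic accessibility scores indicate molecules that are easier to synthesize.

```python
# Values copied from the printed outputs above, in (qed, SAS, logP) order.
predictions = [
    ("molecule 1", 0.72313255, 2.4103725, 3.1467233),
    ("molecule 2", 0.81158835, 2.2198553, 2.4382758),
    ("molecule 3", 0.75315255, 2.4784982, 0.05034626),
]

print("{:12s} {:>6s} {:>6s} {:>6s}".format("", "qed", "SAS", "logP"))
for name, qed, sas, logp in predictions:
    print("{:12s} {:6.3f} {:6.3f} {:6.3f}".format(name, qed, sas, logp))

# Lower SAS means the molecule is predicted to be easier to synthesize.
easiest = min(predictions, key=lambda row: row[2])
print("Lowest predicted SAS:", easiest[0])
```

Among these predictions, molecule 2 has both the lowest SAS and the highest QED. All three values are predictions from the latent vector, not properties computed from the structure.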
