## Initial imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob, os, pickle
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from chython import smiles

# Model rebuilding

The hyperparameters of the models are stored in the *trials.all* and *trials.best* files after optimization. Although these tables are easy for a human to read, rebuilding a model from them by hand is tedious. To help with that, the rebuilder module provides a utility that rebuilds any model reported in these files.

In [70]:
# getting the best regression model from trials.best
best_regr = pd.read_table("lambda/Model_Epipi_regr/trials.best", sep=" ").iloc[0]

from doptools.cli.rebuilder import rebuild_from_file

pipeline, trial = rebuild_from_file("lambda/circus", "lambda/Model_Epipi_regr", best_regr["trial"])
pipeline

In [73]:
photoswitches = pd.read_table("../examples/photoswitches.csv", sep=",", index_col=0)
data_lambda = photoswitches[pd.notnull(photoswitches['E isomer pi-pi* wavelength in nm'])]
lambda_mols = [smiles(s) for s in data_lambda.SMILES]
for m in lambda_mols:
    m.canonicalize()  # standardize the structure in place
    m.clean2d()       # compute 2D coordinates for depiction
y = np.array(data_lambda["E isomer pi-pi* wavelength in nm"])
pipeline.fit(lambda_mols, y)

The *pipeline* object now contains the trained model, covering every step from descriptor generation to numerical prediction. Next, let's give it another azo dye and predict its absorption maximum. The actual value of $\lambda_{max}$ for this compound is 462 nm.

In [75]:
ext_mol = smiles("C(C)N(CC)c1ccc(cc1)N=Nc2ccc(cc2)C(=O)C")
ext_mol.clean2d()

ext_pred = pipeline.predict([ext_mol])
ext_pred

array([417.48361672])
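To put this prediction in context, the deviation from the actual value can be computed directly. The following is a minimal sketch that uses only the two numbers quoted above; the variable names are illustrative and the prediction is hard-coded so that the snippet runs on its own.

```python
import numpy as np

# Values quoted in the text: the actual absorption maximum (nm)
# and the pipeline's prediction for the external molecule.
lambda_actual = 462.0
ext_pred = np.array([417.48361672])

abs_error = np.abs(ext_pred - lambda_actual)    # absolute error in nm
rel_error = abs_error / lambda_actual * 100     # relative error in percent

print(f"absolute error: {abs_error[0]:.1f} nm")
print(f"relative error: {rel_error[0]:.1f} %")
```

For this molecule the model underestimates the absorption maximum by about 44.5 nm, or roughly 9.6 % of the actual value.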

## Model interpretation

The regression models built in DOPtools with fragment descriptors can be interpreted by the built-in ColorAtom methodology. 

In ColorAtom, the fragment weights are calculated as partial derivatives of the prediction, and every atom participating in a fragment is assigned that fragment's weight. After these weights are summed over all fragments, the resulting atomic contributions can be visualized. Below is an example of such an interpretation for the previously introduced molecule.
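The idea can be illustrated with a toy calculation. The sketch below is not the DOPtools implementation: the fragments, the atom assignments, and the stand-in `predict` function are all hypothetical, and the partial derivatives are estimated by central finite differences, which is an assumption made here for simplicity.

```python
import numpy as np

# Toy setup (all hypothetical): a molecule with 4 atoms and 3 fragments.
# fragment_atoms[j] lists the atom indices that belong to fragment j.
fragment_atoms = [(0, 1), (1, 2), (2, 3)]
x = np.array([1.0, 2.0, 1.0])  # fragment counts (descriptor vector)

def predict(desc):
    """Stand-in for a trained model: any function of the descriptors."""
    w = np.array([5.0, -3.0, 1.5])
    return float(desc @ w + 0.1 * desc[0] * desc[1])

def fragment_weights(f, desc, eps=1e-4):
    """Central finite-difference estimate of d(prediction)/d(descriptor_j)."""
    grads = np.zeros_like(desc)
    for j in range(desc.size):
        step = np.zeros_like(desc)
        step[j] = eps
        grads[j] = (f(desc + step) - f(desc - step)) / (2 * eps)
    return grads

weights = fragment_weights(predict, x)

# Every atom in a fragment receives that fragment's weight;
# contributions from all fragments are summed per atom.
atom_contrib = np.zeros(4)
for atoms, w in zip(fragment_atoms, weights):
    for a in atoms:
        atom_contrib[a] += w

print(weights)
print(atom_contrib)
```

Atoms shared by several fragments (atoms 1 and 2 here) accumulate the weights of each fragment they belong to, so opposing fragment weights can partially cancel at the atom level.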

In [83]:
from doptools import ColorAtom

ca = ColorAtom()
ca.set_pipeline(pipeline) 
# it is necessary to indicate what pipeline, containing 
# the fragmentor, preprocessing and model, is used 
# for interpretation

ca.output_html(ext_mol)


The results are output as an HTML table containing SVG images of the molecule. The color indicates the direction of the contribution: red for negative (towards a lower value) and green for positive (towards a higher value). The intensity of the color indicates the magnitude of the contribution: the brighter the color, the more the atom contributes. Please note that using the same ColorAtom object to calculate contributions for different molecules does not retain the scale, so color intensity cannot be used for comparison between molecules.
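To see why intensities may not be comparable across molecules, consider a color mapping that is normalized separately for each molecule. The sketch below assumes exactly that; the source does not state how ColorAtom scales its colors, so the `to_colors` helper and the contribution values are purely hypothetical.

```python
import numpy as np

def to_colors(contrib):
    """Map atomic contributions to (R, G, B) tuples in [0, 1].

    Assumption: intensity is scaled by the largest absolute contribution
    in *this* molecule, so the scale is not shared between molecules.
    """
    contrib = np.asarray(contrib, dtype=float)
    scale = np.max(np.abs(contrib))
    intensity = np.abs(contrib) / scale if scale > 0 else np.zeros_like(contrib)
    # Red channel for negative contributions, green for positive ones.
    return [(float(i), 0.0, 0.0) if c < 0 else (0.0, float(i), 0.0)
            for c, i in zip(contrib, intensity)]

mol_a = [5.0, -2.5]    # hypothetical contributions, molecule A
mol_b = [50.0, -25.0]  # ten times larger, molecule B

colors_a = to_colors(mol_a)
colors_b = to_colors(mol_b)
print(colors_a == colors_b)
```

Under this assumption the two molecules receive identical colors even though their contributions differ by a factor of ten, which is why intensities should only be compared between atoms of the same molecule.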