## Initial imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob, os, pickle
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from chython import smiles

## Model rebuilding

The hyperparameters of the models are stored in the *trials.all* and *trials.best* files after optimization. These tables are easy enough for a human to read, but rebuilding a model from them by hand is tedious. To help with that, the rebuilder module provides a utility that rebuilds any model reported in these files.

In [2]:
# get the best regression model (first row of trials.best)
best_regr = pd.read_table("lambda/Model_Epipi_regr/trials.best", sep=" ").iloc[0]

from doptools.cli.rebuilder import rebuild_from_file

pipeline, trial = rebuild_from_file("lambda/circus", "lambda/Model_Epipi_regr", best_regr["trial"])
pipeline

In [3]:
photoswitches = pd.read_table("../examples/photoswitches.csv", sep=",", index_col=0)
data_lambda = photoswitches[pd.notnull(photoswitches['E isomer pi-pi* wavelength in nm'])]
lambda_mols = [smiles(s) for s in data_lambda.SMILES]
# canonicalize and compute 2D coordinates in place for every molecule
for m in lambda_mols:
    m.canonicalize()
    m.clean2d()
y = np.array(data_lambda["E isomer pi-pi* wavelength in nm"])
pipeline.fit(lambda_mols, y)

The *pipeline* object now contains the trained model, covering every step from descriptor generation to numerical prediction. Next, let's give it another azo dye and predict its absorption maximum. The actual value of $\lambda_{max}$ for it is 462 nm.

In [4]:
ext_mol = smiles("C(C)N(CC)c1ccc(cc1)N=Nc2ccc(cc2)C(=O)C")
ext_mol.clean2d()

ext_pred = pipeline.predict([ext_mol])
ext_pred

array([417.48361672])
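
To put this prediction in context, it can be compared with the experimental value of 462 quoted above. The snippet below is a minimal sketch; both numbers are copied by hand from the text and the cell output, and the variable names are illustrative only.

```python
# Values copied from the text and from the cell output above
y_true = 462.0            # experimental lambda_max, nm
y_pred = 417.48361672     # value returned by pipeline.predict, nm

abs_error = abs(y_true - y_pred)
rel_error = abs_error / y_true * 100

print(f"absolute error: {abs_error:.1f} nm")
print(f"relative error: {rel_error:.1f} %")
```

A single external molecule says little about overall model quality, so this is only a sanity check and not a validation.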

## Model interpretation

The regression models built in DOPtools with fragment descriptors can be interpreted by the built-in ColorAtom methodology. 

In ColorAtom, the fragment weights are calculated as partial derivatives of the prediction, and all atoms participating in these fragments are assigned these weights. After summing these weights over all the fragments, the atomic contributions can be visualized. Below is an example of such an interpretation for the previously introduced molecule.
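
The idea can be illustrated with a toy example. The sketch below is not the DOPtools implementation: the fragments, the atom indices and `toy_model` are all hypothetical, and the derivative is approximated by finite differences, which may differ from how the library computes it.

```python
# Toy illustration only: this is NOT the DOPtools implementation.
# All names below (fragments, toy_model, ...) are hypothetical.

# Each fragment descriptor is a count; we also record which atoms
# (1-based indices) take part in that fragment.
fragments = {
    "C-N": {"count": 2.0, "atoms": [1, 2]},
    "N=N": {"count": 1.0, "atoms": [2, 3]},
    "C=O": {"count": 1.0, "atoms": [4]},
}

def toy_model(counts):
    """Stand-in for a trained regressor (slightly non-linear)."""
    return (400.0 + 5.0 * counts["C-N"] - 8.0 * counts["N=N"]
            + 0.5 * counts["C=O"] ** 2)

def fragment_weights(model, counts, eps=1e-4):
    """Approximate d(prediction)/d(descriptor) by central differences."""
    weights = {}
    for name in counts:
        up = dict(counts)
        up[name] += eps
        down = dict(counts)
        down[name] -= eps
        weights[name] = (model(up) - model(down)) / (2 * eps)
    return weights

counts = {name: f["count"] for name, f in fragments.items()}
weights = fragment_weights(toy_model, counts)

# Every atom of a fragment receives that fragment's weight; sum per atom.
atom_contrib = {}
for name, f in fragments.items():
    for atom in f["atoms"]:
        atom_contrib[atom] = atom_contrib.get(atom, 0.0) + weights[name]

for atom in sorted(atom_contrib):
    print(atom, round(atom_contrib[atom], 3))
```

Atom 2 belongs to two fragments with weights of opposite sign, so its contribution is their sum. This is the same kind of per-atom number that ColorAtom prints for each atom of the molecule.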

In [13]:
from doptools import ColorAtom

ca = ColorAtom()
ca.set_pipeline(pipeline) 
# it is necessary to indicate what pipeline, containing 
# the fragmentor, preprocessing and model, is used 
# for interpretation

ca.output_html(ext_mol)


1 2.31110604149643
2 3.603022420398645
3 2.069050316940775
4 2.31110604149643
5 3.603022420398645
6 -2.8175926615344906
7 -7.462919125847293
8 -12.300232235686508
9 -15.480960461744871
10 -12.300232235686508
11 -7.462919125847293
12 -15.74790241410409
13 -16.798420807368245
14 -18.852328407913603
15 -19.424748715199087
16 -16.62276317781874
17 -15.173190169837824
18 -16.62276317781874
19 -19.424748715199087
20 -11.926300483502303
21 -7.989748637059563
22 -5.573124890940733


The results are output as an HTML table containing SVG images of the molecule. The color indicates the sign of the contribution: red for negative (towards lower values), green for positive (towards higher values). The intensity of the color indicates the magnitude of the contribution: the brighter the color, the more the atom contributes.

To compare the contributions of different molecules, the user may display a colorbar that indicates the scale of the contributions. To do that, the "colorbar" parameter needs to be set to True.

In [8]:
ca.output_html(ext_mol, colorbar=True)

1 2.31110604149643
2 3.603022420398645
3 2.069050316940775
4 2.31110604149643
5 3.603022420398645
6 -2.8175926615344906
7 -7.462919125847293
8 -12.300232235686508
9 -15.480960461744871
10 -12.300232235686508
11 -7.462919125847293
12 -15.74790241410409
13 -16.798420807368245
14 -18.852328407913603
15 -19.424748715199087
16 -16.62276317781874
17 -15.173190169837824
18 -16.62276317781874
19 -19.424748715199087
20 -11.926300483502303
21 -7.989748637059563
22 -5.573124890940733


In [9]:
ext_mol2 = smiles("C(C)N(CC)c1ccc(cc1)N=Nc2ccc(nc2)C(=O)O")
ext_mol2.clean2d()
ca.output_html(ext_mol2, colorbar=True)

1 5.8908347201605125
2 4.749947317492172
3 7.301075246254641
4 5.8908347201605125
5 4.749947317492172
6 3.505258478415385
7 -1.5535328357753997
8 -6.064336904004108
9 -9.328692100612557
10 -6.064336904004108
11 -1.5535328357753997
12 -8.228458269736507
13 -9.279188620967489
14 -8.505607958194844
15 -5.530763567091128
16 -0.5475352312178074
17 -4.119723035435811
18 -1.4208749144179365
19 -7.126939450766429
20 0.010281619969930489
21 -3.2723236722969773
22 -0.5731334976148332
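
When two molecules are compared numerically, it can help to put their contributions on one common, symmetric scale. The sketch below does this by hand for the two outputs above, with values rounded to two decimals. It makes no claim about how ColorAtom scales its own colorbar; `to_unit_interval` is a hypothetical helper.

```python
# Atomic contributions copied from the two outputs above (2 decimals)
contrib_mol1 = [2.31, 3.60, 2.07, 2.31, 3.60, -2.82, -7.46, -12.30,
                -15.48, -12.30, -7.46, -15.75, -16.80, -18.85, -19.42,
                -16.62, -15.17, -16.62, -19.42, -11.93, -7.99, -5.57]
contrib_mol2 = [5.89, 4.75, 7.30, 5.89, 4.75, 3.51, -1.55, -6.06,
                -9.33, -6.06, -1.55, -8.23, -9.28, -8.51, -5.53,
                -0.55, -4.12, -1.42, -7.13, 0.01, -3.27, -0.57]

# A symmetric limit shared by both molecules keeps zero in the middle
limit = max(abs(v) for v in contrib_mol1 + contrib_mol2)

def to_unit_interval(value, limit):
    """Map [-limit, +limit] linearly onto [0, 1]; 0.5 corresponds to zero."""
    return 0.5 + 0.5 * value / limit

scaled1 = [to_unit_interval(v, limit) for v in contrib_mol1]
scaled2 = [to_unit_interval(v, limit) for v in contrib_mol2]

print(f"shared limit: {limit:.2f}")
print(f"min scaled value, molecule 1: {min(scaled1):.2f}")
print(f"min scaled value, molecule 2: {min(scaled2):.2f}")
```

With a shared limit, equal scaled values correspond to equal contributions in both molecules, which is not guaranteed when each molecule is normalized separately.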


The color scheme of the visualization may also be changed, either by using one of the predefined colormaps in Matplotlib or by defining your own.

In [14]:
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

# defining a new colormap, going from blue for negative to red for positive
RWB = LinearSegmentedColormap.from_list("RWB", ["#0571b0","#92c5de","#f7f7f7","#f4a582","#ca0020"])

ca2 = ColorAtom(colormap=RWB)
ca2.set_pipeline(pipeline) 
ca2.output_html(ext_mol, colorbar=True)

1 2.31110604149643
2 3.603022420398645
3 2.069050316940775
4 2.31110604149643
5 3.603022420398645
6 -2.8175926615344906
7 -7.462919125847293
8 -12.300232235686508
9 -15.480960461744871
10 -12.300232235686508
11 -7.462919125847293
12 -15.74790241410409
13 -16.798420807368245
14 -18.852328407913603
15 -19.424748715199087
16 -16.62276317781874
17 -15.173190169837824
18 -16.62276317781874
19 -19.424748715199087
20 -11.926300483502303
21 -7.989748637059563
22 -5.573124890940733
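
For intuition about the custom colormap defined above, the sketch below gives a rough, pure-Python picture of how a colormap built from a list of anchor colors assigns a color to a value. It assumes evenly spaced anchors and linear interpolation in RGB; Matplotlib additionally quantizes the result into a lookup table, so its values may differ slightly. The helper names are hypothetical.

```python
# The same five anchor colors as in the RWB colormap above
anchors = ["#0571b0", "#92c5de", "#f7f7f7", "#f4a582", "#ca0020"]

def hex_to_rgb(h):
    """Convert '#rrggbb' to a tuple of three integers in 0..255."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def interpolate(x, anchors):
    """Return the (r, g, b) color for x in [0, 1]."""
    rgb = [hex_to_rgb(a) for a in anchors]
    x = min(max(x, 0.0), 1.0)            # clip to the valid range
    pos = x * (len(rgb) - 1)             # position along the anchor list
    i = min(int(pos), len(rgb) - 2)      # index of the left anchor
    t = pos - i                          # fraction towards the right anchor
    return tuple(round(a + t * (b - a)) for a, b in zip(rgb[i], rgb[i + 1]))

print(interpolate(0.0, anchors))   # low end of the scale (blue)
print(interpolate(0.5, anchors))   # middle of the scale (near white)
print(interpolate(1.0, anchors))   # high end of the scale (red)
```

The near-white middle anchor is what makes atoms with contributions close to zero fade into the background in a diverging scheme like this one.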


## Classification case

In [None]:
best_regr = pd.read_table("lambda/Model_Epipi_regr/trials.best", sep=" ").iloc[0]

from doptools.cli.rebuilder import rebuild_from_file

pipeline, trial = rebuild_from_file("lambda/circus", "lambda/Model_Epipi_regr", best_regr["trial"])
pipeline