# Multitask learning did not perform excitingly well. Can transfer learning do better?
The multitask learning tests were only marginally promising, likely due to the poor data overlap.

In [1]:
import deepchem as dc
import numpy as np
import pandas as pd
import optuna

import cytoxnet.dataprep.io as io
import cytoxnet.dataprep.dataprep as dataprep
import cytoxnet.dataprep.featurize as feat
from cytoxnet.models.models import ToxModel
import cytoxnet.models.opt as opt

## Prepare the two datasets - rat and algae
We must extract the independent test set from the algae data.

In [2]:
rat_df = io.load_data('zhu_rat_LD50')
algea_df = io.load_data('lunghini_algea_EC50')

In [3]:
rat_f = feat.add_features(rat_df, method='ConvMolFeaturizer')
algea_f = feat.add_features(algea_df, method='ConvMolFeaturizer')

In [4]:
rat = dataprep.convert_to_dataset(
    rat_f,
    X_col='ConvMolFeaturizer',
    y_col=[
        'rat_LD50'
    ]
)
algea = dataprep.convert_to_dataset(
    algea_f,
    X_col='ConvMolFeaturizer',
    y_col=[
        'algea_EC50'
    ]
)

In [5]:
rat_normed, rat_transformations = dataprep.data_transformation(
    rat, transformations=['NormalizationTransformer'],
    to_transform=['y']
)
algea_normed, algea_transformations = dataprep.data_transformation(
    algea, transformations=['NormalizationTransformer'],
    to_transform=['y']
)
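
As a rough illustration of what normalizing the `y` values involves, the sketch below standardizes a list of made-up targets and then inverts the operation, which is the role `untransform=True` plays later when predictions are evaluated. The helper names and the use of the population standard deviation are assumptions for illustration; this is not the cytoxnet or DeepChem implementation.

```python
from statistics import fmean, pstdev


def fit_normalizer(y):
    """Return the mean and (population) standard deviation of the targets."""
    return fmean(y), pstdev(y)


def transform(y, mean, std):
    """Map targets to zero mean and unit variance."""
    return [(v - mean) / std for v in y]


def untransform(y_normed, mean, std):
    """Map normalized values back to the original target scale."""
    return [v * std + mean for v in y_normed]


# Made-up target values, not taken from either dataset.
y = [1.0, 2.0, 3.0, 4.0]
mean, std = fit_normalizer(y)
y_normed = transform(y, mean, std)
y_back = untransform(y_normed, mean, std)
```

Because the statistics are fitted per dataset, the rat and algae targets each need their own transformer, which is why the two sets of transformations are kept separate above.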

In [6]:
algea_dev, algea_test = dataprep.data_splitting(algea_normed, split_type='tt')
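
For readers unfamiliar with hold-out splitting, a minimal sketch of a random train/test split follows. It assumes a random shuffle and a 20% test fraction; the fraction and strategy actually used by `split_type='tt'` are not shown in this notebook, and `train_test_split` here is a hypothetical helper, not a cytoxnet function.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the items as a test set."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_test = int(round(len(items) * test_fraction))
    test_idx = set(indices[:n_test])
    # Preserve the original ordering within each subset.
    train = [item for i, item in enumerate(items) if i not in test_idx]
    test = [item for i, item in enumerate(items) if i in test_idx]
    return train, test


# Stand-in sample IDs, not real compounds.
samples = list(range(10))
dev, test = train_test_split(samples, test_fraction=0.2, seed=0)
```

The key property is that the test samples are never seen during training, so both the baseline and the transfer model can be scored on the same untouched data.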

Retrieve optimum hyperparameters for single task graph models.

In [48]:
study = optuna.load_study(
    study_name='opt',
    storage="sqlite:///graph_r.db"
)

In [49]:
study_results = study.trials_dataframe()



In [50]:
params = study.best_params

In [51]:
params

{'batch_size': 50,
 'dense_layer_size': 76,
 'dropout': 0.0024988702928001455,
 'graph_conv_layers': [128, 128, 128],
 'number_atom_features': 25}

In [52]:
study.best_value

0.5364451592127468

### Train a baseline graph model

In [30]:
baseline = ToxModel(
    'GraphCNN',
    mode='regression',
    transformers=algea_transformations,
    **params
)



In [31]:
baseline.fit(algea_dev, nb_epoch=100)

  "shape. This may consume a large amount of memory." % value)

0.06913197994232177

In [32]:
baseline.evaluate(algea_test, ['r2_score', 'mean_squared_error'], untransform=True)



{'metric-1': 0.37941328545557207, 'metric-2': 2.797050558108971}

In [44]:
baseline.visualize('pair_predict', algea_test, untransform=True)

(288, 1)


> We know from our model screening that graph models can outperform RFR for single-task models. However, this is not the case for the algae dataset, which is quite small, and graph models need a lot of data. Here the baseline R2 is 0.38, quite a bit worse than the RFR baseline.
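
For reference, the two metrics requested in the evaluation can be written out by hand as in the sketch below. The definitions are the standard ones (the function names mirror scikit-learn's); whether `ToxModel.evaluate` computes them exactly this way internally is an assumption, and the numbers are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not taken from the algae test set.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# Always predicting the mean of the targets gives an R2 of exactly 0.
r2_mean_only = r2_score(y_true, [2.5] * 4)
```

An R2 of 0.38 therefore means the model removes roughly 38% of the squared error that a constant mean prediction would make on the test set.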

### Train a transfer model

We are still using the same hyperparameters.

In [36]:
transfer = ToxModel(
    'GraphCNN',
    mode='regression',
    transformers=rat_transformations,
    **params
)



In this case we train on the rat data, which has a lot of examples, and for longer (200 epochs instead of 100). The hope is that the fundamental relationship between the compound graphs and toxicity is mostly independent of the species, and that only the extent to which each species is affected differs.

In [37]:
transfer.fit(rat_normed, nb_epoch=200)

  "shape. This may consume a large amount of memory." % value)

0.027182610034942628

Switch to the new target by swapping in the algae transformers.

In [38]:
transfer.transformers = algea_transformations

Fix all but the dense output layer. This keeps the relationship between molecule and toxic function learned from the rat data, while allowing the exact relationship/mechanism between that toxicity and the species of interest to be retrained.

In [39]:
for layer in transfer._model.model.layers[:-1]:
    layer.trainable = False
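
To make the effect of fixing layers concrete, here is a toy sketch in plain Python rather than Keras. A small "pretrained" hidden layer is held constant while gradient descent updates only the output layer on a new, made-up task. All weights, data, and names are hypothetical; this only illustrates the idea and is not the graph model used above.

```python
import math

# "Pretrained" hidden layer: these parameters stay fixed (frozen).
hidden_w = [0.5, -1.0, 1.5]
hidden_b = [0.0, 0.5, -0.5]
# Output layer: the only parameters that remain trainable.
out_w = [0.0, 0.0, 0.0]
out_b = 0.0


def hidden(x):
    """Features produced by the frozen layer."""
    return [math.tanh(w * x + b) for w, b in zip(hidden_w, hidden_b)]


def predict(x):
    return sum(w * h for w, h in zip(out_w, hidden(x))) + out_b


def mse(data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# Made-up data for the "new" task.
data = [(i / 10.0, 2.0 * math.tanh(0.05 * i) + 1.0) for i in range(-20, 21)]

frozen_before = (list(hidden_w), list(hidden_b))
loss_before = mse(data)

lr = 0.1
for _ in range(500):
    grad_w = [0.0, 0.0, 0.0]
    grad_b = 0.0
    for x, y in data:
        h = hidden(x)
        err = predict(x) - y
        for j in range(3):
            grad_w[j] += 2.0 * err * h[j] / len(data)
        grad_b += 2.0 * err / len(data)
    # Only the output layer is updated; hidden_w and hidden_b are never touched.
    out_w = [w - lr * g for w, g in zip(out_w, grad_w)]
    out_b -= lr * grad_b

loss_after = mse(data)
```

With the hidden layer fixed, retraining reduces to fitting a linear regressor on top of features learned elsewhere, which is why it can work with far fewer examples than training the whole network.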

In [45]:
transfer.fit(algea_dev, nb_epoch=100)

0.021781232357025147

In [46]:
transfer.evaluate(algea_test, ['r2_score', 'mean_squared_error'], untransform=True)

{'metric-1': 0.39858004705650507, 'metric-2': 2.7106639178915977}

In [47]:
transfer.visualize('pair_predict', algea_test, untransform=True)

(288, 1)


> We can recover a bit of the accuracy, up to an R2 of 0.399. This indicates that we can transfer knowledge between species, but better data curation and an optimization of the fixing strategy are still needed (here we decided to fix all but the output regressor, but another scheme of fixing layers may be better), since the result is still not better than the RFR, which is natively better suited to smaller datasets.
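
One simple way to explore other fixing schemes would be to vary how many of the final layers are left trainable. The hypothetical helper below only generates the per-layer flags; applying them to a real model (for example by assigning each flag to a layer's `trainable` attribute) and retraining is left out, and the layer count used here is made up.

```python
def trainable_flags(n_layers, n_unfrozen):
    """Return one flag per layer; only the last n_unfrozen layers are trainable."""
    if not 0 <= n_unfrozen <= n_layers:
        raise ValueError("n_unfrozen must be between 0 and n_layers")
    return [i >= n_layers - n_unfrozen for i in range(n_layers)]


# Candidate schemes for a hypothetical 4-layer model.
schemes = {k: trainable_flags(4, k) for k in range(1, 5)}
```

Here `schemes[1]` corresponds to the strategy used in this notebook (everything fixed except the last layer), while `schemes[4]` would amount to full fine-tuning of every layer.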