# Scaffold-based reinforcement learning and molecule generation

In some cases you might already have an idea of a scaffold or of fragments that your new molecules should contain. In this case, it is useful to do the [reinforcement learning](rl_optimization.ipynb) and the [molecule generation](generation.ipynb) with preselected fragments (a single fragment or a combination of multiple fragments).

In this tutorial, we show an example with a single pyrazine and with a combination of a pyrazine and a thiophene, for both the SMILES- and graph-based transformers.

To understand this tutorial and to have all necessary files, we expect the user to be familiar with the basic tutorials on [data preprocessing](../datasets.ipynb), [reinforcement learning](rl_optimization.ipynb) and the [molecule generation](generation.ipynb).

In [1]:
import sys
sys.path.append('..')
from utils import smilesToGrid

In [2]:
# A single pyrazine, and a pyrazine combined with a thiophene ('.' separates the fragments)
frags = ['c1cnccn1', 'c1cnccn1.c1ccsc1']

smilesToGrid(frags)

# Building the environment

In this example, we build both the graph- and smiles-based models with the same vocabularies, pretrained and finetuned generators and QSAR models as in the [general RL example](../rl_optimization.ipynb).

First we build the environment, which is identical for both models.

In [3]:
from drugex.training.scorers.properties import Property
from drugex.training.scorers.modifiers import ClippedScore
from drugex.training.environment import DrugExEnvironment
from drugex.training.rewards import WeightedSum

from qsprpred.scorers.predictor import Predictor


# QSAR model for A1 - inactive target
scorer_a1 = Predictor.fromFile('../jupyter/models', type="REG", name="A1", algorithm='RF', target='P30542', scale=False)
scorer_a1.setModifier(ClippedScore(lower_x=6.5, upper_x=6.5+2))

# QSAR model for A3 - active target
scorer_a3 = Predictor.fromFile('../jupyter/models', type="REG", name="A3", algorithm='RF', target='P0DMS8', scale=False)
# Attach the modifier to the A3 scorer (not to scorer_a1 a second time)
scorer_a3.setModifier(ClippedScore(lower_x=6.5-2, upper_x=6.5))

# QED and SAscore
qed = Property("QED", modifier=ClippedScore(lower_x=0, upper_x=1.0))
sascore = Property("SA", modifier=ClippedScore(lower_x=4.5, upper_x=0))

# Create environment
scorers = [scorer_a1, scorer_a3, qed, sascore]
thresholds = [0.99, 0.99, 0.5, 0.5]
environment = DrugExEnvironment(scorers, thresholds, reward_scheme=WeightedSum())

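To make the modifiers used above more concrete, here is a minimal pure-Python sketch of a clipped linear score. It assumes that `ClippedScore` maps `lower_x` to the minimal score 0 and `upper_x` to the maximal score 1, with linear interpolation in between. The function `clipped_score` is hypothetical and written only for illustration; it is not taken from DrugEx.

```python
def clipped_score(x, lower_x, upper_x):
    """Illustrative stand-in (assumption), not the DrugEx implementation."""
    # Fraction of the way from lower_x to upper_x; this also works when upper_x < lower_x
    t = (x - lower_x) / (upper_x - lower_x)
    # Clip the result to the [0, 1] range
    return min(1.0, max(0.0, t))

# Increasing case (lower_x < upper_x): higher values score higher
print(clipped_score(7.5, lower_x=6.5, upper_x=8.5))   # 0.5
# Decreasing case (lower_x > upper_x), as in the SA modifier: lower values score higher
print(clipped_score(2.25, lower_x=4.5, upper_x=0))    # 0.5
```

Under this assumption, swapping `lower_x` and `upper_x` simply reverses the direction of the score, which is why the SA modifier is written with `lower_x=4.5, upper_x=0`.
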
# Graph-based Transformer
## Data Preprocessing

We use the same encoder as in [Preparing Data for the Graph-Based Transformer](../datasets.ipynb) to create molecules from the fragments and encode fragment-molecule pairs, with two small modifications:
1. Instead of using a `fragmenter` we create dummy molecules from the fragments with `dummyMolsFromFragments` 
2. Set `splitter` to `None`, `n_proc` and `chunk_size` to 1 

In [4]:
import os
from drugex.data.datasets import GraphFragDataSet
from drugex.molecules.converters.dummy_molecules import dummyMolsFromFragments
from drugex.data.fragments import FragmentCorpusEncoder, GraphFragmentEncoder
from drugex.data.corpus.vocabulary import VocGraph

fragmenter = dummyMolsFromFragments()
splitter = None

encoder = FragmentCorpusEncoder(
    fragmenter=fragmenter, 
    encoder=GraphFragmentEncoder(
        VocGraph(n_frags=4) 
    ),
    pairs_splitter=splitter, 
    n_proc=1,
    chunk_size=1
)

graph_input_folder = "../jupyter/data/sets/graph/"
if not os.path.exists(graph_input_folder):
    os.makedirs(graph_input_folder)
    
dataset = GraphFragDataSet(f"{graph_input_folder}/scaffold_graph.tsv", rewrite=True)

In [5]:
encoder.apply(list(frags), encodingCollectors=[dataset])

## Reinforcement learning

Then we can build the explorer, which is composed of the agent, the prior and the environment.

In [6]:
from drugex.training.models.explorer import GraphExplorer
from drugex.training.models.transform import GraphModel
from drugex.data.corpus.vocabulary import VocGraph

GPUS = (1,)  # IDs of the GPUs to use

vocabulary = VocGraph.fromFile('../jupyter/models/finetuned/graph/ligand_finetuned.vocab')
finetuned = GraphModel(voc_trg=vocabulary, use_gpus=GPUS)
finetuned.loadStatesFromFile('../jupyter/models/finetuned/graph/chembl_ligand.pkg')
pretrained = GraphModel(voc_trg=vocabulary, use_gpus=GPUS)
pretrained.loadStatesFromFile('../jupyter/models/pretrained/graph/chembl27/chembl27_graph.pkg')

explorer = GraphExplorer(agent=pretrained, env=environment, mutate=finetuned, epsilon=0.1, use_gpus=GPUS)

Here, only the selected scaffolds are used as input fragments for training and validation. As the initial set contains only two inputs, they are sampled 100 times to create the training set and 100*0.2=20 times to create the test set.

In [7]:
from drugex.data.datasets import GraphFragDataSet

data_path = '../jupyter/data/sets/graph/scaffold_graph.tsv'
train_loader = GraphFragDataSet(data_path).asDataLoader(batch_size=1024, n_samples=100)
test_loader = GraphFragDataSet(data_path).asDataLoader(batch_size=1024, n_samples=100, n_samples_ratio=0.2)

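The sample counts described above can be sketched in plain Python. This is only an illustration of the arithmetic, assuming that the inputs are drawn with replacement and that `n_samples_ratio` scales `n_samples`; the helper `sample_inputs` is hypothetical and is not how `asDataLoader` is implemented.

```python
import random

def sample_inputs(inputs, n_samples, n_samples_ratio=None, seed=0):
    """Hypothetical helper (assumption): draw inputs with replacement."""
    if n_samples_ratio is not None:
        # e.g. 100 * 0.2 = 20 samples for the test set
        n_samples = int(round(n_samples * n_samples_ratio))
    rng = random.Random(seed)
    return rng.choices(inputs, k=n_samples)

frags = ['c1cnccn1', 'c1cnccn1.c1ccsc1']
train = sample_inputs(frags, n_samples=100)
test = sample_inputs(frags, n_samples=100, n_samples_ratio=0.2)
print(len(train), len(test))  # 100 20
```

Because there are only two distinct inputs, every batch contains the same two fragment sets repeated many times.
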
After that we can finally start the training loop:

In [8]:
from drugex.training.monitors import FileMonitor

monitor = FileMonitor("../jupyter/models/reinforced/graph/scaffold_rl", verbose=True) 
explorer.fit(train_loader, test_loader, monitor=monitor, epochs=3)

We check that all generated molecules contain either a pyrazine or both a pyrazine and a thiophene.

In [9]:
import pandas as pd 

df_smiles = pd.read_csv('../jupyter/models/reinforced/graph/scaffold_rl_smiles.tsv', sep='\t')
smilesToGrid(df_smiles.Smiles.tolist())

## *De novo* generation

Once we have an optimized model (which is not the case in this tutorial, where for speed the number of epochs is set to 3 instead of 1000), it can be used to sample *novel* molecules.

In [10]:
reinforced = GraphModel(voc_trg=VocGraph.fromFile('../jupyter/data/sets/graph/scaffold_graph.tsv.vocab'), use_gpus=GPUS)
reinforced.loadStatesFromFile('../jupyter/models/reinforced/graph/scaffold_rl.pkg')

gen_loader = GraphFragDataSet('../jupyter/data/sets/graph/scaffold_graph.tsv').asDataLoader(batch_size=1024)

Novel molecules can be generated and scored with the modifiers applied (to evaluate the ratio of desired molecules produced by the model),

In [11]:
frags_, smiles, scores = reinforced.evaluate(gen_loader, repeat=100, method=environment)
scores['SMILES'], scores['Frags'] = smiles, frags_



In [12]:
scores.head()

| | A1 | A3 | QED | SA | DESIRE | VALID | SMILES | Frags |
|---|---|---|---|---|---|---|---|---|
| 0 | 5.88248 | 6.004125 | 0.52499 | 0.140957 | 0 | 1 | CC1=NC(=O)ON(c2ccsc2)C(c2ccc(Br)cc2)C1=Cc1cnccn1 | c1ccsc1.c1cnccn1 |
| 1 | 5.831961 | 6.26479 | 0.788553 | 0.27027 | 0 | 1 | C=CCNS(=O)(=O)C1=C(C)N(C(=O)c2cnccn2)CC=C1CC | c1cnccn1 |
| 2 | 5.861358 | 6.163688 | 0.760486 | 0.405917 | 0 | 1 | CC(C)(C)c1ccc(-c2csc(Nc3cnccn3)n2)s1 | c1ccsc1.c1cnccn1 |
| 3 | 5.874601 | 6.168355 | 0.759365 | 0.607481 | 1 | 1 | O=C(Nc1cccc2ccccc12)c1cnccn1 | c1cnccn1 |
| 4 | 5.925557 | 6.170076 | 0.79665 | 0.327925 | 0 | 1 | Cc1noc(C)c1-c1cnc2c(n1)CCN(Cc1ncc[nH]1)C2 | c1cnccn1 |

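As a rough illustration of how the `DESIRE` column and the ratio of desired molecules relate to the thresholds, the sketch below assumes that a molecule counts as desired when it is valid and every score reaches its threshold. The helper `is_desired` is hypothetical; the score values are copied from the five rows shown above.

```python
def is_desired(scores, thresholds, valid=True):
    """Hypothetical rule (assumption): valid and all scores >= their thresholds."""
    return valid and all(s >= t for s, t in zip(scores, thresholds))

thresholds = [0.99, 0.99, 0.5, 0.5]  # A1, A3, QED, SA (as set in the environment)

# (A1, A3, QED, SA) for the five rows shown above; all five are valid
rows = [
    (5.88248, 6.004125, 0.52499, 0.140957),
    (5.831961, 6.26479, 0.788553, 0.27027),
    (5.861358, 6.163688, 0.760486, 0.405917),
    (5.874601, 6.168355, 0.759365, 0.607481),
    (5.925557, 6.170076, 0.79665, 0.327925),
]

desire = [int(is_desired(row, thresholds)) for row in rows]
ratio = sum(desire) / len(desire)
print(desire, ratio)  # [0, 0, 0, 1, 0] 0.2
```

For these five rows the result agrees with the `DESIRE` column: only row 3 passes, because it is the only one whose SA score reaches 0.5.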

or without applying the modifiers to better evaluate the predicted properties.

In [13]:
for scorer in environment.scorers:
    scorer.modifier=None
frags_, smiles, scores = reinforced.evaluate(gen_loader, repeat=100, method=environment)
scores['SMILES'], scores['Frags'] = smiles, frags_
scores = scores.drop('DESIRE', axis=1) # without the modifiers the desirability is meaningless
scores.head()



| | A1 | A3 | QED | SA | VALID | SMILES | Frags |
|---|---|---|---|---|---|---|---|
| 0 | 5.912962 | 6.575354 | 0.783331 | 2.743669 | 1 | Cc1cc(C)n(-c2nc(-c3cncc(N4CCNCC4)n3)cs2)n1 | c1cnccn1 |
| 1 | 5.863997 | 6.268625 | 0.925185 | 3.695431 | 1 | CC1=C(NCc2cnccn2)C(=O)C(c2ccsc2)N(C)N1C | c1ccsc1.c1cnccn1 |
| 2 | 5.837393 | 6.237465 | 0.730419 | 3.053752 | 1 | O=C(NC1CCCCC1NCc1ccc2ccccc2n1)c1cnccn1 | c1cnccn1 |
| 3 | 6.004742 | 6.395255 | 0.346125 | 5.158187 | 1 | NC(Cc1cccs1)C(=O)N1CC2CC(Nc3nc(C4CC4)nc4c3cnn4... | c1ccsc1.c1cnccn1 |
| 4 | 5.927445 | 6.096273 | 0.723122 | 2.469134 | 1 | O=C(NCc1cccs1)c1cc(CNc2cnccn2)on1 | c1ccsc1.c1cnccn1 |

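Note that the `Frags` column lists the fragment combination as `c1ccsc1.c1cnccn1`, while the input was written as `c1cnccn1.c1ccsc1`. When grouping results by input, an order-independent key can help. The sketch below is a plain string-based illustration: `fragment_key` is a hypothetical helper that only splits on `.` and sorts, and it does not canonicalize the SMILES themselves.

```python
from collections import Counter

def fragment_key(frag_smiles):
    """Hypothetical helper: order-independent key for dot-separated fragments."""
    return tuple(sorted(frag_smiles.split('.')))

# The input order and the order reported in the output give the same key
print(fragment_key('c1cnccn1.c1ccsc1') == fragment_key('c1ccsc1.c1cnccn1'))  # True

# Frags values copied from the five rows shown above
frags_column = [
    'c1cnccn1',
    'c1ccsc1.c1cnccn1',
    'c1cnccn1',
    'c1ccsc1.c1cnccn1',
    'c1ccsc1.c1cnccn1',
]
counts = Counter(fragment_key(f) for f in frags_column)
print(counts[('c1cnccn1',)], counts[('c1ccsc1', 'c1cnccn1')])  # 2 3
```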

# SMILES-based Transformer
## Data preprocessing

We use the same encoder as in [Preparing Data for the SMILES-Based Transformer](../datasets.ipynb) to create molecules from the fragments and encode fragment-molecule pairs, with two small modifications:
1. Instead of using a `fragmenter` we create dummy molecules from the fragments with `dummyMolsFromFragments` 
2. Set `splitter` to `None`, `min_len` to 2, `n_proc` and `chunk_size` to 1 

In [23]:
import os
from drugex.data.datasets import SmilesFragDataSet
from drugex.molecules.converters.dummy_molecules import dummyMolsFromFragments
from drugex.data.fragments import FragmentCorpusEncoder, SequenceFragmentEncoder
from drugex.data.corpus.vocabulary import VocSmiles

fragmenter = dummyMolsFromFragments()
splitter = None

encoder = FragmentCorpusEncoder(
    fragmenter=fragmenter, 
    encoder=SequenceFragmentEncoder(
        # `encode_frags` is a required argument; True is assumed here because fragments are encoded
        VocSmiles(encode_frags=True, min_len=2)
    ),
    pairs_splitter=splitter, 
    n_proc=1,
    chunk_size=1
)

smiles_input_folder = "data/sets/smiles/"
if not os.path.exists(smiles_input_folder):
    os.makedirs(smiles_input_folder)
    
dataset = SmilesFragDataSet(f"{smiles_input_folder}/scaffold_smi.tsv", rewrite=True)

In [None]:
encoder.apply(list(frags), encodingCollectors=[dataset])

Creating fragment-molecule pairs (batch processing): 100%|██████████| 2/2 [00:00<00:00, 36.00it/s]
Encoding fragment-molecule pairs. (batch processing): 100%|██████████| 2/2 [00:00<00:00, 39.80it/s]


## Reinforcement learning

**Note:** this section does not work yet, as no pretrained/finetuned SMILES-based transformer models are available.

We then build the explorer, which is composed of the agent, the prior and the environment.

In [None]:
from drugex.training.models.explorer import SmilesExplorer
from drugex.training.models.transform import GPT2Model
from drugex.data.corpus.vocabulary import VocSmiles

GPUS = (0, 1)  # IDs of the GPUs to use

vocabulary = VocSmiles.fromFile('../data/models/finetuned/smiles/ligand_finetuned.vocab')
finetuned = GPT2Model(voc_trg=vocabulary, use_gpus=GPUS)
finetuned.loadStatesFromFile('../data/models/finetuned/smiles/chembl_ligand.pkg')
pretrained = GPT2Model(voc_trg=vocabulary, use_gpus=GPUS)
pretrained.loadStatesFromFile('../jupyter/models/pretrained/smiles/chembl27/chembl27_graph.pkg')

explorer = SmilesExplorer(agent=pretrained, env=environment, mutate=finetuned, epsilon=0.1, use_gpus=GPUS)

FileNotFoundError: [Errno 2] No such file or directory: '../data/models/finetuned/smiles/ligand_finetuned.vocab'

Here, only the selected scaffolds are used as input fragments for training and validation. As the initial set contains only two inputs, they are sampled 100 times to create the training set and 100*0.2=20 times to create the test set.

In [None]:
from drugex.data.datasets import SmilesFragDataSet

data_path = 'data/sets/smiles/scaffold_smi.tsv'
train_loader = SmilesFragDataSet(data_path).asDataLoader(batch_size=1024, n_samples=100)
test_loader = SmilesFragDataSet(data_path).asDataLoader(batch_size=1024, n_samples=100, n_samples_ratio=0.2)

After that we can finally start the training loop:

In [None]:
from drugex.training.monitors import FileMonitor

monitor = FileMonitor("data/models/reinforced/smiles/scaffold_rl", verbose=True) 
explorer.fit(train_loader, test_loader, monitor=monitor, epochs=3)

Batch: 100%|██████████| 1/1 [00:04<00:00,  4.75s/it]
Batch: 100%|██████████| 1/1 [00:01<00:00,  1.61s/it]
Batch: 100%|██████████| 1/1 [00:01<00:00,  1.90s/it]
100%|██████████| 3/3 [00:14<00:00,  4.75s/it]
