# **De novo design with REINVENT4**
By the end of this lesson, you will be able to:

*  Understand the fundamentals of reinforcement learning for molecular generation
*  Explain how REINVENT4 uses RL to optimize molecular properties  
*  Configure REINVENT4 for different molecular design tasks
*  Analyze the impact of sigma and the diversity filter on learning
*  Apply sampling and scoring workflows to evaluate generated molecules
*  Implement transfer learning for specific drug discovery targets




First, we need to download REINVENT4 and install the required packages. After the installation, Colab needs to restart the session, so you must confirm it before continuing with the cell below. The process takes about 5 minutes.

In [None]:
!wget https://uni-muenster.sciebo.de/s/T6PHKsMkJfBxa3F/download
!unzip -q download -d /content/
%cd REINVENT4
%pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cpu
%pip install -r reinvent_requirements.txt

In [None]:
%pip install mols2grid
import os
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors
import mols2grid

To understand REINVENT's parameters, we will train two models with the same reward function but different sigma values. Sigma sets the balance between the prior model and the agent (the model currently under training). In other words, it is the weight of the reward function: a larger sigma means the reward function has more impact on training.  

To illustrate sigma more clearly, we have chosen a somewhat unusual task: rewarding large molecules with molecular weights between 700 and 1200 Da. To do this, we need to create two TOML configuration files. In each file, we define global parameters, learning strategy, and stage. The sigma is part of the learning strategy.  

Initially, we will use sigma = 128, as the authors suggested, and then we will try sigma = 16.
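
To make the role of sigma concrete, here is a minimal sketch of the "dap" (difference between augmented and posterior) loss for a single molecule, following the formulation in the original REINVENT publication. It assumes that the augmented log-likelihood is the prior log-likelihood plus sigma times the score. The function name and the example numbers are illustrative and are not taken from the REINVENT4 code base.

```python
def dap_loss(prior_loglik: float, agent_loglik: float, score: float, sigma: float) -> float:
    """Squared difference between augmented and agent log-likelihood (sketch)."""
    # Assumption: augmented log-likelihood = prior log-likelihood + sigma * score
    augmented_loglik = prior_loglik + sigma * score
    return (augmented_loglik - agent_loglik) ** 2


# Hypothetical molecule at the start of training: the agent still equals the prior
prior_ll = agent_ll = -40.0
loss_sigma128 = dap_loss(prior_ll, agent_ll, score=0.5, sigma=128)
loss_sigma16 = dap_loss(prior_ll, agent_ll, score=0.5, sigma=16)
print(loss_sigma128, loss_sigma16)
```

For the same score, the larger sigma produces a much larger loss, so each update pulls the agent further away from the prior and towards the rewarded molecules.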


In [None]:
global_parameters = """
run_type = "staged_learning"
device = "cpu"
tb_logdir = "tb_logs_sigma128_unusualmw"
json_out_config = "_stage1.json"
"""

prior_filename = os.path.join("/content/REINVENT4/priors/reinvent.prior")
agent_filename = prior_filename

parameters = f"""
[parameters]

prior_file = "{prior_filename}"
agent_file = "{agent_filename}"
summary_csv_prefix = "stage1"

batch_size = 64

use_checkpoint = false
"""


learning_strategy = """
[learning_strategy]

type = "dap"
sigma = 128
rate = 0.0001
"""

stages = """
[[stage]]

max_score = 1.0
min_steps = 50
max_steps = 300

chkpt_file = 'stage1.chkpt'

[stage.scoring]
type = "geometric_mean"

[[stage.scoring.component]]
[stage.scoring.component.custom_alerts]

[[stage.scoring.component.custom_alerts.endpoint]]
name = "Alerts"

params.smarts = [
    "[*;r8]",
    "[*;r9]",
    "[*;r10]",
    "[*;r11]",
    "[*;r12]",
    "[*;r13]",
    "[*;r14]",
    "[*;r15]",
    "[*;r16]",
    "[*;r17]",
    "[#8][#8]",
    "[#6;+]",
    "[#16][#16]",
    "[#7;!n][S;!$(S(=O)=O)]",
    "[#7;!n][#7;!n]",
    "C#C",
    "C(=[O,S])[O,S]",
    "[#7;!n][C;!$(C(=[O,N])[N,O])][#16;!s]",
    "[#7;!n][C;!$(C(=[O,N])[N,O])][#7;!n]",
    "[#7;!n][C;!$(C(=[O,N])[N,O])][#8;!o]",
    "[#8;!o][C;!$(C(=[O,N])[N,O])][#16;!s]",
    "[#8;!o][C;!$(C(=[O,N])[N,O])][#8;!o]",
    "[#16;!s][C;!$(C(=[O,N])[N,O])][#16;!s]"
]
[[stage.scoring.component]]
[stage.scoring.component.MolecularWeight]

[[stage.scoring.component.MolecularWeight.endpoint]]
name = "Molecular weight"  # user chosen name for output
weight = 1  # weight to fine-tune the relevance of this component

transform.type = "double_sigmoid"
transform.high = 1200.0
transform.low = 700.0
transform.coef_div = 1200.0
transform.coef_si = 20.0
transform.coef_se = 20.0
"""



In [None]:
%cd /content/REINVENT4
config = global_parameters + parameters + learning_strategy + stages

toml_config_filename = "stage128.toml"

with open(toml_config_filename, "w") as tf:
    tf.write(config)
# To create the TOML file for the second model, set sigma = 16 (and
# tb_logdir = "tb_logs_sigma16_unusualmw") in the cell above, change the
# file name in this cell, and rerun both cells.
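
Instead of editing the cell by hand, the second configuration can also be derived from the first one in code. The following is a minimal sketch using only the standard library. The helper name `with_sigma` and the shortened example configuration are hypothetical; in the notebook you would pass the full `config` string instead.

```python
import re


def with_sigma(config_text: str, sigma: int, tb_logdir: str) -> str:
    """Return a copy of a TOML config string with sigma and tb_logdir replaced."""
    text = re.sub(r"(?m)^sigma\s*=.*$", f"sigma = {sigma}", config_text)
    text = re.sub(r"(?m)^tb_logdir\s*=.*$", f'tb_logdir = "{tb_logdir}"', text)
    return text


# Shortened stand-in for the full configuration built above
example_config = '''
tb_logdir = "tb_logs_sigma128_unusualmw"

[learning_strategy]
type = "dap"
sigma = 128
rate = 0.0001
'''

config_sigma16 = with_sigma(example_config, 16, "tb_logs_sigma16_unusualmw")
print(config_sigma16)
```

Writing `config_sigma16` to its own file keeps both runs and their TensorBoard logs separate.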

Training takes a long time on the Colab CPUs, so we provide the chkpt files and TensorBoard logs of models that we trained beforehand with the same config file as above. You can therefore skip the next cell and analyse the results with TensorBoard. If you want to train the model yourself, just execute the next cell.

In [None]:
%%time
# this cell is optional (and will take a long time to execute)
# note: the %%time cell magic must be the first line of the cell
!python -m reinvent.Reinvent -l stage1.log $toml_config_filename


To analyse the models' learning performance, we will use TensorBoard. First we load the TensorBoard extension in Colab, then we start TensorBoard on the log folders.

In [None]:
%load_ext tensorboard

Inspect the results and see which model learnt to generate molecules with a molecular weight between 700 and 1200 Da.

In [None]:
%tensorboard  --logdir /content/REINVENT4/tb_logs_sigma128_unusualmw --load_fast true # the model with sigma=128

In [None]:
%tensorboard  --logdir /content/REINVENT4/tb_logs_sigma16_unusualmw --load_fast true # the model with sigma=16

Looking at the **First 30 Structures** panel in TensorBoard, we can see that the `sigma128` model produces much larger molecules than the `sigma16` model. We can also see that **Molecular weight (raw)** approaches a value of about 800 Da for the `sigma128` model, whereas for the `sigma16` model the molecular weight fluctuates around 270 Da.

So, even though we reward the same molecules (MW between 700 and 1200 Da), the two models learn very differently. But what if we want to generate molecules with MW between 200 and 600 Da: would sigma = 16 still be insufficient? Let's see! We will use the log files of a model that was already trained with these settings.

In [None]:
%tensorboard --logdir tb_logs_sigma16_normalmw --load_fast true

The average scores at each step are higher than for the previous model. The result is not perfect, but much better. Even though the reward function has a lower impact on the training, the generated molecules get higher scores because the REINVENT prior was trained on the ChEMBL database, and most of the molecules in ChEMBL are already in this range.

So, the sigma parameter has a major impact on learning, especially when the task is generating molecules or exploring chemical space beyond the general ChEMBL distribution. In other words, if we want to break the bias coming from ChEMBL with a reward function, we might need a larger value of sigma.

Maybe rewarding large molecules was a bit extreme, because we usually don’t want molecules that large in most drug discovery projects. Instead, let’s reward molecules with a lower number of rotatable bonds. How would we adjust the TOML file to reward molecules with fewer rotatable bonds so they get higher scores?

In this case, we are going to transform the raw value with the reverse sigmoid function, because we want to reward molecules that have fewer rotatable bonds.

The transform types in REINVENT4 are given below:

**Sigmoid**: Smoothly transforms any real value into the interval (0, 1).

**Reverse_sigmoid**: Same as sigmoid, but direction reversed. Use when you want low input values mapped to high outputs and vice versa.

**Double_sigmoid**: Emphasizes values in the middle of a range while suppressing extremes. Use when you want to highlight "middle" inputs and not extremes.

**Right_step**: Sharp thresholding; all values below the threshold map to 0, above to 1.

**Left_step**: Inverse thresholding; use it when you want a high output (1) for low inputs and 0 for high inputs.

**Step**: Only values in a specific interval are mapped to 1; otherwise 0.

**Value_mapping**: A mapping between discrete values. It assigns numeric values to categories.
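
To get a feeling for how `low`, `high`, and `k` shape the score, here is a small sketch of a sigmoid and a reverse sigmoid in plain Python. It assumes a base-10 logistic curve centred at the midpoint of `low` and `high`; the exact formula used inside REINVENT4 may differ in detail, so treat the numbers as illustrative only.

```python
def reverse_sigmoid(x: float, low: float, high: float, k: float) -> float:
    """Falls from about 1 (x near low) to about 0 (x near high); assumed form."""
    midpoint = (low + high) / 2
    return 1.0 / (1.0 + 10 ** (k * (x - midpoint) * 10 / (high - low)))


def sigmoid(x: float, low: float, high: float, k: float) -> float:
    """Mirror image of the reverse sigmoid: rises from about 0 to about 1."""
    return 1.0 - reverse_sigmoid(x, low, high, k)


# Example with low = 0, high = 20, k = 0.5 (e.g. a rotatable-bond count)
for n_rot_bonds in (0, 5, 10, 15, 20):
    score = reverse_sigmoid(n_rot_bonds, low=0, high=20, k=0.5)
    print(n_rot_bonds, round(score, 3))
```

With these parameters, a molecule with 10 rotatable bonds sits at the midpoint and receives a score of 0.5, while fewer rotatable bonds give scores closer to 1. A larger `k` makes the transition steeper.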

Additionally, we will add a diversity filter to promote scaffold diversity during RL runs. REINVENT4 includes a diversity filter that uses a memory organized into "buckets" storing specific scaffolds and assigns zero scores to molecules once scaffold buckets reach capacity. This prevents mode collapse in molecular generation.
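
The bucket idea can be illustrated with a toy implementation. This is only a sketch of the principle described above, not REINVENT4's actual code: the class name is hypothetical, the scaffold is passed in as a precomputed string instead of being derived with RDKit, and repeated identical SMILES are not treated specially.

```python
from collections import defaultdict


class ScaffoldBucketFilter:
    """Toy scaffold memory: zero the score once a scaffold's bucket is full."""

    def __init__(self, bucket_size: int = 10, minscore: float = 0.4):
        self.bucket_size = bucket_size
        self.minscore = minscore
        self.buckets = defaultdict(list)  # scaffold -> memorized SMILES

    def filter_score(self, smiles: str, scaffold: str, score: float) -> float:
        if score < self.minscore:
            return score  # low-scoring molecules are not memorized
        bucket = self.buckets[scaffold]
        if len(bucket) >= self.bucket_size:
            return 0.0  # bucket is full: no more reward for this scaffold
        bucket.append(smiles)
        return score


# Three hypothetical molecules that share the benzene scaffold, bucket size 2
toy_filter = ScaffoldBucketFilter(bucket_size=2, minscore=0.4)
molecules = [("Cc1ccccc1", 0.9), ("CCc1ccccc1", 0.8), ("CCCc1ccccc1", 0.7)]
filtered = [toy_filter.filter_score(smi, "c1ccccc1", s) for smi, s in molecules]
print(filtered)
```

Once the bucket for a scaffold is full, further molecules with that scaffold no longer earn a reward, which pushes the agent towards new scaffolds.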

In [None]:
global_parameters = """
run_type = "staged_learning"
device = "cpu"
tb_logdir = "tb_logs_numrotbond"
json_out_config = "_stage1.json"
"""

prior_filename = os.path.join("/content/REINVENT4/priors/reinvent.prior")
agent_filename = prior_filename

parameters = f"""
[parameters]

prior_file = "{prior_filename}"
agent_file = "{agent_filename}"
summary_csv_prefix = "stage_numrotbond"

batch_size = 100

use_checkpoint = false
"""


learning_strategy = """
[learning_strategy]

type = "dap"
sigma = 128
rate = 0.0001

[diversity_filter]

type = "IdenticalMurckoScaffold" # IdenticalTopologicalScaffold,
                                 # ScaffoldSimilarity, PenalizeSameSmiles
bucket_size = 10                 # memory size in number of compounds
minscore = 0.4                   # only memorize if this threshold is exceeded
minsimilarity = 0.4              # minimum similarity for ScaffoldSimilarity
penalty_multiplier = 0.5         # penalty factor for PenalizeSameSmiles


"""

stage = """
[[stage]]

max_score = 1.0
min_steps = 50
max_steps = 300

chkpt_file = 'numrotbond_reversesig.chkpt'

[stage.scoring]
type = "geometric_mean"

######keep the alerts to avoid unwanted structures

[[stage.scoring.component]]
[stage.scoring.component.custom_alerts]

[[stage.scoring.component.custom_alerts.endpoint]]
name = "Alerts"

params.smarts = [
    "[*;r8]",
    "[*;r9]",
    "[*;r10]",
    "[*;r11]",
    "[*;r12]",
    "[*;r13]",
    "[*;r14]",
    "[*;r15]",
    "[*;r16]",
    "[*;r17]",
    "[#8][#8]",
    "[#6;+]",
    "[#16][#16]",
    "[#7;!n][S;!$(S(=O)=O)]",
    "[#7;!n][#7;!n]",
    "C#C",
    "C(=[O,S])[O,S]",
    "[#7;!n][C;!$(C(=[O,N])[N,O])][#16;!s]",
    "[#7;!n][C;!$(C(=[O,N])[N,O])][#7;!n]",
    "[#7;!n][C;!$(C(=[O,N])[N,O])][#8;!o]",
    "[#8;!o][C;!$(C(=[O,N])[N,O])][#16;!s]",
    "[#8;!o][C;!$(C(=[O,N])[N,O])][#8;!o]",
    "[#16;!s][C;!$(C(=[O,N])[N,O])][#16;!s]"
]


[[stage.scoring.component]]
[stage.scoring.component.NumRotBond]

[[stage.scoring.component.NumRotBond.endpoint]]
name = "ROTB"
weight = 1
transform.type = "reverse_sigmoid"
transform.high = 20
transform.low = 0
transform.k = 0.5

"""

In [None]:
config = global_parameters + parameters + learning_strategy + stage
toml_config_filename = "reward_numrotb.toml"
with open(toml_config_filename, 'w') as tf:
    tf.write(config)


In [None]:
# actually training the model is optional, so you can skip this cell
!python -m reinvent.Reinvent -l numrotb.log $toml_config_filename

**Note:**
Don't forget to check the scaffold repetitions during RL on tensorboard.

In [None]:
%tensorboard --logdir /content/REINVENT4/tb_logs_numrotbond --load_fast true

How do we use the model we trained and sample from it? We need a different `toml`-file for REINVENT. Here is an example:

In [None]:
global_sampling_parameters= """
run_type = "sampling"
device = "cpu"
json_out_config = "_sampling.json"
"""
parameters = """
[parameters]
model_file = "/content/REINVENT4/numrotbond_reversesig.chkpt" ##
output_file = 'sampling_numrotbond.csv'
num_smiles = 1200
unique_molecules = true
randomize_smiles = true
"""



In [None]:
samp_conf = global_sampling_parameters + parameters
with open("sampling_numrotbond.toml", "w") as tf:
    tf.write(samp_conf)

In [None]:
# running the sampling is much faster than training
!python -m reinvent.Reinvent -l sampling.log /content/REINVENT4/sampling_numrotbond.toml

In [None]:
# Show the first lines of the CSV file written by REINVENT
!head /content/REINVENT4/sampling_numrotbond.csv

## Task: Scoring Molecules
Now that we have sampled 1200 molecules (possibly fewer after duplicate removal, since `unique_molecules = true`), we can evaluate them using multiple scoring functions to understand:
- How well our optimization worked
- What other properties were affected
- Whether molecules maintain drug-likeness

### Available Scoring Functions in REINVENT4

REINVENT4 provides extensive molecular descriptors via RDKit integration:

#### **Drug-Likeness & ADMET Properties**
| Function | Description |
|----------|-------------|
| **QED** | Quantitative drug-likeness |
| **SlogP** | Lipophilicity (Crippen) |
| **TPSA** | Topological polar surface area |
| **HBondAcceptors** | H-bond acceptor count |
| **HBondDonors** | H-bond donor count |

#### **Structural Descriptors**
| Function | Description | Use Case |
|----------|-------------|----------|
| **NumRotBond** | Rotatable bonds | Flexibility/rigidity |
| **GraphLength** | Longest path | Molecular size |
| **NumRings** | Total ring count | Complexity |
| **NumAromaticRings** | Aromatic rings | π-π interactions |
| **Csp3** | sp³ carbon fraction | 3D character |

#### **Stereochemistry & Complexity**
| Function | Description |
|----------|-------------|
| **NumAtomStereoCenters** | Chiral centers |
| **NumHeavyAtoms** | Non-hydrogen atoms |
| **NumHeteroAtoms** | Non-carbon atoms |

### Multi-Property Scoring Strategy

You can score the generated molecules using multiple descriptors above to get a comprehensive profile. We'll use another `toml`-file with our scoring-strategy.
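
When several components are combined with `type = "geometric_mean"`, the component weights act as exponents. The sketch below shows a generic weighted geometric mean in plain Python; it is meant to illustrate the aggregation idea and assumes all transformed scores lie between 0 and 1. The function name and example values are hypothetical, and REINVENT4 may treat some components (such as alerts) differently.

```python
import math


def weighted_geometric_mean(scores, weights):
    """Aggregate transformed scores in [0, 1] with a weighted geometric mean."""
    if any(s <= 0 for s in scores):
        return 0.0  # a single zero component zeroes the total score
    total_weight = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical transformed scores for two components of one molecule
total = weighted_geometric_mean(scores=[0.25, 1.0], weights=[1, 1])
print(round(total, 3))
```

Because the scores are multiplied rather than added, a molecule has to do reasonably well on every component to reach a high total score.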

In [None]:
global_scoring_parameters = """
run_type = "scoring"
json_out_config = "_scoring.json"
"""

parameters = """
[parameters]
smiles_file = "sampling_numrotbond.csv"   #the path of sampled molecules
output_csv = "scoring_numrotbond.csv"
"""
scoring = """
[scoring]
type = "geometric_mean"

[[scoring.component]]
[scoring.component.GraphLength]
[[scoring.component.GraphLength.endpoint]]
name = "GraphLength" #number of bonds in longest path
weight = 0.2
transform.type = "reverse_sigmoid"
transform.high = 40
transform.low = 20
transform.k = 0.5

######
#  Add as many scoring functions as you want. Define a weight for each scoring function and choose a suitable transform.
######





"""

In [None]:
scoring_conf = global_scoring_parameters + parameters + scoring
with open("scoring_numrotbond.toml", "w") as tf:
    tf.write(scoring_conf)

In [None]:
!python -m reinvent.Reinvent -l scoring.log /content/REINVENT4/scoring_numrotbond.toml

Time to visualize the molecules with their scores.

In [None]:
# Load scored molecules

scored_mols = pd.read_csv("/content/REINVENT4/scoring_numrotbond.csv")

# Convert SMILES to RDKit molecule objects for visualization
scored_mols['mols'] = [Chem.MolFromSmiles(smi) for smi in scored_mols["SMILES"]]

# Sort by score (best molecules first)
scored_mols = scored_mols.sort_values('Score', ascending=False).reset_index(drop=True)

# Interactive molecular grid
mols2grid.display(
    scored_mols,
    mol_col='mols',
    dpi=300,
    subset=['Score', 'GraphLength (raw)'],  # You can also add the scoring components, if you want to analyse them too.
    transform={
        "Score": lambda x: f"{x:.2f}", # add the other scoring components to print transformed values instead of long, raw digits
        "GraphLength (raw)": lambda x: f"{x:.2f}", # example scoring component
    },
    n_items_per_page=12,
    size=(200, 200),
)


# Another task? Hoooraay!
Here is your last task for REINVENT4. We uploaded a dataset for M. tuberculosis TrxR from ChEMBL (Assay ID: CHEMBL2395745). In this dataset, there are 13 molecules with MW, LogP, HBA, HBD, PSA, ROTB, Aromatic rings, QED, and inhibition values (%).

Task:

- Apply RL to generate molecules with similar property values.

- Sample and score them.

- Select the top 10% scored molecules and use them for transfer learning.

There is no single correct answer for this task, so feel free to experiment. Instead of focusing on chemical structures, I suggest focusing on physicochemical properties, since we will use the structures in another task.

Hint: To get the top 10% of the scored molecules, you can use the code below.


```
#Read the scored csv file
scored_mols = pd.read_csv(your_file_path)

# Convert SMILES to RDKit molecule objects for visualization
scored_mols['mols'] = [Chem.MolFromSmiles(smi) for smi in scored_mols["SMILES"]]

# Sort by score (best molecules first)
scored_mols = scored_mols.sort_values('Score', ascending=False).reset_index(drop=True)
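# Keep the top 10% (at least one molecule)
top_10per_count = max(1, int(len(scored_mols) * 0.10))
top_10per_mols = scored_mols.head(top_10per_count)
# Drop the RDKit objects before saving; they cannot be stored meaningfully in a CSV
top_10per_mols.drop(columns=['mols']).to_csv("top_10per_mols.csv", index=False)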


```
Here is an example TOML file for transfer learning:



```
run_type = "transfer_learning"
device = "cuda:0"  # set torch device e.g. "cpu"
tb_logdir = "tb_TL"  # name of the TensorBoard logging directory
json_out_config = "json_transfer_learning.json"  # write this TOML to JSON


[parameters]

num_epochs = 3  # number of steps to run
save_every_n_epochs = 3  # save checkpoint model file every N steps
batch_size = 50
num_refs = 100  # number of reference molecules randomly chosen for similarity
                # set this to zero for large datasets (>200 molecules)!
sample_batch_size = 100  # number of sampled molecules to compute sample loss
# Uncomment one of the comment blocks below.  Each generator needs a model
# file and possibly a SMILES file with seed structures.

## Reinvent
input_model_file = "priors/reinvent.prior"
smiles_file = "TL_reinvent_100.smi"  # read 1st column
output_model_file = "TL_reinvent.model"
validation_smiles_file = "TL_reinvent_100.smi"  
```



Hint: In transfer learning, you can use the provided dataset as a validation set.
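
The transfer learning configuration expects SMILES files with the SMILES in the first column. A minimal sketch for turning a CSV column into such a file is shown below. It uses only the standard library and assumes a comma-separated file with a `Smiles` column, as in the provided dataset; the helper name and the demo file names are hypothetical.

```python
import csv
import os
import tempfile


def write_smi(csv_path: str, smi_path: str, smiles_column: str = "Smiles") -> int:
    """Copy one SMILES per line from a CSV column into a .smi file; return the count."""
    count = 0
    with open(csv_path, newline="") as src, open(smi_path, "w") as dst:
        for row in csv.DictReader(src):
            smiles = (row.get(smiles_column) or "").strip()
            if smiles:  # skip rows without a SMILES entry
                dst.write(smiles + "\n")
                count += 1
    return count


# Small self-contained demo with made-up data
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = os.path.join(tmp, "demo.csv")
    demo_smi = os.path.join(tmp, "demo.smi")
    with open(demo_csv, "w", newline="") as fh:
        fh.write("Smiles,Value\nCCO,12.5\nc1ccccc1,80.0\n")
    n_written = write_smi(demo_csv, demo_smi)
    with open(demo_smi) as fh:
        demo_lines = fh.read().splitlines()
print(n_written, demo_lines)
```

The same helper could be pointed at `/content/REINVENT4/mt_trxr.csv` to create a validation file, or at the saved top 10% file (with `smiles_column="SMILES"`) to create the training file.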



In [None]:
dataset = pd.read_csv('/content/REINVENT4/mt_trxr.csv')
dataset['Inh_value(%)'] = dataset['Value']
dataset = dataset.drop(columns=['Value'])
dataset['mol'] = [Chem.MolFromSmiles(smi) for smi in dataset["Smiles"]]
dataset


In [None]:
mols2grid.display(dataset, mol_col='mol', subset=['Inh_value(%)'], size=(200, 200))