> **How to run this notebook (command-line)?**
1. Install the `ReinventCommunity` environment:
`conda env create -f environment.yml`
2. Activate the environment:
`conda activate ReinventCommunity`
3. Execute `jupyter`:
`jupyter notebook`
4. Copy the link to a browser


# `REINVENT 3.2`: reinforcement learning demo
The aim of this notebook is to illustrate how `REINVENT` can be used for the _de novo_ design of molecules in a *Reinforcement Learning* (RL) setup. The general idea is to start with a (somewhat) focussed prior and to define a (complex) scoring function from a set of building blocks (*components*) to decide which molecules are "good" or "bad". Each molecule generated by the _agent_ will thus receive a score from 0 to 1 and this feedback is used to train the agent over time, i.e. to focus it on chemical regions of interest for a given project.

![](img/REINVENT_RL_mode.png)

One of `REINVENT`'s most powerful features is the flexible way in which the scoring function can be defined. By combining a multitude of different components, the tool is able to generate molecules that are e.g. predicted to be active against a given target, soluble, non-toxic and below a certain molecular weight - all at the same time. More complex components also make it possible to enrich molecules that are e.g. active against a given receptor but not against a set of off-targets, i.e. the agent is pushed towards selectivity.

In the following sections, we will show how to set up a fairly complex `REINVENT` run that optimizes molecules to be selective for _Aurora_ kinase (we will use one off-target). In addition, we will set further constraints, e.g. on the number of hydrogen bond donors, all of which will be optimized in parallel.


#### Steps to successfully apply `REINVENT`:
1. Think about the goals and prepare the input
  * Often, we will use predictive models of some sort, e.g. activity models. For those, we calculate fingerprints (e.g. `ECFP` descriptors) and map these input features to some response variable, either as a (binary) classification or regression. For this notebook, we supply two models (Random Forest regressors) for Aurora and B-RAF, which can be found in the `data` subfolder of the `REINVENT` repository. A more detailed description is given below, but usually these models are simple `scikit-learn` models with standard algorithms (Random Forest, SVR, ...).
  * Sometimes, we want to generate molecules that match a given (sub-)structure, so we need a way to enforce a match. In the opposite scenario, we want to avoid certain matches (e.g. because we want to move out of IP-crowded chemical space). We can define `SMARTS` to achieve both, see details below.
  * Some (physico-chemical) properties are very important and should be incorporated during the RL run. We make use of `RDKit`'s implementation of certain descriptors to e.g. ensure that a maximum number of rings is not exceeded.
  * Another important factor is the use of an appropriate _prior_ that embodies much of the chemistry it has learned from its training data. We supply one in the `data` subfolder (`augmented.prior`) and will also use it to initialize the agent in this example.
2. Choose a `JSON` template (or build from scratch) and update:
  * The paths to the prior and initial agent.
  * The output paths to the _progress.log_ and _results_ folders (logging).
  * The scoring function components (add or delete, update paths and parameters, set weights, ...).
3. Run `REINVENT`
4. Analysis:
  * Inspect the output of the run with `tensorboard`
  * Look at the molecules generated in the result directory
  * Apply *post-processing* where appropriate

To proceed, please update the following code block such that it reflects your system's installation and execute it.

In [1]:
# load dependencies
import os
import re
import json
import tempfile

# --------- change these path variables as required
reinvent_dir = os.path.expanduser("../../Reinvent")
reinvent_env = os.path.expanduser("../../../miniconda3/envs/ReinventCommunity")
output_dir = os.path.expanduser("./REINVENT_complete_use_case_DRD2_demo")

# --------- check which of these paths exist
# (the output folder is created below if it is missing)
print("The current working directory is", os.getcwd())
print(os.path.exists(reinvent_dir))
print(os.path.exists(reinvent_env))
print(os.path.exists(output_dir))

# --------- do not change
# get the notebook's root path
try: ipynb_path
except NameError: ipynb_path = os.getcwd()

# if required, generate a folder to store the results
os.makedirs(output_dir, exist_ok=True)

The current working directory is /home/springnuance/reinvent-hitl/Reinvent-Community-Binh/notebooks
True
True
True


## Setting up the configuration
`REINVENT` has an entry point that loads a specified `JSON` file on startup. `JSON` is a simple data format that makes it quick to specify a fairly large number of parameters in a nested fashion. The parameters are structured into *blocks*, which can in turn contain blocks or simple values, such as *True* or *False*, strings and numbers. In this tutorial, we will go through the different blocks step-by-step, explaining their purpose and potential values for given parameters. Note that while we will write out the configuration as a `JSON` file in the end, in `python` we handle the same information as a simple `dict`.
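
To make the correspondence between a nested `dict` and its `JSON` form concrete, here is a minimal sketch using only the standard library. The keys in `example` are hypothetical and chosen for illustration; they are not part of a `REINVENT` configuration.

```python
import json

# hypothetical nested "blocks"; the keys are for illustration only
example = {
    "outer_block": {
        "flag": True,                      # a Python boolean
        "value": 0.5,
        "inner_block": {"items": ["a", "b"], "missing": None},
    }
}

# serialize the dictionary to JSON text
text = json.dumps(example, indent=4, sort_keys=True)
print(text)  # booleans are written as true/false, None as null

# reading the text back restores an equal Python dictionary
restored = json.loads(text)
```

The same `json.dumps` / `json.loads` pair works for any dictionary made of blocks, lists, strings, numbers and booleans.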

In [2]:
# initialize the dictionary
configuration = {
    "version": 3,                          # we are going to use REINVENT's newest release
    "run_type": "reinforcement_learning",  # other run types: "sampling", "validation",
                                           #                  "transfer_learning",
                                           #                  "scoring" and "create_model",
    "model_type": "default"
}

In order to analyse the results of any run afterwards, it is paramount to *log* intermediate results, e.g. to judge whether the agent has been focussed enough (or too much), whether the learning is going well and so on. On top of this, we also need to make sure the final result (compounds) is deposited appropriately. Thus, we will log these data to two folders and inspect them afterwards with `tensorboard`, which is already installed in the environment.

In [3]:
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://127.0.0.1",          # only relevant if "recipient" is set to "remote"
    "recipient": "local",                  # either to local logging or use a remote REST-interface
    "logging_frequency": 10,               # log every x-th steps
    "logging_path": os.path.join(output_dir, "progress.log"), # load this folder in tensorboard
    "result_folder": os.path.join(output_dir, "results"),         # will hold the compounds (SMILES) and summaries
    "job_name": "Reinforcement learning demo",                # set an arbitrary job name for identification
    "job_id": "demo"                       # only relevant if "recipient" is set to "remote"
}

The aforementioned blocks are required for any kind of run, but there remains a lot to be specified in terms of reinforcement learning-specific parameters. All of these are on a child-level of `parameters`. Before we specify the scoring function components, let us address all other blocks that are possible.

* `diversity_filter`: If the agent becomes very focussed, it tends to produce similar molecules over and over (because they return high scores). To enrich different scaffolds, we can activate the diversity filter, which will "bin" the molecules into groups (scaffolds). Once a given bin is full, all other molecules with the same scaffold will be penalized score-wise, effectively "pushing" the agent out of a local optimum in the score landscape and thus enriching diversity.
* `inception`: Sometimes agents "linger around" for a while before they (by chance) happen to pick up a trace and generate interesting compounds. To speed up this very early exploration, we can *incept* a couple of promising molecules as a list of `SMILES`.
* `reinforcement_learning`: This block holds all the parameters which are specific to the reinforcement learning running mode (see detailed description in the code). One important question is which prior and initial agent to use: these are just models that have been trained on a large compound library to ensure they have learned "basic chemical rules". While the prior does not change over the course of the training (its feedback will be used to keep the agent in the realm of good chemistry), the agent is updated each *epoch* (step). In this case we have used the *augmented* `SMILES` representation of `ChEMBL` data for both the prior and to initialize the agent.
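
The binning idea behind the diversity filter can be sketched in a few lines. This is a toy illustration, not `REINVENT`'s implementation: the function name `apply_scaffold_bins` is hypothetical, scaffolds are passed in as ready-made strings (computing them would require `RDKit`), and setting the score of surplus molecules to zero is an assumption about how the penalty is applied.

```python
from collections import defaultdict

def apply_scaffold_bins(molecules, nbmax=25, minscore=0.4):
    """Toy scaffold binning.

    `molecules` is a list of (smiles, scaffold, score) tuples.
    Returns the adjusted (smiles, score) pairs and the filled bins.
    """
    bins = defaultdict(list)
    adjusted = []
    for smiles, scaffold, score in molecules:
        # only molecules scoring at least `minscore` are considered for binning
        if score >= minscore:
            if len(bins[scaffold]) >= nbmax:
                score = 0.0                    # bin is full: penalize (assumed penalty)
            else:
                bins[scaffold].append(smiles)  # bin still has room
        adjusted.append((smiles, score))
    return adjusted, dict(bins)
```

With `nbmax=3`, the fourth high-scoring molecule sharing a scaffold would be penalized, while low-scoring molecules never enter a bin at all.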

In [4]:
# add the "parameters" block
configuration["parameters"] = {}

# add a "diversity_filter"
configuration["parameters"]["diversity_filter"] =  {
    "name": "IdenticalMurckoScaffold",     # other options are: "IdenticalTopologicalScaffold", 
                                           #                    "NoFilter" and "ScaffoldSimilarity"
                                           # -> use "NoFilter" to disable this feature
    "nbmax": 25,                           # the bin size; penalization will start once this is exceeded
    "minscore": 0.4,                       # the minimum total score to be considered for binning
    "minsimilarity": 0.4                   # the minimum similarity to be placed into the same bin
}

# prepare the inception (we do not use it in this example, so "smiles" is an empty list)
configuration["parameters"]["inception"] = {
    "smiles": [],                          # fill in a list of SMILES here that can be used (or leave empty)
    "memory_size": 100,                    # sets how many molecules are to be remembered
    "sample_size": 10                      # how many are to be sampled each epoch from the memory
}

# set all "reinforcement learning"-specific run parameters
configuration["parameters"]["reinforcement_learning"] = {
    "prior": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "agent": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "n_steps": 125,                        # the number of epochs (steps) to be performed; often 1000
    "sigma": 128,                          # used to calculate the "augmented likelihood", see publication
    "learning_rate": 0.0001,               # sets how strongly the agent is influenced by each epoch
    "batch_size": 128,                     # specifies how many molecules are generated per epoch
    "margin_threshold": 50                 # specify the (positive) margin between agent and prior
}
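
The role of `sigma` can be illustrated with a small sketch. As we understand the publication referred to in the comment above, the score is scaled by `sigma` and added to the prior's log-likelihood to form the *augmented* log-likelihood, and the agent is trained to reduce the squared difference between that target and its own log-likelihood. The function names and numbers below are hypothetical, and the sketch ignores further details of the actual implementation (e.g. `margin_threshold`).

```python
def augmented_log_likelihood(prior_log_likelihood, score, sigma=128):
    # the score (0..1) is scaled by sigma and added to the prior's log-likelihood
    return prior_log_likelihood + sigma * score

def squared_loss(agent_log_likelihood, prior_log_likelihood, score, sigma=128):
    # the agent is pulled towards the augmented log-likelihood
    target = augmented_log_likelihood(prior_log_likelihood, score, sigma)
    return (target - agent_log_likelihood) ** 2

# hypothetical values for a single sampled molecule
example_target = augmented_log_likelihood(-40.0, 0.5)   # -40 + 128 * 0.5
example_loss = squared_loss(-38.0, -40.0, 0.5)
print(example_target, example_loss)
```

A larger `sigma` gives the score more influence relative to the prior's likelihood.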

### Define the scoring function
Now all that remains to be done is the trickiest step: define a scoring function that allows the agent to identify promising suggestions and discard molecules that are of no interest to the project. It is not necessarily better to build a very complex scoring function (on the contrary, it can make it hard for the agent to find appropriate solutions). Always bear in mind that there is a post-processing step at the end, in which you will be able to discard molecules either by visual inspection or by applying further (probably more expensive) methods you have not used in the reinforcement learning loop. The following example will include a fair share of the available scoring function components (added one-by-one), but this is for illustrative purposes only.

##### Score transformation
Before we start, there is one more topic requiring some explanation: *score transformations*. Remember that every component returns a value between '0' and '1' (higher values meaning "better") and all scores together are combined into a *total score* for a given compound (also between '0' and '1'). This is key, as the agent will try to generate molecules with ever increasing scores over the course of training, i.e. the numerical value "guides" the agent. However, some components might not naturally return values between '0' and '1' or they might represent the opposite, i.e. '0' being "good" rather than "bad". This is component-specific and to make it as flexible as possible, we include the specification of a score transformation for each component. We support multiple different functions (`sigmoid`, `reverse_sigmoid` and so on) which have different parameters to allow tweaking them to the desired result. For more details and to see how different parameter values affect the result, we refer to the dedicated notebook which is also part of this repository.
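
As a rough illustration of what such a transformation does, the sketch below maps a raw value (e.g. a predicted activity) onto the range 0 to 1 with an S-shaped curve controlled by `low`, `high` and `k`. The parameterization is an assumption made for this example; the exact formula that `REINVENT` applies should be taken from its scoring package or the dedicated notebook.

```python
def sigmoid(value, low, high, k):
    # the curve is centred halfway between `low` and `high`
    midpoint = (low + high) / 2
    # `k` controls the steepness; (low - high) is negative, so larger
    # values yield a smaller exponent and thus a result closer to 1
    exponent = 10 * k * (value - midpoint) / (low - high)
    return 1 / (1 + 10 ** exponent)

def reverse_sigmoid(value, low, high, k):
    # mirror image: small raw values are rewarded instead of large ones
    return 1 - sigmoid(value, low, high, k)

for raw in (4, 6.5, 9):
    print(raw, round(sigmoid(raw, low=4, high=9, k=0.25), 3))
```

With these settings, raw values near `low` are mapped close to 0, values near `high` close to 1, and the midpoint to 0.5.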

In [5]:
# prepare the scoring function definition and add at the end
scoring_function = {
    "name": "custom_product",              # this is our default one (alternative: "custom_sum")
    "parallel": False,                     # sets whether components are to be executed
                                           # in parallel; note, that python uses "False" / "True"
                                           # but the JSON "false" / "true"

    # the "parameters" list holds the individual components
    "parameters": [

    # add component: an activity model
    {
        "component_type": "predictive_property", # this is a scikit-learn model, returning
                                                 # activity values
        "name": "Regression model",              # arbitrary name for the component
        "weight": 2,                             # the weight ("importance") of the component (default: 1)
        "specific_parameters": {
            "model_path": os.path.join(ipynb_path, "models/Aurora_model.pkl"),   # absolute model path
            "scikit": "regression",                # model can be "regression" or "classification"
            "descriptor_type": "ecfp_counts",      # sets the input descriptor for this model
            "size": 2048,                          # parameter of descriptor type
            "radius": 3,                           # parameter of descriptor type
            "use_counts": True,                    # parameter of descriptor type
            "use_features": True,                  # parameter of descriptor type
            "transformation": {
                "transformation_type": "sigmoid",  # see description above
                "high": 9,                         # parameter for sigmoid transformation
                "low": 4,                          # parameter for sigmoid transformation
                "k": 0.25                          # parameter for sigmoid transformation
            }
        }
    },

    # add component: enforce the match to a given substructure
    {
        "component_type": "matching_substructure", 
        "name": "Matching substructure",       # arbitrary name for the component
        "weight": 1,                           # the weight of the component (default: 1)
        "specific_parameters": {
            "smiles": ["c1ccccc1CC"]           # a match with this substructure is required
        }
    },

    # add component: enforce to NOT match a given substructure
    {
        "component_type": "custom_alerts",
        "name": "Custom alerts",               # arbitrary name for the component
        "weight": 1,                           # the weight of the component (default: 1)
        "specific_parameters": {
            "smiles": [                            # specify the substructures (as list) to penalize
                "[*;r8]",
                "[*;r9]",
                "[*;r10]",
                "[*;r11]",
                "[*;r12]",
                "[*;r13]",
                "[*;r14]",
                "[*;r15]",
                "[*;r16]",
                "[*;r17]",
                "[#8][#8]",
                "[#6;+]",
                "[#16][#16]",
                "[#7;!n][S;!$(S(=O)=O)]",
                "[#7;!n][#7;!n]",
                "C#C",
                "C(=[O,S])[O,S]",
                "[#7;!n][C;!$(C(=[O,N])[N,O])][#16;!s]",
                "[#7;!n][C;!$(C(=[O,N])[N,O])][#7;!n]",
                "[#7;!n][C;!$(C(=[O,N])[N,O])][#8;!o]",
                "[#8;!o][C;!$(C(=[O,N])[N,O])][#16;!s]",
                "[#8;!o][C;!$(C(=[O,N])[N,O])][#8;!o]",
                "[#16;!s][C;!$(C(=[O,N])[N,O])][#16;!s]"
            ]
        }
    },

    # add component: calculate the QED drug-likeness score (using RDKit)
    {
        "component_type": "qed_score",
        "name": "QED Score",                   # arbitrary name for the component
        "weight": 1,                           # the weight of the component (default: 1)
    }]
}
configuration["parameters"]["scoring_function"] = scoring_function
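
To give an intuition for the difference between the two aggregation modes named above, the sketch below assumes that `custom_product` behaves like a weighted geometric mean and `custom_sum` like a weighted arithmetic mean of the component scores. This is an assumption for illustration; the function names are hypothetical, and any special handling of penalty-like components such as `custom_alerts` is not modelled here.

```python
def weighted_product(scores, weights):
    # weighted geometric mean: each score is raised to its normalized weight
    total_weight = sum(weights)
    result = 1.0
    for score, weight in zip(scores, weights):
        result *= score ** (weight / total_weight)
    return result

def weighted_sum(scores, weights):
    # weighted arithmetic mean of the component scores
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# hypothetical component scores: one component fails completely
scores, weights = [0.0, 1.0], [1, 1]
print(weighted_product(scores, weights), weighted_sum(scores, weights))
```

Under this assumption, a single component score of zero drives the product to zero, whereas the sum still rewards the remaining components; this is one reason a product-style aggregation is stricter.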

We now have successfully filled the dictionary and will write it out as a `JSON` file in the output directory. Please have a look at the file before proceeding in order to see how the paths have been inserted where required and the `dict` -> `JSON` translations (e.g. `True` to `true`) have taken place.

In [6]:
# write the configuration file to disk
configuration_JSON_path = os.path.join(output_dir, "RL_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4, sort_keys=True)
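
Before launching a run, it can be worth checking that the model files referenced in the configuration actually exist. The helper below is a hypothetical convenience, not part of `REINVENT`: it walks a nested dictionary and reports entries stored under the keys `prior`, `agent` or `model_path` (the keys used in this notebook) that do not point to an existing file.

```python
import os

def find_missing_files(config, keys=("prior", "agent", "model_path")):
    """Return (key, path) pairs whose path is not an existing file."""
    missing = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                is_path_entry = key in keys and isinstance(value, str)
                if is_path_entry and not os.path.isfile(value):
                    missing.append((key, value))
                else:
                    walk(value)      # descend into nested blocks
        elif isinstance(node, list):
            for item in node:        # e.g. the list of scoring components
                walk(item)

    walk(config)
    return missing
```

In this notebook it could be called as `find_missing_files(configuration)`; an empty list means every referenced file was found.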

## Run `REINVENT`
Now it is time to execute `REINVENT` locally. Note that depending on the number of epochs (steps) and the execution time of the scoring function components, this might take a while. As we have only specified a low number of epochs (125) and all components should be fairly quick, this should not take too long in our case though.

The command-line execution looks like this:
```
# activate environment
conda activate reinvent.v3.2

# execute REINVENT
python <your_path>/input.py <config>.json
```

In [15]:
print(os.path.exists("/home/springnuance/reinvent-hitl/Reinvent-Community-Binh/notebooks/models/Aurora_model.pkl"))

True


In [14]:
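# To capture the output for the parsing cell below, enable the following cell magic;
# it must be the first line of the cell:
# %%capture captured_err_stream --no-stderr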

# execute REINVENT from the command-line
!{reinvent_env}/bin/python {reinvent_dir}/input.py {configuration_JSON_path}

Traceback (most recent call last):
  File "/home/springnuance/reinvent-hitl/Reinvent/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 31, in _load_model
    activity_model = self._load_container(parameters)
  File "/home/springnuance/reinvent-hitl/Reinvent/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 40, in _load_container
    scikit_model = pickle.load(f)
ModuleNotFoundError: No module named 'sklearn.ensemble.forest'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../Reinvent/input.py", line 40, in <module>
    manager.run()
  File "/home/springnuance/reinvent-hitl/Reinvent/running_modes/manager.py", line 17, in run
    runner = RunningMode(self.run_configuration)
  File "/home/springnuance/reinvent-hitl/Reinvent/running_modes/constructors/running_mode.py", line 18, in __new__
    return ReinforcementLearningModeConstructor(configu
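
The `ModuleNotFoundError` in the traceback above is raised by `pickle.load` while it tries to import `sklearn.ensemble.forest`. A pickled model stores the import path of its classes, so this error typically indicates that the model was saved with a different `scikit-learn` version than the one installed in the environment; as far as we are aware, newer releases no longer expose a module under that name. The following sketch checks which module paths resolve in the current interpreter. The helper `module_available` is hypothetical, and the second module name is our assumption about where the code lives in newer releases.

```python
import importlib.util

def module_available(dotted_name):
    """Return True if the module can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # a missing parent package (or a broken module spec) also counts as unavailable
        return False

# the first name is taken from the traceback, the second is an assumption
for name in ("sklearn.ensemble.forest", "sklearn.ensemble._forest"):
    print(name, module_available(name))
```

To be meaningful, the check has to be run with the same interpreter that executes `REINVENT`, i.e. the one under `{reinvent_env}/bin/python`.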

In [13]:
# print the output to a file, just to have it for documentation
with open(os.path.join(output_dir, "run.err"), 'w') as file:
    file.write(captured_err_stream.stdout)

# prepare the output to be parsed
list_epochs = re.findall(r'INFO.*?local', captured_err_stream.stdout, re.DOTALL)
data = [epoch for idx, epoch in enumerate(list_epochs) if idx in [1, 75, 124]]
data = ["\n".join(element.splitlines()[:-1]) for element in data]

We have calculated a total of 125 epochs, let us quickly investigate how the agent fared. Below you see the print-out of the first, one from the middle and the last epoch, respectively. Note, that the fraction of valid `SMILES` is high right from the start (because we use a pre-trained prior). You can see the partial scores for each component for the first couple of compounds, but the most important information is the average score. You can clearly see how it increases over time.

In [9]:
for element in data:
    print(element)

## Analyse the results
In order to analyze the run in a more intuitive way, we can use `tensorboard`:

```
# go to the root folder of the output
cd <your_path>/REINVENT_RL_demo

# make sure, you have activated the proper environment
conda activate reinvent.v3.2

# start tensorboard
tensorboard --logdir progress.log
```

Then copy the link provided to a browser window, e.g. "http://workstation.url.com:6006/". The following figures are example plots - remember that there is always some randomness involved. In `tensorboard` you can monitor the individual scoring function components. What you see is that all of those depicted went up (and `Fraction_valid_SMILES` was high throughout). Not shown is the predictive model, which did not perform all that well, so you might want to consider a higher weight next time.

![](img/individual_components.png)

Also the total score increased over time.

![](img/total_score.png)

It might also be informative to look at the results from the prior (dark blue), the agent (blue) and the augmented likelihood (purple) over time.

![](img/likelihood.png)

And last but not least, there is an "Images" tab available that lets you browse through the compounds generated in an easy way. In the molecules, the substructure matches that were defined to be required are highlighted in red (if present). Also, the total scores are given per molecule.

![](img/molecules.png)

The results folder will hold four different files: the agent (pickled), the input JSON (just for reference purposes), the memory (highest scoring compounds in `CSV` format) and the scaffold memory (in `CSV` format).

In [11]:
!head -n 15 {output_dir}/results/memory.csv

head: cannot open './REINVENT_complete_use_case_DRD2_demo/results/memory.csv' for reading: No such file or directory
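
The `head` call above fails because `memory.csv` does not exist, which is what one would expect when the run itself did not complete. A small guard in `python` avoids the error message and works independently of the shell; `head_csv` is a hypothetical helper that makes no assumption about the column names in the file.

```python
import csv
import os

def head_csv(path, n=15):
    """Return the first `n` rows of a CSV file, or an empty list if it is missing."""
    if not os.path.isfile(path):
        return []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        # zip stops after n rows without reading the rest of the file
        return [row for _, row in zip(range(n), reader)]
```

In this notebook it could be called as `head_csv(os.path.join(output_dir, "results", "memory.csv"))`; an empty result is a hint to check whether the run finished.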
