> **How to run this notebook (command-line)?**
1. Install the `ReinventCommunity` environment:
`conda env create -f environment.yml`
2. Activate the environment:
`conda activate ReinventCommunity`
3. Execute `jupyter`:
`jupyter notebook`
4. Copy the link to a browser


# `REINVENT 3.0`: reinforcement learning with Tanimoto similarity


This is a simple example of running `REINVENT` with a single scoring component.

NOTE: The `Reinforcement Learning Demo` notebook explains the reasoning behind each code block in detail.


## 1. Set up the paths
_Please update the following code block such that it reflects your system's installation and execute it._

In [1]:
# load dependencies
import os
import re
import json
import tempfile

# --------- change these path variables as required
reinvent_dir = os.path.expanduser("~/Desktop/reinventcli")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent.v3.0")
output_dir = os.path.expanduser("~/Desktop/REINVENT_RL_Tanimoto_Similarity_demo")

# --------- do not change
# get the notebook's root path
try:
    ipynb_path
except NameError:
    ipynb_path = os.getcwd()

# if required, generate a folder to store the results
os.makedirs(output_dir, exist_ok=True)
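
Before continuing, it can help to confirm that the paths set above actually exist. The following is a minimal sketch; the helper `missing_paths` is a hypothetical name introduced here for illustration, and the two paths simply mirror the defaults from the cell above.

```python
import os

def missing_paths(paths):
    """Return the entries of `paths` (name -> path) that do not exist on disk."""
    return {name: path for name, path in paths.items() if not os.path.exists(path)}

# the values below mirror the defaults of the set-up cell; adapt them to your system
paths_to_check = {
    "reinvent_dir": os.path.expanduser("~/Desktop/reinventcli"),
    "reinvent_env": os.path.expanduser("~/miniconda3/envs/reinvent.v3.0"),
}

for name, path in missing_paths(paths_to_check).items():
    print(f"Warning: {name} does not exist: {path}")
```

If a warning is printed, update the corresponding variable before running the remaining cells.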

## 2. Setting up the configuration 
In the cells below we will build a nested dictionary that will eventually be written out as a JSON file, which in turn will be consumed by `REINVENT`.
You can find this file in your `output_dir` location.

### A) Declare the run type

In [2]:
# initialize the dictionary
configuration = {
    "version": 3,                          # we are going to use REINVENT's newest release
    "run_type": "reinforcement_learning"   # other run types: "sampling", "validation",
                                           #                  "transfer_learning",
                                           #                  "scoring" and "create_model"
}

### B) Sort out the logging details
This includes the `result_folder` path, where the results will be written.

Also: `REINVENT` can send custom log messages to a remote location. We have retained this capability in the code. If the `recipient` value differs from `"local"`, `REINVENT` will attempt to POST the data to the specified `recipient`.

In [3]:
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://0.0.0.1",            # only relevant if "recipient" is not "local"
    "recipient": "local",                  # either local logging or a remote REST interface
    "logging_frequency": 10,               # log every x-th step
    "logging_path": os.path.join(output_dir, "progress.log"),   # load this folder in tensorboard
    "result_folder": os.path.join(output_dir, "results"),       # will hold the compounds (SMILES) and summaries
    "job_name": "Reinforcement learning demo",                  # set an arbitrary job name for identification
    "job_id": "demo"                       # only relevant if "recipient" is set to a specific REST endpoint
}

Create `parameters` field:

In [4]:
# add the "parameters" block
configuration["parameters"] = {}

### C) Set Diversity Filter
During each Reinforcement Learning step, the compounds that score above the `minscore` threshold are kept in memory. These scored SMILES are written to the file `scaffold_memory.csv` in the results folder.

In [5]:
# add a "diversity_filter"
configuration["parameters"]["diversity_filter"] =  {
    "name": "IdenticalMurckoScaffold",     # other options are: "IdenticalTopologicalScaffold", 
                                           #                    "NoFilter" and "ScaffoldSimilarity"
                                           # -> use "NoFilter" to disable this feature
    "nbmax": 25,                           # the bin size; penalization will start once this is exceeded
    "minscore": 0.4,                       # the minimum total score to be considered for binning
    "minsimilarity": 0.4                   # the minimum similarity to be placed into the same bin
}
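
To illustrate how `nbmax` and `minscore` interact, here is a simplified sketch of scaffold binning. It is not `REINVENT`'s implementation: the class `ScaffoldBinSketch` is a hypothetical name, the scaffold is passed in as a plain string instead of being computed from the molecule, and the penalty (setting the score to zero once a bin is full) is an assumption made for illustration.

```python
from collections import defaultdict

class ScaffoldBinSketch:
    """Toy diversity filter: one bin per scaffold, penalize once a bin is full."""

    def __init__(self, nbmax=25, minscore=0.4):
        self.nbmax = nbmax
        self.minscore = minscore
        self.bins = defaultdict(list)  # scaffold -> list of (smiles, score)

    def filter_score(self, scaffold, smiles, score):
        # compounds below the threshold are neither binned nor penalized
        if score < self.minscore:
            return score
        # assumption: a full bin leads to a score of zero for further members
        if len(self.bins[scaffold]) >= self.nbmax:
            return 0.0
        self.bins[scaffold].append((smiles, score))
        return score

# tiny bin size so that the penalty becomes visible after two compounds
sketch = ScaffoldBinSketch(nbmax=2, minscore=0.4)
scores = [sketch.filter_score("scaffold_A", f"mol_{i}", 0.8) for i in range(3)]
print(scores)  # [0.8, 0.8, 0.0]
```

The third compound sharing the same scaffold is penalized, which pushes the agent towards other scaffolds.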

### D) Set Inception
* `smiles`: a list of SMILES to be incepted
* `memory_size`: the number of SMILES allowed in the inception memory
* `sample_size`: the number of SMILES that can be sampled from the inception memory at each reinforcement learning step

In [6]:
# prepare the inception (we do not use it in this example, so "smiles" is an empty list)
configuration["parameters"]["inception"] = {
    "smiles": [],                          # fill in a list of SMILES here that can be used (or leave empty)
    "memory_size": 100,                    # sets how many molecules are to be remembered
    "sample_size": 10                      # how many are to be sampled each epoch from the memory
}
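
The roles of `memory_size` and `sample_size` can be sketched with a small in-memory store. This is an illustrative assumption about the behaviour (keep the best-scoring SMILES, draw a random subset each step), not `REINVENT`'s code; `InceptionMemorySketch` and its methods are hypothetical names.

```python
import random

class InceptionMemorySketch:
    """Toy inception memory: keeps the top-scoring SMILES and samples from them."""

    def __init__(self, memory_size=100, sample_size=10, seed=None):
        self.memory_size = memory_size
        self.sample_size = sample_size
        self.entries = {}  # smiles -> best score seen so far
        self._rng = random.Random(seed)

    def add(self, smiles, score):
        # remember the best score per SMILES
        if score > self.entries.get(smiles, float("-inf")):
            self.entries[smiles] = score
        # drop the lowest-scoring entries once the memory is over capacity
        if len(self.entries) > self.memory_size:
            ranked = sorted(self.entries.items(), key=lambda kv: kv[1], reverse=True)
            self.entries = dict(ranked[: self.memory_size])

    def sample(self):
        # never ask for more items than the memory currently holds
        k = min(self.sample_size, len(self.entries))
        return self._rng.sample(list(self.entries.items()), k)

memory = InceptionMemorySketch(memory_size=3, sample_size=2, seed=0)
for i in range(5):
    memory.add(f"C{i}", i / 10)
print(sorted(memory.entries))  # ['C2', 'C3', 'C4']
```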

### E) Set the general Reinforcement Learning parameters
* `n_steps` is the number of Reinforcement Learning steps to perform. It is best to start with 1000 steps and check whether that is enough.
* `agent` is the generative model that undergoes transformation during the Reinforcement Learning run.

We recommend keeping the other parameters at their default values.

In [7]:
# set all "reinforcement learning"-specific run parameters
configuration["parameters"]["reinforcement_learning"] = {
    "prior": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "agent": os.path.join(ipynb_path, "models/random.prior.new"), # path to the model that is updated during
                                                                  # the run (here: same file as the prior)
    "n_steps": 125,                        # the number of epochs (steps) to be performed; often 1000
    "sigma": 128,                          # used to calculate the "augmented likelihood", see publication
    "learning_rate": 0.0001,               # sets how strongly the agent is influenced by each epoch
    "batch_size": 128,                     # specifies how many molecules are generated per epoch
    "reset": 0,                            # if not 0, reset the agent once the threshold is reached
                                           # to get more diverse solutions
    "reset_score_cutoff": 0.5,             # if resetting is enabled, this is the threshold
    "margin_threshold": 50                 # specify the (positive) margin between agent and prior
}
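
To give a feeling for what `sigma` does, the sketch below follows the formulation of the original REINVENT publication: the augmented likelihood is the prior log-likelihood plus `sigma` times the score, and the loss is the squared difference between the augmented and the agent log-likelihood. The function names are hypothetical and the numbers are illustrative; the update strategy actually used in this release may differ in detail.

```python
def augmented_likelihood(prior_likelihood, score, sigma=128):
    """Prior log-likelihood shifted upwards in proportion to the score."""
    return prior_likelihood + sigma * score

def agent_loss(agent_likelihood, prior_likelihood, score, sigma=128):
    """Squared distance between the augmented and the agent log-likelihood."""
    target = augmented_likelihood(prior_likelihood, score, sigma)
    return (target - agent_likelihood) ** 2

# illustrative values: agent and prior agree, the compound scores 0.15
target = augmented_likelihood(prior_likelihood=-23.79, score=0.15, sigma=128)
print(f"{target:.2f}")  # -4.59
print(agent_loss(agent_likelihood=-23.79, prior_likelihood=-23.79, score=0.15))
```

A larger `sigma` moves the target further from the prior for the same score, so high-scoring compounds pull the agent more strongly.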

### F) Define the scoring function
We will use only a `tanimoto_similarity` component with a single SMILES string:

`"O=S(=O)(c3ccc(n1nc(cc1c2ccc(cc2)C)C(F)(F)F)cc3)N"`

However, using multiple SMILES strings is also acceptable.

In [8]:
# prepare the scoring function definition and add at the end
scoring_function = {
    "name": "custom_product",                  # this is our default one (alternative: "custom_sum")
    "parallel": False,                         # sets whether components are to be executed
                                               # in parallel; note, that python uses "False" / "True"
                                               # but the JSON "false" / "true"

    # the "parameters" list holds the individual components
    "parameters": [

    # add component: use Tanimoto similarity to the reference SMILES
    {
        "component_type": "tanimoto_similarity", 
        "name": "Tanimoto similarity",         # arbitrary name for the component
        "weight": 1,                           # the weight of the component (default: 1)
        "specific_parameters": {
            "smiles": ["O=S(=O)(c3ccc(n1nc(cc1c2ccc(cc2)C)C(F)(F)F)cc3)N"], # a list of SMILES can be provided
        }
    }]
}
configuration["parameters"]["scoring_function"] = scoring_function
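
As a reminder of what the component measures, the Tanimoto similarity of two fingerprints is the number of shared on-bits divided by the number of on-bits present in either. The sketch below uses plain Python sets of bit indices as stand-ins for fingerprints; in practice these would come from a cheminformatics toolkit. Taking the maximum over several references is an assumption about how multiple SMILES would be combined, and both function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define the similarity as zero
    return len(a & b) / len(union)

def best_similarity(query_fp, reference_fps):
    """Assumption: with several references, the closest one determines the score."""
    return max(tanimoto(query_fp, ref) for ref in reference_fps)

print(tanimoto({1, 2, 3}, {2, 3, 4}))                    # 0.5
print(best_similarity({1, 2, 3}, [{7}, {1, 2, 3, 4}]))   # 0.75
```

The result always lies between 0 (no shared bits) and 1 (identical fingerprints), which matches the score range expected by the scoring function.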

## 3. Write out the configuration

We now have successfully filled the dictionary and will write it out as a `JSON` file in the output directory. Please have a look at the file before proceeding in order to see how the paths have been inserted where required and the `dict` -> `JSON` translations (e.g. `True` to `true`) have taken place.

In [9]:
# write the configuration file to disk
configuration_JSON_path = os.path.join(output_dir, "RL_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4, sort_keys=True)
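
The `dict` -> `JSON` translation can be checked on a small fragment without touching the file. The following sketch uses a reduced, made-up dictionary (`snippet`) rather than the full configuration.

```python
import json

# reduced stand-in for the configuration dictionary
snippet = {"parallel": False, "smiles": [], "weight": 1}

# same options as in the cell above: indented and with sorted keys
text = json.dumps(snippet, indent=4, sort_keys=True)
print(text)

# reading the text back yields an equal dictionary (false -> False)
restored = json.loads(text)
print(restored == snippet)  # True
```

Python's `False` appears as `false` in the JSON text and is converted back when the file is loaded.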

## 4. Run `REINVENT`
Now it is time to execute `REINVENT` locally. Note that, depending on the number of epochs (steps) and the execution time of the scoring function components, this might take a while. As we have specified only a small number of epochs (125) and all components should be fairly quick, this should not take too long in our case.

The command-line execution looks like this:
```
# activate environment
conda activate reinvent.v3.0

# execute REINVENT
python <your_path>/input.py <config>.json
```

In [10]:
%%capture captured_err_stream --no-stderr

# execute REINVENT from the command-line
!{reinvent_env}/bin/python {reinvent_dir}/input.py {configuration_JSON_path}

In [11]:
# print the output to a file, just to have it for documentation
with open(os.path.join(output_dir, "run.err"), 'w') as file:
    file.write(captured_err_stream.stdout)

# prepare the output to be parsed
list_epochs = re.findall(r'INFO.*?local', captured_err_stream.stdout, re.DOTALL)
data = [epoch for idx, epoch in enumerate(list_epochs) if idx in [1, 75, 124]]
data = ["\n".join(element.splitlines()[:-1]) for element in data]

We have run a total of 125 epochs, so let us quickly investigate how the agent fared. Below you see the print-out of the first epoch, one from the middle and the last epoch, respectively. Note that the fraction of valid `SMILES` is high right from the start (because we use a pre-trained prior). You can see the partial scores for each component for the first couple of compounds, but the most important information is the average score. You can clearly see how it increases over time.

In [12]:
for element in data:
    print(element)

INFO     
 Step 0   Fraction valid SMILES: 100.0   Score: 0.1904   Time elapsed: 0   Time left: 0.0
  Agent     Prior     Target     Score     SMILES
-23.79    -23.79     -4.99      0.15      c1c(C(N=c2sc(N3CCCCCC3)n[nH]2)=O)cccn1
-34.51    -34.51    -23.56      0.09      C1C2(C(=O)N(c3nnc(C(C)C)o3)CC1)CN(CC1CC1)CC2
-28.21    -28.21      0.80      0.23      C(CC(N=c1[nH]c(-c2ccccc2)c(-c2ccccc2)o1)=O)n1cncc1
-24.18    -24.18     -7.72      0.13      c1(=O)[nH]c(=S)n(C2OC(CO)C(O)C2O)c2cc(Cl)ccc12
-25.49    -25.49      8.58      0.27      c1(=O)c2c(nc(C)n1-c1cccc(C)c1)c(Br)cc(Br)c2
-34.47    -34.47     -6.20      0.22      N(Cc1cccc(N(C)C)c1)C(C1=CCN(S(=O)(c2ccc3c(cccc3)c2)=O)CC1)=O
-28.32    -28.32      4.65      0.26      c1ccc2c(n(Cc3c(C)cc(C)cc3C)c(C(=O)C(O)=O)c2)c1
-25.06    -25.06     18.46      0.34      CCc1c(C(NC(CO)c2ccccc2)=O)nn(-c2ccc(Cl)cc2Cl)c1-c1ccc(Cl)cc1
-34.74    -34.74     -5.76      0.23      c1(CN(Cc2ccc(CN(CC(C)(C)C(=O)O)Cc3cnccc3)cc2)C)ccccc1
-27.62    -27.62      7
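
To follow the average score across all epochs rather than three selected ones, the step headers can be extracted with a regular expression. This is a minimal sketch that assumes every header has the same layout as the `Step 0` line above; `parse_step_summaries` is a hypothetical helper, and `sample` is copied from the print-out.

```python
import re

# matches the step number, the fraction of valid SMILES and the average score
STEP_PATTERN = re.compile(
    r"Step\s+(\d+)\s+Fraction valid SMILES:\s+([\d.]+)\s+Score:\s+([\d.]+)"
)

def parse_step_summaries(log_text):
    """Return a list of (step, fraction_valid, average_score) tuples."""
    return [
        (int(step), float(valid), float(score))
        for step, valid, score in STEP_PATTERN.findall(log_text)
    ]

sample = " Step 0   Fraction valid SMILES: 100.0   Score: 0.1904   Time elapsed: 0   Time left: 0.0"
print(parse_step_summaries(sample))  # [(0, 100.0, 0.1904)]
```

In the notebook, the same function could be applied to `captured_err_stream.stdout` to obtain one tuple per logged step.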

## 5. Analyse the results
In order to analyze the run in a more intuitive way, we can use `tensorboard`:

```
# go to the root folder of the output
cd <your_path>/REINVENT_RL_Tanimoto_Similarity_demo

# make sure, you have activated the proper environment
conda activate reinvent.v3.0

# start tensorboard
tensorboard --logdir progress.log
```

Then copy the link provided to a browser window, e.g. "http://workstation.url.com:6006/". The following figures are example plots; remember that there is always some randomness involved. In `tensorboard` you can monitor the individual scoring function components. You can see that all of those depicted went up (and `Fraction_valid_SMILES` was high throughout). Not shown is the predictive model, which did not perform all that well, so you might want to consider a higher weight next time.

![](img/individual_components.png)

Also the total score increased over time.

![](img/total_score.png)

It might also be informative to look at the results from the prior (dark blue), the agent (blue) and the augmented likelihood (purple) over time.

![](img/likelihood.png)

Last but not least, there is an "Images" tab that lets you browse through the generated compounds easily. In the molecules, the substructure matches that were defined as required are highlighted in red (if present). The total score is also given for each molecule.

![](img/molecules.png)

The results folder will hold four different files: the agent (pickled), the input JSON (just for reference purposes), the memory (highest scoring compounds in `CSV` format) and the scaffold memory (in `CSV` format).

In [13]:
!head -n 15 {output_dir}/results/memory.csv

,smiles,score,likelihood
61,c1(-c2n(-c3ccc(S(N)(=O)=O)cc3)nc(C(F)(F)F)c2)ccc([S+](C)[O-])cc1,0.86021507,-22.123093
62,c1(C(F)(F)F)ccc(-c2n(-c3ccc(S(=O)(N)=O)cc3)nc(C(F)(F)F)c2)cc1,0.84210527,-19.497047
5,c1(-c2n(-c3ccc(S(=O)(=O)N)cc3)nc(C(F)(F)F)c2)ccc(S(C)(=O)=O)cc1,0.84210527,-18.846498
80,c1c(-c2ccccc2)n(-c2ccc(S(=O)(N)=O)cc2)nc1C(F)(F)F,0.82417583,-17.817635
100,c1c(-c2ccc(Cl)cc2)n(-c2ccc(S(N)(=O)=O)cc2)nc1C(F)(F)F,0.8064516,-17.814945
80,c1cc(-c2n(-c3ccc(S(=O)(N)=O)cc3)nc(C(F)(F)F)c2)ccc1F,0.8064516,-17.603294
86,c1c(N(C)C)ccc(-c2n(-c3ccc(S(=O)(N)=O)cc3)nc(C(F)(F)F)c2)c1,0.78350514,-19.831734
14,NS(c1ccc(-n2c(-c3ccc(C)nc3)cc(C(F)(F)F)n2)cc1)(=O)=O,0.71428573,-22.547657
62,c1c(-c2cc(C(F)(F)F)nn2-c2ccc(S(=O)(=O)N)cc2)ccnc1,0.7113402,-20.04277
6,c1(-n2nc(C(F)(F)F)cc2-c2ccc(OCC)c(Cl)c2)ccc(S(=O)(N)=O)cc1,0.6952381,-22.668873
11,c1c(-c2n(-c3ccc(S(=O)(=O)N)cc3)nc(C(F)(F)F)c2)c(Cl)cc(Cl)c1,0.6930693,-21.483095
14,C(C)(C)(c1ccc(-c2n(-c3ccc(S(=O)(N(C)C)=O)cc3)nc(C(F)(F)F)c2)cc1
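
For further analysis, the memory file can be loaded with Python's standard `csv` module. The sketch below works on two rows copied from the output above and assumes the column names shown in the header line; `load_memory` is a hypothetical helper, and in the notebook an open handle to `memory.csv` would be passed instead of the in-memory sample.

```python
import csv
import io

# two rows copied from the print-out above, including the header line
sample_csv = """,smiles,score,likelihood
100,c1c(-c2ccc(Cl)cc2)n(-c2ccc(S(N)(=O)=O)cc2)nc1C(F)(F)F,0.8064516,-17.814945
80,c1c(-c2ccccc2)n(-c2ccc(S(=O)(N)=O)cc2)nc1C(F)(F)F,0.82417583,-17.817635
"""

def load_memory(handle):
    """Read the memory rows and return them sorted by descending score."""
    rows = [
        {
            "smiles": row["smiles"],
            "score": float(row["score"]),
            "likelihood": float(row["likelihood"]),
        }
        for row in csv.DictReader(handle)
    ]
    return sorted(rows, key=lambda r: r["score"], reverse=True)

top = load_memory(io.StringIO(sample_csv))
print(top[0]["smiles"], top[0]["score"])
```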