> **How to run this notebook (command-line)?**
1. Install the `ReinventCommunity` environment:
`conda env create -f environment.yml`
2. Activate the environment:
`conda activate ReinventCommunity`
3. Execute `jupyter`:
`jupyter notebook`
4. Copy the link to a browser


# `REINVENT 2.0`: reinforcement learning exploitation demo
This demo illustrates how to set up a `REINVENT` run to optimize molecules that are active against _Aurora_ kinase. Here we use a predictive model as the main component to guide the generation of molecules. We also include a `qed_score` component to encourage the generation of more "drug-like" molecules.




## 1. Set up the paths
_Please update the following code block such that it reflects your system's installation and execute it._

In [1]:
# load dependencies
import os
import re
import json

# --------- change these path variables as required
reinvent_dir = os.path.expanduser("~/Desktop/Projects/Publications/2020/2020-04_REINVENT_2.0/Reinvent")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent_shared.v2.1")
output_dir = os.path.expanduser("~/Desktop/REINVENT_RL_demo")

# --------- do not change
# get the notebook's root path
try: ipynb_path
except NameError: ipynb_path = os.getcwd()

# if required, generate a folder to store the results
os.makedirs(output_dir, exist_ok=True)

## 2. Setting up the configuration 
In the cells below we will build a nested dictionary object that will eventually be converted to a JSON file, which in turn will be consumed by `REINVENT`.
You can find this file in your `output_dir` location.

### A) Declare the run type

In [3]:
# initialize the dictionary
configuration = {
    "version": 2,                          # we are going to use REINVENT's newest release
    "run_type": "reinforcement_learning"   # other run types: "sampling", "validation",
                                           #                  "transfer_learning",
                                           #                  "scoring" and "create_model"
}

### B) Sort out the logging details
This includes the `resultdir` path, where the results will be written.

Also: `REINVENT` can send custom log messages to a remote location. We have retained this capability in the code. If the `recipient` value differs from `"local"`, `REINVENT` will attempt to POST the data to the specified `recipient`.

In [16]:
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://0.0.0.1",          # only relevant if "recipient" is set to "remote"
    "recipient": "local",                  # either to local logging or use a remote REST-interface
    "logging_frequency": 10,               # log every x-th steps
    "logging_path": os.path.join(output_dir, "progress.log"), # load this folder in tensorboard
    "resultdir": os.path.join(output_dir, "results"),         # will hold the compounds (SMILES) and summaries
    "job_name": "Reinforcement learning demo",                # set an arbitrary job name for identification
    "job_id": "demo"                       # only relevant if "recipient" is set to a specific REST endpoint
}

Create the `"parameters"` field:

In [17]:
# add the "parameters" block
configuration["parameters"] = {}

### C) Set Diversity Filter
During each Reinforcement Learning step, the compounds scoring above the `minscore` threshold are kept in memory. The scored SMILES are written to `scaffold_memory.csv` in the results folder. In this example we do not use any filter, which is achieved by setting the name to `"NoFilter"`. This leads to exploitation of the chemical space in the vicinity of the local optimum of the defined scoring function. The scoring function will likely reach a higher overall score sooner than in the exploration scenario.

For exploratory behavior, set the diversity filter below to one of the listed alternatives: `"IdenticalTopologicalScaffold"`, `"IdenticalMurckoScaffold"` or `"ScaffoldSimilarity"`. This will boost the diversity of the generated solutions. The maximum value of the scoring function will be lower in exploration mode, because the Agent is encouraged to search for diverse solutions rather than only to optimize the best ones found so far. The number of generated compounds should also be higher than in the exploitation scenario.
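To make the `nbmax` / `minscore` logic concrete, here is a toy sketch of a scaffold-bucket filter. It is not `REINVENT`'s implementation: the class `ScaffoldBucketFilter` is hypothetical, the `scaffold_key` function stands in for a real scaffold calculation (for example a Murcko scaffold computed with RDKit), and the `minsimilarity` parameter used by `"ScaffoldSimilarity"` is left out. Compounds at or above `minscore` are binned by scaffold, and once a bin holds more than `nbmax` members, further compounds with that scaffold receive a score of zero.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ScaffoldBucketFilter:
    """Toy scaffold-bucket diversity filter (hypothetical, not REINVENT's code)."""

    def __init__(self, scaffold_key: Callable[[str], str],
                 nbmax: int = 25, minscore: float = 0.4):
        self.scaffold_key = scaffold_key  # maps a SMILES to a scaffold identifier
        self.nbmax = nbmax                # bin size; penalization starts once exceeded
        self.minscore = minscore          # minimum score for a compound to be binned
        self.buckets: Dict[str, List[str]] = defaultdict(list)

    def update_score(self, smiles: str, score: float) -> float:
        """Bin a scored compound and return its (possibly penalized) score."""
        if score < self.minscore:
            return score  # low scorers are neither stored nor penalized
        key = self.scaffold_key(smiles)
        self.buckets[key].append(smiles)
        # once a scaffold bin is over-full, further hits earn no reward
        return 0.0 if len(self.buckets[key]) > self.nbmax else score
```

With `"NoFilter"` none of this penalization happens, which is why the Agent is free to keep exploiting the same scaffolds.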

In [51]:
# add a "diversity_filter"
configuration["parameters"]["diversity_filter"] =  {
    "name": "NoFilter",                    # other options are: "IdenticalTopologicalScaffold", 
                                           # "IdenticalMurckoScaffold" and "ScaffoldSimilarity"
                                           # -> use "NoFilter" to disable this feature
    "nbmax": 25,                           # the bin size; penalization will start once this is exceeded
    "minscore": 0.4,                       # the minimum total score to be considered for binning
    "minsimilarity": 0.4                   # the minimum similarity to be placed into the same bin
}

### D) Set Inception
* `smiles`: a list of SMILES to be placed into the inception memory
* `memory_size`: the number of SMILES allowed in the inception memory
* `sample_size`: the number of SMILES sampled from the inception memory at each Reinforcement Learning step
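A minimal sketch of what an inception memory does, assuming it simply keeps the best-scoring SMILES seen so far (up to `memory_size`) and hands a random subset of `sample_size` back at each step. The `InceptionMemory` class and its methods are hypothetical illustrations, not `REINVENT`'s code; in this sketch, seed SMILES from the `smiles` field would be added through `add` together with their scores.

```python
import heapq
import random
from typing import Dict, List, Tuple


class InceptionMemory:
    """Toy inception memory (hypothetical sketch, not REINVENT's implementation)."""

    def __init__(self, memory_size: int = 100, sample_size: int = 10):
        self.memory_size = memory_size  # how many (score, SMILES) pairs to keep
        self.sample_size = sample_size  # how many pairs to hand back per step
        self.entries: List[Tuple[float, str]] = []

    def add(self, scored: List[Tuple[float, str]]) -> None:
        """Merge new (score, SMILES) pairs, keep each SMILES once, retain the best."""
        best: Dict[str, float] = {}
        for score, smiles in self.entries + scored:
            best[smiles] = max(score, best.get(smiles, float("-inf")))
        self.entries = heapq.nlargest(
            self.memory_size, ((score, smiles) for smiles, score in best.items())
        )

    def sample(self, rng: random.Random) -> List[Tuple[float, str]]:
        """Draw up to sample_size stored pairs without replacement."""
        k = min(self.sample_size, len(self.entries))
        return rng.sample(self.entries, k)
```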

In [52]:
# prepare the inception (we do not use it in this example, so "smiles" is an empty list)
configuration["parameters"]["inception"] = {
    "smiles": [],                          # fill in a list of SMILES here that can be used (or leave empty)
    "memory_size": 100,                    # sets how many molecules are to be remembered
    "sample_size": 10                      # how many are to be sampled each epoch from the memory
}

### E) Set the general Reinforcement Learning parameters
* `n_steps` is the number of Reinforcement Learning steps to perform. It is best to start with 1000 steps and see whether that is enough.
* `agent` is the generative model that is updated during the Reinforcement Learning run.

We recommend keeping the other parameters at their default values.

In [53]:
# set all "reinforcement learning"-specific run parameters
configuration["parameters"]["reinforcement_learning"] = {
    "prior": os.path.join(ipynb_path, "models/augmented.prior"), # path to the pre-trained model
    "agent": os.path.join(ipynb_path, "models/augmented.prior"), # the agent starts from the same pre-trained model
    "n_steps": 1000,                       # the number of epochs (steps) to be performed; often 1000
    "sigma": 128,                          # used to calculate the "augmented likelihood", see publication
    "learning_rate": 0.0001,               # sets how strongly the agent is influenced by each epoch
    "batch_size": 128,                     # specifies how many molecules are generated per epoch
    "reset": 0,                            # if not 0, reset the agent once the threshold is reached
                                           # to get more diverse solutions
    "reset_score_cutoff": 0.5,             # if resetting is enabled, this is the threshold
    "margin_threshold": 50                 # specify the (positive) margin between agent and prior
}
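The `sigma` parameter enters through the "augmented likelihood" described in the `REINVENT` publication: the prior's log-likelihood of a SMILES is shifted by `sigma` times its score, and the agent is trained to close the squared gap between that target and its own log-likelihood. The sketch below follows that formulation in plain Python; the function names are hypothetical, and the real code works on PyTorch tensors and also handles inception and `margin_threshold`, which are not modeled here. It is consistent with the step print-out further down, where rows with a score of 0.00 have `Target` equal to `Prior`.

```python
from typing import Sequence


def augmented_log_likelihood(prior_ll: float, score: float, sigma: float = 128.0) -> float:
    """Prior log-likelihood shifted by sigma * score (the 'Target' column)."""
    return prior_ll + sigma * score


def agent_loss(agent_lls: Sequence[float], prior_lls: Sequence[float],
               scores: Sequence[float], sigma: float = 128.0) -> float:
    """Mean squared gap between the augmented and the agent log-likelihoods."""
    losses = [
        (augmented_log_likelihood(prior, score, sigma) - agent) ** 2
        for agent, prior, score in zip(agent_lls, prior_lls, scores)
    ]
    return sum(losses) / len(losses)
```

At step 0 the agent and the prior are the same model (identical `Agent` and `Prior` columns in the print-out), so the loss is driven entirely by `sigma * score`.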

### F) Define the scoring function
We will use a `custom_product` type. The component types included are:
* `predictive_property`, which scores the predicted activity against _Aurora_ kinase using a `regression` model. Note that we give this component the highest weight.
* `qed_score`, the implementation of QED in RDKit. It biases the generation of molecules towards more "drug-like" space. Depending on the use case, this can have a beneficial or detrimental effect.
* `custom_alerts`, which penalizes molecules matching any of the listed substructures; its `"smiles"` field accepts both SMILES and SMARTS patterns.

Note: The model used in this example is a regression model
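One way to picture `custom_product` is as a weighted geometric mean of the component scores, with alert-type components such as `custom_alerts` acting as a multiplicative penalty. This is our reading of how the scoring function combines components, so treat the exact formula (and the handling of the alerts component's weight) as an assumption, and `custom_product` below as a hypothetical helper. The key property is that a score of zero in any component drives the total to zero, whereas a sum (`"custom_sum"`) would only lower it.

```python
from typing import Dict, Tuple


def custom_product(components: Dict[str, Tuple[float, float]],
                   penalty: float = 1.0) -> float:
    """Weighted geometric mean of component scores, times a penalty factor.

    components maps a component name to (score in [0, 1], weight).
    penalty stands in for alert-type components: 1.0 if no alert matched,
    0.0 if one did (an assumption of this sketch).
    """
    total_weight = sum(weight for _, weight in components.values())
    product = 1.0
    for score, weight in components.values():
        product *= score ** (weight / total_weight)
    return penalty * product
```

With the weights used in this notebook (6 for activity, 2 for QED), the activity score dominates the geometric mean.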


In [54]:
# prepare the scoring function definition and add at the end
scoring_function = {
    "name": "custom_product",              # this is our default one (alternative: "custom_sum")
    "parallel": False,                     # sets whether components are to be executed
                                           # in parallel; note, that python uses "False" / "True"
                                           # but the JSON "false" / "true"

    # the "parameters" list holds the individual components
    "parameters": [

    # add component: an activity model
    {
        "component_type": "predictive_property", # this is a scikit-learn model, returning
                                                 # activity values
        "name": "Aurora kinase",                 # arbitrary name for the component
        "weight": 6,                             # the weight ("importance") of the component (default: 1)
        "model_path": os.path.join(ipynb_path, "models/Aurora_model.pkl"),   # absolute model path
        "smiles": [],                            # list of SMILES (not required for this component)
        "specific_parameters": {
            "transformation_type": "sigmoid",  # map the raw prediction onto [0, 1] with a sigmoid
            "high": 9,                         # parameter for sigmoid transformation
            "low": 4,                          # parameter for sigmoid transformation
            "k": 0.25,                         # parameter for sigmoid transformation
            "scikit": "regression",            # model can be "regression" or "classification"
            "transformation": True,            # enable the transformation
            "descriptor_type": "ecfp_counts",  # sets the input descriptor for this model
            "size": 2048,                      # parameter of descriptor type
            "radius": 3,                       # parameter of descriptor type
            "use_counts": True,                # parameter of descriptor type
            "use_features": True               # parameter of descriptor type
        }
    },

    # add component: QED
    {
        "component_type": "qed_score", # this is the QED score as implemented in RDKit
        "name": "QED",        # arbitrary name for the component
        "weight": 2,            # the weight ("importance") of the component (default: 1)
        "model_path": None,
        "smiles":  None                         
    },

    # add component: enforce to NOT match a given substructure
    {
        "component_type": "custom_alerts",
        "name": "Custom alerts",               # arbitrary name for the component
        "weight": 1,                           # the weight of the component (default: 1)
        "model_path": None,                    # not required; note, this is "null" in JSON
        "smiles": [                            # specify the substructures (as list) to penalize
            "[*;r8]",
            "[*;r9]",
            "[*;r10]",
            "[*;r11]",
            "[*;r12]",
            "[*;r13]",
            "[*;r14]",
            "[*;r15]",
            "[*;r16]",
            "[*;r17]",
            "[#8][#8]",
            "[#6;+]",
            "[#16][#16]",
            "[#7;!n][S;!$(S(=O)=O)]",
            "[#7;!n][#7;!n]",
            "C#C",
            "C(=[O,S])[O,S]",
            "[#7;!n][C;!$(C(=[O,N])[N,O])][#16;!s]",
            "[#7;!n][C;!$(C(=[O,N])[N,O])][#7;!n]",
            "[#7;!n][C;!$(C(=[O,N])[N,O])][#8;!o]",
            "[#8;!o][C;!$(C(=[O,N])[N,O])][#16;!s]",
            "[#8;!o][C;!$(C(=[O,N])[N,O])][#8;!o]",
            "[#16;!s][C;!$(C(=[O,N])[N,O])][#16;!s]"
        ],
        "specific_parameters": None            # not required; note, this is "null" in JSON
    }]
}
configuration["parameters"]["scoring_function"] = scoring_function
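The `sigmoid` transformation in the activity component maps the regression model's raw prediction onto [0, 1], with `low` and `high` bracketing the range of interest and `k` controlling the steepness. The function below is a sketch of the form we understand `REINVENT` 2.0 to use; it is not imported from `REINVENT` (the name `sigmoid_transform` is hypothetical), so check the source before relying on exact values. With `low=4`, `high=9` and `k=0.25` it returns 0.5 at the midpoint 6.5, roughly 0.95 at 9 and roughly 0.05 at 4.

```python
def sigmoid_transform(x: float, low: float = 4.0, high: float = 9.0, k: float = 0.25) -> float:
    """Map a raw model prediction onto [0, 1]; returns 0.5 halfway between low and high."""
    midpoint = (low + high) / 2.0
    # low - high is negative, so the exponent falls as x rises and the output increases
    exponent = 10.0 * k * (x - midpoint) / (low - high)
    return 1.0 / (1.0 + 10.0 ** exponent)
```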

#### NOTE: Getting the _Aurora_ kinase activity component to reach satisfactory levels is non-trivial and might take a considerably higher number of steps

## 3. Write out the configuration

We have now successfully filled the dictionary and will write it out as a `JSON` file in the output directory. Please have a look at the file before proceeding to see how the paths have been inserted where required and how the `dict` -> `JSON` translations (e.g. `True` to `true`, `None` to `null`) have taken place.
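The `dict` -> `JSON` translation mentioned above can be checked on a few keys taken from this configuration: `json.dumps` turns Python's `False`, `True` and `None` into `false`, `true` and `null`, and `json.loads` turns them back.

```python
import json

# a few keys taken from the configuration above
snippet = {"parallel": False, "transformation": True, "model_path": None}
print(json.dumps(snippet, sort_keys=True))
# → {"model_path": null, "parallel": false, "transformation": true}
```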

In [55]:
# write the configuration file to disk
configuration_JSON_path = os.path.join(output_dir, "RL_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4, sort_keys=True)

## 4. Run `REINVENT`
Now it is time to execute `REINVENT` locally. Note that, depending on the number of epochs (steps) and the execution time of the scoring function components, this might take a while.

The command-line execution looks like this:
```
# activate environment
conda activate reinvent_shared.v2.1

# execute REINVENT
python <your_path>/input.py <config>.json
```
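The notebook defines `reinvent_env` but never uses it, and the `!python` call below runs whichever `python` the shell finds first on the `PATH`. If that is not the `REINVENT` environment, one option is to call the environment's interpreter explicitly through `subprocess`. This is a sketch: `build_reinvent_command` and `run_reinvent` are hypothetical helpers, and the `<env>/bin/python` location assumes a Linux or macOS conda layout (on Windows the interpreter is `python.exe` in the environment root).

```python
import os
import subprocess
from typing import List


def build_reinvent_command(reinvent_env: str, reinvent_dir: str,
                           config_path: str) -> List[str]:
    """Assemble the REINVENT call; assumes a Linux/macOS conda layout (<env>/bin/python)."""
    python_exe = os.path.join(reinvent_env, "bin", "python")
    return [python_exe, os.path.join(reinvent_dir, "input.py"), config_path]


def run_reinvent(cmd: List[str], log_path: str) -> int:
    """Run the command, merge stderr into stdout, save the text to log_path."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    with open(log_path, "w") as fh:
        fh.write(result.stdout)
    return result.returncode
```

Inside the notebook this would be called as `run_reinvent(build_reinvent_command(reinvent_env, reinvent_dir, configuration_JSON_path), os.path.join(output_dir, "run.err"))`, which returns the exit code of the run.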

In [56]:
%%capture captured_err_stream --no-stderr

# execute REINVENT from the command-line
!python {reinvent_dir}/input.py {configuration_JSON_path}

In [31]:
# write the output to a file, just to have it for documentation
with open(os.path.join(output_dir, "run.err"), 'w') as file:
    file.write(captured_err_stream.stdout)

# prepare the output to be parsed
list_epochs = re.findall(r'INFO.*?local', captured_err_stream.stdout, re.DOTALL)
data = [epoch for idx, epoch in enumerate(list_epochs) if idx in [1, 75, 124]]
data = ["\n".join(element.splitlines()[:-1]) for element in data]

Below you see the print-out of the first epoch, one from the middle and the last epoch, respectively. Note that the fraction of valid `SMILES` is high right from the start (because we use a pre-trained prior). You can see the partial scores for each component for the first couple of compounds, but the most important information is the average score. You can clearly see how it increases over time.

In [59]:
for element in data:
    print(element)

INFO     
 Step 0   Fraction valid SMILES: 99.2   Score: 0.2306   Time elapsed: 0   Time left: 0.0
  Agent     Prior     Target     Score     SMILES
-19.51    -19.51     21.92      0.32      n1cnc(N2CCN(C)CC2)c2c(-c3ccccc3)c(-c3ccccc3)oc12
-53.61    -53.61    -53.61      0.00      c1c(-c2ccccc2)c(C)cc(CCC2COC3C(NC(C(C)NC)=O)(OC)COCC3(O)C2)c1
-32.90    -32.90     15.19      0.38      c1cc(C(Nc2cc(N=c3[nH]ccc(-c4ccnc(-c5cccnc5)c4)n3)ccc2)=O)ccc1NC(=O)C=C
-18.69    -18.69     28.08      0.37      OC1C(O)C(n2c3c(nc2)c(=Nc2ccccc2)[nH]cn3)OC1CO
-24.39    -24.39     16.43      0.32      O=C(c1c(C(=O)NCCC)nc[nH]1)N=c1c(C)ccc[nH]1
-34.69    -34.69     10.84      0.36      C1CC(CNC(=O)C(N)Cc2ccc(OC)cc2)CCN1C(=O)Cn1cccn1
-21.42    -21.42    -21.42      0.00      c1(=Cc2[nH]c(=O)[nH]c2O)c2nc(Nc3cccc(C#C)c3)cc(=NC3CC3)n2nc1
-23.29    -23.29     13.14      0.28      n1(-c2ccc([N+](=O)[O-])cc2)c(O)c(C2=c3ccccc3=NC2=O)c2ccccc12
-28.67    -28.67    -28.67      0.00      c1cccc2c1CC1(C2=O)OC(c2ccccc2)(c
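Rather than picking three epochs by index, the average score of every logged step can be pulled out of the captured text with a regular expression. The pattern is modeled on the `Step ... Fraction valid SMILES ... Score ...` line in the print-out above and may need adjusting for other `REINVENT` versions; `parse_step_scores` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# modeled on the per-step header line in the print-out above
STEP_PATTERN = re.compile(
    r"Step\s+(\d+)\s+Fraction valid SMILES:\s+([\d.]+)\s+Score:\s+([\d.]+)"
)


def parse_step_scores(log_text: str) -> List[Tuple[int, float, float]]:
    """Return (step, % valid SMILES, average score) for every logged step."""
    return [
        (int(step), float(valid), float(score))
        for step, valid, score in STEP_PATTERN.findall(log_text)
    ]
```

Calling `parse_step_scores(captured_err_stream.stdout)` gives one tuple per logged step; plotting the third element against the first shows the same trend as the average-score plot in `tensorboard`.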

## 5. Analyse the results
In order to analyze the run in a more intuitive way, we can use `tensorboard`:

```
# go to the root folder of the output
cd <your_path>/REINVENT_RL_demo

# make sure, you have activated the proper environment
conda activate reinvent_shared.v2.1

# start tensorboard
tensorboard --logdir progress.log
```

Then copy the link provided into a browser window, e.g. "http://workstation.url.com:6006/". The following figures are example plots; remember that there is always some randomness involved. In `tensorboard` you can monitor the individual scoring function components.

The score for predicted _Aurora_ kinase activity.

![](img/exploit_aurora_kinase.png)

The average score over time.

![](img/exploit_avg_score.png)

It might also be informative to look at the results from the prior (dark blue), the agent (blue) and the augmented likelihood (purple) over time.

![](img/nll_plot.png)

And last but not least, there is an "Images" tab that lets you browse through the generated compounds easily. In the molecules, the substructure matches that were defined to be required are highlighted in red (if present). The total score is also given for each molecule.

![](img/molecules.png)

The results folder will hold four different files: the agent (pickled), the input JSON (just for reference purposes), the memory (highest scoring compounds in `CSV` format) and the scaffold memory (in `CSV` format).
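To work with the memory file in Python rather than through `head`, the standard-library `csv` module is enough. The sketch assumes the column layout shown below (an unnamed index column followed by `smiles`, `score` and `likelihood`); `top_compounds` is a hypothetical helper that accepts any iterable of lines, so it works directly on an open file handle.

```python
import csv
from typing import Iterable, List, Tuple


def top_compounds(lines: Iterable[str], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n highest-scoring (SMILES, score) pairs from memory.csv lines."""
    rows = [(row["smiles"], float(row["score"])) for row in csv.DictReader(lines)]
    # sort explicitly instead of relying on the order in the file
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

Inside the notebook, pass it a handle opened on `os.path.join(output_dir, "results", "memory.csv")` with `newline=""`, as the `csv` documentation recommends.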

In [13]:
!head -n 15 {output_dir}/results/memory.csv

,smiles,score,likelihood
65,C(CCCn1cc(C(C)(C)C)c2c(C(C)C)cc(C(C)C)cc2c1=O)C(=O)N=c1nc[nH][nH]1,0.3286117,-50.641468
70,C1C(N(CCC)CCC)Cc2cccc3[nH]c(=O)n(c32)C1,0.32649106,-18.146914
26,O1c2c(nc(OC)cc2)C(C(NCCCN(C)C)=O)(Cc2ccccc2)c2ccccc21,0.32437962,-35.405247
60,c1c(C(CNCCc2ccc(NS(=O)(c3ccc(-c4oc(Cc5c[nH]c(=N)s5)cc4)nc3)=O)cc2)O)c[nH]c(=N)c1,0.32314676,-38.32259
99,c1c2c(cc(Cl)c1Cl)C(CC(=O)c1cnn(C)c1)(O)C(=O)N2,0.31027606,-27.762121
11,c1c(O)c(C(Cc2ccc(Cl)cc2)=O)cc(O)c1Oc2c(O)cc(O)cc2CCC(O)c1cc(O)c(OC)cc1,0.30576745,-52.903526
32,c1(C(NC(Cc2ccccc2)C(C(NCCN2CCOCC2)=O)=O)=O)cc(C(=O)NS(Cc2ccccc2)(=O)=O)c(NCCC)s1,0.30178678,-43.933296
1,c1c(C(C)C)ccc(NC(c2cc3c(cc2)[nH]c2c(C(N)=O)ccc(O)c23)=O)c1,0.30052438,-31.108843
108,c1(C(C(F)(F)F)(F)F)cc(Cn2c3cccc(NC(c4n5ccc(OCCN6CCN(C)CC6)cc5nc4)=O)c3c(CC)n2)ccc1,0.29700187,-34.311478
118,c1ccc(C(COc2ccc3c(occ(Oc4ccccc4)c3=O)c2)(O)C(N2CCCCC2)C)cc1F,0.29602197,-45.389744
109,C1CN(CC(CNC(c2ccc3n(c(=O)cc(C)n3)c2)=O)O)CCC1Cc1ccccc1,0.29525602,-