> **How to run this notebook from the command line**
1. Install the `ReinventCommunity` environment:
`conda env create -f environment.yml`
2. Activate the environment:
`conda activate ReinventCommunity`
3. Execute `jupyter`:
`jupyter notebook`
4. Copy the link to a browser


# `REINVENT 3.0`: reinforcement learning with DockStream (docking)


This is a simple example of running `REINVENT` with a single scoring component (`DockStream`). To execute this notebook, make sure you have cloned the `DockStream` repository from GitHub and installed its conda environment.

**NOTE: The `Reinforcement Learning Demo` notebook explains the reasoning behind each `REINVENT` code block in detail.**


## 1. Set up the paths
_Please update the following code block such that it reflects your system's installation and execute it._

In [1]:
# load dependencies
import os
import re
import json
import tempfile

# --------- change these path variables as required
reinvent_dir = os.path.expanduser("~/Desktop/Reinvent")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent.v3.0")

# DockStream variables
dockstream_dir = os.path.expanduser("~/Desktop/ProjectData/DockStream")
dockstream_env = os.path.expanduser("~/miniconda3/envs/DockStream/bin/python")
# generate the path to the DockStream entry points
docker_path = os.path.join(dockstream_dir, "docker.py")

output_dir = os.path.expanduser("~/Desktop/REINVENT_RL_DockStream_demo")

# --------- do not change
# get the notebook's root path
try: ipynb_path
except NameError: ipynb_path = os.getcwd()

# if required, generate the folder to store the results
os.makedirs(output_dir, exist_ok=True)

# Glide docking variables
grid_file_path = os.path.expanduser("~/Desktop/ReinventCommunity/notebooks/data/DockStream/1UYD_grid.zip")
output_ligands_docked_poses_path = os.path.expanduser("~/Desktop/REINVENT_RL_DockStream_demo/docked_poses")
output_ligands_docking_scores_path = os.path.expanduser("~/Desktop/REINVENT_RL_DockStream_demo/docking_scores")

# if required, generate the folders for the docked poses and the docking scores
os.makedirs(output_ligands_docked_poses_path, exist_ok=True)
os.makedirs(output_ligands_docking_scores_path, exist_ok=True)

docking_configuration_path = os.path.join(output_dir, "Glide_DockStream_Conf.json")
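
Before moving on, it can help to confirm that the paths set above actually exist on your system. The following is a minimal sketch: `missing_paths` is a hypothetical helper written for this notebook (it is not part of `REINVENT` or `DockStream`), and the demonstration uses throwaway paths so that it runs anywhere.

```python
import os

def missing_paths(named_paths):
    """Return the names whose paths do not exist on disk (hypothetical helper)."""
    return [name for name, path in named_paths.items() if not os.path.exists(path)]

# demonstration with one existing and one non-existing path
demo = {
    "current_dir": os.getcwd(),
    "bogus": os.path.join(os.getcwd(), "this_path_should_not_exist_12345"),
}
print(missing_paths(demo))
```

In the notebook itself, you could pass a dictionary such as `{"reinvent_dir": reinvent_dir, "dockstream_dir": dockstream_dir, "docker_path": docker_path, "grid_file_path": grid_file_path}` and fix any name that is reported.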

## 2. Set up the `DockStream` Configuration
_Please update the following code block such that it reflects your system's installation and execute it._

In this notebook, we will demonstrate how to use `DockStream` with `REINVENT`. `Glide` with `LigPrep` will be used as the molecular docking component. For more details regarding using `Glide` in `DockStream`, see the `demo_Glide` notebook in the `DockStreamCommunity` repository. There, all details and supported functionalities are presented. The `Glide` with `LigPrep` configuration used in this notebook is the simplest case.

In [2]:
# specify the embedding and docking JSON file as a dictionary and write it out
ed_dict = {
  "docking": {
    "header": {                                   # general settings
      "environment": {
      }
    },
    "ligand_preparation": {                       # the ligand preparation part, defines how to build the pool
      "embedding_pools": [
        {
          "pool_id": "Ligprep_pool",
          "type": "Ligprep",
          "parameters": {
            "prefix_execution": "module load schrodinger/2019-4",
            "parallelization": {
                "number_cores": 2
            },
            "use_epik": {
              "target_pH": 7.0,
              "pH_tolerance": 2.0
            },
            "force_field": "OPLS3e"
          },
          "input": {
            "standardize_smiles": False,
            "type": "console"                     # input type "console" when using DockStream with REINVENT
          }
        }
      ]
    },
    "docking_runs": [
        {
          "backend": "Glide",
          "run_id": "Glide_run",
           "input_pools": ["Ligprep_pool"],
          "parameters": {
              "prefix_execution": "module load schrodinger/2019-4", # will be executed before a program call
              "parallelization": {                                  # if present, the number of cores to be used
                                                                    # can be specified
            "number_cores": 2
          },
          "glide_flags": {                                  # add all command-line flags for Glide here
            "-HOST": "localhost"
          },
          "glide_keywords": {                               # add all keywords for the "input.in" file here
                                                            # these are the minimum keywords that need to be
                                                            # specified; they represent a simple `Glide`
                                                            # docking configuration

            "GRIDFILE": grid_file_path,
            "POSE_OUTTYPE": "ligandlib_sd",
            "PRECISION": "HTVS"
          }
        },
        "output": {
          "poses": { "poses_path": os.path.join(output_ligands_docked_poses_path, "docked_poses.sdf")},
          "scores": { "scores_path": os.path.join(output_ligands_docking_scores_path, "docking_scores.csv")}
        }
      }]}}

with open(docking_configuration_path, 'w') as f:
    json.dump(ed_dict, f, indent=2)
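
Each docking run refers to its ligands through `input_pools`, so a mistyped `pool_id` would only surface once docking starts. The sketch below checks this cross-reference up front. `undefined_pools` is a hypothetical helper (not a `DockStream` function), and `toy_conf` is a stripped-down stand-in that contains only the keys the check needs.

```python
def undefined_pools(conf):
    """Return input pool ids used by docking runs that no embedding pool defines."""
    docking = conf["docking"]
    defined = {pool["pool_id"] for pool in docking["ligand_preparation"]["embedding_pools"]}
    used = {pid for run in docking["docking_runs"] for pid in run["input_pools"]}
    return sorted(used - defined)

# stripped-down stand-in for the configuration dictionary (illustration only)
toy_conf = {
    "docking": {
        "ligand_preparation": {"embedding_pools": [{"pool_id": "Ligprep_pool"}]},
        "docking_runs": [{"run_id": "Glide_run", "input_pools": ["Ligprep_pool"]}],
    }
}
print(undefined_pools(toy_conf))  # → []
```

Calling `undefined_pools(ed_dict)` in the notebook should likewise return an empty list.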

## 3. Set up the `REINVENT` configuration 
In the cells below we will build a nested dictionary that is eventually written out as a JSON file, which in turn is consumed by `REINVENT`.
You can find this file in your `output_dir` location.

### A) Declare the run type

In [3]:
# initialize the dictionary
configuration = {
    "version": 3,                          # we are going to use REINVENT's newest release
    "run_type": "reinforcement_learning"   # other run types: "sampling", "validation",
                                           #                  "transfer_learning",
                                           #                  "scoring" and "create_model"
}

### B) Sort out the logging details
This includes `result_folder` path where the results will be produced.

Also: `REINVENT` can send custom log messages to a remote location. We have retained this capability in the code. If the `recipient` value differs from `"local"`, `REINVENT` will attempt to POST the data to the specified `recipient`.

In [4]:
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://0.0.0.1",            # only relevant if "recipient" is set to "remote"
    "recipient": "local",                  # either local logging or a remote REST-interface
    "logging_frequency": 1,                # log every x-th step
    "logging_path": os.path.join(output_dir, "progress.log"), # load this folder in tensorboard
    "result_folder": os.path.join(output_dir, "results"),         # will hold the compounds (SMILES) and summaries
    "job_name": "Reinforcement learning DockStream demo",         # set an arbitrary job name for identification
    "job_id": "demo"                       # only relevant if "recipient" is set to a specific REST endpoint
}

Create `parameters` field:

In [5]:
# add the "parameters" block
configuration["parameters"] = {}

### C) Set Diversity Filter
During each Reinforcement Learning step, compounds that score above the `minscore` threshold are kept in memory. These scored SMILES are written to the file `scaffold_memory.csv` in the results folder.

In [6]:
# add a "diversity_filter"
configuration["parameters"]["diversity_filter"] =  {
    "name": "IdenticalMurckoScaffold",     # other options are: "IdenticalTopologicalScaffold", 
                                           #                    "NoFilter" and "ScaffoldSimilarity"
                                           # -> use "NoFilter" to disable this feature
    "nbmax": 25,                           # the bin size; penalization will start once this is exceeded
    "minscore": 0.4,                       # the minimum total score to be considered for binning
    "minsimilarity": 0.4                   # the minimum similarity to be placed into the same bin
}
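
To make the roles of `nbmax` and `minscore` concrete, here is a deliberately simplified sketch of scaffold binning. It is not `REINVENT`'s implementation: the class `ToyScaffoldMemory` is hypothetical, scaffolds are passed in as plain strings rather than computed from SMILES, and the penalty is assumed to set the score to zero once a bin is full.

```python
from collections import defaultdict

class ToyScaffoldMemory:
    """Toy illustration of score-thresholded scaffold bins (assumed behavior)."""

    def __init__(self, nbmax=25, minscore=0.4):
        self.nbmax = nbmax
        self.minscore = minscore
        self.bins = defaultdict(list)  # scaffold -> list of (smiles, score)

    def update(self, scaffold, smiles, score):
        """Record the compound if eligible and return the possibly penalized score."""
        if score < self.minscore:
            return score  # below the threshold: neither binned nor penalized
        if len(self.bins[scaffold]) >= self.nbmax:
            return 0.0    # bin is full: assumed penalty
        self.bins[scaffold].append((smiles, score))
        return score

memory = ToyScaffoldMemory(nbmax=2, minscore=0.4)
scores = [memory.update("scaffold_A", f"mol_{i}", 0.8) for i in range(3)]
print(scores)  # → [0.8, 0.8, 0.0]
```

The third compound sharing `scaffold_A` exceeds the bin size of 2 and is penalized, which is what pushes the agent towards new scaffolds.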

### D) Set Inception
* `smiles`: a list of SMILES to be incepted
* `memory_size`: the number of SMILES allowed in the inception memory
* `sample_size`: the number of SMILES that can be sampled from the inception memory at each reinforcement learning step

In [7]:
# prepare the inception (we do not use it in this example, so "smiles" is an empty list)
configuration["parameters"]["inception"] = {
    "smiles": [],                          # fill in a list of SMILES here that can be used (or leave empty)
    "memory_size": 100,                    # sets how many molecules are to be remembered
    "sample_size": 10                      # how many are to be sampled each epoch from the memory
}
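
The interplay of `memory_size` and `sample_size` can be sketched with a small toy class. This is an illustration under stated assumptions, not `REINVENT`'s code: `ToyInceptionMemory` is hypothetical, it is assumed that the memory retains the highest-scoring compounds, and sampling is assumed to be uniform.

```python
import random

class ToyInceptionMemory:
    """Toy bounded memory that keeps the best-scoring SMILES (assumed policy)."""

    def __init__(self, memory_size=100, sample_size=10, seed=0):
        self.memory_size = memory_size
        self.sample_size = sample_size
        self.entries = []               # list of (score, smiles), best first
        self.rng = random.Random(seed)  # seeded for reproducibility

    def add(self, smiles, score):
        self.entries.append((score, smiles))
        self.entries.sort(reverse=True)
        del self.entries[self.memory_size:]  # drop everything beyond the limit

    def sample(self):
        k = min(self.sample_size, len(self.entries))
        return self.rng.sample(self.entries, k)

memory = ToyInceptionMemory(memory_size=3, sample_size=2)
for smiles, score in [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.7), ("e", 0.3)]:
    memory.add(smiles, score)
print([score for score, _ in memory.entries])  # → [0.9, 0.7, 0.5]
```

Each call to `memory.sample()` then returns at most `sample_size` of the retained entries.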

### E) Set the general Reinforcement Learning parameters
* `n_steps` is the number of Reinforcement Learning steps to perform. Start with 1000 steps and check whether that is enough.
* `agent` is the generative model that undergoes transformation during the Reinforcement Learning run.

We recommend keeping the other parameters at their default values.

In [8]:
# set all "reinforcement learning"-specific run parameters
configuration["parameters"]["reinforcement_learning"] = {
    "prior": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "agent": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "n_steps": 2,                          # the number of epochs (steps) to be performed; often 1000
                                           # (set to 2 in this notebook to decrease docking computation time -
                                           # it is not expected that the agent will appreciably learn to
                                           # generate compounds with good docking scores in only 2 epochs.
                                           # The purpose of this notebook is to illustrate how DockStream 
                                           # can be specified as a component to the `Scoring Function`)
    
    "sigma": 128,                          # used to calculate the "augmented likelihood", see publication
    "learning_rate": 0.0001,               # sets how strongly the agent is influenced by each epoch
    "batch_size": 32,                      # specifies how many molecules are generated per epoch, often 128
                                           # (docking becomes more computationally demanding the greater the
                                           # batch size, as each compound must be docked. Depending on the
                                           # docking configuration, embedding ligands may generate different
                                           # tautomers, ionization states, etc., which will increase the number
                                           # of compounds that need to be docked. Batch size is set to 32 in
                                           # this notebook to decrease docking computation time)

    "reset": 0,                            # if not 0, reset the agent once the threshold is reached to get
                                           # more diverse solutions
    "reset_score_cutoff": 0.5,             # if resetting is enabled, this is the threshold
    "margin_threshold": 50                 # specify the (positive) margin between agent and prior
}
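
The role of `sigma` is easiest to see in the "augmented likelihood" described in the `REINVENT` publication: the prior's log-likelihood of a compound plus `sigma` times its score. The agent is then trained to minimize the squared difference between this target and its own log-likelihood. The sketch below spells this out in plain Python; the function names are chosen for illustration, and the numbers are taken from the second compound of the sample output in section 5.

```python
def augmented_likelihood(prior_likelihood, score, sigma=128):
    """Target log-likelihood: prior log-likelihood shifted by sigma * score."""
    return prior_likelihood + sigma * score

def rl_loss(agent_likelihood, prior_likelihood, score, sigma=128):
    """Squared difference between the augmented target and the agent."""
    return (augmented_likelihood(prior_likelihood, score, sigma) - agent_likelihood) ** 2

# second row of the sample output: Agent -23.84, Prior -23.84, Score 0.21
target = augmented_likelihood(-23.84, 0.21)
print(round(target, 2))  # → 3.04
```

The result matches the `Target` column of that row. A larger `sigma` gives the score more leverage over the prior, so the agent is pulled more strongly towards high-scoring compounds.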

### F) Define the scoring function
The scoring function will consist only of the `DockStream` component, in which `Glide` with `LigPrep` is used for molecular docking.

In [9]:
# prepare the scoring function definition and add at the end
scoring_function = {
    "name": "custom_product",                  # this is our default one (alternative: "custom_sum")
    "parallel": False,                         # sets whether components are to be executed
                                               # in parallel; note, that python uses "False" / "True"
                                               # but the JSON "false" / "true"

    # the "parameters" list holds the individual components
    "parameters": [

    # add component: DockStream
    {
    "component_type": "dockstream",                           # use DockStream as a Scoring Function component      
    "name": "Glide LigPrep Docking",                          # arbitrary name
    "weight": 1,
    "specific_parameters": {
        "transformation": {
            "transformation_type": "reverse_sigmoid",         # lower Glide scores are better - use reverse
                                                              # sigmoid transformation
            "low": -11,
            "high": -5,
            "k": 0.25
            },
        "configuration_path": docking_configuration_path,
        "docker_script_path": docker_path,
        "environment_path": dockstream_env
        }
    }]
}
configuration["parameters"]["scoring_function"] = scoring_function
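
The `reverse_sigmoid` transformation maps raw docking scores (lower is better) onto the interval from 0 to 1 (higher is better). The sketch below uses a base-10 sigmoid centered halfway between `low` and `high`. This functional form is an assumption about how the transformation is parameterized, so consult `REINVENT`'s scoring code for the exact definition.

```python
def reverse_sigmoid(value, low=-11, high=-5, k=0.25):
    """Assumed reverse sigmoid: about 1 well below `low`, about 0 well above `high`."""
    midpoint = (high + low) / 2
    return 1 / (1 + 10 ** (k * (value - midpoint) * 10 / (high - low)))

for docking_score in (-11, -8, -5):
    print(docking_score, round(reverse_sigmoid(docking_score), 3))
# → -11 0.947
# → -8 0.5
# → -5 0.053
```

With the values used in this notebook, a docking score of -8 sits at the midpoint and yields 0.5, while `k` controls how steeply the score falls off around it.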

## 4. Write out the `REINVENT` configuration

We now have successfully filled the dictionary and will write it out as a `JSON` file in the output directory. Please have a look at the file before proceeding in order to see how the paths have been inserted where required and the `dict` -> `JSON` translations (e.g. `True` to `true`) have taken place.

In [10]:
# write the configuration file to the disk
configuration_JSON_path = os.path.join(output_dir, "RL_DockStream_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4, sort_keys=True)
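
The `dict` -> `JSON` translation mentioned above can be seen in isolation with a small, made-up dictionary (the keys mirror ones used in this notebook, but the snippet itself is only an illustration):

```python
import json

snippet = {"parallel": False, "weight": 1, "smiles": []}
text = json.dumps(snippet, sort_keys=True)
print(text)  # → {"parallel": false, "smiles": [], "weight": 1}

# reading the text back restores the original Python values
restored = json.loads(text)
print(restored == snippet)  # → True
```

Python's `False` becomes `false` in the file, and `sort_keys=True` orders the keys alphabetically, which is why the written configuration does not follow the order in which the blocks were added.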

## 5. Run `REINVENT`
Now it is time to execute `REINVENT` locally. Note that, depending on the number of epochs (steps) and the execution time of the scoring function components, this might take a while. Because we have specified only 2 epochs and a batch size of 32, the run stays short, although docking dominates the run time.

The command-line execution looks like this:
```
# activate environment
conda activate reinvent.v3.0

# execute REINVENT
python <your_path>/input.py <config>.json
```

In [11]:
%%capture captured_err_stream --no-stderr

# execute REINVENT from the command-line
!{reinvent_env}/bin/python {reinvent_dir}/input.py {configuration_JSON_path}

In [12]:
# print the output to a file, just to have it for documentation
with open(os.path.join(output_dir, "run.err"), 'w') as file:
    file.write(captured_err_stream.stdout)

# prepare the output to be parsed
list_epochs = re.findall(r'INFO.*?local', captured_err_stream.stdout, re.DOTALL)
# drop the trailing line of each captured epoch block
data = ["\n".join(epoch.splitlines()[:-1]) for epoch in list_epochs]

We have calculated a total of 2 epochs; let us quickly investigate how the agent fared in the first one, whose print-out is shown below. Running `REINVENT` with `DockStream` for more epochs will show that the agent gradually improves, i.e., it generates compounds that dock better. Note that the fraction of valid `SMILES` is high right from the start (because we use a pre-trained prior). You can see the partial scores for each component for the first couple of compounds, but the most important information is the average score, which will increase over time if the run is continued for more epochs.

In [13]:
for element in data:
    print(element)

INFO     starting an RL run
INFO     
 Step 0   Fraction valid SMILES: 100.0   Score: 0.1702   Time elapsed: 229   Time left: 458.0
  Agent     Prior     Target     Score     SMILES
-27.59    -27.59      2.99      0.24      C1(O)(c2cccc(Cl)c2)CC2N(Cc3c(C)noc3C)C(CC2)C1
-23.84    -23.84      3.04      0.21      C1(C)N(S(c2cccc(Cl)c2C)(=O)=O)CCNC1=O
-24.26    -24.26    -21.44      0.02      c1(C2C(C(OCC)=O)=C(C)N=C3CC(c4cc(OC)c(OC)cc4)CC(=O)C23)cscc1
-37.08    -37.08    -30.63      0.05      C1CN(CCCN(C2CC3N(C(c4cnc(Cl)cc4)c4ccccc4Cl)C(CC3)C2)c2ccccc2)CCO1
-24.37    -24.37      8.83      0.26      c1c2c(ccc1OC)-c1c(sc(=NCC3CCN(C(C)=O)CC3)[nH]1)CCC2
-25.28    -25.28     -1.84      0.18      c1(C(N)=O)ccc(C(N(Cc2cccs2)C)=O)cc1
-24.27    -24.27     34.28      0.46      c1(F)cc(C(c2cc(C(=O)N(CC)CC)[nH]c2)=O)cc(F)c1
-27.77    -27.77     44.67      0.57      N1Cc2c(-c3ccc(C)cc3)cccc2C1CC=C
-31.68    -31.68    -17.89      0.11      c1cnc(-c2cc3ncn(C4(C)CCN(C(C5CC5)=O)CC4)c3cn2)cc1
-22.01    -22
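
If you want to track the average score programmatically rather than by eye, the summary line of each step can be parsed with a regular expression. This is a minimal sketch that assumes the summary line keeps the layout shown in the print-out above; `parse_step_line` is a hypothetical helper, not part of `REINVENT`.

```python
import re

# matches the step summary line of the print-out (layout assumed from the output above)
STEP_PATTERN = re.compile(
    r"Step\s+(?P<step>\d+)\s+"
    r"Fraction valid SMILES:\s+(?P<valid>[\d.]+)\s+"
    r"Score:\s+(?P<score>[\d.]+)"
)

def parse_step_line(text):
    """Return step number, valid fraction and average score, or None if absent."""
    match = STEP_PATTERN.search(text)
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "valid": float(match["valid"]),
        "score": float(match["score"]),
    }

line = " Step 0   Fraction valid SMILES: 100.0   Score: 0.1702   Time elapsed: 229   Time left: 458.0"
print(parse_step_line(line))  # → {'step': 0, 'valid': 100.0, 'score': 0.1702}
```

Applying `parse_step_line` to every element of `data` gives one record per epoch, which is enough to plot the average score over time.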

## 6. Analyse the Results
In order to analyze the run in a more intuitive way, we can use `tensorboard`:

```
# go to the root folder of the output
cd <your_path>/REINVENT_RL_DockStream_demo

# make sure, you have activated the proper environment
conda activate reinvent.v3.0

# start tensorboard
tensorboard --logdir progress.log
```

Then copy the link provided to a browser window, e.g. "http://workstation.url.com:6006/".

The results folder will hold four different files: the agent (pickled), the input JSON (just for reference purposes), the memory (highest scoring compounds in `CSV` format) and the scaffold memory (in `CSV` format).

In [14]:
!head -n 15 {output_dir}/results/memory.csv

,smiles,score,likelihood
19,C(=NNC(N)=S)c1cc(OC2CCCC2)c(OC)cc1,0.6942421,-22.620014
7,N1Cc2c(-c3ccc(C)cc3)cccc2C1CC=C,0.56595427,-27.769112
28,c1c(Cn2c(=O)[nH]c3c(C)nc(C)n32)cccc1,0.5590625,-22.66359
16,c1cc(Cn2c(CCCCN)cnc2)cc(Cl)c1,0.51199514,-27.736107
3,C(C(N)=O)C(=O)N1CCN(c2ccc3c(c2)CN(C(=O)C)CC3)C=C1,0.4736048,-41.357193
6,c1(F)cc(C(c2cc(C(=O)N(CC)CC)[nH]c2)=O)cc(F)c1,0.45742637,-24.27091
18,c1(C2=NN(C(=O)C)C(c3ccccc3)C2)ccc(OC)cc1,0.43780145,-17.408478
15,O1CCN(Cc2ccc(NC(=O)CC(C)=O)cc2)CC1,0.39261344,-24.114302
17,c1(C=C2C(=O)N(C)C(=Nc3ccccc3)S2)ccc(N(CC)CC)c(Cl)c1,0.3528337,-28.79707
18,Clc1ccccc1-c1[nH]c2c(n1)C(=O)CCC2,0.3498513,-22.323753
24,Fc1c(C(C)=O)c(F)c(C(=O)c2ccc(=N)[nH]c2)c(Cl)c1,0.33398008,-24.691898
9,n1ccncc1N1CCN(C(c2cccc(OC)c2)=O)CC1,0.3304087,-22.013836
26,N(CC(=O)c1c(C(=O)c2ccccc2)ccc(C)c1)c1cc(Cc2ccccc2)ccc1,0.31439623,-43.123184
23,O=S1(=O)CCC(n2c(=O)c3sc(-c4cccnc4)cc3[nH]c2=O)C1,0.29765692,-25.620085
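
For anything beyond a quick look, the memory file is easier to handle in Python than with `head`. The sketch below filters compounds by score using only the standard library. `compounds_above` is a hypothetical helper, and `sample` holds three rows copied from the output above so that the example runs without the results folder.

```python
import csv
import io

# three rows copied from the memory.csv print-out above
sample = """,smiles,score,likelihood
19,C(=NNC(N)=S)c1cc(OC2CCCC2)c(OC)cc1,0.6942421,-22.620014
7,N1Cc2c(-c3ccc(C)cc3)cccc2C1CC=C,0.56595427,-27.769112
15,O1CCN(Cc2ccc(NC(=O)CC(C)=O)cc2)CC1,0.39261344,-24.114302
"""

def compounds_above(csv_text, threshold):
    """Return the SMILES of all rows whose score is at least `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["smiles"] for row in reader if float(row["score"]) >= threshold]

print(len(compounds_above(sample, 0.4)))  # → 2
```

To apply this to the real run, read the file first, e.g. `open(os.path.join(output_dir, "results", "memory.csv")).read()`, and pass the text to `compounds_above`.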
