> **How to run this notebook (command-line)?**
1. Install the `ReinventCommunity` environment:
`conda env create -f environment.yml`
2. Activate the environment:
`conda activate ReinventCommunity`
3. Execute `jupyter`:
`jupyter notebook`
4. Copy the link to a browser


# `REINVENT 3.2`: reinforcement learning with `Icolos` (docking)


This is a simple example of running `Reinvent` with only 1 score component (`Icolos`). To execute this notebook, make sure you have cloned the [Icolos](https://github.com/MolecularAI/Icolos) repository from GitHub and installed it into a `conda` environment (see instructions in the `README.md`).

There is another notebook illustrating integration of docking using `DockStream`, which can also be used for this task. However, more complex features such as ensemble docking are only supported by `Icolos`.

**NOTE: A detailed rationale for each `REINVENT` code block is provided in the `Reinforcement Learning Demo` notebook.**

**Required software:**
1. You need to have [AutoDock Vina](https://github.com/ccsb-scripps/AutoDock-Vina) (version 1.2.0 or later) installed.
2. You need to have [Icolos](https://github.com/MolecularAI/Icolos) (version 1.8.0 or later) installed.

## 1. Set up the paths
_Please update the following code block such that it reflects your system's installation and execute it._

In [1]:
# load dependencies
import os
import re
import json
import tempfile

# --------- change these path variables as required
reinvent_dir = os.path.expanduser("~/Desktop/Projects/reinvent/Reinvent")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent.v3.2")

# Icolos entry point, ADV receptor file and temporary folder
icolos_executor = os.path.expanduser("~/miniconda3/envs/icolosprod/bin/icolos")
tmp_dir = os.path.expanduser("~/Desktop/REINVENT_RL_Icolos_demo")

# --------- do not change
# get the notebook's root path
try: ipynb_path
except NameError: ipynb_path = os.getcwd()

receptor_file_path = os.path.join(ipynb_path, "data/1UYD_fixed.pdbqt")

# if required, generate the folders to store the results
output_dir = os.path.join(tmp_dir, "output")
os.makedirs(output_dir, exist_ok=True)   # also creates "tmp_dir" if it does not exist yet

docking_configuration_path = os.path.join(tmp_dir, "ADV_Icolos_Conf.json")

## 2. Set up the `Icolos` Configuration
_Please update the following code block such that it reflects your system's installation and execute it._

In this notebook, we will demonstrate how to use `Icolos` with `REINVENT`. `AutoDock Vina 1.2.0` with `RDKit` embedding will be used as the molecular docking component.

### Details
* An example receptor file (in `PDBQT` format, as expected by `AutoDock Vina`) is provided in the _data_ subfolder of this repository. Note that you need to specify the location and the dimensions of the search space according to your protein target. `Icolos` implements a _target preparation_ step that can assist you with that.
* Receptors / grids can be named using the parameter `grid_ids`. This makes it possible to track afterwards which receptor led to which score, e.g. when ensemble docking was used.
* `REINVENT` hands over the path to an input `JSON` file (so the input block in the `Icolos` workflow must use `{input_json_path}`, a _global variable_ that is set automatically, as input source). Likewise, `Icolos` needs to write the result(s) to the file specified by `{output_json_path}`. From the point of view of `Icolos`, there is nothing special about these global variables and they are handed over in the same manner as any other. For details on how to use and define _global variables_ in `Icolos`, please have a look at the respective material.
* `REINVENT` also sets a third global variable, `step_id`, which contains the current step / epoch number. This can be used to write out consecutive files, so that you can trace the batch of compounds a specific epoch has generated.
* `Icolos` supports an arbitrary number of write-outs (one for each step, if so desired). For usage in `REINVENT`, the write-out of the result `JSON` is mandatory, but in this example we also store the docking scores as a `CSV` file and the poses in an `SDF` file.
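
To make the role of the global variables more concrete, here is a minimal sketch of placeholder substitution. It is not the `Icolos` implementation: the helper `fill_placeholders` and the example paths are hypothetical and only illustrate how `{input_json_path}`, `{output_json_path}` and `{step_id}` could be resolved to per-epoch values.

```python
def fill_placeholders(template: str, variables: dict) -> str:
    """Replace every "{name}" in the template by its value (hypothetical helper)."""
    result = template
    for name, value in variables.items():
        result = result.replace("{" + name + "}", str(value))
    return result


# hypothetical values, roughly as they might be handed over for epoch 7
variables = {
    "input_json_path": "/tmp/epoch_7_input.json",
    "output_json_path": "/tmp/epoch_7_output.json",
    "step_id": 7,
}

poses_file = fill_placeholders("output/{step_id}_poses.sdf", variables)
print(poses_file)
```

Because the epoch number ends up in the file name, each epoch writes its own `SDF` and `CSV` file instead of overwriting the previous one.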

In [2]:
# specify the embedding and docking JSON file as a dictionary and write it out
adv_dict = {
    "workflow": {
        "header": {
            "workflow_id": "AutoDock Vina docking",
            "description": "Runs docking using AutoDock Vina and a predefined receptor file in PDBQT format.",
            "environment": {
                "export": [
                ]
            },
            "global_variables": {
            }
        },
        "steps": [{
                "step_id": "rdkit_embedding",
                "type": "embedding",
                "settings": {
                    "arguments": {
                        "flags": [],
                        "parameters": {
                            "protonate": True,
                            "method": "rdkit"
                        }
                    },
                    "additional": {
                    }
                },
                "input": {
                    "compounds": [{
                            "source": "{input_json_path}",          # path to JSON input file from REINVENT
                            "source_type": "file",
                            "format": "JSON"
                        }
                    ]
                }
            }, {
                "step_id": "ADV",
                "type": "vina_docking",
                "execution": {
                    "prefix_execution": "module load AutoDock_Vina",
                    "parallelization": {
                        "cores": 4
                    },
                    "failure_policy": {
                        "n_tries": 3
                    }
                },
                "settings": {
                    "arguments": {
                        "flags": [],
                        "parameters": {
                        }
                    },
                    "additional": {
                        "configuration": {
                            "seed": 42,
                            "receptor_path": receptor_file_path,    # file path to PDBQT file shipped in
                                                                    # this repository
                            "number_poses": 2,
                            "search_space": {
                                "--center_x": 3.3,                  # coordinates and size of the search space
                                                                    # see details above in the text
                                "--center_y": 11.5,
                                "--center_z": 24.8,
                                "--size_x": 15,
                                "--size_y": 10,
                                "--size_z": 10
                            }
                        },
                        "grid_ids": ["1UYD"]                        # allows to name the receptor / grid
                    }
                },
                "input": {
                    "compounds": [{
                            "source": "rdkit_embedding",
                            "source_type": "step"
                        }
                    ]
                },
                "writeout": [{
                        "compounds": {
                            # this will write out the result JSON REINVENT expects and parses
                            "category": "conformers",
                            "selected_tags": ["docking_score"],
                            "aggregation": {
                                "mode": "best_per_compound",
                                "key": "docking_score",
                                "highest_is_best": False
                            }
                        },
                        "destination": {
                            "resource": "{output_json_path}",
                            "type": "file",
                            "format": "JSON"
                        }
                    },
                    {
                        # (optional) writeout: all conformers for this epoch are stored in an SDF file
                        "compounds": {
                            "category": "conformers"
                        },
                        "destination": {
                            "resource": os.path.join(output_dir, "{step_id}_poses.sdf"),
                            "type": "file",
                            "format": "SDF"
                        }
                    },
                    {
                        # (optional) writeout: the scores (aggregated to show the best per compound
                        #                      only) are written out as a CSV (including the grid id)
                        "compounds": {
                            "category": "conformers",
                            "selected_tags": ["docking_score", "grid_id"],
                            "aggregation": {
                                "mode": "best_per_compound",
                                "key": "docking_score"
                            }
                        },
                        "destination": {
                            "resource": os.path.join(output_dir, "{step_id}_scores.csv"),
                            "type": "file",
                            "format": "CSV"
                        }
                    }
                ]
            }
        ]
    }
}

with open(docking_configuration_path, 'w') as f:
    json.dump(adv_dict, f, indent=2)
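
Before handing the workflow to `REINVENT`, it can be worth checking that every step which reads from another step refers to a `step_id` defined earlier. The following is a minimal sketch of such a check; the helper `find_unresolved_sources` is hypothetical (it is not part of `Icolos`) and the toy workflow only mimics the structure used above.

```python
def find_unresolved_sources(workflow: dict) -> list:
    """Return (step_id, source) pairs whose step-type source is not defined earlier."""
    seen_step_ids = set()
    problems = []
    for step in workflow["workflow"]["steps"]:
        for compound_input in step.get("input", {}).get("compounds", []):
            is_step_source = compound_input.get("source_type") == "step"
            if is_step_source and compound_input["source"] not in seen_step_ids:
                problems.append((step["step_id"], compound_input["source"]))
        seen_step_ids.add(step["step_id"])
    return problems


# toy workflow with the same nesting as the dictionary above
toy_workflow = {"workflow": {"steps": [
    {"step_id": "rdkit_embedding",
     "input": {"compounds": [{"source": "{input_json_path}", "source_type": "file"}]}},
    {"step_id": "ADV",
     "input": {"compounds": [{"source": "rdkit_embedding", "source_type": "step"}]}},
]}}

print(find_unresolved_sources(toy_workflow))
```

In this notebook, the same function could be applied to `adv_dict`; an empty list means all step references resolve.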

## 3. Set up the `REINVENT` configuration 
In the cells below we will build a nested dictionary object that will eventually be converted to a `JSON` file, which in turn will be consumed by `REINVENT`. You can find this file in your `tmp_dir` location. To sum up, we will have one `JSON` file steering the `REINVENT` execution, and `REINVENT` in turn will call `Icolos` as a component, which is orchestrated by its own `JSON` configuration file. In principle, you could have any number of `Icolos` components in a single `REINVENT` run.

### A) Declare the run type

In [3]:
# initialize the dictionary
configuration = {
    "version": 3,                          # we are going to use REINVENT's newest release
    "run_type": "reinforcement_learning",  # other run types: "sampling", "validation",
                                           #                  "transfer_learning",
                                           #                  "scoring" and "create_model"
    "model_type": "default"
}

### B) Sort out the logging details
This includes the `result_folder` path, where the results will be written.

Also: `REINVENT` can send custom log messages to a remote location. We have retained this capability in the code. If the `recipient` value differs from `"local"`, `REINVENT` will attempt to POST the data to the specified `recipient`.

In [4]:
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://0.0.0.1",            # only relevant if "recipient" is set to "remote"
    "recipient": "local",                  # either log locally ("local") or use a remote REST interface
    "logging_frequency": 1,                # log every x-th step
    "logging_path": os.path.join(tmp_dir, "progress.log"), # load this folder in tensorboard
    "result_folder": os.path.join(tmp_dir, "results"),         # will hold the compounds (SMILES) and summaries
    "job_name": "Reinforcement learning Icolos demo",         # set an arbitrary job name for identification
    "job_id": "demo"                       # only relevant if "recipient" is set to a specific REST endpoint
}

Create `parameters` field:

In [5]:
# add the "parameters" block
configuration["parameters"] = {}

### C) Set Diversity Filter
During each step of Reinforcement Learning, the compounds scoring above the `minscore` threshold are kept in memory. These scored SMILES are written to the file `scaffold_memory.csv` in the results folder.

In [6]:
# add a "diversity_filter"
configuration["parameters"]["diversity_filter"] =  {
    "name": "IdenticalMurckoScaffold",     # other options are: "IdenticalTopologicalScaffold", 
                                           #                    "NoFilter" and "ScaffoldSimilarity"
                                           # -> use "NoFilter" to disable this feature
    "nbmax": 25,                           # the bin size; penalization will start once this is exceeded
    "minscore": 0.4,                       # the minimum total score to be considered for binning
    "minsimilarity": 0.4                   # the minimum similarity to be placed into the same bin
}
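
The binning idea behind `nbmax` and `minscore` can be sketched in a few lines. This is a toy model, not the `REINVENT` implementation: the class `ToyScaffoldFilter` is hypothetical, scaffolds are passed in as plain strings instead of being computed from the molecule, and setting the score to 0 for a full bin is an assumption about how the penalty is applied.

```python
from collections import defaultdict


class ToyScaffoldFilter:
    """Toy scaffold memory: penalize compounds whose scaffold bin is already full."""

    def __init__(self, nbmax: int = 25, minscore: float = 0.4):
        self.nbmax = nbmax
        self.minscore = minscore
        self.bins = defaultdict(list)      # scaffold -> list of (smiles, score)

    def update(self, scaffold: str, smiles: str, score: float) -> float:
        if score < self.minscore:
            return score                   # too low to be binned at all
        if len(self.bins[scaffold]) >= self.nbmax:
            return 0.0                     # assumption: full bins lead to a score of 0
        self.bins[scaffold].append((smiles, score))
        return score


toy_filter = ToyScaffoldFilter(nbmax=2, minscore=0.4)
for smiles in ("mol_a", "mol_b", "mol_c"):
    # all three hypothetical molecules share the same (hypothetical) scaffold
    print(smiles, toy_filter.update("scaffold_1", smiles, 0.9))
```

The third molecule is penalized because its bin already holds `nbmax` entries, which nudges the agent towards other scaffolds.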

### D) Set Inception
* `smiles`: a list of SMILES to be incepted.
* `memory_size`: the number of SMILES allowed in the inception memory.
* `sample_size`: the number of SMILES that can be sampled from the inception memory at each reinforcement learning step.

In [7]:
# prepare the inception (we do not use it in this example, so "smiles" is an empty list)
configuration["parameters"]["inception"] = {
    "smiles": [],                          # fill in a list of SMILES here that can be used (or leave empty)
    "memory_size": 100,                    # sets how many molecules are to be remembered
    "sample_size": 10                      # how many are to be sampled each epoch from the memory
}
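
As an illustration of how `memory_size` and `sample_size` interact, consider the toy memory below. It is only a sketch under stated assumptions: the class `ToyInceptionMemory` is hypothetical, and keeping the highest-scoring entries with uniform random sampling is an assumption rather than a description of the `REINVENT` code.

```python
import random


class ToyInceptionMemory:
    """Toy inception memory: keep the best-scoring SMILES, sample a few per step."""

    def __init__(self, memory_size: int = 100, sample_size: int = 10, seed=None):
        self.memory_size = memory_size
        self.sample_size = sample_size
        self.entries = []                  # list of (score, smiles), best first
        self._rng = random.Random(seed)

    def add(self, smiles_with_scores):
        known = {smiles for _, smiles in self.entries}
        for smiles, score in smiles_with_scores:
            if smiles not in known:
                self.entries.append((score, smiles))
                known.add(smiles)
        # assumption: only the highest-scoring entries are remembered
        self.entries.sort(key=lambda entry: entry[0], reverse=True)
        del self.entries[self.memory_size:]

    def sample(self):
        n = min(self.sample_size, len(self.entries))
        return self._rng.sample(self.entries, n)


memory = ToyInceptionMemory(memory_size=2, sample_size=1, seed=0)
memory.add([("CCO", 0.2), ("c1ccccc1", 0.9), ("CCN", 0.5)])
print(memory.entries)
print(memory.sample())
```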

### E) Set the general Reinforcement Learning parameters
* `n_steps` is the number of Reinforcement Learning steps to perform. It is best to start with 1000 steps and see whether that is enough.
* `agent` is the generative model that undergoes transformation during the Reinforcement Learning run.

We recommend keeping the other parameters at their default values.

In [8]:
# set all "reinforcement learning"-specific run parameters
configuration["parameters"]["reinforcement_learning"] = {
    "prior": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "agent": os.path.join(ipynb_path, "models/random.prior.new"), # path to the pre-trained model
    "n_steps": 2,                          # the number of epochs (steps) to be performed; often 1000
                                           # (set to 2 in this notebook to decrease docking computation time -
                                           # it is not expected that the agent will appreciably learn to
                                           # generate compounds with good docking scores in only 2 epochs.
                                           # The purpose of this notebook is to illustrate how Icolos
                                           # can be specified as a component to the `Scoring Function`)
    
    "sigma": 128,                          # used to calculate the "augmented likelihood", see publication
    "learning_rate": 0.0001,               # sets how strongly the agent is influenced by each epoch
    "batch_size": 32,                      # specifies how many molecules are generated per epoch, often 128
                                           # (docking becomes more computationally demanding the greater the
                                           # batch size, as each compound must be docked. Depending on the
                                           # docking configuration, embedding ligands may generate different
                                           # tautomers, ionization states, etc., which will increase the number
                                           # of compounds that need to be docked. Batch size is set to 32 in
                                           # this notebook to decrease docking computation time)

    "reset": 0,                            # if not '0', then reset the agent if threshold reached to get
                                           # more diverse solutions
    "reset_score_cutoff": 0.5,             # if resetting is enabled, this is the threshold
    "margin_threshold": 50                 # specify the (positive) margin between agent and prior
}
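
The role of `sigma` can be illustrated with the "augmented likelihood" from the `REINVENT` publication: the prior log-likelihood of a compound is shifted by `sigma` times its score, and the agent is trained towards this target. The sketch below uses illustrative function names (they are not taken from the `REINVENT` code base) and shows the arithmetic for single compounds only.

```python
def augmented_likelihood(prior_likelihood: float, score: float, sigma: float = 128) -> float:
    """Target log-likelihood: the prior shifted upwards in proportion to the score."""
    return prior_likelihood + sigma * score


def rl_loss(agent_likelihood: float, prior_likelihood: float, score: float,
            sigma: float = 128) -> float:
    """Squared difference between the target and the agent's log-likelihood."""
    target = augmented_likelihood(prior_likelihood, score, sigma)
    return (target - agent_likelihood) ** 2


# a compound with a score of 0 keeps the prior likelihood as its target ...
target = augmented_likelihood(prior_likelihood=-38.06, score=0.0)
print(target)

# ... while a well-scoring compound gets a much higher target
print(augmented_likelihood(prior_likelihood=-20.0, score=0.5))
```

A larger `sigma` therefore gives the score more influence relative to the prior.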

### F) Define the scoring function
The scoring function will consist only of the `Icolos` component, in which `AutoDock Vina` with `RDKit` embedding is used for molecular docking.

In [9]:
# prepare the scoring function definition and add at the end
scoring_function = {
    "name": "custom_product",                  # this is our default one (alternative: "custom_sum")
    "parallel": False,                         # sets whether components are to be executed
                                               # in parallel; note, that python uses "False" / "True"
                                               # but the JSON "false" / "true"

    # the "parameters" list holds the individual components
    "parameters": [

    # add component: Icolos
    {
    "component_type": "icolos",                           # use Icolos as a Scoring Function component
    "name": "Icolos_RDkit_ADV",                          # arbitrary name
    "weight": 1,
    "specific_parameters": {
        "transformation": {
            "transformation_type": "reverse_sigmoid",         # lower docking scores are better - use reverse
                                                              # sigmoid transformation
            "low": -11,
            "high": -5,
            "k": 0.25
            },
        "debug": False,
        "values_key": "docking_score",
        "configuration_path": docking_configuration_path,
        "executor_path": icolos_executor
        }
    }]
}
configuration["parameters"]["scoring_function"] = scoring_function
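
To get a feeling for the `reverse_sigmoid` transformation, the sketch below maps raw docking scores to the range between 0 and 1 using `low`, `high` and `k` as specified above. The formula is written down from our understanding of the `REINVENT` scoring code and should be treated as an assumption; please check it against your installed version.

```python
def reverse_sigmoid(value: float, low: float = -11.0, high: float = -5.0,
                    k: float = 0.25) -> float:
    """Map a docking score to (0, 1); more negative (better) scores map closer to 1."""
    midpoint = (high + low) / 2
    exponent = k * (value - midpoint) * 10 / (high - low)
    return 1.0 / (1.0 + 10 ** exponent)


for docking_score in (-11.0, -8.0, -5.0):
    print(docking_score, round(reverse_sigmoid(docking_score), 3))
```

The midpoint between `low` and `high` (here -8) is mapped to 0.5, and `k` controls the steepness of the curve around it.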

## 4. Write out the `REINVENT` configuration

We have now successfully filled the dictionary and will write it out as a `JSON` file in the `tmp_dir` folder. Please have a look at the file before proceeding in order to see how the paths have been inserted where required and how the `dict` -> `JSON` translations (e.g. `True` to `true`) have taken place.

In [10]:
# write the configuration file to the disc
configuration_JSON_path = os.path.join(tmp_dir, "RL_Icolos_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4, sort_keys=True)
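
The `dict` -> `JSON` translation mentioned above can be seen on a small, self-contained example (the dictionary `snippet` is made up for illustration only):

```python
import json

# Python literals on the left-hand side ...
snippet = {"parallel": False, "debug": True, "smiles": [], "weight": 1}

# ... become JSON literals in the serialized string
as_json = json.dumps(snippet, indent=4, sort_keys=True)
print(as_json)

# reading the string back restores the original Python values
restored = json.loads(as_json)
print(restored == snippet)
```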

## 5. Run `REINVENT`
Now it is time to execute `REINVENT` locally. Note that, depending on the number of epochs (steps) and the execution time of the scoring function components, this might take a while. We have only specified a low number of epochs (2), but every compound in each batch has to be embedded and docked, so even this short run can take some time.

The command-line execution looks like this:
```
# activate environment
conda activate reinvent.v3.2

# execute REINVENT
python <your_path>/input.py <config>.json
```

In [11]:
%%capture captured_err_stream --no-stderr

workdir = os.getcwd()
os.chdir(tmp_dir)

# execute REINVENT from the command-line
!{reinvent_env}/bin/python {reinvent_dir}/input.py {configuration_JSON_path}

os.chdir(workdir)

In [12]:
# print the output to a file, just to have it for documentation
with open(os.path.join(output_dir, "run.err"), 'w') as file:
    file.write(captured_err_stream.stdout)

# prepare the output to be parsed
list_epochs = re.findall(r'INFO.*?local', captured_err_stream.stdout, re.DOTALL)
data = ["\n".join(epoch.splitlines()[:-1]) for epoch in list_epochs]

We have calculated a total of 2 epochs; let us quickly investigate how the agent fared in the first epoch. Below you see the print-out of the first epoch. Running `REINVENT` with `Icolos` for more epochs should show that the agent gradually improves over time, i.e., it generates compounds that satisfy the docking component better and better and therefore dock well. Note that the fraction of valid `SMILES` is high right from the start (because we use a pre-trained prior). You can see the partial scores for each component for the first couple of compounds, but the most important information is the average score. If run for more epochs, the average score is expected to increase over time.

In [13]:
for element in data:
    print(element)

INFO     starting an RL run
INFO     
 Step 0   Fraction valid SMILES: 100.0   Score: 0.3316   Time elapsed: 1095   Time left: 2190.0
  Agent     Prior     Target     Score     SMILES
-25.47    -25.47     -0.93      0.19      c1(-c2csc3c2c(=N)[nH]cc3-c2cc(NS(C)(=O)=O)ccc2)ccccc1
-18.64    -18.64      5.89      0.19      c1c(CN)ccc(NC(C)=O)c1
-30.19    -30.19     45.94      0.59      c1n(Cc2c(F)c(F)c(C)c(F)c2F)c2c(cccc2)c1C(C1C(C)(C)C1(C)C)=O
-38.06    -38.06    -38.06      0.00      c1c(C)cccc1-c1ccc2cc(-c3noc(-c4c(C)cc(C(O)=O)cc4)n3)oc2c1
-37.02    -37.02    -24.33      0.10      C1(c2cnccc2)CCN(C(=O)NCCc2n3ncnc3nc(C)c2C)CC1
-20.09    -20.09     61.84      0.64      c12ccc(-c3oc4cc(O)ccc4c(=O)c3)cc1cccc2
-40.07    -40.07    -36.50      0.03      c12ccccc1nc(C(n1cncc1)=Cc1ccc(C(=O)N3CCCCC3)cc1)s2
-34.08    -34.08    -33.68      0.00      c1cc(C(C)NC(=O)C(c2cc3c(cc2)nc(OCC)n3Cc2ccc(-c3ccccc3)cc2)O)ccc1
-18.31    -18.31     36.54      0.43      c12c(C(c3cc(OC)c(OC)c(OC)c3)=O)nccc1cc(OC)c
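
If you want to track the average score over several epochs without opening `tensorboard`, the step summary lines can be parsed directly. The following is a minimal sketch: the helper `parse_step_summaries` is hypothetical and the regular expression simply assumes the line layout shown in the print-out above.

```python
import re

# matches e.g. "Step 0   Fraction valid SMILES: 100.0   Score: 0.3316"
STEP_PATTERN = re.compile(
    r"Step\s+(\d+)\s+Fraction valid SMILES:\s+([\d.]+)\s+Score:\s+([\d.]+)"
)


def parse_step_summaries(log_text: str) -> list:
    """Extract step number, fraction of valid SMILES and average score per epoch."""
    return [
        {"step": int(step), "fraction_valid": float(valid), "score": float(score)}
        for step, valid, score in STEP_PATTERN.findall(log_text)
    ]


example = (" Step 0   Fraction valid SMILES: 100.0   Score: 0.3316"
           "   Time elapsed: 1095   Time left: 2190.0")
summaries = parse_step_summaries(example)
print(summaries)
```

In this notebook, the function could be called on `captured_err_stream.stdout` to collect one entry per epoch.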

## 6. Analyse the Results
In order to analyze the run in a more intuitive way, we can use `tensorboard`:

```
# go to the root folder of the output
cd <your_path>/REINVENT_RL_Icolos_demo

# make sure, you have activated the proper environment
conda activate reinvent.v3.2

# start tensorboard
tensorboard --logdir progress.log
```

Then copy the link provided to a browser window, e.g. "http://workstation.url.com:6006/".

The results folder will hold four different files: the agent (pickled), the input JSON (just for reference purposes), the memory (highest scoring compounds in `CSV` format) and the scaffold memory (in `CSV` format).

In [14]:
!head -n 15 {tmp_dir}/results/memory.csv

,smiles,score,likelihood
30,S(c1nc[nH]n1)c1nnc2c(cccc2)c1-c1ccc(C)cc1,0.9362163,-30.029327
23,c1(CCNS(c2ccc3c(c2)NC(=O)CO3)(=O)=O)nc2cc(C)nc(C)n2n1,0.8720069,-33.14348
19,C(c1onc(-c2ccc3c(c2)CCO3)n1)CC(Nc1ccccc1Br)=O,0.8609113,-30.007324
22,Oc1cc2c(cc1O)CCNC2CCc1ccc2c(cccc2)c1,0.8490204,-26.214272
26,c1c(N=c2nc(-c3cc4c(ncs4)cc3)cc[nH]2)cccc1,0.7597469,-23.750828
28,c1cc(OC2CCNCC2OCc2ccc3ccccc3c2)ccc1Cl,0.7418012,-25.59244
5,C1(c2nc(=Nc3ccccc3)[nH]cc2)CCN(CCOC)CC1,0.7418012,-26.271206
20,C1CN(Cc2cc3c(cc2)OCO3)CCN1CC(NCCc1c(F)cccc1)=O,0.703385,-25.126637
24,c1(NC(=O)C(C)OC(=O)CC2CC3CCC2C3)cc(OC)ccc1,0.6618585,-25.261784
27,n1cc2c([nH]c(=N)c(C(=O)NCCc3ccc(OC)c(OC)c3)c2)nc1,0.6618585,-24.886251
11,c1(F)cc(COc2cccc(N=c3[nH]cnc4cc(-c5sccn5)sc43)c2)ccc1,0.640065,-34.586082
5,c12ccc(-c3oc4cc(O)ccc4c(=O)c3)cc1cccc2,0.640065,-20.088259
13,c1c(OCC)ccc(C(=O)COC(=O)c2cc3c(cc2)OCCO3)c1,0.640065,-22.818
4,c1(-c2noc(CN(C)Cc3cn(C)nc3)n2)c(F)cccc1,0.640065,-23.297112
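
For further processing in `python`, the memory file can be read with the standard library. The sketch below uses the first two rows of the print-out above as inline sample data; the helper `load_memory` and its `min_score` parameter are hypothetical.

```python
import csv
import io

# first rows of "memory.csv" as shown above (the first column is an unnamed index)
sample = """,smiles,score,likelihood
30,S(c1nc[nH]n1)c1nnc2c(cccc2)c1-c1ccc(C)cc1,0.9362163,-30.029327
23,c1(CCNS(c2ccc3c(c2)NC(=O)CO3)(=O)=O)nc2cc(C)nc(C)n2n1,0.8720069,-33.14348
"""


def load_memory(handle, min_score: float = 0.0) -> list:
    """Read compounds from an open CSV handle, keeping those with score >= min_score."""
    rows = []
    for row in csv.DictReader(handle):
        score = float(row["score"])
        if score >= min_score:
            rows.append({
                "smiles": row["smiles"],
                "score": score,
                "likelihood": float(row["likelihood"]),
            })
    return rows


top = load_memory(io.StringIO(sample), min_score=0.9)
print(top)
```

To apply this to the actual run, open the `memory.csv` file in the `results` folder below `tmp_dir` and pass the file handle instead of the `io.StringIO` object.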
