> **How to run this notebook (command-line)?**
1. Install the `ReinventCommunity` environment:
`conda env create -f environment.yml`
2. Activate the environment:
`conda activate ReinventCommunity`
3. Execute `jupyter`:
`jupyter notebook`
4. Copy the link to a browser


# `REINVENT 3.2`: Automated Curriculum Learning Demo

The aim of this notebook is to illustrate how `REINVENT` can be used for the _de novo_ design of molecules in a *Curriculum Learning* (CL) setup. The general idea of CL is to decompose a Production Objective (target properties for the proposed molecules) into simpler sequential Curriculum Objectives to accelerate convergence: 

![](img/REINVENT_CL_mode.png)
The purpose of the Curriculum Objectives is to guide the REINVENT agent to areas of chemical space that satisfy the Production Objective. The order of Curriculum Objectives is user-defined and each Objective (Curriculum and Production) can consist of any REINVENT scoring function component(s), e.g., Tanimoto similarity. Progression through the Curriculum Objectives is controlled by a score threshold, which can be set separately for each Objective and which the agent must reach before moving on to the next one.
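
The threshold-gated progression described above can be sketched in a few lines of Python. This is an illustrative toy and not `REINVENT`'s implementation: `run_curriculum` and `toy_epoch_score` are hypothetical names, and the linear "learning curve" merely stands in for sampling and scoring a batch of molecules.

```python
def run_curriculum(thresholds, epoch_score, max_epochs):
    """Advance to the next objective once the epoch's average score reaches its threshold.

    Returns (number of objectives satisfied, number of epochs used).
    """
    current = 0            # index of the active Curriculum Objective
    steps_on_current = 0   # epochs spent on the active objective
    epochs_used = 0
    while current < len(thresholds) and epochs_used < max_epochs:
        epochs_used += 1
        steps_on_current += 1
        score = epoch_score(current, steps_on_current)
        if score >= thresholds[current]:
            current += 1
            steps_on_current = 0
    return current, epochs_used


def toy_epoch_score(objective_index, steps_on_objective):
    # hypothetical learning curve: the average score rises by 0.1 per epoch
    return min(1.0, steps_on_objective / 10)


satisfied, epochs = run_curriculum([0.8, 0.8, 0.8], toy_epoch_score, max_epochs=1500)
print(satisfied, epochs)  # → 3 24
```

If the epoch budget runs out first, the function returns with fewer objectives satisfied, which mirrors a run that stops before reaching the Production Phase.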

In the following sections, we will show how to set up an `Automated Curriculum Learning` run in `REINVENT` that gradually generates compounds possessing a target scaffold. The scaffold is not present in the training set of the provided prior and is therefore a difficult task for a standard `REINVENT` run. The target scaffold is dihydro-pyrazoloquinazoline, a known active scaffold against 3-phosphoinositide-dependent protein kinase-1 (PDK1). The reference paper is:

**Angiolini, M.; Banfi, P.; Casale, E.; Casuscelli, F.; Fiorelli, C.; Saccardo, M. B.; Silvagni, M.; Zuccotto, F. Structure-Based Optimization of Potent PDK1 Inhibitors. Bioorg. Med. Chem. Lett. 2010, 20 (14), 4095–4099. https://doi.org/10.1016/j.bmcl.2010.05.070.**



## 1. Setting the Paths
_Please update the following code block such that it reflects your system's installation and execute it._

In [1]:
# load dependencies
import os
import re
import json
import tempfile

# --------- change these path variables as required
reinvent_dir = os.path.expanduser("~/Desktop/Reinvent")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent.v3.2")
output_dir = os.path.expanduser("~/Desktop/REINVENT_AutoCL_demo")

# --------- do not change
# get the notebook's root path
try: ipynb_path
except NameError: ipynb_path = os.getcwd()

# if required, generate a folder to store the results
try:
    os.mkdir(output_dir)
except FileExistsError:
    pass
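
Before continuing, it can help to confirm that the paths set above point at a usable installation. The following is a minimal sketch: `missing_paths` is a hypothetical helper, and it assumes the Unix-style layout used later in this notebook (`input.py` inside `reinvent_dir`, `bin/python` inside `reinvent_env`).

```python
import os


def missing_paths(reinvent_dir, reinvent_env):
    """Return the expected files that cannot be found (an empty list means all good)."""
    expected = [
        os.path.join(reinvent_dir, "input.py"),       # REINVENT entry point
        os.path.join(reinvent_env, "bin", "python"),  # interpreter of the conda environment
    ]
    return [path for path in expected if not os.path.isfile(path)]


# example call with the default locations from the cell above
for path in missing_paths(
    os.path.expanduser("~/Desktop/Reinvent"),
    os.path.expanduser("~/miniconda3/envs/reinvent.v3.2"),
):
    print("missing:", path)
```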

## 2. Setting the Configuration
In the cells below, we will build a nested dictionary that will eventually be written to a `JSON` file, which in turn will be interpreted by `REINVENT`.
You can find this file in your `output_dir` location.

### A) Declare the Run Type

In [2]:
# initialize the dictionary
configuration = {
    "version": 3,                                 # we are going to use REINVENT's newest release
    "run_type": "curriculum_learning",            # other run types: "sampling", "reinforcement_learning",
                                                  #                  "transfer_learning",
                                                  #                  "scoring" and "create_model"
    "model_type": "default"
}

### B) Sort out the logging details
This includes the `result_folder` path, where the results will be written.

Also, `REINVENT` can send custom log messages to a remote location. We have retained this capability in the code. If the `recipient` value differs from `"local"`, `REINVENT` will attempt to POST the data to the specified `recipient`.

In [3]:
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://0.0.0.1",            # only relevant if "recipient" is not "local"
    "recipient": "local",                  # "local" for local logging, otherwise a remote REST endpoint
    "logging_frequency": 100,              # log every x-th step
    "logging_path": os.path.join(output_dir, "progress.log"), # load this folder in tensorboard
    "result_folder": os.path.join(output_dir, "results"),     # will hold the compounds (SMILES) and summaries
    "job_name": "Automated Curriculum Learning Demo",         # set an arbitrary job name for identification
    "job_id": "Demo"                       # only relevant if "recipient" is not "local"
}

Create `parameters` field:

In [4]:
# add the "parameters" block
configuration["parameters"] = {}

# First add the paths to the Prior, Agent, and set the curriculum type to automated
configuration["parameters"]["prior"] = os.path.join(ipynb_path, "models/random.prior.new")
configuration["parameters"]["agent"] = os.path.join(ipynb_path, "models/random.prior.new")
configuration["parameters"]["curriculum_type"] = "automated"

### C) Specify the Curriculum Strategy
Overview of important `REINVENT` parameters:
* **Diversity Filter**: If the agent becomes very focussed, it tends to produce the same or similar molecules over and over (because they return high scores). To enrich different scaffolds, we can activate the diversity filter, which will "bin" the molecules into groups (scaffolds). Once a given bin is full, all other molecules with the same scaffold will be penalized score-wise, effectively "pushing" the agent out of a local minimum in the score landscape and thus enriching diversity.
* **Inception**: Sometimes agents "linger around" for a while before they (by chance) happen to pick up a trace and generate interesting compounds. To speed up this very early exploration, we can *incept* a couple of promising molecules as a list of `SMILES`. Inception also stores up to `memory_size` of the highest-scoring molecules, which can be replayed to the agent to keep it on track.
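
The inception memory can be pictured as a small "best-of" buffer. The sketch below is a hypothetical illustration (the class `InceptionMemory` is not `REINVENT` code, and uniform sampling from the memory is an assumption): it keeps the `memory_size` highest-scoring molecules and draws `sample_size` of them for replay.

```python
import random


class InceptionMemory:
    """Toy experience-replay buffer keeping only the top-scoring SMILES."""

    def __init__(self, memory_size, sample_size, seed=None):
        self.memory_size = memory_size
        self.sample_size = sample_size
        self._best = {}                   # SMILES -> best score seen so far
        self._rng = random.Random(seed)

    def add(self, smiles, scores):
        for smi, score in zip(smiles, scores):
            if score > self._best.get(smi, float("-inf")):
                self._best[smi] = score
        # truncate to the memory_size highest-scoring entries
        top = sorted(self._best.items(), key=lambda item: item[1], reverse=True)
        self._best = dict(top[: self.memory_size])

    def sample(self):
        # draw (SMILES, score) pairs for replay; uniform sampling is an assumption
        k = min(self.sample_size, len(self._best))
        return self._rng.sample(sorted(self._best.items()), k)


memory = InceptionMemory(memory_size=2, sample_size=1, seed=0)
memory.add(["C", "CC", "CCC"], [0.1, 0.9, 0.5])
print(sorted(memory._best))  # → ['CC', 'CCC']
```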

These two parameters are relevant for any `REINVENT` experiment, but a Curriculum Learning experiment has additional features and considerations. Curriculum Learning is split into a Curriculum Phase and a Production Phase. During the Curriculum Phase, Curriculum Objectives are used to guide the agent to favourable chemical space. During the Production Phase, the Production Objective is activated and the agent samples for a pre-defined number of epochs, presumably generating favourable molecules.
1) A separate and distinct **Diversity Filter** can be specified in the Curriculum and Production Phases. This is particularly relevant because a **Diversity Filter** may not be desired during the Curriculum Phase, where the goal is to guide the agent to favourable chemical space and not necessarily to generate molecules that satisfy the Production Objective (target objective). Conversely, once the Production Phase starts, initializing a **Diversity Filter** can ensure that the agent samples diverse minima, balancing exploration and exploitation. (Setting up a **Diversity Filter** for the Production Phase is shown in the "Specify the Production Strategy" section.)

2) A separate and distinct **Inception** can also be specified in the Curriculum and Production Phases. A relevant use case is that any stored molecules in the **Inception** memory during the Curriculum Phase may not be relevant in the Production Phase. In this case, initializing a new **Inception** will clear the memory. 
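
To make the binning behaviour of a **Diversity Filter** concrete, here is a minimal sketch. It is a simplification under stated assumptions: `BucketFilter` is a hypothetical class, the scaffold is passed in as a plain string (the real filters derive it from the molecule), and a full bin is penalized by setting the score to 0.

```python
from collections import defaultdict


class BucketFilter:
    """Toy scaffold filter: once a scaffold's bin is full, further hits score 0."""

    def __init__(self, bucket_size, minscore):
        self.bucket_size = bucket_size
        self.minscore = minscore
        self._buckets = defaultdict(set)   # scaffold -> SMILES stored in its bin

    def filtered_score(self, scaffold, smiles, score):
        if score < self.minscore:
            return score                   # too low to be considered for binning
        bucket = self._buckets[scaffold]
        if len(bucket) >= self.bucket_size:
            return 0.0                     # bin is full: penalize
        bucket.add(smiles)
        return score


flt = BucketFilter(bucket_size=2, minscore=0.4)
scores = [
    flt.filtered_score("c1ccccc1", smi, 1.0)
    for smi in ("Cc1ccccc1", "CCc1ccccc1", "CCCc1ccccc1")
]
print(scores)  # → [1.0, 1.0, 0.0]
```

This also shows why such a filter is counterproductive when the goal is to keep sampling a single target scaffold: its bin fills up and every further hit is scored 0.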

The cell below sets up the Curriculum Strategy, which provides all the parameters needed in the Curriculum Phase.

In [5]:
# set up the Curriculum Strategy
configuration["parameters"]["curriculum_strategy"] = {
    "name": "user_defined",         # denotes that the order of Curriculum Objectives is defined by the user
    "max_num_iterations": 1500,     # denotes the total number of epochs to spend in the Curriculum Phase
                                    # if by the end of the total epochs the last Curriculum Objective is not 
                                    # satisfied (based on the agent achieving a score >= threshold), the run stops
    "batch_size": 128,              # specifies how many molecules are generated per epoch
    "learning_rate": 0.0001,        # sets how strongly the agent is influenced by each epoch
    "sigma": 128,                   # used to calculate the "augmented likelihood", see publication
    "learning_strategy": {
        "name": "dap_single_query",
        "parameters": {
            "sigma": 120
        }
    },
    "diversity_filter": {
        "name": "NoFilter",         # other options are: "IdenticalTopologicalScaffold", 
                                    #                    "IdenticalMurckoScaffold", and "ScaffoldSimilarity"

        "bucket_size": 25,          # the bin size; penalization will start once this is exceeded
        "minscore": 0.4,            # the minimum total score to be considered for binning
        "minsimilarity": 0.4        # the minimum similarity to be placed into the same bin
    },
    "inception": {
        "smiles": [],               # fill in a list of SMILES here that can be used (or leave empty)
        "memory_size": 100,         # sets how many molecules are to be remembered
        "sample_size": 10           # how many are to be sampled each epoch from the memory for experience replay
    },
    # Curriculum Objectives are all the scoring functions that are to be sequentially activated
    "curriculum_objectives": [{
        # 1st scoring function below
        "scoring_function": {
            "name": "custom_product",     # this is our default one (alternative: "custom_sum")
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",     # enforce the match to a given substructure
                "name": "Pyrimidine",     # arbitrary name for the component
                "specific_parameters": {
                    "smiles": [
                            "[c]1[c][c]n[c]n1"     # a match with this substructure is required
                    ]
                },    
                "weight": 1}]             # the weight of the component (default: 1)
            },
        "score_threshold": 0.8            # agent must achieve an average score of this before 
                                          # progressing to the next Curriculum Objective 
        },
        # 2nd scoring function below
        {
        "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": "H-Bonding Ring",
                "specific_parameters": {
                    "smiles": [
                            "[c]1[c][c]nc(n1)[N]"
                    ]
                },
                "weight": 1}]
            },
        "score_threshold": 0.8
        },
        # 3rd scoring function below
        {
        "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": "H-Bonding Ring with Phenyl",
                "specific_parameters": {
                    "smiles": [
                            "[c]1[c][c]c([c][c]1)[N]c2n[c][c][c]n2"
                    ]
                },
                "weight": 1}]
            },
        "score_threshold": 0.8
        },
        # 4th scoring function below
        {
        "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": "Double Ring",
                "specific_parameters": {
                    "smiles": [
                            "[c]1[c][c]c([c][c]1)[N]c2n[c]c3c(n2)-[c][c][C][C]3"
                    ]
                },    
                "weight": 1}]
            },
        "score_threshold": 0.8
        },
        # 5th scoring function below
        {
        "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": "Triple Ring",
                "specific_parameters": {
                    "smiles": [
                            "[c]1[c][c]c([c][c]1)[N]c2n[c]c3c(n2)-c4c([c]n[n]4)[C][C]3"
                    ]
                },
                "weight": 1}]
            },
        "score_threshold": 0.8
        },
        # 6th scoring function below
        {
        "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": "Full Substructure",
                "specific_parameters": {
                    "smiles": [
                            "[*]NC(=O)c1nn([*])c2c1CCc3cnc(Nc4ccccc4)nc23"
                    ]
                },
                "weight": 1}]
            },
        "score_threshold": 0.8
        },
    ]
}
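
Since the six Curriculum Objectives above differ only in their name and substructure pattern, the same list could also be generated programmatically. The following is a sketch of that idea: `make_objective` is a hypothetical helper (not part of `REINVENT`), and the patterns are copied from the cell above.

```python
def make_objective(name, pattern, threshold=0.8):
    """Build one Curriculum Objective with a single matching_substructure component."""
    return {
        "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": name,
                "specific_parameters": {"smiles": [pattern]},
                "weight": 1,
            }],
        },
        "score_threshold": threshold,
    }


# (name, substructure pattern) pairs in the order they should be activated
substructures = [
    ("Pyrimidine", "[c]1[c][c]n[c]n1"),
    ("H-Bonding Ring", "[c]1[c][c]nc(n1)[N]"),
    ("H-Bonding Ring with Phenyl", "[c]1[c][c]c([c][c]1)[N]c2n[c][c][c]n2"),
    ("Double Ring", "[c]1[c][c]c([c][c]1)[N]c2n[c]c3c(n2)-[c][c][C][C]3"),
    ("Triple Ring", "[c]1[c][c]c([c][c]1)[N]c2n[c]c3c(n2)-c4c([c]n[n]4)[C][C]3"),
    ("Full Substructure", "[*]NC(=O)c1nn([*])c2c1CCc3cnc(Nc4ccccc4)nc23"),
]
curriculum_objectives = [make_objective(name, pattern) for name, pattern in substructures]
print(len(curriculum_objectives))  # → 6
```

The resulting list could then be assigned to `configuration["parameters"]["curriculum_strategy"]["curriculum_objectives"]`, which keeps the ordering explicit and makes it easy to vary the threshold per Objective.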

### D) Specify the Production Strategy

The Production Strategy provides the parameters to be used in the Production Phase. Here, the Production Objective (target objective) is activated. A new **Diversity Filter** and **Inception** can be initialized. We retain the Curriculum Phase **Inception** here because the last Curriculum Objective is the same as the Production Objective. Moreover, we continue to use **NoFilter** so that the agent is not penalized for sampling the same scaffold, as our only goal in this tutorial is to generate the target scaffold. Using a **Diversity Filter** would penalize the agent and eventually give compounds possessing the target scaffold a score of 0.

In [6]:
# set up the Production Strategy
configuration["parameters"]["production_strategy"] = {
    "name": "standard",
    "retain_inception": True,       # option to retain the inception from the Curriculum Phase
                                    # retain it here since the last Curriculum Objective is the same as
                                    # Production Objective. Previous top compounds will be relevant
    
    "number_of_steps": 100,         # number of epochs to run the Production Phase
    "batch_size": 128,              # specifies how many molecules are generated per epoch
    "learning_rate": 0.0001,        # sets how strongly the agent is influenced by each epoch
    "sigma": 128,                   # used to calculate the "augmented likelihood", see publication
    "learning_strategy": {
        "name": "dap_single_query",
        "parameters": {
            "sigma": 120
        }
    },
    "diversity_filter": {
        "name": "NoFilter",         # other options are: "IdenticalTopologicalScaffold",
                                    #                    "IdenticalMurckoScaffold", and "ScaffoldSimilarity"

        "bucket_size": 25,          # the bin size; penalization will start once this is exceeded
        "minscore": 0.4,            # the minimum total score to be considered for binning
        "minsimilarity": 0.4        # the minimum similarity to be placed into the same bin
    },
    "inception": {
        "smiles": [],               # fill in a list of SMILES here that can be used (or leave empty)
        "memory_size": 100,         # sets how many molecules are to be remembered
        "sample_size": 10           # how many are to be sampled each epoch from the memory for experience replay
    },
    # the Production Objective contains the final scoring function to be activated
    # here, it is the same scoring function as the last Curriculum Objective
    # as we want to continue sampling the target substructure
    "scoring_function": {
            "name": "custom_product",
            "parallel": False,
            "parameters": [{
                "component_type": "matching_substructure",
                "name": "Full Substructure",
                "specific_parameters": {
                    "smiles": [
                            "[*]NC(=O)c1nn([*])c2c1CCc3cnc(Nc4ccccc4)nc23"
                    ]
                },
                "weight": 1}]
    }
}
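
Both strategy cells set `sigma`, which the comments describe as being used to calculate the "augmented likelihood". As a rough illustration based on the published `REINVENT` formulation (the augmented log-likelihood is the prior log-likelihood plus `sigma` times the score, and the loss is the squared difference to the agent log-likelihood), consider the sketch below. The function names and the example numbers are hypothetical, and it is an assumption that the `dap_single_query` strategy follows exactly this form.

```python
def augmented_log_likelihood(prior_ll, score, sigma):
    """Prior log-likelihood shifted upwards in proportion to the score."""
    return prior_ll + sigma * score


def dap_loss(prior_ll, agent_ll, score, sigma):
    """Squared difference between the augmented and the agent log-likelihood."""
    return (augmented_log_likelihood(prior_ll, score, sigma) - agent_ll) ** 2


# a molecule that fully matches the objective (score 1) creates a large learning signal ...
print(dap_loss(prior_ll=-40.0, agent_ll=-38.0, score=1.0, sigma=120))  # → 13924.0
# ... whereas a molecule with score 0 only pulls the agent back towards the prior
print(dap_loss(prior_ll=-40.0, agent_ll=-38.0, score=0.0, sigma=120))  # → 4.0
```

A larger `sigma` therefore weights the scoring function more heavily relative to staying close to the prior.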

## 3. Write Out the Configuration
We have now filled the dictionary and will write it out as a `JSON` file in the output directory. Please have a look at the file before proceeding to see how the paths have been inserted where required and how the `dict` -> `JSON` translations (e.g. `True` to `true`) have taken place.

In [7]:
configuration_JSON_path = os.path.join(output_dir, "AutoCL_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4)
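
The `dict` -> `JSON` translation mentioned above can be checked with a small round trip. This sketch uses a hypothetical miniature dictionary (`demo_config`) and a temporary folder instead of the real configuration, so it can be run independently of the cells above.

```python
import json
import os
import tempfile

# hypothetical miniature configuration, only for illustrating the translation
demo_config = {"parallel": False, "retain_inception": True, "inception": {"smiles": []}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_config.json")
    with open(path, "w") as f:
        json.dump(demo_config, f, indent=4)
    with open(path) as f:
        raw_text = f.read()

reloaded = json.loads(raw_text)
print('"parallel": false' in raw_text)  # → True (Python's False became JSON's false)
print(reloaded == demo_config)          # → True (loading restores the Python values)
```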

## 4. Run `REINVENT`
Now it is time to execute `REINVENT` locally. Note that, depending on the number of epochs (steps) and the execution time of the scoring function components, this might take a while. The `matching_substructure` component should be fairly quick, and the total runtime should be under 15 minutes.

**Note**: Sometimes, `REINVENT` will be unsuccessful in generating the desired substructure in this demo as there is stochasticity involved in the sampling process. The substructure was not present in the training set for the prior and thus represents a challenging task.

The command-line execution looks like this:
```
# activate environment
conda activate reinvent.v3.2

# execute REINVENT
python <your_path>/input.py <config>.json
```

In [8]:
%%capture captured_err_stream --no-stderr

# execute REINVENT from the command-line
!{reinvent_env}/bin/python {reinvent_dir}/input.py {configuration_JSON_path}
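
Outside a notebook, the same call can be made with Python's `subprocess` module. The sketch below assumes the same layout as the cell above (`bin/python` in the environment, `input.py` in the `REINVENT` folder); `build_reinvent_command` and `run_reinvent` are hypothetical helper names.

```python
import os
import subprocess


def build_reinvent_command(reinvent_env, reinvent_dir, config_path):
    """Assemble the command line: <env>/bin/python <reinvent>/input.py <config>.json"""
    return [
        os.path.join(reinvent_env, "bin", "python"),
        os.path.join(reinvent_dir, "input.py"),
        config_path,
    ]


def run_reinvent(reinvent_env, reinvent_dir, config_path):
    """Run REINVENT and return the CompletedProcess with captured stdout/stderr."""
    command = build_reinvent_command(reinvent_env, reinvent_dir, config_path)
    return subprocess.run(command, capture_output=True, text=True, check=False)
```

A call such as `result = run_reinvent(reinvent_env, reinvent_dir, configuration_JSON_path)` would then allow inspecting `result.returncode` and `result.stderr` if the run fails.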

## 5. Analyse the Results
To analyse the run, we can use `tensorboard`:

```
# go to the root folder of the output
cd <your_path>/REINVENT_AutoCL_demo

# make sure you have activated the proper environment
conda activate reinvent.v3.2

# start tensorboard
tensorboard --logdir progress.log
```

Then copy the link provided to a browser window, e.g. "http://workstation.url.com:6006/". The following figures are example plots. Remember that there is always some randomness involved, so replicate runs will follow different training progressions.

**Note: There is a chance that the curriculum learning run does not find the target scaffold. The target scaffold is not present in the prior and thus may not be found within 1500 epochs (as enforced in the configuration `JSON`).**

In `tensorboard` you can monitor the individual scoring function components. By analyzing the average score plot, we can see that the agent gradually constructs the target scaffold. The plot is annotated with the substructures. We further observe that the `Fraction_valid_SMILES` was high throughout. 

![](img/AutoCL_Training_Plots.png)

There is also an "Images" tab available in `tensorboard` that lets you easily browse through the generated compounds. In the molecules, the target scaffold is highlighted in red (if present). Also, the total scores are given per molecule. Below is what was observed for epoch 1060. The generated compounds feature the target scaffold and therefore all possess the maximum score of 1.000.

![](img/AutoCL_Sample_Compounds.png)

Finally, the scaffold memories are `CSV` files containing all the compounds collected while each Curriculum Objective or the Production Objective was activated. All Curriculum Objective scaffold memories are identified by a number suffix starting from 0 (denoting the first Curriculum Objective). The scaffold memory for the Production Objective is `scaffold_memory.csv`. The files are saved at:

`<your_path>/REINVENT_AutoCL_demo/results`
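
To inspect a scaffold memory programmatically, the `CSV` file can be read with Python's standard library. The sketch below is hedged: `top_compounds` is a hypothetical helper, and the column name `total_score` is an assumption about the file's header, so check the actual header of your file and adjust `score_column` accordingly.

```python
import csv


def top_compounds(csv_path, score_column="total_score", n=5):
    """Return the n rows with the highest value in score_column (column name is assumed)."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:n]
```

For example, `top_compounds(os.path.join(output_dir, "results", "scaffold_memory.csv"))` would return the five best-scoring rows of the Production Phase as dictionaries keyed by column name.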