<div class="alert alert-block alert-info">
<b>How to run this notebook?</b><br />
<ol>
    <li>Install the DockStream environment: conda env create -f environment.yml in the DockStream directory</li>
    <li>Activate the environment: conda activate DockStreamCommunity</li>
    <li>Execute jupyter: jupyter notebook</li>
    <li> Copy the link to a browser</li>
    <li> Update variables <b>dockstream_path</b> and <b>dockstream_env</b> (the path to the DockStream conda environment) in the 
        first code block below</li>
    </ol>
</div>

# Benchmarking Script Demo

The purpose of the `benchmarking script` is to enable automated batch execution of `DockStream` runs. This will allow users to run multiple backends + ligand embedders (e.g. `Glide with LigPrep` and `Hybrid with Corina`) in a more streamlined manner to determine the best docking configuration for their specific application. A subsequent `analysis script` quantifies the differences between different docking configurations by automating calculations of relevant enrichment metrics and generating plots for visualization of `DockStream` run results. A demo for the `analysis script` can be found in the `demo_Analysis_Script` Jupyter notebook. This notebook focuses strictly on the `benchmarking script` and demonstrates the necessary preparatory steps for batch execution of `DockStream` runs. 


**Benchmarking Script Steps:**
  1. Prepare the ligands file
  2. Prepare the receptors/grids
  3. Prepare `DockStream` configuration files (`JSON` format)
  4. Execute the script and parse the results


__Note:__ By default, this notebook will deposit all files created into `~/Desktop/Benchmarking_demo`.

The following imports and path definitions are only needed when executing this notebook. To use `benchmarking.py` directly from the command line, it is enough to run the following with the appropriate input path (the path to a folder containing `DockStream` configuration `JSONs`):

```
# use DockStreamFull instead of DockStream for GOLD docking
conda activate DockStream
python /path/to/DockStream/benchmarking.py -input_path <path to input JSONs folder>
```

In [None]:
import os
import json
import tempfile

# update these paths to reflect your system's configuration
dockstream_path = os.path.expanduser("~/Desktop/ProjectData/DockStream")
# note: DockStreamFull (as opposed to DockStream) is required to run GOLD
dockstream_env = os.path.expanduser("~/miniconda3/envs/DockStream")

# no changes are necessary beyond this point
# ---------
# get the notebook's root path
try:
    ipynb_path
except NameError:
    ipynb_path = os.getcwd()

# generate the path to the benchmarking script entry point
benchmarking_script = os.path.join(dockstream_path, "benchmarking.py")

# generate a folder to store the results
output_dir = os.path.expanduser("~/Desktop/Benchmarking_demo")
try:
    os.mkdir(output_dir)
except FileExistsError:
    pass

## Step 1: Prepare the Ligands File

A file containing all the ligands to be docked must be generated in `SDF`, `SMI`, or `CSV` format (see the docking_input_types notebook for more details). Typically, the `SMI` format is used for readability, especially when there is a substantial number of ligands to be docked. The `DockStream` codebase includes a script called `sdf2smiles.py`, which converts a ligand database in `SDF` format to `SMI` format.
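
For orientation, an `SMI` file is plain text with one ligand per line: a SMILES string, optionally followed by whitespace and a name. The sketch below uses only the standard library and a hypothetical helper, `read_smi`, to load such a file and count its entries. It illustrates the format under that assumption and is not part of `DockStream`.

```python
import os
import tempfile


def read_smi(path):
    """Return a list of (smiles, name) tuples from a whitespace-delimited SMI file.

    Assumption: one ligand per line, SMILES first, optional name second.
    Blank lines are skipped; a missing name is returned as None.
    """
    ligands = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            smiles = fields[0]
            name = fields[1] if len(fields) > 1 else None
            ligands.append((smiles, name))
    return ligands


# demonstrate on a small throwaway file (the two molecules are arbitrary examples)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.smi")
    with open(demo_path, "w") as handle:
        handle.write("CCO ethanol\n\nc1ccccc1\n")
    demo_ligands = read_smi(demo_path)

print(len(demo_ligands), "ligands read")
```

The same helper could be pointed at the ligand file shipped with the codebase to confirm how many ligands will be docked before starting a long batch.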

For the purpose of this notebook, a ligand `SMI` file is provided and shipped in the `DockStream` codebase. Let's generate the path to this file:

In [None]:
# generate the path to the ligands smi file shipped with this implementation
ligands_path = os.path.join(ipynb_path, "../data/Benchmarking_Script/ligands_smiles.smi")

## Step 2: Prepare the Receptors/Grids

As with any `DockStream` run, a receptor/grid must be prepared before the docking run. This process will be dependent on the backend(s) used. For example, receptor grids for `Glide` can be generated using Schrodinger's GUI, `Maestro` (see demo_Glide for more details). On the other hand, receptor grids for `Hybrid` can be generated using `target_preparator.py` which is a script shipped with the `DockStream` codebase (see demo_Hybrid for more details).

For the purpose of this notebook, Glide and Hybrid receptor grids are provided and shipped with the `DockStream` codebase. Note that the benchmarking script is compatible with any backend and any ligand embedder. `Glide` and `Hybrid` were chosen in this notebook arbitrarily. Let's generate the paths to the relevant files:

In [None]:
# generate the paths to the receptor grids shipped with this implementation
glide_grid_path = os.path.join(ipynb_path, "../data/Benchmarking_Script/1UYD_grid.zip")
hybrid_grid_path = os.path.join(ipynb_path, "../data/Benchmarking_Script/1UYD_grid.oeb")
smiles_path = os.path.join(ipynb_path, "../data/Benchmarking_Script/ligands_smiles.smi")

# generate output paths for the docked ligands and the scores
glide_docked_poses_path = os.path.join(output_dir, "glide_docked_poses.sdf")
glide_docked_scores_path = os.path.join(output_dir, "glide_docked_scores.csv")
hybrid_docked_poses_path = os.path.join(output_dir, "hybrid_docked_poses.sdf")
hybrid_docked_scores_path = os.path.join(output_dir, "hybrid_docked_scores.csv")                             

Note that the `benchmarking script` takes as input the path to a folder rather than to a single `DockStream` configuration `JSON`: it runs all configuration files in that folder successively. Single runs are supported too, in which case the path to the single configuration `JSON` should be passed. The folder used in this notebook, `benchmarking_conf_jsons`, is created in Step 3.
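
The folder-or-file behaviour described above can be pictured with a small helper. This is a minimal sketch, assuming that a folder contributes every `*.json` file it contains and that files are processed in sorted order; `collect_config_paths` is a hypothetical name and this is not the actual implementation inside the `benchmarking script`.

```python
import os
import tempfile


def collect_config_paths(input_path):
    """Return the configuration files implied by input_path.

    A folder yields all of its *.json files in sorted order; a single file
    yields a one-element list. Hypothetical helper, not DockStream code.
    """
    if os.path.isdir(input_path):
        return sorted(
            os.path.join(input_path, name)
            for name in os.listdir(input_path)
            if name.endswith(".json")
        )
    if os.path.isfile(input_path):
        return [input_path]
    raise FileNotFoundError(f"No such file or folder: {input_path}")


# demonstrate with a throwaway folder holding two JSON files and one other file
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("b_run.json", "a_run.json", "notes.txt"):
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write("{}")
    found = [os.path.basename(p) for p in collect_config_paths(tmp_dir)]
    single = collect_config_paths(os.path.join(tmp_dir, "a_run.json"))
    single_names = [os.path.basename(p) for p in single]

print(found)
```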

## Step 3: Prepare the DockStream Configuration Files

Next, we need to generate the `DockStream` configuration files. This step is covered in the backend specific demos (e.g. `demo_Glide`) but will be briefly described again here as it is especially relevant in highlighting the utility of the benchmarking script. Let's first create a new subfolder to hold the `DockStream` run JSONs that will be generated later:

In [None]:
# generate output paths for the DockStream configuration files 
benchmarking_conf_jsons = os.path.join(output_dir, "benchmarking_conf_jsons")
# create the benchmarking run jsons subfolder
try:
    os.mkdir(benchmarking_conf_jsons)
except FileExistsError:
    pass

Each `DockStream` run requires its own configuration `JSON`. The configurations need not be unique, but duplicates simply repeat the same run, which is rarely useful unless you want to observe the stochastic behavior of the docking algorithms in some backends, such as `GOLD`. An example `Glide with LigPrep` configuration `JSON` is shown in the code block below (for more details on `Glide`, see demo_Glide).

In [None]:
# specify the embedding and docking JSON file as a dictionary and write it out
glide_ligprep_conf_json = {
  "docking": {
    "header": {                                   # general settings
      "environment": {
      }
    },
    "ligand_preparation": {                       # the ligand preparation part, defines how to build the pool
      "embedding_pools": [
        {
          "pool_id": "Ligprep",
          "type": "Ligprep",
          "parameters": {
            "prefix_execution": "module load schrodinger/2019-4",
            "use_epik": {
                "target_pH": 7.4,                 # LigPrep embeds ligands at a specified pH which is particularly
                "pH_tolerance": 0.2               # relevant to ionization states --> this parameter can be tweaked
            },
            "force_field": "OPLS3e"
          },
          "input": {
            "standardize_smiles": False,
            "input_path": smiles_path,
            "type": "smi"                                   
          }
        }
      ]
    },
    "docking_runs": [
      {
        "backend": "Glide",
        "run_id": "Glide",
        "input_pools": ["Ligprep"],
        "parameters": {
          "prefix_execution": "module load schrodinger/2019-4", # will be executed before a program call
          "parallelization": {                                  
            "number_cores": 2
          },
          "glide_flags": {                                  # add all command-line flags for Glide here
            "-HOST": "localhost"
          },
          "glide_keywords": {                               # add all keywords for the "input.in" file here
            "EXPANDED_SAMPLING": "True",                    # all these parameters can be tweaked and/or
            "GRIDFILE": [glide_grid_path],                  # included/omitted
            "NENHANCED_SAMPLING": "2",
            "POSE_OUTTYPE": "ligandlib_sd",
            "POSES_PER_LIG": "3",
            "POSTDOCK_NPOSE": "15",
            "POSTDOCKSTRAIN": "True",
            "PRECISION": "HTVS"
          }
        },
        "output": {
          "poses": { "poses_path": glide_docked_poses_path },            # output path to save docked poses
          "scores": { "scores_path": glide_docked_scores_path }          # output path to save docked scores   
        }
      }
    ]
  }
}

with open(os.path.join(benchmarking_conf_jsons, "Glide_LigPrep.json"), "w+") as f:
    json.dump(glide_ligprep_conf_json, f, indent=2)

The above cell saves the `DockStream` `Glide` configuration `JSON` in the benchmarking `JSONs` folder. Notice the comments that highlight parameters that can be tweaked. For instance, `"target_pH"` can be changed if the user is interested in docking a set of ligands at a different pH. This can have a significant impact on the docking outcome because the ionization states of the ligands will change. One could also change parameters located under `"glide_keywords"`. For instance, `"PRECISION"` can be changed to `"SP"` (Standard Precision), which is generally more accurate than `"HTVS"` (High Throughput Virtual Screening); `"HTVS"` is used in this notebook only because it is much faster. One can change as many or as few parameters as desired.

Varying several parameters at once leads to a combinatorial explosion of docking configurations. If the user wants to execute many `DockStream` runs, repeatedly calling `docker.py` by hand would be cumbersome. The `benchmarking script` automates this: it runs every `DockStream` job for which a configuration `JSON` is provided. Internally, the script calls `docker.py`, so no functionality is lost by using the benchmarking script.
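
To make the combinatorial aspect concrete, the sketch below generates one configuration per combination of `"PRECISION"` and `"target_pH"`. The key names are taken from the configuration above, but `base_conf` is a trimmed stand-in rather than a runnable `DockStream` configuration, and the helper `make_variants`, the file naming scheme, and the alternative pH value of 6.4 are assumptions made for illustration.

```python
import copy
import itertools
import json
import os
import tempfile

# trimmed stand-in for a full configuration; only the keys that are varied are kept
base_conf = {
    "docking": {
        "ligand_preparation": {
            "embedding_pools": [
                {"parameters": {"use_epik": {"target_pH": 7.4, "pH_tolerance": 0.2}}}
            ]
        },
        "docking_runs": [
            {"parameters": {"glide_keywords": {"PRECISION": "HTVS"}}}
        ],
    }
}

precisions = ["HTVS", "SP"]
target_phs = [6.4, 7.4]  # 6.4 is an arbitrary illustrative value


def make_variants(conf, precisions, target_phs):
    """Return {file name: configuration} for every precision / pH combination."""
    variants = {}
    for precision, ph in itertools.product(precisions, target_phs):
        variant = copy.deepcopy(conf)  # deep copy so the nested dicts are not shared
        pool = variant["docking"]["ligand_preparation"]["embedding_pools"][0]
        pool["parameters"]["use_epik"]["target_pH"] = ph
        run = variant["docking"]["docking_runs"][0]
        run["parameters"]["glide_keywords"]["PRECISION"] = precision
        variants[f"Glide_{precision}_pH{ph}.json"] = variant
    return variants


variants = make_variants(base_conf, precisions, target_phs)

# write the variants to a throwaway folder, one JSON file per combination
with tempfile.TemporaryDirectory() as tmp_dir:
    for file_name, variant in variants.items():
        with open(os.path.join(tmp_dir, file_name), "w") as handle:
            json.dump(variant, handle, indent=2)
    written = sorted(os.listdir(tmp_dir))

print(written)
```

In a real sweep, the `"poses_path"` and `"scores_path"` entries would also need to differ between variants, since runs that share an output path would write to the same files.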

As the purpose of this notebook is to demonstrate batch execution of `DockStream` runs, the below code block will generate an example `Hybrid with Corina` configuration `JSON` (for more details on `Hybrid`, see demo_Hybrid).

In [None]:
hybrid_corina_conf_json = {
  "docking": {
    "header": {
       "environment":{
        }
      },
    "ligand_preparation": {
      "embedding_pools": [
        {
          "pool_id": "Corina",                                      # Corina is used here as the ligand embedder
          "type": "Corina",                                         # but this can be changed to LigPrep or RDKit
          "parameters": {
              "prefix_execution": "module load corina"
          },
          "input": {
            "standardize_smiles": False,
            "input_path": ligands_path,
            "type": "smi"
           }
        }
      ]
    },
    "docking_runs": [
      {
        "backend": "Hybrid",
        "run_id": "Hybrid",
        "input_pools": ["Corina"],
        "parameters": {
          "prefix_execution": "module load oedocking",
          "parallelization": {
            "number_cores": 2
          },
          "receptor_paths": [hybrid_grid_path]
        },
        "output": {
          "poses": { "poses_path": hybrid_docked_poses_path },            # output path to save docked poses
          "scores": { "scores_path": hybrid_docked_scores_path }          # output path to save docked scores   
        }
      }
    ]
  }
}

with open(os.path.join(benchmarking_conf_jsons, "Hybrid_Corina.json"), "w+") as f:
    json.dump(hybrid_corina_conf_json, f, indent=2)

The above cell saves the `DockStream` `Hybrid` configuration `JSON` in the benchmarking `JSONs` folder. Note that `Hybrid` has far fewer tunable parameters than `Glide`.

We are now finished generating the `DockStream` configuration `JSONs`. There is no limit to how many configuration `JSONs` can be provided; the `benchmarking script` keeps running `DockStream` until all configuration `JSONs` have been executed. For the purpose of this notebook, only the 2 runs specified above (`Glide with LigPrep` and `Hybrid with Corina`) will be executed.
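
Before launching a long batch, it can be worth checking that every configuration file parses and contains the sections used in this notebook. The sketch below is one possible pre-flight check; `check_config` is a hypothetical helper, the set of checks is an assumption, and none of this is a feature of `DockStream` itself.

```python
import json
import os
import tempfile


def check_config(path):
    """Return a list of problems found in one configuration file (empty if none)."""
    try:
        with open(path) as handle:
            conf = json.load(handle)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error.msg}"]
    if not isinstance(conf, dict):
        return ["top level is not a JSON object"]
    docking = conf.get("docking")
    if not isinstance(docking, dict):
        return ['missing "docking" section']
    problems = []
    if not docking.get("docking_runs"):
        problems.append("no docking runs defined")
    return problems


# demonstrate on one well-formed and one truncated file in a throwaway folder
with tempfile.TemporaryDirectory() as tmp_dir:
    good = {"docking": {"docking_runs": [{"backend": "Hybrid"}]}}
    good_path = os.path.join(tmp_dir, "good.json")
    with open(good_path, "w") as handle:
        json.dump(good, handle)
    broken_path = os.path.join(tmp_dir, "broken.json")
    with open(broken_path, "w") as handle:
        handle.write('{"docking": {')
    report = {os.path.basename(p): check_config(p) for p in (good_path, broken_path)}

for name, problems in report.items():
    print(name, "->", problems or "ok")
```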

## Step 4: Execute the Benchmarking Script

We are now ready to execute batch `DockStream` runs. Call the `benchmarking script` from the command line and provide the `-input_path` argument, which is the path to the folder containing all the `DockStream` configuration `JSONs`.

In [None]:
# execute this in a command-line environment after replacing the parameters
!{dockstream_env}/bin/python {benchmarking_script} -input_path {benchmarking_conf_jsons}
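
The `!` prefix used above only works inside Jupyter/IPython. From a plain Python script, a similar call could be made with `subprocess`. The sketch below assembles the same command as an argument list; the helper names and the placeholder paths are assumptions, and the runner is exercised with a harmless stand-in command rather than an actual docking job.

```python
import os
import subprocess
import sys


def build_benchmark_command(env_path, script_path, input_path):
    """Assemble the same command the notebook cell runs, as an argument list."""
    python_bin = os.path.join(env_path, "bin", "python")
    return [python_bin, script_path, "-input_path", input_path]


def run_command(command):
    """Run a command, capture its output, and return (return code, stdout, stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# placeholder paths, for illustration only
command = build_benchmark_command(
    "/path/to/envs/DockStream",
    "/path/to/DockStream/benchmarking.py",
    "/path/to/benchmarking_conf_jsons",
)
print(" ".join(command))

# the runner itself is exercised here with a harmless stand-in command
code, out, err = run_command([sys.executable, "-c", "print('done')"])
print(code, out.strip())
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.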

As with any `DockStream` run, the docked poses and scores are written to `SDF` and `CSV` files as specified in the configuration `JSONs`. As a final note, the `benchmarking script` prints an error message if an invalid path is provided, and it reports which `DockStream` run failed together with the associated error traceback.
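
Once the runs have finished, a first look at a scores file can be taken with the standard `csv` module. The sketch below only reports the column names and the number of rows, so it makes no assumption about which columns `DockStream` writes; `summarize_scores` is a hypothetical helper, and the column names and values in the demonstration file are made up.

```python
import csv
import os
import tempfile


def summarize_scores(csv_path):
    """Return (column names, number of data rows) for a scores CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


# demonstrate on a throwaway file; the header and values here are invented
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv = os.path.join(tmp_dir, "demo_scores.csv")
    with open(demo_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ligand", "value"])
        writer.writerow(["lig_0", "-7.1"])
        writer.writerow(["lig_1", "-6.3"])
    columns, n_rows = summarize_scores(demo_csv)

print(columns, n_rows)
```

In this notebook, the helper could be pointed at `glide_docked_scores_path` or `hybrid_docked_scores_path` after the batch has completed.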