# Advanced features
Some complex features are available for multiple (or all) docking backends. This notebook describes how to use them and gives some background information on them.

## Write-out details
When writing out the poses (as `SDF`) and scores (as `CSV`), you can set the parameter `overwrite` to `false` (note that `JSON` booleans are lowercase, so there is no capital 'F'). `DockStream` then checks whether the specified file already exists. If it does, the first free number is prepended to the file name as a prefix, e.g. "0000_" (or "0001_" if that is already taken as well).

```
"output":
{
  "poses": { "poses_path": <path_to_poses>, "overwrite": false},
  "scores": { "scores_path": <path_to_scores>, "overwrite": false}
}
```
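The prefixing behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not `DockStream`'s actual implementation; the function name `first_free_path` is hypothetical.

```python
from pathlib import Path


def first_free_path(path: str) -> Path:
    """Return `path` if it is free, else the first free "NNNN_"-prefixed variant.

    Hypothetical helper: mimics the "overwrite": false behaviour described
    in the text, it is not taken from the DockStream code base.
    """
    target = Path(path)
    if not target.exists():
        return target
    index = 0
    while True:
        # Prepend a zero-padded, four-digit counter to the file name.
        candidate = target.with_name(f"{index:04d}_{target.name}")
        if not candidate.exists():
            return candidate
        index += 1
```

If `poses.sdf` and `0000_poses.sdf` both exist in the target folder, this sketch returns the path to `0001_poses.sdf`.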

Another option for `poses` is `mode`, which can be set to `best_per_enumeration`: even when `N` poses are calculated by the backend, only the top-scoring one is saved (per enumeration, see below).

```
"output":
{
  "poses": { "poses_path": <path_to_poses>, "mode": "best_per_enumeration"}
}
```
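To make the effect of `best_per_enumeration` concrete, the following sketch filters a list of poses down to one per enumeration. It is not `DockStream` code: the tuple layout and the function name are hypothetical, and it assumes that lower (more negative) scores are better, which is common for docking scores but depends on the backend.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (ligand_id, enumeration_id, conformer_id, score)
Pose = Tuple[int, int, int, float]


def best_per_enumeration(poses: List[Pose]) -> List[Pose]:
    """Keep only the best-scoring pose for each (ligand, enumeration) pair.

    Assumption: a lower score is a better score.
    """
    best: Dict[Tuple[int, int], Pose] = {}
    for pose in poses:
        key = (pose[0], pose[1])
        if key not in best or pose[3] < best[key][3]:
            best[key] = pose
    # Return the survivors ordered by ligand and enumeration ID.
    return [best[key] for key in sorted(best)]
```

With two poses for each of two enumerations of one ligand, only two poses remain in the output, one per enumeration.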

To make it easier to match the `CSV` and `SDF` outputs, the following two tags are added to the poses:
* `original_smiles`: The `SMILES` that was used as input
* `smiles`: The actual `SMILES` used for embedding and docking. If no enumerations were calculated, this is always identical to `original_smiles`.

## Compound naming scheme
While it is possible to retain molecules' names throughout the process (when using `CSV` or `SDF` input, see respective notebook), the following scheme is used internally to uniquely identify any compound:

`<ligand_id>:<enumeration_id>:<conformer_id>` (all starting with '0').

The ligand ID refers to the input number, e.g. the first compound in a `smi` file will have '0', the second '1' and so on. If there are no enumerations (see below), there will be only one enumeration per ligand (i.e. all will have the enumeration ID '0'). The conformer ID refers to the pose number. The poses are ordered from "best" to "worst" and their number can be specified in the configuration file.

![](img/naming_scheme.png)
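When post-processing results, it can be handy to split such an identifier into its three parts. The following is a minimal sketch; `CompoundID` and `parse_compound_id` are hypothetical names for illustration and not part of `DockStream`.

```python
from typing import NamedTuple


class CompoundID(NamedTuple):
    """The three zero-based indices of the internal naming scheme."""
    ligand: int
    enumeration: int
    conformer: int


def parse_compound_id(identifier: str) -> CompoundID:
    """Split "<ligand_id>:<enumeration_id>:<conformer_id>" into integers."""
    parts = identifier.split(":")
    if len(parts) != 3:
        raise ValueError(f"Expected three ':'-separated fields, got: {identifier!r}")
    return CompoundID(*(int(part) for part in parts))
```

For example, `parse_compound_id("1:0:2")` describes the third pose of the first enumeration of the second input ligand.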

## Logging
As described in the notebook on the command-line scripts, you may want to set your own logging configuration (see also the `-debug` flag). The folder `config/logging` contains example configurations for different logging "levels". By default, the log file is called "dockstream_run.log", but you can set a different path in the header region of the configuration file:

```
"header":
{
  "logging": {
    "logfile": <path_to_log_file>
  }
}
```

## Prefix execution
External programs might require specific preparation steps (e.g. loading modules) before they can be used. In those cases, the optional parameter `prefix_execution` is joined to the binary call with `&&`. For example, if `rDock` is available in the module "rDock", you can load it like this at the docking stage:

```
...
  {
    "backend": "rDock",
    "run_id": "rDock",
    ...
    "parameters": {
      "prefix_execution": "module load rDock",
      ...
    },
    ...
  }
```
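The way the prefix is combined with the actual call can be sketched as follows. This is an illustration only, not `DockStream`'s implementation: the function `build_command` and the binary name `some_binary` with its `--input` argument are placeholders.

```python
import shlex
from typing import Optional, Sequence


def build_command(binary_call: Sequence[str],
                  prefix_execution: Optional[str] = None) -> str:
    """Join an optional prefix to the binary call with "&&".

    Hypothetical helper; the resulting string is meant to be run by a shell,
    so the binary call only executes if the prefix command succeeds.
    """
    call = " ".join(shlex.quote(arg) for arg in binary_call)
    if prefix_execution:
        return f"{prefix_execution} && {call}"
    return call


# Placeholder binary and arguments, for illustration only.
command = build_command(["some_binary", "--input", "ligands.sdf"],
                        prefix_execution="module load rDock")
```

Because of the `&&`, a failing prefix (e.g. a module that cannot be loaded) prevents the binary from being called at all.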

If `prefix_execution` is omitted, the binaries must be available via your `PATH` variable. You can also explicitly export environment variables via the `environment` block in the header region:

```
...
  "header": {
    "environment": {
      "export": [{
        "key": "OE_LICENSE", "value": "/home/user/oelicense/1.0/oe_license.seq1"
      }]
    }
  }
...
```
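Conceptually, each entry in the `export` list sets one environment variable before the backends are called. A minimal sketch of that idea, assuming the header has already been parsed into a Python dictionary (the function `apply_exports` is hypothetical and not taken from `DockStream`):

```python
import os
from typing import MutableMapping, Optional


def apply_exports(header: dict,
                  env: Optional[MutableMapping[str, str]] = None) -> MutableMapping[str, str]:
    """Copy all key/value pairs from header["environment"]["export"] into `env`.

    Hypothetical helper; if `env` is not given, the variables are set in the
    environment of the current process (os.environ).
    """
    target = os.environ if env is None else env
    for entry in header.get("environment", {}).get("export", []):
        target[entry["key"]] = entry["value"]
    return target
```

Passing a plain dictionary as `env` allows checking the result without touching the real process environment.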

## Tautomer and protonation enumeration: `TautEnum`
Some molecules have different protonation and tautomeric states. `DockStream` allows you to use [`TautEnum`](https://github.com/OpenEye-Contrib/TautEnum) to enumerate all "meaningful" states, embed and dock them all and then report the individual scores and poses for them. The following optional block should be inserted into the `input` block of the `ligand embedding`:

```
...
  "ligand_preparation": {
    "embedding_pools": [
    {
      "input": {
        "use_taut_enum": {
          "prefix_execution": "module load taut_enum",
          "enumerate_protonation": true
        },
        ...
      }
    }]
  }
...
```
Use `enumerate_protonation` (a `boolean` flag) to specify whether only tautomers should be enumerated (`false`) or protonation states as well (`true`).

![](img/enumeration.png)

## `SMIRKS`
Sometimes, we want to enforce a certain state on a (sub-)molecule. `DockStream` offers the possibility to use `transformations` in the embedding pools (see example below). At the moment, only `OpenEye`'s implementation of `SMIRKS` is supported.

```
...
  "input": {
    "transformations": 
      [{
        "type": "smirks",
        "backend": "OpenEye",
        "smirks": "[c:1]1[n:2][c:3]2[c:4]([n:5][c:6][n;X2:7][c:8]2[n:9]1)>>[c:1]1[n:2][c:3]2[c:4]([n:5][c:6][n+:7]([H])[c:8]2[n:9]1)",
        "fail_action": "keep"
      }]
  }
...
```

![](img/SMIRKS.png)