## Icolos Docking Workflow Demo
Icolos can perform automated docking, with support for advanced features such as ensemble docking and pose rescoring. 

In this notebook, we demonstrate an ensemble docking workflow in which the ligands are docked into multiple related receptor structures and the scores are aggregated afterwards. In this implementation, we use RDKit for embedding and AutoDock Vina as the docking backend.

Files required to execute the workflow are provided in the accompanying IcolosData repository, available at https://github.com/MolecularAI/IcolosData.

Note: we provide an `icoloscommunity` environment, which should be used for this notebook. It contains the Jupyter dependencies in addition to the Icolos production environment requirements, allowing you to execute workflows from within the notebook.

### Step 1: Prepare input files
The following files are required to start the docking run:
* receptor files (prepared in `pdbqt` format)
* SMILES strings for the compounds to dock, in `.smi` or `.csv` format
* Icolos config file: a `JSON` file containing the run settings. Templates for the most common workflows can be found in the `examples` folder of the main Icolos repository.
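
As a minimal sketch of preparing the compound input, the snippet below writes named SMILES strings to a `.smi` file. It assumes the common layout of one compound per line, with the SMILES string followed by a whitespace-separated name. The helper `write_smi` and the file name `compounds.smi` are hypothetical, and the exact layout Icolos expects should be checked against its documentation.

```python
import os
import tempfile


def write_smi(compounds, path):
    """Write a {name: SMILES} mapping to a .smi file, one compound per line."""
    with open(path, "w") as handle:
        for name, smiles in compounds.items():
            # Assumed layout: "<SMILES> <name>"
            handle.write(f"{smiles} {name}\n")
    return path


# Two of the compounds used in the workflow configuration of this notebook.
compounds = {
    "another_mol": "Nc1ccc(cc1N)C(F)(F)F",
    "aspirin": "O=C(C)Oc1ccccc1C(=O)O",
}

# A temporary directory keeps the sketch from touching the working tree.
smi_path = write_smi(compounds, os.path.join(tempfile.mkdtemp(), "compounds.smi"))
with open(smi_path) as handle:
    smi_lines = handle.read().splitlines()
```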

In [1]:
import os
import json
import subprocess
import pandas as pd

# set up some file paths to use the provided test data
# please amend as appropriate
icolos_path = "~/Icolos"
data_dir = "~/IcolosData"
output_dir = "../output"
config_dir = "../config/docking"
for path in [output_dir, config_dir]:
    # create the directory if it does not exist yet
    os.makedirs(path, exist_ok=True)
receptor_path = os.path.expanduser(os.path.join(data_dir, "AutoDockVina/1UYD_fixed.pdbqt"))



In [8]:
conf = {
    "workflow": {
        "header": {
            "workflow_id": "AutoDock Vina docking",
            "description": "Runs docking using AutoDock Vina and a predefined receptor file.",
            "environment": {
                "export": [
                ]
            },
            "global_variables": {
                "smiles": "another_mol:Nc1ccc(cc1N)C(F)(F)F;failure:CXXC;aspirin:O=C(C)Oc1ccccc1C(=O)O",
                "receptor_path": receptor_path
            }
        },
        "steps": [{
                "step_id": "rdkit_embedding",
                "type": "embedding",
                "settings": {
                    "arguments": {
                        "flags": ["-epik"],
                        "parameters": {
                            "protonate": True,
                            "method": "rdkit"
                        }
                    },
                    "additional": {
                    }
                },
                "input": {
                    "compounds": [{
                            "source": "{smiles}",
                            "source_type": "string"
                        }
                    ]
                }
            }, {
                "step_id": "ADV_receptor_1",
                "type": "vina_docking",
                "execution": {
                    "prefix_execution": "module load foss/2019a && ml AutoDock_Vina",
                    "parallelization": {
                        "cores": 4
                    },
                    "failure_policy": {
                        "n_tries": 3
                    }
                },
                "settings": {
                    "arguments": {
                        "flags": [],
                        "parameters": {
                        }
                    },
                    "additional": {
                        "configuration": {
                            "seed": 42,
                            "receptor_path": receptor_path,
                            "number_poses": 2,
                            "search_space": {
                                "--center_x": 3.3,
                                "--center_y": 11.5,
                                "--center_z": 24.8,
                                "--size_x": 15,
                                "--size_y": 10,
                                "--size_z": 10
                            }
                        },
                        "grid_ids": ["1UYD_1"]
                    }
                },
                "input": {
                    "compounds": [{
                            "source": "rdkit_embedding",
                            "source_type": "step"
                        }
                    ]
                }
            },
            {
                "step_id": "ADV_receptor_2",
                "type": "vina_docking",
                "execution": {
                    "prefix_execution": "module load foss/2019a && ml AutoDock_Vina",
                    "parallelization": {
                        "cores": 4
                    },
                    "failure_policy": {
                        "n_tries": 3
                    }
                },
                "settings": {
                    "arguments": {
                        "flags": [],
                        "parameters": {
                        }
                    },
                    "additional": {
                        "configuration": {
                            "seed": 42,
                            "receptor_path": receptor_path,
                            "number_poses": 2,
                            "search_space": {
                                "--center_x": 3.3,
                                "--center_y": 11.5,
                                "--center_z": 24.8,
                                "--size_x": 15,
                                "--size_y": 10,
                                "--size_z": 10
                            }
                        },
                        "grid_ids": ["1UYD_2"]
                    }
                },
                "input": {
                    "compounds": [{
                            "source": "rdkit_embedding",
                            "source_type": "step"
                        }
                    ]
                }
            },
            {
                "step_id": "data_manipulation",
                "type": "data_manipulation",
                "settings": {
                    "additional": {
                        "action": "no_action"
                    }
                },
                "input": {
                    "compounds": [{
                            "source": "ADV_receptor_1",
                            "source_type": "step"
                        }, {
                            "source": "ADV_receptor_2",
                            "source_type": "step"
                        }
                    ],
                    "merge": {
                        "compounds": True,
                        "merge_compounds_by": "id",
                        "enumerations": True,
                        "merge_enumerations_by": "id"
                    }
                },
                "writeout": [{
                        "compounds": {
                            "category": "conformers",
                            "selected_tags": ["docking_score"],
                            "aggregation": {
                                "mode": "best_per_compound",
                                "key": "docking_score",
                                "highest_is_best": False
                            }
                        },
                        "destination": {
                            "resource": os.path.join(output_dir, "ensemble_docking_adv.json"),
                            "type": "file",
                            "format": "JSON"
                        }
                    }, {
                        "compounds": {
                            "category": "conformers",
                            "selected_tags": ["docking_score", "grid_id"]
                        },
                        "destination": {
                            "resource": os.path.join(output_dir, "adv_ensemble_dock_results.csv"),
                            "type": "file",
                            "format": "CSV"
                        }
                    }
                ]
            }
        ]
    }
}


with open(os.path.join(config_dir, "adv_docking_conf.json"), 'w') as f:
    json.dump(conf, f, indent=4)
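
The two `vina_docking` steps above differ only in their `step_id` and `grid_ids`. For larger ensembles, the step dictionaries could be generated rather than written by hand. The following is a sketch under that assumption: `make_vina_step` is a hypothetical helper (not part of Icolos) that mirrors the keys used above, and the receptor file names are placeholders.

```python
import copy


def make_vina_step(step_id, grid_id, receptor_path, search_space, cores=4):
    """Return a `vina_docking` step dict mirroring the structure used above."""
    return {
        "step_id": step_id,
        "type": "vina_docking",
        "execution": {
            "prefix_execution": "module load foss/2019a && ml AutoDock_Vina",
            "parallelization": {"cores": cores},
            "failure_policy": {"n_tries": 3},
        },
        "settings": {
            "arguments": {"flags": [], "parameters": {}},
            "additional": {
                "configuration": {
                    "seed": 42,
                    "receptor_path": receptor_path,
                    "number_poses": 2,
                    # Copy so that steps do not share one mutable dict.
                    "search_space": copy.deepcopy(search_space),
                },
                "grid_ids": [grid_id],
            },
        },
        "input": {
            "compounds": [{"source": "rdkit_embedding", "source_type": "step"}]
        },
    }


search_space = {
    "--center_x": 3.3, "--center_y": 11.5, "--center_z": 24.8,
    "--size_x": 15, "--size_y": 10, "--size_z": 10,
}

# Placeholder receptor files; replace with the structures in your ensemble.
receptors = {"1UYD_1": "receptor_1.pdbqt", "1UYD_2": "receptor_2.pdbqt"}

docking_steps = [
    make_vina_step(f"ADV_receptor_{i}", grid_id, path, search_space)
    for i, (grid_id, path) in enumerate(receptors.items(), start=1)
]
```

The resulting list could be placed between the embedding and `data_manipulation` steps in `conf["workflow"]["steps"]`; the `input.compounds` list of the `data_manipulation` step would then need one entry per docking step.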

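The workflow can be executed by running the following command (with paths amended as necessary). Here it is launched from the notebook via `subprocess`; the same `icolos -conf <config file>` command can also be run in a terminal.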

In [12]:
# this run will take a few seconds to complete
docking_conf = os.path.join(config_dir, "adv_docking_conf.json")

command = f"icolos -conf {docking_conf}"
subprocess.run(command, shell=True)
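
`subprocess.run` does not raise an error when the command fails unless asked to, so a failed run can go unnoticed. One possible pattern, sketched below, is to capture the output and check the return code explicitly. `run_command` is a hypothetical helper, and a stand-in Python command is used so that the snippet runs without Icolos installed.

```python
import subprocess
import sys


def run_command(args):
    """Run a command without a shell and return (return code, stdout, stderr)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in command so the sketch runs anywhere; for the workflow this would be
# ["icolos", "-conf", docking_conf].
returncode, stdout, stderr = run_command([sys.executable, "-c", "print('done')"])
if returncode != 0:
    print(f"Command failed with exit code {returncode}:\n{stderr}")
```

Passing the arguments as a list avoids `shell=True` and any quoting issues with paths that contain spaces.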

We can now briefly inspect the CSV results file:

In [11]:
results = pd.read_csv(os.path.join(output_dir, "adv_ensemble_dock_results.csv"))
results.head()

| Unnamed: 0 | _Name | compound_name | grid_id | docking_score |
|---|---|---|---|---|
| 0 | 0:0:0 | 0 | 1UYD_1 | -7.0 |
| 1 | 0:0:1 | 0 | 1UYD_1 | -5.9 |
| 2 | 0:0:2 | 0 | 1UYD_2 | -7.0 |
| 3 | 0:0:3 | 0 | 1UYD_2 | -5.9 |
| 4 | 2:0:0 | 2 | 1UYD_1 | -7.4 |
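
The first `writeout` block in the configuration requests a `best_per_compound` aggregation with `highest_is_best` set to `False`. To illustrate the idea on the rows shown above (this is a sketch of the concept, not Icolos's own implementation), the snippet below keeps the lowest-scoring pose for each compound using only the standard library.

```python
import csv
import io

# The rows shown above, inlined so the sketch is self-contained.
csv_text = """_Name,compound_name,grid_id,docking_score
0:0:0,0,1UYD_1,-7.0
0:0:1,0,1UYD_1,-5.9
0:0:2,0,1UYD_2,-7.0
0:0:3,0,1UYD_2,-5.9
2:0:0,2,1UYD_1,-7.4
"""

best = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    score = float(row["docking_score"])
    compound = row["compound_name"]
    # Lower docking scores are better; strict "<" keeps the first pose on ties.
    if compound not in best or score < best[compound]["docking_score"]:
        best[compound] = {
            "_Name": row["_Name"],
            "grid_id": row["grid_id"],
            "docking_score": score,
        }
```

With the `results` DataFrame loaded earlier, a comparable selection is `results.loc[results.groupby("compound_name")["docking_score"].idxmin()]`.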
