# Intro

This notebook will show you how to dock and score molecules using the asapdiscovery-docking module. 

This docking pipeline primarily focuses on the use-case for a structure-enabled drug discovery program, in which we have crystal structures of early molecules to use for *reference-based* docking. 

To this end, we have implemented an API that wraps the OpenEye POSIT docking algorithm, which, through its use of the HYBRID and SHAPEFIT algorithms, enables reference-based docking.

## The scope of this guide

This guide will show you how to dock and score molecules. For the *extremely* necessary precursor step of loading and prepping data, please see [protein_and_ligand_prep](%protein_and_ligand_prep.ipynb).

# Data

We will use the files from our test suite, since these molecules have already been prepped for docking.

In [16]:
from asapdiscovery.data.testing.test_resources import fetch_test_file
from asapdiscovery.data.schema.complex import PreppedComplex
from asapdiscovery.data.schema.ligand import Ligand

# load a prepped receptor (with its reference ligand) from an OpenEye design unit file
prepped_complex = PreppedComplex.from_oedu_file(
    fetch_test_file("Mpro-P2660_0A_bound-prepped_receptor.oedu"),
    ligand_kwargs={"compound_name": "test"},
    target_kwargs={"target_name": "test", "target_hash": "mock_hash"},
)

# load the ligand we want to dock from an SDF file
ligand = Ligand.from_sdf(
    fetch_test_file("Mpro-P0008_0A_ERI-UCB-ce40166b-17.sdf"), compound_name="test"
)

# Docking

As with any scientific endeavour, it's important to consider why you want to run docking and what you expect to get out of this.

| Context           | Goal                                                                     | Considerations                                                                                                                                                                                 |
|-------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Hit-to-lead       | Reference-based docking of 100s-1000s of molecules                       | High-throughput, low false-positives                                                                                                                                                           |
| Lead-Optimization | Reference-based docking of 10s-100s of molecules                         | High accuracy, low false-positives. You probably know generally where the molecules should bind, and the maximum common substructures of the molecules should probably be very closely aligned |
| Research          | Generate protein-ligand complexes for downstream analyses or ML training | High-throughput, but you can live with inaccurate poses                                                                                                                                        |


## Examining the POSITDocker

There are a *ton* of choices we can make for running docking, which will not be enumerated here. But in order to get a flavor for the options, we can examine the class attributes of the POSITDocker:

In [37]:
from asapdiscovery.docking.openeye import POSITDocker

In [38]:
docker = POSITDocker()

In [39]:
docker.dict()

{'type': 'POSITDocker',
 'relax': <POSIT_RELAX_MODE.NONE: 0>,
 'posit_method': <POSIT_METHOD.ALL: 15>,
 'use_omega': True,
 'omega_dense': False,
 'num_poses': 1,
 'allow_low_posit_prob': False,
 'low_posit_prob_thresh': 0.1,
 'allow_final_clash': False,
 'allow_retries': True}
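
To make two of these settings more concrete, here is a minimal sketch of how a probability cutoff such as `low_posit_prob_thresh` could interact with `allow_low_posit_prob`. The `keep_pose` helper and the example probabilities are hypothetical and only illustrate the idea; this is not the POSITDocker implementation, and the exact comparison it uses may differ.

```python
def keep_pose(
    probability: float,
    allow_low_posit_prob: bool = False,
    low_posit_prob_thresh: float = 0.1,
) -> bool:
    """Hypothetical filter: keep a pose unless its probability is below the cutoff."""
    if allow_low_posit_prob:
        # low-probability poses are explicitly allowed, so nothing is filtered
        return True
    return probability >= low_posit_prob_thresh


# made-up pose probabilities, for illustration only
example_probabilities = [0.85, 0.05, 0.30]
kept = [p for p in example_probabilities if keep_pose(p)]
print(kept)
```

With the defaults shown above, the pose with probability 0.05 would be dropped in this sketch, while setting `allow_low_posit_prob=True` would keep it.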

We can also look at the `.dock` method to see which arguments it accepts:

In [40]:
docker.dock?

We can see that it takes:
1) a list of DockingInputBase objects
2) an optional output directory
3) some optional Dask settings

Currently, we have 2 kinds of DockingInputBase objects implemented:
1) a complex-ligand pair (DockingInputPair)
2) a one-to-many ligand:complexes object (DockingInputMultiStructure)
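
The difference between the two input shapes can be sketched with plain dataclasses. `PairInput` and `MultiStructureInput` below are hypothetical stand-ins that use strings in place of real ligands and complexes; they are not the asapdiscovery classes and only show how the same ligands and structures can be organized either way.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PairInput:
    """Stand-in for a complex-ligand pair: one ligand, one complex."""
    ligand: str
    complex: str


@dataclass(frozen=True)
class MultiStructureInput:
    """Stand-in for a one-to-many input: one ligand, several complexes."""
    ligand: str
    complexes: tuple


ligands = ["lig-A", "lig-B"]
complexes = ["cmplx-1", "cmplx-2", "cmplx-3"]

# pairwise: one input per (ligand, complex) combination
pairs = [PairInput(lig, cmplx) for lig, cmplx in product(ligands, complexes)]

# one-to-many: one input per ligand, carrying every candidate complex
multis = [MultiStructureInput(lig, tuple(complexes)) for lig in ligands]

print(len(pairs), len(multis))
```

In this sketch the pairwise form yields one input per combination, while the one-to-many form keeps all candidate structures for a ligand together in a single input.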

In [43]:
from asapdiscovery.docking.docking import DockingInputPair, DockingInputMultiStructure

## Running simple docking 

### First we generate docking input

In [46]:
input_pair = DockingInputPair(ligand=ligand, complex=prepped_complex)

In [48]:
docker = POSITDocker() # let's just use defaults for now

In [53]:
results = docker.dock([input_pair])  # no Dask and no output directory; takes ~30 s on a MacBook Pro

This returns a list of POSITDockingResults objects!

In [61]:
result = results[0]

In [62]:
result.write_docking_files("docking_test")

# Scoring

# A few side notes

## Dask

We make heavy use of Dask throughout our code, which helps automate parallel processing and provides a nice dashboard for monitoring the progress of large-scale docking efforts. Because of the way Dask automates error handling, the behaviour of our code has occasionally differed depending on whether Dask was enabled. We have tried to stamp out any instances of this, but if you find another, please open an issue!
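
One general way to keep serial and parallel runs consistent is to catch failures inside each task, so that both modes return the same per-item results. The sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in for Dask, and `dock_one` is a hypothetical placeholder for a docking task; it illustrates the pattern only and is not how asapdiscovery handles errors.

```python
from concurrent.futures import ThreadPoolExecutor


def dock_one(name: str) -> str:
    """Hypothetical stand-in for a single docking task."""
    if name == "bad-ligand":
        raise ValueError(f"could not dock {name}")
    return f"pose-for-{name}"


def safe_dock_one(name: str):
    """Return (result, error) so one failing task never aborts the batch."""
    try:
        return dock_one(name), None
    except Exception as exc:
        return None, str(exc)


def run_batch(names, parallel: bool):
    """Run every task, serially or in a thread pool, preserving input order."""
    if parallel:
        with ThreadPoolExecutor(max_workers=2) as pool:
            return list(pool.map(safe_dock_one, names))
    return [safe_dock_one(name) for name in names]


names = ["lig-A", "bad-ligand", "lig-B"]
serial = run_batch(names, parallel=False)
threaded = run_batch(names, parallel=True)

# both modes report the same successes and the same failure
assert serial == threaded
```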

## Target-specific workflows

We have wrapped our library code in the `asapdiscovery-workflows` module, which puts everything together in a command-line interface (CLI). Unfortunately, as of version 0.4, these workflows only work with the targets specified for ASAP. We plan to change this in version 0.5.

To find out which targets can be passed to these workflows, you can use this:

In [11]:
from asapdiscovery.data.services.postera.manifold_data_validation import TargetTags

In [12]:
TargetTags.get_values()

['DENV-NS2B-NS3pro',
 'SARS-CoV-2-Mpro',
 'ZIKV-NS2B-NS3pro',
 'MERS-CoV-Mpro',
 'EV-D68-Capsid',
 'EV-D68-3Cpro',
 'SARS-CoV-2-N-protein',
 'SARS-CoV-2-Mac1',
 'EV-A71-3Cpro',
 'EV-A71-Capsid']
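
Since the workflows only accept these targets, it can be handy to check a target name before launching a run. The sketch below hard-codes the list printed above so that it runs on its own; `check_target` is a hypothetical helper, not part of asapdiscovery, and in practice the list would come from `TargetTags.get_values()`.

```python
# copied from the output above so the example is self-contained
SUPPORTED_TARGETS = [
    "DENV-NS2B-NS3pro",
    "SARS-CoV-2-Mpro",
    "ZIKV-NS2B-NS3pro",
    "MERS-CoV-Mpro",
    "EV-D68-Capsid",
    "EV-D68-3Cpro",
    "SARS-CoV-2-N-protein",
    "SARS-CoV-2-Mac1",
    "EV-A71-3Cpro",
    "EV-A71-Capsid",
]


def check_target(name: str) -> str:
    """Hypothetical helper: return the name if supported, else raise ValueError."""
    if name not in SUPPORTED_TARGETS:
        raise ValueError(
            f"Unknown target {name!r}; expected one of: {', '.join(SUPPORTED_TARGETS)}"
        )
    return name


print(check_target("SARS-CoV-2-Mpro"))
```

Note that the match is exact and case-sensitive in this sketch, so a name like `sars-cov-2-mpro` would be rejected.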