# Reverse Docking

This notebook demonstrates how several separate tools can be combined into an efficient and flexible workflow using **Crossflow**.

The workflow docks a ligand (PRZ) to a set of protein structures (taken from the cryptosite database). For each protein structure, fpocket identifies all the pockets, and the ligand is then docked into each of them. The docking runs are distributed in parallel over the workers available in your cluster.

The notebook requires **fpocket**, **AutoDock Tools** and **AutoDock Vina** to be installed on the worker node(s) of your Dask cluster.

This cluster may be one you have already created (e.g. across a collection of local workstations, cloud resources, or nodes on an HPC service), identified by the file "scheduler.json" or equivalent (created when dask-scheduler is started with the `--scheduler-file` option), or it can be a Dask LocalCluster launched on the current machine from within this notebook. The first option will usually give much better performance, since a LocalCluster is limited to the cores of a single machine.
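Before choosing between the two options, it can help to check whether a scheduler file is actually present. The following is a minimal sketch, not part of Crossflow: the helper name `read_scheduler_address` is hypothetical, and it assumes the scheduler file is JSON with a top-level "address" entry.

```python
import json
from pathlib import Path


def read_scheduler_address(scheduler_file):
    """Return the scheduler address stored in a Dask scheduler file, or None.

    Assumption: the file is JSON with a top-level "address" entry.
    """
    path = Path(scheduler_file)
    if not path.is_file():
        return None  # no external cluster found; fall back to a LocalCluster
    with path.open() as f:
        info = json.load(f)
    return info.get("address")
```

If the helper returns `None`, leaving `scheduler_file = None` in the cell below launches a LocalCluster instead.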

In [1]:
from distributed import LocalCluster
from crossflow import filehandling, tasks, clients
import numpy as np
import mdtraj as mdt

Create a crossflow client, connected to a pool of workers:

In [2]:
scheduler_file = None # if you have created a cluster externally, replace with the path to yours
if scheduler_file is None:
    cluster = LocalCluster(resources={'tasks':1}) # the 'tasks' resource ensures only one concurrent docking job per worker
    client = clients.Client(cluster)
else:
    client = clients.Client(scheduler_file=scheduler_file)
client

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 5, Total threads: 10, Total memory: 16.00 GiB
Status: running, Using processes: True

Scheduler comm: tcp://127.0.0.1:54848 (started: just now)

Workers (each with 2 threads and 3.20 GiB memory):
Comm: tcp://127.0.0.1:54862, Dashboard: http://127.0.0.1:54866/status, Nanny: tcp://127.0.0.1:54851
Comm: tcp://127.0.0.1:54863, Dashboard: http://127.0.0.1:54870/status, Nanny: tcp://127.0.0.1:54852
Comm: tcp://127.0.0.1:54864, Dashboard: http://127.0.0.1:54872/status, Nanny: tcp://127.0.0.1:54853
Comm: tcp://127.0.0.1:54861, Dashboard: http://127.0.0.1:54867/status, Nanny: tcp://127.0.0.1:54854
Comm: tcp://127.0.0.1:54865, Dashboard: http://127.0.0.1:54873/status, Nanny: tcp://127.0.0.1:54855

Make the SubprocessTasks for **fpocket** and **Vina**, and FunctionTasks for other tasks:

In [3]:
# The fpocket task:
fpocket = tasks.SubprocessTask('fpocket -f x.pdb')
fpocket.set_inputs(['x.pdb'])
fpocket.set_outputs(['x_out/x_out.pdb'])

In [9]:
# The vina task:
vina = tasks.SubprocessTask('vina --receptor r.pdbqt --ligand l.pdbqt --out out.pdbqt'
                                 ' --center_x {xc} --center_y {yc} --center_z {zc}'
                                 ' --size_x {sx} --size_y {sy} --size_z {sz} > dock.log')
vina.set_inputs(['r.pdbqt', 'l.pdbqt', 'xc', 'yc', 'zc', 'sx', 'sy', 'sz'])
vina.set_outputs(['out.pdbqt', 'dock.log'])
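To make the role of the `{xc}`-style placeholders concrete, here is a small illustration of how such a template expands into a full command line. It is only a sketch: plain `str.format` stands in for whatever substitution Crossflow performs internally, and the box values are invented.

```python
# Illustration only: str.format stands in for Crossflow's own substitution.
template = ('vina --receptor r.pdbqt --ligand l.pdbqt --out out.pdbqt'
            ' --center_x {xc} --center_y {yc} --center_z {zc}'
            ' --size_x {sx} --size_y {sy} --size_z {sz} > dock.log')

# Hypothetical box: centre and extents in Angstroms
box = dict(xc=12.5, yc=-3.0, zc=40.1, sx=18.0, sy=20.0, sz=16.5)

command = template.format(**box)
print(command)
```

The file names (`r.pdbqt`, `l.pdbqt`, `out.pdbqt`, `dock.log`) are left untouched by the substitution; only the six numeric parameters change from one pocket to the next.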

In [5]:
# AutoDock Tools-based task to prepare the ligand for docking:
prep_ligand = tasks.SubprocessTask('adt prepare_ligand4.py -l x.pdb -o x.pdbqt')
prep_ligand.set_inputs(['x.pdb'])
prep_ligand.set_outputs(['x.pdbqt'])

In [6]:
def _get_bounding_boxes(pockets):
    '''
    Find the centre and extents of each of the pockets found by fpocket
    
    Args:
        pockets (str): Name of the pdb format file produced by fpocket
        
    Returns:
        list of tuples: the pocket centres and extents in x/y/z - in Angstroms
    '''
    buffer = 2.0  # total padding added to each box dimension, in Angstroms
    t = mdt.load(pockets)
    n_pockets = len([r for r in t.topology.residues if r.name == 'STP'])
    bounding_boxes = []
    
    for ip in range(n_pockets):
        site = t.topology.select('resname STP and residue {}'.format(ip + 1))
        # In the next two lines, the factor of 10 is a conversion from nanometres to Angstroms:
        xc, yc, zc = tuple(10 * (t.xyz[0][site].min(axis=0) + t.xyz[0][site].max(axis=0)) / 2)
        sx, sy, sz = tuple(10 * (t.xyz[0][site].max(axis=0) - t.xyz[0][site].min(axis=0)) + buffer)
        bounding_boxes.append((xc, yc, zc, sx, sy, sz))
    return bounding_boxes

# Now make a FunctionTask for this:
get_bounding_boxes = tasks.FunctionTask(_get_bounding_boxes)
get_bounding_boxes.set_inputs(['pockets'])
get_bounding_boxes.set_outputs(['bounding_boxes'])
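The box arithmetic in `_get_bounding_boxes` can be checked in isolation with NumPy alone. The sketch below repeats the same calculation on a made-up set of coordinates; the function name `bounding_box` and the toy pocket are hypothetical and are not part of the workflow.

```python
import numpy as np


def bounding_box(coords_nm, buffer=2.0):
    """Centre and extents (Angstroms) of the box enclosing coords_nm (nanometres)."""
    coords = 10.0 * np.asarray(coords_nm, dtype=float)  # nm -> Angstroms
    lo = coords.min(axis=0)
    hi = coords.max(axis=0)
    centre = (lo + hi) / 2
    extents = (hi - lo) + buffer  # buffer is the total padding per dimension
    return (*centre, *extents)


# Hypothetical pocket made of three points (coordinates in nm)
toy_pocket = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [1.5, 3.0, 4.0]]
xc, yc, zc, sx, sy, sz = bounding_box(toy_pocket)
print(xc, yc, zc, sx, sy, sz)
```

For this toy pocket the centre is (15, 30, 40) Angstroms and the extents are (12, 22, 22) Angstroms, i.e. the raw spans of 10, 20 and 20 Angstroms plus the 2 Angstrom buffer.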

In [7]:
def best_affinity(logfiles):
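    '''
    Search a set of docking log files (given as futures) and return the best
    (most negative) top-ranked affinity found across them.
    Returns 0.0 if no top-ranked pose is found in any log file.
    '''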
    best_a = 0.0
    for logfile in logfiles:
        for line in logfile.result().read_text().split('\n'):
            if '   1   ' in line:
                best_a = min(float(line.split()[1]), best_a)
    return best_a
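Because `best_affinity` starts from 0.0, a run in which no result line was parsed cannot be told apart from a genuine score of zero. One possible alternative is sketched below, under the assumption that each result row begins with the mode number followed by the affinity (the layout `best_affinity` relies on). The function name `top_pose_affinity` and the sample text are hypothetical. The sample is synthetic and is not real Vina output.

```python
def top_pose_affinity(log_text):
    """Return the affinity on the first row whose leading token is '1', else None."""
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == '1':
            try:
                return float(fields[1])
            except ValueError:
                continue  # leading '1' but no numeric second column
    return None


# Synthetic text laid out the way best_affinity expects (mode, then affinity);
# the numbers are invented for illustration.
sample = "mode affinity\n   1   -5.2\n   2   -4.9\n"
print(top_pose_affinity(sample))
```

Returning `None` when nothing matches lets the caller decide how to treat a failed or empty docking run.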

Here is the workflow. The protein targets are processed one after another, but the dockings into the pockets of each target run in parallel (as far as the available resources allow).
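The submit-then-gather pattern behind this can be illustrated with the standard library alone. In the sketch below, `ThreadPoolExecutor` stands in for the Crossflow client and `fake_dock` for a docking job. Both the target names and the scores are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_dock(target, pocket):
    """Stand-in for a docking job; returns a made-up score for illustration."""
    return -1.0 * (len(target) + pocket)


targets = {'aaaa': 3, 'bbbb': 2}  # hypothetical target -> number of pockets
best = {}

with ThreadPoolExecutor(max_workers=4) as pool:
    for target, n_pockets in targets.items():           # targets: one after another
        futures = [pool.submit(fake_dock, target, p)    # pockets: all submitted at once
                   for p in range(n_pockets)]
        best[target] = min(f.result() for f in futures)  # block until all have finished

print(best)
```

As in the workflow, each `submit` call returns a future immediately, and the loop only waits when `result()` is called to collect the scores for the current target.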

In [10]:
fh = filehandling.FileHandler()
ligand = fh.load('prz.pdb')
# prepare ligand for docking
ligand_qt = client.submit(prep_ligand, ligand)
vina.set_constant('l.pdbqt', ligand_qt) # as this never changes, make a constant

with open('complexes.list') as f:
    receptors = f.readlines()
pdbcodes = [r[:4].lower() for r in receptors]


for pdbcode in pdbcodes:
    
    receptor_qt = fh.load(f'./receptors/{pdbcode}_receptor.pdbqt')
    
    # Run fpocket:
    pockets = client.submit(fpocket, receptor_qt)

    # Find the dimensions of each pocket
    bounding_boxes = client.submit(get_bounding_boxes, pockets)

    # Run vina on all potential pockets:
    docks = []
    logfiles = []
    for bounding_box in bounding_boxes.result():
        dock, logfile = client.submit(vina, receptor_qt, *bounding_box, resources={'tasks': 1})
        docks.append(dock)
        logfiles.append(logfile)

    # Look through the log files for each pocket to find the best:
    print(f'{pdbcode}: testing {len(docks)} pockets...')
    best_a = best_affinity(logfiles)
    print(f'{pdbcode}: best docking score: {best_a}')
    

2iuz: testing 18 pockets...
2iuz: best docking score: -5.236
1yv3: testing 57 pockets...
1yv3: best docking score: -6.441
2jds: testing 20 pockets...
2jds: best docking score: -5.35
1lic: testing 5 pockets...
1lic: best docking score: -4.77
2hka: testing 8 pockets...
2hka: best docking score: -5.465
3eks: testing 18 pockets...
3eks: best docking score: -5.339
1nx3: testing 16 pockets...
1nx3: best docking score: -4.769
2yqs: testing 24 pockets...
2yqs: best docking score: -4.963
2wi7: testing 12 pockets...
2wi7: best docking score: -5.327
1br6: testing 19 pockets...
1br6: best docking score: -5.206
1j6z: testing 18 pockets...
1j6z: best docking score: -5.754
1tr5: testing 5 pockets...
1tr5: best docking score: -5.356
1fqc: testing 19 pockets...
1fqc: best docking score: -5.308
2hvd: testing 16 pockets...
2hvd: best docking score: -4.549
3f82: testing 21 pockets...
3f82: best docking score: -5.232
1g67: testing 13 pockets...
1g67: best docking score: -5.113
3bqm: testing 12 pockets...
3



1ghy: testing 16 pockets...
1ghy: best docking score: -5.016
1oke: testing 26 pockets...
1oke: best docking score: -4.835
3hzt: testing 28 pockets...
3hzt: best docking score: -5.252
2egh: testing 27 pockets...
2egh: best docking score: -5.564
3ip0: testing 9 pockets...
3ip0: best docking score: -6.35
2bu2: testing 26 pockets...
2bu2: best docking score: -5.814
2ohv: testing 13 pockets...
2ohv: best docking score: -5.127
3gqz: testing 19 pockets...
3gqz: best docking score: -5.159
2w5k: testing 8 pockets...
2w5k: best docking score: -4.001
1cib: testing 22 pockets...
1cib: best docking score: -5.325
2ofp: testing 14 pockets...
2ofp: best docking score: -4.664
2ixu: testing 27 pockets...
2ixu: best docking score: -4.398
3bl7: testing 23 pockets...
3bl7: best docking score: -4.449
1ryo: testing 23 pockets...
1ryo: best docking score: -5.768
2q8h: testing 30 pockets...
2q8h: best docking score: -5.004
2iyq: testing 11 pockets...
2iyq: best docking score: -5.704
1gky: testing 10 pockets...



3fqk: testing 36 pockets...
3fqk: best docking score: -5.293
2gz7: testing 19 pockets...
2gz7: best docking score: -4.533
2v57: testing 9 pockets...
2v57: best docking score: -5.118
1cib: testing 22 pockets...
1cib: best docking score: -5.355
1l5s: testing 53 pockets...
1l5s: best docking score: -5.489
3ixj: testing 28 pockets...
3ixj: best docking score: -5.247
2al4: testing 15 pockets...
2al4: best docking score: -4.255
3hqp: testing 30 pockets...
3hqp: best docking score: -5.724
1u1d: testing 19 pockets...
1u1d: best docking score: -4.866
2oo8: testing 17 pockets...
2oo8: best docking score: -5.086
1q0b: testing 28 pockets...
1q0b: best docking score: -5.473
2npq: testing 27 pockets...
2npq: best docking score: -5.302
2bys: testing 13 pockets...
2bys: best docking score: -6.191
2gir: testing 44 pockets...
2gir: best docking score: -5.032
3cfn: testing 7 pockets...
3cfn: best docking score: -4.205
1afq: testing 5 pockets...
1afq: best docking score: -4.919
1ow3: testing 11 pockets...