# PockDock

This notebook demonstrates how a variety of different tools can be glued together into an efficient and flexible workflow using **Crossflow**.

Some basic understanding of **Crossflow** is assumed, e.g. that you have completed either the **Amber** or **Gromacs** example workflow.

The workflow downloads a protein-ligand complex from the PDB, runs fpocket to find pockets in the receptor, then docks the ligand back into the largest pocket found. It then calculates the RMSD between the crystal-structure coordinates of the ligand and those of each docking pose, before and after least-squares fitting.

The notebook requires **fpocket**, **AutoDock Tools** and **AutoDock Vina** to be installed locally. If you have **Docker** installed, you can use [pinda](https://bitbucket.org/claughton/pinda/src/master/) to install them.

In [None]:
from crossflow import filehandling, kernels, clients
import sys
from urllib.request import urlretrieve
import numpy as np
import mdtraj as mdt

Create a crossflow client, connected to a local pool of workers:

In [None]:
client = clients.Client(local=True)
client.client

Make the kernels for **fpocket** and **Vina**, and functions to convert between file formats.

In [None]:
# The fpocket kernel:
fpocket = kernels.SubprocessKernel('fpocket -f x.pdb')
fpocket.set_inputs(['x.pdb'])
fpocket.set_outputs(['x_out/x_out.pdb'])

In [None]:
# The vina kernel:
vina = kernels.SubprocessKernel('vina --receptor r.pdbqt --ligand l.pdbqt --out out.pdbqt --log dock.log'
                                 ' --center_x {xc} --center_y {yc} --center_z {zc}'
                                 ' --size_x {sx} --size_y {sy} --size_z {sz}')
vina.set_inputs(['r.pdbqt', 'l.pdbqt', 'xc', 'yc', 'zc', 'sx', 'sy', 'sz'])
vina.set_outputs(['out.pdbqt', 'dock.log'])
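
To make the role of the `{xc}`-style placeholders concrete, the sketch below fills the same command template with plain `str.format`. It only illustrates the command line that results. How Crossflow performs the substitution internally is not shown in this notebook, and the box values used here are hypothetical.

```python
# Illustration only: fill the vina command template with example box values.
template = ('vina --receptor r.pdbqt --ligand l.pdbqt --out out.pdbqt --log dock.log'
            ' --center_x {xc} --center_y {yc} --center_z {zc}'
            ' --size_x {sx} --size_y {sy} --size_z {sz}')

# Hypothetical box centre and size in Angstroms (not taken from a real run).
box = dict(xc=10.0, yc=-2.5, zc=7.25, sx=20.0, sy=18.0, sz=22.0)

# Each named placeholder is replaced by the matching value.
command = template.format(**box)
print(command)
```

The file names (`r.pdbqt`, `l.pdbqt`, `out.pdbqt`, `dock.log`) stay fixed in the template, while the six numeric inputs vary from job to job.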

In [None]:
# AutoDock Tools-based kernels to prepare receptor and ligand for docking:
prep_receptor = kernels.SubprocessKernel('adt prepare_receptor4.py -r x.pdb -o x.pdbqt')
prep_receptor.set_inputs(['x.pdb'])
prep_receptor.set_outputs(['x.pdbqt'])

prep_ligand = kernels.SubprocessKernel('adt prepare_ligand4.py -l x.pdb -o x.pdbqt')
prep_ligand.set_inputs(['x.pdb'])
prep_ligand.set_outputs(['x.pdbqt'])

In [None]:
# A FunctionKernel to convert pdbqt files back to pdb ones, because the OpenBabel
#  command to do this seems to be broken...
def pdbqt2pdb(infile):
    outfile = 'tmp.pdb'
    # Keep only the coordinate and model-delimiter records; PDBQT-specific
    # records (ROOT, BRANCH, TORSDOF, ...) are dropped.
    keep = ('ATOM', 'MODEL', 'ENDMDL')
    with open(infile, 'r') as fin, open(outfile, 'w') as fout:
        for line in fin:
            if line.startswith(keep):
                fout.write(line)
    return outfile

pdbqt_to_pdb = kernels.FunctionKernel(pdbqt2pdb)
pdbqt_to_pdb.set_inputs(['infile'])
pdbqt_to_pdb.set_outputs(['outfile'])
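
The record filtering used for the PDBQT to PDB conversion can be tried outside the workflow. The sketch below is a standalone version with a hypothetical helper name (`filter_pdbqt_records`) and a made-up, minimal PDBQT-like fragment. It is not output from a real docking run.

```python
import os
import tempfile

def filter_pdbqt_records(infile, outfile):
    """Copy only ATOM, MODEL and ENDMDL records from infile to outfile."""
    keep = ('ATOM', 'MODEL', 'ENDMDL')
    with open(infile) as fin, open(outfile, 'w') as fout:
        for line in fin:
            if line.startswith(keep):
                fout.write(line)
    return outfile

# A made-up fragment for demonstration: one model containing one atom.
sample_lines = [
    'MODEL 1',
    'REMARK made-up example record',
    'ROOT',
    'ATOM      1  C1  LIG A   1       1.000   2.000   3.000  0.00  0.00     0.000 C',
    'ENDROOT',
    'TORSDOF 0',
    'ENDMDL',
]

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'poses.pdbqt')
    dst = os.path.join(tmpdir, 'poses.pdb')
    with open(src, 'w') as f:
        f.write('\n'.join(sample_lines) + '\n')
    filter_pdbqt_records(src, dst)
    with open(dst) as f:
        kept = f.read().splitlines()

print(kept)
```

Only the `MODEL`, `ATOM` and `ENDMDL` lines survive, which is enough for MDTraj to read each pose as a separate frame.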

Now we construct the workflow. For convenience it's split up here into sections.

In [None]:
# Download the pdb file, and split into receptor and ligand:
pdb_file = '1qy1.pdb'
ligand_residue_name = 'PRZ'
path = urlretrieve('http://files.rcsb.org/download/' + pdb_file, pdb_file)
# (On Python 2, import urlretrieve from urllib rather than from urllib.request.)
hydrated_complex = mdt.load(pdb_file)
receptor_atoms = hydrated_complex.topology.select('protein')
ligand_atoms = hydrated_complex.topology.select('resname {}'.format(ligand_residue_name))
receptor = mdt.load(pdb_file, atom_indices=receptor_atoms)
ligand = mdt.load(pdb_file, atom_indices=ligand_atoms)

In [None]:
# Run fpocket:
pockets = client.submit(fpocket, receptor)

In [None]:
# Find the centre and extents of the largest pocket:
buffer = 2.0
t = mdt.load(str(pockets.result()))
site = t.topology.select('resname STP and residue 1') # This should be the largest pocket
# In the next two lines, the factor of 10 is a conversion from nanometres to Angstroms:
xc, yc, zc = tuple(10 * (t.xyz[0][site].min(axis=0) + t.xyz[0][site].max(axis=0)) / 2)
sx, sy, sz = tuple(10 * (t.xyz[0][site].max(axis=0) - t.xyz[0][site].min(axis=0)) + buffer)
print(xc, yc, zc)
print(sx, sy, sz)
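
The box calculation above can be checked on a toy set of points. The sketch below repeats the same arithmetic with plain NumPy: the centre is the midpoint of the per-axis minimum and maximum, and the size is the per-axis extent plus the buffer. The function name `docking_box` and the coordinates are hypothetical, chosen only for illustration.

```python
import numpy as np

def docking_box(coords_nm, buffer=2.0):
    """Return (centre, size) in Angstroms for coordinates given in nanometres."""
    coords = 10.0 * np.asarray(coords_nm, dtype=float)  # nm -> Angstroms
    lo = coords.min(axis=0)
    hi = coords.max(axis=0)
    centre = (lo + hi) / 2
    size = (hi - lo) + buffer  # buffer is added once per axis, not per side
    return centre, size

# Three made-up points, in nanometres.
points_nm = np.array([[1.0, 2.0, 3.0],
                      [1.5, 2.5, 3.5],
                      [1.25, 2.25, 3.25]])

centre, size = docking_box(points_nm)
print(centre)
print(size)
```

Note that a buffer of 2.0 extends the box by 1.0 Angstrom on each side of the pocket along every axis.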

In [None]:
# Prepare receptor and ligand for docking:
receptor_qt = client.submit(prep_receptor, receptor)
ligand_qt = client.submit(prep_ligand, ligand)

In [None]:
# Run vina:
docks, logfile = client.submit(vina, receptor_qt, ligand_qt, xc, yc, zc, sx, sy, sz)

In [None]:
# Check the log file:
with open(logfile.result()) as f:
    lines = f.read()
print(lines)

In [None]:
# Convert the docked poses back to PDB format, and calculate unfitted and fitted RMSDs using MDTraj
# (this assumes each pose has the same atoms, in the same order, as the crystal ligand):
pdbout = client.submit(pdbqt_to_pdb, docks)
docktraj = mdt.load(str(pdbout.result()))
dxyz = docktraj.xyz - ligand.xyz
dxyz = (dxyz * dxyz).sum(axis=2).mean(axis=1)
rmsd = mdt.rmsd(docktraj, ligand) * 10.0
err = np.sqrt(dxyz) * 10.0
print('Mode Fitted   Unfitted')
print('      rmsd      rmsd')
for mode in range(docktraj.n_frames):  # Vina may return fewer than nine poses
    print('{:3d}   {:5.3f}    {:6.3f}'.format(mode+1, rmsd[mode], err[mode]))
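
The difference between the two columns is that the fitted RMSD is measured after optimal superposition, while the unfitted value compares the coordinates where they sit. The sketch below shows the distinction with plain NumPy, using a Kabsch-style least-squares fit. It is an illustration under that assumption, not MDTraj's implementation, and the function names and coordinates are hypothetical.

```python
import numpy as np

def unfitted_rmsd(a, b):
    """RMSD between two (n_atoms, 3) arrays without any superposition."""
    diff = a - b
    return np.sqrt((diff * diff).sum(axis=1).mean())

def fitted_rmsd(a, b):
    """RMSD after least-squares superposition of a onto b (Kabsch)."""
    a0 = a - a.mean(axis=0)  # remove translation
    b0 = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a0.T @ b0)  # SVD of the covariance matrix
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return unfitted_rmsd(a0 @ rot.T, b0)

# Made-up coordinates: b is a rotated and translated copy of a.
a = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])  # 90 degree rotation about z
b = a @ rz.T + np.array([1.0, 2.0, 3.0])

print(unfitted_rmsd(a, b))
print(fitted_rmsd(a, b))
```

For docking, the unfitted value is usually the more telling one: a pose with the right shape in the wrong place has a low fitted RMSD but a high unfitted one.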