In [1]:
import urllib.request  # urlretrieve lives in the urllib.request submodule
from dlpacker import DLPacker

# the lines below hide neverending warnings from biopython
import warnings
from Bio.PDB.PDBExceptions import PDBConstructionWarning
warnings.simplefilter('ignore', PDBConstructionWarning)

<p>For everything in this notebook to work, make sure you have <b>TensorFlow 2.x</b>, <b>pickle</b> and <b>biopython</b> installed. In addition to everything contained in the repo, you also need to download the weights and side chain library files from <a href = "https://drive.google.com/file/d/1zOpEltLPpZVYWNDgrAx7mjLLMWiBwkwA/view?usp=sharing">here</a> (around 140 MB).</p>
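<p>If you want to confirm the dependencies before running anything heavy, a minimal sketch like the one below can help. It assumes the import names `tensorflow`, `pickle` and `Bio` (biopython's import name) and the pip distribution name `tensorflow`; `check_dependencies` and `tensorflow_major_version` are hypothetical helpers written for this notebook, not part of DLPacker.</p>

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Import names assumed for the dependencies listed above (biopython imports as "Bio").
REQUIRED_MODULES = ("tensorflow", "pickle", "Bio")


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each module name to True if Python can locate it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def tensorflow_major_version():
    """Major version of the installed 'tensorflow' distribution, or None if not found.

    Assumes the distribution is named 'tensorflow'; variants such as
    'tensorflow-cpu' would need their own name here.
    """
    try:
        return int(version("tensorflow").split(".")[0])
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:<12}{'OK' if found else 'MISSING'}")
    print("TensorFlow major version:", tensorflow_major_version())
```

<p>`find_spec` only locates the module, so the check stays fast even when TensorFlow is installed; a major version of 2 is what the notebook expects.</p>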

<p><b>The code cell below downloads a PDB file from RCSB to serve as an example. Feel free to replace it with your own PDB structure:</b></p>

In [2]:
pdb = '4p0d'
# this line downloads an example PDB structure from RCSB
urllib.request.urlretrieve('http://files.rcsb.org/download/' + pdb + '.pdb', pdb + '.pdb')

('4p0d.pdb', <http.client.HTTPMessage at 0x7fa52c9c69a0>)
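<p>When re-running the notebook, it can be convenient to skip the download if the file is already on disk. The sketch below does that; `rcsb_url` and `fetch_pdb` are hypothetical helpers, and it uses the same RCSB download path as the cell above, but over HTTPS.</p>

```python
import os
import urllib.request

RCSB_DOWNLOAD = "https://files.rcsb.org/download/"


def rcsb_url(pdb_id):
    """Build the RCSB download URL for a PDB entry, e.g. '4p0d' -> .../4p0d.pdb."""
    return RCSB_DOWNLOAD + pdb_id + ".pdb"


def fetch_pdb(pdb_id, dest_dir="."):
    """Download <pdb_id>.pdb into dest_dir unless it already exists; return its path."""
    path = os.path.join(dest_dir, pdb_id + ".pdb")
    if not os.path.exists(path):
        urllib.request.urlretrieve(rcsb_url(pdb_id), path)
    return path
```

<p>Usage: `pdb_path = fetch_pdb('4p0d')` returns `'./4p0d.pdb'`, downloading it only on the first call.</p>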

<p><b>Basic usage example</b></p>
<p>Reconstructing a whole protein structure takes only two lines of code, as shown below.</p>
<p>All you need is a PDB structure to work with. It may or may not contain side chains; any existing side chains are ignored. Just make sure the backbone is present: residues with missing backbone atoms are skipped, as the output of the next cell shows for residue 35.</p>
<p>As described in the paper, there are three order options:<br/>1. <b>sequence</b> - the simplest strategy, which restores all residues from the N- to the C-terminus. It is the fastest of the three, but performs slightly worse than the other two.<br/>2. <b>natoms</b> - a slightly slower but better strategy that sorts residues by the number of backbone atoms around them and restores side chains in the most crowded microenvironments first. This roughly corresponds to restoring side chains in the protein’s interior first and then gradually moving to its exterior.<br/>3. <b>score</b> - the best-performing strategy, but also the slowest: it takes twice as long as the other two because it is a two-stage process. First, predictions are made for each amino acid (without actually restoring the side chains), and residues are sorted by their prediction score, normalized by the number of atoms in the side chain. Side chains are then restored in a second pass.</p>
<p>On average, the performance difference between the three is not large, so pick the one that best fits your trade-off between speed and accuracy.</p>
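<p>To make the three orderings concrete, here is a toy sketch of how each one could rank residues. It is not DLPacker's implementation: the residue records, the 10.0 crowding radius and the score values are all hypothetical, and whether a higher or lower normalized score should go first is left as a parameter because the text above does not say. It also ignores the fact that, in the real procedure, side chains restored earlier become part of the environment for later ones.</p>

```python
import math

# Hypothetical residue record: {"id": (number, chain, name), "ca": (x, y, z),
#                               "score": float, "n_sc_atoms": int}


def order_sequence(residues):
    """'sequence': N- to C-terminus within each chain (sort by chain, then number)."""
    return sorted(residues, key=lambda r: (r["id"][1], r["id"][0]))


def order_natoms(residues, backbone_atoms, radius=10.0):
    """'natoms': most crowded first, counting backbone atoms within `radius` of each CA.

    The default radius is an assumed value for illustration only.
    """
    def crowding(r):
        return sum(1 for a in backbone_atoms if math.dist(a, r["ca"]) <= radius)
    return sorted(residues, key=crowding, reverse=True)


def order_score(residues, descending=True):
    """'score': sort by a first-pass score divided by the side chain atom count.

    The sort direction is an assumption; max(..., 1) guards against zero atoms.
    """
    return sorted(residues,
                  key=lambda r: r["score"] / max(r["n_sc_atoms"], 1),
                  reverse=descending)
```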

In [5]:
%%time
dlp = DLPacker(pdb + '.pdb')
dlp.reconstruct_protein(order = 'sequence', output_filename = pdb + '_repacked.pdb')

Missing backbone atoms: residue (35, 'A', 'LYS')
Skipping residue: (35, 'A', 'LYS')
Working on residue: 469 (513, 'A', 'LEU') (506, 'A', 'THR')
Writing output file...
Done!
CPU times: user 40.7 s, sys: 1.42 s, total: 42.1 s
Wall time: 41.6 s
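<p>The timing above is for the `sequence` order only. To see the speed trade-off on your own structure, a small helper can repack the same file once per order. This is a sketch: `compare_orders` is hypothetical, and it only relies on the constructor and the `reconstruct_protein(order=..., output_filename=...)` call already used in this notebook.</p>

```python
import time


def compare_orders(packer_factory, pdb_path, orders=("sequence", "natoms", "score")):
    """Repack `pdb_path` once per order and return wall-clock seconds for each.

    `packer_factory` is expected to behave like DLPacker: called with a PDB path,
    it returns an object with reconstruct_protein(order=..., output_filename=...).
    """
    timings = {}
    stem = pdb_path[:-4] if pdb_path.endswith(".pdb") else pdb_path
    for order in orders:
        dlp = packer_factory(pdb_path)  # fresh object so runs do not share state
        start = time.perf_counter()
        dlp.reconstruct_protein(order=order, output_filename=f"{stem}_{order}.pdb")
        timings[order] = time.perf_counter() - start
    return timings
```

<p>Usage: `compare_orders(DLPacker, pdb + '.pdb')` writes `4p0d_sequence.pdb`, `4p0d_natoms.pdb` and `4p0d_score.pdb` and returns the reconstruction time for each (structure loading is excluded from the measurement).</p>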


<p><b>Mutation example #1</b></p>
<p>A slightly more complex example. The code below introduces two mutations into the original PDB structure and then repacks it. It makes an (unsuccessful) attempt at building a salt bridge inside the structure.</p>

In [6]:
%%time
dlp = DLPacker(pdb + '.pdb')
dlp.mutate_sequence((155, 'A', 'LEU'), 'ARG')
dlp.mutate_sequence((144, 'A', 'LEU'), 'ASP')
dlp.reconstruct_protein(order = 'natoms', output_filename = pdb + '_mutated.pdb')

Mutated (155, 'A', 'LEU') to ARG!
Mutated (144, 'A', 'LEU') to ASP!
Sorting residues...
Working on residue: 469 (473, 'A', 'ASN')
Writing output file...
Done!
CPU times: user 44 s, sys: 848 ms, total: 44.9 s
Wall time: 44.4 s


<p><b>Mutation example #2</b></p>
<p>The code below introduces a mutation into the original PDB structure and then repacks only a <b>part</b> of it. For this purpose we introduced a helper method `get_targets` that takes the residue we just mutated (referred to by its new name, here TRP) and assembles a list of its neighbours that lie within a specified radius:</p>

In [7]:
%%time
dlp = DLPacker(pdb + '.pdb')
dlp.mutate_sequence((72, 'A', 'PHE'), 'TRP')
targets = dlp.get_targets(target = (72, 'A', 'TRP'), radius = 10)
print('List of residues targeted for remodelling:\n', targets)

Mutated (72, 'A', 'PHE') to TRP!
List of residues targeted for remodelling:
 [(59, 'A', 'GLU'), (123, 'A', 'GLY'), (58, 'A', 'GLY'), (60, 'A', 'GLY'), (155, 'A', 'LEU'), (167, 'A', 'LEU'), (145, 'A', 'LEU'), (147, 'A', 'ALA'), (126, 'A', 'VAL'), (148, 'A', 'SER'), (61, 'A', 'VAL'), (73, 'A', 'LYS'), (158, 'A', 'LYS'), (105, 'A', 'PHE'), (71, 'A', 'ASN'), (72, 'A', 'TRP'), (57, 'A', 'ILE'), (70, 'A', 'ILE'), (82, 'A', 'GLU'), (62, 'A', 'TYR'), (124, 'A', 'ALA'), (156, 'A', 'VAL'), (157, 'A', 'THR'), (153, 'A', 'GLY'), (56, 'A', 'LYS'), (149, 'A', 'TYR'), (154, 'A', 'ASN'), (127, 'A', 'TYR'), (74, 'A', 'TYR'), (80, 'A', 'LEU'), (69, 'A', 'PHE'), (75, 'A', 'ALA'), (125, 'A', 'SER'), (146, 'A', 'ALA'), (150, 'A', 'ASN')]
CPU times: user 464 ms, sys: 354 ms, total: 818 ms
Wall time: 812 ms
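<p>The exact distance criterion `get_targets` uses is not described here. One plausible criterion, shown in the sketch below, keeps every residue that has at least one atom within the radius of any atom of the target; `neighbours_within_radius` and its input dictionary are hypothetical and only meant to illustrate the idea. As in the output above, the target residue itself is part of the result.</p>

```python
import math


def neighbours_within_radius(atoms_by_residue, target, radius=10.0):
    """Return residue ids with at least one atom within `radius` of any target atom.

    atoms_by_residue: dict mapping (number, chain, name) -> list of (x, y, z).
    The target itself always qualifies (distance 0 to its own atoms).
    """
    target_atoms = atoms_by_residue[target]
    selected = []
    for res_id, atoms in atoms_by_residue.items():
        if any(math.dist(a, t) <= radius for a in atoms for t in target_atoms):
            selected.append(res_id)
    return selected
```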


<p>The line below calls the method `reconstruct_region`, which takes the list of residues to remodel as an additional argument:</p>

In [8]:
dlp.reconstruct_region(targets = targets, order = 'sequence', output_filename = pdb + '_region.pdb')

Working on residue: 34 (167, 'A', 'LEU')
Writing output file...
Done!
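<p>The same pattern extends to several mutations at once. The sketch below is a hypothetical helper, `repack_around_mutations`, that only calls the methods used in this notebook (`mutate_sequence`, `get_targets`, `reconstruct_region`). It applies all mutations first and only then collects neighbours, so that no neighbour list refers to a residue by a name that a later mutation has changed; overlapping neighbourhoods are merged without duplicates.</p>

```python
def repack_around_mutations(dlp, mutations, radius=10, order="sequence",
                            output_filename="region.pdb"):
    """Apply several mutations, then repack only the residues near any of them.

    dlp: a DLPacker-like object.
    mutations: list of ((number, chain, old_name), new_name) pairs.
    Returns the merged list of targeted residues.
    """
    # 1) change the sequence for every mutation
    for old_id, new_name in mutations:
        dlp.mutate_sequence(old_id, new_name)
    # 2) collect neighbourhoods, addressing each mutated residue by its new name
    targets = []
    for (num, chain, _), new_name in mutations:
        for res in dlp.get_targets(target=(num, chain, new_name), radius=radius):
            if res not in targets:
                targets.append(res)
    # 3) restore side chains only for the merged region
    dlp.reconstruct_region(targets=targets, order=order, output_filename=output_filename)
    return targets
```

<p>Usage: `repack_around_mutations(DLPacker(pdb + '.pdb'), [((155, 'A', 'LEU'), 'ARG'), ((144, 'A', 'LEU'), 'ASP')], output_filename=pdb + '_mutated_region.pdb')`.</p>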


<p><b>Mutation example #3</b></p>
<p>Sometimes you may need to mutate just a single residue (e.g. one on the surface) without remodelling any other part of the protein. The code below does exactly that:</p>

In [9]:
%%time
dlp = DLPacker(pdb + '.pdb')
dlp.mutate_residue((156, 'A', 'VAL'), 'TYR')
dlp.save_structure(pdb + '_single.pdb')

Mutated (156, 'A', 'VAL') to TYR!
CPU times: user 472 ms, sys: 254 ms, total: 726 ms
Wall time: 725 ms


<p>Note the difference between the method `mutate_sequence` used above and `mutate_residue` used here: the first only changes the sequence and nothing else, so side chains are restored only by a later call to `reconstruct_protein` or `reconstruct_region`, while the second changes the sequence and also restores the side chain of the target residue.</p>
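<p>Because `mutate_residue` touches only the target residue, it lends itself to quick single-site scans. The sketch below, a hypothetical helper `single_site_scan`, builds one mutant file per substitution using only the constructor, `mutate_residue` and `save_structure` calls shown above. It creates a fresh packer for each mutant on the assumption that a DLPacker object accumulates changes, so mutations would otherwise stack; the output file naming scheme is also an arbitrary choice.</p>

```python
def single_site_scan(packer_factory, pdb_path, site, new_names, prefix):
    """Build one single-mutant structure per entry in new_names.

    site: (number, chain, original_name), e.g. (156, 'A', 'VAL').
    Returns the list of written file names.
    """
    outputs = []
    num, chain, _ = site
    for new_name in new_names:
        dlp = packer_factory(pdb_path)  # start from the original structure each time
        dlp.mutate_residue(site, new_name)
        out = f"{prefix}_{chain}{num}{new_name}.pdb"
        dlp.save_structure(out)
        outputs.append(out)
    return outputs
```

<p>Usage: `single_site_scan(DLPacker, pdb + '.pdb', (156, 'A', 'VAL'), ['TYR', 'PHE', 'TRP'], pdb)` writes `4p0d_A156TYR.pdb`, `4p0d_A156PHE.pdb` and `4p0d_A156TRP.pdb`.</p>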