# Fast & Accurate PDB Prediction in Python Only

Being able to use AlphaFold2, ESM, and other recent neural-network structure prediction models is great. But what if you don't want to leave Python, spin up a GPU, or deal with containerization? Or what if you need to massively scale out PDB file prediction and creation?

You can predict a PDB file for proteins up to 1,024 residues long using the highly accurate ESMFold model, which BioLM.ai runs scaled out and pre-loaded into memory. The API docs show an [example protein and PDB string response](https://api.biolm.ai/#57b44720-556f-4a75-996d-2c05ea56ae6d).

```python
import os, sys
import py3Dmol

# Add Python module with utility functions for interacting with BioLM API
repo_root = os.path.join(os.getcwd(), '..', '..')
src_dir = os.path.join(repo_root, 'src')
if src_dir not in sys.path:
    sys.path.append(src_dir)

import biolm_util
```

```python
seq = "MAETAVINHKKRKNSPRIVQSNDLTEAAYSLSRDQKRMLYLFVDQIRKSDGTLQEHDGICEIHVAKYAEIFGLTSAEASKDIRQALKSFAGKEVVFYRPEEDAGDEKGYESFPWFIKRAHSPSRGLYSVHINPYLIPFFIGLQNRFTQFRLSETKEITNPYAMRLYESLCQYRKPDGSGIVSLKIDWIIERYQLPQSYQRMPDFRRRFLQVCVNEINSRTPMRLSYIEKKKGRQTTHIVFSFRDITSMTTG"

print("Sequence length: {}".format(len(seq)))
```

```
Sequence length: 251
```
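ESMFold on BioLM accepts sequences of up to 1,024 residues, so a quick client-side check can catch problems before a request is sent. The helper below is a hypothetical sketch, not part of `biolm_util` or the BioLM API. It assumes the endpoint expects only the 20 standard one-letter amino-acid codes, which may be stricter than what the service actually accepts.

```python
# Hypothetical client-side check; not part of biolm_util or the BioLM API.
MAX_LEN = 1024  # ESMFold length limit on BioLM, as stated in this notebook
# Assumption: only the 20 standard one-letter amino-acid codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str, max_len: int = MAX_LEN) -> str:
    """Return the cleaned, upper-cased sequence or raise ValueError."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    if len(cleaned) > max_len:
        raise ValueError(f"sequence has {len(cleaned)} residues; limit is {max_len}")
    unknown = sorted(set(cleaned) - STANDARD_AA)
    if unknown:
        raise ValueError(f"non-standard residue codes: {''.join(unknown)}")
    return cleaned


# Usage with the notebook's sequence (assumes `seq` is defined as above):
# seq = validate_sequence(seq)
```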


First, authenticate with the BioLM API and store the access and refresh tokens as environment variables:

```python
tok = biolm_util.get_api_token()

os.environ['BIOLM_ACCESS'] = tok['access']
os.environ['BIOLM_REFRESH'] = tok['refresh']
```

A BioLM server already has ESMFold loaded into memory, so the prediction comes back in seconds rather than after a lengthy model load.

```python
%%time

pdb_pred = biolm_util.esmfold_pdb(seq)
```

```
CPU times: user 19.2 ms, sys: 5.27 ms, total: 24.5 ms
Wall time: 6.06 s
```
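The timing above shows that almost all of the 6 s wall time is spent waiting on the server (only about 25 ms of local CPU time), so the client is I/O-bound. One way to scale out to many sequences is to issue requests concurrently from a thread pool. This is a minimal sketch using the standard library's `concurrent.futures`; `predict_many` is a hypothetical helper, and it assumes `biolm_util.esmfold_pdb` is safe to call from several threads and that the BioLM service tolerates the chosen level of concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def predict_many(
    sequences: Iterable[str],
    predict: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run `predict` on each unique sequence concurrently; map sequence -> PDB string.

    Threads suit this workload because each call mostly waits on the
    network rather than using local CPU.
    """
    unique = list(dict.fromkeys(sequences))  # drop duplicates, keep input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `unique`
        results = pool.map(predict, unique)
        return dict(zip(unique, results))


# Usage (hypothetical second sequence; max_workers is a guess to tune):
# pdbs = predict_many([seq, other_seq], biolm_util.esmfold_pdb, max_workers=4)
```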


If the model were starting cold, the first request would wait several minutes while this large model is loaded into memory; after that, subsequent API requests respond normally, without delay. This initial delay is known as a model cold start. It is usually not very noticeable, but it is here because ESMFold is one of the largest protein models to date.
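A request that arrives during a cold start may time out or fail. A common way to tolerate this is to retry with exponential backoff. The wrapper below is a generic sketch, not a documented BioLM feature: which exceptions signal a retryable cold-start failure is an assumption (it retries on any `Exception` by default), and the delay values are illustrative.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 5.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying with exponentially growing delays on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays of 5 s, 10 s, 20 s, ... capped at max_delay
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("attempts must be at least 1")


# Usage (assumes biolm_util is imported and `seq` is defined as above):
# pdb_pred = call_with_retries(lambda: biolm_util.esmfold_pdb(seq))
```

Injecting `sleep` keeps the backoff schedule testable without real waiting.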

## Visualize Structure in 3D

We have the PDB file contents as a string. We can use it directly to visualize the structure.
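Because the response is plain PDB-format text, it can also be written straight to disk for use in other structure tools. A minimal sketch; `save_pdb` is a hypothetical helper and the default file name is arbitrary.

```python
from pathlib import Path


def save_pdb(pdb_text: str, path: str = "esmfold_prediction.pdb") -> Path:
    """Write a PDB-format string to disk unchanged and return the file path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create parent folders if needed
    out.write_text(pdb_text)
    return out


# Usage (assumes `pdb_pred` holds the string returned above):
# save_pdb(pdb_pred, "predictions/esmfold_prediction.pdb")
```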

```python
# Check out file contents first
pdb_pred
```

```
'PARENT N/A\nATOM      1  N   MET A   1     -23.875  24.253 -12.341  1.00  0.94           N  \nATOM      2  CA  MET A   1     -22.469  24.648 -12.366  1.00  0.95           C  \nATOM      3  C   MET A   1     -21.661  23.849 -11.349  1.00  0.93           C  \nATOM      4  CB  MET A   1     -22.326  26.145 -12.088  1.00  0.94           C  \nATOM      5  O   MET A   1     -21.786  24.065 -10.142  1.00  0.84           O  \nATOM      6  CG  MET A   1     -22.895  27.030 -13.185  1.00  0.91           C  \nATOM      7  SD  MET A   1     -22.682  28.817 -12.829  1.00  0.94           S  \nATOM      8  CE  MET A   1     -24.263  29.171 -12.013  1.00  0.92           C  \nATOM      9  N   ALA A   2     -20.869  22.879 -11.773  1.00  0.91           N  \nATOM     10  CA  ALA A   2     -20.005  22.105 -10.886  1.00  0.92           C  \nATOM     11  C   ALA A   2     -18.715  22.861 -10.580  1.00  0.90           C  \nATOM     12  CB  ALA A   2     -19.687  20.746 -11.504  1.00  0.89           C  \nATO
```
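The last numeric column of each `ATOM` record is the PDB B-factor field. ESMFold output conventionally stores its per-residue pLDDT confidence there, and in this response the values appear to be on a 0 to 1 scale (for example `0.95` for the first residue's CA atom). Assuming that holds, a minimal parser based on the fixed-width PDB column positions can pull out one confidence value per residue from the CA atoms. `ca_bfactors` is a hypothetical helper, not part of `biolm_util`.

```python
from typing import Dict, Tuple


def ca_bfactors(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Map (chain ID, residue number) -> B-factor of that residue's CA atom.

    Fixed-width PDB columns (1-based, inclusive): atom name 13-16,
    chain ID 22, residue number 23-26, B-factor 61-66.
    """
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip PARENT, TER, END and other records
        if line[12:16].strip() != "CA":
            continue  # keep one atom per residue
        key = (line[21], int(line[22:26]))
        scores[key] = float(line[60:66])
    return scores


# Usage (assumes `pdb_pred` holds the string above and that its B-factor
# column carries pLDDT, as it appears to here):
# plddt = ca_bfactors(pdb_pred)
# print(f"{len(plddt)} residues, mean pLDDT {sum(plddt.values()) / len(plddt):.2f}")
```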

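```python
# Render the predicted structure as a cartoon, rainbow-colored from N- to C-terminus
view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js', width=800, height=400)
view.addModel(pdb_pred, 'pdb')  # pdb_pred is already a single PDB-format string
view.setStyle({'model': -1}, {'cartoon': {'color': 'spectrum'}})
view.zoomTo()  # center and fit the structure in the viewport
view.show()
```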
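Coloring by residue index (`spectrum`) shows chain topology; coloring by the B-factor column instead highlights prediction confidence, assuming it carries pLDDT as it appears to in this response. 3Dmol.js accepts a `colorscheme` with `prop`, `gradient`, `min`, and `max` keys for this. The helper below is a sketch that only builds the style dictionary; `plddt_cartoon_style` is hypothetical, and the 0.5 to 0.9 range is an illustrative choice for the 0 to 1 scale seen above, not a BioLM or ESMFold default.

```python
from typing import Any, Dict


def plddt_cartoon_style(low: float = 0.5, high: float = 0.9) -> Dict[str, Any]:
    """Build a 3Dmol.js cartoon style that colors residues by B-factor.

    With the 'roygb' gradient, values near `low` should appear red and
    values near `high` blue; values outside the range are clamped.
    """
    if low >= high:
        raise ValueError("low must be less than high")
    return {
        "cartoon": {
            "colorscheme": {"prop": "b", "gradient": "roygb", "min": low, "max": high}
        }
    }


# Usage with the viewer created above (assumes `view` exists):
# view.setStyle({'model': -1}, plddt_cartoon_style())
# view.show()
```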
