# Biomolecules

Everything you have seen so far still applies, but now we are moving on to much larger molecules: several hundred atoms for the smallest ones and close to 60,000 for the largest one in our examples. This is where a quick way of looking at motion becomes important. Understanding how biomolecules move helps us understand how they behave inside our bodies, and it is the basis for designing new medicines that inhibit or enhance certain motions, interactions, and so on.

All of the structures in the examples come from the Protein Data Bank (PDB; hence the .pdb file extension). You can access it online at http://www.rcsb.org; it's free and holds nearly all publicly available experimentally determined structures of biomolecules (some proprietary pharmaceutical data aside). By the end of the exercises you will be able to look up your favorite molecule and analyze it!

What we have prepared for you are some representative examples: DNA, a protein, and large protein complexes. For the large complexes the calculation gets cumbersome, so you can just use the .nmd files we provide.

# DNA

In [2]:
import prody

In [3]:
# The same parsePDB command as before can be used to fetch
# structures from the PDB. Each entry has a 4-character ID.
pdb = prody.parsePDB('1BNA')
pdb

<AtomGroup: 1BNA (566 atoms)>

In [6]:
# PDB files of biomolecules usually contain more than the molecule we want:
# other small molecules, ions or water. Let's look at what we have
# (getResnames() returns one residue name per atom).
pdb.getResnames()

array(['DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC',
       'DC', 'DC', 'DC', 'DC', 'DC', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DG', 'DG', 'DG', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC',
       'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC',
       'DC', 'DC', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT',
       'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT',
       'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', ...  (output truncated)
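Because getResnames() returns one name per atom, the listing above is long and repetitive. A compact summary can be obtained with NumPy's `unique`. The sketch below runs on a small, hypothetical array of residue names so that it is self-contained; on the real structure you would pass `pdb.getResnames()` instead.

```python
import numpy as np

# Hypothetical per-atom residue names, standing in for pdb.getResnames()
resnames = np.array(['DC', 'DC', 'DG', 'DG', 'DG', 'DA', 'HOH', 'HOH'])

# Count how many atoms carry each residue name (HOH is water in PDB files)
names, counts = np.unique(resnames, return_counts=True)
summary = dict(zip(names.tolist(), counts.tolist()))
print(summary)  # {'DA': 1, 'DC': 2, 'DG': 3, 'HOH': 2}
```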

In [8]:
# We are not interested in the water, so let's select everything else (the DNA):
dna = pdb.select('not water')
dna.getResnames()

array(['DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC',
       'DC', 'DC', 'DC', 'DC', 'DC', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DG', 'DG', 'DG', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC',
       'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC', 'DC',
       'DC', 'DC', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG', 'DG',
       'DG', 'DG', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA', 'DA',
       'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT',
       'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT',
       'DT', 'DT', 'DT', 'DT', 'DT', 'DT', 'DT', ...  (output truncated)

In [9]:
# DNA is made of two strands (chains), and ProDy can identify them:
dna.getChids()

array(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', ...  (output truncated)

In [11]:
# We can select individual chains:
chain_B = dna.select('chain B')

In [12]:
chain_B

<Selection: '(chain B) and (not water)' from 1BNA (243 atoms)>

In [13]:
chain_A = pdb.select('chain A and not water')

In [14]:
chain_A

<Selection: 'chain A and not water' from 1BNA (243 atoms)>

In [15]:
dna

<Selection: 'not water' from 1BNA (486 atoms)>

In [17]:
# Create an anisotropic network model (ANM); 'DNA' is just its title
anm = prody.ANM('DNA')

In [18]:
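# Connect every pair of atoms closer than the cutoff (15 Å by default)
# with a spring and build the Hessian matrix
anm.buildHessian(dna)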

In [19]:
# Diagonalize the Hessian; by default the 20 slowest non-trivial modes are kept
anm.calcModes()

In [20]:
# Save the structure and its first three modes for VMD's NMWiz plugin
prody.writeNMD('BDNA.nmd', anm[:3], dna)

'BDNA.nmd'
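For intuition, here is a minimal NumPy sketch of the standard ANM Hessian that `buildHessian` computes (a from-scratch illustration, not ProDy's implementation), on a hypothetical set of 10 nodes. Each pair of nodes i, j within the cutoff contributes the 3x3 block -γ r_ij r_ijᵀ / |r_ij|², and each diagonal block is minus the sum of the off-diagonal blocks in its row. The eigenvectors are the normal modes; the six zero eigenvalues are the rigid-body translations and rotations, which `calcModes()` skips before keeping the slowest modes.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Toy ANM Hessian: a spring of constant gamma between every pair of
    nodes closer than cutoff (same idea as ANM.buildHessian)."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            r = coords[j] - coords[i]
            dist2 = r @ r
            if dist2 > cutoff ** 2:
                continue  # no spring beyond the cutoff
            # 3x3 super-element for the pair (i, j)
            block = -gamma * np.outer(r, r) / dist2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian

rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 8.0, size=(10, 3))  # hypothetical 10-node structure
eigvals, eigvecs = np.linalg.eigh(anm_hessian(coords))  # ascending eigenvalues
n_zero = int(np.sum(np.abs(eigvals) < 1e-8))
print(n_zero)  # 6: three rigid-body translations + three rotations
slow_modes = eigvecs[:, n_zero:]  # the non-trivial modes, slowest first
```

All nodes here lie within 15 Å of each other, so the network is fully connected; with a real structure, only nearby atoms are linked.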

Open BDNA.nmd in VMD with the NMWiz plugin to visualize the structure and its first three modes. I recommend setting the Drawing Method in Graphical Representations to NewCartoon.

With biomolecules the arrows become much more informative than before. Notice that the motion is smooth and cooperative. Later you can calculate the collectivity of each mode, which measures how many of the atoms take part in the motion: it is close to 1 when nearly all atoms move together and close to 0 when the motion is confined to a few atoms.

Make sure to rotate the DNA around to see what's happening (hide the arrows from the NMWiz menu so that you can see better).

What do you see? Can you give descriptive names to the different modes? This is a real challenge when you publish research: papers still cannot show movies, so the names have to be descriptive! :)

Here are my examples:

- Mode 1: twisting and untwisting, much like a spring.
- Mode 2: lateral bending.
- Mode 3: bending toward the major groove.

Each of our cell nuclei holds about 6 feet (2 meters) of DNA... lay the DNA from all your cells end to end and, by popular estimates, it would reach the sun and back many times over!
How does it all fit inside you? Do you think the way DNA can bend and twist helps? How?

If you are stuck, look up the nucleosome in the Protein Data Bank!

In [21]:
# We can look at which atoms (the nodes of our network) are in contact,
# i.e. closer than the cutoff. Why does the contact map have the shape of an X?
prody.showContactMap(anm)

[Figure: contact map of the DNA]
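The contact map is built from the same distance cutoff that ANM uses to place springs. Below is a minimal NumPy sketch of such a map on hypothetical toy coordinates. As far as I know, `showContactMap` plots the model's Kirchhoff matrix, whose non-zero off-diagonal entries mark exactly these pairs; the boolean version here is a simplification.

```python
import numpy as np

def contact_map(coords, cutoff=15.0):
    """Boolean matrix: True where two different nodes are within cutoff (Å)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = dist <= cutoff
    np.fill_diagonal(contacts, False)  # a node is not its own contact
    return contacts

# Hypothetical nodes spaced 10 Å apart along a straight line
line = np.array([[0.0, 0, 0], [10.0, 0, 0], [20.0, 0, 0], [30.0, 0, 0]])
print(contact_map(line).astype(int))
```

For this toy line only neighboring nodes are in contact, which gives a band along the diagonal.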

In [24]:
# The cross-correlation map shows which atoms move together (positive values),
# which move in opposite directions (negative) and which move independently (near zero).
prody.showCrossCorr(anm)

[Figure: cross-correlation map of the DNA, with color bar]
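Each entry of the cross-correlation map compares the displacements of two atoms, C_ij = <ΔR_i · ΔR_j> / sqrt(<|ΔR_i|²> <|ΔR_j|²>), computed from the 3N x 3N covariance matrix. The sketch below implements that formula with NumPy; it reflects my understanding of what `showCrossCorr` plots rather than ProDy's actual code, and it is checked on two hypothetical atoms moved by a single mode.

```python
import numpy as np

def cross_correlation(cov):
    """Normalized cross-correlation between atoms from a 3N x 3N covariance."""
    n_atoms = cov.shape[0] // 3
    blocks = cov.reshape(n_atoms, 3, n_atoms, 3)
    dot = np.einsum('iaja->ij', blocks)  # trace of each 3x3 block: <dR_i . dR_j>
    norm = np.sqrt(np.diag(dot))         # sqrt(<|dR_i|^2>)
    return dot / np.outer(norm, norm)

# Two hypothetical atoms displaced along x by a single mode
together = np.array([1.0, 0, 0, 1.0, 0, 0])
opposite = np.array([1.0, 0, 0, -1.0, 0, 0])
print(cross_correlation(np.outer(together, together))[0, 1])  # 1.0
print(cross_correlation(np.outer(opposite, opposite))[0, 1])  # -1.0
```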

In [16]:
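# 3N x 3N covariance matrix (1458 x 1458 for the 486 DNA atoms),
# computed from the modes obtained with calcModes()
anm.getCovariance()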

array([[  9.16293631e-02,  -3.81783849e-02,   8.85628835e-02, ...,
          1.31208924e-04,   8.38492520e-04,  -7.31566799e-04],
       [ -3.81783849e-02,   1.96240360e-02,  -4.79696908e-02, ...,
         -1.58816994e-04,   1.54974215e-03,   1.72566445e-05],
       [  8.85628835e-02,  -4.79696908e-02,   1.21201434e-01, ...,
         -3.73892969e-04,  -3.92292849e-03,   4.69952850e-04],
       ..., 
       [  1.31208924e-04,  -1.58816994e-04,  -3.73892969e-04, ...,
          1.25173767e-03,  -2.78066667e-04,  -3.79041308e-04],
       [  8.38492520e-04,   1.54974215e-03,  -3.92292849e-03, ...,
         -2.78066667e-04,   1.81768926e-03,  -1.45738145e-04],
       [ -7.31566799e-04,   1.72566445e-05,   4.69952850e-04, ...,
         -3.79041308e-04,  -1.45738145e-04,   1.85293650e-04]])
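The covariance matrix combines the modes as the sum over modes k of v_k v_kᵀ / λ_k (in units where kT = 1), so slow modes (small λ_k) dominate. With every non-zero mode this is the pseudo-inverse of the Hessian; with only the 20 modes computed by default it is an approximation. A minimal NumPy sketch, using a hypothetical small positive-definite matrix in place of a real ANM Hessian:

```python
import numpy as np

def covariance_from_modes(eigvecs, eigvals):
    """Sum over modes of v_k v_k^T / lambda_k (modes stored as columns)."""
    return (eigvecs / eigvals) @ eigvecs.T  # divides column k by eigvals[k]

# Hypothetical symmetric positive-definite "Hessian" (not a real ANM one)
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
hessian = a @ a.T + 6 * np.eye(6)
eigvals, eigvecs = np.linalg.eigh(hessian)  # ascending: slowest modes first

full = covariance_from_modes(eigvecs, eigvals)
slow = covariance_from_modes(eigvecs[:, :2], eigvals[:2])  # truncated: 2 slowest modes
print(np.allclose(full, np.linalg.inv(hessian)))  # True
```

Truncating to the slowest modes can only lower each atom's fluctuation (the diagonal), since every mode adds a non-negative term.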

In [17]:
# Collectivity of the first three modes (ProDy indexes modes from 0)
prody.calcCollectivity(anm[0])

0.55223102682617486

In [18]:
prody.calcCollectivity(anm[1])

0.50929711219339613

In [19]:
prody.calcCollectivity(anm[2])

0.59226312678348803
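Collectivity is usually defined following Brüschweiler: with u_i² the squared displacement of atom i in the mode, normalized so that the u_i² sum to 1, κ = exp(-Σ u_i² ln u_i²) / N. It equals 1 when all N atoms move equally and 1/N when a single atom moves, so values around 0.5 to 0.6 for the DNA modes mean that roughly half of the atoms contribute substantially. A minimal NumPy sketch of that textbook definition follows; ProDy's `calcCollectivity` may differ in details such as optional mass weighting.

```python
import numpy as np

def collectivity(mode):
    """Collectivity of one mode given as a flat 3N vector (textbook definition)."""
    disp = np.asarray(mode, dtype=float).reshape(-1, 3)
    u2 = (disp ** 2).sum(axis=1)
    u2 = u2 / u2.sum()   # normalize so the squared displacements sum to 1
    nz = u2[u2 > 0]      # 0 * log(0) is taken as 0
    return np.exp(-(nz * np.log(nz)).sum()) / len(u2)

n_atoms = 10
uniform = np.ones(3 * n_atoms)       # hypothetical mode: every atom moves equally
localized = np.zeros(3 * n_atoms)
localized[:3] = 1.0                  # hypothetical mode: only the first atom moves
print(collectivity(uniform))    # close to 1.0
print(collectivity(localized))  # 0.1
```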

# Proteins 

Proteins are the workers of your body: they carry the oxygen you breathe, digest your food, fight off viruses and bacteria... They come in all shapes and sizes, and their shape is precisely what enables them to do their work.

Follow the steps above to work with the protein 2LAO (or any other protein you are interested in). I think 2LAO is a good first example: it binds certain amino acids (lysine, arginine and ornithine) and carries them.

Proteins are what ProDy is mainly used for. As the molecules get larger, so do the calculations. To make them quicker, we'll represent each residue by a single atom, its alpha carbon (CA); this is common practice in the field. Afterwards you can approximate the motion of every atom from this one-atom-per-residue calculation.
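To see why the calculations grow so quickly, here is a back-of-the-envelope estimate of the memory needed to store a dense 3N x 3N Hessian of 8-byte floats, using the atom counts quoted in this notebook. This assumes a dense matrix; ProDy may store or diagonalize it differently. Memory grows with N², so using one node per residue instead of every atom divides it by the square of the number of atoms per residue.

```python
def dense_hessian_gib(n_nodes):
    """Memory (GiB) of a dense 3N x 3N Hessian stored as 8-byte floats."""
    size = 3 * n_nodes
    return size * size * 8 / 1024 ** 3

# Atom counts quoted in this notebook
examples = [('1BNA DNA', 486), ('2LAO protein', 1822), ('largest complex (approx.)', 60000)]
for label, n in examples:
    print(f'{label:>26}: {dense_hessian_gib(n):10.3f} GiB')
```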

Look at the modes: how might these motions help the protein bind and carry its amino acids?

In [28]:
pdb = prody.parsePDB('2LAO')
pdb

<AtomGroup: 2LAO (1911 atoms)>

In [29]:
pdb = pdb.select('protein')
pdb

<Selection: 'protein' from 2LAO (1822 atoms)>

In [31]:
'''
Proteins can be visualized in a crude manner in the notebook.
We'll activate the matplotlib functionality that allows us to interact with
the plot by using %matplotlib notebook. When the plot is produced you 
will be able to use the mouse to rotate and move around.
When you are done you can click on the blue button.
'''
%matplotlib notebook
prody.showProtein(pdb)

[Figure: interactive 3D view of the 2LAO protein]

In [32]:
# Select only the alpha carbons (one CA atom per residue)
pdb_CA = pdb.select('name CA')

In [33]:
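# Build the ANM on the CA atoms only (one node per residue)
anm = prody.ANM('2LAO')
anm.buildHessian(pdb_CA)
anm.calcModes()
# Save the first three modes for NMWiz
prody.writeNMD('2LAO.nmd', anm[:3], pdb_CA)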

'2LAO.nmd'

In [36]:
# The cross-correlation and contact maps look different for proteins than
# for DNA, since proteins lack the symmetry of the double helix. Each protein
# gives its own characteristic pattern.
prody.showCrossCorr(anm)

[Figure: cross-correlation map of 2LAO, with color bar]

In [37]:
prody.showContactMap(anm)

[Figure: contact map of 2LAO]

In [38]:
# Extend the normal mode model to an all-atom model: each atom is given
# the motion of the CA node of its residue
all_anm, all_atoms = prody.extendModel(anm, pdb_CA, pdb)
all_anm

<NMA: Extended ANM 2LAO (20 modes; 1822 atoms)>

In [39]:
prody.writeNMD('2LAO_all_atom.nmd',all_anm[:3],all_atoms)

'2LAO_all_atom.nmd'
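Conceptually, extending the model copies each residue's CA displacement to every atom of that residue. The NumPy sketch below illustrates that idea on a hypothetical two-residue example; renormalizing each extended mode to unit length is my own choice here, and ProDy's `extendModel` may handle normalization differently.

```python
import numpy as np

def extend_modes(node_modes, atom_to_node):
    """Give each atom the displacement of its node (e.g. its residue's CA).

    node_modes: (3 * n_nodes, n_modes) array, modes stored as columns
    atom_to_node: length-n_atoms sequence, node index of each atom
    """
    n_nodes = node_modes.shape[0] // 3
    per_node = node_modes.reshape(n_nodes, 3, -1)         # (n_nodes, 3, n_modes)
    per_atom = per_node[np.asarray(atom_to_node)]         # copy node motion to atoms
    extended = per_atom.reshape(-1, node_modes.shape[1])  # (3 * n_atoms, n_modes)
    return extended / np.linalg.norm(extended, axis=0)    # renormalize each mode

# Hypothetical: 2 residues (nodes); residue 0 has 3 atoms, residue 1 has 2 atoms
node_modes = np.array([[1.0], [0.0], [0.0], [0.0], [1.0], [0.0]]) / np.sqrt(2)
ext = extend_modes(node_modes, [0, 0, 0, 1, 1])
print(ext.reshape(5, 3))  # atoms of residue 0 move along x, atoms of residue 1 along y
```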

# Complexes

Many proteins interact with each other to form complexes, bigger machines capable of greater things. We are going to look at the complex with PDB ID 1SX4, which contains a chaperone named GroEL. GroEL takes proteins that do not have the right structure (and therefore cannot do their function) and helps them reach the right structure, so that they can do their work.

If you look at it in VMD and rotate it, you will see that it has two main openings through which a molecule could get in or out.

This one is quite large, so use the groel.nmd file provided. In VMD, color the different subunits differently, and increase the RMSD in NMWiz so that the motions are easy to see. What do they look like?
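Increasing the RMSD setting enlarges the amplitude of the motion. Assuming the setting means the RMSD between the original structure and the most displaced one (my reading of NMWiz, not something stated in this notebook), the mode vector is simply scaled to reach that RMSD. A minimal NumPy sketch on a hypothetical structure and mode; ProDy's `traverseMode` function also takes an `rmsd` argument for generating such conformations.

```python
import numpy as np

def displace_along_mode(coords, mode, rmsd):
    """Move coords along a mode so the result is `rmsd` Å from the start.

    coords: (N, 3) array; mode: flat (3N,) vector. No superposition is done.
    """
    disp = np.asarray(mode, dtype=float).reshape(-1, 3)
    # The RMSD of s * disp is s * sqrt(mean_i |disp_i|^2); solve for the scale s
    scale = rmsd / np.sqrt((disp ** 2).sum(axis=1).mean())
    return coords + scale * disp

rng = np.random.default_rng(2)
coords = rng.normal(size=(50, 3))  # hypothetical structure
mode = rng.normal(size=150)        # hypothetical mode vector
moved = displace_along_mode(coords, mode, rmsd=3.0)
print(np.sqrt(((moved - coords) ** 2).sum(axis=1).mean()))  # approximately 3.0
```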

A second complex that I find interesting is 4v2t, a transporter that sits in the membrane and lets different molecules cross it. Again, this is a large one, so use 4v2t.nmd.