Expansion of dihedrals analysis module #2033

hfmull · 2018-08-08T23:32:16Z

Changes made in this Pull Request:

Added Dihedral and Janin classes to dihedral.py
Ramachandran class changed to account for selection errors
Added tests to test_dihedral.py to account for changes
Added reference plots for Ramachandran and Janin classes

PR Checklist

Tests?
Docs?
CHANGELOG updated?
[n/a] Issue raised/referenced?

hfmull · 2018-08-08T23:39:10Z

package/MDAnalysis/analysis/dihedrals.py

@@ -147,6 +235,16 @@ def __init__(self, atomgroup, **kwargs):

        phi_sel = [res.phi_selection() for res in residues]
        psi_sel = [res.psi_selection() for res in residues]
+        if any(sel is None for sel in phi_sel):


This was added because phi_selection() and psi_selection() do not account for altloc selections. When altloc atoms are present, they return None in the middle of the protein, so this removes the None and the corresponding phi or psi angle. The number of residues for which this happens is very small, so it seemed easier to just ignore them instead of rewriting phi_selection() and psi_selection(). If a user is interested in those specific residues, they can use the Dihedral class to look at them.

Add a comment to the code, summarizing what you explained above.
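The filtering described above could be sketched roughly as follows. This is a simplified illustration with plain Python lists standing in for MDAnalysis selections; the helper name `drop_incomplete` and the stand-in data are hypothetical and are not the code in this PR.

```python
import warnings


def drop_incomplete(phi_sel, psi_sel):
    """Keep only residues for which both selections exist.

    phi_sel and psi_sel are parallel lists; an entry is None when the
    selection could not be built (e.g. because of altloc atoms).
    """
    keep = [i for i, (phi, psi) in enumerate(zip(phi_sel, psi_sel))
            if phi is not None and psi is not None]
    if len(keep) != len(phi_sel):
        # Tell the user that some residues were silently dropped.
        warnings.warn("Some residues have no usable phi/psi selection "
                      "and were removed.")
    return [phi_sel[i] for i in keep], [psi_sel[i] for i in keep]


# Stand-in data: strings play the role of four-atom selections.
phi = ["phi1", None, "phi3"]
psi = ["psi1", "psi2", "psi3"]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    phi_ok, psi_ok = drop_incomplete(phi, psi)
```

Dropping both entries of a pair keeps the phi and psi lists aligned, so each remaining row still refers to a single residue.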

hfmull · 2018-08-08T23:42:25Z

package/MDAnalysis/analysis/dihedrals.py

+
+        if any(len(self.ag1) != len(ag) for ag in [self.ag2, self.ag3,
+                                                   self.ag4, self.ag5]):
+            raise ValueError("Too many or too few atoms selected. Check for "


Similar to Ramachandran, altloc will make the selections fail, so here the class simply fails. Unlike Ramachandran, the user can then select only one altloc and the analysis will run. Also, I've found some topologies that were incomplete and did not have all the atoms they were supposed to have, which is why the message says 'too few'.
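A minimal sketch of this kind of consistency check, assuming each group should hold exactly one atom per residue. Plain lists stand in for AtomGroups, and the function name `check_equal_lengths` is made up for illustration.

```python
def check_equal_lengths(groups):
    """Raise ValueError unless all groups have the same number of atoms."""
    reference = len(groups[0])
    if any(len(group) != reference for group in groups[1:]):
        raise ValueError("Too many or too few atoms selected.")


# One entry per residue in each group.
n_atoms = ["N1", "N2"]
ca_atoms = ["CA1", "CA2"]
cb_atoms = ["CB1", "CB2a", "CB2b"]  # an extra altloc copy of CB in residue 2

check_equal_lengths([n_atoms, ca_atoms])  # passes silently

try:
    check_equal_lengths([n_atoms, ca_atoms, cb_atoms])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The same check catches both failure modes mentioned above: duplicated altloc atoms make a group too long, and an incomplete topology makes it too short.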

altloc (and anything else that qualifies as "dual topology") is a problem.

@richardjgowers is there a sensible way for a user to switch between different topologies? The problem is that at the moment, ag.residues will contain all atoms, including altloc ones, even if my selection for ag excluded them. This is problematic for the residue.phi_selection(), residue.psi_selection() and residue.chi1_selection() methods, which then see groups that are not 4 atoms.

Currently, not really. This would have to be something done when the file is parsed. It sounds like maybe Residue.phi_selection() should take an altloc kwarg or something to tell it what to identify?

MMTF switches between topologies... somehow, so we have that.

We could/should use the fact that the residue.X_selections() are methods. Perhaps indeed use a default Residue.phi_selection(selection="altloc ' '") or similar – with the issue that we also have to catch the case when this selection makes no sense.

I think mmtf does the topologies as frames, (or rather we read models as frames, like how PDB is abused (or at least one of the various ways..)).

Maybe naïve question, is altloc just a different position? Or can it be different functionalisation? If the former, I guess you could write an altloc aware reader/parser which expands out the different possibilities.

My understanding is that altloc just refers to different positions, but that those positions correspond to separate atoms that are both present in the topology, so they are treated just like extra atoms that are part of the residue. One of the issues with altloc selection is that there doesn't seem to be a way to select the atoms that have no altloc (at least I haven't figured it out). So to get the atoms with no altloc and those with altloc A, you have to say 'not altloc B', assuming there is only A and B for options. I think there can be more altlocs but I have yet to see any that go into C or higher.

Do you have the PDB ID of a file with altloc handy?

If it has altlocs then we assign a default, which is either '' (empty string) or ' ' (one space). I don't know if our selection language can deal with it. But there's always the object-oriented/pandas-style approach

ag[ag.altloc == '']

where you use boolean indexing. (Assuming that we expose an .altloc attribute somewhere... @richardjgowers would know.)

I've been using PDB ID 19hc, but that method returns an AttributeError and says that AtomGroup has no altloc attribute

@richardjgowers was right

>>> import MDAnalysis as mda
>>> u = mda.fetch_mmtf('19hc')
>>> u.atoms.altLocs
array(['', '', '', ..., '', '', ''], dtype=object)
>>> set(u.atoms.altLocs)
{'', 'A', 'B'}
>>> # select no A or B
>>> u.atoms[u.atoms.altLocs == '']
<AtomGroup with 5944 atoms>
>>> u.atoms
<AtomGroup with 6098 atoms>

The default altLoc indicator is the empty string.
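The boolean-indexing idea can be tried without downloading a structure. The sketch below uses a small NumPy object array as a stand-in for u.atoms.altLocs; the atom names and altloc labels are invented for illustration.

```python
import numpy as np

# Stand-in for u.atoms.altLocs: one label per atom, '' meaning "no altloc".
altlocs = np.array(['', '', 'A', 'B', '', 'A', 'B'], dtype=object)
names = np.array(['N', 'CA', 'CB', 'CB', 'C', 'CG', 'CG'], dtype=object)

# Atoms without any alternate location.
no_altloc = altlocs == ''

# Atoms without an altloc plus those of conformation A, written
# explicitly instead of relying on "not altloc B".
conformer_a = (altlocs == '') | (altlocs == 'A')

print(names[no_altloc].tolist())
print(names[conformer_a].tolist())
```

Writing the mask as "empty or A" stays correct if a structure also contains altloc C or higher, which "not altloc B" would not.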

hfmull · 2018-08-08T23:44:24Z

package/MDAnalysis/analysis/dihedrals.py

+                          " have been removed from the selection.")
+            residues = residues.difference(remove)
+
+        self.ag1 = residues.atoms.select_atoms("name N")


I decided not to use chi1_selection() for this one because it does not capture all of the atoms it should, specifically the residues with multiple branches or non-carbon side chains.

Should the standard residue.chi1_selection() be updated?

If so, just open an issue so that it isn't forgotten.

Yes please!

Please also add your comment above as a comment in the code.

It will be helpful to anyone else coming after you.

I raised #2044. @hfmull if you can add anything to the issue report (e.g., example PDB ID where the current chi1_angles() gives wrong results) then that would be great!

codecov · 2018-08-09T18:48:17Z

Codecov Report

Merging #2033 into develop will increase coverage by 0.03%.
The diff coverage is 96.2%.

@@             Coverage Diff             @@
##           develop    #2033      +/-   ##
===========================================
+ Coverage     88.9%   88.93%   +0.03%     
===========================================
  Files          143      144       +1     
  Lines        17386    17462      +76     
  Branches      2665     2684      +19     
===========================================
+ Hits         15457    15530      +73     
  Misses        1321     1321              
- Partials       608      611       +3

Impacted Files	Coverage Δ
package/MDAnalysis/analysis/data/filenames.py	`100% <100%> (ø)`
package/MDAnalysis/analysis/dihedrals.py	`96.52% <95.83%> (-1.31%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

orbeckst · 2018-08-09T19:22:32Z

@hfmull I merged develop into your branch because Appveyor was complaining about it not being in a conflict-free state. I think just adding this new commit kicked Travis CI into doing something, too – sometimes one just has to add another commit...

richardjgowers · 2018-08-09T21:12:33Z

Iirc it has a capital L?

On Thu, Aug 9, 2018 at 4:09 PM, hfmull wrote:

In package/MDAnalysis/analysis/dihedrals.py:

> +                             "inside of a 'protein' selection can be used to "
> +                             "calculate dihedrals.")
> +        elif len(remove) != 0:
> +            warnings.warn("All ALA, CYS, GLY, PRO, SER, THR, and VAL residues"
> +                          " have been removed from the selection.")
> +            residues = residues.difference(remove)
> +
> +        self.ag1 = residues.atoms.select_atoms("name N")
> +        self.ag2 = residues.atoms.select_atoms("name CA")
> +        self.ag3 = residues.atoms.select_atoms("name CB")
> +        self.ag4 = residues.atoms.select_atoms("name CG CG1")
> +        self.ag5 = residues.atoms.select_atoms("name CD CD1 OD1 ND1 SD")
> +
> +        if any(len(self.ag1) != len(ag) for ag in [self.ag2, self.ag3,
> +                                                   self.ag4, self.ag5]):
> +            raise ValueError("Too many or too few atoms selected. Check for "

I've been using PDB ID 19hc, but that method returns an AttributeError and says that AtomGroup has no altloc attribute

orbeckst

Very nice code and cool new features!

See comments – main issue is that datafiles for MDAnalysis cannot reside in MDAnalysisTests.

orbeckst · 2018-08-09T21:07:52Z

package/MDAnalysis/analysis/dihedrals.py

+Ramachandran analysis
+~~~~~~~~~~~~~~~~~~~~~
+
+The :class:`~MDAnalysis.analysis.dihedrals.Ramachandran` class allows for the


Please add references to the papers (Ramachandran and Janin) and add a section References.

orbeckst · 2018-08-09T21:10:13Z

package/MDAnalysis/analysis/dihedrals.py

   R.plot(ax=ax, color='k', marker='s')

-Alternatively, if you wanted to plot the data yourself, the angles themselves


Still mention that the data is in :attr:`angles` – most people will want to use it.

orbeckst · 2018-08-09T21:13:27Z

package/MDAnalysis/analysis/dihedrals.py

-   for ts in R.angles:
-       ax.scatter(ts[:,0], ts[:,1], color='k', marker='s')
+Reference plots can be added to the axes for both the Ramachandran and Janin
+classes using the kwarg ``ref=True``. These were made using data obtained from


Cool!

Can you generate example plots for Ramachandran and Janin so that we can (eventually) add them to the docs? I can't quite remember how best to do it... I did it for the MDA tutorial.

Just create a directory package/doc/sphinx/source/images, add them there, and we'll figure out the rest later.

orbeckst · 2018-08-09T21:14:28Z

package/MDAnalysis/analysis/dihedrals.py

+   .. attribute:: angles
+
+       Contains the time steps of the angles for each atomgroup in the list as
+       an n_frames×len(atomgroups) :class:`numpy.ndarray` with content


ooooh... proper typography!

Put the variables in double-back ticks

``n_frames×len(atomgroups)``

so that they get code formatting.

(Still ❤️ ing the ×)

orbeckst · 2018-08-09T21:16:39Z

package/MDAnalysis/analysis/dihedrals.py

+
+   .. attribute:: angles
+
+       Contains the time steps of the phi and psi angles for each residue as


chi1 and chi2

Btw, you can use proper Greek characters with LaTeX-like formatting

contains :math:`\chi_1` and :math:`\chi_2` angles

orbeckst · 2018-08-09T21:25:37Z

testsuite/MDAnalysisTests/datafiles.py

+LYSJaninArray = resource_filename(__name__, 'data/adk_oplsaa_LYS_janin.npy')
+
+Rama_ref = resource_filename(__name__, 'data/rama_ref_data.npy')
+Janin_ref = resource_filename(__name__, 'data/janin_ref_data.npy')


Needs to go into MDAnalysis (see above).

Can you also prepare references for each amino acid separately? I think that this would be more interesting than all sidechains averaged. The peptide backbone is not strongly dependent on the chemical identity of the residue (GLY, PRO are exceptions) but the Janin plots should depend strongly on the nature of the sidechain.

orbeckst · 2018-08-09T21:31:10Z

testsuite/MDAnalysisTests/analysis/test_dihedrals.py

@@ -71,7 +109,61 @@ def test_protein_ends(self, universe):
        with pytest.warns(UserWarning):
            rama = Ramachandran(universe.select_atoms("protein")).run()

+    def test_None_removal(self):
+        with pytest.warns(UserWarning):
+            u = mda.coordinates.MMTF.fetch_mmtf('19hc')


What happens if fetch_mmtf() cannot download the PDB? You should either XFAIL the test if this happens or include a gzipped 19hc.pdb.gz as a data file.

orbeckst · 2018-08-09T21:34:57Z

testsuite/MDAnalysisTests/analysis/test_dihedrals.py

+
+    def test_janin(self, universe):
+        janin = Janin(universe.select_atoms("protein")).run()
+        test_janin = np.load(JaninArray)


make the test_janin a class-level fixture so that it only gets loaded once:

class ....:
    @pytest.fixture
    def janin_ref_array(self):
        return np.load(JaninArray)

    def test_janin(self, universe, janin_ref_array):
        janin = Janin(universe.select_atoms("protein")).run()
        assert_almost_equal(janin.angles, janin_ref_array, 4, ...)

orbeckst · 2018-08-09T21:38:07Z

testsuite/MDAnalysisTests/analysis/test_dihedrals.py

@@ -40,7 +78,7 @@ def universe(self):

    def test_ramachandran(self, universe):
        rama = Ramachandran(universe.select_atoms("protein")).run()
-        test_rama = np.load(DihedralsArray)
+        test_rama = np.load(RamaArray)


Make it a fixture (see the comment on JaninArray); this is the common way to treat "reference data" in pytest.

orbeckst · 2018-08-09T21:39:00Z

testsuite/MDAnalysisTests/analysis/test_dihedrals.py

+        janin = Janin(universe.select_atoms("protein")).run()
+        test_janin = np.load(JaninArray)
+
+        assert_almost_equal(janin.angles, test_janin, 4,


Is decimal=4 sufficient to let the tests pass on Linux and OSX?

Add a one line comment for the justification for reducing the test precision.
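For context on what decimal=4 permits: numpy.testing.assert_almost_equal is documented to pass when abs(desired - actual) < 1.5 * 10**(-decimal). The sketch below restates that criterion in plain Python; the helper name `within_decimal` and the sample angles are illustrative only.

```python
def within_decimal(actual, desired, decimal):
    """Restate the documented assert_almost_equal criterion."""
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)


# Two angles (in degrees) that differ by about 1.2e-4.
same_at_4 = within_decimal(60.00012, 60.00000, 4)  # tolerance 1.5e-4
same_at_5 = within_decimal(60.00012, 60.00000, 5)  # tolerance 1.5e-5
```

With decimal=4, angles may differ by up to about 1.5e-4 degrees between platforms before the test fails.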

orbeckst · 2018-08-10T20:45:55Z

package/MDAnalysis/analysis/dihedrals.py

-    `atomgroup` for each time step in the trajectory. A :class:`~MDAnalysis.ResidueGroup`
-    is generated from `atomgroup` which is compared to the protein to determine
-    if it is a legitimate selection.
+    :math:`\phi` and :math:`\psi` angles will be calculated for each residue \


backslash not needed at end of line

orbeckst · 2018-08-10T20:46:25Z

package/MDAnalysis/analysis/dihedrals.py

+       for each residue as an ``n_frames×n_residues×2`` :class:`numpy.ndarray`
+       with content ``[[[chi1, chi2], [residue 2], ...], [time step 2], ...]``.
+
+References


orbeckst

see comments

orbeckst · 2018-08-10T20:54:51Z

package/MDAnalysis/analysis/data/filenames.py

@@ -0,0 +1,35 @@
+# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-


The data directory also needs an empty file __init__.py to mark it as an importable module. Just add it with

touch analysis/data/__init__.py
git add analysis/data/__init__.py

The absence of the file is the reason for the test failures.

orbeckst · 2018-08-10T20:56:59Z

package/CHANGELOG

@@ -64,7 +63,8 @@ Enhancements
    generated with gromacs -noappend (PR #1728)
  * MDAnalysis.lib.mdamath now supports triclinic boxes and rewrote in Cython (PR #1965)
  * AtomGroup.write can write a trajectory of selected frames (Issue #1037)
-  * Added analysis.dihedrals with Ramachandran class to analysis module (PR #1997)
+  * Added dihedrals.py with Dihedral, Ramachandran, and Janin classes to
+    analysis module (PR #1997, PR #2033) 


Also add entry under Enhancements

* added the analysis/data module for reference data used in analysis

orbeckst · 2018-08-10T21:02:25Z

package/MDAnalysis/analysis/data/filenames.py

+# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
+# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
+#
+


Please add brief documentation:

Analysis data files
-------------------

.. data:: Rama_ref

   Reference Ramachandran histogram for
   :class:`MDAnalysis.analysis.dihedrals.Ramachandran`. The data were
   calculated on a data set of 500 PDB structures taken from [Lovell2003]_.

.. data:: Janin_ref

   ...

You are starting this data module so you should set a good example for documentation in the future.

Insert the docs for analysis.data.filenames into the sphinx docs as you did for dihedrals. Or ping me and I will add it when you have done the docs inside filenames.py and dihedrals.py.

orbeckst · 2018-08-10T21:07:01Z

package/MDAnalysis/analysis/dihedrals.py

-   for ts in R.angles:
-       ax.scatter(ts[:,0], ts[:,1], color='k', marker='s')
+Reference plots can be added to the axes for both the Ramachandran and Janin
+classes using the kwarg ``ref=True``. These were made using data obtained from


Replace "These were made using" with

The Ramachandran reference data (:data:`~MDAnalysis.analysis.data.filenames.Rama_ref`) and Janin reference data (:data:`~MDAnalysis.analysis.data.filenames.Janin_ref`) were made ...

It's good to put in cross links between the docs.

orbeckst · 2018-08-10T21:08:49Z

package/MDAnalysis/analysis/dihedrals.py

-Alternatively, if you wanted to plot the data yourself, the angles themselves
-can be accessed using :attr:`Ramachandran.angles`::
+The Janin class works in the same way, only needing a list of residues. To plot
+the data yourself, the angles can be accessed using :attr:`Ramachandran.angles`.


or :attr:`Janin.angles`

orbeckst · 2018-08-10T21:11:14Z

testsuite/MDAnalysisTests/datafiles.py

@@ -85,7 +85,7 @@
    "XTC_sub_sol",
    "XYZ", "XYZ_psf", "XYZ_bz2",
    "XYZ_mini", "XYZ_five", # 3 and 5 atoms xyzs for an easy topology
-    "TXYZ", "ARC", "ARC_PBC",        # Tinker files


I am not sure why ARC_PBC was removed. This might have been a merge accident because it is present in current develop. Please add it back! The line should be

"TXYZ", "ARC", "ARC_PBC", # Tinker files

Make sure that you pull from your branch – maybe you didn't get the merge commit?

orbeckst · 2018-08-10T21:11:56Z

testsuite/MDAnalysisTests/datafiles.py

@@ -294,7 +297,6 @@
 XYZ_five = resource_filename(__name__, 'data/five.xyz')
 TXYZ = resource_filename(__name__, 'data/coordinates/test.txyz')
 ARC = resource_filename(__name__, 'data/coordinates/test.arc')
-ARC_PBC = resource_filename(__name__, 'data/coordinates/new_hexane.arc')


Add this line back in

ARC_PBC = resource_filename(__name__, 'data/coordinates/new_hexane.arc')

orbeckst · 2018-08-12T17:33:43Z

One test failed but not because of anything you did (but more likely #1988). I restarted the test and hopefully everything will light up "passing".

I am approving the PR and then it can be merged whenever Travis is happy.

Excellent work, @hfmull !

orbeckst · 2018-08-13T05:12:48Z

FYI: @hfmull wrote a short report on his REU project https://figshare.com/articles/Technical_Report_SPIDAL_Summer_REU_2018_Dihedral_Analysis_in_MDAnalysis/6957296

orbeckst · 2018-08-27T16:57:51Z

@hfmull FYI: I benchmarked your Dihedral analysis class against the naive approach in Gist https://gist.github.com/orbeckst/26081375f3ea3152f08bbcc90c14c5eb and your approach is between 80 and 100 times faster for a typical protein. Good job!

orbeckst requested changes Aug 9, 2018

orbeckst reviewed Aug 10, 2018

orbeckst requested changes Aug 10, 2018

Henry Mull added 15 commits August 10, 2018 14:56

Added Janin class to dihedrals.py

93e6017

updated test_dihedrals for Janin class, added test data to datafiles.py

e03fd0d

cleaned up janin class

c8f5d82

Made Janin class a subclass of Ramachandran

8eadbe8

Added general Dihedral class, with tests and test data

8c873c8

fixed selections returning none in the middle of a protein

8002d04

updated to deal with selection failures

5efef65

Updated test_dihedrals.py for new features

92c99b7

updated docstring for dihedrals.py

b72815b

updated CHANGELOG

6a6ec1b

Added reference plots for Ramachandran and Janin classes

2dd296f

changed test_janin to check fewer decimals

556cf41

Fixed Janin plot method

4a52cc2

moved ref plot data to new analysis/data directory, updated setup.py

44d48ad

updated docstring and added comments

e321165

hfmull force-pushed the general_dihedrals branch from 9416f48 to 776fad4 Compare August 10, 2018 22:10

Fixed test_dihedrals, made analysis.data module

f887f76

hfmull force-pushed the general_dihedrals branch from 776fad4 to f887f76 Compare August 10, 2018 22:32

Henry Mull added 2 commits August 10, 2018 15:45

Added docs for analysis.data

3814fc5

Updated docstrings and CHANGELOG

8d94491

Henry Mull added 2 commits August 10, 2018 16:16

Added image directory for documentation

7c4a700

added ref=True to tests to increase coverage

b4f8484

orbeckst approved these changes Aug 12, 2018

orbeckst self-assigned this Aug 12, 2018

integrate Ramachandran and Janin plots into docs and updated references

80533cc

orbeckst merged commit 85ec60a into MDAnalysis:develop Aug 13, 2018

orbeckst mentioned this pull request Aug 13, 2018

tests and updated docs for analysis.data #2047

Merged

4 tasks
