-
Notifications
You must be signed in to change notification settings - Fork 640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expansion of dihedrals analysis module #2033
Conversation
@@ -147,6 +235,16 @@ def __init__(self, atomgroup, **kwargs): | |||
|
|||
phi_sel = [res.phi_selection() for res in residues] | |||
psi_sel = [res.psi_selection() for res in residues] | |||
if any(sel is None for sel in phi_sel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added because phi_selection()
and psi_selection()
do not account for altloc selections. They will return None
in the middle of the protein if that is the case, so this removes the None
and the corresponding phi or psi angle. The amount of residues for which this is the case is very small so it seemed easier to just ignore them instead of rewriting phi_selection()
and psi_selection()
. If a user is interested in those specific residues, they can use the Dihedral
class to look at them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to the code, summarizing what you explained above.
|
||
if any(len(self.ag1) != len(ag) for ag in [self.ag2, self.ag3, | ||
self.ag4, self.ag5]): | ||
raise ValueError("Too many or too few atoms selected. Check for " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to Ramachandran
, altloc will make the selections fail, so here the class simply fails. Unlike Ramachandran
, the user can then select for only on altloc and the analysis will run. Also I've found some topologies that were incomplete and did not have all the atoms it was supposed to so that's why it says 'too few'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
altloc (and anything else that specifies as "dual topology") is a problem.
@richardjgowers is there a sensible way how a user can switch between different topologies? The problem is that at the moment, ag.residues
will contain all atoms, including altloc ones, even if my selection for ag
excluded them. This is problematic for the residue.phi_selection()
, residue.psi_selection()
and residue.chi1_selection()
methods, which then see groups that are not 4 atoms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, not really. This would have to be something done when the file is parsed. It sounds like maybe Residue.phi_selection()
should take an altloc kwarg or something to tell it what to identify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MMTF switches between topologies... somehow, so we have that.
We could/should use the fact that the residue.X_selections()
are methods. Perhaps indeed use a default Residue.phi_selection(selection="altloc ' '")
or similar – with the issue that we also have to catch the case when this selection makes no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mmtf does the topologies as frames, (or rather we read models as frames, like how PDB is abused (or at least one of the various ways..)).
Maybe naïve question, is altloc just a different position? Or can it be different functionalisation? If the former, I guess you could write an altloc aware reader/parser which expands out the different possibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that altloc just refers to different positions, but that those positions correspond to separate atoms that are both present in the topology, so they are treated just like extra atoms that are part of the residue. One of the issues with altloc selection is that there doesn't seem to be a way to select the atoms that have no altloc (at least I haven't figured it out). So to get the atoms with no altloc and those with altloc A, you have to say 'not altloc B', assuming there is only A and B for options. I think there can be more altlocs but I have yet to see any that go into C or higher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have the PDB ID of a file with altloc handy?
If it has altlocs then we assign a default, which is either ''
(empty string) or ' '
(one space). I don't know if our selection language can deal with it. But there's always the object-oriented/pandas-style approach
ag[ag.altloc == '']
where you use boolean indexing. (Assuming that we expose an .altloc
attribute somewhere... @richardjgowers would know.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using PDB ID 19hc, but that method returns an AttributeError and says that AtomGroup
has no altloc attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardjgowers was right
>>> import MDAnalysis as mda
>>> u = mda.fetch_mmtf('19hc')
>>> u.atoms.altLocs
array(['', '', '', ..., '', '', ''], dtype=object)
>>> set(u.atoms.altLocs)
{'', 'A', 'B'}
# select no A or B
>>> u.atoms[u.atoms.altLocs == '']
<AtomGroup with 5944 atoms>
>>> u.atoms
<AtomGroup with 6098 atoms>
The default altLoc indicator is the empty string.
" have been removed from the selection.") | ||
residues = residues.difference(remove) | ||
|
||
self.ag1 = residues.atoms.select_atoms("name N") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided not to use chi1_selection()
for this one because it does not capture all of the atoms it should, specifically the residues with multiple branches or non-carbon side chains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the standard residue.chi1_selection()
be updated?
If so, just open an issue so that it isn't forgotten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add your comment above as a comment in the code.
It will be helpful to anyone else coming after you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## develop #2033 +/- ##
===========================================
+ Coverage 88.9% 88.93% +0.03%
===========================================
Files 143 144 +1
Lines 17386 17462 +76
Branches 2665 2684 +19
===========================================
+ Hits 15457 15530 +73
Misses 1321 1321
- Partials 608 611 +3
Continue to review full report at Codecov.
|
@hfmull I merged develop into your branch because Appveyor was complaining about it not being in a conflict-free state. I think just adding this new commit kicked Travis CI into doing something, too – sometimes one just has to add another commit... |
Iirc it has a capital L?
…On Thu, Aug 9, 2018 at 4:09 PM, hfmull ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In package/MDAnalysis/analysis/dihedrals.py
<#2033 (comment)>
:
> + "inside of a 'protein' selection can be used to "
+ "calculate dihedrals.")
+ elif len(remove) != 0:
+ warnings.warn("All ALA, CYS, GLY, PRO, SER, THR, and VAL residues"
+ " have been removed from the selection.")
+ residues = residues.difference(remove)
+
+ self.ag1 = residues.atoms.select_atoms("name N")
+ self.ag2 = residues.atoms.select_atoms("name CA")
+ self.ag3 = residues.atoms.select_atoms("name CB")
+ self.ag4 = residues.atoms.select_atoms("name CG CG1")
+ self.ag5 = residues.atoms.select_atoms("name CD CD1 OD1 ND1 SD")
+
+ if any(len(self.ag1) != len(ag) for ag in [self.ag2, self.ag3,
+ self.ag4, self.ag5]):
+ raise ValueError("Too many or too few atoms selected. Check for "
I've been using PDB ID 19hc, but that method returns an AttributeError and
says that AtomGroup has no altloc attribute
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2033 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AI0jB-FnEPyaPIgWA6HerU0HpbYF806vks5uPKUggaJpZM4V00md>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice code and cool new features!
See comments – main issue is that datafiles for MDAnalysis cannot reside in MDAnalysisTests.
Ramachandran analysis | ||
~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
The :class:`~MDAnalysis.analysis.dihedrals.Ramachandran` class allows for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add references to the papers (Ramachandran and Janin) and add a section References.
R.plot(ax=ax, color='k', marker='s') | ||
|
||
Alternatively, if you wanted to plot the data yourself, the angles themselves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still mention that the data is in :attr:angles
– most people will want to use it.
for ts in R.angles: | ||
ax.scatter(ts[:,0], ts[:,1], color='k', marker='s') | ||
Reference plots can be added to the axes for both the Ramachandran and Janin | ||
classes using the kwarg ``ref=True``. These were made using data obtained from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
Can you generate example plots for Ramachandran and Janin so that we can (eventually) add them to the docs? I can't quite remember how best to do it... I did it for the MDA tutorial.
Just create a directory package/doc/sphinx/source/images
, add them there, and we'll figure out the rest later.
.. attribute:: angles | ||
|
||
Contains the time steps of the angles for each atomgroup in the list as | ||
an n_frames×len(atomgroups) :class:`numpy.ndarray` with content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooooh... proper typography!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the variables in double-back ticks
``n_frames×len(atomgroups)``
so that they get code formatting.
(Still ❤️ ing the ×)
|
||
.. attribute:: angles | ||
|
||
Contains the time steps of the phi and psi angles for each residue as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chi1 and chi2
Btw, you can use proper Greek characters with LaTeX-like formatting
contains :math:`\chi_` and :math:`\chi_2` angles
LYSJaninArray = resource_filename(__name__, 'data/adk_oplsaa_LYS_janin.npy') | ||
|
||
Rama_ref = resource_filename(__name__, 'data/rama_ref_data.npy') | ||
Janin_ref = resource_filename(__name__, 'data/janin_ref_data.npy') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to go into MDAnalysis (see above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also prepare references for each amino acid separately? I think that this would be more interesting than all sidechains averaged. The peptide backbone is not strongly dependent on the chemical identity of the residue (GLY, PRO are exceptions) but the Janin plots should depend strongly on the nature of the sidechain.
@@ -71,7 +109,61 @@ def test_protein_ends(self, universe): | |||
with pytest.warns(UserWarning): | |||
rama = Ramachandran(universe.select_atoms("protein")).run() | |||
|
|||
def test_None_removal(self): | |||
with pytest.warns(UserWarning): | |||
u = mda.coordinates.MMTF.fetch_mmtf('19hc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if fetch_mmtf()
cannot download the PDB? You should either XFAIL the test if this happens or include a gzipped 19hc.pdb.gz as a data file.
|
||
def test_janin(self, universe): | ||
janin = Janin(universe.select_atoms("protein")).run() | ||
test_janin = np.load(JaninArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the test_janin
a class-level fixture so that it only gets loaded once:
class ....:
@pytest.fixture
def janin_ref_array(self):
return np.load(JaninArray)
def test_janin(self, universe, janin_ref_array):
janin = Janin(universe.select_atoms("protein")).run()
assert_almost_equal(janin.angles, janin_ref_array, 4, ...)
@@ -40,7 +78,7 @@ def universe(self): | |||
|
|||
def test_ramachandran(self, universe): | |||
rama = Ramachandran(universe.select_atoms("protein")).run() | |||
test_rama = np.load(DihedralsArray) | |||
test_rama = np.load(RamaArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it a fixture (see comments below for JaninArray
– this is the common way to treat "reference data" in pytest
janin = Janin(universe.select_atoms("protein")).run() | ||
test_janin = np.load(JaninArray) | ||
|
||
assert_almost_equal(janin.angles, test_janin, 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is decimal=4
sufficient to let the tests pass on Linux and OSX?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a one line comment for the justification for reducing the test precision.
`atomgroup` for each time step in the trajectory. A :class:`~MDAnalysis.ResidueGroup` | ||
is generated from `atomgroup` which is compared to the protein to determine | ||
if it is a legitimate selection. | ||
:math:`\phi` and :math:`\psi` angles will be calculated for each residue \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backslash not needed at end of line
for each residue as an ``n_frames×n_residues×2`` :class:`numpy.ndarray` | ||
with content ``[[[chi1, chi2], [residue 2], ...], [time step 2], ...]``. | ||
|
||
References |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments
@@ -0,0 +1,35 @@ | |||
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data
directory also needs an empty file __init__.py
to mark it as an importable module. Just add it with
touch analysis/data/__init__.py
git add analysis/data/__init__.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The absence of the file is the reason for the test failures.
package/CHANGELOG
Outdated
@@ -64,7 +63,8 @@ Enhancements | |||
generated with gromacs -noappend (PR #1728) | |||
* MDAnalysis.lib.mdamath now supports triclinic boxes and rewrote in Cython (PR #1965) | |||
* AtomGroup.write can write a trajectory of selected frames (Issue #1037) | |||
* Added analysis.dihedrals with Ramachandran class to analysis module (PR #1997) | |||
* Added dihedrals.py with Dihedral, Ramachandran, and Janin classes to | |||
analysis module (PR #1997, PR #2033) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add entry under Enhancements
* added the analysis/data module for reference data used in analysis
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. | ||
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787 | ||
# | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add brief documentation:
Analysis data files
------------------
.. data:: Rama_ref
Reference Ramachandran histogram for :class:`MDAnalysis.analysis.dihedrals.Ramachandran`.
The data were calculated on a data set of 500 PDB structures taken from [Lovell2003]_.
.. data:: Janin_ref
...
You are starting this data module so you should set a good example for documentation in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert the docs for analysis.docs.data.filenames
into the sphinx docs as you did for dihedrals. Or ping me an I will add it when you have done the docs inside filenames.py and dihedrals.py.
for ts in R.angles: | ||
ax.scatter(ts[:,0], ts[:,1], color='k', marker='s') | ||
Reference plots can be added to the axes for both the Ramachandran and Janin | ||
classes using the kwarg ``ref=True``. These were made using data obtained from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace "These were made using" with
The Ramachandran reference data (:data:`~MDAnalysis.analysis.data.filenames.Rama_ref`) and
Janin reference data (:data:`~MDAnalysis.analysis.data.filenames.Janin_ref`) were made ...
It's good to put in cross links between the docs.
Alternatively, if you wanted to plot the data yourself, the angles themselves | ||
can be accessed using :attr:`Ramachandran.angles`:: | ||
The Janin class works in the same way, only needing a list of residues. To plot | ||
the data yourself, the angles can be accessed using :attr:`Ramachandran.angles`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or :attr:Janin.angles
@@ -85,7 +85,7 @@ | |||
"XTC_sub_sol", | |||
"XYZ", "XYZ_psf", "XYZ_bz2", | |||
"XYZ_mini", "XYZ_five", # 3 and 5 atoms xyzs for an easy topology | |||
"TXYZ", "ARC", "ARC_PBC", # Tinker files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why ARC_PBC was removed. This might have been a merge accident because it is present in current develop. Please add it back! The line should be
"TXYZ", "ARC", "ARC_PBC", # Tinker files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that you pull from your branch – maybe you didn't get the merge commit?
@@ -294,7 +297,6 @@ | |||
XYZ_five = resource_filename(__name__, 'data/five.xyz') | |||
TXYZ = resource_filename(__name__, 'data/coordinates/test.txyz') | |||
ARC = resource_filename(__name__, 'data/coordinates/test.arc') | |||
ARC_PBC = resource_filename(__name__, 'data/coordinates/new_hexane.arc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this line back in
ARC_PBC = resource_filename(__name__, 'data/coordinates/new_hexane.arc')
9416f48
to
776fad4
Compare
776fad4
to
f887f76
Compare
FYI: @hfmull wrote a short report on his REU project https://figshare.com/articles/Technical_Report_SPIDAL_Summer_REU_2018_Dihedral_Analysis_in_MDAnalysis/6957296 |
@hfmull FYI: I benchmarked your Dihedral analysis class against the naive approach in Gist https://gist.github.com/orbeckst/26081375f3ea3152f08bbcc90c14c5eb and your approach is between 80 and 100 times faster for a typical protein. Good job! |
Changes made in this Pull Request:
PR Checklist