Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AtomGroup.center fails for compounds+wrapping #2984

Closed
mnmelo opened this issue Oct 15, 2020 · 3 comments · Fixed by #2992
Closed

AtomGroup.center fails for compounds+wrapping #2984

mnmelo opened this issue Oct 15, 2020 · 3 comments · Fixed by #2992

Comments

@mnmelo
Copy link
Member

mnmelo commented Oct 15, 2020

Digging into the optimization of AtomGroup.unwrap (from the discussion in #2982) I am trying to replicate the vectorization done in AtomGroup.center. I noticed that in there there's a sorting of compound indices that isn't being done if the user requests compound unwrapping. This breaks center calculation if for some reason a compound's atoms are not initially contiguous.

This is a corner case, but still possible. The fix is quite simple, and if it's ok with you I'd include it in the upcoming PR dedicated to the wrap/unwrap enhancement because that will also be dealing with similar code patterns, in the same source file (core/groups.py).

Code to reproduce the behavior

With a simple 5-water system:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import waterPSF, waterDCD

u = mda.Universe(waterPSF, waterDCD)

testgrp = u.atoms
testndx = np.arange(len(testgrp))
np.random.shuffle(testndx)
testgrp_unordered = testgrp[testndx]
testgrp_unordered.fragindices
>> array([3, 0, 1, 1, 2, 2, 2, 4, 3, 4, 1, 0, 0, 4, 3])

The fragment COGs of the test group:

testgrp.center_of_geometry(compound='fragments')
>> array([[ 16.86431313,  -1.89109834,  23.81008784],
          [ 18.23819478,   0.2692619 , -24.17409579],
          [ 18.62940788,  -3.88507994, -24.22752635],
          [ 20.89391009,  -0.74630801, -24.41894786],
          [ 21.33577983,  -3.48827068, -23.58818181]])

Same answer if atoms are unwrapped but consecutive (molecules are already whole in this test traj):

testgrp.center_of_geometry(compound='fragments', unwrap=True)
>> array([[ 16.86431313,  -1.89109834,  23.81008784],
          [ 18.23819478,   0.2692619 , -24.17409579],
          [ 18.62940788,  -3.88507994, -24.22752635],
          [ 20.89391009,  -0.74630801, -24.41894786],
          [ 21.33577983,  -3.48827068, -23.58818181]])

Also ok if randomized but not unwrapped:

testgrp_unordered.center_of_geometry(compound='fragments')
>> array([[ 16.86431313,  -1.89109834,  23.81008784],
          [ 18.23819478,   0.2692619 , -24.17409579],
          [ 18.62940788,  -3.88507994, -24.22752635],
          [ 20.89391009,  -0.74630801, -24.41894786],
          [ 21.33577983,  -3.48827068, -23.58818181]])

Oops!

testgrp_unordered.center_of_geometry(compound='fragments', unwrap=True)
>> array([[ 18.76917013,  -0.9612124 ,  -8.50591214],
          [ 18.56716092,  -2.47748999, -23.95220439],
          [ 20.31393941,  -2.90605104, -23.84455808],
          [ 18.98535792,  -1.3438748 ,  -8.07818158],
          [ 19.32597733,  -2.05286684,  -8.21780777]])

Current version of MDAnalysis

2.0.0-dev0 (at commit 64a7c05), Python 3.6.9, Ubuntu 18.04

EDIT: output clarification

@mnmelo mnmelo self-assigned this Oct 15, 2020
@orbeckst
Copy link
Member

Because we're still fixing 1.0.x, it is a lot easier if bug fixes are separate from enhancements of any kind.

We have no problem adding performance enhancements back into 1.x but in general, micro PRs are easier to handle because sometimes the discussion on something that looked like an obvious thing leads us to a more nuanced view.

@mnmelo
Copy link
Member Author

mnmelo commented Oct 17, 2020

So, when testing, I realized this bug is more serious than I first thought:

When AtomGroup.center is used with compounds, the compound_indices are sorted, and the coordinate/weights arrays are as well, with the same sorting indices. When unwrapping, however, the unwrapped coordinates are not being sorted, and that's the bug.

But rather than just impacting center calculations when unwrapping + unordered compounds, this bug introduces a mismatch between the weight sorting (which always happens) and coordinate sorting (which isn't happening for unwrap). This means that all center_of_mass calculations with unwrap=True and compound != 'group' have been returning wrong values! This seems like a much more frequent use case than what I first described.

The upcoming PR with a fix ensures coordinates are sorted in tandem with weights. It also introduces a check if sorting really is needed: it's an expensive operation, and, in this case, unnecessarily broadened the impact of the bug.

mnmelo added a commit that referenced this issue Oct 17, 2020
mnmelo added a commit that referenced this issue Oct 17, 2020
@mnmelo mnmelo changed the title AtomGroup.center fails for compounds+wrapping with nonccontiguous atoms AtomGroup.center fails for compounds+wrapping Oct 17, 2020
@mnmelo
Copy link
Member Author

mnmelo commented Oct 17, 2020

Just an update on some further checks: the bug with center_of_mass will more likely manifest itself when dealing with compounds with a large number of atoms. In that case, numpy's default quicksort algorithm will reorder atoms even if not needed. For smaller compounds (tested with the 5-water system above) quicksort will leave the atom ordering alone, but it already kicks in in the TRZ_psf, TRZ universe that is used for unwrap testing. I imagine that the limit that triggers quicksort has to do with list size, and possibly even other machine-dependent optimizations.

jbarnoud pushed a commit that referenced this issue Oct 18, 2020
@jbarnoud jbarnoud added this to To do in backport via automation Oct 18, 2020
orbeckst pushed a commit that referenced this issue Oct 23, 2020
… with unwrapping (PR #2992)

fixes #2984

(cherry picked from commit beb5232)
@orbeckst orbeckst moved this from To do to In progress in backport Oct 24, 2020
orbeckst pushed a commit that referenced this issue Oct 24, 2020
… with unwrapping (PR #2992)

fixes #2984

(cherry picked from commit beb5232)
@orbeckst orbeckst moved this from In progress to Done in backport Oct 24, 2020
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this issue Dec 9, 2020
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this issue Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
backport
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants