
PositionAverager Transformation ends up with wrong results with parallel analysis #2996

Open
yuxuanzhuang opened this issue Oct 19, 2020 · 6 comments · Fixed by #2950

@yuxuanzhuang
Contributor

Expected behavior

The results are the same as the serial analysis.

Actual behavior

Due to the block-splitting approach, a new PositionAverager is created for each block, so no previous memory (self.coord_array) is carried over between blocks.
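The history dependence can be illustrated with a toy stand-in for the averaging (plain Python, not the real MDAnalysis classes): when the trajectory is split into blocks, each block restarts with an empty history, so frames just after a block boundary are averaged over different frames than in the serial run.

```python
def rolling_average(values, window):
    """Mimic PositionAverager: each output averages up to `window`
    of the most recent values seen so far (history-dependent)."""
    history, out = [], []
    for v in values:
        history.append(v)
        recent = history[-window:]
        out.append(sum(recent) / len(recent))
    return out

values = list(range(6))  # stand-in for 6 trajectory frames

serial = rolling_average(values, window=3)
# block-parallel: each block restarts with an empty history
parallel = rolling_average(values[:3], 3) + rolling_average(values[3:], 3)

print(serial[3], parallel[3])  # 2.0 3.0 -- the second block lost frames 1-2
```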

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysisData.adk_equilibrium import fetch_adk_equilibrium
adk = fetch_adk_equilibrium()
import matplotlib.pyplot as plt

from MDAnalysis.transformations.positionaveraging import PositionAverager

from MDAnalysis.analysis.rms import RMSD as serial_RMSD
from pmda.rms import RMSD as parallel_RMSD

u = mda.Universe(adk.topology, adk.trajectory)

pos_avg_trans = PositionAverager(1000)
u.trajectory.add_transformations(pos_avg_trans)

serial_rmsd = serial_RMSD(u.atoms, u.atoms).run()
parallel_rmsd = parallel_RMSD(u.atoms, u.atoms).run(n_blocks=8)

# plot rmsd
plt.plot(serial_rmsd.rmsd[:, 2], label='serial RMSD')
plt.plot(parallel_rmsd.rmsd[:, 2], label='parallel RMSD')
plt.legend()
plt.show()

[Plot: serial RMSD and parallel RMSD traces diverge, showing the wrong parallel results]

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") 2.0.0-dev
  • Which version of Python (python -V)? 3.8
  • Which operating system? Ubuntu 20
@orbeckst
Member

There are going to be some transformations that are not parallel-safe. It's great when we can make them work, but that may not always be easy and may require different algorithms.

Can we just add a boolean attribute parallelizable = False to the TransformationBase class, which PMDA and friends can then check? If we know that a Transformation can be run in parallel, we set it explicitly to True.
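A minimal sketch of what such a flag could look like (hypothetical toy classes for illustration, not the actual MDAnalysis implementation):

```python
class TransformationBase:
    parallelizable = False  # conservative default: assume not parallel-safe

    def __call__(self, ts):
        raise NotImplementedError

class WrapToy(TransformationBase):
    # stateless per-frame transformation: safe under block splitting
    parallelizable = True

    def __call__(self, ts):
        return ts

class PositionAveragerToy(TransformationBase):
    # history-dependent (keeps a coord_array across frames): not safe
    parallelizable = False

    def __call__(self, ts):
        return ts

def check_parallel_safe(transformations):
    """What PMDA and friends could do before splitting a trajectory."""
    bad = [t for t in transformations if not t.parallelizable]
    if bad:
        raise ValueError(f"not parallelizable: {bad}")

check_parallel_safe([WrapToy()])  # passes
# check_parallel_safe([PositionAveragerToy()])  # would raise ValueError
```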

@mnmelo
Member

mnmelo commented Oct 20, 2020

Yes, I think this is precisely the case here. Position averaging is intrinsically history-dependent, and as such it'll not play nice with block parallelization.

@orbeckst
Member

@yuxuanzhuang let's add parallelizable = False as an attribute to TransformationBase and have derived classes change it if they can be parallelized with split-apply-combine/block parallelization.

IAlibay pushed a commit that referenced this issue Apr 10, 2021
Fixes #2996 

## Work done in this PR
- Adds a TransformationBase class
- TransformationBase uses threadpoolctl to allow the number of threads used to be limited in order to improve performance.
- TransformationBase also includes a check for whether a transformation is parallelizable.
- Refactors existing transformation classes to use TransformationBase.
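A rough sketch of how a base class might combine the two features described above (the class and method names are assumptions, not the actual PR code; threadpool_limits is threadpoolctl's real context manager, given a no-op fallback here so the sketch runs without the package installed):

```python
from contextlib import contextmanager

try:
    from threadpoolctl import threadpool_limits
except ImportError:
    @contextmanager
    def threadpool_limits(limits=None):
        yield  # no-op fallback when threadpoolctl is not installed

class TransformationBase:
    def __init__(self, max_threads=None, parallelizable=True):
        self.max_threads = max_threads          # None = leave BLAS/OpenMP alone
        self.parallelizable = parallelizable    # checked by parallel runners

    def _transform(self, ts):
        raise NotImplementedError("override in subclasses")

    def __call__(self, ts):
        # cap thread-pool size around the per-frame work; oversubscribed
        # BLAS/OpenMP threads can make small per-frame operations slower
        with threadpool_limits(limits=self.max_threads):
            return self._transform(ts)

class ScaleToy(TransformationBase):
    # toy transformation: double the "positions"
    def _transform(self, ts):
        return ts * 2

print(ScaleToy(max_threads=1)(21))  # 42
```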
@mnmelo
Member

mnmelo commented May 31, 2021

I am not sure I agree with the way this was implemented. parallelizable is now used as a kwarg to the Analysis __init__ to indicate parallelization compatibility. I think it'd have been much more pythonic to instead have parallelizable be a class attribute, since it should be a general characteristic of each Analysis, and not dependent on each instantiation.

Later, if the user wants to control parallelization from the instantiation/run of an Analysis, PMDA and friends will/should provide ways to force serial behavior.

What do you think? If you agree with a change, we're still in time to correct the API before 2.0.0.
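For illustration, the two API shapes under discussion (hypothetical toy classes, not the real code):

```python
# (a) per-instance kwarg, as in the merged PR: set at instantiation
class PositionAveragerKwarg:
    def __init__(self, avg_frames, parallelizable=False):
        self.avg_frames = avg_frames
        self.parallelizable = parallelizable  # per-instance value

# (b) class attribute, as suggested above: a fixed characteristic
# of the algorithm itself, shared by all instances
class PositionAveragerAttr:
    parallelizable = False

    def __init__(self, avg_frames):
        self.avg_frames = avg_frames

# with (b), the capability can be inspected without building an instance:
print(PositionAveragerAttr.parallelizable)  # False
```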

@mnmelo mnmelo reopened this May 31, 2021
@yuxuanzhuang
Contributor Author

I agree it is more pythonic to have it as a class attribute, but given that we don't yet have a definite API for parallel analysis, nor is parallelizable checked anywhere yet, it is still unclear whether it is an internal indicator or something that differs from instance to instance. For example, parallelizable currently only indicates whether this Transformation can be used in block analysis, but the answer might differ under other parallel conditions, e.g. parallel analysis across ensemble simulations. How should we deal with that?

@mnmelo
Member

mnmelo commented May 31, 2021

First of all, apologies that I mistakenly used the Analysis case as an example above, rather than Transformations.

Regarding parallelizable I was assuming we were interpreting this as frame-wise 'split-apply-combine' parallelizability. It's perhaps best not to overload this single attr with other meanings.

Are there examples where the same transformation might be parallelizable or not, depending on initialization? I mean here the framewise parallelizability, but I guess we could discriminate multiple parallel possibilities if, instead of a single attr, we have this as a dict; i.e.: {'split-apply-combine': True, 'ensemble': True}.
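A sketch of that dict-valued variant (hypothetical toy classes and mode names):

```python
class WrapToy:
    # stateless per frame: safe in both modes
    parallelizable = {'split-apply-combine': True, 'ensemble': True}

class PositionAveragerToy:
    # history-dependent framewise, but independent replicas are fine
    parallelizable = {'split-apply-combine': False, 'ensemble': True}

def supports(transformation, mode):
    """Unknown modes default to False, the conservative choice."""
    return transformation.parallelizable.get(mode, False)

print(supports(PositionAveragerToy, 'ensemble'))             # True
print(supports(PositionAveragerToy, 'split-apply-combine'))  # False
```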
