
PositionAverager Transformation ends up with wrong results with parallel analysis #2996

Open
yuxuanzhuang opened this issue Oct 19, 2020 · 6 comments · Fixed by #2950

@yuxuanzhuang
Contributor

Expected behavior

The results are the same as the serial analysis.

Actual behavior

Due to the block-splitting approach, a new PositionAverager is created for each block, so no previous memory (self.coord_array) is carried over between blocks.
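The history dependence can be illustrated with a toy stand-in for the averaging (plain Python, not the real MDAnalysis classes): when the trajectory is split into blocks, each block restarts with an empty history, so frames just after a block boundary are averaged over different frames than in the serial run.

```python
def rolling_average(values, window):
    """Mimic PositionAverager: each output averages up to `window`
    of the most recent values seen so far (history-dependent)."""
    history, out = [], []
    for v in values:
        history.append(v)
        recent = history[-window:]
        out.append(sum(recent) / len(recent))
    return out

values = list(range(6))  # stand-in for 6 trajectory frames

serial = rolling_average(values, window=3)
# block-parallel: each block restarts with an empty history
parallel = rolling_average(values[:3], 3) + rolling_average(values[3:], 3)

print(serial[3], parallel[3])  # 2.0 3.0 -- the second block lost frames 1-2
```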

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysisData.adk_equilibrium import fetch_adk_equilibrium
adk = fetch_adk_equilibrium()
import matplotlib.pyplot as plt

from MDAnalysis.transformations.positionaveraging import PositionAverager

from MDAnalysis.analysis.rms import RMSD as serial_RMSD
from pmda.rms import RMSD as parallel_RMSD

u = mda.Universe(adk.topology, adk.trajectory)

pos_avg_trans = PositionAverager(1000)
u.trajectory.add_transformations(pos_avg_trans)

serial_rmsd = serial_RMSD(u.atoms, u.atoms).run()
parallel_rmsd = parallel_RMSD(u.atoms, u.atoms).run(n_blocks=8)

# plot rmsd
plt.plot(serial_rmsd.rmsd[:, 2], label='serial RMSD')
plt.plot(parallel_rmsd.rmsd[:, 2], label='parallel RMSD')
plt.legend()
plt.show()

[Plot: serial RMSD and parallel RMSD traces diverge, showing the wrong parallel results]

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") 2.0.0-dev
  • Which version of Python (python -V)? 3.8
  • Which operating system? Ubuntu 20
@orbeckst
Member

There are going to be some transformations that are not parallel-safe. It's great when we can make them work, but that may not always be easy and may require different algorithms.

Can we just add a boolean attribute parallelizable = False to the TransformationBase class, which PMDA and friends can then check? If we know that a Transformation can be run in parallel, we set it explicitly to True.
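A minimal sketch of what such a flag could look like (hypothetical toy classes for illustration, not the actual MDAnalysis implementation):

```python
class TransformationBase:
    parallelizable = False  # conservative default: assume not parallel-safe

    def __call__(self, ts):
        raise NotImplementedError

class WrapToy(TransformationBase):
    # stateless per-frame transformation: safe under block splitting
    parallelizable = True

    def __call__(self, ts):
        return ts

class PositionAveragerToy(TransformationBase):
    # history-dependent (keeps a coord_array across frames): not safe
    parallelizable = False

    def __call__(self, ts):
        return ts

def check_parallel_safe(transformations):
    """What PMDA and friends could do before splitting a trajectory."""
    bad = [t for t in transformations if not t.parallelizable]
    if bad:
        raise ValueError(f"not parallelizable: {bad}")

check_parallel_safe([WrapToy()])  # passes
# check_parallel_safe([PositionAveragerToy()])  # would raise ValueError
```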

@mnmelo
Member

mnmelo commented Oct 20, 2020

Yes, I think this is precisely the case here. Position averaging is intrinsically history-dependent, and as such it'll not play nice with block parallelization.

@orbeckst
Member

@yuxuanzhuang let's add parallelizable = False as an attribute to TransformationBase and have derived classes change it if they can be parallelized with split-apply-combine/block parallelization.

IAlibay pushed a commit that referenced this issue Apr 10, 2021
Fixes #2996 

## Work done in this PR
- Adds a TransformationBase class
- TransformationBase uses threadpoolctl to allow the number of threads used to be limited in order to improve performance.
- TransformationBase also includes a check for whether a transformation is parallelizable.
- Refactors existing transformation classes to use TransformationBase.
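A rough sketch of how a base class might combine the two features described above (the class and method names are assumptions, not the actual PR code; threadpool_limits is threadpoolctl's real context manager, given a no-op fallback here so the sketch runs without the package installed):

```python
from contextlib import contextmanager

try:
    from threadpoolctl import threadpool_limits
except ImportError:
    @contextmanager
    def threadpool_limits(limits=None):
        yield  # no-op fallback when threadpoolctl is not installed

class TransformationBase:
    def __init__(self, max_threads=None, parallelizable=True):
        self.max_threads = max_threads          # None = leave BLAS/OpenMP alone
        self.parallelizable = parallelizable    # checked by parallel runners

    def _transform(self, ts):
        raise NotImplementedError("override in subclasses")

    def __call__(self, ts):
        # cap thread-pool size around the per-frame work; oversubscribed
        # BLAS/OpenMP threads can make small per-frame operations slower
        with threadpool_limits(limits=self.max_threads):
            return self._transform(ts)

class ScaleToy(TransformationBase):
    # toy transformation: double the "positions"
    def _transform(self, ts):
        return ts * 2

print(ScaleToy(max_threads=1)(21))  # 42
```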
@mnmelo
Member

mnmelo commented May 31, 2021

I am not sure I agree with the way this was implemented. parallelizable is now used as a kwarg to the Analysis __init__ to indicate parallelization compatibility. I think it'd have been much more pythonic to instead have parallelizable be a class attribute, since it should be a general characteristic of each Analysis, and not dependent on each instantiation.

Later, if the user wants to control parallelization from the instantiation/run of an Analysis, PMDA and friends will/should provide ways to force serial behavior.

What do you think? If you agree with a change, we're still in time to correct the API before 2.0.0.
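For illustration, the two API shapes under discussion (hypothetical toy classes, not the real code):

```python
# (a) per-instance kwarg, as in the merged PR: set at instantiation
class PositionAveragerKwarg:
    def __init__(self, avg_frames, parallelizable=False):
        self.avg_frames = avg_frames
        self.parallelizable = parallelizable  # per-instance value

# (b) class attribute, as suggested above: a fixed characteristic
# of the algorithm itself, shared by all instances
class PositionAveragerAttr:
    parallelizable = False

    def __init__(self, avg_frames):
        self.avg_frames = avg_frames

# with (b), the capability can be inspected without building an instance:
print(PositionAveragerAttr.parallelizable)  # False
```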

@mnmelo mnmelo reopened this May 31, 2021
@yuxuanzhuang
Contributor Author

I agree it is more pythonic to have it as a class attribute, but given that we don't yet have a definite API for parallel analysis, nor is parallelizable checked anywhere yet, it is still unclear whether it is an internal indicator or something that differs from instance to instance. For example, parallelizable currently only indicates whether this Transformation can be used in block analysis, but the answer might differ under other parallel conditions, e.g. parallel analysis across ensemble simulations. How should we deal with that?

@mnmelo
Member

mnmelo commented May 31, 2021

First of all, apologies that I mistakenly used the Analysis case as an example above, rather than Transformations.

Regarding parallelizable I was assuming we were interpreting this as frame-wise 'split-apply-combine' parallelizability. It's perhaps best not to overload this single attr with other meanings.

Are there examples where the same transformation might be parallelizable or not, depending on initialization? I mean here the framewise parallelizability, but I guess we could discriminate multiple parallel possibilities if, instead of a single attr, we have this as a dict; i.e.: {'split-apply-combine': True, 'ensemble': True}.
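A sketch of that dict-valued variant (hypothetical toy classes and mode names):

```python
class WrapToy:
    # stateless per frame: safe in both modes
    parallelizable = {'split-apply-combine': True, 'ensemble': True}

class PositionAveragerToy:
    # history-dependent framewise, but independent replicas are fine
    parallelizable = {'split-apply-combine': False, 'ensemble': True}

def supports(transformation, mode):
    """Unknown modes default to False, the conservative choice."""
    return transformation.parallelizable.get(mode, False)

print(supports(PositionAveragerToy, 'ensemble'))             # True
print(supports(PositionAveragerToy, 'split-apply-combine'))  # False
```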
