Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCD reader copies timestep while other readers update it. #3889

Open
hmacdope opened this issue Oct 27, 2022 · 11 comments
Open

DCD reader copies timestep while other readers update it. #3889

hmacdope opened this issue Oct 27, 2022 · 11 comments
Assignees
Milestone

Comments

@hmacdope
Copy link
Member

hmacdope commented Oct 27, 2022


If you got a warning

MDAnalysis/coordinates/DCD.py:165: DeprecationWarning: DCDReader currently makes independent timesteps by copying self.ts while other readers update self.ts inplace. This behavior will be changed in 3.0 to be the same as other readers. Read more at #3889 to learn if this change in behavior might affect you.
warnings.warn("DCDReader currently makes independent timesteps"

and now you are wondering what this is all about, please see explanation below.


Expected behavior

The readers should all have the same behaviour with respect to iterating through the trajectory and changing the base values in the position attribute.

Actual behavior

The DCD reader creates a new copy of the current timestep which allows the coordinates in each frame to vary independently. This is best illustrated in the below code snippet.

Copying the timestep incurs a ~30% performance penalty.

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD,  GRO, PDB, TPR, XTC, TRR,  PRMncdf, NCDF

u = mda.Universe(PSF, DCD)
frames = [2,3,3,1]
selection  = u.trajectory[frames] 
positions = []
for ts in selection:
    print(ts)
    positions.append(ts.positions)

positions # each element of the list will be different. 

u = mda.Universe(GRO, XTC)
frames = [2,3,3,1]
selection  = u.trajectory[frames] 
positions = []
for ts in selection:
    print(ts)
    positions.append(ts.positions)

positions # each element of the list will be the same (the coordinates in frame 1). 

A number of tests seem to rely on this behaviour (test_atomgroup_write) and some of the LAMMPSDumpDCD tests

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") 2.4.0-dev0
@hmacdope hmacdope added this to the Release 3.0 milestone Oct 27, 2022
@hmacdope hmacdope self-assigned this Oct 27, 2022
@hmacdope hmacdope mentioned this issue Oct 27, 2022
4 tasks
@orbeckst
Copy link
Member

I don’t remember if this was intentional or not. Did you run git blame to see who added the line?

It might have been an attempt to make the behavior of ts more intuitive (namely for the case of pulling a list of ts). But I’m with you: this should be consistent and the performance is important, especially as the DCD reader became a lot slower when we switched to cython (but it was necessary for maintainability).

@orbeckst
Copy link
Member

Changing this behavior counts as a “fix” for me—I don’t think it was ever promised to behave in the current way.

@hmacdope
Copy link
Member Author

I'll add proposed changes to #3888

@hmacdope
Copy link
Member Author

Now that there is an issue for adding a deprecation warning #3923 this can be the main issue to actually deprecate the behaviour pinned to 3.0.

orbeckst pushed a commit that referenced this issue Nov 29, 2022
* Fixes #3882 and #3923
* Improves the cython in the DCD reader to be more compiled.
* add annotate_cython setup.cfg option
* add types and fix is_periodic
* make int cast explicit
* change whence dict
* move DCD reader to no copy API and update tests accordingly
* add temporary buffer to DCD reader
* add module level directives
* pxdify libdcd: add new lib/formats/libdcd.pxd
* Add deprecation warning for `timestep` copying in DCDReader (Issue #3889)
* add DCD to public libmdanalysis API
* update tests
* update CHANGELOG
@orbeckst
Copy link
Member

If the finding in #4008 is true that DCDReader effectively ignores transformations then we need to fix this here asap and not just do a deprecation and change in 3.0.

@hmacdope
Copy link
Member Author

hmacdope commented Feb 13, 2023 via email

@orbeckst
Copy link
Member

orbeckst commented Feb 19, 2023

You got a warning

MDAnalysis/coordinates/DCD.py:165: DeprecationWarning: DCDReader currently makes independent timesteps by copying self.ts while other readers update self.ts inplace. This behavior will be changed in 3.0 to be the same as other readers. Read more at #3889 to learn if this change in behavior might affect you.
warnings.warn("DCDReader currently makes independent timesteps"

and now you are wondering what this is all about. Read on.

Does the upcoming change in DCD reading behavior affect you?

Possibly YES if

  1. you are using trajectories in DCD format
  2. you are using container attributes of the Timestep directly (such as positions, velocities, forces, dimensions), i.e., your code includes something like ts.positions or ts.dimensions.
  3. you store these data or slices of them in another data structure (eg a list) with the goal to process them after the trajectory has been read

In MDAnalysis 1.x and 2.x, the following works (for DCD only!) in the sense that you can pull out a list of coordinates from a trajectory:

import MDAnalysis as mda
from MDAnalysis.tests import datafiles as data
import numpy as np

u = mda.Universe(data.PSF, data.DCD)                       # load a DCD file

### ONLY WORKS AS INTENDED IN MDAnalysis < 3.0
all_coordinates = [ts.positions for ts in u.trajectory]    # store ALL coordinates
all_coordinates = np.array(all_coordinates)                # make numpy array

Check that all coordinates are identical, frame by frame

for i, ts in enumerate(u.trajectory):
   match = np.allclose(ts.positions, all_coordinates[i])
   if not match:
       print(f"Coordinate mismatch at frame {i}")
print("done")

With MDAnalysis < 3.0 the above will NOT find any mismatches.

# current MDAnalysis 
done

However, with MDAnalysis >= 3.0 you will get mismatches beyond frame 0:

# MDAnalysis >= 3.0
Coordinate mismatch at frame 1
Coordinate mismatch at frame 2
Coordinate mismatch at frame 3
...
done

What to do?

If your code relies on the behavior of the DCDReader to make a copy of Timestep then you must change your code or it will produce WRONG results.

The easiest change is to make a copy of any data yourself. This means for the positions array (and other arrays/data structures of Timestep) to use the appropriate copy methods. For example, for ts.positions, change

ts.positions

to

ts.positions.copy()

(positions is a NumPy array, which has the copy() method).

The code from above would then be written as

all_coordinates = [ts.positions.copy() for ts in u.trajectory]   # store ALL coordinates

See also

For the specific application of pulling coordinates out of a trajectory, all trajectory readers have the timeseries() method that you can use instead of writing an explicit loop over the trajectory.

Using timeseries is preferred because it reliably works in all versions of MDAnalysis with all trajectory formats.

@hmacdope
Copy link
Member Author

@orbeckst thanks for the clarification, much better than my developer speak.

@RMCrean
Copy link

RMCrean commented Apr 4, 2023

Hello, I've run into this warning (so thank you for placing it). Please could you confirm if I have understood the above messages correctly and now configured my code to avoid this issue once version 3.0 is released? I've tried to provide a minimal example so hopefully that is okay to just glance over.

For what it is worth, different trajectory file types work fine with this process.

res1_sele = "not name H* and resid " + str(res1)
res1_atoms = universe.select_atoms(res1_sele)

res2_sele = "not name H* and resid " + str(res2)
res2_atoms = universe.select_atoms(res2_sele)

contact_scores = []
for timestep in universe.trajectory:
    res_res_dists = contacts.distance_array(
        res1_atoms.positions.copy(), 
        res2_atoms.positions.copy()
    )
    contact_score = _score_residue_contact(res_res_dists)
    contact_scores.append(contact_score)

Thanks!

@orbeckst
Copy link
Member

orbeckst commented Apr 4, 2023

Did you change the lines

        res1_atoms.positions.copy(), 
        res2_atoms.positions.copy()

from originally

        res1_atoms.positions, 
        res2_atoms.positions

?

If this is the case, then there are two reasons why the original lines can remain (no copy() needed):

  1. AtomGroup.positions actually already behaves differently from Timestep.positions and always make a copy so you shouldn't make another copy. Thus, this issue does not actually directly pertain to you. Unfortunately, it's not really possible for us to make the warning only appear if someone uses Timestep.positions directly.
  2. Even if you had used ts.positions directly, you wouldn't need copy() because contacts.distance_array() uses the information in positions right away and produces a new data structure for every frame in the trajectory.

@RMCrean
Copy link

RMCrean commented Apr 5, 2023

Thanks for the quick response! Yes it went exactly as you described, originally I didn't have the .copy() but after seeing the warning I thought it might be safer to place it just in case.

And thanks for the explanation, it's good to know that I can remove the .copy() and safely ignore this warning.

RMCrean added a commit to kamerlinlab/KIF that referenced this issue Apr 5, 2023
Checked and validated the warning is not relevant for this code, see: MDAnalysis/mdanalysis#3889
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

No branches or pull requests

3 participants