Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No need to redirect to the original frame explicitly during deserialization #3722

Open
wants to merge 22 commits into
base: develop
Choose a base branch
from

Conversation

yuxuanzhuang
Copy link
Contributor

@yuxuanzhuang yuxuanzhuang commented Jun 19, 2022

Fixes #3723
Partial fix: #3721

Changes made in this Pull Request:

  • no redirection to the original frame (explicitly) during deserialization
  • TextIOPicklable serializes the raw class instance instead of class name.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jun 19, 2022

Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-04-03 14:10:03 UTC

@yuxuanzhuang yuxuanzhuang changed the title Not redirect to the original frame during deserialization (WIP) No need to redirect to the original frame explicitly during deserialization (WIP) Jun 19, 2022
@codecov
Copy link

codecov bot commented Jun 19, 2022

Codecov Report

Attention: Patch coverage is 95.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 93.64%. Comparing base (73acc9b) to head (1a6272e).

Files Patch % Lines
package/MDAnalysis/lib/picklable_file_io.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3722      +/-   ##
===========================================
- Coverage    93.66%   93.64%   -0.02%     
===========================================
  Files          168      180      +12     
  Lines        21248    22332    +1084     
  Branches      3917     3918       +1     
===========================================
+ Hits         19902    20913    +1011     
- Misses         888      961      +73     
  Partials       458      458              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@orbeckst
Copy link
Member

Once you need someone to review, feel free to ping/request reviewers.

I'd also suggest using the "convert to draft" in GitHub instead of just adding "WIP" to the issue title as it shows more intentionally if you currently want reviews or not.

@yuxuanzhuang yuxuanzhuang marked this pull request as draft June 23, 2022 09:47
Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably an improvement, but I think it'll be a change in behaviour so we'll need to raise warnings etc.

@yuxuanzhuang
Copy link
Contributor Author

This is probably an improvement, but I think it'll be a change in behaviour so we'll need to raise warnings etc.

The behaviour change is how the reading location TextIOPicklable is recovered. Before it is done by universe.trajectory[i]. Now it is directly serialized. Not sure if there's any explicitly warning that should be raised anywhere?

@yuxuanzhuang yuxuanzhuang marked this pull request as ready for review June 30, 2022 12:06
Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuxuanzhuang sorry for leaving this for a while. So is the behaviour now that serialising/deserialising a Reader preserves the Timestep correctly? I.e. if I'd modified the coordinates somehow, this data will be preserved?

@yuxuanzhuang yuxuanzhuang changed the title No need to redirect to the original frame explicitly during deserialization (WIP) No need to redirect to the original frame explicitly during deserialization Sep 12, 2022
@yuxuanzhuang
Copy link
Contributor Author

@richardjgowers Okay, I never thought about preserving the modification of coordinates (before).

But the behavior does change after this PR, i.e. now the modification is preserved (because no data reloading from the file is performed after this line is removed) which is more "correct".

u = mda.Universe(XYZ)

print('Orig pos:,', u.trajectory.ts.positions[0])
u.trajectory.ts.positions[0] = np.array([1, 2, 3])
print('New pos:,', u.trajectory.ts.positions[0])
u_new = pickle.loads(pickle.dumps(u))
print('After serializing pos:,', u_new.trajectory.ts.positions[0])

# original output
Orig pos:, [ 0.931 17.318 16.423]
New pos:, [1. 2. 3.]
After serializing pos:, [ 0.931 17.318 16.423]

# new output after #PR 3722
Orig pos:, [ 0.931 17.318 16.423]
New pos:, [1. 2. 3.]
After serializing pos:, [1. 2. 3.]

@richardjgowers
Copy link
Member

@yuxuanzhuang thanks for adding the test - it looks like it found that h5md format has a custom serialisation mechanism

@yuxuanzhuang
Copy link
Contributor Author

Should work now! (too many formats to keep track of :) )

@orbeckst
Copy link
Member

@yuxuanzhuang could you resolve the conflicts please?

@orbeckst orbeckst self-assigned this Dec 15, 2023
@orbeckst
Copy link
Member

@yuxuanzhuang and @richardjgowers this PR still looks relevant. Why did it stall? What needs to be done to complete it?

Copy link

github-actions bot commented Dec 16, 2023

Linter Bot Results:

Hi @yuxuanzhuang! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ⚠️ Possible failure
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/8540187312/job/23396655061


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@yuxuanzhuang
Copy link
Contributor Author

@yuxuanzhuang could you resolve the conflicts please?

Done! Should I worry about the linter bot?

@IAlibay
Copy link
Member

IAlibay commented Dec 16, 2023

@yuxuanzhuang could you resolve the conflicts please?

Done! Should I worry about the linter bot?

Unless it outlines a flake8 failure it's ok to ignore.

Just checked - no flake8 so you're good to go.

@yuxuanzhuang
Copy link
Contributor Author

I understand why TNGReader needs to re-read the current frame during serialization; however, due to this, the modification of ts cannot be preserved in the TNG format. If I comment it out, all the tests will pass.

        # make sure we re-read the current frame to update C level objects in
        # the file iterator
        self._read_frame(self._frame)

Do you think it's reasonable to only make sure _file_iterator reads the step but no change of ts? @hmacdope

        # convert from frames to integrator steps
        step = self._frame_to_step(self._frame + 1)
        iterator_step = self._file_iterator.read_step(step)

@hmacdope
Copy link
Member

I understand why TNGReader needs to re-read the current frame during serialization; however, due to this, the modification of ts cannot be preserved in the TNG format. If I comment it out, all the tests will pass.

        # make sure we re-read the current frame to update C level objects in
        # the file iterator
        self._read_frame(self._frame)

Do you think it's reasonable to only make sure _file_iterator reads the step but no change of ts? @hmacdope

        # convert from frames to integrator steps
        step = self._frame_to_step(self._frame + 1)
        iterator_step = self._file_iterator.read_step(step)

Just checking my understanding @yuxuanzhuang, this means that the coordinate data will be re-read, but that any additional data in ts will become invalid? I am just not sure exactly what is meant by change of ts. Perhaps a super quick example that demos the difference?

Sorry about this mess, tbh this is being caused by my lazyness from way back where I didn't make the cython level classes pickleable in pytng (its a bit complicated for __cinnit__ using classes). See MDAnalysis/pytng#97. If I am understanding correctly it would be better to just be able to pickle everything there, rather than forcing a re-read of the step? I can look at addressing this but may take a while as super busy. Perhaps better to go for an interim solution that you suggested above?

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny cosmetic things :)

@@ -48,6 +49,8 @@ Enhancements
(Issue #3994, PR #4281)
* Add support for reading chainID info from Autodock PDBQT files (Issue #4207,
PR #4284)
* Improve performance in Universe serialization by no explicit redirection to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newest first :)

@@ -198,6 +201,10 @@ Fixes
* Fix MSD docs to use the correct error metric in example (Issue #3991)
* Add 'PairIJ Coeffs' to the list of sections in LAMMPSParser.py
(Issue #3336)
* Fix failure in double-serialization of TextIOPicklable file reader.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -364,6 +371,8 @@ Changes
use `pip install ./package[extra_formats]` instead (PR #3810)
* `Universe.empty` emmits less warnings (PR #3814)
* adding element attribute to TXYZParser if all atom names are valid element symbols (PR #3826)
* TextIOPicklable serializes the raw class instance instead of class name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@orbeckst
Copy link
Member

@hmacdope and @richardjgowers can you please check if your raised concerns have been addressed?

If yes, can we merge?

Otherwise, are there bigger issues that we cannot solve?

Forgot to move entries from 2.7 to 2.8 in merge
moved an entry from 2.4.1(!) to 2.8.0
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update versionchanged to 2.8.0

Note that I had to find the CHANGELOG entries from different sections to correctly merge. Please double check the CHANGELOG @yuxuanzhuang

package/MDAnalysis/coordinates/base.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/picklable_file_io.py Outdated Show resolved Hide resolved
@yuxuanzhuang
Copy link
Contributor Author

I am just not sure exactly what is meant by change of ts. Perhaps a super quick example that demos the difference?

I mean modification of e.g. coordinates.

u = mda.Universe(TNG_traj_gro, TNG_traj)
reader = u.trajectory
reader.ts.positions[0] = np.array([1, 2, 3])
reader_p = pickle.loads(pickle.dumps(reader))
assert_equal(reader.ts, reader_p.ts,
                       "Modification of ts not preserved after serialization")

Previously, during unpickling, self._read_frame(self._frame) will 1) update self._file_iterator to the current frame. 2) modify self.ts by reading from self._file_iterator. This will discard any change to the timestep itself.

Now, only self._file_iterator will be reopened and updated. self.ts will be preserved during serialization.

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @yuxuanzhuang, thanks for pushing through on this. 👍

@orbeckst
Copy link
Member

orbeckst commented Apr 4, 2024

@richardjgowers is your review still pertinent or should I just dismiss it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail to double-serialization for some coordinate Readers Improve Universe serialization performance
6 participants