Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip observables contained in particle groups #4615

Open
wants to merge 20 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions package/AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -240,6 +240,7 @@ Chronological list of authors
- Sampurna Mukherjee
- Leon Wehrhan
- Valerij Talagayev
- Fabian Zills



Expand Down
13 changes: 9 additions & 4 deletions package/MDAnalysis/coordinates/H5MD.py
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,7 @@
:members:
"""
import contextlib

import numpy as np
import MDAnalysis as mda
Expand All @@ -231,6 +232,7 @@ class MockH5pyFile:
HAS_H5PY = True



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo blank line change.

class H5MDReader(base.ReaderBase):
r"""Reader for the H5MD format.
Expand Down Expand Up @@ -673,10 +675,13 @@ def _read_frame(self, frame):
def _copy_to_data(self):
"""assigns values to keys in data dictionary"""

if 'observables' in self._file:
for key in self._file['observables'].keys():
self.ts.data[key] = self._file['observables'][key][
'value'][self._frame]
if "observables" in self._file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should issue a warning that observables of the form observables/group1/observable1 are being ignored if present.

for key in self._file["observables"].keys():
with contextlib.suppress(KeyError):
self.ts.data[key] = self._file["observables"][key][
"value"
][self._frame]


# pulls 'time' and 'step' out of first available parent group
for name, value in self._has.items():
Expand Down
20 changes: 19 additions & 1 deletion testsuite/MDAnalysisTests/coordinates/test_h5md.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,12 @@
from MDAnalysis.coordinates.H5MD import HAS_H5PY
if HAS_H5PY:
import h5py
from MDAnalysis.coordinates.H5MD import H5MDReader
from MDAnalysis.exceptions import NoDataError
from MDAnalysisTests import make_Universe
from MDAnalysisTests.datafiles import (H5MD_xvf, TPR_xvf, TRR_xvf,
COORDINATES_TOPOLOGY,
COORDINATES_H5MD)
COORDINATES_H5MD, H5MD_energy)
from MDAnalysisTests.coordinates.base import (MultiframeReaderTest,
BaseReference, BaseWriterTest,
assert_timestep_almost_equal)
Expand Down Expand Up @@ -894,3 +895,20 @@ def test_writer_no_h5py(self, Writer, outfile):
u.atoms.n_atoms) as W:
for ts in u.trajectory:
W.write(universe)

@pytest.mark.skipif(not HAS_H5PY, reason="h5py not installed")
class TestH5MDReaderWithObservables(object):
"""Read H5MD file with 'observables/atoms/energy'."""

prec = 3
ext = 'h5md'

@pytest.fixture(scope='class')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be mistaken, but I think the test_n_frames method isn't testing the issue- if the reader was going to except out due to a key error, this would happen during the line in the universe fixture:

reader = H5MDReader(H5MD_energy, convert_units=True)

when the H5MDReader's __init__ is called and the first call to _read_next_timestep is made. The test method doesn't call any code that reads a frame, it just ends up calling the reader's n_frames method. I agree with your approach to fixing the issue, just maybe have the test initialize or initialize and iterate through the reader so that the potential issue will occur in the scope of the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure to update the CHANGELOG!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now. The CI failing seems not related to this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick- can you change the name of the test to make it represent what it is testing? Also, the assert statement seems unrelated to what the test is testing. Maybe write an except and pytest.fail statement upon raising a KeyError but otherwise don't have an assert statement. Here's an example from testsuite/coordinates/base.py:

    def test_frame_jump_issue1942(self, ref, reader):
        """Test for issue 1942 (especially XDR on macOS)"""
        reader.rewind()
        try:
            for ii in range(ref.n_frames + 2):
                reader[0]
        except StopIteration:
            pytest.fail("Frame-seeking wrongly iterated (#1942)")

Otherwise LGTM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test and used pytest.fail. I hope the test structure now is a bit cleaner. Thanks for the review.

def universe(self):
u = mda.Universe.empty(n_atoms=108, trajectory=True)
reader = H5MDReader(H5MD_energy, convert_units=True)
u.trajectory = reader
return u

def test_n_frames(self, universe):
assert len(universe.trajectory) == 20
Binary file added testsuite/MDAnalysisTests/data/cu.h5
Binary file not shown.
2 changes: 2 additions & 0 deletions testsuite/MDAnalysisTests/datafiles.py
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@
"GRO_sameresid_diffresname", # Case where two residues share the same resid
"PDB_xvf", "TPR_xvf", "TRR_xvf", # Gromacs coords/veloc/forces (cobrotoxin, OPLS-AA, Gromacs 4.5.5 tpr)
"H5MD_xvf", # TPR_xvf + TRR_xvf converted to h5md format
"H5MD_energy", # H5MD trajectory with observables/atoms/energy
"XVG_BZ2", # Compressed xvg file about cobrotoxin
"PDB_xlserial",
"TPR400", "TPR402", "TPR403", "TPR404", "TPR405", "TPR406", "TPR407",
Expand Down Expand Up @@ -372,6 +373,7 @@
TPR_xvf = (_data_ref / 'cobrotoxin.tpr').as_posix()
TRR_xvf = (_data_ref / 'cobrotoxin.trr').as_posix()
H5MD_xvf = (_data_ref / 'cobrotoxin.h5md').as_posix()
H5MD_energy = (_data_ref / 'cu.h5').as_posix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the convention for other h5md files out there to use the *.h5 extension rather than *.h5md?

Copy link
Author

@PythonFZ PythonFZ Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename the file to *.h5md. I've seen both and personally prefer *.h5 because I was lazy and did not configure the VS Code H5WEB extension to also read *.h5md. Afaik H5MD is the only format specification for H5 files, so there is no risk of confusion.

Edit: I haven't seen a recommendation for the *.h5md suffix in the H5MD definition so I would actually prefer keeping the more general *.h5.

XVG_BZ2 = (_data_ref / 'cobrotoxin_protein_forces.xvg.bz2').as_posix()

XPDB_small = (_data_ref / '5digitResid.pdb').as_posix()
Expand Down
Loading