Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip observables contained in particle groups #4615

Open
wants to merge 20 commits into
base: develop
Choose a base branch
from

Conversation

PythonFZ
Copy link

@PythonFZ PythonFZ commented Jun 15, 2024

This is not a fix for #4598 but enables reading files that contain data in

observables
 \-- <group1>
 |    \-- <observable2>
 |         \-- step: Integer[variable]
 |         \-- time: Float[variable]
 |         \-- value: <type>[variable][D][D]

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4615.org.readthedocs.build/en/4615/

This file hasn't been touched in 9 months and I don't think there are some open PRs so there would be a chance to format the entire file using ruff which I accidently did here faafe8b - is this of interest?

@pep8speaks
Copy link

pep8speaks commented Jun 15, 2024

Hello @PythonFZ! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-06-25 01:35:53 UTC

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

Copy link

github-actions bot commented Jun 15, 2024

Linter Bot Results:

Hi @PythonFZ! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ⚠️ Possible failure
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/9655005645/job/26630186411


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

Copy link

codecov bot commented Jun 15, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.57%. Comparing base (cfa4438) to head (6faa14d).

Files Patch % Lines
package/MDAnalysis/coordinates/H5MD.py 80.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4615      +/-   ##
===========================================
- Coverage    93.60%   93.57%   -0.03%     
===========================================
  Files          171      183      +12     
  Lines        21235    22311    +1076     
  Branches      3933     3935       +2     
===========================================
+ Hits         19876    20877    +1001     
- Misses         899      974      +75     
  Partials       460      460              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@PythonFZ PythonFZ force-pushed the h5md-skip-per-particles-observables branch from faafe8b to 078d168 Compare June 15, 2024 07:52
@orbeckst
Copy link
Member

@edisj and @ljwoods2 could you have a quick look at this PR and comment?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using the logger but logging at every step is potentially a lot of output and performance hit. Furthermore, I don’t think users care. Thus I suggest you remove the logging and just pass.

We also need at least one test case.

@orbeckst
Copy link
Member

@hmacdope would you be able to shepherd this PR? If not switch it to me.

prec = 3
ext = 'h5md'

@pytest.fixture(scope='class')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be mistaken, but I think the test_n_frames method isn't testing the issue- if the reader was going to except out due to a key error, this would happen during the line in the universe fixture:

reader = H5MDReader(H5MD_energy, convert_units=True)

when the H5MDReader's __init__ is called and the first call to _read_next_timestep is made. The test method doesn't call any code that reads a frame, it just ends up calling the reader's n_frames method. I agree with your approach to fixing the issue, just maybe have the test initialize or initialize and iterate through the reader so that the potential issue will occur in the scope of the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure to update the CHANGELOG!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now. The CI failing seems not related to this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick- can you change the name of the test to make it represent what it is testing? Also, the assert statement seems unrelated to what the test is testing. Maybe write an except and pytest.fail statement upon raising a KeyError but otherwise don't have an assert statement. Here's an example from testsuite/coordinates/base.py:

    def test_frame_jump_issue1942(self, ref, reader):
        """Test for issue 1942 (especially XDR on macOS)"""
        reader.rewind()
        try:
            for ii in range(ref.n_frames + 2):
                reader[0]
        except StopIteration:
            pytest.fail("Frame-seeking wrongly iterated (#1942)")

Otherwise LGTM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test and used pytest.fail. I hope the test structure now is a bit cleaner. Thanks for the review.

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! This is a good start until we decide what to do with the other form of observables.

@@ -231,6 +232,7 @@ class MockH5pyFile:
HAS_H5PY = True



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo blank line change.

Comment on lines 912 to 915
try:
_ = H5MDReader(H5MD_energy, convert_units=True)
except KeyError:
pytest.fail("Could not read H5MD file with observables.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just have

_ = H5MDReader(H5MD_energy, convert_units=True)
As the body of the test, no need to try, except. Pytest can catch all the errors.

See also comment about warnings, we should catch suggested warning.

for key in self._file['observables'].keys():
self.ts.data[key] = self._file['observables'][key][
'value'][self._frame]
if "observables" in self._file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should issue a warning that observables of the form observables/group1/observable1 are being ignored if present.

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW @PythonFZ the following will successfully read into ts.data with the key form group/observable

    def _copy_to_data(self):
        """assigns values to keys in data dictionary"""

        if "observables" in self._file:
            for key in self._file["observables"].keys():
                try:
                    # if has value as subkey read directly into data
                    if "value" in self._file["observables"][key]:
                        self.ts.data[key] = self._file["observables"][key][
                            "value"
                        ][self._frame]
                    # if value is not a subkey, read dict of subkeys
                    else:
                        for subkey in self._file["observables"][key].keys():
                            self.ts.data[key + "/" + subkey] = self._file["observables"][key][
                                subkey
                            ]["value"][self._frame]
                except KeyError:
                    warnings.warn(f"Could not read {key} from observables group, not a legal H5MD observable specification or data not ")

@PythonFZ
Copy link
Author

@hmacdope What is the expected way to auto-format the code? I'm used to running either black or ruff format but once I run that on the code base it will yield to changes in most of the files. I haven't found anything at https://userguide.mdanalysis.org/stable/contributing.html?

Maybe there are some pre-commit hooks I haven't found?

@orbeckst
Copy link
Member

orbeckst commented Jun 18, 2024 via email

@@ -372,6 +373,7 @@
TPR_xvf = (_data_ref / 'cobrotoxin.tpr').as_posix()
TRR_xvf = (_data_ref / 'cobrotoxin.trr').as_posix()
H5MD_xvf = (_data_ref / 'cobrotoxin.h5md').as_posix()
H5MD_energy = (_data_ref / 'cu.h5').as_posix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the convention for other h5md files out there to use the *.h5 extension rather than *.h5md?

Copy link
Author

@PythonFZ PythonFZ Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename the file to *.h5md. I've seen both and personally prefer *.h5 because I was lazy and did not configure the VS Code H5WEB extension to also read *.h5md. Afaik H5MD is the only format specification for H5 files, so there is no risk of confusion.

Edit: I haven't seen a recommendation for the *.h5md suffix in the H5MD definition so I would actually prefer keeping the more general *.h5.

@hmacdope
Copy link
Member

@PythonFZ lets go with .h5md as I think the missing extension may be causing some test failures. Other than that this is very close.

@PythonFZ
Copy link
Author

@PythonFZ lets go with .h5md as I think the missing extension may be causing some test failures. Other than that this is very close.

I renamed the files and ensured everything is properly formatted. I hope everything is good to go?

@PythonFZ PythonFZ requested a review from hmacdope June 27, 2024 18:34
@hmacdope
Copy link
Member

There seems to be some failing test on the azure runner. Can your reproduce locally @PythonFZ ?

@PythonFZ
Copy link
Author

There seems to be some failing test on the azure runner. Can your reproduce locally @PythonFZ ?

All tests are passing for me locally. From what I see in the logs, it is the following test (correct me if I am wrong)

/home/vsts/work/1/s/testsuite/MDAnalysisTests/parallelism/test_multiprocessing.py

All tests from that file run locally for me. Also, these tests should not have been affected by this PR and the tests I changed also all run locally. The same seems to be true for the GH hosted runners. I'm a bit lost here on how to fix this, because I don't know the differences between the Azure runner and the GH runners.

@orbeckst
Copy link
Member

orbeckst commented Jun 28, 2024 via email

@hmacdope
Copy link
Member

Let me kick CI IIRC there were some failing tests on mdanalysis-CI (Azure_Tests Linux-Python310-64bit-full-wheel)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants