@jla-gardner
Contributor

Hi there 👋 thanks for the awesome work with torch-sim 😊

Summary

I wrote and maintain graph-pes, a package for creating and training models of the potential energy surface that act on graph representations of atomic structures. It would be amazing if my users could use torch-sim to run fast MD in a hassle-free manner! (I currently have LAMMPS and ASE calculators, which are annoying to set up and rather slow, respectively.)

This PR adds support for using arbitrary GraphPESModels from the graph-pes package within torch-sim.

I've also added a small tutorial notebook to give a concrete example of how to use the GraphPESWrapper class in the docs.

Checklist

  • Doc strings have been added in the Numpy docstring format.
  • Run ruff on your code.
  • Tests have been added for any new functionality or bug fixes.
  • All linting and tests pass.

@cla-bot added the cla-signed (Contributor license agreement signed) label on Apr 8, 2025
Collaborator

@orionarcher left a comment

Hi @jla-gardner, thanks for the PR! I would be very happy to have this interface in TorchSim.

For added context, I want to point you to our current posture on model interfaces, just to share how we are thinking about supporting models over time.

I've included a few comments to help get the tests working and make sure that the code correctly interfaces with TorchSim.

@jla-gardner
Contributor Author

Thanks for the extremely quick response! Yes, this is slightly atypical, being an interface to an interface. Going through all your comments point by point now.

@janosh added the enhancement (New feature or request) and ecosystem (Comp-chem ecosystem related) labels on Apr 8, 2025
@jla-gardner
Contributor Author

jla-gardner commented Apr 9, 2025

Hi guys, I'm looking into the test failures from the graph-pes side. I note that you've had to use a high rtol and atol in your dedicated test_mace.py file, so it is perhaps not unexpected that I am also running into some inconsistency errors.

I'm seeing that the tests are only failing for the rattled_sio2_sim_state: from what I can see, this rattles atoms significantly! In particular, one of the oxygen atoms is moved by 1.8 Å and ends up in close proximity to a Si atom. My hypothesis is that this is therefore a very high-energy, high-force conformation, so small precision effects can lead to large absolute (and relative) differences. Perhaps less extreme rattling here would also make this test pass for your dedicated MACE test.

I've now pushed a change to that effect as a test.
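To illustrate why a smaller rattling amplitude avoids such close contacts, here is a minimal NumPy sketch. The `rattle`, `min_pair_distance` and `max_displacement` helpers, the toy cubic grid, and the i.i.d. Gaussian displacement model are all hypothetical illustrations, not the actual `rattled_sio2_sim_state` fixture. With a fixed seed, the displacements scale linearly with `sigma`, so the largest single-atom move shrinks in proportion.

```python
import numpy as np


def rattle(positions: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Return a copy of `positions` with Gaussian noise of std `sigma` (Å) added.

    Hypothetical helper for illustration; not the torch-sim test fixture.
    """
    rng = np.random.default_rng(seed)
    return positions + rng.normal(scale=sigma, size=positions.shape)


def min_pair_distance(positions: np.ndarray) -> float:
    """Smallest interatomic distance (periodic images ignored for simplicity)."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Exclude the zero self-distances on the diagonal.
    np.fill_diagonal(dists, np.inf)
    return float(dists.min())


def max_displacement(original: np.ndarray, rattled: np.ndarray) -> float:
    """Largest distance that any single atom has moved."""
    return float(np.linalg.norm(rattled - original, axis=-1).max())


# Toy structure: a 3x3x3 simple cubic grid with 1.6 Å spacing (roughly a Si-O bond length).
grid = np.arange(3) * 1.6
positions = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1).reshape(-1, 3)

for sigma in (0.5, 0.1, 0.02):
    rattled = rattle(positions, sigma)
    print(
        f"sigma={sigma:.2f} Å: max displacement={max_displacement(positions, rattled):.3f} Å, "
        f"min distance={min_pair_distance(rattled):.3f} Å"
    )
```

Comparing the printed minimum distances against typical bond lengths is one quick way to check whether a chosen amplitude is likely to produce unphysical close contacts.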

@CompRhys
Member

CompRhys commented Apr 9, 2025

Yes, I did reduce the MACE tolerances because the SiO2 rattling test was breaking. It was on my TODO list to dig a little deeper into that, in case it actually is a real difference rather than numerical non-determinism that might be exaggerated by the MACE architecture. It's worth noting that I do not see that rattling test breaking locally when running on an M3 MacBook.

@abhijeetgangan
Collaborator

abhijeetgangan commented Apr 9, 2025

I did some tests locally and it seems that the small model predicts very high energy and large forces. If you switch to a better model like medium-mpa-0 or medium-omat-0 it will pass the test.

@jla-gardner
Contributor Author

jla-gardner commented Apr 9, 2025

All tests pass locally on my MacBook M2 too, for what it's worth. Do you have a preference between reducing the rattling amplitude and using a larger model (as suggested by @abhijeetgangan!) for testing to keep this test passing?

@abhijeetgangan
Collaborator

If the larger models don't have a big overhead in testing then I would probably use those. @CompRhys What do you think?

@CompRhys
Member

CompRhys commented Apr 9, 2025

The biggest MACE model is still smaller than the smallest Fairchem model, so I suspect it wouldn't add too much time to the tests. I would be happy to swap to testing medium-mpa-0. The ASL license means that we probably want to avoid medium-omat-0, to make sure there's no potential to misconstrue torch-sim as a commercial use and create a headache.

@abhijeetgangan
Collaborator

medium-mpa-0 should be good enough in that case.

@jla-gardner
Contributor Author

Updated to use medium-mpa-0.

@jla-gardner
Contributor Author

Looking into these continuing errors on the graph-pes side tonight and tomorrow morning to see if I can ascertain their root cause and reproduce locally - I'm on UK time so I'm hoping to have fixed these issues by tomorrow. Sorry for the delays, and thanks for all your help so far with this everyone 😄

@jla-gardner
Contributor Author

Hi guys, after some investigating I have found that MACE models are surprisingly sensitive to the order of the entries in the neighbour list (~1e-6 absolute difference in forces on my MacBook M2, so perhaps larger on the ubuntu-latest runners, which seem to be the ones failing most often above).

I have ensured that the graph-pes ASE calculator and GraphPESWrapper instances now use the same ordering, and am hoping that this resolves the issue in the CI 🤞
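The order sensitivity described above is what one would expect from floating-point addition not being associative: summing the same per-edge contributions in a different order can change the last few bits of the result. Below is a minimal NumPy sketch of that effect and of the fix of putting both code paths on a shared, canonical edge ordering. The fully connected toy edge list, the `float32` messages, and the `aggregate` / `canonical_order` helpers are hypothetical stand-ins, not the graph-pes or MACE implementation.

```python
import numpy as np

# Plain Python floats already show that addition order matters:
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print((0.3 + 0.2) + 0.1)  # 0.6


def aggregate(receivers: np.ndarray, messages: np.ndarray, n_atoms: int) -> np.ndarray:
    """Sum per-edge messages onto their receiving atoms, in the order the edges are listed."""
    out = np.zeros(n_atoms, dtype=messages.dtype)
    np.add.at(out, receivers, messages)  # unbuffered in-place accumulation
    return out


def canonical_order(senders: np.ndarray, receivers: np.ndarray) -> np.ndarray:
    """Permutation sorting edges by (receiver, sender); np.lexsort uses the last key as primary."""
    return np.lexsort((senders, receivers))


rng = np.random.default_rng(42)
n_atoms = 8
# Every ordered pair (i, j) with i != j appears exactly once, so (receiver, sender) is unique.
senders, receivers = np.nonzero(~np.eye(n_atoms, dtype=bool))
messages = rng.normal(scale=10.0, size=senders.size).astype(np.float32)

# Two "neighbour lists" containing the same edges in different orders.
perm = rng.permutation(senders.size)
s_b, r_b, m_b = senders[perm], receivers[perm], messages[perm]
a = aggregate(receivers, messages, n_atoms)
b = aggregate(r_b, m_b, n_atoms)
print("max |a - b| with different edge orders:", np.abs(a - b).max())

# Sorting both edge lists into the same canonical order makes the sums bitwise identical.
order_a = canonical_order(senders, receivers)
order_b = canonical_order(s_b, r_b)
a_canon = aggregate(receivers[order_a], messages[order_a], n_atoms)
b_canon = aggregate(r_b[order_b], m_b[order_b], n_atoms)
print("max |a - b| after canonical sorting:", np.abs(a_canon - b_canon).max())
```

The canonical sort only guarantees identical results because each (receiver, sender) pair is unique here; a neighbour list with duplicate pairs (for example, several periodic images) would need an extra tie-breaking key.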

Collaborator

@orionarcher left a comment

Looks good to me (one tiny nit), thanks for the quick edits @jla-gardner!

I pushed a small change to docs.yml that will test the docs build in the branch. That should be in main anyway and I'd like to check it before merging.

calculator_fixture_name="ase_mace_calculator",
sim_state_names=consistency_test_simstate_fixtures,
# see test_mace.py for similar issue
rtol=6e-4, # FIXME: unclear why this needs to be so high for mace. # noqa: FIX001
Collaborator

Could you take out the comment if this has been resolved or add a note explaining your findings?

Contributor Author

Tentatively removed the high rtol requirements in the latest commit to see whether this is actually completely resolved. I fear that it won't be, as graph-pes is just passing data straight through to mace-torch, and so if the base MACE model requires these high rtols in test_mace.py, we probably need them here too.
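For context on how rtol interacts with high-force structures: `numpy.allclose`/`numpy.isclose` (and `torch.testing.assert_close`) accept a pair element-wise when |actual - expected| <= atol + rtol * |expected|. A minimal sketch of that criterion follows. The `within_tolerance` helper is a hypothetical re-implementation and the force values are made up purely for illustration; the point is that the same absolute discrepancy can fail against a small reference force yet pass against a large one once rtol is raised, because the rtol term scales with the reference magnitude.

```python
import numpy as np


def within_tolerance(actual, expected, rtol: float, atol: float) -> np.ndarray:
    """Element-wise |actual - expected| <= atol + rtol * |expected| (numpy.isclose semantics)."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    return np.abs(actual - expected) <= atol + rtol * np.abs(expected)


# Illustrative force components (eV/Å): one near equilibrium, one in a high-force clash.
expected = np.array([0.05, 50.0])
actual = expected + 1e-3  # the same absolute discrepancy on both components

for rtol in (1e-5, 6e-4):
    ok = within_tolerance(actual, expected, rtol=rtol, atol=1e-5)
    agrees = np.array_equal(ok, np.isclose(actual, expected, rtol=rtol, atol=1e-5))
    print(f"rtol={rtol:g}: passes={ok.tolist()}, agrees with np.isclose: {agrees}")
```

Note that rtol is scaled by the reference value (`expected`), so the check is not symmetric in its two arguments.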

Contributor Author

Okay, that didn't work. Reverting now and adding comments to this effect.

@orionarcher
Collaborator

One more thing: could you modify docs/conf.py to include the model in autodoc_mock_imports, per step 5 in Adding New Models?
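For reference, `autodoc_mock_imports` is a standard Sphinx autodoc setting: the listed top-level modules are replaced by mock objects at documentation build time, so autodoc can import a wrapper module even when its heavy dependency is not installed. A minimal sketch of the relevant `docs/conf.py` lines follows. The `extensions` entries and the placeholder comment are assumptions, since the existing contents of torch-sim's `docs/conf.py` are not shown here; the entry to add is presumably the importable module name `graph_pes` (underscore), not the distribution name `graph-pes`.

```python
# docs/conf.py (excerpt): a sketch, not copied from torch-sim.

# Assumed extensions; autodoc_mock_imports only affects sphinx.ext.autodoc.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",  # parses NumPy-style docstrings
]

# Top-level modules that autodoc should mock instead of importing.
autodoc_mock_imports = [
    # ... existing entries for other optional model backends ...
    "graph_pes",  # added for the GraphPESWrapper interface
]
```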

@orionarcher
Collaborator

To get the docs build to work, you'll need to modify docs.yml to include a graph-pes install. This isn't the most scalable approach, but it works fine for now; I can loop back in a future PR if it gets annoying.

@jla-gardner
Contributor Author

Added this, but have now managed to break the docs build further down the track with the cryptic error "Extension error (sphinx.ext.autosummary): Handler <function process_generate_options at 0x7ff7fd71a980> for event 'builder-inited' threw an exception (exception: no module named torch_sim)".

Clearly my graph-pes install has made the docs fall over somewhere - trying to reproduce locally.

@orionarcher
Collaborator

orionarcher commented Apr 10, 2025

Hmmm, very strange. @janosh has encountered that error when trying to build the docs locally, too. He might have input. I'll take a look too.

Ignore the 5.3 failure; that's intermittent.

@jla-gardner
Contributor Author

I've reproduced this locally. I think there is a subtle incompatibility between the latest versions of graph-pes and mace-torch. This leads to an error that gets caught during doc-building, but the actual error itself gets hidden somewhere up the stack trace. I'm fixing this incompatibility over on the graph-pes repo, and adding more tests to catch it. Thanks for the rigorous tests here that have led me to find this!

Will update with a new minimum requirement for graph-pes in due course, and that should fix these errors 😄

@jla-gardner
Contributor Author

Updated. The docs are now building locally with the newly pinned graph-pes version.

@orionarcher
Collaborator

Thanks for the contribution @jla-gardner. Good to merge?

@jla-gardner
Contributor Author

Absolutely 😄 thanks once again for the sterling work with TorchSim!

@orionarcher merged commit caa5423 into TorchSim:main on Apr 10, 2025
83 checks passed

Labels

cla-signed Contributor license agreement signed ecosystem Comp-chem ecosystem related enhancement New feature or request

5 participants