Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting out SetupUnit from SimulationUnit in NonEquilibriumCyclingProtocol #3

Merged
merged 12 commits into from
Nov 8, 2023

Conversation

dotsdl
Copy link
Member

@dotsdl dotsdl commented Sep 30, 2023

We want to separate out the creation of system, state, and integrator from the SimulationUnit in the NonEquilibriumCyclingProtocol in order to enable the swapping in of alternate ProtocolUnits for the SimulationUnit, for example for use with Folding@Home via alchemiscale-fah.
This PR aims to accomplish this.

@dotsdl
Copy link
Member Author

dotsdl commented Oct 3, 2023

Pausing on this until #4 is in.

@ijpulidos ijpulidos mentioned this pull request Oct 10, 2023
This avoids:
- serlializing/deserializing/remaking the HybridTopologyFactory in the
  SimulationUnit
- makes it easier to make a Folding@Home-compatible version, if we want
  to hold on to trajectory data in a similar way
@codecov-commenter
Copy link

Codecov Report

All modified lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@cf0c2de). Click here to learn what that means.

Additional details and impacted files
@@           Coverage Diff           @@
##             main       #3   +/-   ##
=======================================
  Coverage        ?   92.12%           
=======================================
  Files           ?        7           
  Lines           ?      419           
  Branches        ?        0           
=======================================
  Hits            ?      386           
  Misses          ?       33           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dotsdl dotsdl self-assigned this Oct 17, 2023
Comment on lines +318 to +355
def extract_positions(context, initial_atom_indices, final_atom_indices):
"""
Extract positions from initial and final systems based from the hybrid topology.

Parameters
----------
context: openmm.Context
Current simulation context where from extract positions.
hybrid_topology_factory: perses.annihilation.relative.HybridTopologyFactory
Hybrid topology factory where to extract positions and mapping information

Returns
-------

Notes
-----
It achieves this by taking the positions and indices from the initial and final states of
the transformation, and computing the overlap of these with the indices of the complete
hybrid topology, filtered by some mdtraj selection expression.

1. Get positions from context
2. Get topology from HTF (already mdtraj topology)
3. Merge that information into mdtraj.Trajectory
4. Filter positions for initial/final according to selection string
"""
import numpy as np

# Get positions from current openmm context
positions = context.getState(getPositions=True).getPositions(asNumpy=True)

# Get indices for initial and final topologies in hybrid topology
initial_indices = np.asarray(initial_atom_indices)
final_indices = np.asarray(final_atom_indices)

initial_positions = positions[initial_indices, :]
final_positions = positions[final_indices, :]

return initial_positions, final_positions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to maintain the capability of filtering the positions using an mdtraj selection, as it originally was in https://github.com/choderalab/feflow/blob/81e1b80ea762e0ea78bbde6fc5e68d774ceadb2b/feflow/protocols/nonequilibrium_cycling.py#L100 . Maybe we want to change the default to all? This does mean we would depend on mdtraj (maybe we can use mdanalysis for this if desired).

This should help reducing the storage used for trajectories when needed and might be useful when debugging/analyzing results, though I agree there are probably better ways to achieve this. One of them is creating the sampler (and reporter?) object in openmmtools for NEq cycling (analogue to what we do with MultiStateSampler for repex), and have that one handling all the trajectory/topologies serialization.

Are there any specific reasons other than not wanting to depend on mdtraj/mdanalysis that motivates this change?

Copy link
Contributor

@ijpulidos ijpulidos Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the selection capabilities are not being tested in CI. This is something that we would like to be testing if we decide to include it back again.

Copy link
Member Author

@dotsdl dotsdl Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly I was motivated to not have to serialize the hybrid_topology_factory in the SetupUnit and pass it to (or re-create it in) the SimulationUnit. So instead of passing that object around and into this method, we pass in just the positions. And since we don't have hybridt_topology_factory, we don't have hybrid_topology, so it's not so easy to do mdtraj manipulations anymore. But they seemed largely unnecessary?

I don't believe we use this method anywhere for trajectory storage or filtering. Isn't this method just used here, in this protocol? My view is that if we need do something different later, then we make it do that, but not pre-emptively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seemed like a feature that we weren't even using here, and its effect under default usage ("not water") was really nothing at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense? It's possible I've missed some important detail, so please correct me if so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this was being used is that when serializing trajectories we were extracting the positions and velocities of all things not water. It was just a way to not having to store them for everything and make the storage requirements a bit lighter, but it was used when storing the trajectories.

That said, I do think that the advantage of not having to pass the HTF object around might be a better compromise in this case, since the stored trajectories are not things we were using anyway. I think it's okay as it is now. If we end up needing to store only selections of atoms in the future we can give this another try.

@dotsdl
Copy link
Member Author

dotsdl commented Oct 31, 2023

@ijpulidos any additional questions on this PR? I may want to start working on #2 soon, and we'd want to build on this one in building that out.

@ijpulidos ijpulidos self-requested a review October 31, 2023 17:07
Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think we can revisit the atom selection storage if needed in the future.

Comment on lines +318 to +355
def extract_positions(context, initial_atom_indices, final_atom_indices):
"""
Extract positions from initial and final systems based from the hybrid topology.

Parameters
----------
context: openmm.Context
Current simulation context where from extract positions.
hybrid_topology_factory: perses.annihilation.relative.HybridTopologyFactory
Hybrid topology factory where to extract positions and mapping information

Returns
-------

Notes
-----
It achieves this by taking the positions and indices from the initial and final states of
the transformation, and computing the overlap of these with the indices of the complete
hybrid topology, filtered by some mdtraj selection expression.

1. Get positions from context
2. Get topology from HTF (already mdtraj topology)
3. Merge that information into mdtraj.Trajectory
4. Filter positions for initial/final according to selection string
"""
import numpy as np

# Get positions from current openmm context
positions = context.getState(getPositions=True).getPositions(asNumpy=True)

# Get indices for initial and final topologies in hybrid topology
initial_indices = np.asarray(initial_atom_indices)
final_indices = np.asarray(final_atom_indices)

initial_positions = positions[initial_indices, :]
final_positions = positions[final_indices, :]

return initial_positions, final_positions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this was being used is that when serializing trajectories we were extracting the positions and velocities of all things not water. It was just a way to not having to store them for everything and make the storage requirements a bit lighter, but it was used when storing the trajectories.

That said, I do think that the advantage of not having to pass the HTF object around might be a better compromise in this case, since the stored trajectories are not things we were using anyway. I think it's okay as it is now. If we end up needing to store only selections of atoms in the future we can give this another try.

@jchodera
Copy link

jchodera commented Nov 6, 2023

I think we're ready to merge this!

@ijpulidos ijpulidos merged commit 40c966b into main Nov 8, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants