Splitting out SetupUnit from SimulationUnit in NonEquilibriumCyclingProtocol #3

dotsdl · 2023-09-30T02:16:59Z

We want to separate out the creation of system, state, and integrator from the SimulationUnit in the NonEquilibriumCyclingProtocol in order to enable the swapping in of alternate ProtocolUnits for the SimulationUnit, for example for use with Folding@Home via alchemiscale-fah.
This PR aims to accomplish this.
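As a rough illustration of the split described above, here is a minimal plain-Python sketch. It does not use the gufe or feflow APIs; `SetupResult`, `setup_unit`, `local_simulation_unit`, `remote_simulation_unit`, and `run_protocol` are hypothetical names, and the XML strings are placeholders. The idea it shows is that once setup hands off only plain, serializable data, the simulation step can be swapped for another implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class SetupResult:
    """Plain, serializable hand-off between the two units (hypothetical)."""

    system_xml: str
    state_xml: str
    integrator_xml: str
    initial_atom_indices: List[int]
    final_atom_indices: List[int]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "SetupResult":
        return cls(**json.loads(payload))


def setup_unit() -> SetupResult:
    # Stand-in for creating the system, state, and integrator once.
    return SetupResult("<System/>", "<State/>", "<Integrator/>", [0, 1, 2], [0, 1, 3])


def local_simulation_unit(setup: SetupResult) -> Dict[str, object]:
    # Stand-in for running the simulation locally from the serialized inputs.
    return {"backend": "local", "n_initial_atoms": len(setup.initial_atom_indices)}


def remote_simulation_unit(setup: SetupResult) -> Dict[str, object]:
    # Stand-in for an alternate unit that ships the same inputs elsewhere.
    return {"backend": "remote", "n_initial_atoms": len(setup.initial_atom_indices)}


def run_protocol(simulate: Callable[[SetupResult], Dict[str, object]]) -> Dict[str, object]:
    # Only serialized data crosses the boundary between setup and simulation.
    payload = setup_unit().to_json()
    return simulate(SetupResult.from_json(payload))


print(run_protocol(local_simulation_unit))
print(run_protocol(remote_simulation_unit))
```

Because both simulation functions accept the same `SetupResult`, either can be passed to `run_protocol` without touching the setup step.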

dotsdl · 2023-10-03T04:59:34Z

Pausing on this until #4 is in.

Removing the need for the HybridTopologyFactory in the SimulationUnit avoids serializing, deserializing, or remaking it there. It also makes it easier to build a Folding@Home-compatible version, if we want to hold on to trajectory data in a similar way.

codecov-commenter · 2023-10-14T03:22:31Z

Codecov Report

All modified lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@cf0c2de).

Additional details and impacted files

@@           Coverage Diff           @@
##             main       #3   +/-   ##
=======================================
  Coverage        ?   92.12%           
=======================================
  Files           ?        7           
  Lines           ?      419           
  Branches        ?        0           
=======================================
  Hits            ?      386           
  Misses          ?       33           
  Partials        ?        0


ijpulidos · 2023-10-19T21:20:14Z

feflow/protocols/nonequilibrium_cycling.py

+    def extract_positions(context, initial_atom_indices, final_atom_indices):
+        """
+        Extract positions of the initial and final systems from the hybrid system.
+
+        Parameters
+        ----------
+        context: openmm.Context
+            Current simulation context from which to extract positions.
+        initial_atom_indices: array-like of int
+            Indices of the hybrid-system atoms that belong to the initial system.
+        final_atom_indices: array-like of int
+            Indices of the hybrid-system atoms that belong to the final system.
+
+        Returns
+        -------
+        initial_positions, final_positions
+            Positions of the initial-system and final-system atoms, in the form
+            returned by the context (array with units).
+
+        Notes
+        -----
+        1. Get positions from context
+        2. Index them with the initial and final atom indices
+
+        No mdtraj selection expression is applied here.
+        """
+        import numpy as np
+
+        # Get positions from current openmm context
+        positions = context.getState(getPositions=True).getPositions(asNumpy=True)
+
+        # Get indices for initial and final topologies in hybrid topology
+        initial_indices = np.asarray(initial_atom_indices)
+        final_indices = np.asarray(final_atom_indices)
+
+        # Slice out the rows (atoms) that belong to each end state
+        initial_positions = positions[initial_indices, :]
+        final_positions = positions[final_indices, :]
+
+        return initial_positions, final_positions


I think we want to keep the ability to filter the positions using an mdtraj selection, as it originally was in https://github.com/choderalab/feflow/blob/81e1b80ea762e0ea78bbde6fc5e68d774ceadb2b/feflow/protocols/nonequilibrium_cycling.py#L100. Maybe we want to change the default to "all"? This does mean we would depend on mdtraj (maybe we can use mdanalysis for this if desired).

This should help reduce the storage used for trajectories when needed, and might be useful when debugging or analyzing results, though I agree there are probably better ways to achieve this. One of them is creating a sampler (and reporter?) object in openmmtools for NEq cycling, analogous to what we do with MultiStateSampler for repex, and having it handle all the trajectory/topology serialization.

Are there any specific reasons, other than not wanting to depend on mdtraj/mdanalysis, that motivate this change?

Note that the selection capabilities are not being tested in CI. We would want to test them if we decide to add them back.

Mainly I was motivated to not have to serialize the hybrid_topology_factory in the SetupUnit and pass it to (or re-create it in) the SimulationUnit. So instead of passing that object around and into this method, we pass in just the atom indices. And since we don't have hybrid_topology_factory, we don't have hybrid_topology, so it's not so easy to do mdtraj manipulations anymore. But they seemed largely unnecessary?
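To make the index-only approach concrete, here is a small sketch that applies the same slicing to a plain NumPy array instead of an OpenMM context. The function name `extract_positions_from_array` and the toy coordinates are illustrative assumptions, not code from this PR.

```python
import numpy as np


def extract_positions_from_array(positions, initial_atom_indices, final_atom_indices):
    """Slice hybrid-system positions into initial- and final-state positions."""
    positions = np.asarray(positions)
    initial_indices = np.asarray(initial_atom_indices, dtype=int)
    final_indices = np.asarray(final_atom_indices, dtype=int)
    # Fancy indexing on the first axis picks whole atoms (rows of x, y, z)
    return positions[initial_indices, :], positions[final_indices, :]


# Toy hybrid system with 5 atoms: atom 3 exists only in the initial state,
# atom 4 only in the final state (made-up mapping for illustration).
hybrid_positions = np.arange(15, dtype=float).reshape(5, 3)
initial_positions, final_positions = extract_positions_from_array(
    hybrid_positions, [0, 1, 2, 3], [0, 1, 2, 4]
)
print(initial_positions.shape, final_positions.shape)
```

Only the two index lists need to travel between units here, which is the point of not passing the factory object around.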

I don't believe we use this method anywhere for trajectory storage or filtering. Isn't this method just used here, in this protocol? My view is that if we need to do something different later, then we make it do that, but not pre-emptively.

It just seemed like a feature that we weren't even using here, and its effect under default usage ("not water") was really nothing at all.

Does this make sense? It's possible I've missed some important detail, so please correct me if so.

The way this was being used is that, when serializing trajectories, we were extracting the positions and velocities of everything that is not water. It was just a way to avoid storing them for every atom and make the storage requirements a bit lighter, but it was used when storing the trajectories.
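For reference, the storage effect of a "not water" filter can be sketched without mdtraj. This is only a stand-in: it compares residue names against an assumed list of water names (`HOH`, `WAT`) rather than evaluating an mdtraj selection expression, and `select_not_water` is a hypothetical helper.

```python
import numpy as np


def select_not_water(residue_names, water_names=("HOH", "WAT")):
    """Return indices of atoms whose residue name is not a water name."""
    names = np.asarray(residue_names)
    return np.flatnonzero(~np.isin(names, water_names))


# One residue name per atom (made-up system: 3 solute atoms, 3 water atoms)
residue_names = ["LIG", "LIG", "ALA", "HOH", "HOH", "HOH"]
positions = np.zeros((len(residue_names), 3))

kept = select_not_water(residue_names)
stored = positions[kept]

# Fraction of the original array that would actually be written out
fraction_kept = stored.nbytes / positions.nbytes
print(kept, fraction_kept)
```

In a solvated system most atoms are water, so the stored fraction would typically be much smaller than in this toy case.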

That said, I do think that the advantage of not having to pass the HTF object around might be a better compromise in this case, since the stored trajectories are not things we were using anyway. I think it's okay as it is now. If we end up needing to store only selections of atoms in the future we can give this another try.

dotsdl · 2023-10-31T00:16:39Z

@ijpulidos any additional questions on this PR? I may want to start working on #2 soon, and we'd want to build on this one in building that out.

ijpulidos

Looks good to me. I think we can revisit the atom selection storage if needed in the future.

jchodera · 2023-11-06T15:10:08Z

I think we're ready to merge this!

dotsdl added 3 commits September 29, 2023 19:13

Splitting out SetupUnit from SimulationUnit in NonEquilibriumCyclingProtocol

b8e16d7

Merge branch 'main' into neqcyc-setup-unit

d990262

Moving things around from live prototyping

b930399

dotsdl added 2 commits October 9, 2023 17:08

Additional concept sketching

99abdae

Added utils/data.py

44c324d

ijpulidos mentioned this pull request Oct 10, 2023

NEq cycling rebase #4

Merged

dotsdl added 6 commits October 13, 2023 12:15

Merge branch 'main' into neqcyc-setup-unit

bdbf888

Removed need for HybridTopologyFactory in SimulationUnit

45e6568

This avoids serializing, deserializing, or remaking the HybridTopologyFactory in the SimulationUnit. It also makes it easier to build a Folding@Home-compatible version, if we want to hold on to trajectory data in a similar way.

old -> initial, new -> final

afac441

Remove redundant extract_positions method

d1350ee

Add note on atom_selection_expression no longer used

af1d09b

Tests passing again. :D

fdb8232

Removing sample test module from cookiecutter

50cf548

dotsdl self-assigned this Oct 17, 2023

ijpulidos reviewed Oct 19, 2023

ijpulidos self-requested a review October 31, 2023 17:07

ijpulidos approved these changes Oct 31, 2023

ijpulidos merged commit 40c966b into main Nov 8, 2023
4 checks passed
