Refactor RMSD analyses into MDAnalysis `AnalysisBase` classes by hannahbaumann · Pull Request #90 · OpenFreeEnergy/openfe_analysis

hannahbaumann · 2026-02-20T10:09:23Z

Refactor the rmsd analysis using this example: https://docs.mdanalysis.org/2.7.0/documentation_pages/analysis/base.html

Longer term, we may not want to keep the gather_rms_data function and instead access the individual analysis classes directly in the openfe Protocol.

codecov · 2026-02-20T10:11:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.32%. Comparing base (12122bd) to head (fdf4531).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #90      +/-   ##
==========================================
+ Coverage   96.09%   96.32%   +0.23%     
==========================================
  Files           6        6              
  Lines         333      354      +21     
==========================================
+ Hits          320      341      +21     
  Misses         13       13


hannahbaumann · 2026-02-20T10:14:27Z

@talagayev and @jthorton : This is a first go at the refactor into the individual RMSD classes, based on the MDAnalysis AnalysisBase. Please let me know what you think!

jthorton · 2026-02-20T15:18:41Z

+class LigandRMSD(AnalysisBase):
+    """
+    1D RMSD time series for a ligand AtomGroup.
+    """
+
+    def __init__(self, atomgroup, **kwargs):
+        super(LigandRMSD, self).__init__(atomgroup.universe.trajectory, **kwargs)
+
+        self._ag = atomgroup
+
+    def _prepare(self):
+        self.results.rmsd = []
+        self._reference = self._ag.positions
+        self._weights = self._ag.masses / np.mean(self._ag.masses)
+
+    def _single_frame(self):
+        rmsd = rms.rmsd(
+            self._ag.positions,
+            self._reference,
+            self._weights,
+            center=False,
+            superposition=False,
+        )
+        self.results.rmsd.append(rmsd)
+
+    def _conclude(self):
+        self.results.rmsd = np.asarray(self.results.rmsd)


Two initial thoughts:

Can we make a more general RMSD class that can be reused for both protein and ligand analysis? It should work on any atom group. Does this not already exist in MDAnalysis?

Do we not want to switch to a symmetry-corrected RMSD (spyrmsd, I think it's called)?

@jthorton : There's a difference between the RMSD class in MDAnalysis and what we've been doing, and I'm not sure yet which we want. By default, MDAnalysis performs a superposition (rotational and translational), while we didn't do that. So what are we actually interested in, especially for the ligand RMSD? Should this be the RMSD of the internal conformation of the ligand, or the RMSD of the ligand pose in the binding pocket? In the solvent we would definitely be more interested in the internal conformation of the ligand. In the binding pocket, I'm not sure. We also calculate the ligand COM displacement as a measure of stability in the pocket, but I'm not sure what we would want for the RMSD. What do you think?
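To make the distinction in this comment concrete, here is a minimal pure-Python sketch (not code from this PR). It compares an RMSD computed on raw coordinates, which is sensitive to the pose, with one computed after removing the centre of geometry. Only the translational part of a superposition is shown; the rotational fit that MDAnalysis can also perform is left out. The helper names `rmsd` and `centered` and the toy coordinates are assumptions made for illustration.

```python
import math


def rmsd(a, b):
    """Plain RMSD between two equally sized lists of (x, y, z) points."""
    squared = sum(
        (p - q) ** 2 for point_a, point_b in zip(a, b) for p, q in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(a))


def centered(coords):
    """Shift coordinates so that their centre of geometry is at the origin."""
    n = len(coords)
    cog = [sum(c[i] for c in coords) / n for i in range(3)]
    return [tuple(c[i] - cog[i] for i in range(3)) for c in coords]


# Toy "ligand": three atoms, then the same conformation moved 2 units along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 2.0, y, z) for x, y, z in reference]

# Without any fitting, the rigid shift shows up in the RMSD (pose change).
pose_rmsd = rmsd(shifted, reference)

# After centring, only internal differences remain; here there are none.
internal_rmsd = rmsd(centered(shifted), centered(reference))
```

In this toy case the unfitted value is 2.0 while the centred value is zero up to rounding, which is the difference between "did the ligand move in the pocket" and "did the ligand change shape".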

talagayev

@hannahbaumann I like it, it looks very good to me :) The structure also follows the MDAnalysis AnalysisBase classes, which I think is what @IAlibay wanted, and it's good that it now takes any atom group, which addresses @jthorton's comment.

Overall looks good to me :)

hannahbaumann · 2026-03-02T14:06:28Z

+                prot_rmsd = RMSDAnalysis(prot).run(step=skip)
+                output["protein_RMSD"].append(prot_rmsd.results.rmsd)
+                # # Using the MDAnalysis RMSD class instead
+                # gs = ["protein and name CA", "protein"]


Here is an example of how we could do the same analysis using the MDAnalysis RMSD class.

hannahbaumann · 2026-03-19T13:27:39Z

+        self, atomgroup, reference=None, mass_weighted=False, superposition=False, **kwargs
+    ):
+        super(RMSDAnalysis, self).__init__(atomgroup.universe.trajectory, **kwargs)
+


_analysis_algorithm_is_parallelizable

Please remember to add.

Added this!

hannahbaumann · 2026-03-19T13:41:43Z

Add tests for mass weighting and superposition. Potentially import test data from MDAnalysis.
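As a reference point for such tests, the following pure-Python sketch shows the usual definition of a weighted RMSD, sqrt(sum(w_i * d_i^2) / sum(w_i)). It is an illustration, not the implementation in this PR. It also checks that dividing the masses by their mean, as the `_prepare` hunk earlier in this thread does, leaves the result unchanged under this definition. The function name `weighted_rmsd` and the toy coordinates are assumptions.

```python
import math


def weighted_rmsd(a, b, weights):
    """Weighted RMSD: sqrt(sum(w_i * |a_i - b_i|^2) / sum(w_i))."""
    numerator = sum(
        w * sum((p - q) ** 2 for p, q in zip(point_a, point_b))
        for point_a, point_b, w in zip(a, b, weights)
    )
    return math.sqrt(numerator / sum(weights))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
current = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]  # second atom displaced by 2 along z

# Equal weights reduce to the plain RMSD.
uniform = weighted_rmsd(current, reference, [1.0, 1.0])

# Giving the displaced atom a larger mass increases its contribution.
masses = [1.0, 3.0]
mass_weighted = weighted_rmsd(current, reference, masses)

# Rescaling the weights by their mean does not change the value.
mean_mass = sum(masses) / len(masses)
rescaled = weighted_rmsd(current, reference, [m / mean_mass for m in masses])
```

A test along these lines could compare the class output against such a hand-computed value for a small system.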

hannahbaumann · 2026-04-13T09:42:57Z

+                prot_rmsd2d = Protein2DRMSD(prot).run(step=skip)
+                output["protein_2D_RMSD"].append(prot_rmsd2d.results.rmsd2d)
+                # # Using the MDAnalysis DistanceMatrix class
+                # prot_rmsd2d = diffusionmap.DistanceMatrix(u, select="protein and name CA")


This MDA code is much slower: 10 s vs. 0.4 s on the test data.

hannahbaumann · 2026-04-13T09:43:34Z

+                output["protein_RMSD"].append(prot_rmsd.results.rmsd)
+                # # Using the MDAnalysis RMSD class instead
+                # gs = ["protein and name CA"]
+                # prot_rmsd = rms.RMSD(


The two RMSD classes take approximately the same time on the test data.

hannahbaumann · 2026-05-19T10:02:59Z

+                # output["protein_2D_RMSD"].append(flattened)
+
+            if ligand.n_atoms > 0:
+                lig_rmsd = RMSDAnalysis(ligand, mass_weighted=True).run(step=skip)


Ligand RMSD is currently calculated on the hybrid topology, which may not be what we want long term.

For a separate PR - the atom selection (or atomgroup) should really be user defined rather than defaulting to UNK.

This might be a good argument for letting Protocols deal with this rather than making it uniform.

Opened an issue here: #103

IAlibay · 2026-05-21T07:22:03Z

+    For all unique frame pairs ``(i, j)`` with ``i < j``, this function
+    computes the RMSD between atomic coordinates after optimal alignment.
+    """
+


Can you explicitly define _analysis_algorithm_is_parallelizable = False in these classes (it's inherited by default, but it would be good to have it defined explicitly here) and then raise an issue about looking into parallelism?

Added this and also opened an issue.

IAlibay · 2026-05-21T07:26:27Z

+        self.results.rmsd2d = []
+
+    def _single_frame(self):
+        self._coords.append(self._ag.positions.copy())


Note that this is effectively the same as putting the whole trajectory into memory. I'm saying this as an FYI that it will be a possible failure mode when someone runs a really long simulation and then tries to use this class.

Might be good to document in the docstring notes.

Added a note in the doc string
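A rough back-of-the-envelope estimate shows why this matters. The sketch below assumes coordinates are stored as 4-byte floats (three per atom per frame); the function name and the example frame and atom counts are hypothetical and not taken from this PR.

```python
def coordinate_buffer_bytes(n_frames, n_atoms, bytes_per_value=4):
    """Bytes needed to hold x, y, z for every selected atom in every frame."""
    return n_frames * n_atoms * 3 * bytes_per_value


# Hypothetical long simulation: 100,000 analysed frames, 5,000 selected atoms.
size = coordinate_buffer_bytes(n_frames=100_000, n_atoms=5_000)
size_gib = size / 1024**3
print(f"{size} bytes, about {size_gib:.1f} GiB")
```

Restricting the selection (for example to C-alpha atoms) or analysing with a larger `step` reduces this linearly in either factor.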

IAlibay · 2026-05-21T07:47:11Z

+        self.results.rmsd2d = []
+
+    def _single_frame(self):
+        self._coords.append(self._ag.positions.copy())


Is the copy necessary?

Since we're now pre-allocating, the copy is no longer necessary (I'm not fully sure it was before, but I wanted to be safe =)

IAlibay · 2026-05-21T07:48:17Z

+    Flattened 2D RMSD matrix
+
+    For all unique frame pairs ``(i, j)`` with ``i < j``, this function
+    computes the RMSD between atomic coordinates after optimal alignment.


Can you maybe expand this to mention that you're doing a center of geometry fit as well as a rotational and translational superposition using QCP?

Added this!
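For readers unfamiliar with the flattened layout, the sketch below shows one way a list of values for all pairs `(i, j)` with `i < j` can be expanded back into a symmetric matrix. It assumes the pairs are ordered as `itertools.combinations` yields them, i.e. (0, 1), (0, 2), ..., (1, 2), ...; the ordering actually used by `Protein2DRMSD` is not shown in this thread, so treat it as an assumption. The function name `unflatten_upper` and the placeholder values are hypothetical.

```python
from itertools import combinations


def unflatten_upper(flat, n_frames):
    """Rebuild a symmetric n_frames x n_frames matrix from its i < j entries."""
    matrix = [[0.0] * n_frames for _ in range(n_frames)]
    for (i, j), value in zip(combinations(range(n_frames), 2), flat):
        matrix[i][j] = value
        matrix[j][i] = value  # RMSD is symmetric; the diagonal stays 0.0
    return matrix


n_frames = 4
pairs = list(combinations(range(n_frames), 2))

# Placeholder values that encode the pair, so the mapping is easy to inspect.
flat = [float(10 * i + j) for i, j in pairs]
matrix = unflatten_upper(flat, n_frames)
```

For `n_frames` frames there are `n_frames * (n_frames - 1) / 2` unique pairs, so the flattened array grows quadratically with trajectory length.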

IAlibay · 2026-05-21T07:48:36Z

+        self, atomgroup, reference=None, mass_weighted=False, superposition=False, **kwargs
+    ):
+        super(RMSDAnalysis, self).__init__(atomgroup.universe.trajectory, **kwargs)
+


Please remember to add.

IAlibay · 2026-05-21T07:58:03Z

+
+    def _single_frame(self):
+        # distance between start and current ligand position
+        # ignores PBC, but we've already centered the traj


Why ignore PBC? Could you not just pass the box kwarg argument along? Or is the box distorted because of the transformation?

Please document this in the docstring.

I added a note to the doc string. I think that if, for example, the ligand drifted by more than half the box size during the simulation but stayed in the same box, applying the minimum image convention (by passing the box through) would make the drift look smaller than it really was. Or would it still identify in that case that the ligand stayed in the same box and not apply the minimum image convention?
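The concern can be illustrated with a one-dimensional sketch for an orthorhombic box. The minimum image convention depends only on the displacement and the box length, so it has no way of knowing whether the ligand stayed in the same periodic image. The function below is a textbook formulation, not an MDAnalysis call, and the numbers are made up.

```python
def minimum_image(delta, box_length):
    """Map a 1D displacement onto its shortest periodic image."""
    return delta - box_length * round(delta / box_length)


box_length = 10.0

# The ligand really moved 7 units within the same box (more than half the box).
true_drift = 7.0
reported = abs(minimum_image(true_drift, box_length))

# A small displacement is left untouched.
small_drift = 3.0
small_reported = abs(minimum_image(small_drift, box_length))
```

Here the 7-unit drift is reported as 3 units, which matches the worry that applying the convention would understate large drifts.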

IAlibay · 2026-05-21T08:02:17Z

+        self.results.com_drift.append(drift)
+
+    def _conclude(self):
+        self.results.com_drift = np.asarray(self.results.com_drift)


Nit: it may be ever so slightly more efficient to pre-allocate the array ahead of time in _prepare by defining a numpy array of length self.n_frames. This also has the nice side effect of not needing a _conclude definition.
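The pattern suggested here can be sketched without MDAnalysis installed. `MiniAnalysisBase` below is a hypothetical stand-in that only mimics the order in which the hooks are called (`_prepare`, then `_single_frame` per frame, with `n_frames` and `_frame_index` available); it is not the real `AnalysisBase`. A plain list stands in for the NumPy array so the example runs with the standard library alone.

```python
import math


class MiniAnalysisBase:
    """Hypothetical stand-in that mimics the AnalysisBase hook order."""

    def __init__(self, frames):
        self._frames_data = frames

    def run(self):
        self.n_frames = len(self._frames_data)
        self._prepare()
        for index, positions in enumerate(self._frames_data):
            self._frame_index = index
            self._current = positions
            self._single_frame()
        return self

    def _prepare(self):
        pass

    def _single_frame(self):
        raise NotImplementedError


class DriftFromFirstFrame(MiniAnalysisBase):
    """Distance of a single point from its position in the first frame."""

    def _prepare(self):
        self._start = self._frames_data[0]
        # Pre-allocate one slot per frame instead of appending to a list.
        self.com_drift = [0.0] * self.n_frames

    def _single_frame(self):
        # Write into the slot for the current frame; no _conclude is needed.
        self.com_drift[self._frame_index] = math.dist(self._current, self._start)


frames = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
result = DriftFromFirstFrame(frames).run()
```

In the real class the pre-allocated container would be something like `np.zeros(self.n_frames)` stored on `self.results`, as the comment above suggests.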

IAlibay · 2026-05-21T08:03:18Z

-        output.append(rmsd)
+                prot_rmsd = RMSDAnalysis(prot).run(step=skip)
+                output["protein_RMSD"].append(prot_rmsd.results.rmsd)
+                # # Using the MDAnalysis RMSD class instead


Please remember to remove the commented out regions.

Removed this!

IAlibay · 2026-05-21T08:07:45Z

+                # output["protein_2D_RMSD"].append(flattened)
+
+            if ligand.n_atoms > 0:
+                lig_rmsd = RMSDAnalysis(ligand, mass_weighted=True).run(step=skip)


For a separate PR - the atom selection (or atomgroup) should really be user defined rather than defaulting to UNK.

This might be a good argument for letting Protocols deal with this rather than making it uniform.

IAlibay · 2026-05-21T08:08:52Z

+                # lig_rmsd.run(step=skip)
+                # output["ligand_RMSD"].append(lig_rmsd.results.rmsd.T[3])
+                lig_com_drift = LigandCOMDrift(ligand).run(step=skip)
+                output["ligand_wander"].append(lig_com_drift.results.com_drift)


I know this is historical, so it doesn't have to be here, but can we please rename this to ligand_com_drift or anything else? wander is such an unspecific name 😅

I think I would do this in a separate PR, since it would require an update in openfe? Raised an issue here #104

IAlibay · 2026-05-21T08:26:11Z

No tests for Protein2DRMSD or LigandCOMDrift?

The original tests in test_rmsd.py already cover Protein2DRMSD and LigandCOMDrift. I added the MDA tests because in one of our meetings you mentioned that it could make sense to just use those test data. For now I have kept all the original tests, including those for RMSD, so some things are currently tested twice. Should I try to move as much of the testing as possible to the MDA data and keep only a regression test on our own trajectories, or what would you suggest?

IAlibay · 2026-05-21T08:28:54Z

+        self._ag = atomgroup
+
+    def _prepare(self):
+        self._coords = []


Could you pre-allocate numpy arrays here instead?

IAlibay · 2026-05-21T08:30:31Z

+    def _prepare(self):
+        self.results.rmsd = []
+
+        self._reference_pos = self._reference.positions.copy()


Note that if you call .run(start=10), this will mean your reference is frame 10 not frame 0. This should probably be documented.

Updated the doc string and also inline comment

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

Refactor RMSD analyses into MDAnalysis AnaysisBase classes

2977c01

hannahbaumann requested review from jthorton and talagayev February 20, 2026 10:13

hannahbaumann changed the title ~~Refactor RMSD analyses into MDAnalysis AnaysisBase classes~~ Refactor RMSD analyses into MDAnalysis AnaysisBase classes Feb 20, 2026

jthorton reviewed Feb 20, 2026

hannahbaumann and others added 2 commits February 23, 2026 09:52

Combine RMSD analyses

4a960a0

Merge branch 'main' into rmsd_refactor_analysisbase

afa829a

talagayev approved these changes Feb 24, 2026

Add reference and superposition option

df017da

hannahbaumann commented Mar 2, 2026

Comment thread src/openfe_analysis/rmsd.py Outdated

hannahbaumann added 2 commits March 2, 2026 15:18

Apply suggestion from @hannahbaumann

d824ab6

Merge branch 'main' into rmsd_refactor_analysisbase

71a11d3

hannahbaumann commented Mar 19, 2026

IAlibay assigned IAlibay and jthorton Mar 23, 2026

hannahbaumann and others added 3 commits April 13, 2026 10:47

Merge branch 'main' into rmsd_refactor_analysisbase

5cabcf4

Some changes

e16b2e3

Fix mda 2D analysis example

616bf42

hannahbaumann commented Apr 13, 2026

hannahbaumann added 3 commits April 13, 2026 13:40

Add RMSD test using mda test data

90ed39e

Some fixes

7d6e49f

Some more updates

4bf598f

hannahbaumann commented May 19, 2026

hannahbaumann requested review from IAlibay and jthorton May 21, 2026 07:48

IAlibay requested changes May 21, 2026

This was linked to issues May 21, 2026

Refactor current analysis in an MDAnalysis like way #87

Open

API work - RMSD #82

Open

hannahbaumann and others added 4 commits May 22, 2026 09:22

Update src/openfe_analysis/rmsd.py

80b984d

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

Address review comments

db61dc4

Fix docs build

f22d96f

Add MDAnalysisTests as test dependency

fdf4531

hannahbaumann requested a review from IAlibay May 22, 2026 10:12
