Skip to content

Fix ligand selection when cofactors share resname UNK#119

Open
siddhantkhatod wants to merge 4 commits into
OpenFreeEnergy:mainfrom
siddhantkhatod:fix-118-ligand-cofactor-selection
Open

Fix ligand selection when cofactors share resname UNK#119
siddhantkhatod wants to merge 4 commits into
OpenFreeEnergy:mainfrom
siddhantkhatod:fix-118-ligand-cofactor-selection

Conversation

@siddhantkhatod

@siddhantkhatod siddhantkhatod commented Jun 29, 2026

Copy link
Copy Markdown

Description

Fixes #118.

In OpenFE hybrid topologies, both the ligand and cofactors can be assigned the residue name UNK. The gather_rms_data function currently selects all atoms with resname UNK, which incorrectly includes cofactors in the ligand RMSD calculation.

Changes

  • Added _select_ligand() helper function that auto-detects the actual ligand by checking for hybrid topology tempfactors (0.25, 0.5, 0.75).
  • When the default "resname UNK" matches multiple residues, the ligand is identified as the residue containing atoms with tempfactors strictly between 0 and 1.
  • Custom ligand_selection strings are still respected and bypass auto-detection.
  • Added unit tests for all edge cases.

Tests

  • test_select_ligand_single_unk — single residue works as before
  • test_select_ligand_cofactor_present — cofactor correctly excluded
  • test_select_ligand_multiple_hybrid — picks largest hybrid residue
  • test_select_ligand_no_hybrid_fallback — graceful fallback
  • test_select_ligand_custom_selection_unchanged — custom selections respected

In OpenFE hybrid topologies, both ligands and cofactors can be assigned
resname UNK. gather_rms_data() now auto-detects the actual ligand by
checking for hybrid tempfactors (0.25, 0.5, 0.75) when the default
'resname UNK' selection matches multiple residues.

Fixes OpenFreeEnergy#118
In OpenFE hybrid topologies, both ligands and cofactors can be assigned
resname UNK. gather_rms_data() now auto-detects the actual ligand by
checking for hybrid tempfactors (0.25, 0.5, 0.75) when the default
'resname UNK' selection matches multiple residues.

Fixes OpenFreeEnergy#118
@siddhantkhatod

Copy link
Copy Markdown
Author

pre-commit.ci autofix

@hannahbaumann

Copy link
Copy Markdown
Contributor

Hi @siddhantkhatod, thank you for this contribution, the tempfactor-based detection is a clever way to find the ligand without relying on residue names.
After discussing #118, we had decided to fix the root cause upstream in openfe (OpenFreeEnergy/openfe#2042), by giving the ligand and cofactors distinct residue names at setup. That stops the ligand and cofactors from being written into the topology with the same residue name in the first place, which should generally make it easier for users to visualize trajectories and do their own post-analysis without causing confusion. So I'm going to hold off on merging here for now.
That said, your approach fixes trajectories already written with the shared UNK name, which the upstream fix won't help, so we may come back to this PR if we decide openfe_analysis should handle that case too.
Thanks again for digging into this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gather_rms_data is wrong for system with cofactors since both ligand and cofactors get assigned resname UNK

2 participants