refactor(eval): unify rscc script structure loading with generate_synthetic_density.py#196
refactor(eval): unify rscc script structure loading with generate_synthetic_density.py#196
Conversation
…thetic_density.py
📝 WalkthroughWalkthroughThis PR updates structure loading to support flexible altloc selection strategies. The Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/eval/rscc_grid_search_script.py (1)
136-140: Cache the refined-structure loads per trial.Lines 139-140 parse the same
refined.ciftwice for every selection. On proteins with several selections, that adds redundant I/O to an already expensive evaluation pass. Cache thealtloc="all"andaltloc="occupancy"structures once per trial; if you keep the in-place coordinate update at Line 213, copy the all-altloc array before reusing it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/rscc_grid_search_script.py` around lines 136 - 140, The code is re-parsing trial.refined_cif_path twice for every selection; cache the results of load_structure_with_altlocs(trial.refined_cif_path, altloc="all") and load_structure_with_altlocs(trial.refined_cif_path, altloc="occupancy") once per trial (e.g., store as atom_array_all_altlocs and atom_array) and reuse them for all selections instead of reloading; if atom_array_all_altlocs is later modified in-place (see the coordinate update that mutates arrays), make and use a shallow/deep copy of atom_array_all_altlocs before mutation so the cached version remains reusable across selections.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/eval/rscc_grid_search_script.py`:
- Around line 169-171: The code currently calls remove_atoms_with_any_nan_coords
on atom_array_all_altlocs before compute_density_from_atomarray, which can
change the density input and hide invalid structures; instead, revert removal
for atom_array_all_altlocs so compute_density_from_atomarray receives the
original data and only apply remove_atoms_with_any_nan_coords to alignment-only
arrays (e.g., the arrays used for alignment/registration), leaving
atom_array_all_altlocs untouched until after shared validation has run; locate
remove_atoms_with_any_nan_coords and the call sites involving
atom_array_all_altlocs and compute_density_from_atomarray to implement this
change.
---
Nitpick comments:
In `@scripts/eval/rscc_grid_search_script.py`:
- Around line 136-140: The code is re-parsing trial.refined_cif_path twice for
every selection; cache the results of
load_structure_with_altlocs(trial.refined_cif_path, altloc="all") and
load_structure_with_altlocs(trial.refined_cif_path, altloc="occupancy") once per
trial (e.g., store as atom_array_all_altlocs and atom_array) and reuse them for
all selections instead of reloading; if atom_array_all_altlocs is later modified
in-place (see the coordinate update that mutates arrays), make and use a
shallow/deep copy of atom_array_all_altlocs before mutation so the cached
version remains reusable across selections.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d0f656e8-3739-459e-b0d5-48d548243b82
⛔ Files ignored due to path filters (1)
pixi.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
scripts/eval/rscc_grid_search_script.pysrc/sampleworks/utils/atom_array_utils.py
marcuscollins
left a comment
There was a problem hiding this comment.
I think this PR introduces a bug: it looks like you're loading just the first model in the Trial refined cif file (see comments inline) which would mean we'd only compute the density from that first model, rather than as the average of all the generated ensemble of structures.
| # Load refined structure twice: | ||
| # - all altlocs (occupancy-weighted) for density computation | ||
| # - highest-occupancy altloc for alignment (no duplicate atoms) | ||
| atom_array_all_altlocs = load_structure_with_altlocs(trial.refined_cif_path) |
There was a problem hiding this comment.
FWIW these shouldn't have altlocs (the current structure predictors produce structures with no altlocs, although I suppose this could change.) I'm curious why you chose this path?
There was a problem hiding this comment.
Actually, this could be a problem, because load_structure_with_altlocs currently returns only the first model in the CIF/PDB file. Later on, we need to handle multiple models (as each of our output CIF files contains the entire ensemble, but they are represented as models, not with altlocs.
As part of the PR, could you include a test that makes sure this code works correctly? I can point you to files if you need example input. But make sure that the new version produces the same output as the old version.
There was a problem hiding this comment.
oh I thought the predicted ensemble would be represented as altlocs. Can you point me to files that currently represent the ensemble predictions?
Address issue #113 (one of the TODO comments in
rscc_grid_search_script.py).Specific changes include:
parse()+get_asym_unit_from_structure()with the sharedload_structure_with_altlocs()load_structure_with_altlocs()("all" / "occupancy" / "first") so callers can select the biotite modealtloc="all"for density calculation (matchinggenerate_synthetic_density.py) andaltloc="occupancy"for alignmentextract_density_inputs_from_atomarrayinsidecompute_density_from_atomarraySummary by CodeRabbit