Remove phenix clashscore script and add occupancy/density utility improvements#112
Remove phenix clashscore script and add occupancy/density utility improvements#112k-chrispens merged 2 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughThe PR removes an outdated phenix clashscore processing script and refactors several utility modules. Changes include adding occupancy level imports, enhancing error guidance, enforcing input validation in CIF utilities, and fixing tensor shape mismatches in density utilities. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/sampleworks/metrics/lddt.py`:
- Around line 349-352: The RuntimeError message raised when checking
predicted_aa_stack concatenates two sentences without proper
punctuation/spacing; update the string in the raise RuntimeError inside the
relevant block (the one referencing predicted_aa_stack and mask()) to add a
period and a space so it reads "...mask(). You should read in atom arrays..."
ensuring the error text is grammatically correct and clear.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
scripts/run_and_process_phenix_clashscore.pysrc/sampleworks/eval/grid_search_eval_utils.pysrc/sampleworks/eval/occupancy_utils.pysrc/sampleworks/metrics/lddt.pysrc/sampleworks/utils/cif_utils.pysrc/sampleworks/utils/density_utils.py
💤 Files with no reviewable changes (1)
- scripts/run_and_process_phenix_clashscore.py
33084d6 to
2026961
Compare
| atom_array, device | ||
| ) | ||
|
|
||
| # need to make sure these all have the same batch dimension or the transformer will fail. |
There was a problem hiding this comment.
@k-chrispens this may no longer be needed with the changes you main in real_space_density.py but I'm going to leave here for now then test everything in my next PR, if that's okay with you.
There was a problem hiding this comment.
I don't think I made any changes affecting this
There was a problem hiding this comment.
I do think it would be clearer to do match_batch here in the next PR though, esp. since these are potentially already at batch size n_model from an AtomArrayStack, which could lead to errors that we would catch and report well in match_batch
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/sampleworks/eval/occupancy_utils.py (1)
17-20: Consider documenting the "native" case in the docstring.The new
"native"branch is acknowledged as a hack, but it would help future maintainers to add an example in the docstring (e.g.,'1vme_native' -> ('1vme', 0.5)). This clarifies expected behavior until the directory naming is properly addressed.📝 Proposed docstring update
"""Extract protein name and occupancy from directory name. Examples: - '1vme_0.5occA_0.5occB' -> ('1vme', 0.5) - '6b8x_1.0occA' -> ('6b8x', 1.0) - '5sop_1.0occB' -> ('5sop', 0.0) + - '1vme_native' -> ('1vme', 0.5) # hack: assumes native means 0.5 occupancy """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/eval/occupancy_utils.py` around lines 17 - 20, Update the docstring of the function in src/sampleworks/eval/occupancy_utils.py that parses directory names (the one using the dir_name variable) to document the new "native" branch: add a short example such as "'1vme_native' -> ('1vme', 0.5)" and a brief note that this is a temporary hack until directory naming is fixed so future maintainers understand expected behavior when "native" appears in dir_name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/sampleworks/eval/occupancy_utils.py`:
- Around line 17-20: Update the docstring of the function in
src/sampleworks/eval/occupancy_utils.py that parses directory names (the one
using the dir_name variable) to document the new "native" branch: add a short
example such as "'1vme_native' -> ('1vme', 0.5)" and a brief note that this is a
temporary hack until directory naming is fixed so future maintainers understand
expected behavior when "native" appears in dir_name.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
scripts/run_and_process_phenix_clashscore.pysrc/sampleworks/eval/grid_search_eval_utils.pysrc/sampleworks/eval/occupancy_utils.pysrc/sampleworks/metrics/lddt.pysrc/sampleworks/utils/cif_utils.pysrc/sampleworks/utils/density_utils.py
💤 Files with no reviewable changes (1)
- scripts/run_and_process_phenix_clashscore.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/sampleworks/utils/cif_utils.py
- src/sampleworks/utils/density_utils.py
k-chrispens
left a comment
There was a problem hiding this comment.
Looks good, I do think it's worth switching to match_batch wherever we do these expansions just so it is a bit clearer what we're trying to do and to catch potential errors, but that can come later.
| atom_array, device | ||
| ) | ||
|
|
||
| # need to make sure these all have the same batch dimension or the transformer will fail. |
There was a problem hiding this comment.
I do think it would be clearer to do match_batch here in the next PR though, esp. since these are potentially already at batch size n_model from an AtomArrayStack, which could lead to errors that we would catch and report well in match_batch
The one possibly controversial thing here is in density_utils.py. I expand arrays like elements and b-factors to have a matching batch dimension. I can update to use the new
match_batchif you think it is worthwhile.Otherwise:
--removed a duplicate script (for phenix clashscores)
--handling some quirks of our initial 40 protein run
--making it possible to search for files other than "refined.cif" (necessary since we patch those files and rename)
Summary by CodeRabbit
Release Notes
Bug Fixes
Improvements
Chores