feat(ciffiles): store metadata about ensemble generation directly in output CIF files. #209
feat(ciffiles): store metadata about ensemble generation directly in output CIF files. #209k-chrispens merged 6 commits intomainfrom
Conversation
…ategory to a CIFFile object
…; fixing up Claude's implementation and tests.
… grid search and patching script; resolves #208
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughWrites GuidanceConfig-derived metadata into output CIFs as a Changes
Sequence Diagram(s)sequenceDiagram
participant Guidance as Guidance runner
participant Save as save_everything
participant CIFUtil as add_category_to_cif
participant FS as Filesystem
participant Patcher as patch_output_cif_files
Guidance->>Save: run with GuidanceConfig args
Save->>CIFUtil: load template CIF and add "sampleworks" category (metadata)
CIFUtil-->>Save: modified CIF
Save->>FS: write refined.cif (with sampleworks)
Save->>FS: write job_metadata.json
Note right of FS: Later, patcher may read refined.cif or job_metadata.json
Patcher->>FS: read refined.cif
alt refined.cif contains sampleworks
FS-->>Patcher: return CIF with sampleworks (preserve)
else
FS-->>Patcher: return CIF without sampleworks
FS-->>Patcher: read job_metadata.json
Patcher->>CIFUtil: add sampleworks category from JSON
CIFUtil-->>Patcher: patched CIF
Patcher->>FS: write patched CIF
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/utils/test_cif_utils.py (1)
221-365: Comprehensive test coverage, but missing test forNonevalue handling.The test suite covers block selection, overwrite behavior, error cases, and round-trip persistence well. However, there's no explicit test verifying the
None→"none"conversion behavior documented in the implementation.💡 Suggested additional test
def test_none_values_converted(self, tmp_path): """None values in data dict should be converted to placeholder string.""" atoms = [_atom("A", 1, "ALA", False)] cif_path = _write_cif(atoms, tmp_path / "test.cif") ciffile = CIFFile.read(str(cif_path)) data = {"present": "value", "missing": None} add_category_to_cif(ciffile, data, "test_category") block = ciffile[list(ciffile.keys())[0]] category = block["test_category"] # Verify None was replaced (with "none" or "?" depending on implementation) assert "missing" in category🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/utils/test_cif_utils.py` around lines 221 - 365, Add a test that verifies add_category_to_cif converts None values to the CIF placeholder: create a CIF with one block, call add_category_to_cif with data containing a None (e.g., {"present":"value","missing":None}) and assert the resulting category in the block contains the "missing" key and that its value equals the expected placeholder used by add_category_to_cif (e.g., "none" or "?"); add this as test_none_values_converted alongside the other TestAddCategoryToCif methods to ensure None→placeholder behavior is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/sampleworks/utils/cif_utils.py`:
- Around line 314-317: Replace the non-standard "none" placeholder with the
CIF-standard missing-value marker: when building CIFCategory in the
CIFCategory(...) call (the category variable constructed from data and
category_name), map None values to "?" instead of "none" (i.e., use {k: v if v
is not None else "?" for k, v in data.items()}); if you need to distinguish
inapplicable values later, use "." per CIF rules, but for the current None
mapping use "?" to conform to CIF standard.
In `@src/sampleworks/utils/guidance_script_utils.py`:
- Line 268: The type hint allows argparse.Namespace but the code calls
args.as_dict() (which Namespace lacks), so either restrict the parameter to
GuidanceConfig or add a fallback: change the function handling the args
parameter to detect or coerce Namespace to a dict (e.g., if hasattr(args,
"as_dict"): cfg = args.as_dict() else: cfg = vars(args) or use getattr(args,
"as_dict", lambda: vars(args))()), and update type hint if you choose to forbid
Namespace; reference the GuidanceConfig type and the args.as_dict() call when
making the change.
---
Nitpick comments:
In `@tests/utils/test_cif_utils.py`:
- Around line 221-365: Add a test that verifies add_category_to_cif converts
None values to the CIF placeholder: create a CIF with one block, call
add_category_to_cif with data containing a None (e.g.,
{"present":"value","missing":None}) and assert the resulting category in the
block contains the "missing" key and that its value equals the expected
placeholder used by add_category_to_cif (e.g., "none" or "?"); add this as
test_none_values_converted alongside the other TestAddCategoryToCif methods to
ensure None→placeholder behavior is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 86cd0398-cff0-4e97-81eb-b0f95e64d0bb
📒 Files selected for processing (6)
scripts/patch_output_cif_files.pysrc/sampleworks/utils/cif_utils.pysrc/sampleworks/utils/guidance_script_utils.pytests/integration/test_mismatch_integration.pytests/utils/test_cif_utils.pytests/utils/test_guidance_script_utils.py
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/sampleworks/utils/guidance_script_utils.py (1)
331-332:⚠️ Potential issue | 🟠 MajorPotential runtime break if any caller still passes
argparse.Namespace.Line 331 assumes
args.as_dict()exists. Existing script paths still usingparse_*_args()+ directrun_guidance(...)with raw parser output will raiseAttributeErrorhere unless they were migrated toGuidanceConfigconstruction first.#!/bin/bash set -euo pipefail echo "Check parser return style (Namespace-producing parse_args):" rg -nP --type=py 'def parse_.*_args\(' src/sampleworks/utils/guidance_script_arguments.py -A20 \ | rg -n 'def |return parser\.parse_args\(\)' echo echo "Check run_guidance() call sites in scripts:" rg -nP --type=py '\brun_guidance\s*\(' scripts -C2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/utils/guidance_script_utils.py` around lines 331 - 332, The code unconditionally calls args.as_dict() (in guidance_script_utils.py) which will raise AttributeError for callers that still pass an argparse.Namespace (e.g. parse_*_args() + run_guidance(...) sites); change the metadata extraction to handle all forms: if args has as_dict use that, else if it's an argparse.Namespace use vars(args), else if it's a GuidanceConfig (or other known config class) call its appropriate to-dict method, so metadata is always a dict; update the metadata assignment in run_guidance (or the function that performs metadata = args.as_dict()) to perform this safe branching and normalize to a plain dict before further use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/sampleworks/utils/cif_utils.py`:
- Around line 275-279: The docstring example is non-runnable because
add_category_to_cif expects an existing CIF block; update the example to create
or add a block to the CIFFile before calling add_category_to_cif (e.g., create
CIFFile(), add a block or use CIFFile().new_block(...) / ensure ciffile.block
contains at least one block), then call add_category_to_cif(ciffile, data,
"my_custom_data"); reference the CIFFile constructor and add_category_to_cif
function to locate and fix the example.
- Around line 314-317: The current CIFCategory construction replaces only
top-level None values in data but leaves None inside list-like columns (e.g.,
data entries that are lists/tuples), so update the normalization before creating
CIFCategory: walk each value in the data dict (the variable data used to build
category in code that constructs category = CIFCategory(...)), and if a value is
a list/tuple, map its elements replacing inner None with "?" (recursively if
nested lists are possible); leave non-list values handled as before. After
normalizing list elements, pass the cleaned dict into
CIFCategory(name=category_name, columns=...) so all nulls comply with CIF rules.
---
Duplicate comments:
In `@src/sampleworks/utils/guidance_script_utils.py`:
- Around line 331-332: The code unconditionally calls args.as_dict() (in
guidance_script_utils.py) which will raise AttributeError for callers that still
pass an argparse.Namespace (e.g. parse_*_args() + run_guidance(...) sites);
change the metadata extraction to handle all forms: if args has as_dict use
that, else if it's an argparse.Namespace use vars(args), else if it's a
GuidanceConfig (or other known config class) call its appropriate to-dict
method, so metadata is always a dict; update the metadata assignment in
run_guidance (or the function that performs metadata = args.as_dict()) to
perform this safe branching and normalize to a plain dict before further use.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 940ad7a7-cb49-403a-9e2e-7a5c2eda48a1
📒 Files selected for processing (3)
src/sampleworks/utils/cif_utils.pysrc/sampleworks/utils/guidance_script_utils.pytests/utils/test_cif_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/utils/test_cif_utils.py
k-chrispens
left a comment
There was a problem hiding this comment.
Looks good! This is a cool new method
This PR:
sampleworks.utils.cif_utils.add_category_to_cifthat adds dictionary-like data to a biotite CIFFile object. This data can be a simple dictionary of keys and values, or it can be a dictionary of list-like objects, resulting in a CIF loop_ structure.sampleworks.utils.guidance_script_utils.save_everythingto write the GuidanceConfig.as_dict() object into the output CIF filescripts/patch_output_cif_files.pyto copy thejob_metadata.jsondata in output directories into any patched CIF files, if it isn't already present.add_category_to_ciffile, including that any category data so written is read in and written out again correctly.resolves #208
Summary by CodeRabbit
New Features
Improvements
Tests