Skip to content

feat(ciffiles): store metadata about ensemble generation directly in output CIF files. #209

Merged
k-chrispens merged 6 commits intomainfrom
mdc/store-metadata-in-cif
Apr 10, 2026
Merged

feat(ciffiles): store metadata about ensemble generation directly in output CIF files. #209
k-chrispens merged 6 commits intomainfrom
mdc/store-metadata-in-cif

Conversation

@marcuscollins
Copy link
Copy Markdown
Collaborator

@marcuscollins marcuscollins commented Apr 10, 2026

This PR:

  • adds a new method sampleworks.utils.cif_utils.add_category_to_cif that adds dictionary-like data to a biotite CIFFile object. This data can be a simple dictionary of keys and values, or it can be a dictionary of list-like objects, resulting in a CIF loop_ structure.
  • Uses that method in sampleworks.utils.guidance_script_utils.save_everything to write the GuidanceConfig.as_dict() object into the output CIF file
  • Uses that method in scripts/patch_output_cif_files.py to copy the job_metadata.json data in output directories into any patched CIF files, if it isn't already present.
  • Adds tests for the new add_category_to_cif file, including that any category data so written is read in and written out again correctly.

resolves #208

Summary by CodeRabbit

  • New Features

    • Sampleworks trial metadata is now embedded into output CIFs and also emitted as a JSON sidecar.
  • Improvements

    • Metadata insertion can add or replace structured categories in CIFs with conflict handling and validation.
    • CLI help text for input path matching clarified.
    • Output saving unified to produce consistent CIF + JSON metadata.
  • Tests

    • Added tests covering metadata insertion, error cases, persistence, and updated save-flow tests.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b5f744d-2144-4ff4-9c48-66697a71d692

📥 Commits

Reviewing files that changed from the base of the PR and between 342058a and e8ed80d.

📒 Files selected for processing (1)
  • src/sampleworks/utils/guidance_script_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/sampleworks/utils/guidance_script_utils.py

📝 Walkthrough

Walkthrough

Writes GuidanceConfig-derived metadata into output CIFs as a sampleworks CIF category, adds add_category_to_cif() for in-memory CIF mutation, centralizes metadata creation/writing in save_everything, and updates the CIF patching script to preserve or inject sampleworks metadata.

Changes

Cohort / File(s) Summary
CIF Utilities
src/sampleworks/utils/cif_utils.py
Add add_category_to_cif(ciffile, data, category_name, overwrite=False, block_name=None) to construct and insert a CIFCategory into a block; block selection validation, overwrite handling, and None -> "?" normalization.
Metadata Integration
src/sampleworks/utils/guidance_script_utils.py
Change save_everything to accept args: GuidanceConfig; compute metadata = args.as_dict(); embed metadata into refined.cif via add_category_to_cif; write job_metadata.json into output dir; tighten related function signatures to require GuidanceConfig.
Output-patching script
scripts/patch_output_cif_files.py
Update patch_individual_cif_file to prefer an existing sampleworks block from the source CIF, else load job_metadata.json from the CIF directory and add it with add_category_to_cif; extend --input-pdb-pattern CLI help text.
Tests
tests/utils/test_cif_utils.py, tests/utils/test_guidance_script_utils.py, tests/integration/test_mismatch_integration.py
Add TestAddCategoryToCif covering single/multi-block, overwrite/error cases, persistence, and None/edge inputs; update tests to construct/pass GuidanceConfig to save_everything() and adjust integration test calls accordingly.

Sequence Diagram(s)

sequenceDiagram
  participant Guidance as Guidance runner
  participant Save as save_everything
  participant CIFUtil as add_category_to_cif
  participant FS as Filesystem
  participant Patcher as patch_output_cif_files

  Guidance->>Save: run with GuidanceConfig args
  Save->>CIFUtil: load template CIF and add "sampleworks" category (metadata)
  CIFUtil-->>Save: modified CIF
  Save->>FS: write refined.cif (with sampleworks)
  Save->>FS: write job_metadata.json
  Note right of FS: Later, patcher may read refined.cif or job_metadata.json
  Patcher->>FS: read refined.cif
  alt refined.cif contains sampleworks
    FS-->>Patcher: return CIF with sampleworks (preserve)
  else
    FS-->>Patcher: return CIF without sampleworks
    FS-->>Patcher: read job_metadata.json
    Patcher->>CIFUtil: add sampleworks category from JSON
    CIFUtil-->>Patcher: patched CIF
    Patcher->>FS: write patched CIF
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • k-chrispens

Poem

"I hopped through CIFs with a tiny pen,
Tucked guidance secrets inside now and then,
A category placed with a soft little thump,
Metadata snug in the crystalized dump,
🐰✨ — the rabbit winked and hopped again."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: storing metadata about ensemble generation directly in CIF files.
Linked Issues check ✅ Passed All PR objectives align with issue #208 requirements: adds add_category_to_cif utility, writes GuidanceConfig metadata to CIF files, and includes round-trip tests.
Out of Scope Changes check ✅ Passed All changes directly support the core objective of storing metadata in CIF files; no unrelated refactoring or scope creep detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mdc/store-metadata-in-cif

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/utils/test_cif_utils.py (1)

221-365: Comprehensive test coverage, but missing test for None value handling.

The test suite covers block selection, overwrite behavior, error cases, and round-trip persistence well. However, there's no explicit test verifying the None"none" conversion behavior documented in the implementation.

💡 Suggested additional test
def test_none_values_converted(self, tmp_path):
    """None values in data dict should be converted to placeholder string."""
    atoms = [_atom("A", 1, "ALA", False)]
    cif_path = _write_cif(atoms, tmp_path / "test.cif")
    ciffile = CIFFile.read(str(cif_path))

    data = {"present": "value", "missing": None}
    add_category_to_cif(ciffile, data, "test_category")

    block = ciffile[list(ciffile.keys())[0]]
    category = block["test_category"]
    # Verify None was replaced (with "none" or "?" depending on implementation)
    assert "missing" in category
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_cif_utils.py` around lines 221 - 365, Add a test that
verifies add_category_to_cif converts None values to the CIF placeholder: create
a CIF with one block, call add_category_to_cif with data containing a None
(e.g., {"present":"value","missing":None}) and assert the resulting category in
the block contains the "missing" key and that its value equals the expected
placeholder used by add_category_to_cif (e.g., "none" or "?"); add this as
test_none_values_converted alongside the other TestAddCategoryToCif methods to
ensure None→placeholder behavior is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sampleworks/utils/cif_utils.py`:
- Around line 314-317: Replace the non-standard "none" placeholder with the
CIF-standard missing-value marker: when building CIFCategory in the
CIFCategory(...) call (the category variable constructed from data and
category_name), map None values to "?" instead of "none" (i.e., use {k: v if v
is not None else "?" for k, v in data.items()}); if you need to distinguish
inapplicable values later, use "." per CIF rules, but for the current None
mapping use "?" to conform to CIF standard.

In `@src/sampleworks/utils/guidance_script_utils.py`:
- Line 268: The type hint allows argparse.Namespace but the code calls
args.as_dict() (which Namespace lacks), so either restrict the parameter to
GuidanceConfig or add a fallback: change the function handling the args
parameter to detect or coerce Namespace to a dict (e.g., if hasattr(args,
"as_dict"): cfg = args.as_dict() else: cfg = vars(args) or use getattr(args,
"as_dict", lambda: vars(args))()), and update type hint if you choose to forbid
Namespace; reference the GuidanceConfig type and the args.as_dict() call when
making the change.

---

Nitpick comments:
In `@tests/utils/test_cif_utils.py`:
- Around line 221-365: Add a test that verifies add_category_to_cif converts
None values to the CIF placeholder: create a CIF with one block, call
add_category_to_cif with data containing a None (e.g.,
{"present":"value","missing":None}) and assert the resulting category in the
block contains the "missing" key and that its value equals the expected
placeholder used by add_category_to_cif (e.g., "none" or "?"); add this as
test_none_values_converted alongside the other TestAddCategoryToCif methods to
ensure None→placeholder behavior is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86cd0398-cff0-4e97-81eb-b0f95e64d0bb

📥 Commits

Reviewing files that changed from the base of the PR and between 14ba724 and 33dc325.

📒 Files selected for processing (6)
  • scripts/patch_output_cif_files.py
  • src/sampleworks/utils/cif_utils.py
  • src/sampleworks/utils/guidance_script_utils.py
  • tests/integration/test_mismatch_integration.py
  • tests/utils/test_cif_utils.py
  • tests/utils/test_guidance_script_utils.py

Comment thread src/sampleworks/utils/guidance_script_utils.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/sampleworks/utils/guidance_script_utils.py (1)

331-332: ⚠️ Potential issue | 🟠 Major

Potential runtime break if any caller still passes argparse.Namespace.

Line 331 assumes args.as_dict() exists. Existing script paths still using parse_*_args() + direct run_guidance(...) with raw parser output will raise AttributeError here unless they were migrated to GuidanceConfig construction first.

#!/bin/bash
set -euo pipefail

echo "Check parser return style (Namespace-producing parse_args):"
rg -nP --type=py 'def parse_.*_args\(' src/sampleworks/utils/guidance_script_arguments.py -A20 \
  | rg -n 'def |return parser\.parse_args\(\)'

echo
echo "Check run_guidance() call sites in scripts:"
rg -nP --type=py '\brun_guidance\s*\(' scripts -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sampleworks/utils/guidance_script_utils.py` around lines 331 - 332, The
code unconditionally calls args.as_dict() (in guidance_script_utils.py) which
will raise AttributeError for callers that still pass an argparse.Namespace
(e.g. parse_*_args() + run_guidance(...) sites); change the metadata extraction
to handle all forms: if args has as_dict use that, else if it's an
argparse.Namespace use vars(args), else if it's a GuidanceConfig (or other known
config class) call its appropriate to-dict method, so metadata is always a dict;
update the metadata assignment in run_guidance (or the function that performs
metadata = args.as_dict()) to perform this safe branching and normalize to a
plain dict before further use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sampleworks/utils/cif_utils.py`:
- Around line 275-279: The docstring example is non-runnable because
add_category_to_cif expects an existing CIF block; update the example to create
or add a block to the CIFFile before calling add_category_to_cif (e.g., create
CIFFile(), add a block or use CIFFile().new_block(...) / ensure ciffile.block
contains at least one block), then call add_category_to_cif(ciffile, data,
"my_custom_data"); reference the CIFFile constructor and add_category_to_cif
function to locate and fix the example.
- Around line 314-317: The current CIFCategory construction replaces only
top-level None values in data but leaves None inside list-like columns (e.g.,
data entries that are lists/tuples), so update the normalization before creating
CIFCategory: walk each value in the data dict (the variable data used to build
category in code that constructs category = CIFCategory(...)), and if a value is
a list/tuple, map its elements replacing inner None with "?" (recursively if
nested lists are possible); leave non-list values handled as before. After
normalizing list elements, pass the cleaned dict into
CIFCategory(name=category_name, columns=...) so all nulls comply with CIF rules.

---

Duplicate comments:
In `@src/sampleworks/utils/guidance_script_utils.py`:
- Around line 331-332: The code unconditionally calls args.as_dict() (in
guidance_script_utils.py) which will raise AttributeError for callers that still
pass an argparse.Namespace (e.g. parse_*_args() + run_guidance(...) sites);
change the metadata extraction to handle all forms: if args has as_dict use
that, else if it's an argparse.Namespace use vars(args), else if it's a
GuidanceConfig (or other known config class) call its appropriate to-dict
method, so metadata is always a dict; update the metadata assignment in
run_guidance (or the function that performs metadata = args.as_dict()) to
perform this safe branching and normalize to a plain dict before further use.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 940ad7a7-cb49-403a-9e2e-7a5c2eda48a1

📥 Commits

Reviewing files that changed from the base of the PR and between 33dc325 and 02258c5.

📒 Files selected for processing (3)
  • src/sampleworks/utils/cif_utils.py
  • src/sampleworks/utils/guidance_script_utils.py
  • tests/utils/test_cif_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/utils/test_cif_utils.py

Comment thread src/sampleworks/utils/cif_utils.py
Comment thread src/sampleworks/utils/cif_utils.py
Copy link
Copy Markdown
Collaborator

@k-chrispens k-chrispens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! This is a cool new method

@k-chrispens k-chrispens merged commit e66b543 into main Apr 10, 2026
11 checks passed
@k-chrispens k-chrispens deleted the mdc/store-metadata-in-cif branch April 10, 2026 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Write sampleworks metadata into CIF files directly.

2 participants