Updates to CIF preparation post grid search by marcuscollins · Pull Request #103 · diff-use/sampleworks

marcuscollins · 2026-02-26T00:03:00Z

Changes to CIF patching:

- Set the sequence IDs to match those in the input CIF used for guidance.
- Include the occupancy and B-factor columns, setting B to 20.0 and occupancy to 1.0 if they are missing.
- Instead of backing up the original refined.cif, write the new CIF file as refined-patched.cif.
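A minimal sketch of the output naming and the default values described above, assuming plain Python lists stand in for the atom_site columns. The helper names `patched_path` and `with_defaults` are hypothetical and are not taken from the script.

```python
from __future__ import annotations

from pathlib import Path

DEFAULT_B_FACTOR = 20.0   # value used when the B-factor column is missing
DEFAULT_OCCUPANCY = 1.0   # value used when the occupancy column is missing


def patched_path(cif_path: Path) -> Path:
    """Return the sibling output path, e.g. refined.cif -> refined-patched.cif."""
    return cif_path.with_name(f"{cif_path.stem}-patched{cif_path.suffix}")


def with_defaults(columns: dict[str, list], n_atoms: int) -> dict[str, list]:
    """Return a copy of `columns` with absent occupancy/B-factor columns added."""
    out = dict(columns)
    out.setdefault("occupancy", [DEFAULT_OCCUPANCY] * n_atoms)
    out.setdefault("B_iso_or_equiv", [DEFAULT_B_FACTOR] * n_atoms)
    return out
```

Writing to a new path rather than overwriting leaves the original refined.cif untouched, so no backup step is needed.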

Summary by CodeRabbit

New Features
- Enhanced CIF patching to align residue numbering using reference CIFs and added options to specify a reference directory and input pattern.
Improvements
- Stronger validation and error reporting for residue/atom data; patched outputs saved as "-patched.cif"; adds default occupancy/B_iso annotations when they are missing; parallelized patching workflow.
Chores
- Replaced pyright tooling with ty checks in pre-commit and dev config; added ty configuration in project metadata.
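The new command-line options might be declared roughly as follows. This is a sketch, not the script's actual parser: the flag spellings are inferred from names that appear in this thread, the default for `--input-pdb-pattern` is hypothetical, and `glob_match` / `regex_match` are illustrative helpers that contrast glob and regex semantics for a filename pattern.

```python
import argparse
import fnmatch
import re


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the patching options (flag spellings partly assumed)."""
    parser = argparse.ArgumentParser(description="Patch refined CIF files.")
    parser.add_argument(
        "--grid-search-input-dir",
        required=True,
        help="Directory holding the reference CIFs used for guidance",
    )
    parser.add_argument(
        "--input-pdb-pattern",
        default="{pdb_id}.cif",  # hypothetical default, not from the PR
        help="Filename pattern used to locate each reference CIF",
    )
    parser.add_argument(
        "--cif-pattern",
        default="refined.cif",
        help="Glob (fnmatch) pattern for CIF files to patch",
    )
    return parser


def glob_match(filename: str, pattern: str) -> bool:
    """Case-sensitive glob match: '*' spans any run of characters, '.' is literal."""
    return fnmatch.fnmatchcase(filename, pattern)


def regex_match(filename: str, pattern: str) -> bool:
    """Full regex match, shown only for contrast: '.' matches any one character."""
    return re.fullmatch(pattern, filename) is not None
```

The same pattern string can behave differently under the two schemes, so the help text should name whichever one the matching code actually uses.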

coderabbitai · 2026-02-26T00:03:18Z

📥 Commits

Reviewing files that changed from the base of the PR and between eabcebc and 1901f79.

📒 Files selected for processing (1)

scripts/patch_input_cif_files.py

📝 Walkthrough

The CIF patching script now accepts a reference directory and input-PDB pattern, loads reference and target CIFs, constructs and validates residue-number mappings, updates atom_site residue identifiers and annotations, and writes patched files with "-patched.cif" suffix. CLI signatures and error reporting were updated.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CIF Patcher Logic `scripts/patch_input_cif_files.py` | Added reference-dir & input-pdb-pattern parameters; `patch_individual_cif_file()` signature changed to return `str`… |
| Pre-commit hooks `.pre-commit-config.yaml` | Renamed three local hooks (ids and names) from `pyright-` to `ty-`, replaced `pyright` entries with `ty check`, and added a `files: protenix` filter to the protenix hook. |
| Tooling config `pyproject.toml` | Replaced dev tooling `pyright` with `ty`; added `[tool.ty.environment]` and `[tool.ty.rules]` blocks (python-version and unresolved-import rule). |

Sequence Diagram

sequenceDiagram
    participant Main as Main Workflow
    participant Patcher as CIF Patcher
    participant RefLoader as Reference Loader
    participant TargetLoader as Target Loader
    participant Mapper as Residue Mapper
    participant Validator as Validator
    participant Writer as CIF Writer

    Main->>Patcher: patch_individual_cif_file(cif_file, rcsb_regex, reference_dir, input_pdb_pattern)
    Patcher->>RefLoader: load reference CIF (grid_search_input_dir + pattern)
    RefLoader-->>Patcher: reference residue data
    Patcher->>TargetLoader: load target CIF
    TargetLoader-->>Patcher: target residue data
    Patcher->>Mapper: build residue-number mapping (ref → target)
    Mapper->>Validator: ensure residue ID sets match
    alt validation success
        Mapper-->>Patcher: mapping dict
        Patcher->>Writer: apply mapping, add occupancy/B_iso_or_equiv
        Writer-->>Patcher: write `-patched.cif`
        Patcher-->>Main: return None
    else validation failed
        Validator-->>Patcher: error message
        Patcher-->>Main: return error message
    end
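The mapper and validator steps in the diagram can be sketched as below. This is an illustration under stated assumptions, not the script's code: residues are keyed by `(chain_id, res_id)` tuples, both structures list residues in the same order, and the function name `build_residue_mapping` is hypothetical. It returns either a mapping or an error message, mirroring the two branches of the diagram.

```python
from __future__ import annotations

from typing import Iterable

ResidueKey = tuple[str, int]  # (chain_id, res_id); an assumed key shape


def unique_in_order(keys: Iterable[ResidueKey]) -> list[ResidueKey]:
    """Collapse per-atom keys to per-residue keys, keeping first-seen order."""
    return list(dict.fromkeys(keys))


def build_residue_mapping(
    reference_keys: Iterable[ResidueKey],
    target_keys: Iterable[ResidueKey],
) -> tuple[dict[ResidueKey, ResidueKey] | None, str | None]:
    """Map each target residue to the reference residue at the same position.

    Returns (mapping, None) on success or (None, error_message) on failure.
    """
    ref_unique = unique_in_order(reference_keys)
    tgt_unique = unique_in_order(target_keys)
    if len(ref_unique) != len(tgt_unique):
        return None, (
            f"residue count mismatch: reference has {len(ref_unique)}, "
            f"target has {len(tgt_unique)}"
        )
    return dict(zip(tgt_unique, ref_unique)), None
```

Keying on the chain as well as the residue number keeps residue 23 of chain A distinct from residue 23 of chain B, which a comparison of bare residue numbers would merge.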

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

New script that patches the CIF files we write #72: Directly modifies the same CIF patching utility and overlaps on function signatures and residue-mapping enhancements.

Suggested reviewers

k-chrispens

Poem

🐰
A reference CIF held close and neat,
I hop through residues, mapping each beat,
I validate, patch, and write with glee,
Now densities align — hoppy harmony! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: updating CIF file preparation logic that runs after grid search. It directly reflects the primary modifications to CIF patching behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

scripts/patch_input_cif_files.py (1)
184-192: Consider filling missing occupancy/B-factor values even when columns already exist.

Current logic only adds columns when absent. If existing columns contain missing values (?/.), those entries remain unset. If the requirement is “value missing” rather than just “column missing”, backfill per-atom defaults too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/patch_input_cif_files.py` around lines 184 - 192, The current block
only creates occupancy/B_iso_or_equiv columns when missing; instead, ensure you
also backfill per-atom default values for existing columns that contain missing
entries. In the atom_site block (template.block["atom_site"]) check for keys
"occupancy" and "B_iso_or_equiv": if a key is missing, create a CIFColumn filled
with defaults ([1.0] * len(template.block["atom_site"]["id"]) and [20.0] *
len(...)); if the column exists, iterate its entries and replace missing markers
(e.g., "?", ".", None, or empty strings) with the same defaults at those indices
so occupancy and B_iso_or_equiv columns are fully populated; keep using
CIFColumn semantics and the existing template.block["atom_site"]["id"] length to
drive sizes.
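The backfill suggested here can be sketched with plain Python lists standing in for CIF columns. The function name `backfill_column` and the list-based representation are assumptions of this sketch; the real script works with CIFColumn objects, whose API is not shown in this thread.

```python
from __future__ import annotations

# CIF uses "?" for unknown values and "." for inapplicable ones.
MISSING_MARKERS = {"?", ".", ""}


def backfill_column(
    values: list | None, n_atoms: int, default: float
) -> list[float]:
    """Return a fully populated float column.

    If the column is absent, every atom gets `default`; otherwise only the
    entries holding a missing marker (or None) are replaced.
    """
    if values is None:
        return [default] * n_atoms
    return [
        default if v is None or str(v).strip() in MISSING_MARKERS else float(v)
        for v in values
    ]
```

For example, `backfill_column(["1.00", "?", "0.50"], 3, 1.0)` keeps the two real occupancies and replaces only the unknown entry.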

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/patch_input_cif_files.py`:
- Around line 60-64: The help text for the --cif-pattern argument is incorrect:
matching uses fnmatch/glob semantics not regex. Update the parser.add_argument
call for "--cif-pattern" (the existing argument definition) to describe it as a
glob/fnmatch pattern (e.g., "Glob pattern for CIF files to patch, default:
'refined.cif'") or alternatively change the matching logic where fnmatch is used
to perform a true regex match; make the fix in the parser.add_argument help
string or in the matching code so the help and behavior are consistent.
- Around line 123-129: The call to load_any(reference_path) can raise and will
abort joblib.Parallel; wrap the reference load in a try/except inside the worker
so exceptions are caught and converted to an error string (e.g. f"reference load
error: {e}") and returned instead of raising; only call
ensure_atom_array_stack(asym_unit) when the reference load succeeded, and ensure
the function signature still returns str | None so callers can collect error
strings from joblib.Parallel without crashing.
- Around line 203-205: The CLI flag args.input_pdb_pattern is parsed but never
forwarded into the program; update the final invocation of main to include
args.input_pdb_pattern so the user-provided pattern is used (i.e., modify the
main(...) call where main is invoked to pass args.input_pdb_pattern alongside
args.input_dir, args.grid_search_input_dir, args.cif_pattern, args.rcsb_pattern,
args.depth), ensuring the parameter order matches main's signature.
- Around line 137-149: The current residue remapping compares only sorted unique
res_id (ref_resnums/cif_resnums) and thus drops chain context; change the logic
in scripts/patch_input_cif_files.py to build chain-aware residue keys (e.g.,
tuples of (chain_id, res_id, insertion_code) derived from reference.res_id and
asym_unit.res_id) preserve ordering, compare those unique chain-aware keys for
length/equality, create the mapping using those chain-aware keys (use the same
unique-key ordering used to build mapping rather than plain sorted numbers), and
then replace asym_unit.res_id by mapping each residue's chain-aware key to the
corresponding reference residue number (or insertion-aware target) so that
mapping is correct for multi-chain inputs; keep using the existing variables
(reference.res_id, asym_unit.res_id, mapping, cif_path, reference_path) but use
chain-aware keys when constructing mapping and validating lengths.
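The error-handling comment above (lines 123-129) describes a worker that reports failures as strings instead of raising. A minimal sketch of that contract follows. It uses the standard library's `ThreadPoolExecutor` in place of joblib so that it runs without extra dependencies, and `load_structure`, `patch_one`, and `patch_all` are hypothetical names rather than the script's functions.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_structure(path: Path) -> str:
    """Hypothetical stand-in for the script's loader; raises if unreadable."""
    return path.read_text()


def patch_one(cif_path: Path, reference_path: Path) -> str | None:
    """Patch one file; return None on success or an error string on failure."""
    try:
        reference = load_structure(reference_path)
    except Exception as exc:  # convert any load failure into a message
        return f"{cif_path}: reference load error: {exc}"
    # The actual patching would use `reference` here; omitted in this sketch.
    del reference
    return None


def patch_all(jobs: list[tuple[Path, Path]]) -> list[str]:
    """Run every job in parallel and collect the error messages."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda job: patch_one(*job), jobs)
        return [msg for msg in results if msg is not None]
```

Because each worker returns its error rather than raising, one unreadable reference file does not abort the remaining jobs, and the caller can print all failures at the end.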

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 469d8e4 and fedf668.

📒 Files selected for processing (1)

scripts/patch_input_cif_files.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

marcuscollins · 2026-02-26T00:35:52Z

Closing due to merge issues, recreated on a separate branch to fix properly.

marcuscollins added 2 commits February 25, 2026 23:57

Changes to CIF patching script: 1/ set the seq id to what it is in th…

b488571

…e original pdb; 2/ include occ and B-factor columns 3/ write to a file refined-patched.cif instead of backing up the original

put back deleted pixi.lock

fedf668

coderabbitai Bot reviewed Feb 26, 2026

k-chrispens and others added 3 commits February 26, 2026 00:08

feat(typechecking): migrate from pyright to ty

7de56bf

fix(ty): add tool section for ty

eabcebc

addressing CodeRabbit f/b on PR 103

5bce0f1

marcuscollins requested a review from k-chrispens February 26, 2026 00:21

Actually use argument --input-pdb-pattern

1901f79

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

marcuscollins closed this Feb 26, 2026

marcuscollins deleted the patch-cif-updates branch February 26, 2026 02:50

marcuscollins wants to merge 6 commits into main from patch-cif-updates
