
Updates to CIF preparation post grid search #103

Closed

marcuscollins wants to merge 6 commits into main from patch-cif-updates

Conversation

@marcuscollins (Collaborator) commented Feb 26, 2026

Changes to CIF patching:

  1. Set the residue sequence IDs to match those in the input CIF used for guidance.
  2. Include the occupancy and B-factor columns, defaulting occupancy to 1.0 and B to 20.0 where missing.
  3. Instead of backing up the original refined.cif, write the new CIF file as refined-patched.cif.
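For the third change, here is a minimal sketch of how files could be discovered and where the patched output would go, assuming discovery uses `fnmatch`-style glob matching on file names (as the review notes for `--cif-pattern`) and that the patched file is written next to the original. `find_cif_files` and `patched_output_path` are hypothetical names, not functions from the script.

```python
import fnmatch
from pathlib import Path


def find_cif_files(root: Path, pattern: str = "refined.cif") -> list[Path]:
    """Recursively collect files under root whose name matches a glob (fnmatch) pattern."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and fnmatch.fnmatch(p.name, pattern)
    )


def patched_output_path(cif_path: Path) -> Path:
    """Sibling path for the patched file: refined.cif -> refined-patched.cif.

    The original file is never renamed or overwritten.
    """
    return cif_path.with_name(f"{cif_path.stem}-patched{cif_path.suffix}")
```

With the exact default `refined.cif`, a rerun will not pick up earlier `refined-patched.cif` outputs; a broader pattern such as `*.cif` would, so it may need an explicit exclusion.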

Summary by CodeRabbit

  • New Features

    • Enhanced CIF patching to align residue numbering using reference CIFs and added options to specify a reference directory and input pattern.
  • Improvements

    • Stronger validation and error reporting for residue/atom data; patched outputs are saved with a "-patched.cif" suffix; occupancy/B_iso_or_equiv columns are added with defaults when missing; the patching workflow is parallelized.
  • Chores

    • Replaced pyright tooling with ty checks in pre-commit and dev config; added ty configuration in project metadata.
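The new CLI options can be illustrated with a small `argparse` sketch that also forwards every parsed value into `main`, which is the point of one of the inline review comments below. Only `--cif-pattern` is confirmed by the review; the other flag names are assumptions inferred from attribute names such as `args.input_pdb_pattern`, the `--input-pdb-pattern` default is hypothetical, `main` is a stand-in that echoes its inputs, and `rcsb_pattern`/`depth` are omitted for brevity.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only --cif-pattern is confirmed by the review; the other flag names are
    # assumptions inferred from attributes such as args.input_pdb_pattern.
    parser = argparse.ArgumentParser(description="Patch refined CIF files after grid search.")
    parser.add_argument("--input-dir", required=True,
                        help="Root directory containing grid-search outputs.")
    parser.add_argument("--grid-search-input-dir", required=True,
                        help="Directory holding the reference (input) CIFs.")
    parser.add_argument("--cif-pattern", default="refined.cif",
                        help="Glob (fnmatch) pattern for CIF files to patch.")
    parser.add_argument("--input-pdb-pattern", default=None,
                        help="Pattern used to locate the reference CIF (hypothetical).")
    return parser


def main(input_dir: str, grid_search_input_dir: str, cif_pattern: str,
         input_pdb_pattern: str | None) -> dict[str, str | None]:
    # Hypothetical stand-in for the script's entry point: echo the received options.
    return {
        "input_dir": input_dir,
        "grid_search_input_dir": grid_search_input_dir,
        "cif_pattern": cif_pattern,
        "input_pdb_pattern": input_pdb_pattern,
    }


def run(argv: list[str] | None = None) -> dict[str, str | None]:
    args = build_parser().parse_args(argv)
    # Forward every parsed option, including input_pdb_pattern, so the flag takes effect.
    return main(args.input_dir, args.grid_search_input_dir,
                args.cif_pattern, args.input_pdb_pattern)
```

Routing parsing through `run(argv)` keeps the forwarding testable without touching `sys.argv`.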

…e original pdb; 2/ include occ and B-factor columns 3/ write to a file refined-patched.cif instead of backing up the original
@coderabbitai Bot (Contributor) commented Feb 26, 2026

Warning

Rate limit exceeded

@marcuscollins has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between eabcebc and 1901f79.

📒 Files selected for processing (1)
  • scripts/patch_input_cif_files.py
📝 Walkthrough

The CIF patching script now accepts a reference directory and input-PDB pattern, loads reference and target CIFs, constructs and validates residue-number mappings, updates atom_site residue identifiers and annotations, and writes patched files with "-patched.cif" suffix. CLI signatures and error reporting were updated.

Changes

  • CIF Patcher Logic (scripts/patch_input_cif_files.py): Added reference-dir & input-pdb-pattern parameters; patch_individual_cif_file() signature changed to return `str | None` (None on success, an error message on failure).
  • Pre-commit hooks (.pre-commit-config.yaml): Renamed three local hooks (ids and names) from pyright-* to ty-*, replaced pyright entries with `ty check`, and added a `files: protenix` filter to the protenix hook.
  • Tooling config (pyproject.toml): Replaced dev tooling pyright with ty; added [tool.ty.environment] and [tool.ty.rules] blocks (python-version and unresolved-import rule).
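Returning `str | None` from the per-file worker lets a batch collect failures instead of crashing. A minimal sketch of that pattern follows; per the review the script uses `joblib.Parallel`, but this sketch swaps in the standard-library `ThreadPoolExecutor` so it runs without extra dependencies. `patch_one`, `patch_all`, and the `load_reference` callable are hypothetical names.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def patch_one(cif_path: Path, load_reference: Callable[[Path], object]) -> str | None:
    """Patch one file; return None on success or an error message on failure."""
    try:
        reference = load_reference(cif_path)
    except Exception as exc:  # broad on purpose: report the failure instead of aborting the batch
        return f"{cif_path}: reference load error: {exc}"
    if reference is None:
        return f"{cif_path}: no reference structure found"
    # ... renumber residues, fill occupancy/B_iso_or_equiv, write <stem>-patched.cif ...
    return None


def patch_all(paths: Iterable[Path], load_reference: Callable[[Path], object],
              max_workers: int = 4) -> list[str]:
    """Run patch_one over all paths in parallel and collect only the error messages."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda p: patch_one(p, load_reference), paths))
    return [r for r in results if r is not None]
```

The same shape should carry over to joblib: as long as the worker never raises, one unreadable reference only adds an entry to the error list.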

Sequence Diagram

sequenceDiagram
    participant Main as Main Workflow
    participant Patcher as CIF Patcher
    participant RefLoader as Reference Loader
    participant TargetLoader as Target Loader
    participant Mapper as Residue Mapper
    participant Validator as Validator
    participant Writer as CIF Writer

    Main->>Patcher: patch_individual_cif_file(cif_file, rcsb_regex, reference_dir, input_pdb_pattern)
    Patcher->>RefLoader: load reference CIF (grid_search_input_dir + pattern)
    RefLoader-->>Patcher: reference residue data
    Patcher->>TargetLoader: load target CIF
    TargetLoader-->>Patcher: target residue data
    Patcher->>Mapper: build residue-number mapping (ref → target)
    Mapper->>Validator: ensure residue ID sets match
    alt validation success
        Mapper-->>Patcher: mapping dict
        Patcher->>Writer: apply mapping, add occupancy/B_iso_or_equiv
        Writer-->>Patcher: write `-patched.cif`
        Patcher-->>Main: return None
    else validation failed
        Validator-->>Patcher: error message
        Patcher-->>Main: return error message
    end
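The Mapper and Validator steps in the diagram can be sketched in plain Python. This follows the reviewer's description of the current logic (sorted unique residue numbers paired by position, with a length check); plain integer lists stand in for the `res_id` arrays of the loaded structures, and `build_residue_mapping`/`renumber` are hypothetical names.

```python
from __future__ import annotations

from typing import Sequence


def build_residue_mapping(ref_res_ids: Sequence[int],
                          target_res_ids: Sequence[int]) -> dict[int, int] | str:
    """Map target residue numbers onto reference numbers by sorted position.

    Returns the mapping, or an error message if the residue counts differ.
    """
    ref_unique = sorted(set(ref_res_ids))
    target_unique = sorted(set(target_res_ids))
    if len(ref_unique) != len(target_unique):
        return (f"residue count mismatch: reference has {len(ref_unique)}, "
                f"target has {len(target_unique)}")
    return dict(zip(target_unique, ref_unique))


def renumber(target_res_ids: Sequence[int], mapping: dict[int, int]) -> list[int]:
    """Apply the mapping atom by atom."""
    return [mapping[r] for r in target_res_ids]
```

Because only bare numbers are compared, two chains that both start at residue 1 collapse onto the same key, which is the multi-chain issue raised in the review.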

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • k-chrispens

Poem

🐰
A reference CIF held close and neat,
I hop through residues, mapping each beat,
I validate, patch, and write with glee,
Now densities align — hoppy harmony! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main change: updating CIF file preparation logic that runs after grid search. It directly reflects the primary modifications to CIF patching behavior.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
scripts/patch_input_cif_files.py (1)

184-192: Consider filling missing occupancy/B-factor values even when columns already exist.

Current logic only adds columns when absent. If existing columns contain missing values (?/.), those entries remain unset. If the requirement is “value missing” rather than just “column missing”, backfill per-atom defaults too.
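A minimal sketch of the suggested backfill, assuming a plain dict of lists stands in for `template.block["atom_site"]` (the script's `CIFColumn` type is not reproduced here). In CIF, `?` marks an unknown value and `.` an inapplicable one; treating empty strings and `None` as missing as well follows the reviewer's suggestion. Function names are hypothetical.

```python
from __future__ import annotations

# Markers treated as "missing": CIF "?" (unknown) and "." (inapplicable), plus empty/None.
MISSING_MARKERS = {"?", ".", "", None}


def backfill(values: list, default: float) -> list:
    """Replace missing markers with a default, leaving real values untouched."""
    return [default if v in MISSING_MARKERS else v for v in values]


def ensure_atom_site_defaults(atom_site: dict[str, list],
                              occupancy: float = 1.0,
                              b_iso: float = 20.0) -> dict[str, list]:
    """Create or backfill occupancy and B_iso_or_equiv, sized by the atom id column."""
    n_atoms = len(atom_site["id"])
    for key, default in (("occupancy", occupancy), ("B_iso_or_equiv", b_iso)):
        if key not in atom_site:
            atom_site[key] = [default] * n_atoms  # column absent: fill every atom
        else:
            atom_site[key] = backfill(atom_site[key], default)  # column present: fill gaps only
    return atom_site
```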

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/patch_input_cif_files.py`:
- Around line 60-64: The help text for the --cif-pattern argument is incorrect:
matching uses fnmatch/glob semantics, not regex. Update the parser.add_argument
call for "--cif-pattern" (the existing argument definition) to describe it as a
glob/fnmatch pattern (e.g., "Glob pattern for CIF files to patch, default:
'refined.cif'") or alternatively change the matching logic where fnmatch is used
to perform a true regex match; make the fix in the parser.add_argument help
string or in the matching code so the help and behavior are consistent.
- Around line 123-129: The call to load_any(reference_path) can raise and will
abort joblib.Parallel; wrap the reference load in a try/except inside the worker
so exceptions are caught and converted to an error string (e.g. f"reference load
error: {e}") and returned instead of raising; only call
ensure_atom_array_stack(asym_unit) when the reference load succeeded, and ensure
the function signature still returns str | None so callers can collect error
strings from joblib.Parallel without crashing.
- Around line 203-205: The CLI flag args.input_pdb_pattern is parsed but never
forwarded into the program; update the final invocation of main to include
args.input_pdb_pattern so the user-provided pattern is used (i.e., modify the
main(...) call where main is invoked to pass args.input_pdb_pattern alongside
args.input_dir, args.grid_search_input_dir, args.cif_pattern, args.rcsb_pattern,
args.depth), ensuring the parameter order matches main's signature.
- Around line 137-149: The current residue remapping compares only sorted unique
res_id values (ref_resnums/cif_resnums) and thus drops chain context. Change the logic
in scripts/patch_input_cif_files.py to build chain-aware residue keys (e.g.,
tuples of (chain_id, res_id, insertion_code) derived from reference.res_id and
asym_unit.res_id), preserve their ordering, compare those unique chain-aware keys for
length/equality, and create the mapping from those chain-aware keys (using the same
unique-key ordering used to build the mapping rather than plain sorted numbers). Then
replace asym_unit.res_id by mapping each residue's chain-aware key to the
corresponding reference residue number (or insertion-aware target), so that the
mapping is correct for multi-chain inputs. Keep using the existing variables
(reference.res_id, asym_unit.res_id, mapping, cif_path, reference_path), but use
chain-aware keys when constructing the mapping and validating lengths.
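A minimal sketch of the chain-aware remapping described in the last inline comment, assuming each atom is represented by a hypothetical `(chain_id, res_id, insertion_code)` tuple rather than the script's atom arrays. Keys are de-duplicated in order of first appearance, and the extra check that both sides list the same chains in the same order is an assumption of this sketch, added so residues are never paired across chains.

```python
from __future__ import annotations

from typing import Sequence, Tuple

# Hypothetical residue key: (chain_id, res_id, insertion_code).
ResidueKey = Tuple[str, int, str]


def chain_aware_mapping(ref_keys: Sequence[ResidueKey],
                        target_keys: Sequence[ResidueKey]) -> dict[ResidueKey, ResidueKey] | str:
    """Pair residues by order of first appearance, never across chains."""
    ref_unique = list(dict.fromkeys(ref_keys))  # de-duplicate atoms, keep file order
    target_unique = list(dict.fromkeys(target_keys))
    if len(ref_unique) != len(target_unique):
        return f"residue count mismatch: {len(ref_unique)} reference vs {len(target_unique)} target"
    # Assumption: both structures list the same chains in the same order.
    if [k[0] for k in ref_unique] != [k[0] for k in target_unique]:
        return "chain layout differs between reference and target"
    return dict(zip(target_unique, ref_unique))


def remap_res_ids(target_keys: Sequence[ResidueKey],
                  mapping: dict[ResidueKey, ResidueKey]) -> list[tuple[int, str]]:
    """New (res_id, insertion_code) for every atom of the target."""
    return [mapping[k][1:] for k in target_keys]
```

For a target whose chains A and B both number residues 1 and 2, the sorted-unique approach would compare {1, 2} against the reference's {5, 6, 10, 11} and fail, while the chain-aware keys pair A1 with A10, B1 with B5, and so on, and carry over a reference insertion code such as 6A.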

---

Nitpick comments:
In `@scripts/patch_input_cif_files.py`:
- Around line 184-192: The current block only creates occupancy/B_iso_or_equiv
columns when missing; instead, ensure you also backfill per-atom default values
for existing columns that contain missing entries. In the atom_site block
(template.block["atom_site"]) check for keys "occupancy" and "B_iso_or_equiv":
if a key is missing, create a CIFColumn filled with defaults ([1.0] *
len(template.block["atom_site"]["id"]) and [20.0] * len(...)); if the column
exists, iterate its entries and replace missing markers (e.g., "?", ".", None,
or empty strings) with the same defaults at those indices so occupancy and
B_iso_or_equiv columns are fully populated; keep using CIFColumn semantics and
the existing template.block["atom_site"]["id"] length to drive sizes.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 469d8e4 and fedf668.

📒 Files selected for processing (1)
  • scripts/patch_input_cif_files.py

Comment thread scripts/patch_input_cif_files.py
Comment thread scripts/patch_input_cif_files.py Outdated
Comment thread scripts/patch_input_cif_files.py Outdated
Comment thread scripts/patch_input_cif_files.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@marcuscollins (Collaborator, Author)

Closing due to merge issues; recreated on a separate branch to fix them properly.

@marcuscollins marcuscollins deleted the patch-cif-updates branch February 26, 2026 02:50

2 participants