Updates to CIF preparation post grid search#103
Conversation
…e original pdb; 2/ include occ and B-factor columns 3/ write to a file refined-patched.cif instead of backing up the original
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe CIF patching script now accepts a reference directory and input-PDB pattern, loads reference and target CIFs, constructs and validates residue-number mappings, updates atom_site residue identifiers and annotations, and writes patched files with "-patched.cif" suffix. CLI signatures and error reporting were updated. Changes
Sequence DiagramsequenceDiagram
participant Main as Main Workflow
participant Patcher as CIF Patcher
participant RefLoader as Reference Loader
participant TargetLoader as Target Loader
participant Mapper as Residue Mapper
participant Validator as Validator
participant Writer as CIF Writer
Main->>Patcher: patch_individual_cif_file(cif_file, rcsb_regex, reference_dir, input_pdb_pattern)
Patcher->>RefLoader: load reference CIF (grid_search_input_dir + pattern)
RefLoader-->>Patcher: reference residue data
Patcher->>TargetLoader: load target CIF
TargetLoader-->>Patcher: target residue data
Patcher->>Mapper: build residue-number mapping (ref → target)
Mapper->>Validator: ensure residue ID sets match
alt validation success
Mapper-->>Patcher: mapping dict
Patcher->>Writer: apply mapping, add occupancy/B_iso_or_equiv
Writer-->>Patcher: write `-patched.cif`
Patcher-->>Main: return None
else validation failed
Validator-->>Patcher: error message
Patcher-->>Main: return error message
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
scripts/patch_input_cif_files.py (1)
184-192: Consider filling missing occupancy/B-factor values even when columns already exist.Current logic only adds columns when absent. If existing columns contain missing values (
?/.), those entries remain unset. If the requirement is “value missing” rather than just “column missing”, backfill per-atom defaults too.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/patch_input_cif_files.py` around lines 184 - 192, The current block only creates occupancy/B_iso_or_equiv columns when missing; instead, ensure you also backfill per-atom default values for existing columns that contain missing entries. In the atom_site block (template.block["atom_site"]) check for keys "occupancy" and "B_iso_or_equiv": if a key is missing, create a CIFColumn filled with defaults ([1.0] * len(template.block["atom_site"]["id"]) and [20.0] * len(...)); if the column exists, iterate its entries and replace missing markers (e.g., "?", ".", None, or empty strings) with the same defaults at those indices so occupancy and B_iso_or_equiv columns are fully populated; keep using CIFColumn semantics and the existing template.block["atom_site"]["id"] length to drive sizes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/patch_input_cif_files.py`:
- Around line 60-64: The help text for the --cif-pattern argument is incorrect:
matching uses fnmatch/glob semantics not regex. Update the parser.add_argument
call for "--cif-pattern" (the existing argument definition) to describe it as a
glob/fnmatch pattern (e.g., "Glob pattern for CIF files to patch, default:
'refined.cif'") or alternatively change the matching logic where fnmatch is used
to perform a true regex match; make the fix in the parser.add_argument help
string or in the matching code so the help and behavior are consistent.
- Around line 123-129: The call to load_any(reference_path) can raise and will
abort joblib.Parallel; wrap the reference load in a try/except inside the worker
so exceptions are caught and converted to an error string (e.g. f"reference load
error: {e}") and returned instead of raising; only call
ensure_atom_array_stack(asym_unit) when the reference load succeeded, and ensure
the function signature still returns str | None so callers can collect error
strings from joblib.Parallel without crashing.
- Around line 203-205: The CLI flag args.input_pdb_pattern is parsed but never
forwarded into the program; update the final invocation of main to include
args.input_pdb_pattern so the user-provided pattern is used (i.e., modify the
main(...) call where main is invoked to pass args.input_pdb_pattern alongside
args.input_dir, args.grid_search_input_dir, args.cif_pattern, args.rcsb_pattern,
args.depth), ensuring the parameter order matches main's signature.
- Around line 137-149: The current residue remapping compares only sorted unique
res_id (ref_resnums/cif_resnums) and thus drops chain context; change the logic
in scripts/patch_input_cif_files.py to build chain-aware residue keys (e.g.,
tuples of (chain_id, res_id, insertion_code) derived from reference.res_id and
asym_unit.res_id) preserve ordering, compare those unique chain-aware keys for
length/equality, create the mapping using those chain-aware keys (use the same
unique-key ordering used to build mapping rather than plain sorted numbers), and
then replace asym_unit.res_id by mapping each residue's chain-aware key to the
corresponding reference residue number (or insertion-aware target) so that
mapping is correct for multi-chain inputs; keep using the existing variables
(reference.res_id, asym_unit.res_id, mapping, cif_path, reference_path) but use
chain-aware keys when constructing mapping and validating lengths.
---
Nitpick comments:
In `@scripts/patch_input_cif_files.py`:
- Around line 184-192: The current block only creates occupancy/B_iso_or_equiv
columns when missing; instead, ensure you also backfill per-atom default values
for existing columns that contain missing entries. In the atom_site block
(template.block["atom_site"]) check for keys "occupancy" and "B_iso_or_equiv":
if a key is missing, create a CIFColumn filled with defaults ([1.0] *
len(template.block["atom_site"]["id"]) and [20.0] * len(...)); if the column
exists, iterate its entries and replace missing markers (e.g., "?", ".", None,
or empty strings) with the same defaults at those indices so occupancy and
B_iso_or_equiv columns are fully populated; keep using CIFColumn semantics and
the existing template.block["atom_site"]["id"] length to drive sizes.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
Closing due to merge issues, recreated on a separate branch to fix properly. |
Changes to CIF patching:
refined.cif, write the new CIF file asrefined-patched.cifSummary by CodeRabbit
New Features
Improvements
Chores