
fix: quick patch to enforce that the ref chain is the one assigned to… #237

Merged
k-chrispens merged 1 commit into main from kmc/patch-protenix-chain-mismatch on May 19, 2026

Conversation

k-chrispens (Collaborator) commented May 19, 2026

… the cif file

Summary by CodeRabbit

  • Chores
    • Updated version control ignore patterns for large run outputs
    • Modified internal structure file processing to handle chain identifier mismatches


coderabbitai Bot (Contributor) commented May 19, 2026

📝 Walkthrough


This PR makes two independent changes: it expands .gitignore patterns to match run output directories with numbered variants (e.g., grid_search_results1, grid_search_results2), and it removes an early return in CIF file patching when reference and derived structure chain IDs don't match, allowing the function to continue instead of aborting.

Changes

Output handling improvements

  • Ignore pattern expansion for run outputs (.gitignore): Ignore entries for run output folders are widened from exact directory names (grid_search_results/, outputs/) to wildcard-suffixed patterns (grid_search_results*/, output*/) to match numbered variants, while preserving the exception for src/sampleworks/data/protein_configs.csv.
  • Chain mismatch handling in CIF patching (scripts/patch_output_cif_files.py): The patch_individual_cif_file function no longer returns an error message when chain IDs in the reference and derived structures don't match. The mismatch is still logged with logger.error, but the early return is commented out and execution continues, with a TODO noting the current behavior breaks multi-chain support.
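A minimal sketch of the post-PR behavior, to make the risk concrete. It assumes residue keys are `(chain_id, residue_number)` tuples, which the `cif_key[0] != ref_key[0]` check in the diff suggests but does not confirm; the helper name `remap_residue_numbers`, the in-order pairing, and the returned dict are hypothetical and not the script's actual code.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed key layout: (chain_id, residue_number), so key[0] is the chain ID.
ResidueKey = tuple[str, int]


def remap_residue_numbers(
    cif_keys: list[ResidueKey],
    ref_keys: list[ResidueKey],
    cif_path: str,
    reference_path: str,
) -> dict[ResidueKey, ResidueKey]:
    """Pair output residues with reference residues in order (hypothetical helper).

    Mirrors the post-PR behavior: a chain mismatch is logged as an error,
    but the pairing continues instead of returning the error message.
    """
    mapping: dict[ResidueKey, ResidueKey] = {}
    for cif_key, ref_key in zip(cif_keys, ref_keys):
        if cif_key[0] != ref_key[0]:
            msg = f"Chain mismatch while remapping residues for {cif_path} vs {reference_path}"
            logger.error(msg)
            # return msg  # early exit disabled by this PR
        mapping[cif_key] = ref_key
    return mapping


# Output chain "A" vs reference chain "B": errors are logged, but a mapping is still returned.
mapping = remap_residue_numbers([("A", 1), ("A", 2)], [("B", 10), ("B", 11)], "out.cif", "ref.cif")
print(mapping)  # {('A', 1): ('B', 10), ('A', 2): ('B', 11)}
```

Because the early return is disabled, chain A residues end up paired with chain B numbering; this is the incorrect residue mapping risk that the reviewers raise.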

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes

A rabbit hops through code with care,
New patterns catch the outputs there,
And when chain IDs disagree,
We continue on (reluctantly),
With a note: "Please fix this, someday soon!" 🐰

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The PR title is related to the main change (handling chain mismatch in CIF patching), but it is truncated and does not make clear what fix is being applied. Resolution: complete the title to fully convey the fix's purpose, e.g. "fix: suppress early return on chain mismatch in CIF patching".
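For context on the 0.00% figure, a rough sketch of how a docstring-coverage percentage can be computed with Python's standard `ast` module. This is not CodeRabbit's implementation; treating every function, async function, and class as one countable item is an assumption.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of functions and classes in `source` that have a docstring."""
    tree = ast.parse(source)
    nodes = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        return 100.0  # nothing to document
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(nodes)


example = '''
def patch_individual_cif_file(cif_file, rcsb_regex, reference_dir, input_pdb_pattern):
    return None
'''
print(docstring_coverage(example))  # 0.0
```

A single undocumented function yields 0.0, well under an 80.0 threshold; adding the suggested NumPy-style docstring would bring this toy example to 100.0.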
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/patch_output_cif_files.py (1)

113-115: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add NumPy-style docstring to document behavior and limitations.

As per coding guidelines, every function should include a NumPy-style docstring. This is particularly important here given the chain mismatch handling and the TODO indicating broken multi-chain behavior.

📝 Suggested docstring structure
 def patch_individual_cif_file(
     cif_file: Path, rcsb_regex: str, reference_dir: Path, input_pdb_pattern: str
-) -> str | None:  # returns an error message if there was one
+) -> str | None:
+    """
+    Patch a CIF file with metadata and residue numbering from reference structure.
+
+    Parameters
+    ----------
+    cif_file : Path
+        Path to the CIF file to patch.
+    rcsb_regex : str
+        Regex pattern to extract RCSB ID from the CIF file path.
+    reference_dir : Path
+        Directory containing reference PDB structures.
+    input_pdb_pattern : str
+        Pattern to locate input PDB file, with {pdb_id} placeholder.
+
+    Returns
+    -------
+    str | None
+        Error message if patching failed, None if successful.
+
+    Notes
+    -----
+    Chain mismatch handling is currently broken for multi-chain structures.
+    When chain IDs don't match between reference and CIF, an error is logged
+    but processing continues, which may result in incorrect residue mapping.
+    """

As per coding guidelines, Python files should always include NumPy-style docstrings for every function and class.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/patch_output_cif_files.py` around lines 113 - 115, Add a NumPy-style
docstring to the function patch_individual_cif_file describing its purpose,
parameters (cif_file: Path, rcsb_regex: str, reference_dir: Path,
input_pdb_pattern: str), return value (str | None indicating an error message or
None), and key behavior/limitations including how chain-mismatch handling works
and the current TODO about broken multi-chain behavior; ensure the docstring
documents side-effects, exceptions raised (if any), and any
preconditions/assumptions so future readers understand the chain-mismatch edge
case and known limitation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/patch_output_cif_files.py`:
- Around line 162-164: The code currently comments out the early "return msg" on
chain ID mismatch, which lets execution continue and later updates
asym_unit.res_id using a possibly incorrect mapping; restore proper error
propagation by reinstating the original early return behavior (return msg) in
the chain-mismatch detection so callers receive the error string, or
alternatively change the function signature to return a tuple like (None,
warning_msg) / (error_msg, None) and update callers accordingly; locate the
chain mismatch check and the mapping code that leads to asym_unit.res_id to
apply this fix.
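A minimal sketch of the tuple-return alternative described above, where a chain mismatch reaches the caller as a warning instead of being only logged. `check_chain_ids`, `summarize`, `PatchResult`, and the `strict` flag are hypothetical names for illustration; in the actual script the check lives inside `patch_individual_cif_file`, which returns `str | None`.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# (error_msg, warning_msg): at most one of the two is set.
PatchResult = tuple[Optional[str], Optional[str]]


def check_chain_ids(
    cif_chain: str,
    ref_chain: str,
    cif_path: str,
    reference_path: str,
    strict: bool = False,
) -> PatchResult:
    """Hypothetical helper: report a chain mismatch as an error (strict) or a warning."""
    if cif_chain == ref_chain:
        return None, None
    msg = f"Chain mismatch while remapping residues for {cif_path} vs {reference_path}"
    if strict:
        logger.error(msg)
        return msg, None  # caller should abort, like the pre-PR early return
    logger.warning(msg)
    return None, msg  # caller continues, but the warning is not lost


def summarize(results: list[PatchResult]) -> tuple[list[str], list[str]]:
    """Split per-file results into error and warning lists for reporting."""
    errors = [err for err, _ in results if err is not None]
    warnings = [warn for _, warn in results if warn is not None]
    return errors, warnings


errors, warnings = summarize([
    check_chain_ids("A", "A", "a.cif", "a_ref.cif"),
    check_chain_ids("A", "B", "b.cif", "b_ref.cif"),
    check_chain_ids("A", "B", "c.cif", "c_ref.cif", strict=True),
])
print(len(errors), len(warnings))  # 1 1
```

Keeping a `strict` switch is one way to restore the original early-return behavior once chain IDs are fixed upstream in the Protenix JSON creation, without a second signature change.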


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a28e2a66-7930-476d-8603-cd544f67ceb2

📥 Commits

Reviewing files that changed from the base of the PR and between fbf6d38 and c2cc351.

📒 Files selected for processing (2)
  • .gitignore
  • scripts/patch_output_cif_files.py

Copilot AI (Contributor) left a comment

Pull request overview

This PR tweaks the CIF post-processing patch script to no longer abort on reference/output chain mismatches, and updates .gitignore patterns to more broadly ignore large run-result directories.

Changes:

  • Disable early-return failure on chain mismatches during residue remapping in patch_output_cif_files.py.
  • Broaden .gitignore patterns for run outputs (grid_search_results*, output*) to avoid accidentally committing large result folders.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

  • scripts/patch_output_cif_files.py: Stops returning an error on chain mismatch during residue remapping (continues patching).
  • .gitignore: Expands ignored run-result directory patterns to match more output folder name variants.
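To check which folder names the widened patterns catch, a small sketch using Python's `fnmatch.fnmatchcase`. It only approximates gitignore matching: the trailing `/` (directory-only) is dropped, and gitignore's handling of `/`, `**`, and `!` negation is not modeled. The folder names in the loop are made-up examples.

```python
from fnmatch import fnmatchcase

# Patterns from this PR with the trailing "/" removed.
OLD_PATTERNS = ["grid_search_results", "outputs"]
NEW_PATTERNS = ["grid_search_results*", "output*"]


def ignored(dir_name: str, patterns: list[str]) -> bool:
    """Return True if a single directory name matches any of the patterns."""
    return any(fnmatchcase(dir_name, pattern) for pattern in patterns)


for name in ["grid_search_results", "grid_search_results2", "outputs", "output_v2", "src"]:
    print(name, ignored(name, OLD_PATTERNS), ignored(name, NEW_PATTERNS))
# grid_search_results True True
# grid_search_results2 False True
# outputs True True
# output_v2 False True
# src False False
```

Note that `output*` is broader than a numbered-variant pattern: it matches any directory whose name starts with `output`, such as `output_v2`, not only `outputs` and `outputs2`.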


Comment on lines 159 to +163
 if cif_key[0] != ref_key[0]:
     msg = f"Chain mismatch while remapping residues for {cif_path} vs {reference_path}"
     logger.error(msg)
-    return msg
+    # return msg
+    # TODO: fix chain mismatches upstream (protenix json creation needs update)
Collaborator Author


This is fine for now

marcuscollins (Collaborator) left a comment

Approved but please do not forget to file that issue and follow up to get it fixed, otherwise this will cause us other problems down the road.

     logger.error(msg)
-    return msg
+    # return msg
+    # TODO: fix chain mismatches upstream (protenix json creation needs update)
Collaborator


Can you make sure to add an issue? Please mark it as a bug, actually. Can Michael or Justin take this one on?

Collaborator Author


Added it! And probably yes. Will add tags now.

k-chrispens merged commit 1c56a5d into main on May 19, 2026 (5 checks passed)
k-chrispens deleted the kmc/patch-protenix-chain-mismatch branch on May 19, 2026 at 20:43

Labels

None yet

Projects

None yet


3 participants