
Add occupancy validation with proper error handling#217

Closed
ManjunathByadagi wants to merge 2 commits into diff-use:main from ManjunathByadagi:fix-occupancy-validation

Conversation


@ManjunathByadagi ManjunathByadagi commented Apr 11, 2026

Changes

  • Added validation to ensure occupancy values are numeric
  • Ensured values are within valid range (0.0 to 1.0)
  • Added clear error messages for invalid inputs

Why

Previously, invalid occupancy values were not handled properly.
This update ensures proper validation and prevents incorrect data processing.

Example

Valid:
0.5, 1.0

Invalid:
-1, 2 → will raise an error
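The validation behavior described above can be sketched as follows. This is an illustrative sketch only; the helper name `validate_occupancy` and the call site in scripts/patch_output_cif_files.py may differ from the actual implementation:

```python
def validate_occupancy(val):
    """Return val as a float in [0.0, 1.0], raising ValueError otherwise."""
    try:
        value = float(val)
    except (TypeError, ValueError):
        # Non-numeric input (e.g. "abc") is rejected with a clear message.
        raise ValueError(f"Invalid occupancy value '{val}'. Must be a number.") from None
    if not (0.0 <= value <= 1.0):
        # Numeric but out of range (e.g. -1 or 2) is also rejected.
        raise ValueError(f"Invalid occupancy value {value}. Must be between 0.0 and 1.0.")
    return value
```

With this contract, `validate_occupancy("0.5")` and `validate_occupancy(1.0)` succeed, while `-1`, `2`, and non-numeric strings raise `ValueError` as in the examples above.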

Summary by CodeRabbit

  • Refactor

    • Streamlined CIF patching utility for a simpler, more maintainable processing flow.
  • Bug Fixes

    • Improved validation and handling of atomic coordinates and occupancy/B_iso values to reduce malformed output and surface clearer error messages.
  • Other

    • Patched files are saved with a "-patched.cif" suffix.
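The "-patched.cif" naming noted above can be expressed with pathlib. `patched_output_path` is a hypothetical helper for illustration; the script's actual construction of the output name may differ:

```python
from pathlib import Path

def patched_output_path(cif_path: Path) -> Path:
    # "refined.cif" -> "refined-patched.cif", kept in the same directory
    return cif_path.with_name(f"{cif_path.stem}-patched.cif")
```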

Copilot AI review requested due to automatic review settings April 11, 2026 07:12

coderabbitai Bot commented Apr 11, 2026

📝 Walkthrough

Condensed and simplified a CIF-patching script: removed docstrings and type annotations from three core functions, shortened CLI help text, and substantially reduced patching logic—removing metadata copying, coordinate cleanup, and detailed error branches—while adding occupancy/B_iso validation and changing output naming.

Changes

Cohort / File(s): Core Script — scripts/patch_output_cif_files.py

Summary: Condensed the crawl_dir_by_depth definition, simplified parse_args() help text, and replaced the typed main/patch_individual_cif_file signatures with untyped parameters. Major internal simplification of patch_individual_cif_file: removed metadata/sampleworks copying, NaN cleanup, and entity-id and seq-scheme syncing; broadened exception handling and simplified error messages; adjusted residue remapping, atom_site.id sizing, and occupancy and B_iso_or_equiv validation; and changed the output filename to "{stem}-patched.cif". Removed some unused imports and the terminal newline.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • k-chrispens

Poem

🐰 Hopping through lines with a nimble paw,
patches trimmed tidy, no fuss, no flaw.
Occupancy checked, B-factors in view,
out comes a CIF with a fresher hue.
Small, bright changes — a little code awe.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'Add occupancy validation with proper error handling' directly addresses the primary objective of the PR, which is to add occupancy validation and error handling.


Copilot AI left a comment


Pull request overview

This PR updates scripts/patch_output_cif_files.py, primarily aiming to validate atom_site.occupancy values when patching output CIFs, but it also includes broader refactors/removals in the CIF patching workflow.

Changes:

  • Added occupancy numeric + range validation (0.0–1.0) in the CIF patching step.
  • Simplified CLI argument definitions and several error messages/return paths.
  • Refactored parts of the patching logic (including how atom_site.id is assigned) and removed multiple previously-present patch steps.


Comment thread scripts/patch_output_cif_files.py Outdated
Comment on lines 47 to 55:

```diff
 parser.add_argument("--input-dir", required=True)
-parser.add_argument(
-    "--cif-pattern",
-    default="refined.cif",
-    help="Pattern used by fnmatch/glob for cif files to patch, default: 'refined.cif'",
-)
-parser.add_argument(
-    "--rcsb-pattern",
-    default="grid_search_results/(.{4})",
-    help="Regex pattern for rcsb ids in file paths. "
-    "Must have only one group, surrounding the id",
-)
-parser.add_argument(
-    "--depth", type=int, default=4, help="Depth to search the directory tree below input-dir"
-)
+parser.add_argument("--cif-pattern", default="refined.cif")
+parser.add_argument("--rcsb-pattern", default="grid_search_results/(.{4})")
+parser.add_argument("--depth", type=int, default=4)
 parser.add_argument("--grid-search-input-dir", required=True)
 parser.add_argument(
     "--input-pdb-pattern",
     default="{pdb_id}/{pdb_id}_single_001_density_input.cif",
     help="Pattern used by fnmatch/glob for input pdb files. The complete path of the input "
     "pdb must match f'{grid-search-input-dir}/{input-pdb-pattern}'. Defaults to "
     "'{pdb_id}/{pdb_id}_single_001_density_input.cif'",
 )
```

Copilot AI Apr 11, 2026


The CLI args lost their help= descriptions (e.g., --cif-pattern, --rcsb-pattern, --depth, --input-pdb-pattern). This makes the script harder to use/debug and is a regression from the previous interface; please restore the help strings (or at least the ones that document expected formats like the regex group requirement).

Comment on lines 87 to 95:

```diff
 try:
     reference_path = reference_dir / input_pdb_pattern.format(pdb_id=rcsb_id)
     # fetch only downloads the file if it isn't already present.
     rcsb_path = fetch(rcsb_id, format="cif", target_path=str(SAMPLEWORKS_CACHE))

     reference = load_any(reference_path)
-    asym_unit = load_any(cif_file)
-    asym_unit = ensure_atom_array_stack(asym_unit)
+    asym_unit = ensure_atom_array_stack(load_any(cif_file))
 except Exception:
-    msg = f"Unable to read and parse either/both of {reference_path}, {cif_file}. "
-    logger.warning(msg)
-    return msg
+    return f"Error reading files"
```


Copilot AI Apr 11, 2026


The broad except Exception: here discards the exception details and returns a generic message (and the f-string has no interpolation). This makes failures hard to triage and contradicts the PR goal of “clear error messages”. Capture the exception (as e), log it with context (e.g., logger.exception(...) including both paths), and include the relevant details in the returned error string.

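A minimal sketch of the pattern this comment asks for: capture the exception, log it with both paths, and return a detailed message. `read_inputs` and `loader` are illustrative stand-ins, not names from the script:

```python
import logging

logger = logging.getLogger(__name__)

def read_inputs(reference_path, cif_file, loader):
    """Load both inputs, returning (reference, asym_unit, error_message)."""
    try:
        return loader(reference_path), loader(cif_file), None
    except Exception as e:
        # Keep the exception type and text, plus both file paths, so parallel
        # failures are triageable instead of a generic "Error reading files".
        msg = f"Unable to read/parse {reference_path} or {cif_file}: {type(e).__name__}: {e}"
        logger.exception(msg)
        return None, None, msg
```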
```diff
-        msg = f"Chain mismatch while remapping residues for {cif_path} vs {reference_path}"
-        logger.error(msg)
-        return msg
     for cif_key, ref_key in zip(cif_keys, ref_keys):
```

Copilot AI Apr 11, 2026


Residue remapping no longer verifies that chain IDs match between cif_keys and ref_keys (the mapping ignores ref_key[0]). If chain ordering/IDs differ, this will silently assign the wrong residue numbers. Reinstate a chain-id consistency check (and ideally keep zip(..., strict=True) for defensive correctness) before populating mapping.

Suggested change:

```diff
-for cif_key, ref_key in zip(cif_keys, ref_keys):
+for cif_key, ref_key in zip(cif_keys, ref_keys, strict=True):
+    if cif_key[0] != ref_key[0]:
+        return "Residue mapping mismatch"
```

Comment thread scripts/patch_output_cif_files.py Outdated

```diff
 atom_keys = list(zip(asym_unit.chain_id.tolist(), asym_unit.res_id.tolist()))
-asym_unit.res_id = np.array([mapping[k] for k in atom_keys], dtype=asym_unit.res_id.dtype)
+asym_unit.res_id = np.array([mapping[k] for k in atom_keys])
```

Copilot AI Apr 11, 2026


asym_unit.res_id is reassigned without preserving the original dtype. Downstream biotite/atomworks code often expects res_id to keep its integer dtype; use dtype=asym_unit.res_id.dtype (or equivalent) when building the new array to avoid accidental dtype changes.

Suggested change:

```diff
-asym_unit.res_id = np.array([mapping[k] for k in atom_keys])
+asym_unit.res_id = np.array(
+    [mapping[k] for k in atom_keys], dtype=asym_unit.res_id.dtype
+)
```

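A small NumPy example of the dtype drift the suggestion guards against. Here `res_id` is a plain array standing in for the biotite annotation:

```python
import numpy as np

# Building the remapped array from a Python list lets NumPy pick a default
# integer type, which can differ from the annotation's original dtype.
res_id = np.array([1, 2, 3], dtype=np.int32)
mapping = {1: 10, 2: 20, 3: 30}

remapped_default = np.array([mapping[k] for k in res_id.tolist()])
remapped_kept = np.array([mapping[k] for k in res_id.tolist()], dtype=res_id.dtype)
# remapped_kept keeps int32; remapped_default may widen to the platform default.
```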
Comment on lines 112 to 116:

```python
    template = CIFFile.read(rcsb_path)

    # Write sampleworks trial metadata to the CIF file, if we can find it
    cif_data = CIFFile.read(cif_path)
    if "sampleworks" in cif_data.block:
        template.block["sampleworks"] = cif_data.block["sampleworks"]
    elif (metadata_path := cif_path.parent / "job_metadata.json").exists():
        with open(metadata_path, "r") as fp:
            metadata = json.load(fp)
        if metadata is not None:
            add_category_to_cif(template, metadata, "sampleworks")
        else:
            logger.warning(f"Sampleworks metadata file at {metadata_path} is empty")
    else:
        logger.warning(f"No sampleworks metadata found for {cif_path}")

    # remove any atoms with nan coordinates--these seem to come in because we sometimes use parse
    # (from AtomWorks) which creates them. Still, we'll do this here just in case.
    asym_unit = remove_atoms_with_any_nan_coords(asym_unit)

    # make sure entity ids match in atom_site and entity_poly
    if "entity_poly" in template.block:
        ep = template.block["entity_poly"]
        # fixme for now I'm using a hack--if there's one polymer entity, just assert that
        # polymers in atom_site have to be that one. Otherwise do nothing.
        if len(ep["entity_id"]) == 1:
            entity_id = ep["entity_id"].as_item()
            if "label_entity_id" not in asym_unit.get_annotation_categories():
                asym_unit.add_annotation("label_entity_id", int)
            asym_unit.label_entity_id = np.ones_like(asym_unit.label_entity_id) * int(entity_id)
    else:
        logger.warning("No entity_poly block found in template CIF file. Cannot patch entity ids")

    # now set the structure with correct entity ids
    set_structure(template, asym_unit)

    # If there's a pdbx_poly_seq_scheme, make sure the seq nums all agree, as
    # the numbers in our outputs will all agree. We appear to use the one called ndb_seq_num
    nsm = template.block["pdbx_poly_seq_scheme"]["ndb_seq_num"]
    template.block["pdbx_poly_seq_scheme"]["pdb_seq_num"] = nsm
    template.block["pdbx_poly_seq_scheme"]["auth_seq_num"] = nsm
    template.block["atom_site"]["id"] = CIFColumn(np.arange(len(asym_unit)))
```


Copilot AI Apr 11, 2026


template.block["atom_site"]["id"] is set using np.arange(len(asym_unit)), but len(AtomArrayStack) is the number of frames/models, not the number of atoms. This will almost certainly produce an atom_site.id column with the wrong length, breaking CIF consistency and downstream parsing. Generate IDs based on the actual number of atom_site rows after set_structure() (or the atom count), not len(asym_unit).

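The length mismatch can be illustrated with a plain array laid out like an AtomArrayStack's coordinates, `(models, atoms, 3)`. Variable names here are illustrative only:

```python
import numpy as np

coords = np.zeros((2, 150, 3))  # 2 models, 150 atoms each

n_models = coords.shape[0]  # analogous to len(asym_unit) for a stack
n_atoms = coords.shape[1]   # the count atom_site.id should be sized by

wrong_ids = np.arange(n_models)        # only 2 ids -- wrong column length
right_ids = np.arange(1, n_atoms + 1)  # 150 ids, 1-based as in mmCIF
```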
Comment thread scripts/patch_output_cif_files.py Outdated
Comment on lines +134 to +139:

```python
            raise ValueError(f"Invalid occupancy value '{val}'. Must be a number.")

        if not (0.0 <= value <= 1.0):
            raise ValueError(
                f"Invalid occupancy value {value}. Must be between 0.0 and 1.0."
            )
```

Copilot AI Apr 11, 2026


This occupancy validation raises ValueError inside patch_individual_cif_file(), which is executed via joblib.Parallel(). A raised exception will abort the whole parallel run rather than being collected into the results list like other errors. To keep “proper error handling”, catch validation errors and return a descriptive error string (and/or log), instead of raising.

Suggested change:

```diff
-            raise ValueError(f"Invalid occupancy value '{val}'. Must be a number.")
-        if not (0.0 <= value <= 1.0):
-            raise ValueError(
-                f"Invalid occupancy value {value}. Must be between 0.0 and 1.0."
-            )
+            error_message = (
+                f"Failed to patch {cif_path}: invalid occupancy value '{val}'. "
+                "Occupancy must be a number."
+            )
+            logger.error(error_message)
+            return error_message
+        if not (0.0 <= value <= 1.0):
+            error_message = (
+                f"Failed to patch {cif_path}: invalid occupancy value {value}. "
+                "Occupancy must be between 0.0 and 1.0."
+            )
+            logger.error(error_message)
+            return error_message
```

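A sketch of the return-instead-of-raise contract this comment asks for. `patch_one` is a hypothetical stand-in for `patch_individual_cif_file`: it returns `None` on success and an error string on failure, so a parallel map (for example via `joblib.Parallel`) collects failures rather than aborting on the first one:

```python
def patch_one(occupancies, cif_path):
    """Return None on success, or a descriptive error string on failure."""
    for val in occupancies:
        try:
            value = float(val)
        except (TypeError, ValueError):
            return f"Failed to patch {cif_path}: invalid occupancy value '{val}'."
        if not (0.0 <= value <= 1.0):
            return f"Failed to patch {cif_path}: occupancy {value} outside [0.0, 1.0]."
    return None

# A plain loop stands in for the parallel map here; every job completes and
# failures are gathered afterwards instead of killing the run.
jobs = [(["0.5", "1.0"], "a.cif"), (["2"], "b.cif"), (["x"], "c.cif")]
results = [patch_one(occ, path) for occ, path in jobs]
errors = [r for r in results if r is not None]
```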
Comment on lines +117 to 120:

```python

# YOUR FIXED SECTION STARTS HERE

```

Copilot AI Apr 11, 2026


The comments # YOUR FIXED SECTION STARTS HERE/ENDS HERE look like leftover scaffolding and add noise to the script. Please remove these markers (and the surrounding extra blank whitespace) to keep the file maintainable.

Comment on lines 111 to 114 (the same removed metadata/NaN/entity-id block shown above):


Copilot AI Apr 11, 2026


After removing the metadata/NaN-handling logic, the imports json, remove_atoms_with_any_nan_coords, and add_category_to_cif are now unused in this file. Ruff is configured to flag F401 unused imports, so this will fail lint/CI; remove the unused imports or reintroduce the functionality that uses them.

Comment on lines 111 to 116 (the same removed block shown above, including the pdbx_poly_seq_scheme and atom_site.id lines):


Copilot AI Apr 11, 2026


The PR description is focused on adding occupancy validation, but this diff also removes substantial CIF patching behavior (e.g., sampleworks metadata propagation, NaN-coordinate filtering, entity-id patching, and pdbx_poly_seq_scheme adjustments). If those removals are intentional, please call them out explicitly in the PR description and/or split into a separate PR, since they change the script’s output semantics beyond occupancy validation.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/patch_output_cif_files.py`:
- Line 22: The functions crawl_dir_by_depth, main, and patch_individual_cif_file
are missing the required NumPy-style docstrings; add concise NumPy-style
docstrings to each function definition that document the parameters (types and
meanings), return values, and explicitly state side effects (filesystem
traversal, network calls, file writes) and failure behavior (exceptions raised
or exit conditions). For crawl_dir_by_depth describe root_dir (str|Path),
target_pattern (str), n_levels (int), and that it returns list[Path]; for
patch_individual_cif_file describe its input path, mutating file I/O and
possible exceptions; for main describe CLI inputs/flags, what it triggers, and
its exit/failure semantics. Ensure each docstring follows NumPy format with
short summary, Parameters, Returns, and Raises sections.
- Around line 115-145: The atom_site.id column is being sized with
len(asym_unit) (model count) which is incorrect after set_structure() because
atom_site rows equal atoms across all models; replace the sizing logic for
template.block["atom_site"]["id"] (and any subsequent uses like occupancy and
B_iso_or_equiv) to use the actual atom_site row count, e.g.
len(template.block["atom_site"]["Cartn_x"]) or another existing atom_site
column, so that template.block["atom_site"]["id"],
template.block["atom_site"]["occupancy"], and
template.block["atom_site"]["B_iso_or_equiv"] are created with the correct
number of rows after ensure_atom_array_stack()/set_structure().
- Around line 128-139: The occupancy validation currently raises ValueError
inside patch_individual_cif_file (loop over occupancy_values/block "atom_site"
occupancy), which breaks the function's contract of returning None or an error
string; change the exception-raising to return descriptive error strings (e.g.,
"Invalid occupancy value 'X'. Must be a number." and "Invalid occupancy value Y.
Must be between 0.0 and 1.0.") so patch_individual_cif_file returns errors
instead of throwing, preserving how main() and joblib collect and print
failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56d9c04c-6d94-49aa-8ba7-4d6355888516

📥 Commits

Reviewing files that changed from the base of the PR and between 441c81d and 5f567a4.

📒 Files selected for processing (1)
  • scripts/patch_output_cif_files.py

```diff
-    - n_levels = 1: root_dir + its immediate subdirectories
-    - etc.
-    """
 def crawl_dir_by_depth(root_dir: str | Path, target_pattern: str, n_levels: int) -> list[Path]:
```

⚠️ Potential issue | 🟡 Minor

Restore the required NumPy-style docstrings on these changed helpers.

crawl_dir_by_depth(), main(), and patch_individual_cif_file() now hide traversal, network, and filesystem side effects behind undocumented entry points. Please add back concise NumPy-style docstrings so the call contract and failure behavior stay discoverable at the definition site.

As per coding guidelines "Always include NumPy-style docstrings for every function and class."

Also applies to: 59-59, 77-77


@ManjunathByadagi
Author

Thanks for the review! I've addressed the comments and updated the PR. Please let me know if anything else is needed.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/patch_output_cif_files.py (1)

21-21: 🛠️ Refactor suggestion | 🟠 Major

Restore required NumPy-style docstrings on changed functions.

crawl_dir_by_depth(), parse_args(), main(), and patch_individual_cif_file() are missing required NumPy-style docstrings, including side effects/failure behavior.

As per coding guidelines "Always include NumPy-style docstrings for every function and class."

Also applies to: 45-45, 88-88, 107-107


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e791f23-39d0-4fa1-81a4-d03f4f599463

📥 Commits

Reviewing files that changed from the base of the PR and between 5f567a4 and ddd4d6e.

📒 Files selected for processing (1)
  • scripts/patch_output_cif_files.py

Comment on lines +125 to +127:

```python
    except Exception as e:
        logger.exception(f"Error reading {cif_file} or {reference_path}: {e}")
        return str(e)
```

⚠️ Potential issue | 🟡 Minor

Make returned errors consistently actionable (include file context).

Current returns are inconsistent (str(e) and generic strings), which makes parallel failures hard to triage and can silently drop empty exception messages.

Suggested fix:

```diff
-    except Exception as e:
-        logger.exception(f"Error reading {cif_file} or {reference_path}: {e}")
-        return str(e)
+    except Exception as e:
+        msg = f"{cif_path}: Error reading {reference_path} ({type(e).__name__}: {e})"
+        logger.exception(msg)
+        return msg
@@
-        return "Missing residue numbers"
+        return f"{cif_path}: Missing residue numbers"
@@
-        return "Residue mapping mismatch"
+        return f"{cif_path}: Residue mapping mismatch"
@@
-            return "Chain mismatch in residue mapping"
+            return f"{cif_path}: Chain mismatch in residue mapping"
```

Also applies to: 130-130, 136-136, 143-143


@k-chrispens
Collaborator

Hi! I know we haven't really added contribution guidelines yet, but something we plan to require generally is that new contributor PRs are attached to an existing issue (though exceptions will apply for new ModelWrappers, etc.). For example, we haven't included an issue for this because the existing patch script is very hacky and will likely be replaced entirely in the future, so this type of QC adds review burden on us for changes that are likely not to be useful in the long run.

I encourage you to take a look at the existing issues and try out one of those instead. Closing this PR.

@ManjunathByadagi ManjunathByadagi deleted the fix-occupancy-validation branch April 16, 2026 04:42