
Add occupancy validation with proper error handling#217

Closed
ManjunathByadagi wants to merge 2 commits into diff-use:main from ManjunathByadagi:fix-occupancy-validation

Conversation


@ManjunathByadagi ManjunathByadagi commented Apr 11, 2026

Changes

  • Added validation to ensure occupancy values are numeric
  • Ensured values are within valid range (0.0 to 1.0)
  • Added clear error messages for invalid inputs

Why

Previously, invalid occupancy values were not handled properly.
This update ensures proper validation and prevents incorrect data processing.

Example

Valid:
0.5, 1.0

Invalid:
-1, 2 → will raise an error
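The validation behavior described above can be sketched as follows. This is an illustrative sketch only; the helper name `validate_occupancy` and the call site in scripts/patch_output_cif_files.py may differ from the actual implementation:

```python
def validate_occupancy(val):
    """Return val as a float in [0.0, 1.0], raising ValueError otherwise."""
    try:
        value = float(val)
    except (TypeError, ValueError):
        # Non-numeric input (e.g. "abc") is rejected with a clear message.
        raise ValueError(f"Invalid occupancy value '{val}'. Must be a number.") from None
    if not (0.0 <= value <= 1.0):
        # Numeric but out of range (e.g. -1 or 2) is also rejected.
        raise ValueError(f"Invalid occupancy value {value}. Must be between 0.0 and 1.0.")
    return value
```

With this contract, `validate_occupancy("0.5")` and `validate_occupancy(1.0)` succeed, while `-1`, `2`, and non-numeric strings raise `ValueError` as in the examples above.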

Summary by CodeRabbit

  • Refactor

    • Streamlined CIF patching utility for a simpler, more maintainable processing flow.
  • Bug Fixes

    • Improved validation and handling of atomic coordinates and occupancy/B_iso values to reduce malformed output and surface clearer error messages.
  • Other

    • Patched files are saved with a "-patched.cif" suffix.
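The "-patched.cif" naming noted above can be expressed with pathlib. `patched_output_path` is a hypothetical helper for illustration; the script's actual construction of the output name may differ:

```python
from pathlib import Path

def patched_output_path(cif_path: Path) -> Path:
    # "refined.cif" -> "refined-patched.cif", kept in the same directory
    return cif_path.with_name(f"{cif_path.stem}-patched.cif")
```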

Copilot AI review requested due to automatic review settings April 11, 2026 07:12

coderabbitai Bot commented Apr 11, 2026

📝 Walkthrough

Condensed and simplified a CIF-patching script: removed docstrings and type annotations from three core functions, shortened CLI help text, and substantially reduced patching logic—removing metadata copying, coordinate cleanup, and detailed error branches—while adding occupancy/B_iso validation and changing output naming.

Changes

Cohort / File(s): Core Script — scripts/patch_output_cif_files.py

Summary: Condensed the crawl_dir_by_depth definition, simplified parse_args() help text, and replaced the typed main/patch_individual_cif_file signatures with untyped parameters. Major internal simplification of patch_individual_cif_file: removed metadata/sampleworks copying, NaN cleanup, and entity-id and seq-scheme syncing; broadened exception handling and simplified error messages; adjusted residue remapping, atom_site.id sizing, and occupancy and B_iso_or_equiv validation; and changed the output filename to "{stem}-patched.cif". Removed some unused imports and the terminal newline.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • k-chrispens

Poem

🐰 Hopping through lines with a nimble paw,
patches trimmed tidy, no fuss, no flaw.
Occupancy checked, B-factors in view,
out comes a CIF with a fresher hue.
Small, bright changes — a little code awe.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'Add occupancy validation with proper error handling' directly addresses the primary objective of the PR, which is to add occupancy validation and error handling.


Copilot AI left a comment


Pull request overview

This PR updates scripts/patch_output_cif_files.py, primarily aiming to validate atom_site.occupancy values when patching output CIFs, but it also includes broader refactors/removals in the CIF patching workflow.

Changes:

  • Added occupancy numeric + range validation (0.0–1.0) in the CIF patching step.
  • Simplified CLI argument definitions and several error messages/return paths.
  • Refactored parts of the patching logic (including how atom_site.id is assigned) and removed multiple previously-present patch steps.


Comment thread scripts/patch_output_cif_files.py Outdated
Comment on lines 47 to 55:

```diff
 parser.add_argument("--input-dir", required=True)
-parser.add_argument(
-    "--cif-pattern",
-    default="refined.cif",
-    help="Pattern used by fnmatch/glob for cif files to patch, default: 'refined.cif'",
-)
-parser.add_argument(
-    "--rcsb-pattern",
-    default="grid_search_results/(.{4})",
-    help="Regex pattern for rcsb ids in file paths. "
-    "Must have only one group, surrounding the id",
-)
-parser.add_argument(
-    "--depth", type=int, default=4, help="Depth to search the directory tree below input-dir"
-)
+parser.add_argument("--cif-pattern", default="refined.cif")
+parser.add_argument("--rcsb-pattern", default="grid_search_results/(.{4})")
+parser.add_argument("--depth", type=int, default=4)
 parser.add_argument("--grid-search-input-dir", required=True)
 parser.add_argument(
     "--input-pdb-pattern",
     default="{pdb_id}/{pdb_id}_single_001_density_input.cif",
     help="Pattern used by fnmatch/glob for input pdb files. The complete path of the input "
     "pdb must match f'{grid-search-input-dir}/{input-pdb-pattern}'. Defaults to "
     "'{pdb_id}/{pdb_id}_single_001_density_input.cif'",
 )
```

Copilot AI Apr 11, 2026


The CLI args lost their help= descriptions (e.g., --cif-pattern, --rcsb-pattern, --depth, --input-pdb-pattern). This makes the script harder to use/debug and is a regression from the previous interface; please restore the help strings (or at least the ones that document expected formats like the regex group requirement).

Comment on lines 87 to 95:

```diff
 try:
     reference_path = reference_dir / input_pdb_pattern.format(pdb_id=rcsb_id)
     # fetch only downloads the file if it isn't already present.
     rcsb_path = fetch(rcsb_id, format="cif", target_path=str(SAMPLEWORKS_CACHE))

     reference = load_any(reference_path)
-    asym_unit = load_any(cif_file)
-    asym_unit = ensure_atom_array_stack(asym_unit)
+    asym_unit = ensure_atom_array_stack(load_any(cif_file))
 except Exception:
-    msg = f"Unable to read and parse either/both of {reference_path}, {cif_file}. "
-    logger.warning(msg)
-    return msg
+    return f"Error reading files"
```


Copilot AI Apr 11, 2026


The broad except Exception: here discards the exception details and returns a generic message (and the f-string has no interpolation). This makes failures hard to triage and contradicts the PR goal of “clear error messages”. Capture the exception (as e), log it with context (e.g., logger.exception(...) including both paths), and include the relevant details in the returned error string.

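A minimal sketch of the pattern this comment asks for: capture the exception, log it with both paths, and return a detailed message. `read_inputs` and `loader` are illustrative stand-ins, not names from the script:

```python
import logging

logger = logging.getLogger(__name__)

def read_inputs(reference_path, cif_file, loader):
    """Load both inputs, returning (reference, asym_unit, error_message)."""
    try:
        return loader(reference_path), loader(cif_file), None
    except Exception as e:
        # Keep the exception type and text, plus both file paths, so parallel
        # failures are triageable instead of a generic "Error reading files".
        msg = f"Unable to read/parse {reference_path} or {cif_file}: {type(e).__name__}: {e}"
        logger.exception(msg)
        return None, None, msg
```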
```diff
-        msg = f"Chain mismatch while remapping residues for {cif_path} vs {reference_path}"
-        logger.error(msg)
-        return msg
     for cif_key, ref_key in zip(cif_keys, ref_keys):
```

Copilot AI Apr 11, 2026


Residue remapping no longer verifies that chain IDs match between cif_keys and ref_keys (the mapping ignores ref_key[0]). If chain ordering/IDs differ, this will silently assign the wrong residue numbers. Reinstate a chain-id consistency check (and ideally keep zip(..., strict=True) for defensive correctness) before populating mapping.

Suggested change:

```diff
-for cif_key, ref_key in zip(cif_keys, ref_keys):
+for cif_key, ref_key in zip(cif_keys, ref_keys, strict=True):
+    if cif_key[0] != ref_key[0]:
+        return "Residue mapping mismatch"
```

Comment thread scripts/patch_output_cif_files.py Outdated

```diff
 atom_keys = list(zip(asym_unit.chain_id.tolist(), asym_unit.res_id.tolist()))
-asym_unit.res_id = np.array([mapping[k] for k in atom_keys], dtype=asym_unit.res_id.dtype)
+asym_unit.res_id = np.array([mapping[k] for k in atom_keys])
```

Copilot AI Apr 11, 2026


asym_unit.res_id is reassigned without preserving the original dtype. Downstream biotite/atomworks code often expects res_id to keep its integer dtype; use dtype=asym_unit.res_id.dtype (or equivalent) when building the new array to avoid accidental dtype changes.

Suggested change:

```diff
-asym_unit.res_id = np.array([mapping[k] for k in atom_keys])
+asym_unit.res_id = np.array(
+    [mapping[k] for k in atom_keys], dtype=asym_unit.res_id.dtype
+)
```

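A small NumPy example of the dtype drift the suggestion guards against. Here `res_id` is a plain array standing in for the biotite annotation:

```python
import numpy as np

# Building the remapped array from a Python list lets NumPy pick a default
# integer type, which can differ from the annotation's original dtype.
res_id = np.array([1, 2, 3], dtype=np.int32)
mapping = {1: 10, 2: 20, 3: 30}

remapped_default = np.array([mapping[k] for k in res_id.tolist()])
remapped_kept = np.array([mapping[k] for k in res_id.tolist()], dtype=res_id.dtype)
# remapped_kept keeps int32; remapped_default may widen to the platform default.
```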
Comment on lines 112 to 116:

```python
    template = CIFFile.read(rcsb_path)

    # Write sampleworks trial metadata to the CIF file, if we can find it
    cif_data = CIFFile.read(cif_path)
    if "sampleworks" in cif_data.block:
        template.block["sampleworks"] = cif_data.block["sampleworks"]
    elif (metadata_path := cif_path.parent / "job_metadata.json").exists():
        with open(metadata_path, "r") as fp:
            metadata = json.load(fp)
        if metadata is not None:
            add_category_to_cif(template, metadata, "sampleworks")
        else:
            logger.warning(f"Sampleworks metadata file at {metadata_path} is empty")
    else:
        logger.warning(f"No sampleworks metadata found for {cif_path}")

    # remove any atoms with nan coordinates--these seem to come in because we sometimes use parse
    # (from AtomWorks) which creates them. Still, we'll do this here just in case.
    asym_unit = remove_atoms_with_any_nan_coords(asym_unit)

    # make sure entity ids match in atom_site and entity_poly
    if "entity_poly" in template.block:
        ep = template.block["entity_poly"]
        # fixme for now I'm using a hack--if there's one polymer entity, just assert that
        # polymers in atom_site have to be that one. Otherwise do nothing.
        if len(ep["entity_id"]) == 1:
            entity_id = ep["entity_id"].as_item()
            if "label_entity_id" not in asym_unit.get_annotation_categories():
                asym_unit.add_annotation("label_entity_id", int)
            asym_unit.label_entity_id = np.ones_like(asym_unit.label_entity_id) * int(entity_id)
    else:
        logger.warning("No entity_poly block found in template CIF file. Cannot patch entity ids")

    # now set the structure with correct entity ids
    set_structure(template, asym_unit)

    # If there's a pdbx_poly_seq_scheme, make sure the seq nums all agree, as
    # the numbers in our outputs will all agree. We appear to use the one called ndb_seq_num
    nsm = template.block["pdbx_poly_seq_scheme"]["ndb_seq_num"]
    template.block["pdbx_poly_seq_scheme"]["pdb_seq_num"] = nsm
    template.block["pdbx_poly_seq_scheme"]["auth_seq_num"] = nsm
    template.block["atom_site"]["id"] = CIFColumn(np.arange(len(asym_unit)))
```


Copilot AI Apr 11, 2026


template.block["atom_site"]["id"] is set using np.arange(len(asym_unit)), but len(AtomArrayStack) is the number of frames/models, not the number of atoms. This will almost certainly produce an atom_site.id column with the wrong length, breaking CIF consistency and downstream parsing. Generate IDs based on the actual number of atom_site rows after set_structure() (or the atom count), not len(asym_unit).

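The length mismatch can be illustrated with a plain array laid out like an AtomArrayStack's coordinates, `(models, atoms, 3)`. Variable names here are illustrative only:

```python
import numpy as np

coords = np.zeros((2, 150, 3))  # 2 models, 150 atoms each

n_models = coords.shape[0]  # analogous to len(asym_unit) for a stack
n_atoms = coords.shape[1]   # the count atom_site.id should be sized by

wrong_ids = np.arange(n_models)        # only 2 ids -- wrong column length
right_ids = np.arange(1, n_atoms + 1)  # 150 ids, 1-based as in mmCIF
```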
Comment thread scripts/patch_output_cif_files.py Outdated
Comment on lines +134 to +139:

```python
            raise ValueError(f"Invalid occupancy value '{val}'. Must be a number.")

        if not (0.0 <= value <= 1.0):
            raise ValueError(
                f"Invalid occupancy value {value}. Must be between 0.0 and 1.0."
            )
```

Copilot AI Apr 11, 2026


This occupancy validation raises ValueError inside patch_individual_cif_file(), which is executed via joblib.Parallel(). A raised exception will abort the whole parallel run rather than being collected into the results list like other errors. To keep “proper error handling”, catch validation errors and return a descriptive error string (and/or log), instead of raising.

Suggested change:

```diff
-            raise ValueError(f"Invalid occupancy value '{val}'. Must be a number.")
-        if not (0.0 <= value <= 1.0):
-            raise ValueError(
-                f"Invalid occupancy value {value}. Must be between 0.0 and 1.0."
-            )
+            error_message = (
+                f"Failed to patch {cif_path}: invalid occupancy value '{val}'. "
+                "Occupancy must be a number."
+            )
+            logger.error(error_message)
+            return error_message
+        if not (0.0 <= value <= 1.0):
+            error_message = (
+                f"Failed to patch {cif_path}: invalid occupancy value {value}. "
+                "Occupancy must be between 0.0 and 1.0."
+            )
+            logger.error(error_message)
+            return error_message
```

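A sketch of the return-instead-of-raise contract this comment asks for. `patch_one` is a hypothetical stand-in for `patch_individual_cif_file`: it returns `None` on success and an error string on failure, so a parallel map (for example via `joblib.Parallel`) collects failures rather than aborting on the first one:

```python
def patch_one(occupancies, cif_path):
    """Return None on success, or a descriptive error string on failure."""
    for val in occupancies:
        try:
            value = float(val)
        except (TypeError, ValueError):
            return f"Failed to patch {cif_path}: invalid occupancy value '{val}'."
        if not (0.0 <= value <= 1.0):
            return f"Failed to patch {cif_path}: occupancy {value} outside [0.0, 1.0]."
    return None

# A plain loop stands in for the parallel map here; every job completes and
# failures are gathered afterwards instead of killing the run.
jobs = [(["0.5", "1.0"], "a.cif"), (["2"], "b.cif"), (["x"], "c.cif")]
results = [patch_one(occ, path) for occ, path in jobs]
errors = [r for r in results if r is not None]
```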
Comment on lines +117 to 120:

```python

# YOUR FIXED SECTION STARTS HERE

```

Copilot AI Apr 11, 2026


The comments # YOUR FIXED SECTION STARTS HERE/ENDS HERE look like leftover scaffolding and add noise to the script. Please remove these markers (and the surrounding extra blank whitespace) to keep the file maintainable.

Comment on lines 111 to 114 (the same removed metadata/NaN/entity-id block shown above):


Copilot AI Apr 11, 2026


After removing the metadata/NaN-handling logic, the imports json, remove_atoms_with_any_nan_coords, and add_category_to_cif are now unused in this file. Ruff is configured to flag F401 unused imports, so this will fail lint/CI; remove the unused imports or reintroduce the functionality that uses them.

Comment on lines 111 to 116 (the same removed block shown above, including the pdbx_poly_seq_scheme and atom_site.id lines):


Copilot AI Apr 11, 2026


The PR description is focused on adding occupancy validation, but this diff also removes substantial CIF patching behavior (e.g., sampleworks metadata propagation, NaN-coordinate filtering, entity-id patching, and pdbx_poly_seq_scheme adjustments). If those removals are intentional, please call them out explicitly in the PR description and/or split into a separate PR, since they change the script’s output semantics beyond occupancy validation.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/patch_output_cif_files.py`:
- Line 22: The functions crawl_dir_by_depth, main, and patch_individual_cif_file
are missing the required NumPy-style docstrings; add concise NumPy-style
docstrings to each function definition that document the parameters (types and
meanings), return values, and explicitly state side effects (filesystem
traversal, network calls, file writes) and failure behavior (exceptions raised
or exit conditions). For crawl_dir_by_depth describe root_dir (str|Path),
target_pattern (str), n_levels (int), and that it returns list[Path]; for
patch_individual_cif_file describe its input path, mutating file I/O and
possible exceptions; for main describe CLI inputs/flags, what it triggers, and
its exit/failure semantics. Ensure each docstring follows NumPy format with
short summary, Parameters, Returns, and Raises sections.
- Around line 115-145: The atom_site.id column is being sized with
len(asym_unit) (model count) which is incorrect after set_structure() because
atom_site rows equal atoms across all models; replace the sizing logic for
template.block["atom_site"]["id"] (and any subsequent uses like occupancy and
B_iso_or_equiv) to use the actual atom_site row count, e.g.
len(template.block["atom_site"]["Cartn_x"]) or another existing atom_site
column, so that template.block["atom_site"]["id"],
template.block["atom_site"]["occupancy"], and
template.block["atom_site"]["B_iso_or_equiv"] are created with the correct
number of rows after ensure_atom_array_stack()/set_structure().
- Around line 128-139: The occupancy validation currently raises ValueError
inside patch_individual_cif_file (loop over occupancy_values/block "atom_site"
occupancy), which breaks the function's contract of returning None or an error
string; change the exception-raising to return descriptive error strings (e.g.,
"Invalid occupancy value 'X'. Must be a number." and "Invalid occupancy value Y.
Must be between 0.0 and 1.0.") so patch_individual_cif_file returns errors
instead of throwing, preserving how main() and joblib collect and print
failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56d9c04c-6d94-49aa-8ba7-4d6355888516

📥 Commits

Reviewing files that changed from the base of the PR and between 441c81d and 5f567a4.

📒 Files selected for processing (1)
  • scripts/patch_output_cif_files.py

```diff
-    - n_levels = 1: root_dir + its immediate subdirectories
-    - etc.
-    """
 def crawl_dir_by_depth(root_dir: str | Path, target_pattern: str, n_levels: int) -> list[Path]:
```

⚠️ Potential issue | 🟡 Minor

Restore the required NumPy-style docstrings on these changed helpers.

crawl_dir_by_depth(), main(), and patch_individual_cif_file() now hide traversal, network, and filesystem side effects behind undocumented entry points. Please add back concise NumPy-style docstrings so the call contract and failure behavior stay discoverable at the definition site.

As per coding guidelines "Always include NumPy-style docstrings for every function and class."

Also applies to: 59-59, 77-77


@ManjunathByadagi
Author

Thanks for the review! I've addressed the comments and updated the PR. Please let me know if anything else is needed.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/patch_output_cif_files.py (1)

21-21: 🛠️ Refactor suggestion | 🟠 Major

Restore required NumPy-style docstrings on changed functions.

crawl_dir_by_depth(), parse_args(), main(), and patch_individual_cif_file() are missing required NumPy-style docstrings, including side effects/failure behavior.

As per coding guidelines "Always include NumPy-style docstrings for every function and class."

Also applies to: 45-45, 88-88, 107-107


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e791f23-39d0-4fa1-81a4-d03f4f599463

📥 Commits

Reviewing files that changed from the base of the PR and between 5f567a4 and ddd4d6e.

📒 Files selected for processing (1)
  • scripts/patch_output_cif_files.py

Comment on lines +125 to +127:

```python
    except Exception as e:
        logger.exception(f"Error reading {cif_file} or {reference_path}: {e}")
        return str(e)
```

⚠️ Potential issue | 🟡 Minor

Make returned errors consistently actionable (include file context).

Current returns are inconsistent (str(e) and generic strings), which makes parallel failures hard to triage and can silently drop empty exception messages.

Suggested fix:

```diff
-    except Exception as e:
-        logger.exception(f"Error reading {cif_file} or {reference_path}: {e}")
-        return str(e)
+    except Exception as e:
+        msg = f"{cif_path}: Error reading {reference_path} ({type(e).__name__}: {e})"
+        logger.exception(msg)
+        return msg
@@
-        return "Missing residue numbers"
+        return f"{cif_path}: Missing residue numbers"
@@
-        return "Residue mapping mismatch"
+        return f"{cif_path}: Residue mapping mismatch"
@@
-            return "Chain mismatch in residue mapping"
+            return f"{cif_path}: Chain mismatch in residue mapping"
```

Also applies to: 130-130, 136-136, 143-143


@k-chrispens
Collaborator

Hi! I know we haven't really added contribution guidelines yet, but something we plan to require generally is that new contributor PRs are attached to an existing issue (though exceptions will apply for new ModelWrappers, etc.). For example, we haven't included an issue for this because the existing patch script is very hacky and will likely be replaced entirely in the future, so this type of QC adds review burden on us for changes that are likely not to be useful in the long run.

I encourage you to take a look at the existing issues and try out one of those instead. Closing this PR.

@ManjunathByadagi ManjunathByadagi deleted the fix-occupancy-validation branch April 16, 2026 04:42