
code(eval): refactoring eval scripts, primarily unifying arguments #157

Merged: marcuscollins merged 4 commits into main from feat-refactor-eval-scripts on Mar 17, 2026

Conversation

marcuscollins (Collaborator) commented Mar 12, 2026

Refactoring and unifying arguments to evaluation scripts.

#110

Summary by CodeRabbit

  • Refactor

    • Consolidated evaluation setup and standardized argument parsing across evaluation scripts; switched to a centralized parameter-driven workflow and configurable results path.
    • Simplified trial/config loading and removed legacy on-disk scanning logic; added clearer runtime logging.
  • Bug Fixes

    • Improved validation and error handling for bond geometry calculations to better detect and report invalid structures.

coderabbitai Bot (Contributor) commented Mar 12, 2026

Warning

Rate limit exceeded

@marcuscollins has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e578087f-0a75-47b9-b04e-47b0869ed50c

📥 Commits

Reviewing files that changed from the base of the PR and between 9c05966 and 9894f70.

📒 Files selected for processing (5)
  • scripts/eval/bond_geometry_eval.py
  • scripts/eval/lddt_evaluation_script.py
  • scripts/eval/rscc_grid_search_script.py
  • scripts/eval/run_and_process_phenix_clashscore.py
  • src/sampleworks/eval/grid_search_eval_utils.py
📝 Walkthrough

Centralizes evaluation parameter setup via the new setup_evaluation_parameters() and parse_eval_args(), updates the evaluation scripts to use them, and adds an input-validation helper, check_pose_and_get_bounds(), to bond_geometry_eval.py for bond/angle computations.
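
As a rough illustration of the new flow, a consuming script might look like the sketch below. The helper names come from this walkthrough; the exact signatures and the evaluate_trial placeholder are assumptions, not the repository's actual code.

from sampleworks.eval.grid_search_eval_utils import (
    parse_eval_args,
    setup_evaluation_parameters,
)


def evaluate_trial(trial, protein_configs):
    # Hypothetical placeholder: each eval script (LDDT, RSCC, clashscore)
    # supplies its own per-trial logic here.
    return {"trial": str(trial)}


def main() -> None:
    args = parse_eval_args()  # unified CLI shared by all eval scripts
    all_trials, protein_configs = setup_evaluation_parameters(args)
    rows = [evaluate_trial(t, protein_configs) for t in all_trials]
    # Results land under the configurable results path rather than a
    # hard-coded workspace_root as before.
    print(f"writing {len(rows)} result rows to {args.grid_search_results_path}")


if __name__ == "__main__":
    main()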

Changes

Cohort / File(s) and Summary:

  • Core utility: src/sampleworks/eval/grid_search_eval_utils.py
    Adds setup_evaluation_parameters(args), which returns (TrialList, dict[str, ProteinConfig]); renames parse_args() to parse_eval_args(); replaces workspace_root with grid_search_results_path; exposes ProteinConfig in imports/exports.
  • Bond geometry evaluation: scripts/eval/bond_geometry_eval.py
    Adds check_pose_and_get_bounds(pose: AtomArray) to validate that structures are non-empty and have bonds; refactors bond_length_violations and bond_angle_violations to use the helper and unified exception handling (a hedged sketch follows this list).
  • LDDT evaluation: scripts/eval/lddt_evaluation_script.py
    Replaces the parse_args/scan-based setup with parse_eval_args and setup_evaluation_parameters, obtains trials via the setup function, and writes results to args.grid_search_results_path.
  • RSCC evaluation: scripts/eval/rscc_grid_search_script.py
    Switches to parse_eval_args and setup_evaluation_parameters for trial and config loading; removes the manual ProteinConfig scanning and grid_search_dir-derived paths.
  • Clashscore processing: scripts/eval/run_and_process_phenix_clashscore.py
    Replaces local CLI parsing with parse_eval_args, fetches trials via setup_evaluation_parameters, and writes output under args.grid_search_results_path.
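
For the bond-geometry helper above, here is a minimal validation sketch, assuming the biotite AtomArray API; the get_distance_bounds import path is an assumption based on the review comments further down, not confirmed by the PR.

import numpy as np
from biotite.structure import AtomArray


def check_pose_and_get_bounds(pose: AtomArray) -> np.ndarray:
    # Reject structures that cannot support bond/angle computations.
    if pose.array_length() == 0:
        raise ValueError("structure contains no atoms")
    if pose.bonds is None or pose.bonds.get_bond_count() == 0:
        raise ValueError("structure contains no bonds")
    # Per the review comments, the bounds matrix comes from a
    # get_distance_bounds helper (RDKit via peppr); this import path is an
    # assumption.
    from peppr import get_distance_bounds

    return get_distance_bounds(pose)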

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (script)
    participant Setup as grid_search_eval_utils.setup_evaluation_parameters
    participant FS as Filesystem (grid_search results)
    participant Config as ProteinConfig loader
    participant Eval as Evaluation script logic

    CLI->>Setup: parse_eval_args(args)
    Setup->>FS: scan_grid_search_results(args.grid_search_results_path)
    Setup->>Config: ProteinConfig.from_csv(args.protein_configs_csv)
    Setup-->>CLI: (all_trials, protein_configs)
    CLI->>Eval: run evaluations using all_trials + protein_configs
    Eval->>FS: write results to args.grid_search_results_path/*.csv

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • k-chrispens

Poem

🐰
I hopped through args and trials today,
Gathered configs in a tidy array.
One helper to rule the scattered script sea,
Now evaluations run coordinated and free.
Hops and carrots for CI—hip, hip, hooray! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the primary focus of the changeset: refactoring evaluation scripts with unified argument handling across multiple eval scripts and their shared utilities.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
src/sampleworks/eval/grid_search_eval_utils.py (2)

180-188: required=True with default=None is contradictory.

When required=True, argparse will always require the argument, making the default=None unreachable. Consider removing the default for clarity.

🔧 Proposed fix
     parser.add_argument(
         "--grid-search-inputs-path",
         type=Path,
         required=True,
         help="Path to the directory containing the grid search inputs, in particular "
              "the protein configuration CSV file, maps, and reference structures.",
-        default=None,
     )
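
A quick standalone check of this argparse behavior (standard library only; the path value is a dummy):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--grid-search-inputs-path", required=True, default=None)

try:
    parser.parse_args([])  # required flag missing, so argparse exits
except SystemExit:
    print("exited: the flag is required; the default was never consulted")

ns = parser.parse_args(["--grid-search-inputs-path", "/tmp/inputs"])
print(ns.grid_search_inputs_path)  # the supplied value; default unused again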

217-241: Missing NumPy-style docstring for setup_evaluation_parameters.

As per coding guidelines, all functions require NumPy-style docstrings. This new public function should document its parameters and return value.

📝 Proposed docstring
 def setup_evaluation_parameters(
         args: argparse.Namespace
 ) -> tuple[TrialList, dict[str, ProteinConfig]]:
+    """
+    Set up evaluation parameters by loading protein configs and scanning for trials.
+
+    Parameters
+    ----------
+    args : argparse.Namespace
+        Parsed command-line arguments containing grid_search_results_path,
+        grid_search_inputs_path, protein_configs_csv, and target_filename.
+
+    Returns
+    -------
+    tuple[TrialList, dict[str, ProteinConfig]]
+        A tuple containing the list of discovered trials and a dictionary
+        mapping protein names to their configurations.
+
+    Raises
+    ------
+    SystemExit
+        If no experiments are found in the grid search directory.
+    """
     grid_search_dir = Path(args.grid_search_results_path)
scripts/eval/bond_geometry_eval.py (2)

42-44: Remove unused exception variable e.

The exception is caught but never used. Replace with underscore to indicate intentional discard.

🔧 Proposed fix
     try:
         bounds = check_pose_and_get_bounds(pose)
-    except (ValueError, BadStructureError) as e:
+    except (ValueError, BadStructureError):
         return np.nan, pd.DataFrame()

90-103: Missing NumPy-style docstring for check_pose_and_get_bounds.

As per coding guidelines, all functions require NumPy-style docstrings.

📝 Proposed docstring
 def check_pose_and_get_bounds(pose: AtomArray):
+    """
+    Validate structure and retrieve distance bounds.
+
+    Parameters
+    ----------
+    pose : AtomArray
+        The structure to validate and compute bounds for.
+
+    Returns
+    -------
+    np.ndarray
+        Distance bounds matrix from RDKit via peppr.
+
+    Raises
+    ------
+    ValueError
+        If the structure is empty or has no bonds.
+    BadStructureError
+        If RDKit cannot compute bounds for the structure.
+    """
     if pose.array_length() == 0:
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/eval/lddt_evaluation_script.py`:
- Line 197: The return value of setup_evaluation_parameters(args) is assigned to all_experiments, but a later loop iterates over all_trials, which is undefined and raises a NameError. Use one identifier consistently: either rename the assignment to all_trials or update the loop to iterate over all_experiments, and confirm that setup_evaluation_parameters returns the expected (all_trials, protein_configs) tuple at every call site.

In `@scripts/eval/rscc_grid_search_script.py`:
- Line 272: rscc_results.csv is written to the wrong location. In the df.to_csv(...) call, replace args.grid_search_inputs_path / "rscc_results.csv" with args.grid_search_results_path / "rscc_results.csv" so results are saved alongside the other evaluation outputs.
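
A self-contained sketch of both fixes under stated assumptions: the Namespace, trial list, and DataFrame values below are dummies standing in for the repository's parse_eval_args() output, TrialList, and real results.

import argparse
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the objects the real scripts build.
args = argparse.Namespace(grid_search_results_path=Path("/tmp/results"))
all_trials = ["trial_a", "trial_b"]  # dummy trials, not the repo's TrialList

# Fix 1 (lddt_evaluation_script.py, line 197): iterate over the same name
# that received the setup function's return value.
for trial in all_trials:
    print(f"evaluating {trial}")

# Fix 2 (rscc_grid_search_script.py, line 272): write results under the
# results path, not the inputs path. The rscc values are dummies.
df = pd.DataFrame({"trial": all_trials, "rscc": [0.0, 0.0]})
args.grid_search_results_path.mkdir(parents=True, exist_ok=True)
df.to_csv(args.grid_search_results_path / "rscc_results.csv", index=False)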


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b386793d-a15d-4605-a2a4-a9cd18b74e8d

📥 Commits

Reviewing files that changed from the base of the PR and between 7495131 and dc8763a.

📒 Files selected for processing (5)
  • scripts/eval/bond_geometry_eval.py
  • scripts/eval/lddt_evaluation_script.py
  • scripts/eval/rscc_grid_search_script.py
  • scripts/eval/run_and_process_phenix_clashscore.py
  • src/sampleworks/eval/grid_search_eval_utils.py

Comment thread: scripts/eval/lddt_evaluation_script.py (outdated)
Comment thread: scripts/eval/rscc_grid_search_script.py (outdated)
Quoted diff context:

grid_search_dir = workspace_root / "grid_search_results"
all_trials = scan_grid_search_results(grid_search_dir, target_filename=args.target_filename)
logger.info(f"Found {len(all_trials)} trials with {args.target_filename} files")
# The dropped variable is a list of ProteinConfigs, not used yet in this script
A collaborator commented:

should we reference #97 ?

marcuscollins (Collaborator, Author) replied:

No, this is separate. This doesn't use the ProteinConfigs because it analyzes the whole protein, and doesn't break out individual selections. If we think it would be useful to look at clashes for individual selections I believe that's possible, but should be a separate issue.

k-chrispens (Collaborator) left a comment:

Looks good, but coderabbit caught a few things w.r.t. variable naming and output directory naming. Indicates further need for integration tests on this 😅

Comment thread: scripts/eval/rscc_grid_search_script.py (outdated)
Comment thread: scripts/eval/lddt_evaluation_script.py (outdated)
k-chrispens (Collaborator) left a comment:

LGTM

marcuscollins force-pushed the feat-refactor-eval-scripts branch from 9c05966 to 9894f70 on March 17, 2026 at 19:28
marcuscollins merged commit 6004513 into main on Mar 17, 2026; 1 of 4 checks passed.
marcuscollins deleted the feat-refactor-eval-scripts branch on March 17, 2026 at 19:28.
