code(eval): refactoring eval scripts, primarily unifying arguments #157

marcuscollins merged 4 commits into main
Conversation
📝 Walkthrough

Centralizes evaluation parameter setup via a new setup_evaluation_parameters() and parse_eval_args(), updates the evaluation scripts to use these, and adds the input-validation helper check_pose_and_get_bounds() in bond_geometry_eval.py for bond/angle computations.
Sequence Diagram(s)

sequenceDiagram
participant CLI as CLI (script)
participant Setup as grid_search_eval_utils.setup_evaluation_parameters
participant FS as Filesystem (grid_search results)
participant Config as ProteinConfig loader
participant Eval as Evaluation script logic
CLI->>Setup: parse_eval_args(args)
Setup->>FS: scan_grid_search_results(args.grid_search_results_path)
Setup->>Config: ProteinConfig.from_csv(args.protein_configs_csv)
Setup-->>CLI: (all_trials, protein_configs)
CLI->>Eval: run evaluations using all_trials + protein_configs
Eval->>FS: write results to args.grid_search_results_path/*.csv
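For orientation, here is a minimal sketch of how an evaluation script might consume the shared helpers described in the walkthrough and diagram above. The module path follows the file under review (src/sampleworks/eval/grid_search_eval_utils.py), but the exact signatures, the trial attribute names, and the output filename are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical usage sketch -- parse_eval_args() and setup_evaluation_parameters()
# are named in the walkthrough; their exact signatures may differ in the PR.
from pathlib import Path

import pandas as pd

from sampleworks.eval.grid_search_eval_utils import (
    parse_eval_args,
    setup_evaluation_parameters,
)


def main() -> None:
    # Unified CLI arguments shared by all eval scripts
    # (--grid-search-results-path, --grid-search-inputs-path, ...).
    args = parse_eval_args()

    # Discover trials and load protein configurations in one place instead of
    # each script re-implementing the scan/load logic.
    all_trials, protein_configs = setup_evaluation_parameters(args)

    rows = []
    for trial in all_trials:
        config = protein_configs.get(trial.protein_name)  # attribute name assumed
        rows.append({"trial": str(trial), "has_config": config is not None})

    # Per the review below, outputs belong under the results path, not the
    # inputs path; the filename here is purely illustrative.
    out_path = Path(args.grid_search_results_path) / "example_eval_results.csv"
    pd.DataFrame(rows).to_csv(out_path, index=False)


if __name__ == "__main__":
    main()
```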
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/sampleworks/eval/grid_search_eval_utils.py (2)
180-188: required=True with default=None is contradictory.

When required=True, argparse will always require the argument, making the default=None unreachable. Consider removing the default for clarity.

🔧 Proposed fix
 parser.add_argument(
     "--grid-search-inputs-path",
     type=Path,
     required=True,
     help="Path to the directory containing the grid search inputs, in particular "
     "the protein configuration CSV file, maps, and reference structures.",
-    default=None,
 )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/eval/grid_search_eval_utils.py` around lines 180 - 188, The parser.add_argument call for "--grid-search-inputs-path" currently sets required=True while also specifying default=None, which is contradictory; remove the default=None (or set required=False if you want a fallback) so the argparse behavior is consistent—update the parser.add_argument invocation for "--grid-search-inputs-path" (in the grid_search_eval_utils parser setup) to drop the default parameter when keeping required=True.
217-241: Missing NumPy-style docstring for setup_evaluation_parameters.

As per coding guidelines, all functions require NumPy-style docstrings. This new public function should document its parameters and return value.
📝 Proposed docstring
 def setup_evaluation_parameters(
     args: argparse.Namespace
 ) -> tuple[TrialList, dict[str, ProteinConfig]]:
+    """
+    Set up evaluation parameters by loading protein configs and scanning for trials.
+
+    Parameters
+    ----------
+    args : argparse.Namespace
+        Parsed command-line arguments containing grid_search_results_path,
+        grid_search_inputs_path, protein_configs_csv, and target_filename.
+
+    Returns
+    -------
+    tuple[TrialList, dict[str, ProteinConfig]]
+        A tuple containing the list of discovered trials and a dictionary
+        mapping protein names to their configurations.
+
+    Raises
+    ------
+    SystemExit
+        If no experiments are found in the grid search directory.
+    """
     grid_search_dir = Path(args.grid_search_results_path)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/eval/grid_search_eval_utils.py` around lines 217 - 241, Add a NumPy-style docstring to the public function setup_evaluation_parameters that documents the parameters (args: argparse.Namespace with expected attributes grid_search_results_path, grid_search_inputs_path, protein_configs_csv, target_filename) and the return value (tuple[TrialList, dict[str, ProteinConfig]]), briefly describing what the function does (loads protein configs, scans grid search results, summarizes trials, and may exit on no trials). Place the docstring immediately below the def setup_evaluation_parameters(...) line and include sections: Parameters, Returns, and Raises/System exit behavior.

scripts/eval/bond_geometry_eval.py (2)
42-44: Remove unused exception variable e.

The exception is caught but never used. Replace it with an underscore, or drop the binding, to indicate an intentional discard.
🔧 Proposed fix
     try:
         bounds = check_pose_and_get_bounds(pose)
-    except (ValueError, BadStructureError) as e:
+    except (ValueError, BadStructureError):
         return np.nan, pd.DataFrame()
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/bond_geometry_eval.py` around lines 42 - 44, The except block catching ValueError and BadStructureError declares an unused exception variable `e`; update the handler in the try/except around the call to check_pose_and_get_bounds(pose) to discard the exception by replacing `except (ValueError, BadStructureError) as e:` with either `except (ValueError, BadStructureError) as _:` or simply `except (ValueError, BadStructureError):` so the unused variable is removed.
90-103: Missing NumPy-style docstring for check_pose_and_get_bounds.

As per coding guidelines, all functions require NumPy-style docstrings.
📝 Proposed docstring
 def check_pose_and_get_bounds(pose: AtomArray):
+    """
+    Validate structure and retrieve distance bounds.
+
+    Parameters
+    ----------
+    pose : AtomArray
+        The structure to validate and compute bounds for.
+
+    Returns
+    -------
+    np.ndarray
+        Distance bounds matrix from RDKit via peppr.
+
+    Raises
+    ------
+    ValueError
+        If the structure is empty or has no bonds.
+    BadStructureError
+        If RDKit cannot compute bounds for the structure.
+    """
     if pose.array_length() == 0:

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval/bond_geometry_eval.py` around lines 90 - 103, The function check_pose_and_get_bounds is missing a NumPy-style docstring; add a docstring immediately above the function definition that briefly describes what the function does, documents the pose parameter (type AtomArray), the return value (bounds from get_distance_bounds and its type), and the exceptions it may raise (ValueError for empty structure or missing bonds and any errors propagated from get_distance_bounds, e.g., BadStructureError). Use standard NumPy-style sections: Parameters, Returns, Raises, and a short one-line summary followed by a short description, referencing check_pose_and_get_bounds and get_distance_bounds so readers know where bounds come from.
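Based only on the review context above (the empty-structure check, the bonds requirement, and the BadStructureError propagated from get_distance_bounds), a hypothetical reconstruction of the validation helper might look like the sketch below. The actual implementation in bond_geometry_eval.py may differ, and the import location of get_distance_bounds is an assumption.

```python
# Hypothetical reconstruction of check_pose_and_get_bounds(); the real code in
# scripts/eval/bond_geometry_eval.py may differ.
import numpy as np
from biotite.structure import AtomArray
from peppr import get_distance_bounds  # assumed location, per "RDKit via peppr"


def check_pose_and_get_bounds(pose: AtomArray) -> np.ndarray:
    """Validate the structure and return its distance bounds matrix."""
    # Reject empty structures outright.
    if pose.array_length() == 0:
        raise ValueError("Structure is empty")
    # Bond information is required to derive geometry bounds.
    if pose.bonds is None or pose.bonds.get_bond_count() == 0:
        raise ValueError("Structure has no bonds")
    # May raise BadStructureError if RDKit cannot process the structure;
    # callers (e.g. the try/except shown in the proposed fix above) handle it.
    return get_distance_bounds(pose)
```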
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/eval/lddt_evaluation_script.py`:
- Line 197: The code assigns the return of setup_evaluation_parameters(args) to
all_experiments but later iterates over all_trials (which is undefined), causing
a NameError; change the variable usage to be consistent by either renaming the
assignment to all_trials or updating the loop to iterate over all_experiments
and ensure setup_evaluation_parameters returns the expected tuple (all_trials,
protein_configs) if needed—verify and update the call sites that use
all_experiments/all_trials (e.g., the assignment line with
setup_evaluation_parameters and the loop at line where all_trials is used) so
the same identifier is used everywhere.
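A minimal sketch of the naming fix described in this comment; it assumes the imports and `args` from the surrounding script, and the script-specific lDDT logic is omitted.

```python
# Before: the return value was bound to `all_experiments`, while the loop
# below iterated over `all_trials`, raising a NameError at runtime.
# After: use one identifier consistently.
all_trials, protein_configs = setup_evaluation_parameters(args)

for trial in all_trials:
    ...  # per-trial lDDT evaluation (not shown)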
In `@scripts/eval/rscc_grid_search_script.py`:
- Line 272: The script writes rscc_results.csv to the wrong path: change the
df.to_csv call that currently uses args.grid_search_inputs_path to use
args.grid_search_results_path instead so results are saved alongside other
evaluation outputs; locate the df.to_csv(...) invocation in
rscc_grid_search_script.py and replace the path argument from
args.grid_search_inputs_path / "rscc_results.csv" to
args.grid_search_results_path / "rscc_results.csv".
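And a sketch of the output-path correction; `df` stands in for the assembled RSCC results DataFrame, which is assumed here.

```python
# Before: df.to_csv(args.grid_search_inputs_path / "rscc_results.csv")
# After: write the results alongside the other evaluation outputs.
df.to_csv(args.grid_search_results_path / "rscc_results.csv")
```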
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b386793d-a15d-4605-a2a4-a9cd18b74e8d
📒 Files selected for processing (5)
scripts/eval/bond_geometry_eval.py
scripts/eval/lddt_evaluation_script.py
scripts/eval/rscc_grid_search_script.py
scripts/eval/run_and_process_phenix_clashscore.py
src/sampleworks/eval/grid_search_eval_utils.py
grid_search_dir = workspace_root / "grid_search_results"
all_trials = scan_grid_search_results(grid_search_dir, target_filename=args.target_filename)
logger.info(f"Found {len(all_trials)} trials with {args.target_filename} files")
# The dropped variable is a list of ProteinConfigs, not used yet in this script
No, this is separate. This doesn't use the ProteinConfigs because it analyzes the whole protein, and doesn't break out individual selections. If we think it would be useful to look at clashes for individual selections I believe that's possible, but should be a separate issue.
k-chrispens left a comment:
Looks good, but coderabbit caught a few things w.r.t. variable naming and output directory naming. Indicates further need for integration tests on this 😅
# Conflicts:
#   scripts/eval/rscc_grid_search_script.py

…xperiments->trials in eval setup methods

Force-pushed from 9c05966 to 9894f70
Refactoring and unifying arguments to evaluation scripts.
#110
Summary by CodeRabbit
Refactor
Bug Fixes