Skip to content

refactor: centralize input validations#13

Merged
dhalmazna merged 3 commits intomasterfrom
refactor/boundary-validations
Apr 16, 2026
Merged

refactor: centralize input validations#13
dhalmazna merged 3 commits intomasterfrom
refactor/boundary-validations

Conversation

@dhalmazna
Copy link
Copy Markdown
Collaborator

@dhalmazna dhalmazna commented Apr 3, 2026

Context:
This PR refactors input validations across the codebase to follow the "Boundary vs. Core" architectural principle. It ensures we fail-fast on invalid user inputs at the entrypoint, while removing redundant parameter checks deep inside internal algorithms.

What's Changed:

  • CIAOExplainer.explain (Boundary): Added strict input validations at the very beginning of the method (checking file existence, positive integers, device matches, and valid tensor shapes).
  • SearchContext (State): Implemented __post_init__ to guarantee the immutable context state is always mathematically valid before any search begins.
  • Internal Algorithms (Core): Removed redundant pass-through validations from builder.py, region.py, and lookahead.py to keep the core logic clean and DRY.

Related Task:
XAI-29

Summary by CodeRabbit

  • Refactor
    • Input validation has been reorganized across modules to verify image paths, predictor configurations, batch settings, and algorithm parameters at initialization and entry points. This enables earlier detection of configuration errors with consistent and clearer error reporting throughout the processing pipeline.

@dhalmazna dhalmazna self-assigned this Apr 3, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Warning

Rate limit exceeded

@dhalmazna has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 3 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 3 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0608f9ad-c1d8-4175-8d1c-f0a6ce32ba57

📥 Commits

Reviewing files that changed from the base of the PR and between bf9df94 and 1b21e83.

📒 Files selected for processing (2)
  • ciao/algorithm/lookahead.py
  • ciao/explainer/ciao_explainer.py
📝 Walkthrough

Walkthrough

The PR restructures input validation by moving checks from utility functions to dataclass __post_init__ hooks and the public-facing CIAOExplainer.explain method, while removing intermediate validation layers for a cleaner separation of concerns.

Changes

Cohort / File(s) Summary
Dataclass Initialization Validation
ciao/algorithm/context.py, ciao/algorithm/graph.py
Added __post_init__ methods to validate SearchContext and ImageGraph fields at construction time (desired_length, batch_size, optimization_sign, seed_idx, used_segments, and adj_list).
Public API Enhancement
ciao/explainer/ciao_explainer.py
Enhanced CIAOExplainer.explain with comprehensive upfront validations: file existence, predictor type, positive integers (max_hyperpixels, desired_length, batch_size), target_class_idx bounds, input/replacement image shapes, and device consistency. Removed fallback string generation for out-of-bounds class indices.
Utility Function Cleanup
ciao/algorithm/lookahead.py, ciao/scoring/hyperpixel.py, ciao/scoring/segments.py
Removed redundant input validation from build_hyperpixel_greedy_lookahead, calculate_hyperpixel_deltas (via removal of _validate_hyperpixel_inputs helper), select_top_hyperpixels, and compute_segment_deltas to eliminate duplication.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • vojtech-kur
  • vejtek

Poem

🐰 Validation bounds, once scattered wide,
Now gather at the gates with pride,
Dataclasses guard their sacred start,
The API leads with careful heart,
Clean concerns, no checks repeated—
The rabbit's task is now completed! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor: centralize input validations' directly and concisely summarizes the main architectural change: moving input validations from scattered locations to centralized boundary entry points.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/boundary-validations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors validation logic by centralizing boundary checks in dataclass __post_init__ methods and the main explain entry point, while removing redundant checks from internal functions. Review feedback identifies a remaining redundant validation in calculate_hyperpixel_deltas and recommends restoring a safety check for class name lookups to prevent potential IndexError when handling auto-detected class indices.

Comment thread ciao/scoring/hyperpixel.py Outdated
Comment thread ciao/explainer/ciao_explainer.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ciao/scoring/hyperpixel.py (1)

133-141: ⚠️ Potential issue | 🟠 Major

Add validation to reject negative max_hyperpixels.

Python slicing with negative integers has special semantics ([:−1] returns "all but last element"). Without explicit validation, passing -1 silently returns an incorrect subset instead of failing. Add a guard before slicing:

if max_hyperpixels < 0:
    raise ValueError(
        f"max_hyperpixels must be non-negative, got {max_hyperpixels}"
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ciao/scoring/hyperpixel.py` around lines 133 - 141, The function
select_top_hyperpixels currently slices sorted results and will behave
incorrectly for negative max_hyperpixels; add an explicit guard at the start of
select_top_hyperpixels that checks if max_hyperpixels < 0 and raises
ValueError(f"max_hyperpixels must be non-negative, got {max_hyperpixels}"),
ensuring invalid negative inputs fail fast before the sorted(...)[
:max_hyperpixels] slice is applied.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ciao/algorithm/context.py`:
- Around line 24-38: SearchContext.__post_init__ currently only checks value
ranges but allows wrong types (e.g., floats or bools); add explicit type checks
using isinstance for desired_length, batch_size, seed_idx and optimization_sign
(and ensure optimization_sign is an int but not a bool) and raise TypeError with
a clear message when types are wrong, then keep the existing range checks
(desired_length >=1, batch_size >0, optimization_sign in (1, -1), and 0 <=
seed_idx < self.image_graph.num_segments) and the existing duplicate check
against self.used_segments to prevent runtime TypeErrors later.

In `@ciao/explainer/ciao_explainer.py`:
- Around line 83-95: Add explicit type validation at the public API boundary:
check that max_hyperpixels, desired_length, and batch_size are instances of int
(use isinstance(..., int)) before the existing positive-value checks and raise a
TypeError with a clear message if not; likewise, if target_class_idx is not None
validate it is an int before checking bounds against predictor.class_names and
raise TypeError for non-int inputs. Ensure messages include the parameter name
and the received type/value so downstream code (e.g., range(max_hyperpixels) in
algorithm/builder.py) cannot receive floats or other invalid types.
- Around line 90-96: The code currently only validates caller-supplied
target_class_idx against predictor.class_names but misses validating the
resolved index when target_class_idx is None; after the resolution call to
predictor.get_predicted_class(input_batch) (the place that computes
argmax(outputs, dim=1)), add a bounds check that ensures the resulting
target_class_idx is within 0..len(predictor.class_names)-1 and raise a
ValueError if not, and also ensure predictor.class_names is non-empty (raise a
clear error if len(class_names)==0) before any indexing (used later when
accessing class_names[target_class_idx]).

In `@ciao/scoring/hyperpixel.py`:
- Around line 97-102: Validate the batch_size parameter before it's used in the
range(0, num_masks, batch_size) loop: ensure batch_size is an int and > 0,
otherwise raise a ValueError with a clear message (e.g., "batch_size must be a
positive integer"). Add this check near the start of the function (before the
loop that iterates using range with batch_size) alongside the existing
segment_sets validation so floats or non-positive values fail fast.

---

Outside diff comments:
In `@ciao/scoring/hyperpixel.py`:
- Around line 133-141: The function select_top_hyperpixels currently slices
sorted results and will behave incorrectly for negative max_hyperpixels; add an
explicit guard at the start of select_top_hyperpixels that checks if
max_hyperpixels < 0 and raises ValueError(f"max_hyperpixels must be
non-negative, got {max_hyperpixels}"), ensuring invalid negative inputs fail
fast before the sorted(...)[ :max_hyperpixels] slice is applied.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41b01c08-2351-412a-8f2c-dfe992ba0a3e

📥 Commits

Reviewing files that changed from the base of the PR and between 8752ffa and 2709a40.

📒 Files selected for processing (6)
  • ciao/algorithm/context.py
  • ciao/algorithm/graph.py
  • ciao/algorithm/lookahead.py
  • ciao/explainer/ciao_explainer.py
  • ciao/scoring/hyperpixel.py
  • ciao/scoring/segments.py
💤 Files with no reviewable changes (2)
  • ciao/algorithm/lookahead.py
  • ciao/scoring/segments.py

Comment thread ciao/algorithm/context.py
Comment thread ciao/explainer/ciao_explainer.py
Comment thread ciao/explainer/ciao_explainer.py
Comment thread ciao/scoring/hyperpixel.py Outdated
@dhalmazna dhalmazna force-pushed the refactor/boundary-validations branch 5 times, most recently from d0f666e to d0592d6 Compare April 8, 2026 16:11
@dhalmazna dhalmazna force-pushed the refactor/callable-strategies branch from d168c1a to 0451a3f Compare April 12, 2026 11:56
@dhalmazna dhalmazna force-pushed the refactor/boundary-validations branch from d0592d6 to b5ca1e5 Compare April 12, 2026 11:56
@dhalmazna dhalmazna force-pushed the refactor/callable-strategies branch from 0451a3f to 7869eb4 Compare April 13, 2026 04:59
@dhalmazna dhalmazna force-pushed the refactor/boundary-validations branch from b5ca1e5 to b1c0118 Compare April 13, 2026 04:59
@dhalmazna dhalmazna force-pushed the refactor/callable-strategies branch from 7869eb4 to fdc538b Compare April 13, 2026 06:51
@dhalmazna dhalmazna force-pushed the refactor/boundary-validations branch from b1c0118 to 5faee77 Compare April 13, 2026 06:51
@dhalmazna dhalmazna marked this pull request as ready for review April 13, 2026 14:09
Copilot AI review requested due to automatic review settings April 13, 2026 14:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors validation responsibilities to align with a Boundary vs. Core architecture: fail fast at the explainer entrypoint and enforce invariants on immutable context/state objects, while removing redundant checks from deeper algorithm/scoring helpers.

Changes:

  • Added strict boundary validations in CIAOExplainer.explain (path existence, parameter bounds, class index bounds, tensor/device/shape checks).
  • Enforced invariant validation via __post_init__ in SearchContext and ImageGraph.
  • Removed redundant/internal validations from scoring/algorithm helpers (e.g., hyperpixel and lookahead).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
ciao/explainer/ciao_explainer.py Adds centralized boundary validations and simplifies class name handling.
ciao/algorithm/context.py Adds SearchContext.__post_init__ to guarantee context validity.
ciao/algorithm/graph.py Adds ImageGraph.__post_init__ to prevent empty graphs.
ciao/algorithm/lookahead.py Removes local parameter validations from the lookahead builder.
ciao/scoring/hyperpixel.py Removes helper-level input validation and relies on upstream checks.
ciao/scoring/segments.py Removes redundant empty-graph validation (now enforced by ImageGraph).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ciao/explainer/ciao_explainer.py Outdated
Comment thread ciao/algorithm/lookahead.py
@dhalmazna dhalmazna force-pushed the refactor/callable-strategies branch from fdc538b to 7332b88 Compare April 15, 2026 10:07
@dhalmazna dhalmazna force-pushed the refactor/boundary-validations branch from 5faee77 to 7c8e95e Compare April 15, 2026 10:07
Base automatically changed from refactor/callable-strategies to master April 15, 2026 12:39
@dhalmazna dhalmazna requested a review from a team April 15, 2026 12:39
@dhalmazna dhalmazna force-pushed the refactor/boundary-validations branch from 7c8e95e to bf9df94 Compare April 15, 2026 12:43
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ciao/algorithm/graph.py (1)

17-20: Strengthen ImageGraph invariants beyond non-empty adj_list.

This validates emptiness, but traversal assumes stronger consistency (segment IDs and neighbor IDs must be within adjacency bounds). Consider enforcing those invariants here to fail fast on malformed graphs instead of failing later during indexing.

Suggested hardening in __post_init__
 def __post_init__(self) -> None:
     if not self.adj_list:
         raise ValueError("ImageGraph cannot be empty (adj_list is empty).")
+    n = len(self.adj_list)
+
+    if self.segments.numel() == 0:
+        raise ValueError("ImageGraph cannot be empty (segments is empty).")
+
+    max_segment_id = int(self.segments.max().item())
+    if max_segment_id >= n:
+        raise ValueError(
+            f"segments contain id {max_segment_id}, but adj_list has only {n} nodes."
+        )
+
+    for node_id, neighbors in enumerate(self.adj_list):
+        if any(neighbor_id < 0 or neighbor_id >= n for neighbor_id in neighbors):
+            raise ValueError(
+                f"adj_list[{node_id}] contains out-of-range neighbor id(s)."
+            )

Based on learnings: in this codebase, keep __post_init__ runtime checks focused on domain-logic validation (valid ranges/values), not explicit isinstance() type enforcement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ciao/algorithm/graph.py` around lines 17 - 20, The __post_init__ for
ImageGraph currently only checks adj_list non-emptiness but should also validate
adjacency consistency: in ImageGraph.__post_init__, iterate each segment index i
and ensure its neighbor IDs in adj_list[i] are integers within
0..len(self.adj_list)-1 (no negatives, no out-of-range IDs) and optionally
ensure neighbors refer to existing segments (i.e., membership in bounds); raise
ValueError with a clear message identifying the offending segment/index when any
invariant fails so malformed graphs fail fast during construction instead of
later in traversal or indexing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ciao/algorithm/graph.py`:
- Around line 17-20: The __post_init__ for ImageGraph currently only checks
adj_list non-emptiness but should also validate adjacency consistency: in
ImageGraph.__post_init__, iterate each segment index i and ensure its neighbor
IDs in adj_list[i] are integers within 0..len(self.adj_list)-1 (no negatives, no
out-of-range IDs) and optionally ensure neighbors refer to existing segments
(i.e., membership in bounds); raise ValueError with a clear message identifying
the offending segment/index when any invariant fails so malformed graphs fail
fast during construction instead of later in traversal or indexing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f2e7a46-4eac-465d-82f1-37f6ca36bada

📥 Commits

Reviewing files that changed from the base of the PR and between 2709a40 and bf9df94.

📒 Files selected for processing (6)
  • ciao/algorithm/context.py
  • ciao/algorithm/graph.py
  • ciao/algorithm/lookahead.py
  • ciao/explainer/ciao_explainer.py
  • ciao/scoring/hyperpixel.py
  • ciao/scoring/segments.py
💤 Files with no reviewable changes (3)
  • ciao/scoring/segments.py
  • ciao/algorithm/lookahead.py
  • ciao/scoring/hyperpixel.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • ciao/algorithm/context.py
  • ciao/explainer/ciao_explainer.py

@dhalmazna dhalmazna merged commit d4f7d6f into master Apr 16, 2026
3 checks passed
@dhalmazna dhalmazna deleted the refactor/boundary-validations branch April 16, 2026 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants