Skip to content

refactor(sdk): simplify pass on enhancements layer (main)#42

Merged
Gradata merged 6 commits into
mainfrom
refactor/simplify-sdk-enhancements
Apr 15, 2026
Merged

refactor(sdk): simplify pass on enhancements layer (main)#42
Gradata merged 6 commits into
mainfrom
refactor/simplify-sdk-enhancements

Conversation

@Gradata
Copy link
Copy Markdown
Owner

@Gradata Gradata commented Apr 14, 2026

Summary

Targeted simplification pass over 10 in-scope sdk/src/gradata/enhancements/ modules. Net -157 LOC (162 insertions, 319 deletions). No behavior change.

Out of scope (untouched — in-flight PRs)

  • behavioral_extractor.py, meta_rules.py, pubsub_pipeline.py, self_improvement.py, rule_evolution.py

What changed

  • llm_provider: extract shared _openai_complete(); remove duplicate openai-SDK boilerplate between OpenAIProvider and GenericHTTPProvider.
  • diff_engine: collapse _extract_changed_sections + _compute_summary_stats into _analyze_line_opcodes() — builds SequenceMatcher once per diff instead of twice.
  • git_backfill: inline _git() helper replaces six copies of subprocess.run(...) invocations; drop redundant except (TimeoutExpired, Exception) and TOCTOU existence check.
  • contradiction_detector: promote inlined stopwords to module-level frozenset; fold three near-identical _check_* functions through a shared _check_pair_list() helper.
  • quality_monitoring: extract _compile_anti_pattern() used by both __init__ and add_pattern.
  • rule_canary: add _now_iso() helper; drop three duplicated datetime-import blocks.
  • rule_integrity: replace try/finally connection close with with sqlite3.connect(...) context manager in three places.
  • carl: simplify has_blocking_violations() — filter already guarantees MUST-only results, so bool() suffices.
  • pattern_extractor: drop dead used.update(cluster) after inner loop already adds all indices.
  • pattern_integration: replace except (ValueError, Exception) with except Exception.

Test plan

  • pytest tests/ — 2070 passed, 23 skipped (same as baseline)
  • ruff check src/gradata/enhancements/ — all checks passed
  • pyright src/gradata/enhancements/ — 0 errors (only pre-existing unrelated warning)

Generated with Gradata

Targeted cleanups across 10 enhancement modules (-157 LOC net).
No behavior change; 2070 tests pass; ruff + pyright clean.

- llm_provider: extract shared _openai_complete() helper; remove
  duplicate openai-SDK boilerplate between OpenAIProvider and
  GenericHTTPProvider.
- diff_engine: collapse _extract_changed_sections + _compute_summary_stats
  into a single _analyze_line_opcodes() — builds SequenceMatcher once
  per diff instead of twice.
- git_backfill: inline _git() helper replaces six copies of
  subprocess.run(...) invocations; drop redundant
  `except (TimeoutExpired, Exception)` and TOCTOU existence check.
- contradiction_detector: promote inlined stopwords to module-level
  frozenset; fold three near-identical _check_* functions through a
  shared _check_pair_list() helper.
- quality_monitoring: extract _compile_anti_pattern() used by both
  __init__ and add_pattern.
- rule_canary: add _now_iso() helper; drop three duplicated
  `from datetime import datetime; now = datetime.now(UTC).isoformat()`
  blocks.
- rule_integrity: use `with sqlite3.connect(...)` context manager
  instead of try/finally close pattern in three places.
- carl: simplify has_blocking_violations() — filter already guarantees
  MUST-only results, so bool() suffices.
- pattern_extractor: drop dead `used.update(cluster)` after inner loop
  already added all indices.
- pattern_integration: replace `except (ValueError, Exception)` (Exception
  already catches ValueError) with `except Exception`.

Co-Authored-By: Gradata <noreply@gradata.ai>
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

Warning

Rate limit exceeded

@Gradata has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 38 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 0 minutes and 38 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e91f8b40-1b60-415c-8718-b6a1bb73795c

📥 Commits

Reviewing files that changed from the base of the PR and between b77ff61 and fe600ad.

📒 Files selected for processing (1)
  • src/gradata/enhancements/contradiction_detector.py
📝 Walkthrough

Walkthrough

A set of internal refactors across enhancements: shared helpers extracted, duplicate logic consolidated, diff and git traversal optimized, pattern clustering behavior changed, database access made context-driven and context-managed, and minor control-flow simplifications applied.

Changes

Cohort / File(s) Summary
Helper extraction & consolidation
src/gradata/enhancements/contradiction_detector.py, src/gradata/enhancements/llm_provider.py, src/gradata/enhancements/quality_monitoring.py, src/gradata/enhancements/rule_canary.py
Moved repeated logic into module-level helpers (_STOPWORDS, _check_pair_list, _openai_complete, _compile_anti_pattern, _now_iso) and updated callers to delegate, reducing inline duplication.
Diff opcode optimization
src/gradata/enhancements/diff_engine.py
Replaced two-pass opcode processing with _analyze_line_opcodes that walks difflib opcodes once to produce changed sections and summary stats.
Git subprocess centralization & backfill
src/gradata/enhancements/git_backfill.py
Added _GitResult dataclass and _git(...) helper; consolidated git invocations (single commit log with date), unified error handling, and centralized skipping logic for missing/identical/oversized diffs.
Boolean/control-flow simplification
src/gradata/enhancements/carl.py
ContractRegistry.has_blocking_violations simplified to return presence of MUST-priority constraints directly via get_prioritized_constraints(..., min_priority=RulePriority.MUST).
Pattern extraction behavior change
src/gradata/enhancements/pattern_extractor.py
Removed used.update(cluster) so indices in a formed cluster are no longer all marked globally used; only inner-loop used.add(j) calls remain, changing cluster composition and output patterns.
Tool registration gating
src/gradata/enhancements/pattern_integration.py
Replaced exception-swallowing registration try/except with explicit presence checks brain.tools.get(...) is None before brain.tools.register(...) for two tools.
DB context & resource management
src/gradata/enhancements/rule_integrity.py, src/gradata/brain.py
Switched several functions to accept ctx: BrainContext (use ctx.db_path) and replaced manual sqlite connect/close with with sqlite3.connect(...) as conn:; updated Brain.patch_rule to pass self.ctx into sign_and_store.
Minor internal refactors
src/gradata/enhancements/contradiction_detector.py (stopwords move)
Moved stopword set to module-level _STOPWORDS and adjusted extraction logic to subtract it rather than building per-call sets.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

refactor

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: a targeted refactoring/simplification pass across the enhancements layer with no behavior change.
Description check ✅ Passed The description comprehensively details all ten modules affected, the specific simplifications made to each, net LOC reduction, and test results confirming no behavior change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/simplify-sdk-enhancements

Comment @coderabbitai help to get the list of available commands and usage tips.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 14, 2026

Deploying gradata-dashboard with  Cloudflare Pages  Cloudflare Pages

Latest commit: fe600ad
Status: ✅  Deploy successful!
Preview URL: https://e407d5bb.gradata-dashboard.pages.dev
Branch Preview URL: https://refactor-simplify-sdk-enhanc.gradata-dashboard.pages.dev

View logs

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/gradata/enhancements/rule_integrity.py (1)

226-244: ⚠️ Potential issue | 🟠 Major

DB-access functions in this changed surface should accept BrainContext.

store_signature, load_signatures, and _load_signature perform DB access but still operate on raw db_path instead of accepting BrainContext, which violates the core SDK rule for this path.

As per coding guidelines, "src/gradata/**/*.py: ... all functions accepting BrainContext where DB access occurs".

Also applies to: 246-255, 274-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/rule_integrity.py` around lines 226 - 244, Change
the DB-access functions to accept a BrainContext instead of raw db_path: update
function signatures for store_signature, load_signatures, and _load_signature to
take a single ctx: BrainContext (or ctx plus other non-DB args), obtain the DB
path from ctx (e.g., ctx.db_path) and pass that into _ensure_table/connect
calls, and update all internal callers to pass the BrainContext; also adjust
_ensure_table or its usages if it currently only accepts a Path so it can accept
the ctx-derived path. Ensure you update any call sites referenced near the
original functions to supply the BrainContext and preserve existing behavior
(category/signature handling and timestamping).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/enhancements/git_backfill.py`:
- Around line 99-100: The current code calls _git for each commit to look up the
date (N+1 subprocesses). Change the initial log call that sets result to request
both hash and ISO date (e.g. "--format=%H %cI" or "%H %ci") using the existing
_git(["log", f"--since={since_date}", "--format=...", f"-{max_commits}"], ...)
call, then parse result into (commit, date) pairs and remove the per-commit _git
date lookup (the code that separately calls _git for each commit hash); keep
using since_date, max_commits and _git but derive commit_date from the parsed
log output instead of additional subprocesses.
- Around line 88-105: The helper _git currently swallows
TimeoutExpired/FileNotFoundError/OSError and returns None, losing root-cause
info; modify _git to return both the subprocess.CompletedProcess (on success)
and, on failure, an object or tuple that includes the caught exception (e.g.,
return (None, exc) or a simple dataclass with error and message) instead of
plain None, then update the caller (the result assignment and the subsequent
checks around result, result.returncode and the _log.warning calls) to log the
captured exception details (exc) along with the repo_path and stderr so the
warning for "git not available" includes the original exception type and
message; reference symbols: _git, result, repo_path, and _log.warning.

In `@src/gradata/enhancements/quality_monitoring.py`:
- Around line 142-145: The add_pattern method currently appends to
self._patterns before calling _compile_anti_pattern, risking partial state if
compilation raises; change add_pattern to first call
_compile_anti_pattern(pattern) and store the result in a local variable, and
only if compilation succeeds append the pattern to self._patterns and the tuple
(pattern, compiled) to self._compiled so both lists are updated atomically;
refer to add_pattern, _compile_anti_pattern, self._patterns and self._compiled
when making this change.

---

Outside diff comments:
In `@src/gradata/enhancements/rule_integrity.py`:
- Around line 226-244: Change the DB-access functions to accept a BrainContext
instead of raw db_path: update function signatures for store_signature,
load_signatures, and _load_signature to take a single ctx: BrainContext (or ctx
plus other non-DB args), obtain the DB path from ctx (e.g., ctx.db_path) and
pass that into _ensure_table/connect calls, and update all internal callers to
pass the BrainContext; also adjust _ensure_table or its usages if it currently
only accepts a Path so it can accept the ctx-derived path. Ensure you update any
call sites referenced near the original functions to supply the BrainContext and
preserve existing behavior (category/signature handling and timestamping).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c08f56ed-29c4-4228-82b3-d4a3644176a0

📥 Commits

Reviewing files that changed from the base of the PR and between ce1ad3c and d01aec0.

📒 Files selected for processing (10)
  • src/gradata/enhancements/carl.py
  • src/gradata/enhancements/contradiction_detector.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/llm_provider.py
  • src/gradata/enhancements/pattern_extractor.py
  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/enhancements/quality_monitoring.py
  • src/gradata/enhancements/rule_canary.py
  • src/gradata/enhancements/rule_integrity.py
💤 Files with no reviewable changes (1)
  • src/gradata/enhancements/pattern_extractor.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Python 3.12
  • GitHub Check: Python 3.11
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/enhancements/carl.py
  • src/gradata/enhancements/rule_integrity.py
  • src/gradata/enhancements/quality_monitoring.py
  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/llm_provider.py
  • src/gradata/enhancements/contradiction_detector.py
  • src/gradata/enhancements/rule_canary.py
🔇 Additional comments (14)
src/gradata/enhancements/carl.py (1)

216-216: Behavior-preserving simplification looks correct.

Line 216 correctly checks for existence of applicable MUST constraints via truthiness of the filtered list.

src/gradata/enhancements/quality_monitoring.py (1)

101-105: Good helper extraction and consistent compilation path.

This reduces duplicate regex setup and keeps constructor behavior aligned with add_pattern.

Also applies to: 116-118

src/gradata/enhancements/llm_provider.py (3)

60-87: Good extraction of shared OpenAI-compatible completion flow.

This helper preserves the provider contract (return None on failure, no exception propagation) while removing duplicated SDK boilerplate.


100-103: Delegation in OpenAIProvider.complete looks correct.

Argument mapping is clear and keeps the public method behavior stable after refactor.


122-126: Generic provider delegation is clean and behaviorally consistent.

Keeping the local placeholder key and forwarding endpoint/model settings preserves compatibility with local OpenAI-style backends.

src/gradata/enhancements/diff_engine.py (2)

148-199: Single-pass opcode analysis looks correct and improves efficiency.

The merged _analyze_line_opcodes(...) keeps section extraction and line-stat aggregation consistent while avoiding a second matcher traversal.


282-282: Call-site update is clean and aligned with the refactor.

compute_diff(...) now correctly uses the tuple return from _analyze_line_opcodes(...), preserving behavior while removing duplicate work.

src/gradata/enhancements/pattern_integration.py (1)

421-423: Redundant exception tuple removal is correct.

At Line 421 and Line 433, except Exception already captures ValueError, so this simplification is behavior-preserving and clearer.

Also applies to: 433-434

src/gradata/enhancements/contradiction_detector.py (3)

97-113: Good stopword extraction refactor with lower allocation overhead.

Moving stopwords to a module-level frozenset and reusing it in Line 112 is clean and keeps behavior intact while avoiding per-call set creation.

As per coding guidelines, "src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print() statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0]."


115-127: Shared pair-matching helper is a solid deduplication improvement.

_check_pair_list() cleanly centralizes repeated contradiction-pair scanning without altering matching semantics.


132-132: Delegated check functions are concise and preserve confidence bounds.

The wrappers now read clearly and keep confidence outputs within valid range (0.9, 0.85, 0.7, else 0.0).

As per coding guidelines, "Confidence values must be in [0.0, 1.0]."

Also applies to: 137-137, 142-142

src/gradata/enhancements/rule_canary.py (2)

27-30: Good refactor: centralized UTC timestamp generation.

Using _now_iso() (Line 27) and reusing it at Lines 109, 210, and 244 removes duplication and keeps audit timestamp formatting consistent.

Also applies to: 109-110, 210-211, 244-245


20-20: No action needed — datetime.UTC is compatible with the project's minimum Python version.

The project explicitly requires Python 3.11+ (requires-python = ">=3.11" in pyproject.toml). datetime.UTC is available in Python 3.11, so the import on line 20 is safe and requires no changes.

			> Likely an incorrect or invalid review comment.
src/gradata/enhancements/rule_integrity.py (1)

237-244: ⚠️ Potential issue | 🟠 Major

Apply contextlib.closing() to prevent sqlite3 connection leaks and refactor functions to accept BrainContext per coding guideline.

The sqlite3 context manager (lines 237, 253, 283) handles transaction boundaries but does not close connections. This causes connection leaks on repeated calls.

Additionally, these DB-accessing functions (store_signature, load_signatures, _load_signature) violate the coding guideline requiring BrainContext for all functions that access the database. Currently they accept db_path: Path directly.

🔧 Proposed fix

Add import at file top:

from contextlib import closing

Then apply closing() to all three locations:

-    with sqlite3.connect(str(db_path)) as conn:
+    with closing(sqlite3.connect(str(db_path))) as conn:
         conn.execute(

Refactor functions to accept BrainContext instead of db_path: Path to comply with SDK guidelines and enable proper resource management through context.

Applies to lines 237–244, 253–255, 283–289.

			> Likely an incorrect or invalid review comment.

Comment thread src/gradata/enhancements/git_backfill.py Outdated
Comment thread src/gradata/enhancements/git_backfill.py Outdated
Comment thread src/gradata/enhancements/quality_monitoring.py
- git_backfill.scan_git_diffs: fold per-commit `git log -1` calls into the
  initial `git log --format=%H %cI` pass so we issue O(1) subprocesses for
  date lookups instead of N+1.
- git_backfill._git: replace bare None-on-failure with a tiny `_GitResult`
  dataclass that carries the caught exception. Caller log lines now include
  the exception type, message, and repo path so root cause survives.
- quality_monitoring.PatternChecker.add_pattern: compile the pattern first
  and only mutate `_patterns`/`_compiled` after compilation succeeds, so a
  bad regex no longer leaves the checker in partial state.
- rule_integrity: migrate `store_signature`, `load_signatures`, `_load_signature`,
  `sign_and_store`, `verify_from_db`, and `_ensure_table` to accept
  `BrainContext` instead of raw `db_path`, per the SDK rule that DB-access
  helpers must be ctx-scoped. Sole external caller (`brain.patch_rule`)
  updated to pass `self.ctx`.

Tests: 121/121 enhancements suite + 31/31 brain suite green. Lint clean.

Co-Authored-By: Gradata <noreply@gradata.ai>
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 14, 2026

Addressed all 3 inline + 1 outside-diff CR actionables in a1181bc:

  • git_backfill: N+1 subprocess fixed (single git log --format=%H %cI pass)
  • _git: returns _GitResult dataclass; caller logs preserve exception type+message
  • quality_monitoring.add_pattern: compile-first/append-after for atomic add
  • rule_integrity: full BrainContext migration (store_signature, load_signatures, _load_signature, sign_and_store, verify_from_db, _ensure_table); sole external caller brain.patch_rule updated to pass self.ctx

Tests: 121/121 enhancements + 31/31 brain green. Ruff clean. @coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

@Gradata, thanks for the thorough follow-through on all four items — the changes look well-scoped:

  • git_backfill N+1 fix — collapsing the commit-hash + timestamp into a single git log --format=%H %cI pass is the right call; it also removes a class of timing skew between the two previously separate lookups.
  • _GitResult dataclass — a typed return beats a raw None/string union and makes caller-side error logging straightforward without losing exception fidelity.
  • add_pattern atomicity — compile-first, append-after is the correct ordering; it guarantees the internal list is never left with a half-initialized entry on a compile error.
  • rule_integrity BrainContext migration — propagating ctx all the way through sign_and_store, verify_from_db, and _ensure_table (plus updating brain.patch_rule) closes the dependency inversion properly rather than leaving it half-done.

Green test matrix (121 + 31) and a clean Ruff pass are a good signal. Kicking off the full review now.

✅ Actions performed

Full review triggered.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 14, 2026

Confirmed CR feedback already addressed in commit a1181bc (pushed). Verification pass:

  • git_backfill._git root-cause preservation_GitResult dataclass carries caught exception; warnings now log exc type, message, repo path
  • git_backfill N+1 subprocess — folded per-commit date lookup into initial git log --format=%H %cI pass, parsed into (hash, date) tuples
  • quality_monitoring.add_pattern atomicity — compile first into local compiled, only then mutate _patterns/_compiled
  • rule_integrity BrainContext migrationstore_signature, load_signatures, _load_signature, sign_and_store, verify_from_db, _ensure_table all accept BrainContext; sole caller brain.patch_rule updated

Tests: pytest tests/ -x -q2070 passed, 23 skipped, 0 failures.

No behavior change — simplify intent preserved.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 14, 2026

@coderabbitai review — fixes pushed, please re-verify.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

@Gradata, acknowledged — kicking off a full re-verification pass now against the latest commit. I'll go through all four addressed items (the _GitResult dataclass with exception fidelity, the collapsed N+1 git log pass, add_pattern compile-first atomicity, and the full BrainContext propagation through rule_integrity) as well as the rest of the diff surface.

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/gradata/enhancements/pattern_integration.py (1)

421-434: ⚠️ Potential issue | 🟠 Major

Do not swallow registration failures with misleading exception handlers.

At lines 421 and 433, except Exception masks real failures (e.g., constructor/parameter errors, missing brain.tools object). The ToolRegistry.register() method in src/gradata/contrib/patterns/tools.py (lines 99-101) silently overwrites entries via dict assignment—it does not raise exceptions for duplicates. The "Already registered" comments are misleading and give false confidence that exception handling is appropriate.

Proposed fix
-    try:
-        brain.tools.register(ToolSpec(
-            name="brain_correct",
-            description="Log a user correction for the learning pipeline",
-            category="learning",
-            parameters={"draft": "str", "final": "str"},
-        ))
-        count += 1
-    except Exception:
-        pass  # Already registered
+    brain.tools.register(ToolSpec(
+        name="brain_correct",
+        description="Log a user correction for the learning pipeline",
+        category="learning",
+        parameters={"draft": "str", "final": "str"},
+    ))
+    count += 1

-    try:
-        brain.tools.register(ToolSpec(
-            name="brain_apply_rules",
-            description="Get graduated rules for prompt injection",
-            category="learning",
-            parameters={"task": "str"},
-            returns="str",
-        ))
-        count += 1
-    except Exception:
-        pass  # Already registered
+    brain.tools.register(ToolSpec(
+        name="brain_apply_rules",
+        description="Get graduated rules for prompt injection",
+        category="learning",
+        parameters={"task": "str"},
+        returns="str",
+    ))
+    count += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/pattern_integration.py` around lines 421 - 434, The
current broad except blocks around brain.tools.register(ToolSpec(...)) silently
hide real errors and misleadingly claim "Already registered"; instead, stop
swallowing Exception: either check for existing entries on the registry before
calling brain.tools.register (e.g., use a lookup method on brain.tools or
inspect brain.tools._registry) and only skip registration if the name already
exists, or update ToolRegistry.register in src/gradata/contrib/patterns/tools.py
to raise a specific DuplicateRegistrationError on duplicates and catch that
specific exception here; in short, remove the bare except Exception and
implement an explicit existence check or catch a specific duplicate-registration
exception for the brain_apply_rules and earlier registration calls so real
constructor/attribute errors surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/enhancements/contradiction_detector.py`:
- Around line 115-127: The _check_pair_list function currently returns the
provided score value directly, which can violate the [0.0, 1.0] scoring
invariant; update _check_pair_list to clamp the score to the [0.0, 1.0] range
before returning (e.g., ensure any returned score is min(max(score, 0.0), 1.0)),
applying this clamp both when returning the matched score and for the default
return so all exits from _check_pair_list honor the confidence bounds.

In `@src/gradata/enhancements/git_backfill.py`:
- Around line 159-166: The current block in git_backfill.py that checks
files_result.ok silently skips when files_result.completed exists but returncode
!= 0; update the handling in the failing branch (the if not files_result.ok
block around files_result) to also detect files_result.completed and log its
return code and stderr/stdout (e.g., include files_result.completed.returncode
and files_result.completed.stderr or files_result.completed.stdout) using
_log.warning alongside the existing type/exception log so non-zero git diff-tree
return codes are recorded for repo_path and commit_hash before continuing;
ensure you still continue on failure but include these diagnostics for easier
triage.

---

Outside diff comments:
In `@src/gradata/enhancements/pattern_integration.py`:
- Around line 421-434: The current broad except blocks around
brain.tools.register(ToolSpec(...)) silently hide real errors and misleadingly
claim "Already registered"; instead, stop swallowing Exception: either check for
existing entries on the registry before calling brain.tools.register (e.g., use
a lookup method on brain.tools or inspect brain.tools._registry) and only skip
registration if the name already exists, or update ToolRegistry.register in
src/gradata/contrib/patterns/tools.py to raise a specific
DuplicateRegistrationError on duplicates and catch that specific exception here;
in short, remove the bare except Exception and implement an explicit existence
check or catch a specific duplicate-registration exception for the
brain_apply_rules and earlier registration calls so real constructor/attribute
errors surface.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b0c6a146-464f-4c22-9328-848bd368ccd3

📥 Commits

Reviewing files that changed from the base of the PR and between ce1ad3c and a1181bc.

📒 Files selected for processing (11)
  • src/gradata/brain.py
  • src/gradata/enhancements/carl.py
  • src/gradata/enhancements/contradiction_detector.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/llm_provider.py
  • src/gradata/enhancements/pattern_extractor.py
  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/enhancements/quality_monitoring.py
  • src/gradata/enhancements/rule_canary.py
  • src/gradata/enhancements/rule_integrity.py
💤 Files with no reviewable changes (1)
  • src/gradata/enhancements/pattern_extractor.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/brain.py
  • src/gradata/enhancements/carl.py
  • src/gradata/enhancements/rule_canary.py
  • src/gradata/enhancements/quality_monitoring.py
  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/enhancements/contradiction_detector.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/llm_provider.py
  • src/gradata/enhancements/rule_integrity.py
🔇 Additional comments (17)
src/gradata/enhancements/rule_canary.py (2)

20-30: Good extraction of UTC timestamp logic.

Centralizing timestamp generation in _now_iso() removes repeated inline datetime code and keeps audit formatting consistent.


109-109: Nice call-site cleanup with shared helper.

Switching all updated_at writes to _now_iso() improves consistency and reduces drift risk across canary state transitions.

Also applies to: 210-210, 244-244

src/gradata/enhancements/carl.py (1)

216-216: Simplification is correct and behavior-preserving.

Using bool(...) on the MUST-filtered constraint list is equivalent to the previous explicit check and removes redundant work.

src/gradata/enhancements/quality_monitoring.py (2)

101-105: Helper extraction is clean and behavior-preserving.

Line 101 centralizes compile behavior, and Line 116 reuses it in initialization, which removes duplicated regex/literal branching without changing semantics.

Also applies to: 116-118


143-147: add_pattern mutation order is now atomic and correct.

Line 145 compiles before list mutation, so a regex compile error no longer leaves partially updated detector state.

src/gradata/enhancements/diff_engine.py (2)

148-199: Clean consolidation of diff analysis into a single pass.

The refactoring correctly combines section extraction and stats computation, building SequenceMatcher once instead of twice. The counting logic for insert/delete/replace opcodes is correct, and type annotations are accurate.


282-282: LGTM!

Clean unpacking of the consolidated function result maintains the same behavior while benefiting from the single-pass optimization.

src/gradata/enhancements/contradiction_detector.py (2)

97-113: Good extraction of stopwords to module scope.

This removes repeated allocations and keeps _extract_topic_words() concise without changing intent.


132-142: Nice deduplication across contradiction checks.

Centralizing pair-scan logic improves maintainability and keeps call-site scores readable.

src/gradata/enhancements/llm_provider.py (3)

60-87: Good extraction of shared OpenAI-compatible completion path.

This helper cleanly removes duplication while preserving the existing failure semantics and response validation guard.


100-103: OpenAIProvider.complete delegation is correct and minimal.

Parameter forwarding is clear and keeps provider-specific auth/base URL behavior intact.


122-126: GenericHTTPProvider.complete delegation preserves local fallback behavior.

The placeholder-key fallback and base URL wiring are retained while reusing shared logic.

src/gradata/enhancements/git_backfill.py (2)

43-59: Strong _GitResult shape for preserving subprocess root cause.

This is a clean refactor: callers can now distinguish process failure vs launch/timeout failure without losing context.


118-149: Single-pass commit/date retrieval removes the prior N+1 pattern effectively.

Good improvement in subprocess efficiency while keeping parsing straightforward.

src/gradata/brain.py (1)

420-420: patch_rule now passes BrainContext correctly.

Line 420 correctly updates sign_and_store(...) to use self.ctx, matching the refactored DB-access contract and avoiding path-based coupling.

As per coding guidelines, "all functions accepting BrainContext where DB access occurs".

src/gradata/enhancements/rule_integrity.py (2)

30-33: Type-only BrainContext import is clean and runtime-safe.

Using TYPE_CHECKING here keeps runtime imports lean while preserving type clarity for DB-facing signatures.


217-310: DB helpers are consistently context-driven and safely managed.

The move to ctx: BrainContext plus context-managed SQLite connections is solid across _ensure_table, store_signature, load_signatures, _load_signature, and verify_from_db.

As per coding guidelines, "all functions accepting BrainContext where DB access occurs".

Comment thread src/gradata/enhancements/contradiction_detector.py
Comment thread src/gradata/enhancements/git_backfill.py
- contradiction_detector: clamp _check_pair_list score to [0.0, 1.0]
  to enforce scoring invariants at the central helper.
- git_backfill: log non-zero rc + stderr on git diff-tree failure
  instead of silently skipping the commit (easier backfill triage).
- pattern_integration: replace misleading 'except Exception' around
  tool registration with explicit .get() existence checks; registry
  overwrites via dict assignment so the except was dead code hiding
  real ctor/attr errors.
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 14, 2026

Addressed round-3 CR (b77ff61):

  • contradiction_detector.py_check_pair_list now clamps score to [0.0, 1.0] before returning, enforcing the scoring invariant at the central helper.
  • git_backfill.py — when git diff-tree returns non-zero (result present, rc != 0) we now log the rc and stderr/stdout at WARNING before continuing, instead of silently dropping the commit.
  • pattern_integration.py — removed misleading except Exception: pass # Already registered blocks around brain.tools.register. ToolRegistry.register overwrites via dict assignment and never raises, so the except was dead code hiding real ctor/attr errors. Replaced with explicit brain.tools.get(name) is None guards.

Tests: 2065 pass, 23 skip (excluding one unrelated local network flake in test_plugin_integration.py).

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 14, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/enhancements/contradiction_detector.py`:
- Around line 123-126: The current contradiction check in the loop over pairs
(variables left, right and collections new_norm, existing_norm) uses raw
substring membership which yields false positives (e.g., "allow" matching inside
"disallow"); change it to word-boundary matching by either tokenizing
new_norm/existing_norm into word sets and checking exact token membership, or
constructing regex patterns using re.escape(left/right) with \b anchors and
using re.search against the texts; update the condition inside the for left,
right in pairs loop to use these word-boundary checks consistently for both
directions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 546bb0cf-6274-497d-9889-b9f7607bed4e

📥 Commits

Reviewing files that changed from the base of the PR and between a1181bc and b77ff61.

📒 Files selected for processing (3)
  • src/gradata/enhancements/contradiction_detector.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/pattern_integration.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/contradiction_detector.py
🔇 Additional comments (7)
src/gradata/enhancements/pattern_integration.py (1)

413-430: Good change: registration is now idempotent and no longer masks real failures.

Guarding with get(...) is None before register(...) avoids silent overwrites on repeated calls, and removing broad except blocks improves debuggability for unexpected errors.

src/gradata/enhancements/git_backfill.py (4)

43-58: Good separation between subprocess failures and git exit failures.

_GitResult plus _git() makes the call sites much clearer and preserves launch/timeout errors without conflating them with non-zero git return codes.

Also applies to: 107-116


118-149: Nice removal of the per-commit git log lookup.

Fetching hash and ISO date in one pass materially reduces subprocess churn while keeping the parser straightforward.


157-173: The diff-tree failure path is much easier to triage now.

Logging both invocation errors and non-zero return codes should make backfill gaps diagnosable from production logs.


181-204: Good bounds on per-commit backfill work.

The file cap and size guards still keep this scan predictable after the helper refactor.

src/gradata/enhancements/contradiction_detector.py (2)

115-128: Score clamping is correctly centralized in the shared helper.

Line 122 enforces confidence bounds before returning matched scores, which aligns with scoring invariants.

As per coding guidelines, "Severity scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0]."


97-113: Good refactor: shared stopwords + shared pair-check helper reduce duplication cleanly.

This keeps behavior cohesive and makes future updates safer across polarity/negation/sentiment checks.

Also applies to: 133-143

Comment thread src/gradata/enhancements/contradiction_detector.py
Round-4 CR: substring checks like 'allow' in 'disallow' produced false
contradiction hits. Use re.search with negative-lookbehind/lookahead
word-boundary anchors via _contains_phrase helper.

Co-Authored-By: Gradata <noreply@gradata.ai>
@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

Addressed round-4 CR:

  • contradiction_detector.py: replaced raw substring matching with word-boundary regex matching (_contains_phrase helper using (?<!\w) / (?!\w) anchors). Eliminates false hits like allow matching inside disallow. Both directions of each pair use the new helper consistently.

Tests: 2070 passed, 23 skipped.

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

@coderabbitai review — CR Pro active, please re-verify.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

@Gradata, acknowledged — triggering a full re-verification pass against the latest commit now.

The round-4 fix in contradiction_detector.py looks like a meaningful correctness improvement: promoting from raw in substring checks to _contains_phrase with (?<!\w) / (?!\w) word-boundary anchors is the right approach — it prevents partial-word false positives (e.g., allow matching inside disallow, use inside misuse) that could inflate contradiction scores incorrectly. Applying the helper consistently in both directions of each pair keeps the bidirectional matching logic uniform.

Test baseline restored to 2070 passed, 23 skipped is a good sign the fix introduced no regressions. I'll go through the full diff surface now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

Rebased + round-5 CR addressed

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata
Copy link
Copy Markdown
Owner Author

Gradata commented Apr 15, 2026

@coderabbitai review — rebased on latest main, all prior fixes applied

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

@Gradata, acknowledged — triggering a full review pass against the rebased branch now. Given the prior rounds, I'll go through the entire diff surface with particular attention to:

  • contradiction_detector.py_contains_phrase word-boundary anchors, _check_pair_list score clamping, and the _STOPWORDS promotion.
  • git_backfill.py — single-pass git log --format=%H %cI, _GitResult dataclass, and non-zero rc logging for git diff-tree.
  • quality_monitoring.py — compile-first atomicity in add_pattern.
  • rule_integrity.py — full BrainContext propagation and with sqlite3.connect(...) context managers.
  • pattern_integration.py — existence-guard replacements for the dead except blocks.
  • Remaining modules: llm_provider, diff_engine, rule_canary, carl, pattern_extractor.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata Gradata merged commit 9c3b91b into main Apr 15, 2026
11 checks passed
@Gradata Gradata deleted the refactor/simplify-sdk-enhancements branch April 17, 2026 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant