feat: meta-rule discovery pipeline + behavioral extraction #19
Conversation
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Adds a deterministic archetype-based behavioral-instruction extractor (with optional LLM fallback), integrates it into brain.correct() as the primary extraction path with a legacy fallback, adds persistent correction-pattern storage/querying, and expands unit and E2E tests covering the correction → graduation → meta-rule workflow.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Brain as brain.correct()
    participant BehavioralExt as behavioral_extractor.extract_instruction()
    participant Archetype as detect_archetype()
    participant Gen as generate_instruction()
    participant LLM as llm_provider (optional)
    participant Legacy as extract_behavioral_instruction()
    Client->>Brain: correct(draft, final, primary, category)
    Brain->>BehavioralExt: extract_instruction(draft, final, classification, category, llm_provider)
    BehavioralExt->>Archetype: detect_archetype(draft, final, classification)
    Archetype-->>BehavioralExt: ArchetypeMatch
    BehavioralExt->>Gen: generate_instruction(match, category)
    Gen-->>BehavioralExt: instruction or None
    alt instruction missing or low confidence
        BehavioralExt->>LLM: extract(...) (optional)
        LLM-->>BehavioralExt: refined_instruction or error
    end
    alt instruction obtained
        BehavioralExt-->>Brain: instruction
        Brain->>Brain: desc = instruction
    else fallback
        Brain->>Legacy: extract_behavioral_instruction(diff, primary, cache)
        Legacy-->>Brain: legacy_desc or exception
        Brain->>Brain: desc = legacy_desc or primary.description
    end
    Brain-->>Client: corrected result (with desc)
```
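In code form, the fallback chain above amounts to roughly the following sketch (not the real `_core.py`; the function shape and the `diff_text` argument are assumptions for illustration):

```python
from gradata.enhancements.behavioral_extractor import extract_instruction
from gradata.enhancements.edit_classifier import extract_behavioral_instruction


def describe_correction(brain, draft: str, final: str, primary, category: str, diff_text: str) -> str:
    """Sketch of the two-stage description extraction shown in the diagram."""
    instruction = None
    try:
        # Stage 1: deterministic archetype-based extractor (may consult an optional LLM)
        instruction = extract_instruction(draft, final, primary, category=category)
    except Exception:
        instruction = None  # fall through to the legacy path
    if not instruction:
        # Stage 2: legacy keyword-template extractor over the word diff
        instruction = extract_behavioral_instruction(diff_text, primary, cache=brain._instruction_cache)
    # Final fallback: the classifier's own description
    return instruction or primary.description
```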
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_core_behavioral.py (1)
42-44: 🧹 Nitpick | 🔵 Trivial
Consider strengthening the fallback test assertion.
The assertion only checks for presence of state markers (`INSTINCT` or `PATTERN`), not whether a specific fallback description was used. Consider also verifying that the lesson description doesn't contain the mocked string (to confirm the fallback path was actually taken).
Suggested enhancement

```diff
 lessons_path = Path(d) / "lessons.md"
 if lessons_path.exists():
     content = lessons_path.read_text(encoding="utf-8")
     assert "INSTINCT" in content or "PATTERN" in content
+    # Verify fallback was used (mocked string should NOT appear)
+    assert "Use casual, direct tone" not in content
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_core_behavioral.py` around lines 42 - 44, The current test only asserts presence of a state marker; also assert that the lesson text used the fallback by ensuring the mocked description value is not present and/or that the description equals the expected fallback string: after reading lessons_path into content, keep the existing assert "INSTINCT" in content or "PATTERN" in content and add an assertion that the mocked description string used in the test mock (replace with the actual mocked value) is not in content (or alternatively assert content contains the known fallback phrase), referencing the variables lessons_path and content in tests/test_core_behavioral.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Line 314: The parameter `category` on generate_instruction(match:
ArchetypeMatch, category: str = "") is unused—remove the `category` parameter
from the function signature and any default, and then update all call sites that
pass a category (e.g., where generate_instruction(...) is invoked) to call
generate_instruction(match) instead; alternatively, if you intend to vary
templates by category, implement usage inside generate_instruction (e.g., branch
on the category string) and keep the signature—choose one approach and make the
corresponding change consistently across the codebase.
- Around line 416-426: The _try_llm_extract function currently swallows all
exceptions with a bare except; update it to catch Exception as e and log the
exception at debug (or debug + stack) before returning None so failures from
llm_provider.extract are visible; keep the existing flow that returns refined
only if _is_actionable(refined) and otherwise returns None. Target the
_try_llm_extract function and the llm_provider.extract call when adding the
logging; use the module logger (e.g., logger.debug or similar) to record the
exception and any contextual info (draft, final, classification) without
changing return behavior.
In `@src/gradata/enhancements/meta_rules_storage.py`:
- Around line 340-342: PATTERN_SEVERITY_WEIGHTS are aggregation multipliers
(range ~0.5–2.5) used for pattern graduation scoring and are intentionally on a
different scale than self_improvement.SEVERITY_WEIGHTS (which are clamped to
[0,1]); update the nearby comment or docstring in meta_rules_storage.py to
explicitly state that PATTERN_SEVERITY_WEIGHTS are multipliers/aggregation
weights (not per-item severity scores) and mention their numeric range and
relation to self_improvement.SEVERITY_WEIGHTS so future readers won’t expect
clamped [0,1] values.
- Around line 445-465: The SQLite connection opened via sqlite3.connect (conn)
can leak if conn.execute(...).fetchall() raises; wrap the DB usage in a safe
context by either using the connection as a context manager (with
sqlite3.connect(...) as conn:) or surrounding the execute/fetchall with
try/finally and calling conn.close() in finally; update the block that creates
conn, sets conn.row_factory, runs the SELECT (the execute(...).fetchall() call)
and closes conn to ensure conn.close() always runs even on exceptions.
- Around line 345-367: The pattern-tracking helpers (ensure_pattern_table,
upsert_correction_pattern, upsert_correction_patterns_batch,
query_graduation_candidates) are never called so extracted patterns aren’t
persisted; wire them into the correction pipeline by: ensure_pattern_table is
invoked during DB initialization, call extract_patterns from brain.correct() and
immediately persist results using upsert_correction_patterns_batch (or
upsert_correction_pattern for single entries) with the current session_id and
severity metadata, and in brain.end_session() call query_graduation_candidates
to fetch cross-session candidates and incorporate/persist any graduations into
meta-rules (or the existing meta-rule persistence flow); reference these
functions when adding the calls so the persistence happens whenever corrections
are made and at session teardown.
In `@tests/test_pipeline_e2e.py`:
- Around line 90-109: The test assumes discover_meta_rules(...) returns
meta-rules but that function is cloud-gated in the OSS build and may return an
empty list; update the test in test_meta_rule_discovery_from_related_corrections
to handle the gated case by either skipping the test when cloud-gating is active
(use pytest.skip with a clear message) or by asserting conditionally: call
discover_meta_rules(rule_lessons, ...), and if it returns an empty list treat
the test as skipped/expected (or assert len(metas) == 0 with an informative
message), otherwise run the existing assertions on the first meta (referencing
discover_meta_rules, Lesson, and LessonState to locate the relevant code).
- Around line 21-27: Remove the hardcoded Windows path by stopping insertion of
"C:/Users/..." into sys.path; instead only respect an optional
GRADATA_CLOUD_PATH environment variable or prefer the package import fallback.
Update the block that sets _cloud_path (created via os.environ.get) and the
sys.path.insert call so it only inserts when the env var is set and non-empty,
then attempt importing discover_meta_rules and merge_into_meta accordingly
(symbols: _cloud_path, sys.path.insert, discover_meta_rules, merge_into_meta).
Ensure the tests rely on the package import fallback (from
gradata.enhancements.meta_rules) when no env var is provided.
---
Outside diff comments:
In `@tests/test_core_behavioral.py`:
- Around line 42-44: The current test only asserts presence of a state marker;
also assert that the lesson text used the fallback by ensuring the mocked
description value is not present and/or that the description equals the expected
fallback string: after reading lessons_path into content, keep the existing
assert "INSTINCT" in content or "PATTERN" in content and add an assertion that
the mocked description string used in the test mock (replace with the actual
mocked value) is not in content (or alternatively assert content contains the
known fallback phrase), referencing the variables lessons_path and content in
tests/test_core_behavioral.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8e51391b-acd7-475c-a256-a29e1d38b150
📒 Files selected for processing (6)
src/gradata/_core.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/enhancements/meta_rules_storage.py, tests/test_convergence_gate.py, tests/test_core_behavioral.py, tests/test_pipeline_e2e.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_convergence_gate.py, tests/test_core_behavioral.py, tests/test_pipeline_e2e.py
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/_core.py, src/gradata/enhancements/meta_rules_storage.py, src/gradata/enhancements/behavioral_extractor.py
🪛 GitHub Actions: CI
tests/test_pipeline_e2e.py
[error] 102-102: pytest failed in TestPipelineE2E.test_meta_rule_discovery_from_related_corrections. AssertionError: 4 RULE-graduated PROCESS lessons should produce at least 1 meta-rule (assert len(metas) >= 1); got len(metas)=0.
🪛 Ruff (0.15.9)
tests/test_convergence_gate.py
[warning] 48-49: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
tests/test_pipeline_e2e.py
[error] 87-87: Ambiguous variable name: l
(E741)
[error] 209-209: Ambiguous variable name: l
(E741)
src/gradata/enhancements/behavioral_extractor.py
[warning] 54-54: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
[warning] 168-168: Too many return statements (14 > 6)
(PLR0911)
[warning] 168-168: Too many branches (15 > 12)
(PLR0912)
[warning] 196-196: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 239-239: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 278-278: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 314-314: Too many return statements (19 > 6)
(PLR0911)
[warning] 314-314: Too many branches (18 > 12)
(PLR0912)
[warning] 314-314: Unused function argument: category
(ARG001)
[error] 424-425: try-except-pass detected, consider logging the exception
(S110)
[warning] 424-424: Do not catch blind exception: Exception
(BLE001)
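The B905 findings above boil down to declaring each `zip()`'s truncation behavior explicitly; for example (variable and helper names assumed):

```python
# strict=False preserves today's silent truncation on unequal lengths;
# strict=True would raise ValueError and surface alignment bugs instead.
for draft_sent, final_sent in zip(draft_sentences, final_sentences, strict=False):
    compare_sentences(draft_sent, final_sent)
```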
🔇 Additional comments (7)
tests/test_pipeline_e2e.py (2)
86-88: Ambiguous variable name `l` is acceptable in list comprehensions.
The static analysis flags `l` as ambiguous (E741), but in context (`for l in lessons`) it's a common and readable pattern for "lesson". No change required, but consider `lesson` for clarity if refactoring.
Also applies to: 208-216
79-81: Good assertion pattern: checks specific severity values.
The assertion verifies the severity value is one of the expected discrete values rather than just checking truthiness.
src/gradata/enhancements/behavioral_extractor.py (2)
476-478: `extract_instruction` returns `None` when the category has no fallback.
If the category is not in `_GENERIC_FALLBACKS` (e.g., "UNKNOWN" or a typo), the function returns `None`. This is documented in the return type, but callers in `_core.py` handle this correctly via `behavioral_desc or primary.description`. Just noting this is intentional.
168-307: Archetype detection logic is comprehensive.
The 12-archetype taxonomy with sentence-level analysis and ordered resolution is well-designed. The confidence values are appropriately in [0.0, 1.0], and the fallback chain (prefix → hedging → constraint → action step → tone → length → reorder → factual → format → content removal → word replacement → fallback) covers the expected correction patterns.
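As a minimal sketch of that ordered resolution (these toy detectors stand in for the module's real twelve; names and thresholds are assumptions):

```python
from dataclasses import dataclass


@dataclass
class ArchetypeMatch:
    name: str
    confidence: float  # kept within [0.0, 1.0]


def _detect_hedging(draft: str, final: str) -> ArchetypeMatch | None:
    # Toy detector: hedging words present in the draft but removed in the final
    hedges = {"maybe", "perhaps", "possibly"}
    draft_has = any(h in draft.lower() for h in hedges)
    final_has = any(h in final.lower() for h in hedges)
    return ArchetypeMatch("hedging_removed", 0.8) if draft_has and not final_has else None


def _detect_length(draft: str, final: str) -> ArchetypeMatch | None:
    # Toy detector: final is dramatically shorter than the draft
    return ArchetypeMatch("shortened", 0.6) if len(final) < len(draft) * 0.5 else None


_DETECTORS = [_detect_hedging, _detect_length]  # the real module orders 12 of these


def detect_archetype(draft: str, final: str) -> ArchetypeMatch:
    """First detector that fires wins; list order encodes priority."""
    for detector in _DETECTORS:
        match = detector(draft, final)
        if match is not None:
            return match
    return ArchetypeMatch("fallback", 0.3)
```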
src/gradata/_core.py (1)
228-250: Clean integration of behavioral extractor with proper fallback chain.
The two-stage extraction sequence is well-implemented:
- Try the archetype-based `extract_instruction()` first
- Fall back to keyword templates via `extract_behavioral_instruction()` if step 1 returns falsy
- Final fallback to `primary.description`

The `isinstance` guard on `_instruction_cache` (line 243) correctly handles an uninitialized/None cache. Exception handling preserves the fallback to `primary.description`.
tests/test_convergence_gate.py (1)
48-55: Mock updated correctly to match new extraction flow.
The patch target now correctly reflects the updated `brain_correct()` flow where `extract_instruction` from `behavioral_extractor` is called first. The non-None return value ensures the primary extraction path is exercised without triggering fallbacks.
tests/test_core_behavioral.py (1)
15-18: Mocks correctly updated to test both extraction paths.
Both tests properly patch the new `extract_instruction` function:
- `test_correct_uses_behavioral_description`: returns a string → verifies it appears in lessons
- `test_correct_falls_back_to_old_description`: returns `None` → verifies fallback to the old description path

This provides good coverage of the two-stage extraction sequence.
Also applies to: 33-36
```python
# Template Generation
# ---------------------------------------------------------------------------


def generate_instruction(match: ArchetypeMatch, category: str = "") -> str:
```
Unused category parameter in generate_instruction().
The category parameter is declared but never used in the function body. Either remove it or use it (e.g., for category-specific template variations).
Proposed fix (remove unused parameter)

```diff
-def generate_instruction(match: ArchetypeMatch, category: str = "") -> str:
+def generate_instruction(match: ArchetypeMatch) -> str:
     """Generate an imperative behavioral instruction from an archetype match."""
```

And update the call site at line 461:

```diff
-    instruction = generate_instruction(match, category)
+    instruction = generate_instruction(match)
```

🧰 Tools
🪛 Ruff (0.15.9)
[warning] 314-314: Too many return statements (19 > 6)
(PLR0911)
[warning] 314-314: Too many branches (18 > 12)
(PLR0912)
[warning] 314-314: Unused function argument: category
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/behavioral_extractor.py` at line 314, The parameter
`category` on generate_instruction(match: ArchetypeMatch, category: str = "") is
unused—remove the `category` parameter from the function signature and any
default, and then update all call sites that pass a category (e.g., where
generate_instruction(...) is invoked) to call generate_instruction(match)
instead; alternatively, if you intend to vary templates by category, implement
usage inside generate_instruction (e.g., branch on the category string) and keep
the signature—choose one approach and make the corresponding change consistently
across the codebase.
```python
def _try_llm_extract(llm_provider, draft: str, final: str, classification) -> str | None:
    """Try LLM extraction, return result or None on failure."""
    if llm_provider is None:
        return None
    try:
        refined = llm_provider.extract(draft, final, classification)
        if refined and _is_actionable(refined):
            return refined
    except Exception:
        pass
    return None
```
Silent exception swallowing loses diagnostic information.
The bare except Exception: pass hides all errors from LLM extraction, making debugging difficult when the LLM provider misbehaves. At minimum, log at debug level.
Proposed fix

```diff
+import logging
+
+_log = logging.getLogger("gradata")
+
 def _try_llm_extract(llm_provider, draft: str, final: str, classification) -> str | None:
     """Try LLM extraction, return result or None on failure."""
     if llm_provider is None:
         return None
     try:
         refined = llm_provider.extract(draft, final, classification)
         if refined and _is_actionable(refined):
             return refined
-    except Exception:
-        pass
+    except Exception as e:
+        _log.debug("LLM extraction failed: %s", e)
     return None
```

🧰 Tools
🪛 Ruff (0.15.9)
[error] 424-425: try-except-pass detected, consider logging the exception
(S110)
[warning] 424-424: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/behavioral_extractor.py` around lines 416 - 426, The
_try_llm_extract function currently swallows all exceptions with a bare except;
update it to catch Exception as e and log the exception at debug (or debug +
stack) before returning None so failures from llm_provider.extract are visible;
keep the existing flow that returns refined only if _is_actionable(refined) and
otherwise returns None. Target the _try_llm_extract function and the
llm_provider.extract call when adding the logging; use the module logger (e.g.,
logger.debug or similar) to record the exception and any contextual info (draft,
final, classification) without changing return behavior.
```python
def ensure_pattern_table(db_path: str | Path) -> None:
    """Create correction_patterns table if it doesn't exist."""
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS correction_patterns (
                pattern_hash TEXT NOT NULL,
                category TEXT NOT NULL,
                representative_text TEXT NOT NULL,
                session_id INTEGER NOT NULL,
                severity TEXT DEFAULT 'minor',
                severity_weight REAL DEFAULT 1.0,
                created_at TEXT DEFAULT (datetime('now')),
                UNIQUE(pattern_hash, session_id)
            )
        """)
        conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_patterns_hash
            ON correction_patterns(pattern_hash)
        """)
        conn.commit()
    finally:
        conn.close()
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for usages of the new pattern tracking functions outside of tests and the storage module itself
echo "=== Searching for upsert_correction_pattern calls ==="
rg -n "upsert_correction_pattern" --type py -g '!tests/*' -g '!*_storage.py'
echo "=== Searching for ensure_pattern_table calls ==="
rg -n "ensure_pattern_table" --type py -g '!tests/*' -g '!*_storage.py'
echo "=== Searching for query_graduation_candidates calls ==="
rg -n "query_graduation_candidates" --type py -g '!tests/*' -g '!*_storage.py'
Repository: Gradata/gradata
Length of output: 510
🏁 Script executed:
# First, let's understand the structure and see the actual function definitions
wc -l src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 109
🏁 Script executed:
# Read the storage module to understand the functions
sed -n '345,475p' src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 4547
🏁 Script executed:
# Check if these functions are exported from the module
rg -n "__all__" src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
# Search for any calls to these functions within the storage module itself
rg -n "ensure_pattern_table|upsert_correction_pattern|query_graduation_candidates" src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 231
🏁 Script executed:
# Check the brain.correct() and brain.end_session() implementations
fd -name "*.py" -path "*/gradata/*" | xargs grep -l "def correct\|def end_session" | grep -v test
Repository: Gradata/gradata
Length of output: 289
🏁 Script executed:
# Check the top of the storage module for imports and exports
head -50 src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 1440
🏁 Script executed:
# Find files that define brain.correct() and brain.end_session()
fd -name "*.py" -path "*/gradata/*" type f | head -20
Repository: Gradata/gradata
Length of output: 289
🏁 Script executed:
# Search for brain class and correct method
rg "def correct\(" src/gradata/ --type py -A 5
Repository: Gradata/gradata
Length of output: 1803
🏁 Script executed:
# Search for brain end_session
rg "def end_session\(" src/gradata/ --type py -A 5
Repository: Gradata/gradata
Length of output: 716
🏁 Script executed:
# Look at the test file mentioned in the review
cat -n src/gradata/tests/test_pipeline_e2e.py | grep -A 30 "test_correction_pattern_tracking"
Repository: Gradata/gradata
Length of output: 129
🏁 Script executed:
# Find test files with "pattern" in the name
fd ".*test.*\.py" src/gradata/tests/ | xargs grep -l "correction_pattern_tracking\|correction_patterns"
Repository: Gradata/gradata
Length of output: 166
🏁 Script executed:
# Read the brain.py file to see correct() implementation
wc -l src/gradata/brain.py
Repository: Gradata/gradata
Length of output: 84
🏁 Script executed:
# Read brain.correct() implementation
rg "def correct\(" src/gradata/brain.py -A 50
Repository: Gradata/gradata
Length of output: 2969
🏁 Script executed:
# Read brain.end_session() implementation
rg "def end_session\(" src/gradata/brain.py -A 40
Repository: Gradata/gradata
Length of output: 2340
🏁 Script executed:
# Check for TODO/FIXME comments in meta_rules_storage.py
rg "TODO|FIXME|XXX|HACK" src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
# Check the _core module which is imported
fd "_core.py" src/gradata/
Repository: Gradata/gradata
Length of output: 79
🏁 Script executed:
# Check _core.py for brain_correct implementation
rg "def brain_correct" src/gradata/_core.py -A 60
Repository: Gradata/gradata
Length of output: 2809
🏁 Script executed:
# Check _core.py for brain_end_session implementation
rg "def brain_end_session" src/gradata/_core.py -A 60
Repository: Gradata/gradata
Length of output: 2723
🏁 Script executed:
# Search for any usage of pattern tracking functions in _core.py
rg "ensure_pattern_table|upsert_correction_pattern|query_graduation_candidates" src/gradata/_core.py
Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
# Find test files (check broader directory)
fd "test.*\.py" src/gradata/ | head -20
Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
# Search for pattern tracking test
rg "test_correction_pattern_tracking" src/gradata/
Repository: Gradata/gradata
Length of output: 167
🏁 Script executed:
# Search all Python files for the test function name
rg "test_correction_pattern_tracking" src/
Repository: Gradata/gradata
Length of output: 167
🏁 Script executed:
# Check if there are test directories
find src -type d -name "*test*" 2>/dev/null
Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
# Search for files with "pattern" and "test" in name
rg -l "pattern.*test|test.*pattern" src/ --type py
Repository: Gradata/gradata
Length of output: 167
🏁 Script executed:
# Check the full brain_correct to see if pattern tracking is there
rg "def brain_correct" src/gradata/_core.py -A 150 | tail -100
Repository: Gradata/gradata
Length of output: 4236
🏁 Script executed:
# Search for any pattern tracking in _core.py
rg "pattern" src/gradata/_core.py -i
Repository: Gradata/gradata
Length of output: 1628
🏁 Script executed:
# Look for imports of pattern functions in _core.py
rg "from.*meta_rules_storage|import.*ensure_pattern_table|import.*upsert_correction" src/gradata/_core.py
Repository: Gradata/gradata
Length of output: 159
🏁 Script executed:
# Look at the _core.py end_session more completely to check for pattern tracking
rg "def brain_end_session" src/gradata/_core.py -A 200 | tail -150
Repository: Gradata/gradata
Length of output: 8124
🏁 Script executed:
# Search for any PATTERN_SEVERITY_WEIGHTS constant which is referenced in pattern functions
rg "PATTERN_SEVERITY_WEIGHTS" src/gradata/
Repository: Gradata/gradata
Length of output: 552
🏁 Script executed:
# Check the entire meta_rules_storage.py for context about these functions
head -100 src/gradata/enhancements/meta_rules_storage.py | tail -50
Repository: Gradata/gradata
Length of output: 1765
🏁 Script executed:
# Check if there's a correction_patterns table referenced anywhere
rg "correction_patterns" src/gradata/ --type py
Repository: Gradata/gradata
Length of output: 1199
🏁 Script executed:
# Check git log for recent changes to meta_rules_storage.py
git log --oneline -20 src/gradata/enhancements/meta_rules_storage.py 2>/dev/null || echo "Git not available"
Repository: Gradata/gradata
Length of output: 125
🏁 Script executed:
# Verify the line ranges mentioned in the comment
sed -n '370,396p' src/gradata/enhancements/meta_rules_storage.py | head -20
Repository: Gradata/gradata
Length of output: 933
🏁 Script executed:
# Check for any test files that might reference pattern tracking
find src -name "*.py" -type f | xargs grep -l "correction_pattern_tracking\|correction_patterns"
Repository: Gradata/gradata
Length of output: 149
🏁 Script executed:
# Look for any PR or issue context in comments
rg "cross-session|pattern.*tracking|correction_patterns" src/gradata/enhancements/meta_rules_storage.py -B 2 -A 2
Repository: Gradata/gradata
Length of output: 1661
🏁 Script executed:
# Check for any TODO/FIXME comments about pattern tracking integration
rg "TODO|FIXME|TODO:|FIXME:|future|defer|deferred" src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 93
🏁 Script executed:
# Check the comments and docstrings around pattern functions for context
sed -n '340,475p' src/gradata/enhancements/meta_rules_storage.py | grep -A 2 -B 2 "def\|#"
Repository: Gradata/gradata
Length of output: 763
🏁 Script executed:
# Verify the exact line numbers for the functions mentioned
rg -n "^def ensure_pattern_table|^def upsert_correction_pattern|^def upsert_correction_patterns_batch|^def query_graduation_candidates" src/gradata/enhancements/meta_rules_storage.py
Repository: Gradata/gradata
Length of output: 231
Pattern tracking functions are not integrated into the correction pipeline.
These functions (ensure_pattern_table, upsert_correction_pattern, upsert_correction_patterns_batch, query_graduation_candidates) are defined and fully implemented, but are never called from brain.correct() or brain.end_session(). The PR mentions "cross-session pattern tracking," but the architecture shows these helpers are orphaned—corrections don't populate the correction_patterns table. In brain.correct(), extract_patterns() is called but results are not persisted; in brain.end_session(), only meta-rules are discovered and persisted.
Is this intentional (deferred to a future PR), or should these be wired into the pipeline now?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/meta_rules_storage.py` around lines 345 - 367, The
pattern-tracking helpers (ensure_pattern_table, upsert_correction_pattern,
upsert_correction_patterns_batch, query_graduation_candidates) are never called
so extracted patterns aren’t persisted; wire them into the correction pipeline
by: ensure_pattern_table is invoked during DB initialization, call
extract_patterns from brain.correct() and immediately persist results using
upsert_correction_patterns_batch (or upsert_correction_pattern for single
entries) with the current session_id and severity metadata, and in
brain.end_session() call query_graduation_candidates to fetch cross-session
candidates and incorporate/persist any graduations into meta-rules (or the
existing meta-rule persistence flow); reference these functions when adding the
calls so the persistence happens whenever corrections are made and at session
teardown.
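Concretely, the wiring that prompt describes might look like this sketch (attribute names such as `brain.db_path` and `brain.session_id`, the batch-upsert signature, and the `persist_meta_rule` hook are assumptions, not the SDK's real API):

```python
from gradata.enhancements.meta_rules_storage import (
    ensure_pattern_table,
    query_graduation_candidates,
    upsert_correction_patterns_batch,
)


def persist_correction_patterns(brain, patterns) -> None:
    """Call from brain.correct() right after extract_patterns(...)."""
    ensure_pattern_table(brain.db_path)
    upsert_correction_patterns_batch(brain.db_path, patterns, session_id=brain.session_id)


def graduate_cross_session_patterns(brain) -> None:
    """Call from brain.end_session() to fold repeated patterns into meta-rules."""
    for candidate in query_graduation_candidates(brain.db_path, min_sessions=2, min_score=3.0):
        brain.persist_meta_rule(candidate)  # hypothetical hook into the existing meta-rule flow
```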
```python
conn = sqlite3.connect(str(db_path))
conn.row_factory = sqlite3.Row
rows = conn.execute(
    """SELECT
        pattern_hash,
        category,
        representative_text,
        COUNT(DISTINCT session_id) AS distinct_sessions,
        SUM(severity_weight) AS weighted_score,
        MIN(created_at) AS first_seen,
        MAX(created_at) AS last_seen,
        GROUP_CONCAT(DISTINCT session_id) AS session_ids
    FROM correction_patterns
    GROUP BY pattern_hash
    HAVING COUNT(DISTINCT session_id) >= ?
       AND SUM(severity_weight) >= ?
    ORDER BY weighted_score DESC
    """,
    (min_sessions, min_score),
).fetchall()
conn.close()
```
Potential resource leak if query fails.
The connection is opened at line 445 but conn.close() is called outside a try/finally block. If fetchall() raises an exception, the connection will leak.
Proposed fix

```diff
 def query_graduation_candidates(
     db_path: str | Path,
     min_sessions: int = 2,
     min_score: float = 3.0,
 ) -> list[dict]:
     """Find correction patterns ready for meta-rule graduation.

     Returns patterns where:
     - Distinct sessions >= min_sessions
     - Sum of severity weights >= min_score
     """
     conn = sqlite3.connect(str(db_path))
-    conn.row_factory = sqlite3.Row
-    rows = conn.execute(
-        """SELECT
-            pattern_hash,
-            category,
-            representative_text,
-            COUNT(DISTINCT session_id) AS distinct_sessions,
-            SUM(severity_weight) AS weighted_score,
-            MIN(created_at) AS first_seen,
-            MAX(created_at) AS last_seen,
-            GROUP_CONCAT(DISTINCT session_id) AS session_ids
-        FROM correction_patterns
-        GROUP BY pattern_hash
-        HAVING COUNT(DISTINCT session_id) >= ?
-           AND SUM(severity_weight) >= ?
-        ORDER BY weighted_score DESC
-        """,
-        (min_sessions, min_score),
-    ).fetchall()
-    conn.close()
-    return [dict(r) for r in rows]
+    try:
+        conn.row_factory = sqlite3.Row
+        rows = conn.execute(
+            """SELECT
+                pattern_hash,
+                category,
+                representative_text,
+                COUNT(DISTINCT session_id) AS distinct_sessions,
+                SUM(severity_weight) AS weighted_score,
+                MIN(created_at) AS first_seen,
+                MAX(created_at) AS last_seen,
+                GROUP_CONCAT(DISTINCT session_id) AS session_ids
+            FROM correction_patterns
+            GROUP BY pattern_hash
+            HAVING COUNT(DISTINCT session_id) >= ?
+               AND SUM(severity_weight) >= ?
+            ORDER BY weighted_score DESC
+            """,
+            (min_sessions, min_score),
+        ).fetchall()
+        return [dict(r) for r in rows]
+    finally:
+        conn.close()
```
+ conn.close()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def query_graduation_candidates(
    db_path: str | Path,
    min_sessions: int = 2,
    min_score: float = 3.0,
) -> list[dict]:
    """Find correction patterns ready for meta-rule graduation.

    Returns patterns where:
    - Distinct sessions >= min_sessions
    - Sum of severity weights >= min_score
    """
    conn = sqlite3.connect(str(db_path))
    try:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(
            """SELECT
                pattern_hash,
                category,
                representative_text,
                COUNT(DISTINCT session_id) AS distinct_sessions,
                SUM(severity_weight) AS weighted_score,
                MIN(created_at) AS first_seen,
                MAX(created_at) AS last_seen,
                GROUP_CONCAT(DISTINCT session_id) AS session_ids
            FROM correction_patterns
            GROUP BY pattern_hash
            HAVING COUNT(DISTINCT session_id) >= ?
               AND SUM(severity_weight) >= ?
            ORDER BY weighted_score DESC
            """,
            (min_sessions, min_score),
        ).fetchall()
        return [dict(r) for r in rows]
    finally:
        conn.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/meta_rules_storage.py` around lines 445 - 465, The
SQLite connection opened via sqlite3.connect (conn) can leak if
conn.execute(...).fetchall() raises; wrap the DB usage in a safe context by
either using the connection as a context manager (with sqlite3.connect(...) as
conn:) or surrounding the execute/fetchall with try/finally and calling
conn.close() in finally; update the block that creates conn, sets
conn.row_factory, runs the SELECT (the execute(...).fetchall() call) and closes
conn to ensure conn.close() always runs even on exceptions.
```python
try:
    _cloud_path = os.environ.get("GRADATA_CLOUD_PATH", "C:/Users/olive/SpritesWork/brain/cloud-only")
    sys.path.insert(0, _cloud_path)
    from meta_rules import discover_meta_rules, merge_into_meta  # type: ignore[import]
except ImportError:
    from gradata.enhancements.meta_rules import discover_meta_rules
```
Hardcoded Windows path violates test guidelines.
Line 22 contains a hardcoded Windows-specific path C:/Users/olive/SpritesWork/brain/cloud-only. This will fail on CI/other environments and violates the coding guidelines for tests. The fallback import on line 26 works, but the hardcoded path should be removed or made configurable.
Proposed fix

```diff
-# Try cloud-only override first (real discovery), fall back to SDK stubs
-try:
-    _cloud_path = os.environ.get("GRADATA_CLOUD_PATH", "C:/Users/olive/SpritesWork/brain/cloud-only")
-    sys.path.insert(0, _cloud_path)
-    from meta_rules import discover_meta_rules, merge_into_meta  # type: ignore[import]
-except ImportError:
-    from gradata.enhancements.meta_rules import discover_meta_rules
+# Try cloud-only override first (real discovery), fall back to SDK stubs
+_cloud_path = os.environ.get("GRADATA_CLOUD_PATH")
+if _cloud_path:
+    sys.path.insert(0, _cloud_path)
+try:
+    from meta_rules import discover_meta_rules, merge_into_meta  # type: ignore[import]
+except ImportError:
+    from gradata.enhancements.meta_rules import discover_meta_rules
```

As per coding guidelines: "tests/**: no hardcoded paths"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
_cloud_path = os.environ.get("GRADATA_CLOUD_PATH")
if _cloud_path:
    sys.path.insert(0, _cloud_path)
try:
    from meta_rules import discover_meta_rules, merge_into_meta  # type: ignore[import]
except ImportError:
    from gradata.enhancements.meta_rules import discover_meta_rules
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pipeline_e2e.py` around lines 21 - 27, Remove the hardcoded
Windows path by stopping insertion of "C:/Users/..." into sys.path; instead only
respect an optional GRADATA_CLOUD_PATH environment variable or prefer the
package import fallback. Update the block that sets _cloud_path (created via
os.environ.get) and the sys.path.insert call so it only inserts when the env var
is set and non-empty, then attempt importing discover_meta_rules and
merge_into_meta accordingly (symbols: _cloud_path, sys.path.insert,
discover_meta_rules, merge_into_meta). Ensure the tests rely on the package
import fallback (from gradata.enhancements.meta_rules) when no env var is
provided.
New: behavioral_extractor.py
- 12-archetype sentence-level extraction for actionable lesson descriptions
- Replaces word-diff summaries with imperative instructions
- Template-based with upgrade path for LLM refinement

New: correction_patterns table (meta_rules_storage.py)
- Cross-session pattern tracking with batch upsert
- Graduation candidate query for repeated patterns

Changed: _core.py
- brain.correct() uses behavioral_extractor as primary extraction path
- Falls back to keyword templates

New: test_pipeline_e2e.py (12 tests)
- Pattern tracking, correction logging, injection formatting
- Cloud-dependent tests skip on CI

1699 tests passing, 0 failures.

Co-Authored-By: Gradata <noreply@gradata.ai>
Force-pushed from 0ca07b8 to 7680808 (Compare)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gradata/_core.py (1)
232-253: ⚠️ Potential issue | 🟠 Major
Preserve the legacy extractor when the new path throws.
Right now the fallback to `extract_behavioral_instruction()` only happens when `extract_instruction()` returns a falsy value. If the new extractor raises, line 251 skips the legacy path and drops straight to `primary.description`, so a bug in the new code disables the old behavior entirely.
Proposed fix
```diff
-try:
-    from gradata.enhancements.behavioral_extractor import extract_instruction
-    behavioral_desc = extract_instruction(
-        draft, final, primary, category=cat,
-    )
-    if not behavioral_desc:
-        # Fallback to keyword templates
-        from gradata.enhancements.edit_classifier import (
-            extract_behavioral_instruction,
-        )
-        from gradata.enhancements.instruction_cache import InstructionCache
-        if not isinstance(brain._instruction_cache, InstructionCache):
-            brain._instruction_cache = InstructionCache(
-                lessons_path.parent / "instruction_cache.json"
-            )
-        behavioral_desc = extract_behavioral_instruction(
-            diff, primary, cache=brain._instruction_cache,  # type: ignore[arg-type]
-        )
-    desc = behavioral_desc or primary.description
-except Exception as e:
-    _log.debug("Behavioral extraction failed: %s", e)
-    desc = primary.description
+behavioral_desc = None
+try:
+    from gradata.enhancements.behavioral_extractor import extract_instruction
+    behavioral_desc = extract_instruction(
+        draft, final, primary, category=cat,
+    )
+except Exception as e:
+    _log.debug("Behavioral extractor failed; trying legacy fallback: %s", e)
+
+if not behavioral_desc:
+    try:
+        from gradata.enhancements.edit_classifier import (
+            extract_behavioral_instruction,
+        )
+        from gradata.enhancements.instruction_cache import InstructionCache
+        if not isinstance(brain._instruction_cache, InstructionCache):
+            brain._instruction_cache = InstructionCache(
+                lessons_path.parent / "instruction_cache.json"
+            )
+        behavioral_desc = extract_behavioral_instruction(
+            diff, primary, cache=brain._instruction_cache,  # type: ignore[arg-type]
+        )
+    except Exception as e:
+        _log.debug("Legacy behavioral extraction failed: %s", e)
+
+desc = behavioral_desc or primary.description
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_core.py` around lines 232 - 253, The try/except currently swallows exceptions from extract_instruction and directly falls back to primary.description, skipping the legacy extractor; update _core.py so that when extract_instruction raises you still attempt the legacy path: in the except block (or by restructuring the try) call extract_behavioral_instruction(...) using the same cache initialization logic around InstructionCache and brain._instruction_cache, assign its result to behavioral_desc, and then set desc = behavioral_desc or primary.description (same as the success path) so that legacy extraction runs on both falsy returns and exceptions from extract_instruction.
♻️ Duplicate comments (1)
tests/test_pipeline_e2e.py (1)
23-24: ⚠️ Potential issue | 🟠 Major
Remove the developer-specific cloud path from the test module.
Falling back to `C:/Users/olive/...` makes this suite host-specific and brittle on CI/non-Windows machines. Only prepend `GRADATA_CLOUD_PATH` when the env var is actually set; otherwise rely on the package import fallback.
As per coding guidelines, "tests/**: Test files. Verify: no hardcoded paths".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_pipeline_e2e.py` around lines 23 - 24, The test currently falls back to a hardcoded developer path via the _cloud_path variable and unconditionally calls sys.path.insert(0, _cloud_path); remove the hardcoded fallback and only prepend to sys.path when the GRADATA_CLOUD_PATH environment variable is set (i.e., check os.environ.get("GRADATA_CLOUD_PATH") and if it's truthy call sys.path.insert(0, path)); ensure no other hardcoded paths remain and rely on normal package imports when the env var is absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 416-425: In _try_llm_extract replace the nonexistent
llm_provider.extract call by building a prompt from draft, final, and
classification and invoking llm_provider.complete(prompt, max_tokens=...,
timeout=...) (use appropriate token/timeout values or existing defaults), then
pass the returned text to _is_actionable and return it if actionable; also
remove the silent except and log the caught Exception (include the exception
message and context) so failures to call complete are visible.
In `@src/gradata/enhancements/meta_rules_storage.py`:
- Around line 447-458: The graduation SELECT currently groups by pattern_hash
but selects non-aggregated columns category and representative_text, yielding
arbitrary rows; change the query to explicitly pick a representative row per
pattern_hash (e.g., use a subquery or window function like ROW_NUMBER() OVER
(PARTITION BY pattern_hash ORDER BY created_at DESC) to pick the
latest/highest-weight row) and then join that representative row to the
aggregates (COUNT(DISTINCT session_id), SUM(severity_weight), MIN(created_at),
MAX(created_at), GROUP_CONCAT(...)) so category and representative_text come
from the chosen representative (reference the query built where rows =
conn.execute(...) against correction_patterns).
In `@tests/test_core_behavioral.py`:
- Around line 16-17: The tests currently allow regressions because both
assertions are conditional on lessons_path.exists(); update tests in
tests/test_core_behavioral.py to assert lessons_path.exists() (i.e., that
lessons.md was created) before checking contents, and add a new test case that
patches gradata.enhancements.behavioral_extractor.extract_instruction to raise
an exception (use side_effect=Exception(...)) to simulate the _core.py failure
mode, then run the flow and assert the legacy description (the content written
by the legacy extractor) is persisted into lessons.md; reference the mocked
function name extract_instruction and the file/variable lessons_path/lessons.md
to locate where to add these checks.
In `@tests/test_pipeline_e2e.py`:
- Around line 67-70: The test is forcing every simulated correction to "major"
by using result.get("outcome", "major") when building session_corrections for
end_session(); instead, propagate the real severity from the correction itself
(or fall back to result/outcome if needed). Update the session_corrections
construction in tests/test_pipeline_e2e.py to set "severity" using
correction.get("severity", result.get("outcome", "major")) (or
correction["severity"] if you expect it always present), so the value returned
by brain.correct() and/or the correction entry drives severity-aware graduation
logic in end_session().
---
Outside diff comments:
In `@src/gradata/_core.py`:
- Around line 232-253: The try/except currently swallows exceptions from
extract_instruction and directly falls back to primary.description, skipping the
legacy extractor; update _core.py so that when extract_instruction raises you
still attempt the legacy path: in the except block (or by restructuring the try)
call extract_behavioral_instruction(...) using the same cache initialization
logic around InstructionCache and brain._instruction_cache, assign its result to
behavioral_desc, and then set desc = behavioral_desc or primary.description
(same as the success path) so that legacy extraction runs on both falsy returns
and exceptions from extract_instruction.
---
Duplicate comments:
In `@tests/test_pipeline_e2e.py`:
- Around line 23-24: The test currently falls back to a hardcoded developer path
via the _cloud_path variable and unconditionally calls sys.path.insert(0,
_cloud_path); remove the hardcoded fallback and only prepend to sys.path when
the GRADATA_CLOUD_PATH environment variable is set (i.e., check
os.environ.get("GRADATA_CLOUD_PATH") and if it's truthy call sys.path.insert(0,
path)); ensure no other hardcoded paths remain and rely on normal package
imports when the env var is absent.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 36c6accd-8e32-4391-8f3a-6b8b3129e802
📒 Files selected for processing (6)
src/gradata/_core.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/enhancements/meta_rules_storage.py, tests/test_convergence_gate.py, tests/test_core_behavioral.py, tests/test_pipeline_e2e.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_convergence_gate.py, tests/test_core_behavioral.py, tests/test_pipeline_e2e.py
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/_core.py, src/gradata/enhancements/meta_rules_storage.py, src/gradata/enhancements/behavioral_extractor.py
🪛 Ruff (0.15.9)
tests/test_convergence_gate.py
[warning] 48-49: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
tests/test_pipeline_e2e.py
[error] 93-93: Ambiguous variable name: l
(E741)
[error] 220-220: Ambiguous variable name: l
(E741)
src/gradata/enhancements/behavioral_extractor.py
[warning] 54-54: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
[warning] 168-168: Too many return statements (14 > 6)
(PLR0911)
[warning] 168-168: Too many branches (15 > 12)
(PLR0912)
[warning] 196-196: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 239-239: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 278-278: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 314-314: Too many return statements (19 > 6)
(PLR0911)
[warning] 314-314: Too many branches (18 > 12)
(PLR0912)
[warning] 314-314: Unused function argument: category
(ARG001)
[error] 424-425: try-except-pass detected, consider logging the exception
(S110)
[warning] 424-424: Do not catch blind exception: Exception
(BLE001)
Use getattr() instead of direct attribute access for spawn_queue to satisfy Pyright's type narrowing after hasattr() check.

Co-Authored-By: Gradata <noreply@gradata.ai>
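The pattern behind that commit, roughly (the `worker` object holding `spawn_queue` is an assumed stand-in):

```python
# Before: Pyright won't narrow worker.spawn_queue's type after hasattr()
if hasattr(worker, "spawn_queue"):
    worker.spawn_queue.put(task)  # flagged by Pyright

# After: getattr() with a None default yields a local that Pyright can narrow
queue = getattr(worker, "spawn_queue", None)
if queue is not None:
    queue.put(task)
```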
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.
Fixes applied successfully: fixed 94 file(s) based on 4 unresolved review comments.
Files modified:
Commit: the changes have been pushed to this branch.
Time taken:
Fixed 94 file(s) based on 4 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gradata/_core.py (1)
232-255: ⚠️ Potential issue | 🟠 Major
Keep the legacy extractor as the fallback when the new extractor throws.
Right now the broad `try` wraps both branches. If `extract_instruction()` raises, this code drops straight to `primary.description` and never calls `extract_behavioral_instruction()`, so the explicit fallback path disappears on exceptions.
Proposed fix
```diff
-try:
-    from gradata.enhancements.behavioral_extractor import (
-        extract_instruction,
-    )
-    behavioral_desc = extract_instruction(
-        draft, final, primary, category=cat,
-    )
-    if not behavioral_desc:
-        # Fallback to keyword templates
-        from gradata.enhancements.edit_classifier import (
-            extract_behavioral_instruction,
-        )
-        from gradata.enhancements.instruction_cache import InstructionCache
-        if not isinstance(brain._instruction_cache, InstructionCache):
-            brain._instruction_cache = InstructionCache(
-                lessons_path.parent / "instruction_cache.json"
-            )
-        behavioral_desc = extract_behavioral_instruction(
-            diff, primary, cache=brain._instruction_cache,  # type: ignore[arg-type]
-        )
-    desc = behavioral_desc or primary.description
-except Exception as e:
-    _log.debug("Behavioral extraction failed: %s", e)
-    desc = primary.description
+behavioral_desc = None
+try:
+    from gradata.enhancements.behavioral_extractor import (
+        extract_instruction,
+    )
+    behavioral_desc = extract_instruction(
+        draft, final, primary, category=cat,
+    )
+except Exception as e:
+    _log.debug("Primary behavioral extraction failed: %s", e)
+
+if not behavioral_desc:
+    try:
+        from gradata.enhancements.edit_classifier import (
+            extract_behavioral_instruction,
+        )
+        from gradata.enhancements.instruction_cache import InstructionCache
+        if not isinstance(brain._instruction_cache, InstructionCache):
+            brain._instruction_cache = InstructionCache(
+                lessons_path.parent / "instruction_cache.json"
+            )
+        behavioral_desc = extract_behavioral_instruction(
+            diff, primary, cache=brain._instruction_cache,  # type: ignore[arg-type]
+        )
+    except Exception as e:
+        _log.debug("Legacy behavioral extraction failed: %s", e)
+
+desc = behavioral_desc or primary.description
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_core.py` around lines 232 - 255, The try currently wraps both the new extractor and the fallback, so if extract_instruction(draft, final, primary, category=cat) raises you skip the fallback; change the logic so only the call to extract_instruction is wrapped in a try/except: call extract_instruction inside a small try block (catch Exception as e and log the failure), and if it returns falsy or raises, then proceed to import and call extract_behavioral_instruction(diff, primary, cache=brain._instruction_cache) (creating InstructionCache into brain._instruction_cache from lessons_path.parent / "instruction_cache.json" if needed) to produce behavioral_desc; finally set desc = behavioral_desc or primary.description.
♻️ Duplicate comments (5)
src/gradata/enhancements/meta_rules_storage.py (2)
345-486: ⚠️ Potential issue | 🟠 Major
The new correction-pattern pipeline is still not reachable from production flow.
These helpers are implemented here, but the provided `src/gradata/_core.py` still only counts `extract_patterns(...)` results and never calls `upsert_correction_patterns_batch(...)`, and `brain_end_session()` still does not call `query_graduation_candidates()`. As shipped, `correction_patterns` never influences graduation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/meta_rules_storage.py` around lines 345 - 486, The production flow never persists or uses correction patterns: ensure extract_patterns results are upserted and graduation queries are invoked by wiring the new helpers into the core session lifecycle—call upsert_correction_patterns_batch (or upsert_correction_pattern for single items) from the code path that currently handles extract_patterns results in src/gradata/_core.py so patterns are saved per session, and modify brain_end_session to call query_graduation_candidates (with appropriate min_sessions/min_score) and feed any returned candidates into the existing graduation path so correction_patterns influence meta-rule graduation.
445-486: ⚠️ Potential issue | 🟡 Minor
Always close the SQLite connection on query errors.
`conn.close()` only runs after `fetchall()` succeeds. If the SELECT fails, this connection leaks.
Proposed fix
```diff
 def query_graduation_candidates(
     db_path: str | Path,
     min_sessions: int = 2,
     min_score: float = 3.0,
 ) -> list[dict]:
@@
     """
     conn = sqlite3.connect(str(db_path))
-    conn.row_factory = sqlite3.Row
-    rows = conn.execute(
-        """WITH representative AS (
-            SELECT
-                pattern_hash,
-                category,
-                representative_text,
-                ROW_NUMBER() OVER (PARTITION BY pattern_hash ORDER BY created_at DESC) AS rn
-            FROM correction_patterns
-        ),
-        aggregates AS (
-            SELECT
-                pattern_hash,
-                COUNT(DISTINCT session_id) AS distinct_sessions,
-                SUM(severity_weight) AS weighted_score,
-                MIN(created_at) AS first_seen,
-                MAX(created_at) AS last_seen,
-                GROUP_CONCAT(DISTINCT session_id) AS session_ids
-            FROM correction_patterns
-            GROUP BY pattern_hash
-            HAVING COUNT(DISTINCT session_id) >= ?
-               AND SUM(severity_weight) >= ?
-        )
-        SELECT
-            r.pattern_hash,
-            r.category,
-            r.representative_text,
-            a.distinct_sessions,
-            a.weighted_score,
-            a.first_seen,
-            a.last_seen,
-            a.session_ids
-        FROM representative r
-        INNER JOIN aggregates a ON r.pattern_hash = a.pattern_hash
-        WHERE r.rn = 1
-        ORDER BY a.weighted_score DESC
-        """,
-        (min_sessions, min_score),
-    ).fetchall()
-    conn.close()
-    return [dict(r) for r in rows]
+    try:
+        conn.row_factory = sqlite3.Row
+        rows = conn.execute(
+            """WITH representative AS (
+                SELECT
+                    pattern_hash,
+                    category,
+                    representative_text,
+                    ROW_NUMBER() OVER (PARTITION BY pattern_hash ORDER BY created_at DESC) AS rn
+                FROM correction_patterns
+            ),
+            aggregates AS (
+                SELECT
+                    pattern_hash,
+                    COUNT(DISTINCT session_id) AS distinct_sessions,
+                    SUM(severity_weight) AS weighted_score,
+                    MIN(created_at) AS first_seen,
+                    MAX(created_at) AS last_seen,
+                    GROUP_CONCAT(DISTINCT session_id) AS session_ids
+                FROM correction_patterns
+                GROUP BY pattern_hash
+                HAVING COUNT(DISTINCT session_id) >= ?
+                   AND SUM(severity_weight) >= ?
+            )
+            SELECT
+                r.pattern_hash,
+                r.category,
+                r.representative_text,
+                a.distinct_sessions,
+                a.weighted_score,
+                a.first_seen,
+                a.last_seen,
+                a.session_ids
+            FROM representative r
+            INNER JOIN aggregates a ON r.pattern_hash = a.pattern_hash
+            WHERE r.rn = 1
+            ORDER BY a.weighted_score DESC
+            """,
+            (min_sessions, min_score),
+        ).fetchall()
+        return [dict(r) for r in rows]
+    finally:
+        conn.close()
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/meta_rules_storage.py` around lines 445 - 486, The DB connection created with sqlite3.connect(...) and closed via conn.close() can leak if conn.execute(...) or fetchall() raises; wrap the usage of conn (the sqlite3.Connection) in a try/finally or use the context manager form (with sqlite3.connect(str(db_path)) as conn:) so that conn.close() always runs, ensuring the query block that calls conn.execute(...).fetchall() and the subsequent return [dict(r) for r in rows] still work but the connection is closed on errors; update the code around the connect/execute/fetchall/conn.close calls accordingly.

tests/test_core_behavioral.py (1)
34-44: ⚠️ Potential issue | 🟠 Major
These tests still don't prove the legacy fallback ran.
`"INSTINCT" in content or "PATTERN" in content` and `len(content) > 0` will pass even when `_core.py` skips the legacy extractor and persists `primary.description`. Patch `gradata.enhancements.edit_classifier.extract_behavioral_instruction` to return a sentinel string and assert that exact string is written in both the `None` and exception branches.
As per coding guidelines, "tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness".
Proposed fix
```diff
 def test_correct_falls_back_to_old_description():
@@
-    with patch(
-        "gradata.enhancements.behavioral_extractor.extract_instruction",
-        return_value=None,
-    ):
+    with patch(
+        "gradata.enhancements.behavioral_extractor.extract_instruction",
+        return_value=None,
+    ), patch(
+        "gradata.enhancements.edit_classifier.extract_behavioral_instruction",
+        return_value="LEGACY SENTINEL",
+    ):
         brain.correct(
             draft="Dear Sir, We are pleased.",
             final="Hey, here's the deal.",
         )
@@
-    assert "INSTINCT" in content or "PATTERN" in content
+    assert "LEGACY SENTINEL" in content
@@
 def test_correct_persists_legacy_on_extractor_failure():
@@
-    with patch(
-        "gradata.enhancements.behavioral_extractor.extract_instruction",
-        side_effect=Exception("Simulated extraction failure"),
-    ):
+    with patch(
+        "gradata.enhancements.behavioral_extractor.extract_instruction",
+        side_effect=Exception("Simulated extraction failure"),
+    ), patch(
+        "gradata.enhancements.edit_classifier.extract_behavioral_instruction",
+        return_value="LEGACY SENTINEL",
+    ):
         brain.correct(
             draft="Dear Sir, We are pleased to inform you.",
             final="Hey, here's what we decided.",
         )
@@
-    assert "INSTINCT" in content or "PATTERN" in content or len(content) > 0
+    assert "LEGACY SENTINEL" in content
```
Also applies to: 52-63
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_core_behavioral.py` around lines 34 - 44, Update the test to verify the legacy fallback by making gradata.enhancements.edit_classifier.extract_behavioral_instruction return a unique sentinel string (e.g., "LEGACY_SENTINEL") and then assert that this exact sentinel is written to lessons.md in both the None and exception branches; specifically patch/mock extract_behavioral_instruction to return the sentinel and replace the loose checks ("INSTINCT" or "PATTERN" / len>0) with an exact equality/assertion against the sentinel, ensuring coverage for paths where _core.py falls back to persisting primary.description.src/gradata/enhancements/behavioral_extractor.py (1)
316-317: 🧹 Nitpick | 🔵 Trivial
Drop the unused `category` parameter from `generate_instruction()`.
`category` is still threaded through the API but never read, which makes the generator look category-aware when it is not.
Proposed fix
```diff
-def generate_instruction(match: ArchetypeMatch, category: str = "") -> str:
+def generate_instruction(match: ArchetypeMatch) -> str:
@@
-    instruction = generate_instruction(match, category)
+    instruction = generate_instruction(match)
```
Also applies to: 479-479
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/behavioral_extractor.py` around lines 316 - 317, The function generate_instruction currently accepts an unused category parameter; remove this parameter from its signature (def generate_instruction(match: ArchetypeMatch) -> str) and from any other duplicate definition or overload (the other occurrence at the second generate_instruction), and update all call sites to stop passing category so the API no longer falsely suggests category-awareness; ensure any documentation/type hints referencing generate_instruction's category argument are cleaned up as well.tests/test_pipeline_e2e.py (1)
20-25: ⚠️ Potential issue | 🟠 Major
Remove the machine-specific cloud path from the test bootstrap.
Falling back to `C:/Users/olive/...` makes this suite environment-specific. Only prepend `GRADATA_CLOUD_PATH` when it is explicitly set; otherwise rely on the package import fallback.
As per coding guidelines, "tests/**: Test files. Verify: no hardcoded paths".
Proposed fix
```diff
 # Try cloud-only override first (real discovery), fall back to SDK stubs
 _CLOUD_DISCOVERY = False
+_cloud_path = os.environ.get("GRADATA_CLOUD_PATH")
+if _cloud_path:
+    sys.path.insert(0, _cloud_path)
 try:
-    _cloud_path = os.environ.get("GRADATA_CLOUD_PATH", "C:/Users/olive/SpritesWork/brain/cloud-only")
-    sys.path.insert(0, _cloud_path)
     from meta_rules import discover_meta_rules, merge_into_meta  # type: ignore[import]
     _CLOUD_DISCOVERY = True
 except ImportError:
     from gradata.enhancements.meta_rules import discover_meta_rules
```
Verify each finding against the current code and only fix it if needed. In `@tests/test_pipeline_e2e.py` around lines 20 - 25, The test bootstrap currently hardcodes a machine-specific fallback path in _cloud_path and always prepends it to sys.path; change this so GRADATA_CLOUD_PATH is used only when explicitly set: if os.environ.get("GRADATA_CLOUD_PATH") is truthy, set _cloud_path to that value and insert it into sys.path before attempting from meta_rules import discover_meta_rules, merge_into_meta, otherwise do not modify sys.path and let the normal package import fallback occur; keep the existing _CLOUD_DISCOVERY flag and import statement names unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/_context_compile.py`:
- Line 35: The current re.split call (parts = re.split(r"\s*[—\-\-]\s*", stem,
maxsplit=1)) uses a character class and incorrectly splits on single hyphens;
update the regex to match the whole separator (em dash, double hyphen, or a
hyphen used as a separator with surrounding spaces) so names with internal
hyphens aren't split: replace the pattern with something like
r"(?:\s*—\s*|\s*--\s*|\s+-\s+)" in the re.split call (referencing the same parts
variable and the re.split invocation).
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 211-214: The current substring checks on _CONSTRAINT_WORDS falsely
match parts of words; change the detection to use regex word-boundary searches
instead: for each word in _CONSTRAINT_WORDS use re.search(r'\b' +
re.escape(word) + r'\b', final, flags=re.IGNORECASE) and ensure the same word is
NOT found in draft using the same word-boundary search before adding to
new_constraints; keep the subsequent call to
_find_sentence_containing(new_match, ...) the same (use the matched constraint
token), and add an import for the re module if missing.
In `@tests/test_pipeline_e2e.py`:
- Around line 95-96: The test is asserting the wrong set of severity labels;
update the assertion that checks the variable severity (computed as
result.get("outcome") or result.get("data", {}).get("severity")) to use the
SDK's real labels: "as-is", "minor", "moderate", "major", and "discarded"
instead of the current ("trivial","minor","moderate","major","rewrite"); ensure
the assertion accepts only that exact set.
---
Outside diff comments:
In `@src/gradata/_core.py`:
- Around line 232-255: The try currently wraps both the new extractor and the
fallback, so if extract_instruction(draft, final, primary, category=cat) raises
you skip the fallback; change the logic so only the call to extract_instruction
is wrapped in a try/except: call extract_instruction inside a small try block
(catch Exception as e and log the failure), and if it returns falsy or raises,
then proceed to import and call extract_behavioral_instruction(diff, primary,
cache=brain._instruction_cache) (creating InstructionCache into
brain._instruction_cache from lessons_path.parent / "instruction_cache.json" if
needed) to produce behavioral_desc; finally set desc = behavioral_desc or
primary.description.
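Sketched against the names used in this prompt (the surrounding `_core.py` context is assumed, not quoted), the narrowed control flow would look roughly like:

```python
# Illustrative only; variable and logger names come from the review comment.
behavioral_desc = None
try:
    behavioral_desc = extract_instruction(draft, final, primary, category=cat)
except Exception:
    logger.exception("behavioral extraction failed; falling back to legacy extractor")

if not behavioral_desc:
    from gradata.enhancements.edit_classifier import extract_behavioral_instruction
    # The fallback now runs whether the new extractor returned None or raised.
    behavioral_desc = extract_behavioral_instruction(diff, primary, cache=brain._instruction_cache)

desc = behavioral_desc or primary.description
```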
---
Duplicate comments:
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 316-317: The function generate_instruction currently accepts an
unused category parameter; remove this parameter from its signature (def
generate_instruction(match: ArchetypeMatch) -> str) and from any other duplicate
definition or overload (the other occurrence at the second
generate_instruction), and update all call sites to stop passing category so the
API no longer falsely suggests category-awareness; ensure any documentation/type
hints referencing generate_instruction's category argument are cleaned up as
well.
In `@src/gradata/enhancements/meta_rules_storage.py`:
- Around line 345-486: The production flow never persists or uses correction
patterns: ensure extract_patterns results are upserted and graduation queries
are invoked by wiring the new helpers into the core session lifecycle—call
upsert_correction_patterns_batch (or upsert_correction_pattern for single items)
from the code path that currently handles extract_patterns results in
src/gradata/_core.py so patterns are saved per session, and modify
brain_end_session to call query_graduation_candidates (with appropriate
min_sessions/min_score) and feed any returned candidates into the existing
graduation path so correction_patterns influence meta-rule graduation.
- Around line 445-486: The DB connection created with sqlite3.connect(...) and
closed via conn.close() can leak if conn.execute(...) or fetchall() raises; wrap
the usage of conn (the sqlite3.Connection) in a try/finally or use the context
manager form (with sqlite3.connect(str(db_path)) as conn:) so that conn.close()
always runs, ensuring the query block that calls conn.execute(...).fetchall()
and the subsequent return [dict(r) for r in rows] still work but the connection
is closed on errors; update the code around the
connect/execute/fetchall/conn.close calls accordingly.
In `@tests/test_core_behavioral.py`:
- Around line 34-44: Update the test to verify the legacy fallback by making
gradata.enhancements.edit_classifier.extract_behavioral_instruction return a
unique sentinel string (e.g., "LEGACY_SENTINEL") and then assert that this exact
sentinel is written to lessons.md in both the None and exception branches;
specifically patch/mock extract_behavioral_instruction to return the sentinel
and replace the loose checks ("INSTINCT" or "PATTERN" / len>0) with an exact
equality/assertion against the sentinel, ensuring coverage for paths where
_core.py falls back to persisting primary.description.
In `@tests/test_pipeline_e2e.py`:
- Around line 20-25: The test bootstrap currently hardcodes a machine-specific
fallback path in _cloud_path and always prepends it to sys.path; change this so
GRADATA_CLOUD_PATH is used only when explicitly set: if
os.environ.get("GRADATA_CLOUD_PATH") is truthy, set _cloud_path to that value
and insert it into sys.path before attempting from meta_rules import
discover_meta_rules, merge_into_meta, otherwise do not modify sys.path and let
the normal package import fallback occur; keep the existing _CLOUD_DISCOVERY
flag and import statement names unchanged.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3a290ff3-aba0-4383-b14b-d129664a0a91
📒 Files selected for processing (6)
- src/gradata/_context_compile.py
- src/gradata/_core.py
- src/gradata/enhancements/behavioral_extractor.py
- src/gradata/enhancements/meta_rules_storage.py
- tests/test_core_behavioral.py
- tests/test_pipeline_e2e.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
- src/gradata/_context_compile.py
- src/gradata/enhancements/meta_rules_storage.py
- src/gradata/enhancements/behavioral_extractor.py
- src/gradata/_core.py
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
- tests/test_core_behavioral.py
- tests/test_pipeline_e2e.py
🪛 Ruff (0.15.9)
src/gradata/enhancements/behavioral_extractor.py
[warning] 56-56: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
[warning] 170-170: Too many return statements (14 > 6)
(PLR0911)
[warning] 170-170: Too many branches (15 > 12)
(PLR0912)
[warning] 198-198: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 241-241: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 280-280: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 316-316: Too many return statements (19 > 6)
(PLR0911)
[warning] 316-316: Too many branches (18 > 12)
(PLR0912)
[warning] 316-316: Unused function argument: category
(ARG001)
[warning] 438-438: Do not catch blind exception: Exception
(BLE001)
tests/test_pipeline_e2e.py
[error] 102-102: Ambiguous variable name: l
(E741)
[error] 229-229: Ambiguous variable name: l
(E741)
src/gradata/_core.py
[warning] 105-105: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 498-498: Missing return type annotation for private function _lesson_key
Add return type annotation: str
(ANN202)
[warning] 536-536: Do not catch blind exception: Exception
(BLE001)
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully
Fixed 3 file(s) based on 3 unresolved review comments. Files modified:

Commit:
The changes have been pushed to the
Time taken:
Fixed 3 file(s) based on 3 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Actionable comments posted: 7
♻️ Duplicate comments (1)
src/gradata/_context_compile.py (1)
35-35: ⚠️ Potential issue | 🟠 Major
Reintroduce en-dash support in the separator regex.
This pattern dropped `–` (en dash), so stems like `Jane Doe – Acme` won't split anymore, which regresses extraction and diverges from `src/gradata/_tag_taxonomy.py` and `src/gradata/_fact_extractor.py`.
Proposed fix
```diff
-    parts = re.split(r"(?:\s*—\s*|\s*--\s*|\s+-\s+)", stem, maxsplit=1)
+    parts = re.split(r"(?:\s*[—–]\s*|\s*--\s*|\s+-\s+)", stem, maxsplit=1)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_context_compile.py` at line 35, The separator regex used in the re.split call that assigns parts (re.split(..., stem, maxsplit=1)) no longer matches the en-dash (–), breaking splits like "Jane Doe – Acme"; update the pattern used in the parts = re.split(...) expression to include the en-dash (either as the literal character or \u2013) alongside the existing em-dash/hyphen variants so it matches `—`, `–`, `--`, and ` - `, keeping behavior consistent with src/gradata/_tag_taxonomy.py and src/gradata/_fact_extractor.py.
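For illustration, a standalone check of the proposed pattern on representative stems (plain `re`, no project code assumed):

```python
import re

# Proposed separator: em dash, en dash, double hyphen, or a single hyphen
# only when surrounded by spaces (so internal hyphens survive).
SEP = re.compile(r"(?:\s*[—–]\s*|\s*--\s*|\s+-\s+)")

for stem in ("Jane Doe – Acme", "Jane Doe — Acme", "Mary-Anne Smith -- Globex"):
    print(SEP.split(stem, maxsplit=1))
# ['Jane Doe', 'Acme'], ['Jane Doe', 'Acme'], ['Mary-Anne Smith', 'Globex']
```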
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 53-57: The comment on the ArchetypeMatch dataclass for the
confidence field uses an EN DASH; update the comment in the ArchetypeMatch
definition (the confidence: float comment) to replace the EN DASH (`–`) with a
standard hyphen-minus (`-`) so it reads e.g. `# 0.0-1.0`, ensuring editors/tools
won't misinterpret the character.
- Around line 473-474: The docstring for the llm_provider is incorrect: update
the description to match the actual interface used by _try_llm_extract — instead
of extract(draft, final, classification) -> str, document that llm_provider
exposes a complete(prompt: str, max_tokens: int = ..., timeout: int = ...) ->
str (or similar synchronous completion API that accepts a prompt and optional
parameters and returns the model text); reference llm_provider and the
_try_llm_extract function in the behavioral_extractor module and ensure the
docstring mentions the expected parameters (prompt, max_tokens, timeout) and
return type (str).
- Around line 196-199: Zip calls pairing draft_sent_sets and final_sent_sets
(and the other zip usages involving draft_sents/final_sents) are currently
unchecked; change the zip(...) usages to zip(..., strict=True) to force length
equality at runtime. Update the list comprehension that builds added_sents
(using final_sents, final_sent_sets, draft_sent_sets and _sentence_overlap) and
the other places where zip is used (the other pairings referenced in this file)
to include strict=True so mismatched lengths raise immediately.
In `@tests/test_pipeline_e2e.py`:
- Around line 209-211: The test currently asserts a float equality on
m.context_weights["sales"] using ==; change this to use pytest.approx by
replacing the direct equality with assert m.context_weights["sales"] ==
pytest.approx(1.5) and ensure pytest is imported in the test module (add import
pytest at top if missing) so floating-point comparisons follow the project's
guideline.
- Around line 101-103: The list comprehension uses an ambiguous loop variable
name `l`; rename it to a descriptive name like `lesson` in the comprehension
that builds `process_lessons` (where you call fresh_brain._load_lessons()) so it
becomes [lesson for lesson in lessons if lesson.category == "PROCESS"], and
update any references to the loop variable (the comprehension itself and nearby
code/comments) to use `lesson` for clarity.
- Around line 228-236: The loop uses an ambiguous single-letter variable "l";
rename it to "lesson" throughout the loop that builds promoted (the for loop
iterating over lessons) and update all references where Lesson(...),
lesson.category, lesson.date, lesson.description are used so the logic is
unchanged (keep the condition checking lesson.category == "PROCESS" and
constructing Lesson(..., state=LessonState.RULE, confidence=0.90) for promoted).
Ensure the variable name change is applied consistently in that block to avoid
shadowing or unresolved names.
- Around line 251-258: The test test_same_correction_twice_same_session
currently only asserts r1 and r2 are not None; update it to assert concrete
expected values from fresh_brain.correct (inspect the return shape) — for
example assert r1 contains an identifier (e.g., "id" in r1) and that r1["id"] ==
r2["id"] if deduplication should return the same record, or assert r1["created"]
is True and r2["created"] is False (or assert a "deduplicated" flag) to
explicitly verify deduplication behavior when calling fresh_brain.correct with
the same SALES_CORRECTIONS[0] and session=95.
---
Duplicate comments:
In `@src/gradata/_context_compile.py`:
- Line 35: The separator regex used in the re.split call that assigns parts
(re.split(..., stem, maxsplit=1)) no longer matches the en-dash (–), breaking
splits like "Jane Doe – Acme"; update the pattern used in the parts =
re.split(...) expression to include the en-dash (either as the literal character
or \u2013) alongside the existing em-dash/hyphen variants so it matches `—`,
`–`, `--`, and ` - `, keeping behavior consistent with
src/gradata/_tag_taxonomy.py and src/gradata/_fact_extractor.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0075571e-9add-451b-a01f-9d2832ee5e6f
📒 Files selected for processing (3)
- src/gradata/_context_compile.py
- src/gradata/enhancements/behavioral_extractor.py
- tests/test_pipeline_e2e.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
- src/gradata/_context_compile.py
- src/gradata/enhancements/behavioral_extractor.py
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_pipeline_e2e.py
🪛 Ruff (0.15.9)
tests/test_pipeline_e2e.py
[error] 102-102: Ambiguous variable name: l
(E741)
[error] 229-229: Ambiguous variable name: l
(E741)
src/gradata/enhancements/behavioral_extractor.py
[warning] 56-56: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
[warning] 170-170: Too many return statements (14 > 6)
(PLR0911)
[warning] 170-170: Too many branches (15 > 12)
(PLR0912)
[warning] 198-198: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 242-242: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 281-281: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 317-317: Too many return statements (19 > 6)
(PLR0911)
[warning] 317-317: Too many branches (18 > 12)
(PLR0912)
[warning] 317-317: Unused function argument: category
(ARG001)
[warning] 439-439: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (8)
src/gradata/enhancements/behavioral_extractor.py (5)
317-317: Unused `category` parameter.
The `category` parameter is declared but never used in the function body. This was flagged in a previous review.

1-31: LGTM - Module setup and imports.
Proper use of `from __future__ import annotations`, logging setup, and the TYPE_CHECKING guard for the `EditClassification` import.

64-96: LGTM - Detection vocabulary constants.
The frozensets and compiled regex patterns are well-organized and appropriate for the detection logic.

103-164: LGTM - Sentence-level helper functions.
Clean, pure functions with clear purposes. `_sentence_overlap` correctly returns a Jaccard index in [0, 1], and `_to_imperative` handles a good variety of prefix patterns.
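For reference, the Jaccard overlap is simple enough to sketch inline; this is an illustrative stand-in, not the module's actual `_sentence_overlap` body:

```python
def sentence_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard index of two token sets: |A ∩ B| / |A ∪ B|, always in [0, 1]."""
    if not a and not b:
        return 0.0  # assumed convention for two empty sentences
    return len(a & b) / len(a | b)

# Two of four distinct tokens shared -> 0.5
assert sentence_overlap({"use", "casual", "tone"}, {"use", "formal", "tone"}) == 0.5
```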
419-445: LGTM - LLM extraction with proper error handling.
The function now correctly uses `llm_provider.complete()` with appropriate parameters and logs failures at warning level. The broad exception catch is acceptable here since we want to gracefully handle any LLM provider failure.

tests/test_pipeline_e2e.py (3)

22-28: Hardcoded Windows path still present.
Line 23 contains a hardcoded Windows-specific path `C:/Users/olive/SpritesWork/brain/cloud-only`. This was flagged in a previous review but remains unchanged. The fallback import works, but the hardcoded path will cause issues in CI and other environments.

45-84: LGTM - Test fixtures and helper function.
`SALES_CORRECTIONS` provides good test data, and `_simulate_session` properly extracts severity with a robust fallback chain matching the SDK's return structure.

282-304: LGTM - Pattern tracking test with specific assertions.
Good test coverage for the correction pattern tracking functionality. Assertions check specific values (`pattern_hash`, `distinct_sessions >= 3`, `weighted_score >= 3.0`).
```python
@dataclass
class ArchetypeMatch:
    archetype: Archetype
    confidence: float  # 0.0–1.0
    context: dict  # archetype-specific extracted data
```
🧹 Nitpick | 🔵 Trivial
Minor: EN DASH in comment.
Line 56 uses an EN DASH (–) instead of a hyphen-minus (-) in the comment # 0.0–1.0. While functionally harmless, it can cause issues with some tools and editors.
Proposed fix
```diff
 @dataclass
 class ArchetypeMatch:
     archetype: Archetype
-    confidence: float  # 0.0–1.0
+    confidence: float  # 0.0-1.0
     context: dict  # archetype-specific extracted data
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 56-56: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/behavioral_extractor.py` around lines 53 - 57, The
comment on the ArchetypeMatch dataclass for the confidence field uses an EN
DASH; update the comment in the ArchetypeMatch definition (the confidence: float
comment) to replace the EN DASH (`–`) with a standard hyphen-minus (`-`) so it
reads e.g. `# 0.0-1.0`, ensuring editors/tools won't misinterpret the character.
```python
draft_sent_sets = [set(s.lower().split()) for s in draft_sents]
final_sent_sets = [set(s.lower().split()) for s in final_sents]
added_sents = [s for s, ws in zip(final_sents, final_sent_sets)
               if not any(_sentence_overlap(ws, ds) > 0.5 for ds in draft_sent_sets)]
```
🧹 Nitpick | 🔵 Trivial
Consider adding strict=True to zip() calls for defensive coding.
Lines 198, 242, and 281 use zip() without the strict= parameter. While the lists are constructed together and guaranteed to have the same length, adding strict=True (Python 3.10+) would catch unexpected mismatches during future refactoring.
Example for line 198
```diff
-    added_sents = [s for s, ws in zip(final_sents, final_sent_sets)
+    added_sents = [s for s, ws in zip(final_sents, final_sent_sets, strict=True)
                    if not any(_sentence_overlap(ws, ds) > 0.5 for ds in draft_sent_sets)]
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 198-198: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/behavioral_extractor.py` around lines 196 - 199, Zip
calls pairing draft_sent_sets and final_sent_sets (and the other zip usages
involving draft_sents/final_sents) are currently unchecked; change the zip(...)
usages to zip(..., strict=True) to force length equality at runtime. Update the
list comprehension that builds added_sents (using final_sents, final_sent_sets,
draft_sent_sets and _sentence_overlap) and the other places where zip is used
(the other pairings referenced in this file) to include strict=True so
mismatched lengths raise immediately.
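A quick standalone demonstration of what `strict=True` buys (plain Python 3.10+, no project code assumed):

```python
# Without strict, zip() silently truncates to the shorter iterable:
assert list(zip([1, 2, 3], ["a", "b"])) == [(1, "a"), (2, "b")]

# With strict=True, a length mismatch fails loudly instead:
try:
    list(zip([1, 2, 3], ["a", "b"], strict=True))
except ValueError as exc:
    print(exc)  # "zip() argument 2 is shorter than argument 1"
```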
```
llm_provider: Optional LLM provider for refinement of low-confidence matches.
    Interface: llm_provider.extract(draft, final, classification) -> str
```
Docstring/implementation mismatch for llm_provider interface.
The docstring states:
llm_provider: Optional LLM provider for refinement of low-confidence matches.
Interface: llm_provider.extract(draft, final, classification) -> str
But the implementation in _try_llm_extract (line 435) calls llm_provider.complete(prompt, max_tokens=100, timeout=10). Update the docstring to reflect the actual interface.
Proposed fix
```diff
     category: Correction category (DRAFTING, PROCESS, etc.)
-    llm_provider: Optional LLM provider for refinement of low-confidence matches.
-        Interface: llm_provider.extract(draft, final, classification) -> str
+    llm_provider: Optional LLM provider for refinement of low-confidence matches.
+        Interface: llm_provider.complete(prompt, *, max_tokens, timeout) -> str
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
llm_provider: Optional LLM provider for refinement of low-confidence matches.
    Interface: llm_provider.complete(prompt, *, max_tokens, timeout) -> str
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/behavioral_extractor.py` around lines 473 - 474, The
docstring for the llm_provider is incorrect: update the description to match the
actual interface used by _try_llm_extract — instead of extract(draft, final,
classification) -> str, document that llm_provider exposes a complete(prompt:
str, max_tokens: int = ..., timeout: int = ...) -> str (or similar synchronous
completion API that accepts a prompt and optional parameters and returns the
model text); reference llm_provider and the _try_llm_extract function in the
behavioral_extractor module and ensure the docstring mentions the expected
parameters (prompt, max_tokens, timeout) and return type (str).
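To make the expected contract explicit, a hypothetical `typing.Protocol` for the provider might read as follows (the Protocol name and default values are illustrative, not part of the SDK):

```python
from typing import Protocol

class LLMProvider(Protocol):
    """Assumed shape of the provider that _try_llm_extract calls."""

    def complete(self, prompt: str, *, max_tokens: int = 100, timeout: int = 10) -> str:
        """Return the model's completion text for a single prompt."""
        ...
```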
```python
lessons = fresh_brain._load_lessons()
process_lessons = [l for l in lessons if l.category == "PROCESS"]
assert len(process_lessons) > 0, "Should have PROCESS lessons after 3 corrections"
```
🧹 Nitpick | 🔵 Trivial
Rename ambiguous variable l to lesson.
The variable name l is ambiguous (easily confused with 1 or I). Use a descriptive name like lesson for clarity.
Proposed fix
```diff
 lessons = fresh_brain._load_lessons()
-process_lessons = [l for l in lessons if l.category == "PROCESS"]
+process_lessons = [lesson for lesson in lessons if lesson.category == "PROCESS"]
 assert len(process_lessons) > 0, "Should have PROCESS lessons after 3 corrections"
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
lessons = fresh_brain._load_lessons()
process_lessons = [lesson for lesson in lessons if lesson.category == "PROCESS"]
assert len(process_lessons) > 0, "Should have PROCESS lessons after 3 corrections"
```
🧰 Tools
🪛 Ruff (0.15.9)
[error] 102-102: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pipeline_e2e.py` around lines 101 - 103, The list comprehension
uses an ambiguous loop variable name `l`; rename it to a descriptive name like
`lesson` in the comprehension that builds `process_lessons` (where you call
fresh_brain._load_lessons()) so it becomes [lesson for lesson in lessons if
lesson.category == "PROCESS"], and update any references to the loop variable
(the comprehension itself and nearby code/comments) to use `lesson` for clarity.
```python
assert m.applies_when == ["task_type=sales", "session_type=sales"]
assert m.never_when == ["task_type=system"]
assert m.context_weights["sales"] == 1.5
```
Use pytest.approx for float comparison.
Line 211 compares a float value directly with ==. Per coding guidelines, floating-point comparisons should use pytest.approx to handle potential precision issues from serialization/deserialization.
Proposed fix
```diff
 assert m.applies_when == ["task_type=sales", "session_type=sales"]
 assert m.never_when == ["task_type=system"]
-assert m.context_weights["sales"] == 1.5
+assert m.context_weights["sales"] == pytest.approx(1.5)
```
As per coding guidelines: "tests/**: floating point comparisons use pytest.approx"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pipeline_e2e.py` around lines 209 - 211, The test currently
asserts a float equality on m.context_weights["sales"] using ==; change this to
use pytest.approx by replacing the direct equality with assert
m.context_weights["sales"] == pytest.approx(1.5) and ensure pytest is imported
in the test module (add import pytest at top if missing) so floating-point
comparisons follow the project's guideline.
```python
promoted = []
for l in lessons:
    if l.category == "PROCESS":
        promoted.append(Lesson(
            date=l.date, state=LessonState.RULE, confidence=0.90,
            category=l.category, description=l.description,
        ))
    else:
        promoted.append(l)
```
🧹 Nitpick | 🔵 Trivial
Rename ambiguous variable l to lesson.
Same issue as line 102 - the variable name l should be renamed for clarity.
Proposed fix
```diff
 # Promote lessons to RULE (simulating what graduation does over many sessions)
 promoted = []
-for l in lessons:
-    if l.category == "PROCESS":
+for lesson in lessons:
+    if lesson.category == "PROCESS":
         promoted.append(Lesson(
-            date=l.date, state=LessonState.RULE, confidence=0.90,
-            category=l.category, description=l.description,
+            date=lesson.date, state=LessonState.RULE, confidence=0.90,
+            category=lesson.category, description=lesson.description,
         ))
     else:
-        promoted.append(l)
+        promoted.append(lesson)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
promoted = []
for lesson in lessons:
    if lesson.category == "PROCESS":
        promoted.append(Lesson(
            date=lesson.date, state=LessonState.RULE, confidence=0.90,
            category=lesson.category, description=lesson.description,
        ))
    else:
        promoted.append(lesson)
```
🧰 Tools
🪛 Ruff (0.15.9)
[error] 229-229: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pipeline_e2e.py` around lines 228 - 236, The loop uses an
ambiguous single-letter variable "l"; rename it to "lesson" throughout the loop
that builds promoted (the for loop iterating over lessons) and update all
references where Lesson(...), lesson.category, lesson.date, lesson.description
are used so the logic is unchanged (keep the condition checking lesson.category
== "PROCESS" and constructing Lesson(..., state=LessonState.RULE,
confidence=0.90) for promoted). Ensure the variable name change is applied
consistently in that block to avoid shadowing or unresolved names.
```python
def test_same_correction_twice_same_session(self, fresh_brain):
    corr = SALES_CORRECTIONS[0]
    r1 = fresh_brain.correct(draft=corr["draft"], final=corr["final"],
                             category=corr["category"], session=95)
    r2 = fresh_brain.correct(draft=corr["draft"], final=corr["final"],
                             category=corr["category"], session=95)
    assert r1 is not None
    assert r2 is not None
```
🧹 Nitpick | 🔵 Trivial
Strengthen assertions beyond truthiness checks.
The assertions only verify that results are not None. Per coding guidelines, assertions should check specific values. Consider adding assertions that verify expected deduplication behavior (e.g., both corrections succeeded, or a deduplication indicator is present).
Proposed improvement
```diff
 def test_same_correction_twice_same_session(self, fresh_brain):
     corr = SALES_CORRECTIONS[0]
     r1 = fresh_brain.correct(draft=corr["draft"], final=corr["final"],
                              category=corr["category"], session=95)
     r2 = fresh_brain.correct(draft=corr["draft"], final=corr["final"],
                              category=corr["category"], session=95)
-    assert r1 is not None
-    assert r2 is not None
+    assert r1 is not None, "First correction should succeed"
+    assert r2 is not None, "Duplicate correction should succeed (not crash)"
+    # Verify both have valid severity outcomes
+    assert r1.get("outcome") or r1.get("data", {}).get("severity")
+    assert r2.get("outcome") or r2.get("data", {}).get("severity")
```
As per coding guidelines: "tests/**: assertions check specific values not just truthiness"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pipeline_e2e.py` around lines 251 - 258, The test
test_same_correction_twice_same_session currently only asserts r1 and r2 are not
None; update it to assert concrete expected values from fresh_brain.correct
(inspect the return shape) — for example assert r1 contains an identifier (e.g.,
"id" in r1) and that r1["id"] == r2["id"] if deduplication should return the
same record, or assert r1["created"] is True and r2["created"] is False (or
assert a "deduplicated" flag) to explicitly verify deduplication behavior when
calling fresh_brain.correct with the same SALES_CORRECTIONS[0] and session=95.
Merged github/main into feat/sdk-hook-port. Resolved conflicts in _core.py, behavioral_extractor.py, meta_rules_storage.py, test_pipeline_e2e.py. Kept main's meta-rule logic + feature branch's CodeRabbit fixes. 1775 tests pass. Co-Authored-By: Gradata <noreply@gradata.ai>
…y + behavioral extraction
- Remove hardcoded Windows path in test_pipeline_e2e.py, use env var only (P1)
- Wrap query_graduation_candidates in try/finally to prevent SQLite connection leak (P1)
- Restore en-dash (U+2013) in _context_compile.py prospect name splitting regex (P1)
- Fix tautological assertion in test_core_behavioral.py fallback test (Major)
- Fix docstring: llm_provider.extract() -> llm_provider.complete() (Major)
- Remove unused category parameter from generate_instruction() (Minor)
- Downgrade _try_llm_extract exception log from warning to debug (Minor)
- Use pytest.approx for float comparison in test_pipeline_e2e.py (Minor)

Co-Authored-By: Gradata <noreply@gradata.ai>
Summary
What changed
- behavioral_extractor.py
- meta_rules_storage.py
- _core.py
- test_pipeline_e2e.py
- test_convergence_gate.py
- test_core_behavioral.py

Test plan
Generated with Gradata
Greptile Summary
This PR introduces a sentence-level archetype-based behavioral extraction pipeline as the primary extraction path, replacing the previous word-diff approach. It adds 12 correction archetypes (`behavioral_extractor.py`), cross-session pattern tracking with severity-weighted graduation scoring (`meta_rules_storage.py`), wires the new extractor into the core correction flow (`_core.py`), and backs everything with 12 new e2e pipeline tests.

Key changes:

- `behavioral_extractor.py`: new 497-line module implementing 12 archetypes, a template generator, a quality gate, and an LLM refinement hook
- `meta_rules_storage.py`: new `correction_patterns` table with batch upsert and a graduation-candidate query
- `_core.py`: switched to `extract_instruction()` as the primary path, with a keyword-template fallback when it returns `None`
- `_context_compile.py`: regex separator pattern refined to `(?:\s*—\s*|\s*--\s*|\s+-\s+)` (the en-dash drop from the previous review remains unaddressed)
- `test_pipeline_e2e.py`: 12 new e2e tests, including cloud-gated `@_requires_cloud` tests and a pattern-tracking roundtrip test
Safe to merge after fixing the llm_provider docstring mismatch; all runtime paths use the correct .complete() interface today.
The core logic is solid — archetype detection, template generation, quality gate, and the _core.py wiring are all correct. The previously flagged connection leak and hardcoded path are still open but non-blocking. The one new P1 is a docstring documenting the wrong interface method for llm_provider, which causes AttributeError for future custom provider authors — a targeted one-line fix. All 1699 tests pass.
src/gradata/enhancements/behavioral_extractor.py (docstring interface mismatch), src/gradata/_context_compile.py (still missing en-dash in separator regex, missing from future import annotations)
Vulnerabilities
No security concerns identified. The LLM prompt built in `_try_llm_extract` embeds `draft` and `final` text directly, but since this is an internal SDK module and both values come from authenticated user corrections (not external untrusted input), there is no injection risk in the current usage path.

Important Files Changed
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["brain.correct(draft, final)"] --> B{Category converged?}
    B -- Yes --> C[Use primary.description]
    B -- No --> D["extract_instruction(draft, final, classification)"]
    D --> E[detect_archetype]
    E --> F[generate_instruction via template]
    F --> G{_is_actionable?}
    G -- Yes, confidence >= 0.60 --> H[Return instruction]
    G -- Yes, confidence < 0.60 --> I{llm_provider connected?}
    I -- Yes --> J[_try_llm_extract → .complete]
    I -- No --> H
    J -- Refined --> K[Return refined]
    J -- Failed --> H
    G -- No --> L{llm_provider connected?}
    L -- Yes --> J
    L -- No --> M[Generic category fallback]
    H --> N{behavioral_desc returned?}
    M --> N
    N -- Yes --> O[Use behavioral_desc as lesson description]
    N -- No --> P["Fallback: extract_behavioral_instruction (keyword templates)"]
    P --> Q[Store lesson]
    O --> Q
```

Prompt To Fix All With AI
Reviews (5): Last reviewed commit: "fix: apply CodeRabbit auto-fixes"
Context used:
Rule 1: Never use print() ... (source)