feat: behavioral extraction + business reposition #4

Merged
Gradata merged 18 commits into main from feat/behavioral-extraction
Apr 6, 2026
Conversation

@Gradata Gradata (Owner) commented Apr 6, 2026

Summary

  • Behavioral extraction engine: Corrections now produce actionable instructions ("Use casual, direct tone") instead of useless diff fingerprints ("Content change (added: getattr)"). Hybrid approach: cache → template → LLM fallback (sketched after this list).
  • Business reposition: README, pyproject.toml, and marketing strategy rewritten around "AI that learns your judgment" — not "procedural memory for AI agents."
  • Session counter fix: Auto-session-note hook no longer inflates session count from subagent/autoresearch runs.
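A minimal sketch of that cache → template → LLM resolution order, under stated assumptions: `extract_behavioral_instruction`, `_match_template`, `_call_llm_for_instruction`, and the cache's `make_key`/`get`/`put` are named in this PR; the exact signatures and the `_added_words`/`_removed_words` helpers are hypothetical.

```python
# Hedged sketch of the hybrid resolution order; helper names beyond those
# named in the PR are hypothetical.
def extract_behavioral_instruction(diff, classification, cache=None, llm_enabled=True):
    key = None
    if cache is not None:
        key = cache.make_key(classification.category, _added_words(diff), _removed_words(diff))
        cached = cache.get(key)
        if cached:
            return cached  # 1. cache hit: reuse a previously extracted instruction
    instruction = _match_template(classification.description)  # 2. template match
    if instruction is None and llm_enabled:
        instruction = _call_llm_for_instruction(diff, classification)  # 3. LLM fallback
    if instruction and cache is not None and key is not None:
        cache.put(key, instruction)  # remember for next time
    return instruction  # None -> caller falls back to the old diff-fingerprint description
```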

What changed

| File | Change |
| --- | --- |
| src/gradata/enhancements/instruction_cache.py | New — JSON file cache for extracted instructions |
| src/gradata/enhancements/edit_classifier.py | Added extract_behavioral_instruction() with 21 templates + LLM fallback |
| src/gradata/_core.py | brain.correct() now uses behavioral extraction with graceful fallback |
| README.md | Full rewrite — "AI that learns your judgment" positioning |
| pyproject.toml | Updated description and keywords |
| docs/gradata-marketing-strategy.md | Full rewrite — personalized intelligence positioning |
| .claude/hooks/stop/auto-session-note.js | Skip session bump for subagents/autoresearch |

Evidence

Ablation experiment (S93): +13.2% quality improvement with brain rules, driven by preference adherence (+1.5), not correctness (+0.3).

E2E verified: brain.correct() now stores "Write in a casual, direct tone — avoid formal business language" instead of "Tone casualized (formality shift: +4)".
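A minimal usage sketch of that verified flow, assuming the `Brain.init()` / `brain.correct()` API shown in this PR's tests (the import path is an assumption):

```python
from gradata import Brain  # import path assumed

brain = Brain.init("./my-brain")
brain.correct(
    draft="Dear Sir, We are pleased to announce...",  # what the AI wrote
    final="Hey, check this out...",                   # what you changed it to
)
# The stored lesson now carries a behavioral instruction such as
# "Write in a casual, direct tone" rather than a diff fingerprint
# like "Tone casualized (formality shift: +4)".
```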

Test plan

  • 12 new tests (instruction cache, behavioral extraction, core integration)
  • Full suite: 1351 passed, 7 skipped, 0 failures
  • E2E smoke test with real brain.correct() call

Generated with Gradata

Greptile Summary

This PR delivers the behavioral extraction engine — the core upgrade that makes Gradata produce actionable instructions ("Use casual, direct tone") instead of opaque diff fingerprints ("Content change (added: getattr)"). It also introduces brain.convergence(), auto-detects TTY interactivity in onboard(), and repositions the project marketing around "AI that learns your judgment."

Key changes:

  • instruction_cache.py — New JSON-file cache (cache → template → LLM resolution order) with a clean flush()/dirty-tracking design
  • edit_classifier.py — 21-template library + LLM fallback via extract_behavioral_instruction()
  • _core.py — brain.correct() wired to behavioral extraction with graceful fallback; brain_convergence() added for session-over-session trend analysis
  • brain.py — _instruction_cache lazily initialized with isinstance guard (correctly addresses the hasattr issue from a prior review); convergence() exposed; interactive param now accepts None for auto-detection
  • onboard.py — TTY-aware interactivity detection prevents blocking in CI/scripts; EOFError fallback added to _ask() (sketched just after this list)
  • 12 new tests covering cache, extraction, convergence, and integration paths
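A hypothetical reconstruction of the onboard.py behavior described above; the real `_is_interactive()` / `_ask()` helpers may differ in detail:

```python
# Hypothetical sketch of TTY-aware interactivity detection and a
# non-blocking prompt, per the onboard.py bullet above.
import sys

def _is_interactive() -> bool:
    # Treat the session as interactive only when stdin/stdout are real TTYs,
    # so CI pipelines and scripted runs never block on input().
    return sys.stdin.isatty() and sys.stdout.isatty()

def _ask(prompt: str, default: str = "") -> str:
    if not _is_interactive():
        return default
    try:
        return input(prompt) or default
    except EOFError:  # stdin closed mid-prompt (e.g. piped input ran out)
        return default
```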

One concrete bug found: InstructionCache.flush() is never called anywhere in the correction pipeline or session lifecycle. LLM-extracted instructions are stored in memory for the session but never written to instruction_cache.json. On every new Brain instantiation the cache is loaded from a file that was never updated, so the persistent-cache benefit never materialises and the LLM is called repeatedly for the same patterns across sessions.

Confidence Score: 4/5

Safe to merge after adding a single flush() call in brain_end_session; the bug doesn't break correctness, only persistence

One concrete P1 bug (cache never flushed) prevents the persistent-cache feature from functioning across sessions, but the core behavioral extraction pipeline is correct and well-tested. The isinstance guard correctly addresses the hasattr issue from a prior review. All other changes are clean and well-structured. A one-line fix to call flush() in brain_end_session would bring this to 5/5.

src/gradata/_core.py — flush() call missing from brain_end_session; also verify the flush is import-guarded the same way the cache init is

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/gradata/enhancements/instruction_cache.py | New JSON-file cache for behavioral instructions; flush() method well-designed with dirty tracking but never called from the correction pipeline |
| src/gradata/enhancements/edit_classifier.py | Adds extract_behavioral_instruction() with 21 templates + LLM fallback; module-level logger pattern not followed inside _call_llm_for_instruction |
| src/gradata/_core.py | Behavioral extraction wired into brain.correct() with correct isinstance guard; flush() never called, so LLM-extracted instructions are not persisted between sessions |
| src/gradata/brain.py | Lazy _instruction_cache init, convergence() method, and interactive=None auto-detection plumbing added cleanly |
| src/gradata/onboard.py | TTY-aware _is_interactive() helper added; _ask() now guards EOFError to prevent blocking in CI/scripts |
| .claude/hooks/stop/auto-session-note.js | New stop hook for auto session tracking; CLAUDE_AGENT/CLAUDE_WORKTREE env-var guards are robust; branch-based skip heuristic was flagged in a prior review thread |
| tests/test_behavioral_extraction.py | Comprehensive tests for template matching, cache hits, the LLM-disabled path, and tone and process templates |
| tests/test_instruction_cache.py | Well-structured cache tests including disk persistence, corruption handling, and key determinism |
| tests/test_core_behavioral.py | Integration tests verify brain.correct() uses the behavioral description and falls back gracefully when extraction returns None |
| tests/test_convergence.py | Convergence metric tests cover empty brain, trend detection, and converged state; uses tmpdir without explicit cleanup, but pytest handles this |

Sequence Diagram

```mermaid
sequenceDiagram
    participant U as User
    participant B as Brain.correct()
    participant C as brain_correct()
    participant E as extract_behavioral_instruction()
    participant K as InstructionCache
    participant L as _call_llm_for_instruction()
    participant A as Anthropic API

    U->>B: correct(draft, final)
    B->>C: brain_correct(brain, draft, final)
    C->>C: classify diff → primary classification
    C->>C: isinstance(brain._instruction_cache, InstructionCache)?
    Note over C: Lazily initializes cache from instruction_cache.json
    C->>E: extract_behavioral_instruction(diff, primary, cache=cache)
    E->>K: cache.get(cache_key)
    alt Cache hit
        K-->>E: cached instruction
    else Template match
        E->>E: _match_template(classification)
        E->>K: cache.put(cache_key, template)
        Note over K: dirty=True, NOT flushed to disk ⚠️
    else LLM fallback
        E->>L: _call_llm_for_instruction(diff, classification)
        L->>A: messages.create(model=claude-haiku-4-5)
        A-->>L: behavioral instruction text
        L-->>E: instruction string
        E->>K: cache.put(cache_key, instruction)
        Note over K: dirty=True, NOT flushed to disk ⚠️
    end
    E-->>C: behavioral_desc
    C->>C: lesson written with behavioral_desc
    Note over K: flush() never called — cache lost on GC ⚠️
```

Comments Outside Diff (1)

  1. src/gradata/_core.py, line 318-325 (link)

    P1 bus.emit() calls not guarded against handler exceptions (Rule 4)

    The brain.bus.emit() calls are not wrapped in try/except blocks. If any registered EventBus handler raises, the exception will propagate up through brain_correct() and silently swallow the lesson that was just written — breaking the correction pipeline for the caller.

    The same issue also exists in the early-return import-error path (lines 83–89).
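A minimal sketch of the guard being requested; only `brain.bus.emit()` is named in the review, so the event name, payload, and logger are assumptions:

```python
# Sketch: a failing EventBus subscriber must not unwind brain_correct()
# after the lesson has already been written. Event name/payload assumed.
try:
    brain.bus.emit("lesson.created", {"lesson_id": lesson.id})
except Exception:
    _log.debug("EventBus handler failed for lesson.created", exc_info=True)
```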

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gradata/_core.py
Line: 156-170

Comment:
**InstructionCache never flushed — LLM extractions lost between sessions**

`cache.put()` sets `self._dirty = True` inside `extract_behavioral_instruction()`, but `flush()` is never called anywhere in the correction pipeline or session lifecycle. Grep confirms the only `flush()` call in `src/gradata/` is `mcp_server.py:107` for an unrelated stream, and `brain._instruction_cache` is only referenced at `_core.py:160-165` and `brain.py:69`.

Consequence: every Brain instantiation creates a fresh `InstructionCache` from the file on disk, which was never written — so the cache always starts empty and each session triggers fresh LLM calls for patterns already seen before.

Fix: call `brain._instruction_cache.flush()` in `brain_end_session` (or immediately after `cache.put()` if write-through is preferred):

```python
# In brain_end_session, after meta-rules block:
if isinstance(brain._instruction_cache, InstructionCache):
    brain._instruction_cache.flush()
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/gradata/enhancements/edit_classifier.py
Line: 359-362

Comment:
**Module-level logger should be defined at module scope**

`_call_llm_for_instruction` imports `logging` and creates `_log` inside the function body. Every invocation re-executes `getLogger()`, and the pattern is inconsistent with the rest of the SDK (e.g. `instruction_cache.py:16`, `_core.py:19`) where the logger is defined once at module level.

Move it to module level alongside the other module constants:

```python
# After the existing imports at the top of edit_classifier.py
import logging
_log = logging.getLogger("gradata")
```

Then remove the two lines inside `_call_llm_for_instruction`.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (7): Last reviewed commit: "fix: resolve Pyright type mismatch on _i..."

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - # Code Review Rules

Rule 1: Never use print() ... (source)

Oliver Le and others added 7 commits April 6, 2026 00:33
Co-Authored-By: Gradata <noreply@gradata.ai>
…plates)

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Only bump session number for human-interactive sessions on main branch.
Subagents, autoresearch worktrees, and non-main branches skip the bump.
Uses cfg.LOOP_STATE as single source of truth instead of hardcoded path.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds behavioral instruction extraction (template + optional LLM) with a JSON-backed InstructionCache; integrates extraction into brain corrections and meta-rule emission; exposes Brain.convergence(); bumps package to v0.4.0; and adds tests, docs, examples, and marketing content reflecting the new behavioral/convergence framing.
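A hedged usage sketch of the new convergence API; `Brain.init()` appears in this PR's tests, while the return shape shown is purely illustrative:

```python
from gradata import Brain  # import path assumed

brain = Brain.init("./my-brain")
report = brain.convergence()  # session-over-session trend analysis
# Illustrative output only; the real return shape is not documented here:
# {"score": 0.72, "trend": "improving", "sessions": 12}
print(report)
```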

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Marketing — README.md, docs/gradata-marketing-strategy.md, docs/ablation-experiment-s93.md, CLAUDE.md, CHANGELOG.md, CONTRIBUTING.md | Rewrote README and changelog for v0.4.0 messaging (behavioral extraction, convergence); added marketing strategy, ablation report, CLAUDE protocol; trimmed prior architecture details and updated contribution guidance. |
| Package Metadata — pyproject.toml | Bumped project version to 0.4.0 and updated description and keywords. |
| Behavioral Extraction & Cache — src/gradata/enhancements/edit_classifier.py, src/gradata/enhancements/instruction_cache.py | New extract_behavioral_instruction() implementing template matcher + optional LLM fallback and cache-key logic; new InstructionCache JSON-backed cache with deterministic keying, get/put/flush, and tolerant IO. |
| Core Brain Integration — src/gradata/_core.py, src/gradata/brain.py, src/gradata/onboard.py | brain_correct now attempts behavioral extraction via InstructionCache with fallback to the previous description; brain_end_session emits meta_rule.created for new meta-rules; added brain_convergence() and Brain.convergence(); onboarding interactive auto-detection and non-blocking _ask(). |
| Tests — tests/test_behavioral_extraction.py, tests/test_instruction_cache.py, tests/test_core_behavioral.py, tests/test_convergence.py | Added tests covering extraction templates, the LLM toggle, cache persistence/corruption resilience, brain integration/fallbacks, and convergence metric/trend detection. |
| Examples & Repo Layout — examples/basic_usage.py, examples/README.md, .gitignore | Added a runnable example demonstrating the learning loop and convergence; examples/ un-ignored in .gitignore. |
| Other docs & small edits — README.md (trimmed sections), CLAUDE.md | Removed many prior Event Bus/embeddings/integration details; added new behavioral extraction and convergence framing and diagrams. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User Correction
    participant Brain as Brain.correct()
    participant Classifier as EditClassifier
    participant Cache as InstructionCache
    participant Templates as Template Matcher
    participant LLM as Claude (optional)
    participant Core as _core (meta-rule)

    User->>Brain: correct(draft, final)
    Brain->>Classifier: extract_behavioral_instruction(diff, classification)
    Classifier->>Classifier: parse diff & compute cache_key
    Classifier->>Cache: get(cache_key)
    alt Cache Hit
        Cache-->>Classifier: cached_instruction
        Classifier-->>Brain: instruction
    else Cache Miss
        Classifier->>Templates: match_template(classification.description)
        alt Template Match
            Templates-->>Classifier: template_instruction
            Classifier->>Cache: put(cache_key, template_instruction)
            Classifier-->>Brain: instruction
        else No Template Match
            alt LLM Enabled
                Classifier->>LLM: extract from diff snippets
                LLM-->>Classifier: extracted_instruction
                Classifier->>Cache: put(cache_key, extracted_instruction)
                Classifier-->>Brain: instruction
            else LLM Disabled
                Classifier-->>Brain: None (use primary.description fallback)
            end
        end
    end
    Brain->>Brain: Persist lesson (instruction or fallback)
    Brain->>Core: end_session -> discover meta-rules
    Core->>Core: emit `meta_rule.created` for new IDs
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

feature, docs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.34%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the two main changes: behavioral instruction extraction (a technical feature) and business repositioning (messaging/positioning changes). |
| Description check | ✅ Passed | The PR description is directly related to the changeset—it explains the behavioral extraction engine, cache implementation, business repositioning, and supporting infrastructure changes documented in the raw summary. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Replaces "Agent Operating System" header with actual product description:
"AI that learns your judgment." Updates architecture section, learning
pipeline, and test counts to reflect behavioral extraction changes.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai coderabbitai Bot added bug Something isn't working docs feature labels Apr 6, 2026
Comment on lines +292 to +318
```python
_INSTRUCTION_TEMPLATES: dict[str, str] = {
    # CODE patterns
    "getattr": "Use getattr() for safe attribute access on objects that may lack the attribute",
    "valueerror,typeerror": "Add explicit ValueError/TypeError guards for invalid inputs",
    "except,try,false": "Wrap risky operations in try/except with explicit error handling",
    "collections,callable,import,abc": "Import from collections.abc for abstract base types",
    "list,str,int": "Add explicit type hints for function parameters and return values",
    "optional,defined,ignore,type": "Use Optional[] and type: ignore for conditional imports",
    "hash": "Use hash() or __hash__ instead of hashlib for non-cryptographic hashing",
    "noqa": "Suppress specific linter warnings with targeted noqa comments",
    "logging,getlogger": "Use module-level logger via logging.getLogger(__name__)",
    "none,else,true": "Handle None/falsy cases explicitly with early returns",
    # TONE patterns
    "casualized": "Write in a casual, direct tone — avoid formal business language",
    "formalized": "Use professional, formal tone for this context",
    "softened": "Use softer, more empathetic language",
    "strengthened": "Be more direct and assertive — remove hedging words",
    # PROCESS patterns
    "first,plan": "Always plan before implementing — plan then adversary review then build",
    "check,verify": "Verify data and assumptions before acting on them",
    "approve,review": "Get review or approval before proceeding with changes",
    "audit,before": "Audit existing code/state before making modifications",
    "before,research": "Research the topic thoroughly before producing output",
    # STRUCTURE patterns
    "reordered": "Present information in a more logical order",
    "heading,structure": "Use clear heading hierarchy for document structure",
}
```

P1 Multi-word template keys never match due to sort mismatch

The _match_template() function generates lookup keys by sorting the extracted words alphabetically (",".join(sorted(added_words[:n]))), but several keys in _INSTRUCTION_TEMPLATES are not in alphabetical order. These templates can never be reached:

| Template key (as written) | Sorted form (what the matcher generates) |
| --- | --- |
| `"valueerror,typeerror"` | `"typeerror,valueerror"` |
| `"except,try,false"` | `"except,false,try"` |
| `"collections,callable,import,abc"` | `"abc,callable,collections,import"` |
| `"list,str,int"` | `"int,list,str"` |
| `"optional,defined,ignore,type"` | `"defined,ignore,optional,type"` |
| `"none,else,true"` | `"else,none,true"` |
| `"logging,getlogger"` | `"getlogger,logging"` |
The fix is to align the keys in _INSTRUCTION_TEMPLATES with the sorted order used by the matching algorithm.
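A three-line illustration of the mismatch:

```python
added_words = ["valueerror", "typeerror"]  # order as extracted from the diff
key = ",".join(sorted(added_words))        # -> "typeerror,valueerror"
assert key != "valueerror,typeerror"       # the unsorted template key is unreachable
```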

Suggested change (template keys rewritten in sorted order):

```python
_INSTRUCTION_TEMPLATES: dict[str, str] = {
    # CODE patterns
    "getattr": "Use getattr() for safe attribute access on objects that may lack the attribute",
    "typeerror,valueerror": "Add explicit ValueError/TypeError guards for invalid inputs",
    "except,false,try": "Wrap risky operations in try/except with explicit error handling",
    "abc,callable,collections,import": "Import from collections.abc for abstract base types",
    "int,list,str": "Add explicit type hints for function parameters and return values",
    "defined,ignore,optional,type": "Use Optional[] and type: ignore for conditional imports",
    "hash": "Use hash() or __hash__ instead of hashlib for non-cryptographic hashing",
    "noqa": "Suppress specific linter warnings with targeted noqa comments",
    "getlogger,logging": "Use module-level logger via logging.getLogger(__name__)",
    "else,none,true": "Handle None/falsy cases explicitly with early returns",
    # TONE patterns
    "casualized": "Write in a casual, direct tone — avoid formal business language",
    "formalized": "Use professional, formal tone for this context",
    "softened": "Use softer, more empathetic language",
    "strengthened": "Be more direct and assertive — remove hedging words",
    # PROCESS patterns
    "first,plan": "Always plan before implementing — plan then adversary review then build",
    "check,verify": "Verify data and assumptions before acting on them",
    "approve,review": "Get review or approval before proceeding with changes",
    "audit,before": "Audit existing code/state before making modifications",
    "before,research": "Research the topic thoroughly before producing output",
    # STRUCTURE patterns
    "reordered": "Present information in a more logical order",
    "heading,structure": "Use clear heading hierarchy for document structure",
}
```

Comment on lines +31 to +41
```js
  const branchResult = cfg.spawnSafe('git', ['-C', WORK, 'rev-parse', '--abbrev-ref', 'HEAD'], {
    encoding: 'utf8', timeout: 3000,
  });
  const branch = (branchResult.stdout || '').trim();
  if (branch && branch !== 'main' && branch !== 'HEAD') {
    // Non-main branch = likely autoresearch/worktree, skip bump
    process.exit(0);
  }
} catch (_) {}

// --- 1. Detect current session number ---
```

P2 Branch-based skip may silence sessions on non-main feature branches

The heuristic that treats every non-main, non-HEAD branch as "autoresearch/worktree" is too broad. A human can easily be doing interactive work on a feature branch (e.g., feat/behavioral-extraction) and will silently lose their session note and counter bump for the entire duration of that branch.

Consider tightening the heuristic — e.g., only skip when the branch name matches a known autoresearch naming convention (such as a autoresearch/ or worktree/ prefix), rather than skipping all non-main branches:

```js
// Only skip branches that match known autoresearch/worktree naming patterns
if (branch && /^(autoresearch|worktree|auto-research)[\/\-]/.test(branch)) {
  process.exit(0);
}
```

Oliver Le and others added 4 commits April 6, 2026 01:03
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
The meta-rule pipeline was creating and saving meta-rules but never
emitting the bus event. This meant embeddings were never cached for
meta-rules and downstream subscribers never fired.

Co-Authored-By: Gradata <noreply@gradata.ai>
New in v0.4.0:
- Behavioral instruction extraction (replaces diff fingerprints)
- brain.convergence() metric
- Meta-rule event emission fix
- Instruction cache for LLM extraction

Co-Authored-By: Gradata <noreply@gradata.ai>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/gradata-marketing-strategy.md`:
- Line 242: The phrase "open source SDK" should be hyphenated when used
adjectivally; update the text fragment "Free to train (open source SDK). Pay to
rent." to "Free to train (open-source SDK). Pay to rent." so "open-source"
correctly modifies "SDK" in the docs content.

In `@README.md`:
- Around line 108-115: The documentation is inconsistent: the section header
"Rent Trained Brains (Coming Soon)" signals the feature is not yet launched,
while the pricing table row "Brain rental" lists it as available for the Team
tier; update one of these to match the true status. If the feature is live,
remove "(Coming Soon)" from the "Rent Trained Brains" heading and ensure any
explanatory text reflects availability; if it is not live, change the pricing
table's "Brain rental" cell to "-- (Coming Soon)" (or similar) and add a short
note referencing the "Rent Trained Brains (Coming Soon)" section so both the
header and pricing table consistently indicate planned availability.
- Around line 76-88: Replace the incorrect word "rentable" with "actionable" in
the Behavioral Extraction section where the example line reads `"Use getattr()
for safe attribute access on objects that may lack the attribute"`; update the
sentence that currently says "behavioral instructions — rentable" to "behavioral
instructions — actionable" so the phrasing is correct and consistent with the
later "Rent Trained Brains" reference.
- Around line 37-54: The fenced code block in README.md is missing a language
tag; update the triple-backtick fence that wraps the snippet containing "You
correct your AI" to include a language identifier (e.g., change ``` to ```text
or ```plaintext) so the block renders consistently—locate the fenced block in
the README (the one starting with ``` and containing the AI convergence diagram)
and add the language token after the opening backticks.
- Around line 164-181: The README's fenced code block showing the src/gradata/
directory tree lacks a language identifier, which can affect rendering; update
the triple-backtick fence to include a language token such as "text" or "bash"
so the block begins with ```text (or ```bash) before the directory listing (the
block containing "src/gradata/" and its file entries) to ensure consistent
markdown rendering.

In `@src/gradata/_core.py`:
- Around line 156-169: The code redundantly calls
brain._find_lessons_path(create=True) inside the try block; instead reuse the
already-resolved lessons_path variable and avoid recomputing it, and move/hoist
the InstructionCache instantiation (currently created inside the try) up where
lessons_path is first resolved so a single InstructionCache instance is reused;
update the try block that calls extract_behavioral_instruction(diff, primary,
cache=_inst_cache) to use the hoisted _inst_cache (which may be None if
lessons_path is missing) and keep the existing fallback to primary.description
via behavioral_desc or primary.description.

In `@src/gradata/enhancements/edit_classifier.py`:
- Around line 370-381: The Anthropic API call can hang because
client.messages.create(...) has no timeout; update the call in
edit_classifier.py (the anthropic.Anthropic() client and its
client.messages.create invocation) to include a timeout parameter (for example
timeout=30) so the request fails fast when unresponsive, and keep the existing
exception handling to handle timeouts; ensure you pass the timeout into the same
client.messages.create(...) call that uses model="claude-haiku-4-5-20251001",
max_tokens=100, and messages=[{"role":"user","content":prompt}].
- Around line 386-397: The pyright error occurs because InstructionCache is only
imported inside the function as a string-quoted forward reference; move the
import into a TYPE_CHECKING guarded import at module scope so static analyzers
can see the symbol and change the parameter annotation in
extract_behavioral_instruction to use InstructionCache | None (no quotes).
Specifically, add "from typing import TYPE_CHECKING" and under "if
TYPE_CHECKING: from gradata.enhancements.instruction_cache import
InstructionCache" near the top of the module, then update the
extract_behavioral_instruction signature to use cache: InstructionCache | None =
None.

In `@src/gradata/enhancements/instruction_cache.py`:
- Around line 30-39: The put method currently swallows OSError and risks
concurrent-write races; update InstructionCache.put to catch OSError as e and
call a debug-level logger (e.g., self._logger.debug or module logger) including
the exception, the cache path (self._path), and the key being written, then
retain the graceful fallback behavior; do not remove the try/except, but
replace the bare except with "except OSError as e" and a debug log that includes
str(e) and context. Additionally, document the lack of cross-process file
locking (either in the InstructionCache class docstring or surrounding module
README) stating that concurrent put() calls may race and data loss can occur so
callers are aware; optionally add a TODO comment in put mentioning where to add
file-locking (e.g., portalocker) if stronger guarantees are required.

In `@tests/test_behavioral_extraction.py`:
- Around line 38-49: The test currently couples to InstructionCache.make_key by
pre-populating the cache; instead, change test_cache_hit_skips_llm to drive the
real extraction flow: call extract_behavioral_instruction(diff, classification)
once to populate the cache, then call it a second time with the same inputs and
assert the returned value is identical (and that any LLM/mock was not invoked on
the second call). Remove direct uses of InstructionCache.make_key and cache.put
so the test relies on extract_behavioral_instruction and InstructionCache
internal keying.

In `@tests/test_core_behavioral.py`:
- Around line 29-44: The test test_correct_falls_back_to_old_description
currently only checks for LessonState enum names and should instead assert that
the fallback description is written; update the test so after calling
Brain.correct (with extract_behavioral_instruction patched to return None) you
read the created lesson and assert the lesson's primary.description contains the
expected fallback text (e.g., "Content change" or "Tone casualized") or matches
the old diff-fingerprint pattern; locate this logic around
test_correct_falls_back_to_old_description and use Brain.correct and the
persisted lessons markdown content to check primary.description rather than just
"INSTINCT" or "PATTERN".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd96b2c5-c028-4d73-9f9d-673bc90eda52

📥 Commits

Reviewing files that changed from the base of the PR and between 4f3f1a4 and 4360d9a.

⛔ Files ignored due to path filters (1)
  • .claude/hooks/stop/auto-session-note.js is excluded by !.claude/**
📒 Files selected for processing (9)
  • README.md
  • docs/gradata-marketing-strategy.md
  • pyproject.toml
  • src/gradata/_core.py
  • src/gradata/enhancements/edit_classifier.py
  • src/gradata/enhancements/instruction_cache.py
  • tests/test_behavioral_extraction.py
  • tests/test_core_behavioral.py
  • tests/test_instruction_cache.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/_core.py
  • src/gradata/enhancements/instruction_cache.py
  • src/gradata/enhancements/edit_classifier.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_instruction_cache.py
  • tests/test_core_behavioral.py
  • tests/test_behavioral_extraction.py
🪛 GitHub Actions: CI
src/gradata/enhancements/edit_classifier.py

[error] 390-390: pyright error: "InstructionCache" is not defined (reportUndefinedVariable)


[warning] 350-350: pyright warning: Import "anthropic" could not be resolved (reportMissingImports)

🪛 LanguageTool
docs/gradata-marketing-strategy.md

[grammar] ~242-~242: Use a hyphen to join words.
Context: ... ### Revenue Model Free to train (open source SDK). Pay to rent. Subscription ...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.22.0)
README.md

[warning] 37-37: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 164-164: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/gradata-marketing-strategy.md

[warning] 1-1: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (8)
pyproject.toml (1)

8-8: LGTM!

Metadata updates align with the PR's business repositioning objective. Description and keywords reflect the new "AI that learns your judgment" messaging.

Also applies to: 15-15

src/gradata/enhancements/instruction_cache.py (1)

41-44: LGTM!

The make_key implementation is deterministic and handles word ordering via sorted(). The 16-character hex digest (64 bits) provides sufficient collision resistance for a cache key.
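A hypothetical reconstruction of that keying scheme; the `sorted()` call and the 16-character hex digest are described above, while the hash function and field order are assumptions:

```python
# Hypothetical sketch of InstructionCache.make_key: deterministic because the
# word lists are sorted before hashing; 16 hex chars = a 64-bit key space.
import hashlib

def make_key(category: str, added: list[str], removed: list[str]) -> str:
    payload = "|".join([category, ",".join(sorted(added)), ",".join(sorted(removed))])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```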

tests/test_instruction_cache.py (1)

11-47: LGTM!

Tests comprehensively cover the InstructionCache contract: cache miss, put/get, persistence across instances, key generation determinism, and corruption resilience. Assertions check specific values rather than just truthiness.

tests/test_core_behavioral.py (1)

11-26: LGTM!

The test correctly verifies that when extract_behavioral_instruction returns a value, it appears in the lesson description stored in lessons.md.

tests/test_behavioral_extraction.py (1)

15-22: LGTM!

Test coverage for behavioral extraction is solid: template matching for various categories (CODE, TONE, PROCESS), cache behavior, and the llm_enabled=False path. The _make_diff helper is clean and reusable.

Also applies to: 25-36, 52-63, 66-77, 79-89

src/gradata/enhancements/edit_classifier.py (3)

292-318: LGTM!

The _INSTRUCTION_TEMPLATES dictionary provides a solid set of pattern-to-instruction mappings covering CODE, TONE, PROCESS, and STRUCTURE categories. The template keys use comma-separated sorted words for multi-word matching, which aligns with the lookup logic in _match_template.


321-343: LGTM!

The _match_template function correctly prioritizes tone keywords, then structural patterns, then parses "added:" tokens for multi-word and single-word template lookups. The sorted key generation ensures deterministic matching.
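A hypothetical reconstruction of that lookup order (tokenization details are assumptions; only the priority order and sorted-key generation are described above):

```python
from __future__ import annotations

# Hypothetical sketch of _match_template's priority order.
def _match_template(description: str) -> str | None:
    desc = description.lower()
    # 1. Tone/structure keywords take priority
    for kw in ("casualized", "formalized", "softened", "strengthened", "reordered"):
        if kw in desc and kw in _INSTRUCTION_TEMPLATES:
            return _INSTRUCTION_TEMPLATES[kw]
    # 2. Parse the "added: a, b, c" tokens out of the diff fingerprint
    if "added:" in desc:
        tail = desc.split("added:", 1)[1].strip(" ()")
        words = [w.strip() for w in tail.split(",") if w.strip()]
        # 3. Multi-word keys first, then single words; sorting keeps keys deterministic
        for n in range(len(words), 0, -1):
            key = ",".join(sorted(words[:n]))
            if key in _INSTRUCTION_TEMPLATES:
                return _INSTRUCTION_TEMPLATES[key]
    return None
```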


372-376: The model "claude-haiku-4-5-20251001" is a valid Claude API model identifier documented by Anthropic. No changes needed.

> Likely an incorrect or invalid review comment.


### Revenue Model

Free to train (open source SDK). Pay to rent.

⚠️ Potential issue | 🟡 Minor

Minor grammar fix: hyphenate "open-source" as an adjective.

When "open source" modifies a noun ("SDK"), it should be hyphenated as "open-source SDK".

📝 Proposed fix
```diff
-Free to train (open source SDK). Pay to rent.
+Free to train (open-source SDK). Pay to rent.
```


Comment thread README.md
Comment on lines 37 to 54
```
Human corrects AI output
You correct your AI
|
brain.correct(draft, final)
|
Diff computed > severity classified > lesson created
Behavioral instruction extracted:
"We are pleased..." → "Hey, check this out"
= "Write in a casual, direct tone"
|
Confidence grows with each reinforcement:
0.40 = INSTINCT (new, unproven)
0.60 = PATTERN (seen enough to trust)
0.90 = RULE (injected into every prompt)
Confidence grows with reinforcement:
INSTINCT (0.40) → PATTERN (0.60) → RULE (0.90)
|
3+ related rules > META-RULE (general principle)
3+ related rules → META-RULE emerges
"Use casual tone" + "No sign-offs" + "Short sentences"
= "Match Oliver's direct communication style"
|
brain.apply_brain_rules() > AI stops making that mistake
Your AI converges on YOUR judgment
```

⚠️ Potential issue | 🟡 Minor

Specify language for fenced code block.

The code block lacks a language identifier. Consider adding text or plaintext to improve markdown rendering consistency.

📝 Proposed fix
````diff
-```
+```text
 You correct your AI
````


Comment thread README.md
Comment on lines 164 to 181
```
src/gradata/
brain.py # Brain class (public API)
events_bus.py # Central event bus (v0.3.0)
_core.py # Correction pipeline + graduation
events_bus.py # Central event bus
_core.py # Correction pipeline + behavioral extraction
_events.py # Append-only event log (JSONL + SQLite)
_types.py # Lesson, LessonState, typed models
enhancements/
edit_classifier.py # Classification + behavioral instruction extraction
instruction_cache.py # LLM extraction cache
self_improvement.py # Graduation pipeline
meta_rules.py # Meta-rule synthesis
diff_engine.py # Edit distance, severity
meta_rules.py # Meta-rule synthesis
rules/
rule_engine.py # Inject rules into prompts
rule_ranker.py # Context-aware ranking (v0.3.0)
scope.py # Task classification
integrations/
embeddings.py # Two-tier embeddings (v0.3.0)
session_history.py # Rule effectiveness (v0.3.0)
openai_adapter.py # OpenAI integration
anthropic_adapter.py # Anthropic integration
langchain_adapter.py # LangChain integration
crewai_adapter.py # CrewAI integration
hooks/
auto_correct.py # Automatic diff capture
inject-brain-rules.js # Rule injection + QMD context
tool-finding-capture.js # Lint/test findings to lessons
session-history-sync.js # Cross-session effectiveness
rule_ranker.py # Context-aware ranking
integrations/ # OpenAI, Anthropic, LangChain, CrewAI
contrib/patterns/ # Optional agentic patterns
```

⚠️ Potential issue | 🟡 Minor

Specify language for fenced code block.

The code block lacks a language identifier for the directory tree structure. Consider adding text or bash to improve markdown rendering consistency.

📝 Proposed fix
````diff
-```
+```text
 src/gradata/
````


Comment on lines +30 to +39
```python
def put(self, key: str, instruction: str) -> None:
    self._data[key] = instruction
    try:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(
            json.dumps(self._data, indent=2, ensure_ascii=False),
            encoding="utf-8",
        )
    except OSError:
        pass
```

🧹 Nitpick | 🔵 Trivial

Silent error swallowing and lack of file locking may cause issues.

  1. Silent OSError suppression (line 38-39): Disk write failures are silently ignored. While graceful degradation is reasonable for a cache, consider logging at debug level so issues are discoverable.

  2. No file locking: Concurrent put() calls from multiple processes sharing the same brain directory will race. The second writer overwrites the first's entry, causing silent data loss. For a cache this may be acceptable (worst case: extra LLM calls), but document this limitation.

♻️ Proposed fix: Add debug logging for write failures
```diff
+import logging
+
+_log = logging.getLogger("gradata")
+

 class InstructionCache:
     ...
     def put(self, key: str, instruction: str) -> None:
         self._data[key] = instruction
         try:
             self._path.parent.mkdir(parents=True, exist_ok=True)
             self._path.write_text(
                 json.dumps(self._data, indent=2, ensure_ascii=False),
                 encoding="utf-8",
             )
         except OSError:
-            pass
+            _log.debug("Cache write failed for %s", self._path)
```
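If stronger guarantees are ever wanted, a common hardening (not part of this PR; purely a sketch) is an atomic temp-file-plus-rename write, which keeps readers from ever seeing partial JSON even though last-writer-wins races remain:

```python
# Optional hardening sketch (not in the PR): atomic JSON write via os.replace.
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except OSError:
        if os.path.exists(tmp):
            os.unlink(tmp)  # clean up the orphaned temp file
```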

Comment on lines +38 to +49
```python
def test_cache_hit_skips_llm():
    diff = _make_diff("old code", "new code")
    classification = EditClassification(
        category="CODE", confidence=0.6, severity="moderate",
        description="Content change (added: getattr)",
    )
    with tempfile.TemporaryDirectory() as d:
        cache = InstructionCache(Path(d) / "cache.json")
        key = InstructionCache.make_key("CODE", ["getattr"], [])
        cache.put(key, "Use getattr() for safe attribute access on optional objects")
        result = extract_behavioral_instruction(diff, classification, cache=cache)
        assert result == "Use getattr() for safe attribute access on optional objects"
```

🧹 Nitpick | 🔵 Trivial

Consider using the actual extraction flow to populate the cache.

The test manually constructs the cache key (line 46) and pre-populates the cache. This couples the test to the internal key-generation implementation. If make_key logic changes, this test may silently become invalid.

A more robust approach would be to call extract_behavioral_instruction once to populate the cache, then call it again to verify the cache hit.

♻️ Alternative approach using extraction flow
```diff
 def test_cache_hit_skips_llm():
-    diff = _make_diff("old code", "new code")
+    diff = _make_diff("data[0]", "getattr(data, 'field', None)")
     classification = EditClassification(
         category="CODE", confidence=0.6, severity="moderate",
         description="Content change (added: getattr)",
     )
     with tempfile.TemporaryDirectory() as d:
         cache = InstructionCache(Path(d) / "cache.json")
-        key = InstructionCache.make_key("CODE", ["getattr"], [])
-        cache.put(key, "Use getattr() for safe attribute access on optional objects")
+        # First call populates the cache via template match
+        first_result = extract_behavioral_instruction(diff, classification, cache=cache)
+        assert first_result is not None
+        # Second call should return the cached value
         result = extract_behavioral_instruction(diff, classification, cache=cache)
-        assert result == "Use getattr() for safe attribute access on optional objects"
+        assert result == first_result
```

Comment on lines +29 to +44
```python
def test_correct_falls_back_to_old_description():
    """When extraction returns None, old diff-fingerprint description is used."""
    with tempfile.TemporaryDirectory() as d:
        brain = Brain.init(d)
        with patch(
            "gradata.enhancements.edit_classifier.extract_behavioral_instruction",
            return_value=None,
        ):
            brain.correct(
                draft="Dear Sir, We are pleased.",
                final="Hey, here's the deal.",
            )
        lessons_path = Path(d) / "lessons.md"
        if lessons_path.exists():
            content = lessons_path.read_text(encoding="utf-8")
            assert "INSTINCT" in content or "PATTERN" in content
```

⚠️ Potential issue | 🟡 Minor

Weak assertion: test doesn't verify the fallback description content.

The test name suggests it validates that the "old diff-fingerprint description" is used when extraction returns None. However, the assertion only checks that "INSTINCT" or "PATTERN" appears in the file—these are LessonState enum values, not the description.

This test would pass even if the description were empty or incorrect, as long as a lesson was created. Consider asserting on the actual description content (e.g., the primary.description value like "Content change" or "Tone casualized").

💚 Proposed fix: Assert on expected fallback description pattern
         lessons_path = Path(d) / "lessons.md"
         if lessons_path.exists():
             content = lessons_path.read_text(encoding="utf-8")
-            assert "INSTINCT" in content or "PATTERN" in content
+            # Verify a lesson was created with a description from classify_edits
+            assert "INSTINCT" in content  # lesson state
+            # The fallback description should contain classification info
+            assert "Tone" in content or "Content" in content or "change" in content.lower()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_core_behavioral.py` around lines 29 - 44, The test
test_correct_falls_back_to_old_description currently only checks for LessonState
enum names and should instead assert that the fallback description is written;
update the test so after calling Brain.correct (with
extract_behavioral_instruction patched to return None) you read the created
lesson and assert the lesson's primary.description contains the expected
fallback text (e.g., "Content change" or "Tone casualized") or matches the old
diff-fingerprint pattern; locate this logic around
test_correct_falls_back_to_old_description and use Brain.correct and the
persisted lessons markdown content to check primary.description rather than just
"INSTINCT" or "PATTERN".

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Line 25: The README/CLAUDE.md currently contains a hardcoded user-specific
Windows path "C:/Users/olive/SpritesWork/brain/"; update this to be configurable
or clearly marked as user-specific by replacing the literal with an environment
variable placeholder like "$BRAIN_VAULT_PATH" (or a similar placeholder) and/or
add a short comment next to the path explaining "Customize per-user" so
contributors know to change it for their environment; ensure the new text
appears where the current path string is (the Brain vault line) so consumers see
the placeholder and guidance instead of the hardcoded path.

In `@src/gradata/_core.py`:
- Around line 805-806: In the except Exception: block that currently returns
empty, keep the exception type but add debug-level logging of the caught
exception before returning (e.g., call an existing logger or logging.exception /
logger.exception with context like "DB connectivity error" and include the
exception), so change the except Exception: return empty into an except
Exception as e: logger.exception("...: %s", e) (or logger.debug/exception) then
return empty; reference the existing except Exception and the variable empty to
locate the spot.

In `@tests/test_convergence.py`:
- Around line 49-59: Add missing test coverage in tests/test_convergence.py for
the two untested trend outcomes by adding a test that constructs a brain via
_make_brain_with_corrections(5, [2,4,6,8,10]) and asserts
brain.convergence()["trend"] == "diverging", and another test that uses
_make_brain_with_corrections(2, [5,3]) and asserts brain.convergence()["trend"]
== "insufficient_data"; place these alongside the existing test_convergence_*
functions to ensure the convergence() logic (called on the brain object) is
fully covered for all four trend values.
- Around line 17-23: The test currently swallows all exceptions around
brain.emit(..., session_num) which can hide setup failures; replace the
try/except with either no try (let exceptions propagate) or catch Exception as e
and call pytest.fail(f"brain.emit failed: {e}") (or log and re-raise) so that
failures in brain.emit are reported and the test fails fast; locate the
try/except surrounding the brain.emit call and change it to propagate or
explicitly fail with the caught exception.
- Around line 10-24: The helper _make_brain_with_corrections currently uses
tempfile.mkdtemp() which leaves temp dirs behind; update the function to accept
pytest's tmp_path fixture (e.g., change signature to
_make_brain_with_corrections(tmp_path, num_sessions, corrections_per)) and use
tmp_path / "lessons.md" to write the file and pass tmp_path (or str(tmp_path))
into Brain instead of the mkdtemp path; remove tempfile.mkdtemp usage so pytest
will automatically clean up the temporary directory after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fc6d9b96-35fc-43d7-902b-5488e449f8b5

📥 Commits

Reviewing files that changed from the base of the PR and between 4360d9a and 3f0f175.

📒 Files selected for processing (6)
  • CLAUDE.md
  • docs/ablation-experiment-s93.md
  • pyproject.toml
  • src/gradata/_core.py
  • src/gradata/brain.py
  • tests/test_convergence.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from __future__ import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/brain.py
  • src/gradata/_core.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_convergence.py
🪛 GitHub Actions: CI
src/gradata/brain.py

[warning] 670-670: pyright warning: Import "gradata.enhancements.memory_extraction" could not be resolved (reportMissingImports)

🪛 LanguageTool
docs/ablation-experiment-s93.md

[grammar] ~15-~15: Ensure spelling is correct
Context: ...cold) | 5.0 | 8.0 | +3.0 | | 5 | Email (followup) | 4.0 | 5.7 | +1.7 | | 6 | Process | 6...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

CLAUDE.md

[style] ~40-~40: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...VER create docs/READMEs unless asked. - NEVER save files to root folder. Use sdk/, sc...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.22.0)
CLAUDE.md

[warning] 1-1: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 6-6: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 10-10: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 13-13: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 33-33: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (9)
src/gradata/_core.py (4)

156-169: Redundant _find_lessons_path() call.

Line 160 calls brain._find_lessons_path(create=True) again, but lessons_path was already resolved at line 145. Reuse the existing variable to avoid redundant path resolution.

♻️ Proposed fix
                     # Try behavioral extraction (LLM + cache + templates)
                     try:
                         from gradata.enhancements.edit_classifier import extract_behavioral_instruction
                         from gradata.enhancements.instruction_cache import InstructionCache
-                        _cache_path = brain._find_lessons_path(create=True)
                         _inst_cache = InstructionCache(
-                            _cache_path.parent / "instruction_cache.json"
-                        ) if _cache_path else None
+                            lessons_path.parent / "instruction_cache.json"
+                        ) if lessons_path else None
                         behavioral_desc = extract_behavioral_instruction(
                             diff, primary, cache=_inst_cache,
                         )
                         desc = behavioral_desc or primary.description
                     except Exception:
                         desc = primary.description

455-467: LGTM!

The meta_rule.created event emission correctly includes all fields expected by the subscriber in embeddings.py (description, principle). Exception swallowing is appropriate for non-critical event emission.


816-827: Edge case: trend classification for exactly 3 sessions.

When len(counts) == 3, first_half = counts[:1] (1 element) and second_half = counts[1:] (2 elements). This is a valid split but may produce less stable trend classifications for small datasets. The behavior is acceptable but worth documenting.
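
For concreteness, a minimal sketch of the split described above (variable names assumed to mirror brain_convergence's midpoint split):

```python
counts = [10, 8, 6]                       # per-session correction counts
mid = len(counts) // 2                    # 1 when there are three sessions
first_half, second_half = counts[:mid], counts[mid:]
# first_half == [10], second_half == [8, 6]: averaging one element against
# two makes the trend label sensitive to a single noisy session.
```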


780-835: LGTM overall for brain_convergence.

The function correctly computes per-session correction counts and derives a trend classification. The edge cases (empty results, insufficient data) are handled appropriately with the empty sentinel return.

pyproject.toml (1)

7-15: LGTM!

Version bump to 0.4.0 and updated description/keywords appropriately reflect the behavioral extraction feature release and business repositioning.

docs/ablation-experiment-s93.md (1)

1-54: LGTM!

Well-documented ablation experiment with clear methodology, results, and actionable insights. The +13.2% improvement calculation is mathematically consistent with the reported averages (6.60 → 7.47).
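
A quick check of that arithmetic:

```python
baseline, with_rules = 6.60, 7.47
improvement = (with_rules - baseline) / baseline
assert round(improvement * 100, 1) == 13.2  # matches the reported +13.2%
```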

CLAUDE.md (1)

1-46: LGTM overall!

The agent protocol documentation is comprehensive, covering the learning pipeline, architecture, and operational rules. The structure aligns well with the PR's behavioral extraction features.

src/gradata/brain.py (1)

334-338: LGTM!

Clean delegation to brain_convergence in _core.py. The docstring accurately describes the return value.

tests/test_convergence.py (1)

1-59: Good foundational test coverage for the convergence API.

The tests verify the core functionality: return structure, correct counts, empty brain handling, and primary trend classifications. As per coding guidelines, assertions check specific values rather than just truthiness.

Comment thread CLAUDE.md
- events_bus.py (central nervous system wiring all components)

User config (not SDK): domain/ | .carl/ | skills/
Brain vault: C:/Users/olive/SpritesWork/brain/ (events.jsonl, system.db, sessions/).

🧹 Nitpick | 🔵 Trivial

Consider making paths configurable or noting they are user-specific.

Line 25 contains a hardcoded Windows path (C:/Users/olive/SpritesWork/brain/). If this file is committed to the repository and shared with other contributors, consider either:

  • Using environment variable placeholders (e.g., $BRAIN_VAULT_PATH)
  • Adding a comment noting this should be customized per-user
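
For the first option, a hedged sketch of how consuming code might resolve such a placeholder (the variable name follows the comment's suggestion; the default is purely illustrative):

```python
import os
from pathlib import Path

# Fall back to a per-user default when BRAIN_VAULT_PATH is not set.
BRAIN_VAULT = Path(os.environ.get("BRAIN_VAULT_PATH", str(Path.home() / "brain")))
```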
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 25, The README/CLAUDE.md currently contains a hardcoded
user-specific Windows path "C:/Users/olive/SpritesWork/brain/"; update this to
be configurable or clearly marked as user-specific by replacing the literal with
an environment variable placeholder like "$BRAIN_VAULT_PATH" (or a similar
placeholder) and/or add a short comment next to the path explaining "Customize
per-user" so contributors know to change it for their environment; ensure the
new text appears where the current path string is (the Brain vault line) so
consumers see the placeholder and guidance instead of the hardcoded path.

Comment thread src/gradata/_core.py
Comment thread tests/test_convergence.py
Comment on lines +10 to +24
def _make_brain_with_corrections(num_sessions: int, corrections_per: list[int]) -> Brain:
    """Create a brain with simulated correction events across sessions."""
    d = tempfile.mkdtemp()
    (Path(d) / "lessons.md").write_text("", encoding="utf-8")
    brain = Brain(d)
    for session_num, count in enumerate(corrections_per, start=1):
        for _ in range(count):
            try:
                brain.emit("CORRECTION", "test", {
                    "category": "TEST", "severity": "minor",
                    "edit_distance": 0.1, "summary": "test correction",
                }, ["category:TEST"], session_num)
            except Exception:
                pass
    return brain

🧹 Nitpick | 🔵 Trivial

Temp directories are not cleaned up; consider using tmp_path fixture.

tempfile.mkdtemp() creates directories that persist after tests complete. Use pytest's tmp_path fixture for automatic cleanup.

♻️ Proposed refactor using tmp_path
-def _make_brain_with_corrections(num_sessions: int, corrections_per: list[int]) -> Brain:
+def _make_brain_with_corrections(tmp_path: Path, corrections_per: list[int]) -> Brain:
     """Create a brain with simulated correction events across sessions."""
-    d = tempfile.mkdtemp()
-    (Path(d) / "lessons.md").write_text("", encoding="utf-8")
-    brain = Brain(d)
+    (tmp_path / "lessons.md").write_text("", encoding="utf-8")
+    brain = Brain(tmp_path)
     for session_num, count in enumerate(corrections_per, start=1):
         for _ in range(count):
             try:
                 brain.emit("CORRECTION", "test", {
                     "category": "TEST", "severity": "minor",
                     "edit_distance": 0.1, "summary": "test correction",
                 }, ["category:TEST"], session_num)
             except Exception:
                 pass
     return brain

Then update tests to accept tmp_path as a fixture parameter.
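
A usage sketch under that refactor (pytest injects tmp_path automatically; the test name is illustrative):

```python
def test_convergence_trend_with_tmp_path(tmp_path):
    brain = _make_brain_with_corrections(tmp_path, [10, 8, 6, 4, 2])
    assert brain.convergence()["trend"] == "converging"
```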

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_convergence.py` around lines 10 - 24, The helper
_make_brain_with_corrections currently uses tempfile.mkdtemp() which leaves temp
dirs behind; update the function to accept pytest's tmp_path fixture (e.g.,
change signature to _make_brain_with_corrections(tmp_path, num_sessions,
corrections_per)) and use tmp_path / "lessons.md" to write the file and pass
tmp_path (or str(tmp_path)) into Brain instead of the mkdtemp path; remove
tempfile.mkdtemp usage so pytest will automatically clean up the temporary
directory after the test.

Comment thread tests/test_convergence.py
Comment on lines +17 to +23
            try:
                brain.emit("CORRECTION", "test", {
                    "category": "TEST", "severity": "minor",
                    "edit_distance": 0.1, "summary": "test correction",
                }, ["category:TEST"], session_num)
            except Exception:
                pass

🧹 Nitpick | 🔵 Trivial

Silently swallowing emit exceptions may hide test setup failures.

If brain.emit() fails, the test will proceed with fewer corrections than expected, potentially causing false positives. Consider logging or failing fast on setup errors.

🛡️ Proposed fix
             try:
                 brain.emit("CORRECTION", "test", {
                     "category": "TEST", "severity": "minor",
                     "edit_distance": 0.1, "summary": "test correction",
                 }, ["category:TEST"], session_num)
-            except Exception:
-                pass
+            except Exception as e:
+                raise AssertionError(f"Test setup failed: emit raised {e}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_convergence.py` around lines 17 - 23, The test currently swallows
all exceptions around brain.emit(..., session_num) which can hide setup
failures; replace the try/except with either no try (let exceptions propagate)
or catch Exception as e and call pytest.fail(f"brain.emit failed: {e}") (or log
and re-raise) so that failures in brain.emit are reported and the test fails
fast; locate the try/except surrounding the brain.emit call and change it to
propagate or explicitly fail with the caught exception.

Comment thread tests/test_convergence.py
Comment on lines +49 to +59
def test_convergence_includes_trend():
    brain = _make_brain_with_corrections(5, [10, 8, 6, 4, 2])
    result = brain.convergence()
    assert "trend" in result
    assert result["trend"] == "converging"  # declining corrections


def test_convergence_flat_is_converged():
    brain = _make_brain_with_corrections(5, [2, 2, 2, 2, 2])
    result = brain.convergence()
    assert result["trend"] == "converged"

🧹 Nitpick | 🔵 Trivial

Add test coverage for "diverging" and "insufficient_data" trends.

The trend classification has four possible values but only two are tested. Consider adding tests for edge cases.

🧪 Suggested additional tests
def test_convergence_diverging():
    """Test diverging trend when corrections increase over time."""
    brain = _make_brain_with_corrections(5, [2, 4, 6, 8, 10])
    result = brain.convergence()
    assert result["trend"] == "diverging"


def test_convergence_insufficient_data():
    """Test insufficient_data when fewer than 3 sessions."""
    brain = _make_brain_with_corrections(2, [5, 3])
    result = brain.convergence()
    assert result["trend"] == "insufficient_data"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_convergence.py` around lines 49 - 59, Add missing test coverage in
tests/test_convergence.py for the two untested trend outcomes by adding a test
that constructs a brain via _make_brain_with_corrections(5, [2,4,6,8,10]) and
asserts brain.convergence()["trend"] == "diverging", and another test that uses
_make_brain_with_corrections(2, [5,3]) and asserts brain.convergence()["trend"]
== "insufficient_data"; place these alongside the existing test_convergence_*
functions to ensure the convergence() logic (called on the brain object) is
fully covered for all four trend values.

Oliver Le and others added 3 commits April 6, 2026 01:34
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Layered defense against EOFError when Brain.init() is called
programmatically (scripts, CI, tests):
1. _ask() returns default immediately when stdin is not a TTY
2. _ask() catches EOFError as fallback (Windows python -c reports isatty=True)
3. onboard() auto-detects interactive mode from TTY when not specified
4. Brain.init() passes through interactive=None so onboard() can auto-detect

Co-Authored-By: Gradata <noreply@gradata.ai>
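
A minimal sketch of the layered pattern this commit message describes (function bodies are assumptions from the message, not the actual diff):

```python
import sys

def _is_interactive() -> bool:
    # Layer 1: missing or non-TTY stdin means non-interactive.
    stdin = getattr(sys, "stdin", None)
    return stdin is not None and stdin.isatty()

def _ask(prompt: str, default: str) -> str:
    if not _is_interactive():
        return default  # scripts/CI: never block on input()
    try:
        answer = input(prompt)
    except EOFError:
        # Layer 2: Windows `python -c` can report isatty() == True yet raise EOF.
        return default
    return answer.strip() or default
```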
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/basic_usage.py`:
- Around line 20-23: The loop assigns severity but never uses it and misses
other useful fields from brain.correct; update the loop that iterates over
corrections (variables draft, final) and the call to brain.correct to extract
and use result fields like result.get("diff"), result.get("lesson_created"),
result.get("category"), and result.get("explanation") instead of discarding
them—e.g., compute diff = result.get("diff"), lesson_created =
result.get("lesson_created"), category = result.get("category") and then print
or log those values alongside the trimmed draft/final strings to demonstrate the
API and avoid unused-variable warnings.
- Around line 5-9: The example currently creates an empty brain directory
manually (Path("./demo-brain"), brain_dir.mkdir, writing lessons.md) and then
calls Brain(...) which only requires an existing directory; instead call the
bootstrapping helper Brain.init(...) to create the full manifest, subdirectories
and seed rules. Replace the manual directory creation and file write with a call
to Brain.init using the same brain_dir (or its string path) so the brain is
properly initialized rather than just an empty folder; you can still reference
Brain.__init__ behavior for context but use Brain.init() for idiomatic
bootstrapping.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e450fa3-c493-4960-8f08-302844a8e69f

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0f175 and 0055364.

📒 Files selected for processing (8)
  • .gitignore
  • CHANGELOG.md
  • CONTRIBUTING.md
  • README.md
  • examples/README.md
  • examples/basic_usage.py
  • src/gradata/brain.py
  • src/gradata/onboard.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from __future__ import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/brain.py
  • src/gradata/onboard.py
🪛 GitHub Actions: CI
src/gradata/brain.py

[warning] 670-670: pyright warning: Import "gradata.enhancements.memory_extraction" could not be resolved (reportMissingImports)

🪛 LanguageTool
CONTRIBUTING.md

[style] ~24-~24: The double modal “required Run” is nonstandard (only accepted in certain dialects). Consider “to be Run”.
Context: ... - Python 3.11+, type hints required - Run pyright for type checking (target: ze...

(NEEDS_FIXED)

🪛 markdownlint-cli2 (0.22.0)
CHANGELOG.md

[warning] 42-42: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

CONTRIBUTING.md

[warning] 42-42: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (12)
README.md (4)

42-59: Specify language for fenced code block.

The code block lacks a language identifier. Add `text` to improve markdown rendering consistency.

📝 Proposed fix
-```
+```text
 You correct your AI

86-87: Fix typo: "rentable" should be "actionable".

Line 86 contains a typo. Behavioral instructions are "actionable" (usable/executable), not "rentable."

✏️ Proposed fix
-New approach (behavioral instructions — rentable):
+New approach (behavioral instructions — actionable):

152-169: Specify language for fenced code block.

The architecture diagram code block lacks a language identifier. Add `text` to improve markdown rendering consistency.

📝 Proposed fix
-```
+```text
 src/gradata/

1-182: LGTM on the overall README restructure.

The repositioning from "procedural memory" to "AI that learns your judgment" is clear and compelling. The ablation experiment results, behavioral extraction explanation, and comparison table effectively communicate the value proposition.

.gitignore (1)

122-122: LGTM!

The change to track examples/ aligns with the new example files being added to the repository for open source distribution.

CONTRIBUTING.md (1)

3-44: LGTM!

Clear and concise contributor guidelines with proper code block language identifiers. The development setup, test commands, and code style requirements are well documented.

examples/README.md (1)

1-12: LGTM!

Clear documentation for the examples directory with proper markdown formatting.

src/gradata/brain.py (2)

205-206: LGTM!

Correctly passes the interactive parameter to onboard() instead of forcing a value, enabling proper non-interactive support.


334-338: LGTM!

Clean delegation to brain_convergence() with appropriate docstring. The return type matches the implementation which provides sessions, corrections_per_session, trend, total_corrections, and total_sessions.

src/gradata/onboard.py (2)

131-146: LGTM!

Robust TTY detection with graceful handling of non-interactive environments and EOFError. The three-layer handling (not a TTY → return default, TTY but EOF → return default, normal input → return answer) ensures the function works reliably in scripts, CI, and tests.


268-288: LGTM!

The updated signature with interactive: bool | None = None and auto-detection logic aligns with Brain.init(). The check on line 287 properly guards against missing stdin attribute and None stdin.

CHANGELOG.md (1)

3-47: LGTM!

Well-structured changelog following Keep a Changelog conventions. The v0.4.0 release notes accurately document the behavioral instruction extraction, convergence API, and related fixes.

Comment thread examples/basic_usage.py
Comment on lines +5 to +9
# Create a brain (or open existing)
brain_dir = Path("./demo-brain")
brain_dir.mkdir(exist_ok=True)
(brain_dir / "lessons.md").write_text("", encoding="utf-8")
brain = Brain(str(brain_dir))

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Brain.__init__ requirements
ast-grep --pattern $'class Brain:
  $$$
  def __init__($$$):
    $$$
    if not self.dir.exists():
      $$$
    $$$
'

Repository: Gradata/gradata

Length of output: 50371


🏁 Script executed:

# Check run_migrations and brain.manifest.json handling in __init__
rg -A 3 "run_migrations" src/gradata/brain.py | head -20

Repository: Gradata/gradata

Length of output: 230


🏁 Script executed:

# Search for brain.manifest.json usage in Brain.__init__ and related code
rg "brain\.manifest" src/gradata/ -A 2 -B 2 | head -50

Repository: Gradata/gradata

Length of output: 2855


🏁 Script executed:

# Check the onboard function to understand Brain.init() setup
fd -t f "onboard" src/gradata/ 2>/dev/null | head -5

Repository: Gradata/gradata

Length of output: 81


🏁 Script executed:

# Check onboard.py to see what Brain.init() actually creates
cat -n src/gradata/onboard.py | head -100

Repository: Gradata/gradata

Length of output: 3400


🏁 Script executed:

# Check if brain.manifest() method exists and what it expects
rg -A 10 "def manifest" src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 556


🏁 Script executed:

# Check if brain.manifest.json is read in Brain.__init__
rg "manifest_path\|brain\.manifest" src/gradata/brain.py | head -20

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Check what run_migrations does with a non-existent database
cat -n src/gradata/_migrations.py | head -60

Repository: Gradata/gradata

Length of output: 2540


🏁 Script executed:

# Also check if system.db is created before run_migrations is called
rg "_create_db\|run_migrations" src/gradata/onboard.py -B 5 -A 5

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Search onboard.py more carefully for database creation
rg -n "def onboard|def _create_db|system.db" src/gradata/onboard.py

Repository: Gradata/gradata

Length of output: 268


🏁 Script executed:

# Get the full onboard function to understand the flow
sed -n '200,400p' src/gradata/onboard.py

Repository: Gradata/gradata

Length of output: 6934


🏁 Script executed:

# Check what get_connection does - does it create the db if missing?
rg -A 20 "def get_connection" src/gradata/_db.py | head -30

Repository: Gradata/gradata

Length of output: 754


🏁 Script executed:

# Check if Brain.__init__ reads or requires brain.manifest.json
rg "self.manifest_path" src/gradata/brain.py -A 3 | head -20

Repository: Gradata/gradata

Length of output: 443


🏁 Script executed:

# Check if the example file exists and what it actually does
cat -n examples/basic_usage.py

Repository: Gradata/gradata

Length of output: 1899


🏁 Script executed:

# Check if Brain.correct() has any undocumented requirements
rg -A 5 "def correct" src/gradata/brain.py | head -15

Repository: Gradata/gradata

Length of output: 484


🏁 Script executed:

# Check if there are any issues with reading/writing lessons.md in __init__
rg "_find_lessons_path|_load_lessons" src/gradata/brain.py | head -10

Repository: Gradata/gradata

Length of output: 522


Consider using Brain.init() for proper brain bootstrapping.

While the example will work as written (lines 52-54 in Brain.__init__ only require the directory to exist, and run_migrations automatically creates system.db via sqlite3.connect), the idiomatic approach for new brains is Brain.init(). This method creates the proper directory structure, manifest, subdirectories, and seed rules, giving the brain a solid foundation rather than a minimal setup. Manually creating an empty directory is functional but incomplete.

For comparison:

  • Brain() requires only an existing directory; creates system.db on first use
  • Brain.init() bootstraps a complete brain with manifest, domain scaffold, and optional seed rules

No changes are strictly necessary, but using Brain.init() is the recommended pattern.
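
A sketch of the recommended pattern (the import path is an assumption based on this PR's examples):

```python
from gradata import Brain

# Bootstraps the manifest, subdirectories, and seed rules in one call,
# instead of handing Brain() a bare directory.
brain = Brain.init("./demo-brain")
```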

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/basic_usage.py` around lines 5 - 9, The example currently creates an
empty brain directory manually (Path("./demo-brain"), brain_dir.mkdir, writing
lessons.md) and then calls Brain(...) which only requires an existing directory;
instead call the bootstrapping helper Brain.init(...) to create the full
manifest, subdirectories and seed rules. Replace the manual directory creation
and file write with a call to Brain.init using the same brain_dir (or its string
path) so the brain is properly initialized rather than just an empty folder; you
can still reference Brain.__init__ behavior for context but use Brain.init() for
idiomatic bootstrapping.

Comment thread examples/basic_usage.py
Comment on lines +20 to +23
for draft, final in corrections:
    result = brain.correct(draft=draft, final=final)
    severity = result.get("diff", {})
    print(f"  Corrected: {draft[:40]}... → {final[:40]}...")

🧹 Nitpick | 🔵 Trivial

Consider handling the correction result more meaningfully.

The severity variable is assigned but unused. The result dict from brain.correct() contains useful fields like lesson_created, category, etc. that could demonstrate the API better.

💡 Suggestion
 for draft, final in corrections:
     result = brain.correct(draft=draft, final=final)
-    severity = result.get("diff", {})
-    print(f"  Corrected: {draft[:40]}... → {final[:40]}...")
+    print(f"  Corrected: {draft[:40]}... → {final[:40]}...")
+    if result.get("lesson_created"):
+        print(f"    → Lesson: {result.get('category', 'UNKNOWN')}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/basic_usage.py` around lines 20 - 23, The loop assigns severity but
never uses it and misses other useful fields from brain.correct; update the loop
that iterates over corrections (variables draft, final) and the call to
brain.correct to extract and use result fields like result.get("diff"),
result.get("lesson_created"), result.get("category"), and
result.get("explanation") instead of discarding them—e.g., compute diff =
result.get("diff"), lesson_created = result.get("lesson_created"), category =
result.get("category") and then print or log those values alongside the trimmed
draft/final strings to demonstrate the API and avoid unused-variable warnings.

- InstructionCache: add dirty flag + flush() (write once, not per put)
- Cache held as singleton on Brain instance (not recreated per call)
- LLM extraction: 12s timeout, model as constant, debug logging on failure
- Meta-rule emit: debug logging instead of bare except pass
- Behavioral extraction: debug logging on failure
- onboard.py: extract _is_interactive() helper (was duplicated 3x)
- Use existing lessons_path instead of redundant _find_lessons_path call

Co-Authored-By: Gradata <noreply@gradata.ai>
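
A minimal sketch of the write-once design this commit describes (method names match the review discussion; the internals are assumptions):

```python
import json
from pathlib import Path

class InstructionCache:
    """JSON-file cache that defers disk writes until an explicit flush()."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._dirty = False
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            # Degrade gracefully if the file held non-dict JSON (corruption).
            self._data = data if isinstance(data, dict) else {}
        except (json.JSONDecodeError, OSError):
            self._data = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def put(self, key: str, instruction: str) -> None:
        self._data[key] = instruction
        self._dirty = True  # no disk I/O here; deferred to flush()

    def flush(self) -> None:
        if self._dirty:
            self._path.write_text(json.dumps(self._data), encoding="utf-8")
            self._dirty = False
```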
@coderabbitai coderabbitai Bot added the breaking-change label and removed the bug, docs, and feature labels Apr 6, 2026
- Add _instruction_cache attribute to Brain.__init__
- Add TYPE_CHECKING import for InstructionCache in edit_classifier
- Remove quoted string annotation (from __future__ handles it)

Co-Authored-By: Gradata <noreply@gradata.ai>
Comment thread src/gradata/_core.py Outdated
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

♻️ Duplicate comments (1)
src/gradata/enhancements/edit_classifier.py (1)

394-405: ⚠️ Potential issue | 🟠 Major

Expose InstructionCache to pyright at module scope.

Line 398 is still failing reportUndefinedVariable because the annotation references a symbol that only exists inside the function body. Add a TYPE_CHECKING import at module scope and switch the signature to cache: InstructionCache | None = None.

Suggested fix
 from __future__ import annotations

 import re
 from collections import Counter
 from dataclasses import dataclass
+from typing import TYPE_CHECKING

 from gradata.enhancements.diff_engine import DiffResult
+
+if TYPE_CHECKING:
+    from gradata.enhancements.instruction_cache import InstructionCache
 def extract_behavioral_instruction(
     diff: DiffResult,
     classification: EditClassification,
     *,
-    cache: "InstructionCache | None" = None,
+    cache: InstructionCache | None = None,
     llm_enabled: bool = True,
 ) -> str | None:

As per coding guidelines, "This is the core SDK. Check for: type safety (from __future__ import annotations required)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/edit_classifier.py` around lines 394 - 405, Add a
top-level TYPE_CHECKING import and expose InstructionCache at module scope so
static checkers see it, e.g. add "from typing import TYPE_CHECKING" and "if
TYPE_CHECKING: from gradata.enhancements.instruction_cache import
InstructionCache" near the module imports (ensure "from __future__ import
annotations" is present in the module per SDK guidelines), then update the
extract_behavioral_instruction signature to use a real type name instead of a
string: change the parameter to "cache: InstructionCache | None = None" and
remove the inner import of InstructionCache inside the function; keep the
function name extract_behavioral_instruction and the symbol InstructionCache
referenced exactly as shown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_core.py`:
- Around line 160-166: The InstructionCache created on brain._instruction_cache
is populated by extract_behavioral_instruction (via cache.put()) but never
persisted; call brain._instruction_cache.flush() at an appropriate shutdown/save
point (for example immediately after lesson writes where behavioral_desc is
used, or in the session cleanup/exit handler or context manager) to ensure
cached instructions are written to lessons_path.parent /
"instruction_cache.json"; locate the cache via brain._instruction_cache and
invoke its flush() method to persist to disk.
- Around line 164-166: The call to extract_behavioral_instruction(diff, primary,
cache=brain._instruction_cache) defaults llm_enabled=True causing synchronous
LLM calls on cache misses; update the call site in _core.py where
behavioral_desc is assigned to pass llm_enabled=False (or wrap it behind a
Brain-level config flag like brain.llm_enabled and only call with
llm_enabled=True when that flag is set) so template+cache extraction stays local
and network LLM fallback becomes opt-in; reference
extract_behavioral_instruction, behavioral_desc, and brain._instruction_cache
when making the change.
- Around line 160-165: Declare a typed _instruction_cache attribute on the Brain
class in src/gradata/brain.py (e.g., add _instruction_cache:
Optional[InstructionCache] = None) so pyright won't flag dynamic attribute
access; ensure code that currently lazy-initializes brain._instruction_cache
(the block calling InstructionCache(...) before
extract_behavioral_instruction(...)) still sets that attribute. Also ensure you
call brain._instruction_cache.flush() at session boundaries—add an explicit
flush before brain_correct() returns and/or inside brain_end_session() so any
puts from extract_behavioral_instruction(...) are persisted to disk via
InstructionCache.flush().

In `@src/gradata/enhancements/edit_classifier.py`:
- Around line 407-416: The cache key currently uses only category plus parsed
added/removed word lists, which collapses to the same key when
classification.description lacks added:/cut: markers and causes wrong cache
hits; update the key construction in the block using InstructionCache.make_key
so it incorporates whether markers were present (e.g., include boolean flags for
added_match and cut_match) or fall back to the full classification.description
(or a hash of it) when markers are absent, then use that enhanced cache_key for
cache.get and cache.set to avoid cross-matching marker-free TONE/STRUCTURE
edits.
- Around line 331-338: The code only checks sorted prefixes of added_words which
misses templates when the emitted tokens are in a different order; update the
logic in _classify_section so that after extracting added_words you iterate
combinations (use itertools.combinations) for n from min(len(added_words), 4)
down to 1 and for each combination build the key from the combination order
(",".join(comb)) and also check the sorted variant (",".join(sorted(comb)))
against _INSTRUCTION_TEMPLATES, returning the matching template when found; keep
the existing added_match and added_words extraction but replace the prefix-only
loop with this combinations-based check.

In `@src/gradata/enhancements/instruction_cache.py`:
- Around line 29-33: When loading the cache in the InstructionCache code (the
block that checks cache_path.is_file() and assigns self._data =
json.loads(...)), ensure the parsed JSON is a dict before assigning to
self._data: after calling json.loads(cache_path.read_text(...)) validate that
the result is an instance of dict and if not (e.g. list/str/number) treat it as
corrupted and set self._data = {} so subsequent calls to self.get(...) won't
raise AttributeError; keep the existing exception handling for
JSONDecodeError/OSError but add this type check to degrade gracefully.

In `@src/gradata/onboard.py`:
- Line 273: The interactivity detection logic is duplicated between onboard()
(which calls _is_interactive()) and Brain.init() (which also applies has_args
logic), so extract a single helper (e.g., resolve_interactive or
is_interactive_context) that accepts the explicit interactive parameter, current
environment/context and has_args flag and returns the resolved bool; change
onboard() and Brain.init() to call that helper instead of performing their own
checks (keep the interactive: bool | None param and pass it through), and remove
the extra has_args branching from Brain.init() so both code paths use the same
resolution routine to prevent divergence.

In `@tests/test_instruction_cache.py`:
- Around line 41-48: The test currently only exercises in-memory behavior; after
corrupting the on-disk file and creating InstructionCache(path), call
cache.put("key1", "test") then cache.flush() and then re-open a new
InstructionCache(path) (e.g., cache2 = InstructionCache(path)) and assert
cache2.get("key1") == "test" to verify the corrupt file was replaced on disk;
reference the InstructionCache constructor plus its put(), flush(), and get()
methods when making this change.

---

Duplicate comments:
In `@src/gradata/enhancements/edit_classifier.py`:
- Around line 394-405: Add a top-level TYPE_CHECKING import and expose
InstructionCache at module scope so static checkers see it, e.g. add "from
typing import TYPE_CHECKING" and "if TYPE_CHECKING: from
gradata.enhancements.instruction_cache import InstructionCache" near the module
imports (ensure "from __future__ import annotations" is present in the module
per SDK guidelines), then update the extract_behavioral_instruction signature to
use a real type name instead of a string: change the parameter to "cache:
InstructionCache | None = None" and remove the inner import of InstructionCache
inside the function; keep the function name extract_behavioral_instruction and
the symbol InstructionCache referenced exactly as shown.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e61bce7-37cd-4098-81a3-f3e2b8f5c379

📥 Commits

Reviewing files that changed from the base of the PR and between 0055364 and 10668ad.

📒 Files selected for processing (5)
  • src/gradata/_core.py
  • src/gradata/enhancements/edit_classifier.py
  • src/gradata/enhancements/instruction_cache.py
  • src/gradata/onboard.py
  • tests/test_instruction_cache.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from __future__ import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/onboard.py
  • src/gradata/enhancements/instruction_cache.py
  • src/gradata/_core.py
  • src/gradata/enhancements/edit_classifier.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_instruction_cache.py
🪛 GitHub Actions: CI
src/gradata/_core.py

[error] 161-161: pyright error: Cannot assign to attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).


[error] 165-165: pyright error: Cannot access attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).

src/gradata/enhancements/edit_classifier.py

[error] 398-398: pyright error: "InstructionCache" is not defined (reportUndefinedVariable).


[warning] 355-355: pyright warning: Import "anthropic" could not be resolved (reportMissingImports).

🔇 Additional comments (2)
src/gradata/onboard.py (2)

23-23: Good addition of safe TTY detection helper.

The sys.stdin guarding is solid and avoids brittle interactive checks in environments where stdin may be missing.

Also applies to: 131-134


137-152: Non-blocking _ask() fallback is a strong reliability improvement.

Returning defaults when non-interactive and handling EOFError removes blocking/EOF failures in CI, hooks, and piped execution paths.

Comment thread src/gradata/_core.py Outdated
Comment on lines +160 to +166
if not hasattr(brain, '_instruction_cache'):
    brain._instruction_cache = InstructionCache(
        lessons_path.parent / "instruction_cache.json"
    )
behavioral_desc = extract_behavioral_instruction(
    diff, primary, cache=brain._instruction_cache,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n -C2 '\b_instruction_cache\b|\bInstructionCache\b|\.flush\s*\(' src tests

Repository: Gradata/gradata

Length of output: 8504


🏁 Script executed:

cat -n src/gradata/enhancements/instruction_cache.py

Repository: Gradata/gradata

Length of output: 2371


🏁 Script executed:

rg -n "\.flush\(\)" src --type py -A2 -B2

Repository: Gradata/gradata

Length of output: 448


🏁 Script executed:

grep -r "def.*cleanup\|def.*shutdown\|def.*exit\|atexit\|__del__" src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

sed -n '150,180p' src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 1735


🏁 Script executed:

rg -n "class Brain|def __del__|def cleanup|def close|def __exit__" src/gradata/_core.py -A 5

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

rg -n "atexit|finally|with.*Brain" src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

cat -n src/gradata/enhancements/edit_classifier.py | sed -n '390,430p'

Repository: Gradata/gradata

Length of output: 1690


🏁 Script executed:

rg -n "_instruction_cache\.flush|brain\._instruction_cache\.flush" src

Repository: Gradata/gradata

Length of output: 167


Call flush() to persist cached instructions before process exit.

The code creates brain._instruction_cache and populates it via cache.put() (in extract_behavioral_instruction()), but never calls cache.flush(). Since InstructionCache persists to disk only on explicit flush(), cached instructions are lost when the process terminates. Add a flush point after lesson writes or at session end (e.g., in a cleanup handler or context manager exit).

🧰 Tools
🪛 GitHub Actions: CI

[error] 161-161: pyright error: Cannot assign to attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).


[error] 165-165: pyright error: Cannot access attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 160 - 166, The InstructionCache created on
brain._instruction_cache is populated by extract_behavioral_instruction (via
cache.put()) but never persisted; call brain._instruction_cache.flush() at an
appropriate shutdown/save point (for example immediately after lesson writes
where behavioral_desc is used, or in the session cleanup/exit handler or context
manager) to ensure cached instructions are written to lessons_path.parent /
"instruction_cache.json"; locate the cache via brain._instruction_cache and
invoke its flush() method to persist to disk.
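
One plausible flush point, sketched on the assumption that brain_correct() owns the cache as the comment suggests:

```python
# Hypothetical tail of brain_correct(): persist any new extractions
# before returning control to the caller.
cache = getattr(brain, "_instruction_cache", None)
if cache is not None:
    try:
        cache.flush()  # no-op unless put() marked the cache dirty
    except OSError:
        pass  # persistence is best-effort; never fail the correction itself
```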

Comment thread src/gradata/_core.py Outdated
Comment on lines +160 to +165
if not hasattr(brain, '_instruction_cache'):
    brain._instruction_cache = InstructionCache(
        lessons_path.parent / "instruction_cache.json"
    )
behavioral_desc = extract_behavioral_instruction(
    diff, primary, cache=brain._instruction_cache,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find src/gradata -type f -name "*.py" | head -20

Repository: Gradata/gradata

Length of output: 599


🏁 Script executed:

wc -l src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 83


🏁 Script executed:

sed -n '150,175p' src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 1506


🏁 Script executed:

head -30 src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 1175


🏁 Script executed:

find src/gradata -name "brain.py" -type f

Repository: Gradata/gradata

Length of output: 79


🏁 Script executed:

head -100 src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 3688


🏁 Script executed:

rg "class Brain" src/gradata/brain.py -A 50

Repository: Gradata/gradata

Length of output: 2211


🏁 Script executed:

rg "_instruction_cache" src/gradata

Repository: Gradata/gradata

Length of output: 464


🏁 Script executed:

grep -n "_instruction_cache" src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

rg "class Brain" src/gradata/brain.py -A 200 | head -150

Repository: Gradata/gradata

Length of output: 6047


🏁 Script executed:

sed -n '40,150p' src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 4670


🏁 Script executed:

find src/gradata/enhancements -name "instruction_cache.py" -type f

Repository: Gradata/gradata

Length of output: 104


🏁 Script executed:

sed -n '1,200p' src/gradata/enhancements/instruction_cache.py

Repository: Gradata/gradata

Length of output: 1958


🏁 Script executed:

rg "class InstructionCache" src/gradata/enhancements/instruction_cache.py -A 30

Repository: Gradata/gradata

Length of output: 1095


🏁 Script executed:

rg "_instruction_cache.flush" src/gradata

Repository: Gradata/gradata

Length of output: 167


🏁 Script executed:

rg "flush\(\)" src/gradata/enhancements/instruction_cache.py -B 3 -A 3

Repository: Gradata/gradata

Length of output: 277


🏁 Script executed:

grep -rn "flush()" src/gradata --include="*.py" | grep -i instruction

Repository: Gradata/gradata

Length of output: 171


🏁 Script executed:

sed -n '156,175p' src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 1177


🏁 Script executed:

sed -n '140,200p' src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 3365


🏁 Script executed:

python3 << 'EOF'
import subprocess
result = subprocess.run(['grep', '-n', '.flush\\(\\)', 'src/gradata/_core.py'], capture_output=True, text=True)
print("Flush calls in _core.py:")
print(result.stdout if result.stdout else "No flush() calls found in _core.py")
EOF

Repository: Gradata/gradata

Length of output: 118


🏁 Script executed:

sed -n '200,300p' src/gradata/_core.py | grep -A 20 "def correct"

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

rg "def correct" src/gradata/_core.py -A 40 | head -50

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

rg "def end_session" src/gradata/_core.py -A 30 | head -40

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

grep -n "def correct\|def end_session" src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 214


🏁 Script executed:

rg "brain_correct\|brain_end_session" src/gradata --include="*.py"

Repository: Gradata/gradata

Length of output: 500


🏁 Script executed:

rg "from gradata._core import" src/gradata --include="*.py" -A 10

Repository: Gradata/gradata

Length of output: 500


🏁 Script executed:

sed -n '250,350p' src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 4838


🏁 Script executed:

sed -n '100,250p' src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 7979


🏁 Script executed:

sed -n '250,400p' src/gradata/_core.py | tail -100

Repository: Gradata/gradata

Length of output: 4514


🏁 Script executed:

rg "def brain_correct" src/gradata/_core.py -A 100 | head -120

Repository: Gradata/gradata

Length of output: 4338


Declare _instruction_cache on Brain and add explicit cache flush on session boundaries.

Lines 160-165 dynamically attach _instruction_cache to the brain object without declaring it as a class attribute, causing pyright reportAttributeAccessIssue failures. Per coding guidelines, declare this as a proper Brain attribute in src/gradata/brain.py.

Additionally, InstructionCache.flush() is never called in the codebase. The cache docstring states "writes to disk only on flush()", meaning all behavioral instruction modifications (from cache.put() calls in extract_behavioral_instruction()) are lost on process exit. Call brain._instruction_cache.flush() before brain_correct() returns and/or in brain_end_session() to persist cached instructions to disk.

🧰 Tools
🪛 GitHub Actions: CI

[error] 161-161: pyright error: Cannot assign to attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).


[error] 165-165: pyright error: Cannot access attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 160 - 165, Declare a typed
_instruction_cache attribute on the Brain class in src/gradata/brain.py (e.g.,
add _instruction_cache: Optional[InstructionCache] = None) so pyright won't flag
dynamic attribute access; ensure code that currently lazy-initializes
brain._instruction_cache (the block calling InstructionCache(...) before
extract_behavioral_instruction(...)) still sets that attribute. Also ensure you
call brain._instruction_cache.flush() at session boundaries—add an explicit
flush before brain_correct() returns and/or inside brain_end_session() so any
puts from extract_behavioral_instruction(...) are persisted to disk via
InstructionCache.flush().
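
A sketch of the declaration the comment asks for (placed in brain.py; the TYPE_CHECKING guard avoids a runtime import, per the duplicate comment above):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from gradata.enhancements.instruction_cache import InstructionCache

class Brain:
    # Declared up front so pyright accepts the later lazy assignment
    # in _core.py; stays None until the first correction needs it.
    _instruction_cache: InstructionCache | None = None
```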

Comment thread src/gradata/_core.py
Comment on lines +164 to +166
behavioral_desc = extract_behavioral_instruction(
    diff, primary, cache=brain._instruction_cache,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the file structure and read the code around lines 164-166
wc -l src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 83


🏁 Script executed:

# Read the context around lines 164-166
sed -n '150,180p' src/gradata/_core.py | cat -n

Repository: Gradata/gradata

Length of output: 1952


🏁 Script executed:

# Find the extract_behavioral_instruction function definition
rg -n "def extract_behavioral_instruction" src/gradata/

Repository: Gradata/gradata

Length of output: 268


🏁 Script executed:

# Look for brain.correct() definition
rg -n "def correct" src/gradata/ -A 3

Repository: Gradata/gradata

Length of output: 2220


🏁 Script executed:

# Check for llm_enabled parameter and its defaults
rg -n "llm_enabled" src/gradata/ -B 2 -A 2

Repository: Gradata/gradata

Length of output: 974


🏁 Script executed:

# Read the extract_behavioral_instruction function in full
sed -n '394,450p' src/gradata/enhancements/edit_classifier.py | cat -n

Repository: Gradata/gradata

Length of output: 1643


🏁 Script executed:

# Check brain.correct() full implementation
sed -n '301,350p' src/gradata/brain.py | cat -n

Repository: Gradata/gradata

Length of output: 3375


🏁 Script executed:

# Check if there are any configuration classes or settings for llm_enabled
rg -n "llm_enabled" src/gradata/ -B 3 -A 3 --context=5

Repository: Gradata/gradata

Length of output: 1886


🏁 Script executed:

# Look for where extract_behavioral_instruction is called with llm_enabled parameter
rg -n "extract_behavioral_instruction" src/gradata/ -B 2 -A 2

Repository: Gradata/gradata

Length of output: 1428


🏁 Script executed:

# Check Brain class initialization and config
sed -n '1,100p' src/gradata/brain.py | cat -n

Repository: Gradata/gradata

Length of output: 4388


🏁 Script executed:

# Look for any initialization parameters in Brain class
rg -n "class Brain" src/gradata/ -A 20

Repository: Gradata/gradata

Length of output: 12609


🏁 Script executed:

# Check _call_llm_for_instruction to understand the blocking concern
rg -n "_call_llm_for_instruction" src/gradata/ -A 10

Repository: Gradata/gradata

Length of output: 1526


🏁 Script executed:

# Check if there's a config mechanism for LLM settings
rg -n "llm" src/gradata/_config.py -i -B 2 -A 2

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Look for any environment variables or settings that might control LLM
rg -n "environ.*llm\|llm.*environ" src/gradata/ -i

Repository: Gradata/gradata

Length of output: 167


🏁 Script executed:

# Check the full _call_llm_for_instruction implementation to understand blocking behavior
sed -n '350,392p' src/gradata/enhancements/edit_classifier.py | cat -n

Repository: Gradata/gradata

Length of output: 1818


Pass llm_enabled=False to extract_behavioral_instruction() or gate LLM fallback behind a Brain-level config flag.

The call at lines 164-166 doesn't specify llm_enabled, so it defaults to True. This means every cache miss after template matching will block correct() on a synchronous Anthropic request. Since template and cache extraction are cheap, the network call should be opt-in rather than the default path.
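A hedged sketch of the opt-in gating; note that brain.llm_enabled is a hypothetical flag here, not an existing attribute:

```python
# _core.py call site -- template and cache extraction stay local,
# and the network fallback only runs when the brain opts in.
behavioral_desc = extract_behavioral_instruction(
    diff,
    primary,
    cache=brain._instruction_cache,
    llm_enabled=getattr(brain, "llm_enabled", False),  # hypothetical flag
)
```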

🧰 Tools
🪛 GitHub Actions: CI

[error] 165-165: pyright error: Cannot access attribute "_instruction_cache" for class "Brain". Attribute "_instruction_cache" is unknown (reportAttributeAccessIssue).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 164 - 166, The call to
extract_behavioral_instruction(diff, primary, cache=brain._instruction_cache)
defaults llm_enabled=True causing synchronous LLM calls on cache misses; update
the call site in _core.py where behavioral_desc is assigned to pass
llm_enabled=False (or wrap it behind a Brain-level config flag like
brain.llm_enabled and only call with llm_enabled=True when that flag is set) so
template+cache extraction stays local and network LLM fallback becomes opt-in;
reference extract_behavioral_instruction, behavioral_desc, and
brain._instruction_cache when making the change.

Comment thread src/gradata/_core.py
Comment on lines +801 to +805
rows = conn.execute(
"SELECT session, COUNT(*) as cnt FROM events "
"WHERE type = 'CORRECTION' AND session IS NOT NULL AND session > 0 "
"GROUP BY session ORDER BY session"
).fetchall()

⚠️ Potential issue | 🟠 Major

Include zero-correction sessions in the convergence dataset.

This query groups only CORRECTION rows, so sessions that finished with zero corrections disappear entirely. A history like [5, 0, 0] collapses to [5] and returns insufficient_data, which is the opposite of the intended signal. Use SESSION_END as the session backbone and left-join correction counts.

Suggested query shape
-            rows = conn.execute(
-                "SELECT session, COUNT(*) as cnt FROM events "
-                "WHERE type = 'CORRECTION' AND session IS NOT NULL AND session > 0 "
-                "GROUP BY session ORDER BY session"
-            ).fetchall()
+            rows = conn.execute(
+                "SELECT s.session, COALESCE(c.cnt, 0) "
+                "FROM ("
+                "  SELECT DISTINCT session FROM events "
+                "  WHERE type = 'SESSION_END' AND session IS NOT NULL AND session > 0"
+                ") s "
+                "LEFT JOIN ("
+                "  SELECT session, COUNT(*) AS cnt FROM events "
+                "  WHERE type = 'CORRECTION' AND session IS NOT NULL AND session > 0 "
+                "  GROUP BY session"
+                ") c ON c.session = s.session "
+                "ORDER BY s.session"
+            ).fetchall()

Comment on lines +331 to +338
added_match = re.search(r"added:\s*([^)]+)", desc_lower)
if added_match:
added_words = [w.strip() for w in added_match.group(1).split(",")]
# Try multi-word keys (sorted for consistency)
for n in range(min(len(added_words), 4), 0, -1):
key = ",".join(sorted(added_words[:n]))
if key in _INSTRUCTION_TEMPLATES:
return _INSTRUCTION_TEMPLATES[key]

⚠️ Potential issue | 🟠 Major

Try combinations instead of sorted prefixes.

_classify_section() can emit added: tokens in set iteration order. This loop only checks sorted prefixes, so templates like check,verify are missed whenever an unrelated token appears earlier, which makes extraction nondeterministic and pushes avoidable misses into the LLM fallback.

Suggested fix
+        from itertools import combinations
         # Try multi-word keys (sorted for consistency)
-        for n in range(min(len(added_words), 4), 0, -1):
-            key = ",".join(sorted(added_words[:n]))
-            if key in _INSTRUCTION_TEMPLATES:
-                return _INSTRUCTION_TEMPLATES[key]
+        normalized = sorted(set(added_words))
+        for n in range(min(len(normalized), 4), 0, -1):
+            for combo in combinations(normalized, n):
+                key = ",".join(combo)
+                if key in _INSTRUCTION_TEMPLATES:
+                    return _INSTRUCTION_TEMPLATES[key]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/edit_classifier.py` around lines 331 - 338, The code
only checks sorted prefixes of added_words which misses templates when the
emitted tokens are in a different order; update the logic in _classify_section
so that after extracting added_words you iterate combinations (use
itertools.combinations) for n from min(len(added_words), 4) down to 1 and for
each combination build the key from the combination order (",".join(comb)) and
also check the sorted variant (",".join(sorted(comb))) against
_INSTRUCTION_TEMPLATES, returning the matching template when found; keep the
existing added_match and added_words extraction but replace the prefix-only loop
with this combinations-based check.

Comment on lines +407 to +416
added_match = re.search(r"added:\s*([^)]+)", classification.description.lower())
cut_match = re.search(r"cut:\s*([^);]+)", classification.description.lower())
added_words = [w.strip() for w in added_match.group(1).split(",")] if added_match else []
removed_words = [w.strip() for w in cut_match.group(1).split(",")] if cut_match else []
cache_key = InstructionCache.make_key(classification.category, added_words, removed_words)

if cache:
cached = cache.get(cache_key)
if cached:
return cached

⚠️ Potential issue | 🔴 Critical

Don't reuse one cache key for every marker-free TONE/STRUCTURE edit.

When classification.description has no added:/cut: markers—e.g. Tone casualized, Tone formalized, or Content reordered or reformatted—both token lists are empty, so the cache key collapses to just the category. After the first insert, later opposite edits in that category become false cache hits and return the wrong instruction.

Suggested fix
     added_words = [w.strip() for w in added_match.group(1).split(",")] if added_match else []
     removed_words = [w.strip() for w in cut_match.group(1).split(",")] if cut_match else []
-    cache_key = InstructionCache.make_key(classification.category, added_words, removed_words)
+    key_added = added_words
+    key_removed = removed_words
+    if not key_added and not key_removed:
+        key_added = [classification.description.strip().lower()]
+    cache_key = InstructionCache.make_key(classification.category, key_added, key_removed)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/edit_classifier.py` around lines 407 - 416, The
cache key currently uses only category plus parsed added/removed word lists,
which collapses to the same key when classification.description lacks
added:/cut: markers and causes wrong cache hits; update the key construction in
the block using InstructionCache.make_key so it incorporates whether markers
were present (e.g., include boolean flags for added_match and cut_match) or fall
back to the full classification.description (or a hash of it) when markers are
absent, then use that enhanced cache_key for cache.get and cache.set to avoid
cross-matching marker-free TONE/STRUCTURE edits.

Comment on lines +29 to +33
if cache_path.is_file():
try:
self._data = json.loads(cache_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
self._data = {}

⚠️ Potential issue | 🟠 Major

Reject non-object JSON when loading the cache.

json.loads() can succeed with a list, string, or number. In that case self._data stops being a dict and get() raises AttributeError instead of degrading gracefully on a corrupt cache file.

Suggested fix
         if cache_path.is_file():
             try:
-                self._data = json.loads(cache_path.read_text(encoding="utf-8"))
+                loaded = json.loads(cache_path.read_text(encoding="utf-8"))
+                if isinstance(loaded, dict) and all(
+                    isinstance(k, str) and isinstance(v, str)
+                    for k, v in loaded.items()
+                ):
+                    self._data = loaded
+                else:
+                    self._data = {}
             except (json.JSONDecodeError, OSError):
                 self._data = {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/instruction_cache.py` around lines 29 - 33, When
loading the cache in the InstructionCache code (the block that checks
cache_path.is_file() and assigns self._data = json.loads(...)), ensure the
parsed JSON is a dict before assigning to self._data: after calling
json.loads(cache_path.read_text(...)) validate that the result is an instance of
dict and if not (e.g. list/str/number) treat it as corrupted and set self._data
= {} so subsequent calls to self.get(...) won't raise AttributeError; keep the
existing exception handling for JSONDecodeError/OSError but add this type check
to degrade gracefully.

Comment thread src/gradata/onboard.py
     company: str | None = None,
     embedding: str | None = None,
-    interactive: bool = True,
+    interactive: bool | None = None,

🧹 Nitpick | 🔵 Trivial

Consider centralizing interactivity resolution to avoid drift.

onboard() now auto-detects via _is_interactive(), while Brain.init() applies additional has_args logic before delegating. Consolidating this policy in one shared helper would prevent future semantic divergence.

Also applies to: 291-292
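One possible shape for the shared helper; the name resolve_interactive and the has_args handling are illustrative, not taken from the code:

```python
from __future__ import annotations

import sys


def resolve_interactive(interactive: bool | None, *, has_args: bool = False) -> bool:
    """Single source of truth for interactivity resolution."""
    if interactive is not None:
        return interactive  # an explicit caller choice always wins
    if has_args:
        return False  # scripted invocation implies non-interactive
    # Otherwise fall back to TTY detection, as _is_interactive() does today.
    return sys.stdin.isatty() and sys.stdout.isatty()
```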

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/onboard.py` at line 273, The interactivity detection logic is
duplicated between onboard() (which calls _is_interactive()) and Brain.init()
(which also applies has_args logic), so extract a single helper (e.g.,
resolve_interactive or is_interactive_context) that accepts the explicit
interactive parameter, current environment/context and has_args flag and returns
the resolved bool; change onboard() and Brain.init() to call that helper instead
of performing their own checks (keep the interactive: bool | None param and pass
it through), and remove the extra has_args branching from Brain.init() so both
code paths use the same resolution routine to prevent divergence.

Comment on lines +41 to +48
def test_cache_handles_corrupt_file():
with tempfile.TemporaryDirectory() as d:
path = Path(d) / "cache.json"
path.write_text("not valid json", encoding="utf-8")
cache = InstructionCache(path)
assert cache.get("anything") is None
cache.put("key1", "test")
assert cache.get("key1") == "test"

⚠️ Potential issue | 🟠 Major

Reopen the cache after flush() here.

With the new dirty-write behavior, this test only proves in-memory put()/get() on the same instance. A regression where the corrupt file is never replaced on disk would still pass, so the recovery path is not actually covered.

Suggested test tightening
 def test_cache_handles_corrupt_file():
     with tempfile.TemporaryDirectory() as d:
         path = Path(d) / "cache.json"
         path.write_text("not valid json", encoding="utf-8")
         cache = InstructionCache(path)
         assert cache.get("anything") is None
         cache.put("key1", "test")
-        assert cache.get("key1") == "test"
+        cache.flush()
+
+        recovered = InstructionCache(path)
+        assert recovered.get("key1") == "test"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_instruction_cache.py` around lines 41 - 48, The test currently
only exercises in-memory behavior; after corrupting the on-disk file and
creating InstructionCache(path), call cache.put("key1", "test") then
cache.flush() and then re-open a new InstructionCache(path) (e.g., cache2 =
InstructionCache(path)) and assert cache2.get("key1") == "test" to verify the
corrupt file was replaced on disk; reference the InstructionCache constructor
plus its put(), flush(), and get() methods when making this change.

Use isinstance check instead of hasattr for proper type narrowing.

Co-Authored-By: Gradata <noreply@gradata.ai>
Comment thread src/gradata/_core.py
Comment on lines +156 to +170
# Try behavioral extraction (LLM + cache + templates)
try:
from gradata.enhancements.edit_classifier import extract_behavioral_instruction
from gradata.enhancements.instruction_cache import InstructionCache
if not isinstance(brain._instruction_cache, InstructionCache):
brain._instruction_cache = InstructionCache(
lessons_path.parent / "instruction_cache.json"
)
behavioral_desc = extract_behavioral_instruction(
diff, primary, cache=brain._instruction_cache, # type: ignore[arg-type]
)
desc = behavioral_desc or primary.description
except Exception as e:
_log.debug("Behavioral extraction failed: %s", e)
desc = primary.description

P1 InstructionCache never flushed — LLM extractions lost between sessions

cache.put() sets self._dirty = True inside extract_behavioral_instruction(), but flush() is never called anywhere in the correction pipeline or session lifecycle. Grep confirms the only flush() call in src/gradata/ is mcp_server.py:107 for an unrelated stream, and brain._instruction_cache is only referenced at _core.py:160-165 and brain.py:69.

Consequence: every Brain instantiation creates a fresh InstructionCache from the file on disk, which was never written — so the cache always starts empty and each session triggers fresh LLM calls for patterns already seen before.

Fix: call brain._instruction_cache.flush() in brain_end_session (or immediately after cache.put() if write-through is preferred):

# In brain_end_session, after meta-rules block:
if isinstance(brain._instruction_cache, InstructionCache):
    brain._instruction_cache.flush()


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_core.py`:
- Around line 817-828: The trend logic using avg_first and avg_second can
misclassify when avg_first == 0; update the trend computation in
src/gradata/_core.py (the block that computes first_half, second_half,
avg_first, avg_second and sets trend) to branch on avg_first: if avg_first > 0
keep the existing ratio/tolerance checks (avg_second < avg_first * 0.7 and
abs(avg_second - avg_first) <= max(1, avg_first * 0.15)), but if avg_first == 0
use absolute thresholds—e.g., treat avg_second == 0 as "converged", avg_second
<= 1 (or a small absolute threshold) as "converging", and otherwise
"diverging"—and ensure the tolerance calculation only uses avg_first-derived
relative tolerance when avg_first > 0.

In `@src/gradata/brain.py`:
- Around line 69-70: Change the loose annotation on self._instruction_cache from
object | None to the specific InstructionCache | None: add a TYPE_CHECKING
import guard (from typing import TYPE_CHECKING) and inside it import
InstructionCache for static typing, then update the field declaration in the
class to self._instruction_cache: InstructionCache | None = None (keeping the
runtime behaviour unchanged to avoid circular imports). Ensure the symbol
InstructionCache is referenced exactly as imported for the type hint.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e85ebf6-9f34-48f1-a667-fcc6a2ab4b66

📥 Commits

Reviewing files that changed from the base of the PR and between 10668ad and 27c64ca.

📒 Files selected for processing (3)
  • src/gradata/_core.py
  • src/gradata/brain.py
  • src/gradata/enhancements/edit_classifier.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from __future__ import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/brain.py
  • src/gradata/enhancements/edit_classifier.py
  • src/gradata/_core.py
🔇 Additional comments (11)
src/gradata/enhancements/edit_classifier.py (4)

335-346: Template matching still uses sorted prefixes instead of combinations.

The past review flagged that _classify_section() emits added: tokens in non-deterministic set iteration order. The current fix sorts only the first N words as a prefix (line 340), but this still misses templates when relevant keywords aren't in the first N positions. For example, if added_words = ["foo", "check", "verify"], sorted(added_words[:2]) yields ["check", "foo"] → key "check,foo", missing the "check,verify" template.


411-415: Cache key collision issue for marker-free classifications not addressed.

When classification.description lacks added:/cut: markers (e.g., "Tone casualized", "Tone formalized", "Content reordered or reformatted"), both added_words and removed_words are empty lists. The cache key collapses to just the category, causing false cache hits for opposite edits (e.g., casualizing tone returns a cached formalization instruction).


20-26: LGTM — TYPE_CHECKING import correctly added.

The TYPE_CHECKING guard for InstructionCache resolves the pyright error from the previous review.


350-395: LGTM — LLM extraction with proper timeout and error handling.

The timeout (12 seconds) addresses the previous review concern. The length validation (5-200 chars) and exception handling ensure graceful degradation.

src/gradata/_core.py (5)

156-170: llm_enabled defaults to True, causing synchronous LLM calls on every cache miss.

The call at line 164-166 doesn't specify llm_enabled, so it defaults to True. After template matching fails, every cache miss blocks correct() on a synchronous Anthropic request (up to 12 seconds). Since template and cache extraction are cheap local operations, the LLM fallback should be opt-in.


160-167: InstructionCache.flush() is never called — cached instructions are lost on process exit.

The cache is populated via cache.put() inside extract_behavioral_instruction(), but flush() is never invoked. Per InstructionCache design, data persists to disk only on explicit flush(). Add a flush call after behavioral extraction succeeds or at session boundaries.


801-805: Zero-correction sessions excluded from convergence data.

The query groups only CORRECTION events, so sessions that finished with zero corrections are omitted entirely. A trend like [5, 0, 0] collapses to [5] and returns "insufficient_data", which misrepresents actual convergence.


456-468: LGTM — Meta-rule event emission with proper error handling.

The event payload includes all fields expected by the meta_rule.created handler in embeddings.py. The getattr() calls with defaults handle missing attributes gracefully, and individual exceptions are caught to avoid breaking the loop.


830-836: LGTM — Return structure matches documented interface.

The return dict includes all fields documented in the docstring with correct types.

src/gradata/brain.py (2)

196-208: LGTM — Interactive parameter now correctly forwarded.

The change allows onboard() to handle TTY auto-detection when interactive=None, rather than forcing it to True. This aligns with the onboard() signature which safely handles None via _is_interactive().


336-340: LGTM — Clean delegation for convergence API.

The method correctly delegates to brain_convergence() in _core.py, maintaining the pattern used by other Brain methods.
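For reference, the delegation pattern in question looks roughly like this; the real signature may differ:

```python
# brain.py -- thin wrapper over the _core implementation (sketch only).
class Brain:
    def convergence(self) -> dict:
        from gradata._core import brain_convergence  # local import avoids cycles
        return brain_convergence(self)
```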

Comment thread src/gradata/_core.py
Comment on lines +817 to +828
if len(counts) >= 3:
first_half = counts[:len(counts) // 2]
second_half = counts[len(counts) // 2:]
avg_first = sum(first_half) / len(first_half)
avg_second = sum(second_half) / len(second_half)

if avg_second < avg_first * 0.7:
trend = "converging"
elif abs(avg_second - avg_first) <= max(1, avg_first * 0.15):
trend = "converged"
else:
trend = "diverging"

⚠️ Potential issue | 🟡 Minor

Edge case: avg_first = 0 causes incorrect trend classification.

If all sessions in the first half have zero corrections (e.g., counts = [0, 0, 0, 1, 1, 1]), then avg_first = 0. The check avg_second < avg_first * 0.7 becomes avg_second < 0, which is always false, and the tolerance check abs(avg_second - avg_first) <= max(1, 0) falls back to an absolute floor of 1, so a rise from zero corrections to one per session is labeled "converged". Trends can be misclassified whenever early sessions had no corrections.

🛡️ Suggested fix
         if avg_second < avg_first * 0.7:
             trend = "converging"
-        elif abs(avg_second - avg_first) <= max(1, avg_first * 0.15):
+        elif avg_first == 0 and avg_second == 0:
+            trend = "converged"
+        elif avg_first > 0 and abs(avg_second - avg_first) <= max(1, avg_first * 0.15):
             trend = "converged"
         else:
             trend = "diverging"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 817 - 828, The trend logic using avg_first
and avg_second can misclassify when avg_first == 0; update the trend computation
in src/gradata/_core.py (the block that computes first_half, second_half,
avg_first, avg_second and sets trend) to branch on avg_first: if avg_first > 0
keep the existing ratio/tolerance checks (avg_second < avg_first * 0.7 and
abs(avg_second - avg_first) <= max(1, avg_first * 0.15)), but if avg_first == 0
use absolute thresholds—e.g., treat avg_second == 0 as "converged", avg_second
<= 1 (or a small absolute threshold) as "converging", and otherwise
"diverging"—and ensure the tolerance calculation only uses avg_first-derived
relative tolerance when avg_first > 0.

Comment thread src/gradata/brain.py
Comment on lines +69 to +70
self._instruction_cache: object | None = None # lazy: InstructionCache


🧹 Nitpick | 🔵 Trivial

Type hint should be InstructionCache | None instead of object | None.

The current object | None type hint is too loose and loses type safety. Use a TYPE_CHECKING guard to import the type for static analysis while avoiding circular imports at runtime.

♻️ Proposed fix

Add to imports section (around line 13):

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from gradata.enhancements.instruction_cache import InstructionCache

Then update the field declaration:

-        self._instruction_cache: object | None = None  # lazy: InstructionCache
+        self._instruction_cache: InstructionCache | None = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/brain.py` around lines 69 - 70, Change the loose annotation on
self._instruction_cache from object | None to the specific InstructionCache |
None: add a TYPE_CHECKING import guard (from typing import TYPE_CHECKING) and
inside it import InstructionCache for static typing, then update the field
declaration in the class to self._instruction_cache: InstructionCache | None =
None (keeping the runtime behaviour unchanged to avoid circular imports). Ensure
the symbol InstructionCache is referenced exactly as imported for the type hint.

@Gradata Gradata merged commit 9f1b116 into main Apr 6, 2026
6 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Apr 15, 2026
3 tasks
Gradata added a commit that referenced this pull request Apr 15, 2026
feat(dedup): fingerprint-based observation dedup with seen_count merge semantics

Problem: 10 sessions of "don't use em-dashes" produced 10 lesson
reinforcements and inflated confidence/fire_count 10x. Ties into
gap-analysis gap #4 (survival-bonus bypass inflating confidence).

Solution: A single sha1 fingerprint on (category, normalized(draft||final))
is persisted in a new observation_dedup SQLite table with seen_count,
first/last session, and timestamps. The ingestion path in
brain_correct (_core.py) calls check_and_register at exactly one point;
if the fingerprint was seen inside a recent 10-session window the
correction is tagged observation_deduped and the lesson-create /
lesson-reinforce branch is skipped, preventing confidence inflation
while preserving pattern extraction, FTS indexing, bus emit, and the
raw CORRECTION event.

Semantics: default is DROP within window. seen_count is still tracked
in the persisted row so MERGE (bump fire_count by seen_count at window
rollover) can be wired in later without data loss. True MERGE wiring
requires extending update_confidence in self_improvement.py, which is
off-limits for this worktree — flagged for polyclaude policy review.

Files:
- src/gradata/enhancements/dedup.py (new)
- src/gradata/_core.py (single hook point in brain_correct)
- tests/test_dedup.py (14 tests: fingerprint stability, normalization,
  category-awareness, window boundaries, end-to-end via Brain.correct)

Tests: 2271 pass, 23 skipped. ruff clean.

Co-Authored-By: Gradata <noreply@gradata.ai>
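For orientation, a minimal sketch of the fingerprint this commit describes; the exact normalization rules are an assumption (see _normalize_text in dedup.py for the real ones):

```python
import hashlib
import re


def _normalize_text(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return re.sub(r"\s+", " ", text.strip().lower())


def observation_fingerprint(category: str, draft: str, final: str) -> str:
    # sha1 over (category, normalized(draft || final)), per the commit message.
    payload = f"{category}:{_normalize_text(draft)}||{_normalize_text(final)}"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```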
Gradata added a commit that referenced this pull request Apr 15, 2026
…istence (#67)

* feat(dedup): fingerprint-based observation dedup with seen_count merge semantics

Problem: 10 sessions of "don't use em-dashes" produced 10 lesson
reinforcements and inflated confidence/fire_count 10x. Ties into
gap-analysis gap #4 (survival-bonus bypass inflating confidence).

Solution: A single sha1 fingerprint on (category, normalized(draft||final))
is persisted in a new observation_dedup SQLite table with seen_count,
first/last session, and timestamps. The ingestion path in
brain_correct (_core.py) calls check_and_register at exactly one point;
if the fingerprint was seen inside a recent 10-session window the
correction is tagged observation_deduped and the lesson-create /
lesson-reinforce branch is skipped, preventing confidence inflation
while preserving pattern extraction, FTS indexing, bus emit, and the
raw CORRECTION event.

Semantics: default is DROP within window. seen_count is still tracked
in the persisted row so MERGE (bump fire_count by seen_count at window
rollover) can be wired in later without data loss. True MERGE wiring
requires extending update_confidence in self_improvement.py, which is
off-limits for this worktree — flagged for polyclaude policy review.

Files:
- src/gradata/enhancements/dedup.py (new)
- src/gradata/_core.py (single hook point in brain_correct)
- tests/test_dedup.py (14 tests: fingerprint stability, normalization,
  category-awareness, window boundaries, end-to-end via Brain.correct)

Tests: 2271 pass, 23 skipped. ruff clean.

Co-Authored-By: Gradata <noreply@gradata.ai>

* refactor(dedup): simplify hook seam and reuse _db helpers

- Pull the 29-line inline dedup block in brain_correct into a single
  annotate_event_with_dedup() helper inside dedup.py. Call site in
  _core.py is now 7 lines including the import — one seam, one signature.
- Reuse gradata._db.get_connection and ensure_table instead of
  hand-rolling a sqlite3.connect + PRAGMA busy_timeout + CREATE TABLE
  path. Aligns dedup.py with the project's standard schema-creation
  pattern without touching _migrations.py.
- Collapse _normalize_text into two statements.

No behavior change. All 14 dedup tests pass, full suite still at
2271 passed / 23 skipped. ruff clean on the touched files.

Co-Authored-By: Gradata <noreply@gradata.ai>

---------

Co-authored-by: Gradata <noreply@gradata.ai>
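A guess at the shape of that seam; only the helper's name and module come from the commit message, and the signature is hypothetical:

```python
# Hypothetical call-site shape inside brain_correct().
from gradata.enhancements.dedup import annotate_event_with_dedup


def ingest_correction(event: dict, category: str, draft: str, final: str) -> dict:
    # Annotate once; downstream code skips the lesson create/reinforce branch
    # when the returned event carries the observation_deduped tag.
    return annotate_event_with_dedup(event, category=category, draft=draft, final=final)
```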