
Review: Full codebase audit via Greptile#1

Closed
Gradata wants to merge 1 commit into main from greptile-review-test

Conversation

Owner

@Gradata Gradata commented Apr 6, 2026

Purpose

Trigger Greptile's automated review on the full v0.3.0 codebase.

This is a test PR to evaluate Greptile's review quality against our own internal review pipeline (autoresearch + simplify + security audit + adversary).

What to review

  • src/gradata/ — full SDK
  • src/gradata/hooks/ — nervous system hooks
  • tests/ — test suite

After review

Compare Greptile findings vs our own. Delete this PR either way.

Generated with Gradata

Greptile Summary

This PR adds a single greptile-review.md trigger file as a test of Greptile's review quality against the team's internal pipeline, auditing the full v0.3.0 "Nervous System" release.

The new EventBus design (events_bus.py) is well-implemented — handler failures are correctly isolated via _safe_call, listener count is bounded at 50, async handlers use a bounded thread pool, and the integration test (test_bus_error_isolation) verifies that a crashing subscriber does not interrupt the correction pipeline. The dual-write event persistence in _events.py (JSONL + SQLite with WAL mode) is solid.
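The isolation pattern the review praises can be reduced to a minimal sketch (names mirror the review; the real events_bus.py is more elaborate, and the async thread-pool path is omitted here):

```python
from __future__ import annotations

import logging
from typing import Any, Callable

logger = logging.getLogger("gradata.events_bus")

MAX_LISTENERS = 50  # bound cited in the review


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        handlers = self._handlers.setdefault(topic, [])
        if len(handlers) >= MAX_LISTENERS:
            raise RuntimeError(f"listener limit ({MAX_LISTENERS}) reached for {topic!r}")
        handlers.append(handler)

    def _safe_call(self, handler: Callable[[Any], None], payload: Any) -> None:
        # a crashing subscriber must never interrupt the emitting pipeline
        try:
            handler(payload)
        except Exception:
            logger.exception("handler %r failed; continuing", handler)

    def emit(self, topic: str, payload: Any) -> None:
        for handler in list(self._handlers.get(topic, [])):
            self._safe_call(handler, payload)
```

The point is only that emit() survives a raising handler, which is what test_bus_error_isolation verifies against the real implementation.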

Key findings from the audit:

  • Multi-brain ctx gap (_events.py): supersede(), correction_rate(), compute_leading_indicators(), and audit_trend() bypass the BrainContext injection that emit() and query() support, silently routing all calls to the global _p.DB_PATH. Any user running two Brain instances in a single process will encounter silent cross-brain data corruption on these four code paths.
  • print() instead of logging (Rule 1): Eight production sites across _events.py, rules/rule_tracker.py, mcp_server.py, and _embed.py use print(..., file=sys.stderr) for error and warning messages, making them invisible to callers that use structured logging.
  • Missing from __future__ import annotations (Rule 6): Six SDK modules (_paths.py, _doctor.py, _embed.py, cli.py, _installer.py, _validator.py) are missing the required future import despite using union-type annotations (str | Path | None).

Confidence Score: 4/5

Codebase is structurally sound with 1339 passing tests; the ctx-gap and logging violations are clean-up items that do not affect single-brain users.

EventBus error isolation is correctly implemented and tested. The main issues are: (1) a multi-brain isolation bug in four _events.py functions that only affects multi-Brain processes, (2) print() logging violations across 4 files, and (3) missing future imports in 6 files. None of these block single-brain functionality. Score is 4 rather than 5 because the ctx-gap in supersede() is a real silent-data-corruption risk for any user running two Brain instances simultaneously.

Areas needing attention: src/gradata/_events.py (ctx-gap + logging violations), src/gradata/rules/rule_tracker.py (print logging), src/gradata/mcp_server.py (print logging), and the six SDK modules missing from __future__ import annotations.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/gradata/_events.py | Core event persistence layer; two print() logging violations and four ctx-unaware functions break multi-brain isolation |
| src/gradata/_paths.py | Path resolution module; missing from __future__ import annotations despite using union type syntax throughout |
| src/gradata/events_bus.py | In-memory EventBus correctly isolates handler errors, bounds listeners to 50, and uses a thread pool — well-implemented |
| src/gradata/hooks/auto_correct.py | PostToolUse hook; well-structured with proper error handling at every call site |
| src/gradata/_core.py | Brain core logic; robust try/except wrapping throughout the correction and graduation pipeline |
| src/gradata/rules/rule_tracker.py | Rule application tracker; three print() warning calls should be replaced with logging.warning() |
| src/gradata/mcp_server.py | MCP JSON-RPC server; two print() calls in brain-init path should use logging.info() / logging.error() |
| src/gradata/_embed.py | Embedding client; print() for missing API key error and missing from __future__ import annotations |

Sequence Diagram

```mermaid
sequenceDiagram
    participant U as User/Agent
    participant B as Brain
    participant EB as EventBus (in-memory)
    participant EV as _events (JSONL+SQLite)

    U->>B: brain.correct(draft, final)
    B->>EV: emit("CORRECTION", ...) — dual-write
    EV-->>B: event dict (with id)
    B->>EB: bus.emit("correction.created", payload)
    EB-->>EB: _safe_call(handler, payload) — errors isolated

    U->>B: brain.end_session()
    B->>EV: emit("SESSION_END", ...)
    B->>EB: bus.emit("session.ended", stats)
    EB-->>EB: _safe_call per subscriber
```

Comments Outside Diff (3)

  1. src/gradata/_events.py, line 107-122 (link)

    P1 print() used for error logging

    Both JSONL and SQLite write-failure handlers use print(..., file=sys.stderr) instead of the logging module. These messages are invisible to any caller that routes log output through a handler, and cannot be filtered by severity level. The same pattern appears in several other SDK modules:

    • src/gradata/rules/rule_tracker.py:63, :87, :122 — warning-level messages
    • src/gradata/mcp_server.py:505, :511 — brain-init status messages
    • src/gradata/_embed.py:149 — missing API key error

    All of these should use logging.warning() or logging.error() so callers can control log visibility. For example, in _events.py lines 107 and 122:

    ```python
            except Exception as e:
                logging.getLogger("gradata.events").error("[events] JSONL write failed: %s", e)
    ```

    ```python
            except Exception as e:
                logging.getLogger("gradata.events").error("[events] SQLite write failed: %s", e)
    ```

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)
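    The caller-side benefit can be shown with a small runnable sketch (the logger name is taken from the example above; the Capture handler is hypothetical) — unlike print(..., file=sys.stderr), logger output can be routed and filtered by severity:

    ```python
    import logging


    class Capture(logging.Handler):
        """Hypothetical host-application handler that collects records."""

        def __init__(self) -> None:
            super().__init__()
            self.records = []

        def emit(self, record: logging.LogRecord) -> None:
            self.records.append(record)


    capture = Capture()
    logger = logging.getLogger("gradata.events")
    logger.addHandler(capture)
    logger.setLevel(logging.WARNING)

    logger.error("[events] JSONL write failed: %s", OSError("disk full"))
    logger.debug("noise")  # filtered out by the WARNING threshold
    ```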


  2. src/gradata/_paths.py, line 1-12 (link)

    P2 Missing from __future__ import annotations

    _paths.py is a core SDK module that uses str | Path | None union syntax in several public function signatures (e.g. resolve_brain_dir, BrainContext.from_brain_dir, set_brain_dir), but it is missing from __future__ import annotations. Rule 6 requires this import in every Python SDK file.

    The same import is missing from:

    • src/gradata/_doctor.py
    • src/gradata/_embed.py
    • src/gradata/cli.py
    • src/gradata/_installer.py
    • src/gradata/_validator.py

    Add from __future__ import annotations as the first import (after the module docstring) in each of these files.

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)
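    A hedged sketch of the fix (the module docstring and function body are illustrative; resolve_brain_dir is one of the signatures the comment names):

    ```python
    """Hypothetical SDK module showing the required import placement."""
    from __future__ import annotations  # first import, right after the docstring

    from pathlib import Path


    def resolve_brain_dir(p: str | Path | None) -> Path | None:
        # with the future import, annotations are stored as strings and never
        # evaluated, so `str | Path | None` parses even on Python 3.7-3.9
        return Path(p).expanduser() if p is not None else None
    ```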


  3. src/gradata/_events.py, line 198-216 (link)

    P1 supersede() ignores BrainContext, silently routes to the wrong database in multi-brain setups

    emit() and query() both accept an optional ctx: BrainContext | None parameter so they can be directed at a specific brain's database. However supersede() hard-codes _p.DB_PATH (the process-global path) at line 201 and passes no ctx to its internal emit() call at lines 210–213. In a process with two Brain instances, calling supersede() on events belonging to the second brain will silently read from and write to the first brain's database.

    The same issue affects:

    • correction_rate() (line 220) — uses _p.DB_PATH directly
    • compute_leading_indicators() (line 231) — uses _p.DB_PATH directly
    • audit_trend() (line 376) — uses _p.DB_PATH directly

    Each of these functions should accept ctx: BrainContext | None = None and resolve db_path = ctx.db_path if ctx else _p.DB_PATH, consistent with emit() and query().
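    The suggested pattern as a standalone sketch (BrainContext and DB_PATH are simplified stand-ins for the real _paths objects; the body is elided):

    ```python
    from __future__ import annotations

    from dataclasses import dataclass
    from pathlib import Path

    DB_PATH = Path("/tmp/global-brain.db")  # stand-in for the process-global _p.DB_PATH


    @dataclass
    class BrainContext:  # simplified stand-in for the real class
        db_path: Path


    def supersede(event_id: str, ctx: BrainContext | None = None) -> Path:
        # resolve per-brain, exactly as emit()/query() already do
        db_path = ctx.db_path if ctx else DB_PATH
        # ... open db_path, mark event_id superseded, emit(..., ctx=ctx) ...
        return db_path
    ```

    With this shape, a second Brain instance passes its own ctx and never touches the global database.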



Reviews (1): Last reviewed commit: "test: trigger Greptile review"


Context used:

  • Rule used - # Code Review Rules

Rule 1: Never use print() ... (source)

@Gradata Gradata closed this Apr 6, 2026
@Gradata Gradata deleted the greptile-review-test branch April 6, 2026 05:31
Gradata pushed a commit that referenced this pull request Apr 11, 2026
The recency reminder duplicated the #1 rule at the bottom of the injection block.
Removing it saves ~135 chars per injection without losing information (the rule is already listed first).

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata added a commit that referenced this pull request Apr 15, 2026
Audit finding gap-analysis/01-internal-audit.md #1.1: RULE_THRESHOLD
was 0.80 while validate_assumptions in rule_engine.py enforces a 0.90
floor at injection time. Any lesson graduated in 0.80-0.89 was silently
blocked from ever firing — non-deterministic behaviour depending on
which gate ran.

Plan:
- Raise RULE_THRESHOLD to 0.90 (injection floor wins; it is the gate
  that governs what actually reaches the model).
- Grep SDK for 0.80/0.90 to catch stale references. self_healing.py's
  DEFAULT_MIN_CONFIDENCE is a distinct detection threshold (catch rules
  whose confidence recently dipped after a correction-cycle penalty);
  left at 0.80 with an explanatory comment.
- Confirm existing tests hold: test_safety_assertion, test_enhancements,
  test_clb, test_spec_compliance range all pass at 0.90.

Adversary check:
- Does 0.90 break pattern->rule promotion tests? No: surviving lessons
  at 0.85 + fsrs_bonus still cross 0.90 in one step (confirmed).
- Does raising the threshold break rule_engine tests? No: rule_engine
  already hard-codes 0.90.
- Does self_healing at 0.80 conflict with the stated goal? Distinct
  concern: it detects rules that *should have* fired, not rules that
  *can* be injected. Documented inline.
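The invariant this commit restores can be stated as a one-line check (constant names follow the commit message; INJECTION_FLOOR is a hypothetical alias for the 0.90 floor hard-coded in rule_engine.py):

```python
RULE_THRESHOLD = 0.90   # graduation gate, raised from 0.80
INJECTION_FLOOR = 0.90  # floor enforced by validate_assumptions at injection time

# no lesson may graduate into a dead zone where it can never fire
assert RULE_THRESHOLD >= INJECTION_FLOOR
```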
Gradata added a commit that referenced this pull request Apr 15, 2026
Audit finding gap-analysis/01-internal-audit.md #1.10: the survival
branch in update_confidence unconditionally incremented fire_count,
bypassing the "no promotion from silence" invariant asserted in
graduate() (self_improvement.py:856-861). A lesson that was never
actually injected could reach PATTERN in 3 sessions purely from
survival bonuses — exactly the failure mode the fire-count gate is
supposed to prevent.

Plan:
- Thread injection evidence through update_confidence via a new
  injected_lesson_keys parameter and an existing lesson attribute
  (_was_injected_this_session). Preserves existing callers that
  don't track injection.
- On survival, still apply the confidence bonus (legacy semantics)
  but only increment fire_count when either signal confirms the
  lesson was injected. sessions_since_fire is also gated on the
  same evidence, so silent lessons correctly age toward UNTESTABLE.
- Extend tests/test_safety_assertion.py with 4 cases covering the
  new behaviour, including the Sybil-style "3 silent survivals"
  scenario that the audit flagged as a promotion-from-silence bug.

Adversary check:
- Existing tests that rely on fire_count post-survival? Audited:
  test_clb.py manually increments fire_count ("# simulate rule being
  applied"), test_enhancements pre-seeds fire_count to the threshold.
  Full suite stays green (2070 passed, 23 skipped).
- Does dropping sessions_since_fire reset hurt UNTESTABLE detection?
  No — it strengthens it. Silent lessons now age correctly.
- Is injected_lesson_keys ever authoritative when upstream hooks
  don't wire it? It's opt-in; lessons without evidence default to
  the conservative "no increment" path, which is the fix's intent.
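A minimal sketch of the gating described above (the Lesson fields and the +0.02 survival bonus are illustrative; the injected_lesson_keys parameter and _was_injected_this_session attribute are the signals named in the commit):

```python
from __future__ import annotations


class Lesson:
    def __init__(self, key: str) -> None:
        self.key = key
        self.confidence = 0.50
        self.fire_count = 0
        self.sessions_since_fire = 0
        self._was_injected_this_session = False


def survive(lesson: Lesson, injected_lesson_keys: set[str] | None = None) -> None:
    # legacy semantics: survival still grants the confidence bonus
    lesson.confidence = min(1.0, lesson.confidence + 0.02)
    injected = lesson._was_injected_this_session or (
        injected_lesson_keys is not None and lesson.key in injected_lesson_keys
    )
    if injected:
        lesson.fire_count += 1           # promotion evidence only when actually injected
        lesson.sessions_since_fire = 0
    else:
        lesson.sessions_since_fire += 1  # silent lessons age toward UNTESTABLE
```

Callers that don't track injection take the conservative path by default, which is the fix's stated intent.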
Gradata added a commit that referenced this pull request Apr 15, 2026
Audit finding gap-analysis/01-internal-audit.md #1.3: the compound
penalty stack (ACCELERATION 1.5 * streak_mult 1.5 * severity_boost 1.8
* rule_override 1.2 * severity_weight 1.3 * FSRS 1.22) can subtract
~0.63 from a RULE at 0.90 in a single correction. Combined with the
Bayesian blend overwrite that follows, the system oscillates under
alternating corrections — exactly the scenario the streak logic is
supposed to handle.

Plan:
- Add MAX_PER_STEP_PENALTY = 0.20 constant with an explanatory comment
  justifying the value (one graduation band's worth of confidence,
  matching MAX_PER_SESSION_DELTA so a single correction cannot chain
  two tier demotions in one tick).
- Reconcile the two update paths. Pick ONE rule: FSRS first, then
  Bayesian blend as a second opinion — but clamp the blend result so
  it can never pull in the opposite direction of the current event.
  Penalty event: blend cannot increase confidence. Reinforcement
  event: blend cannot decrease it. This preserves the Bayesian signal
  while making the per-step update strictly monotone.
- Extend tests with TestPenaltyCap: fully-stacked rewrite on a RULE
  stays within MAX_PER_STEP_PENALTY, and 10 alternating corrections
  never oscillate in the wrong direction.

Adversary check:
- Does the cap weaken rewrite-severity signals? No: 0.20 per step is
  still strong — four rewrite corrections drop a 0.90 RULE to 0.10.
- Does the monotone guard silently hide Bayesian disagreement? No —
  blend is still computed and logged via lesson.alpha/beta, only the
  per-tick assignment is direction-clamped. Over time the posterior
  still dominates via blend_w.
- Does clamping to pre-event confidence break Bayesian convergence?
  For a truly stable posterior, pre/post converge — the clamp is a
  no-op. For a divergent one, FSRS leads and Bayesian follows in the
  same tick-direction.
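An illustrative reduction of the clamping rule (the function shape is hypothetical; MAX_PER_STEP_PENALTY is from the commit, and fsrs_value/blend_value stand in for the two update paths' proposed confidences):

```python
MAX_PER_STEP_PENALTY = 0.20


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def apply_tick(pre: float, fsrs_value: float, blend_value: float, is_penalty: bool) -> float:
    """One confidence update: FSRS leads, the Bayesian blend follows as a
    second opinion, direction-clamped so it can never undo the event."""
    if is_penalty:
        post = min(fsrs_value, blend_value)                  # blend cannot raise confidence
        post = clamp(post, pre - MAX_PER_STEP_PENALTY, pre)  # capped, never upward
    else:
        post = max(fsrs_value, blend_value)                  # blend cannot lower confidence
        post = clamp(post, pre, 1.0)
    return clamp(post, 0.0, 1.0)
```

Under this shape, the ~0.63 fully-stacked rewrite penalty on a 0.90 RULE lands at 0.70, and alternating corrections can never move confidence against the direction of the current event.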
Gradata added a commit that referenced this pull request Apr 15, 2026
…#71)

* fix(confidence): align RULE_THRESHOLD with injection floor at 0.90

* fix(confidence): gate survival fire_count on injection evidence

* fix(confidence): cap per-step penalty and enforce monotone updates

* fix(confidence): cap per-session delta at one graduation tier transition

Audit finding gap-analysis/01-internal-audit.md Gap 4 (red-team note):
the Sybil attack. Given +0.12 per rewrite reinforcement, 10 coordinated
same-session corrections push a fresh lesson 0 -> PATTERN -> RULE in
one tick. Fire-count gates do not catch this once a brain has enough
historical fires; the math alone lets confidence chain two tier
transitions inside one update_confidence call.

Plan:
- Add MAX_PER_SESSION_DELTA = 0.30 constant. Value chosen so at most
  ONE graduation tier can be crossed (INSTINCT->PATTERN spans 0.20;
  PATTERN->RULE spans 0.30; INSTINCT->RULE would need 0.50).
- Snapshot _pre_session_confidence and _pre_session_state on first
  entry into the per-lesson loop. The pre-confidence attribute was
  already read by graduate() for its jump warning — this extends the
  existing pattern so update_confidence is the single source of truth.
- Clamp confidence to [pre - delta, pre + delta] AFTER per-correction
  updates, BEFORE the inline promotion loop evaluates thresholds.
- Guard both the inline update_confidence graduation AND the separate
  graduate() function so neither can re-promote a lesson whose tier
  already changed this session. Document the invariant inline.
- Extend tests with the Sybil burst scenario: 10 stacked rewrite
  reinforcements must not promote past ONE tier and must respect
  the absolute delta cap.

Adversary check:
- Does 0.30 make real convergence too slow? A lesson still crosses
  one tier per session when corrections support it. Crossing two
  tiers in one session was never the design intent — graduate()'s
  docstring explicitly says fire-count gates are non-bypassable.
- Does the snapshot leak state across sessions? _pre_session_confidence
  is set on first entry; it already existed as a caller-owned attr
  (see test_safety_assertion::TestConfidenceJumpWarning). It survives
  for the lifetime of the Lesson object which, in the SDK, is
  per-session (lessons are re-parsed from disk each tick).
- Does blocking graduate() promotion for already-transitioned lessons
  cause silent demotions? No — the block is symmetric with the
  original _pre_session_state check and transitions are monotone
  per-tier in graduate() anyway (no double-step code path exists).
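The Sybil burst reduced to arithmetic (the +0.12 per-reinforcement bonus and MAX_PER_SESSION_DELTA are from the commit text; the function shape is hypothetical):

```python
MAX_PER_SESSION_DELTA = 0.30  # at most one graduation tier per session


def clamp_to_session_band(confidence: float, pre_session_confidence: float) -> float:
    lo = pre_session_confidence - MAX_PER_SESSION_DELTA
    hi = pre_session_confidence + MAX_PER_SESSION_DELTA
    return max(lo, min(hi, confidence))


# 10 coordinated same-session rewrite reinforcements on a fresh lesson
pre = 0.0
conf = pre
for _ in range(10):
    conf = min(1.0, conf + 0.12)
conf = clamp_to_session_band(conf, pre)  # capped at 0.30 above the session-start value
```

The uncapped burst reaches 1.0 in one session; the clamped result stays at 0.30, below even the PATTERN threshold, so the 0 -> PATTERN -> RULE chain cannot happen in a single tick.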
Gradata pushed a commit that referenced this pull request Apr 20, 2026
…it, multi-agent infra

SDK Portability:
- Created shared .claude/hooks/config.js (env-var driven, single source of truth)
- Migrated 17 hooks from hardcoded paths to config.js imports
- Only config.js has machine-specific defaults now

Wrap-Up Bug Fixes (4 P0 + 5 P1):
- P0: Fixed edited_by_user/edited_by_oliver split-brain (calibration was always wrong)
- P0: Fixed session off-by-one in session-close-data.js
- P0: Fixed double confidence update via marker file guard
- P0: Fixed major_edit phantom field in _store_results
- P1: Fixed check 12 (queries activity_log), check 15 (uses brier_score)
- P1: Auto-detect session type, replaced dead check 08, wrap-up-completed guard

LOOP Split (~7K token savings per non-pipeline prompt):
- loop-core: 13 CRITICAL guardrails (always-on)
- loop-advanced: 54 optimization/analytics rules (recall-triggered)
- COMMANDS domain set inactive (zero usage across all sessions)

New Hooks:
- gate-inject.js: auto-loads gates based on intent (fixes #1 error pattern)
- post-compact.js: saves state before context compression (Audrey PostCompact)
- peer-announce.js: auto-announces session to peers MCP

Multi-Agent Infrastructure:
- Enabled Agent Teams (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS)
- Installed Claude Peers MCP server
- Installed Octopus + UI UX Pro Max plugins

Infrastructure:
- Killed ChromaDB step in brain-maintain.js (saves ~45s/session)
- Wired lesson_applications table for closed-loop learning
- Event emission: HEALTH_CHECK on start + SESSION_END on close (8%→100%)
- Cleaned 5.3GB stale worktrees
- Fixed 2>/dev/null → 2>/dev/null bash compat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gradata added a commit that referenced this pull request May 2, 2026
* cleanup(enhancements): move retrieval_fusion to scoring/, flip Beta-LB gate, add invariant + obfuscation tests

Council v4 verdict (council_2026-05-02T11-59-00.md) and v4-rerun
(council_2026-05-02T11-59-00.md) flagged a small set of
production-readiness items that don't depend on the larger dual-write
work. This commit lands those independently so dual-write atomicity can
ship as its own reviewable PR.

What
- Move src/gradata/enhancements/retrieval_fusion.py into
  enhancements/scoring/retrieval_fusion.py and update importers.
  Council vote 5/7 — RRF is a ranking primitive, lives more naturally
  with scoring/ than as a sibling.
- Flip GRADATA_BETA_LB_GATE default ON in
  enhancements/self_improvement/_graduation.py. The 2026-04 ablation
  documented in the file showed ~15-20% of RULE-tier graduations
  miscalibrated by format-not-content; shipping the fix opt-in was
  textbook silent regression (council 5/7). GRADATA_BETA_LB_GATE=0
  preserves the override-off escape hatch.
- New tests/test_initial_confidence_invariant.py — locks the
  INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary that almost
  promoted every fresh lesson before the strict > comparison was wired in.
- New tests/test_score_obfuscation_gate.py — CI gate that fails the
  build if any raw confidence float in [0,1] leaks into the
  <brain-rules> prompt-bound payload. middleware/_core.py
  build_brain_rules_block() updated to obfuscate.

Why
- Each item is independently testable, low-risk, and clears the runway
  for the dual-write atomicity PR.
- Beta-LB default-on closes a known correctness hole that ships every
  release until flipped.
- Obfuscation gate converts a comment-level guarantee
  (security/score_obfuscation.py) into an enforced one.

Test plan
- pytest tests/test_initial_confidence_invariant.py
  tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py
  tests/test_rule_pipeline.py tests/test_middleware_core.py — 63 passed.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).
- ruff on changed files — clean.

Layering check
- No Layer 0 → 2 imports introduced.

Risk
- Beta-LB flip changes the default for graduation calibration. Users
  relying on miscalibrated PATTERN→RULE behavior will see fewer
  graduations until they set GRADATA_BETA_LB_GATE=0. This is the
  intended fix.

Council references
- council_2026-05-02T11-08-25.md (initial v4 review)
- council_2026-05-02T11-59-00.md (v4 rerun, all 7 lenses through
  fallback chain)
- council_2026-05-02T12-24-08.md (PR sequencing decision)

* fix(events): JSONL canonical, SQLite projection, reconcile-on-init, doctor --reconcile

Council v4 (council_2026-05-02T11-59-00.md) ranked dual-write atomicity
the #1 production blocker. Crash mid-write between events.jsonl append
and SQLite INSERT could leave the brain in silent split-brain state
with no recovery path.

What
- src/gradata/_events.py
  - JSONL is the canonical source of truth. Append + fsync FIRST,
    SQLite INSERT is now an idempotent projection derived from JSONL.
  - Added reconcile_jsonl_to_sqlite() that scans JSONL past the
    SQLite watermark and replays missing rows.
  - Single SQLite projection helper used by both the live write path
    and the retain orchestrator.
  - Env-gated crash-window delay for deterministic kill-9 testing
    only (no production effect).
- src/gradata/brain.py
  - Brain.__init__ runs JSONL → SQLite reconciliation after migrations.
  - Brain() resolves BRAIN_DIR / cwd when no explicit path is supplied.
  - observe(text, kind="correction") public event API used by the
    PR2 spec.
- src/gradata/cli.py + src/gradata/_doctor.py
  - New `gradata doctor --reconcile`: scans for drift, reports the
    count, replays missing JSONL rows into SQLite, exits non-zero on
    inconsistency that can't be healed.
- tests/test_dualwrite_atomicity.py
  - Path-agnostic public-API tests covering: happy path, kill-9 mid
    batch (JSONL must lead SQLite, never trail), reconcile replay,
    idempotency, doctor CLI drift report, concurrent-writer JSONL
    line integrity.
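The write-then-reconcile design above can be sketched as follows. This is a simplified illustration, not the real _events.py: the actual reconcile_jsonl_to_sqlite() scans only past a SQLite watermark, while this sketch does a full scan, and the schema and function names are assumptions:

```python
import json
import os
import sqlite3


def append_event(jsonl_path: str, db: sqlite3.Connection, event: dict) -> None:
    """Write path: JSONL leads, SQLite is an idempotent projection."""
    line = json.dumps(event, separators=(",", ":"))
    with open(jsonl_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable in the canonical log first
    _project(db, event)       # a crash before this line is healed by reconcile


def _project(db: sqlite3.Connection, event: dict) -> None:
    # INSERT OR IGNORE keyed on the event id makes replay idempotent.
    db.execute(
        "INSERT OR IGNORE INTO events (id, payload) VALUES (?, ?)",
        (event["id"], json.dumps(event)),
    )
    db.commit()


def reconcile(jsonl_path: str, db: sqlite3.Connection) -> int:
    """Replay JSONL rows missing from SQLite; returns rows healed."""
    known = {row[0] for row in db.execute("SELECT id FROM events")}
    healed = 0
    with open(jsonl_path, encoding="utf-8") as f:
        for raw in f:
            event = json.loads(raw)
            if event["id"] not in known:
                _project(db, event)
                healed += 1
    return healed
```

Because the projection is idempotent, the invariant jsonl_count >= sqlite_count holds across any crash point, and reconcile closes the gap on reopen.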

Why
- Before: dual-write claimed atomic in CLAUDE.md, no two-phase commit,
  no recovery. Crash → silent data loss or duplicate-on-replay.
- After: JSONL is the log, SQLite is the projection. Every reopen
  reconciles. doctor --reconcile is the operator escape hatch.
  Property: jsonl_count >= sqlite_count, always.

Test plan
- pytest tests/test_dualwrite_atomicity.py — 6 passed.
- Full focused regression on changed surface — 42 passed.
- Non-integration suite (excluding socket-bound daemon/plugin tests
  blocked by sandbox) — 4130 passed, 4 skipped.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).

Layering check
- _events.py is Layer 0. Brain.__init__ in Layer 2 calls into it.
  No upward imports introduced.

Risk
- Reconcile-on-init runs on every Brain open. For a brain with
  100k events this adds ~50ms-200ms one-time at startup. Watermark
  is incremental so subsequent opens are O(drift) not O(total).
- Concurrent writers serialize via JSONL append + advisory lock.
  Throughput trade-off is acceptable for correctness.
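The advisory-lock serialization mentioned above can be sketched with POSIX flock — an assumed shape, not the module's actual locking code:

```python
import fcntl
import os


def locked_append(path: str, line: str) -> None:
    """Serialize concurrent JSONL appenders with an advisory lock (POSIX only).

    flock ensures writers take turns, so no record is ever interleaved
    mid-line; the lock is released in the finally block even on error.
    """
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```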

Council references
- council_2026-05-02T11-59-00.md (v4 RISK class, all 7 lenses)
- council_2026-05-02T12-24-08.md (PR sequencing — TDD-first)

Stacks on #163.

* fix(cloud/client): coerce float ts and non-int session for /sync

The backfill script and incremental sync both crashed on real-world
events.jsonl rows that contain:
  - float ts (epoch seconds, e.g. 1776803751.89) — broke str-vs-str
    comparison against the watermark cursor with TypeError.
  - float or string session values (e.g. 4.5, UUID strings) — server
    schema rejects non-ints with HTTP 422.

Coerce ts to str and session to int|None at the format-event boundary.
Also surface the response body in HTTPError so 4xx/5xx debugging is
not opaque.
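The boundary coercion can be sketched as below — function and field names are illustrative, not the cloud client's actual API:

```python
def coerce_for_sync(event: dict) -> dict:
    """Coerce ts and session at the format-event boundary.

    - ts: float epoch seconds (e.g. 1776803751.89) become str, so the
      watermark-cursor comparison stays str-vs-str instead of raising
      TypeError.
    - session: values that convert cleanly to int are kept as int;
      anything else (UUID strings, None) becomes None, so the server
      schema no longer rejects the row with HTTP 422.
    """
    out = dict(event)
    if "ts" in out:
        out["ts"] = str(out["ts"])
    try:
        out["session"] = int(out.get("session"))
    except (TypeError, ValueError):
        out["session"] = None
    return out
```

Note that a float session like 4.5 truncates to 4 under int(); whether to truncate or drop such values is a judgment call this sketch makes one way.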

Discovered while running scripts/backfill_to_cloud.py against a
brain with ~28k events accumulated since 2026-03-22.

---------

Co-authored-by: Oliver Le <oliverle94@gmail.com>