
Review: Full codebase audit via Greptile#1

Closed
Gradata wants to merge 1 commit into main from greptile-review-test

Conversation

Owner

@Gradata Gradata commented Apr 6, 2026

Purpose

Trigger Greptile's automated review on the full v0.3.0 codebase.

This is a test PR to evaluate Greptile's review quality against our own internal review pipeline (autoresearch + simplify + security audit + adversary).

What to review

  • src/gradata/ — full SDK
  • src/gradata/hooks/ — nervous system hooks
  • tests/ — test suite

After review

Compare Greptile findings vs our own. Delete this PR either way.

Generated with Gradata

Greptile Summary

This PR adds a single greptile-review.md trigger file as a test of Greptile's review quality against the team's internal pipeline, auditing the full v0.3.0 "Nervous System" release.

The new EventBus design (events_bus.py) is well-implemented — handler failures are correctly isolated via _safe_call, listener count is bounded at 50, async handlers use a bounded thread pool, and the integration test (test_bus_error_isolation) verifies that a crashing subscriber does not interrupt the correction pipeline. The dual-write event persistence in _events.py (JSONL + SQLite with WAL mode) is solid.
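The isolation pattern the review praises can be reduced to a minimal sketch (names mirror the review; the real events_bus.py is more elaborate, and the async thread-pool path is omitted here):

```python
from __future__ import annotations

import logging
from typing import Any, Callable

logger = logging.getLogger("gradata.events_bus")

MAX_LISTENERS = 50  # bound cited in the review


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        handlers = self._handlers.setdefault(topic, [])
        if len(handlers) >= MAX_LISTENERS:
            raise RuntimeError(f"listener limit ({MAX_LISTENERS}) reached for {topic!r}")
        handlers.append(handler)

    def _safe_call(self, handler: Callable[[Any], None], payload: Any) -> None:
        # a crashing subscriber must never interrupt the emitting pipeline
        try:
            handler(payload)
        except Exception:
            logger.exception("handler %r failed; continuing", handler)

    def emit(self, topic: str, payload: Any) -> None:
        for handler in list(self._handlers.get(topic, [])):
            self._safe_call(handler, payload)
```

The point is only that emit() survives a raising handler, which is what test_bus_error_isolation verifies against the real implementation.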

Key findings from the audit:

  • Multi-brain ctx gap (_events.py): supersede(), correction_rate(), compute_leading_indicators(), and audit_trend() bypass the BrainContext injection that emit() and query() support, silently routing all calls to the global _p.DB_PATH. Any user running two Brain instances in a single process will encounter silent cross-brain data corruption on these four code paths.
  • print() instead of logging (Rule 1): Eight production sites across _events.py, rules/rule_tracker.py, mcp_server.py, and _embed.py use print(..., file=sys.stderr) for error and warning messages, making them invisible to callers that use structured logging.
  • Missing from __future__ import annotations (Rule 6): Six SDK modules (_paths.py, _doctor.py, _embed.py, cli.py, _installer.py, _validator.py) are missing the required future import despite using union-type annotations (str | Path | None).

Confidence Score: 4/5

Codebase is structurally sound with 1339 passing tests; the ctx-gap and logging violations are clean-up items that do not affect single-brain users.

EventBus error isolation is correctly implemented and tested. The main issues are: (1) a multi-brain isolation bug in four _events.py functions that only affects multi-Brain processes, (2) print() logging violations across 4 files, and (3) missing future imports in 6 files. None of these block single-brain functionality. Score is 4 rather than 5 because the ctx-gap in supersede() is a real silent-data-corruption risk for any user running two Brain instances simultaneously.

Areas needing attention: src/gradata/_events.py (ctx-gap + logging violations), src/gradata/rules/rule_tracker.py (print logging), src/gradata/mcp_server.py (print logging), and the six SDK modules missing from __future__ import annotations.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/gradata/_events.py | Core event persistence layer; two print() logging violations and four ctx-unaware functions break multi-brain isolation |
| src/gradata/_paths.py | Path resolution module; missing from __future__ import annotations despite using union type syntax throughout |
| src/gradata/events_bus.py | In-memory EventBus correctly isolates handler errors, bounds listeners to 50, and uses a thread pool — well-implemented |
| src/gradata/hooks/auto_correct.py | PostToolUse hook; well-structured with proper error handling at every call site |
| src/gradata/_core.py | Brain core logic; robust try/except wrapping throughout the correction and graduation pipeline |
| src/gradata/rules/rule_tracker.py | Rule application tracker; three print() warning calls should be replaced with logging.warning() |
| src/gradata/mcp_server.py | MCP JSON-RPC server; two print() calls in brain-init path should use logging.info() / logging.error() |
| src/gradata/_embed.py | Embedding client; print() for missing API key error and missing from __future__ import annotations |

Sequence Diagram

```mermaid
sequenceDiagram
    participant U as User/Agent
    participant B as Brain
    participant EB as EventBus (in-memory)
    participant EV as _events (JSONL+SQLite)

    U->>B: brain.correct(draft, final)
    B->>EV: emit("CORRECTION", ...) — dual-write
    EV-->>B: event dict (with id)
    B->>EB: bus.emit("correction.created", payload)
    EB-->>EB: _safe_call(handler, payload) — errors isolated

    U->>B: brain.end_session()
    B->>EV: emit("SESSION_END", ...)
    B->>EB: bus.emit("session.ended", stats)
    EB-->>EB: _safe_call per subscriber
```

Comments Outside Diff (3)

  1. src/gradata/_events.py, line 107-122 (link)

    P1 print() used for error logging

    Both JSONL and SQLite write-failure handlers use print(..., file=sys.stderr) instead of the logging module. These messages are invisible to any caller that routes log output through a handler, and cannot be filtered by severity level. The same pattern appears in several other SDK modules:

    • src/gradata/rules/rule_tracker.py:63, :87, :122 — warning-level messages
    • src/gradata/mcp_server.py:505, :511 — brain-init status messages
    • src/gradata/_embed.py:149 — missing API key error

    All of these should use logging.warning() or logging.error() so callers can control log visibility. For example, in _events.py lines 107 and 122:

    ```python
            except Exception as e:
                logging.getLogger("gradata.events").error("[events] JSONL write failed: %s", e)
    ```

    ```python
            except Exception as e:
                logging.getLogger("gradata.events").error("[events] SQLite write failed: %s", e)
    ```

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)
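    The caller-side benefit can be shown with a small runnable sketch (the logger name is taken from the example above; the Capture handler is hypothetical) — unlike print(..., file=sys.stderr), logger output can be routed and filtered by severity:

    ```python
    import logging


    class Capture(logging.Handler):
        """Hypothetical host-application handler that collects records."""

        def __init__(self) -> None:
            super().__init__()
            self.records = []

        def emit(self, record: logging.LogRecord) -> None:
            self.records.append(record)


    capture = Capture()
    logger = logging.getLogger("gradata.events")
    logger.addHandler(capture)
    logger.setLevel(logging.WARNING)

    logger.error("[events] JSONL write failed: %s", OSError("disk full"))
    logger.debug("noise")  # filtered out by the WARNING threshold
    ```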


  2. src/gradata/_paths.py, line 1-12 (link)

    P2 Missing from __future__ import annotations

    _paths.py is a core SDK module that uses str | Path | None union syntax in several public function signatures (e.g. resolve_brain_dir, BrainContext.from_brain_dir, set_brain_dir), but it is missing from __future__ import annotations. Rule 6 requires this import in every Python SDK file.

    The same import is missing from:

    • src/gradata/_doctor.py
    • src/gradata/_embed.py
    • src/gradata/cli.py
    • src/gradata/_installer.py
    • src/gradata/_validator.py

    Add from __future__ import annotations as the first import (after the module docstring) in each of these files.

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)
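    A hedged sketch of the fix (the module docstring and function body are illustrative; resolve_brain_dir is one of the signatures the comment names):

    ```python
    """Hypothetical SDK module showing the required import placement."""
    from __future__ import annotations  # first import, right after the docstring

    from pathlib import Path


    def resolve_brain_dir(p: str | Path | None) -> Path | None:
        # with the future import, annotations are stored as strings and never
        # evaluated, so `str | Path | None` parses even on Python 3.7-3.9
        return Path(p).expanduser() if p is not None else None
    ```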


  3. src/gradata/_events.py, line 198-216 (link)

    P1 supersede() ignores BrainContext, silently routes to the wrong database in multi-brain setups

    emit() and query() both accept an optional ctx: BrainContext | None parameter so they can be directed at a specific brain's database. However supersede() hard-codes _p.DB_PATH (the process-global path) at line 201 and passes no ctx to its internal emit() call at lines 210–213. In a process with two Brain instances, calling supersede() on events belonging to the second brain will silently read from and write to the first brain's database.

    The same issue affects:

    • correction_rate() (line 220) — uses _p.DB_PATH directly
    • compute_leading_indicators() (line 231) — uses _p.DB_PATH directly
    • audit_trend() (line 376) — uses _p.DB_PATH directly

    Each of these functions should accept ctx: BrainContext | None = None and resolve db_path = ctx.db_path if ctx else _p.DB_PATH, consistent with emit() and query().
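    The suggested pattern as a standalone sketch (BrainContext and DB_PATH are simplified stand-ins for the real _paths objects; the body is elided):

    ```python
    from __future__ import annotations

    from dataclasses import dataclass
    from pathlib import Path

    DB_PATH = Path("/tmp/global-brain.db")  # stand-in for the process-global _p.DB_PATH


    @dataclass
    class BrainContext:  # simplified stand-in for the real class
        db_path: Path


    def supersede(event_id: str, ctx: BrainContext | None = None) -> Path:
        # resolve per-brain, exactly as emit()/query() already do
        db_path = ctx.db_path if ctx else DB_PATH
        # ... open db_path, mark event_id superseded, emit(..., ctx=ctx) ...
        return db_path
    ```

    With this shape, a second Brain instance passes its own ctx and never touches the global database.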



Reviews (1): Last reviewed commit: "test: trigger Greptile review"


Context used:

  • Rule used - # Code Review Rules

Rule 1: Never use print() ... (source)

@Gradata Gradata closed this Apr 6, 2026
@Gradata Gradata deleted the greptile-review-test branch April 6, 2026 05:31
Gradata pushed a commit that referenced this pull request Apr 11, 2026
The recency reminder duplicated the #1 rule at the bottom of the injection block.
Removing it saves ~135 chars per injection without losing information (the rule is already listed first).

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata added a commit that referenced this pull request Apr 15, 2026
Audit finding gap-analysis/01-internal-audit.md #1.1: RULE_THRESHOLD
was 0.80 while validate_assumptions in rule_engine.py enforces a 0.90
floor at injection time. Any lesson graduated in 0.80-0.89 was silently
blocked from ever firing — non-deterministic behaviour depending on
which gate ran.

Plan:
- Raise RULE_THRESHOLD to 0.90 (injection floor wins; it is the gate
  that governs what actually reaches the model).
- Grep SDK for 0.80/0.90 to catch stale references. self_healing.py's
  DEFAULT_MIN_CONFIDENCE is a distinct detection threshold (catch rules
  whose confidence recently dipped after a correction-cycle penalty);
  left at 0.80 with an explanatory comment.
- Confirm existing tests hold: test_safety_assertion, test_enhancements,
  test_clb, test_spec_compliance range all pass at 0.90.

Adversary check:
- Does 0.90 break pattern->rule promotion tests? No: surviving lessons
  at 0.85 + fsrs_bonus still cross 0.90 in one step (confirmed).
- Does raising the threshold break rule_engine tests? No: rule_engine
  already hard-codes 0.90.
- Does self_healing at 0.80 conflict with the stated goal? Distinct
  concern: it detects rules that *should have* fired, not rules that
  *can* be injected. Documented inline.
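The invariant this commit restores can be stated as a one-line check (constant names follow the commit message; INJECTION_FLOOR is a hypothetical alias for the 0.90 floor hard-coded in rule_engine.py):

```python
RULE_THRESHOLD = 0.90   # graduation gate, raised from 0.80
INJECTION_FLOOR = 0.90  # floor enforced by validate_assumptions at injection time

# no lesson may graduate into a dead zone where it can never fire
assert RULE_THRESHOLD >= INJECTION_FLOOR
```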
Gradata added a commit that referenced this pull request Apr 15, 2026
Audit finding gap-analysis/01-internal-audit.md #1.10: the survival
branch in update_confidence unconditionally incremented fire_count,
bypassing the "no promotion from silence" invariant asserted in
graduate() (self_improvement.py:856-861). A lesson that was never
actually injected could reach PATTERN in 3 sessions purely from
survival bonuses — exactly the failure mode the fire-count gate is
supposed to prevent.

Plan:
- Thread injection evidence through update_confidence via a new
  injected_lesson_keys parameter and an existing lesson attribute
  (_was_injected_this_session). Preserves existing callers that
  don't track injection.
- On survival, still apply the confidence bonus (legacy semantics)
  but only increment fire_count when either signal confirms the
  lesson was injected. sessions_since_fire is also gated on the
  same evidence, so silent lessons correctly age toward UNTESTABLE.
- Extend tests/test_safety_assertion.py with 4 cases covering the
  new behaviour, including the Sybil-style "3 silent survivals"
  scenario that the audit flagged as a promotion-from-silence bug.

Adversary check:
- Existing tests that rely on fire_count post-survival? Audited:
  test_clb.py manually increments fire_count ("# simulate rule being
  applied"), test_enhancements pre-seeds fire_count to the threshold.
  Full suite stays green (2070 passed, 23 skipped).
- Does dropping sessions_since_fire reset hurt UNTESTABLE detection?
  No — it strengthens it. Silent lessons now age correctly.
- Is injected_lesson_keys ever authoritative when upstream hooks
  don't wire it? It's opt-in; lessons without evidence default to
  the conservative "no increment" path, which is the fix's intent.
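A minimal sketch of the gating described above (the Lesson fields and the +0.02 survival bonus are illustrative; the injected_lesson_keys parameter and _was_injected_this_session attribute are the signals named in the commit):

```python
from __future__ import annotations


class Lesson:
    def __init__(self, key: str) -> None:
        self.key = key
        self.confidence = 0.50
        self.fire_count = 0
        self.sessions_since_fire = 0
        self._was_injected_this_session = False


def survive(lesson: Lesson, injected_lesson_keys: set[str] | None = None) -> None:
    # legacy semantics: survival still grants the confidence bonus
    lesson.confidence = min(1.0, lesson.confidence + 0.02)
    injected = lesson._was_injected_this_session or (
        injected_lesson_keys is not None and lesson.key in injected_lesson_keys
    )
    if injected:
        lesson.fire_count += 1           # promotion evidence only when actually injected
        lesson.sessions_since_fire = 0
    else:
        lesson.sessions_since_fire += 1  # silent lessons age toward UNTESTABLE
```

Callers that don't track injection take the conservative path by default, which is the fix's stated intent.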
Gradata added a commit that referenced this pull request Apr 15, 2026
Audit finding gap-analysis/01-internal-audit.md #1.3: the compound
penalty stack (ACCELERATION 1.5 * streak_mult 1.5 * severity_boost 1.8
* rule_override 1.2 * severity_weight 1.3 * FSRS 1.22) can subtract
~0.63 from a RULE at 0.90 in a single correction. Combined with the
Bayesian blend overwrite that follows, the system oscillates under
alternating corrections — exactly the scenario the streak logic is
supposed to handle.

Plan:
- Add MAX_PER_STEP_PENALTY = 0.20 constant with an explanatory comment
  justifying the value (one graduation band's worth of confidence,
  matching MAX_PER_SESSION_DELTA so a single correction cannot chain
  two tier demotions in one tick).
- Reconcile the two update paths. Pick ONE rule: FSRS first, then
  Bayesian blend as a second opinion — but clamp the blend result so
  it can never pull in the opposite direction of the current event.
  Penalty event: blend cannot increase confidence. Reinforcement
  event: blend cannot decrease it. This preserves the Bayesian signal
  while making the per-step update strictly monotone.
- Extend tests with TestPenaltyCap: fully-stacked rewrite on a RULE
  stays within MAX_PER_STEP_PENALTY, and 10 alternating corrections
  never oscillate in the wrong direction.

Adversary check:
- Does the cap weaken rewrite-severity signals? No: 0.20 per step is
  still strong — four rewrite corrections drop a 0.90 RULE to 0.10.
- Does the monotone guard silently hide Bayesian disagreement? No —
  blend is still computed and logged via lesson.alpha/beta, only the
  per-tick assignment is direction-clamped. Over time the posterior
  still dominates via blend_w.
- Does clamping to pre-event confidence break Bayesian convergence?
  For a truly stable posterior, pre/post converge — the clamp is a
  no-op. For a divergent one, FSRS leads and Bayesian follows in the
  same tick-direction.
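An illustrative reduction of the clamping rule (the function shape is hypothetical; MAX_PER_STEP_PENALTY is from the commit, and fsrs_value/blend_value stand in for the two update paths' proposed confidences):

```python
MAX_PER_STEP_PENALTY = 0.20


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def apply_tick(pre: float, fsrs_value: float, blend_value: float, is_penalty: bool) -> float:
    """One confidence update: FSRS leads, the Bayesian blend follows as a
    second opinion, direction-clamped so it can never undo the event."""
    if is_penalty:
        post = min(fsrs_value, blend_value)                  # blend cannot raise confidence
        post = clamp(post, pre - MAX_PER_STEP_PENALTY, pre)  # capped, never upward
    else:
        post = max(fsrs_value, blend_value)                  # blend cannot lower confidence
        post = clamp(post, pre, 1.0)
    return clamp(post, 0.0, 1.0)
```

Under this shape, the ~0.63 fully-stacked rewrite penalty on a 0.90 RULE lands at 0.70, and alternating corrections can never move confidence against the direction of the current event.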
Gradata added a commit that referenced this pull request Apr 15, 2026
…#71)

* fix(confidence): align RULE_THRESHOLD with injection floor at 0.90

* fix(confidence): gate survival fire_count on injection evidence

* fix(confidence): cap per-step penalty and enforce monotone updates

* fix(confidence): cap per-session delta at one graduation tier transition

Audit finding gap-analysis/01-internal-audit.md Gap 4 (red-team note):
the Sybil attack. Given +0.12 per rewrite reinforcement, 10 coordinated
same-session corrections push a fresh lesson 0 -> PATTERN -> RULE in
one tick. Fire-count gates do not catch this once a brain has enough
historical fires; the math alone lets confidence chain two tier
transitions inside one update_confidence call.

Plan:
- Add MAX_PER_SESSION_DELTA = 0.30 constant. Value chosen so at most
  ONE graduation tier can be crossed (INSTINCT->PATTERN spans 0.20;
  PATTERN->RULE spans 0.30; INSTINCT->RULE would need 0.50).
- Snapshot _pre_session_confidence and _pre_session_state on first
  entry into the per-lesson loop. The pre-confidence attribute was
  already read by graduate() for its jump warning — this extends the
  existing pattern so update_confidence is the single source of truth.
- Clamp confidence to [pre - delta, pre + delta] AFTER per-correction
  updates, BEFORE the inline promotion loop evaluates thresholds.
- Guard both the inline update_confidence graduation AND the separate
  graduate() function so neither can re-promote a lesson whose tier
  already changed this session. Document the invariant inline.
- Extend tests with the Sybil burst scenario: 10 stacked rewrite
  reinforcements must not promote past ONE tier and must respect
  the absolute delta cap.

Adversary check:
- Does 0.30 make real convergence too slow? A lesson still crosses
  one tier per session when corrections support it. Crossing two
  tiers in one session was never the design intent — graduate()'s
  docstring explicitly says fire-count gates are non-bypassable.
- Does the snapshot leak state across sessions? _pre_session_confidence
  is set on first entry; it already existed as a caller-owned attr
  (see test_safety_assertion::TestConfidenceJumpWarning). It survives
  for the lifetime of the Lesson object which, in the SDK, is
  per-session (lessons are re-parsed from disk each tick).
- Does blocking graduate() promotion for already-transitioned lessons
  cause silent demotions? No — the block is symmetric with the
  original _pre_session_state check and transitions are monotone
  per-tier in graduate() anyway (no double-step code path exists).
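The Sybil burst reduced to arithmetic (the +0.12 per-reinforcement bonus and MAX_PER_SESSION_DELTA are from the commit text; the function shape is hypothetical):

```python
MAX_PER_SESSION_DELTA = 0.30  # at most one graduation tier per session


def clamp_to_session_band(confidence: float, pre_session_confidence: float) -> float:
    lo = pre_session_confidence - MAX_PER_SESSION_DELTA
    hi = pre_session_confidence + MAX_PER_SESSION_DELTA
    return max(lo, min(hi, confidence))


# 10 coordinated same-session rewrite reinforcements on a fresh lesson
pre = 0.0
conf = pre
for _ in range(10):
    conf = min(1.0, conf + 0.12)
conf = clamp_to_session_band(conf, pre)  # capped at 0.30 above the session-start value
```

The uncapped burst reaches 1.0 in one session; the clamped result stays at 0.30, below even the PATTERN threshold, so the 0 -> PATTERN -> RULE chain cannot happen in a single tick.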
Gradata pushed a commit that referenced this pull request Apr 20, 2026
…it, multi-agent infra

SDK Portability:
- Created shared .claude/hooks/config.js (env-var driven, single source of truth)
- Migrated 17 hooks from hardcoded paths to config.js imports
- Only config.js has machine-specific defaults now

Wrap-Up Bug Fixes (4 P0 + 5 P1):
- P0: Fixed edited_by_user/edited_by_oliver split-brain (calibration was always wrong)
- P0: Fixed session off-by-one in session-close-data.js
- P0: Fixed double confidence update via marker file guard
- P0: Fixed major_edit phantom field in _store_results
- P1: Fixed check 12 (queries activity_log), check 15 (uses brier_score)
- P1: Auto-detect session type, replaced dead check 08, wrap-up-completed guard

LOOP Split (~7K token savings per non-pipeline prompt):
- loop-core: 13 CRITICAL guardrails (always-on)
- loop-advanced: 54 optimization/analytics rules (recall-triggered)
- COMMANDS domain set inactive (zero usage across all sessions)

New Hooks:
- gate-inject.js: auto-loads gates based on intent (fixes #1 error pattern)
- post-compact.js: saves state before context compression (Audrey PostCompact)
- peer-announce.js: auto-announces session to peers MCP

Multi-Agent Infrastructure:
- Enabled Agent Teams (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS)
- Installed Claude Peers MCP server
- Installed Octopus + UI UX Pro Max plugins

Infrastructure:
- Killed ChromaDB step in brain-maintain.js (saves ~45s/session)
- Wired lesson_applications table for closed-loop learning
- Event emission: HEALTH_CHECK on start + SESSION_END on close (8%→100%)
- Cleaned 5.3GB stale worktrees
- Fixed 2>/dev/null → 2>/dev/null bash compat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gradata added a commit that referenced this pull request May 2, 2026
* cleanup(enhancements): move retrieval_fusion to scoring/, flip Beta-LB gate, add invariant + obfuscation tests

Council v4 verdict (council_2026-05-02T11-59-00.md) and v4-rerun
(council_2026-05-02T11-59-00.md) flagged a small set of
production-readiness items that don't depend on the larger dual-write
work. This commit lands those independently so dual-write atomicity can
ship as its own reviewable PR.

What
- Move src/gradata/enhancements/retrieval_fusion.py into
  enhancements/scoring/retrieval_fusion.py and update importers.
  Council vote 5/7 — RRF is a ranking primitive, lives more naturally
  with scoring/ than as a sibling.
- Flip GRADATA_BETA_LB_GATE default ON in
  enhancements/self_improvement/_graduation.py. The 2026-04 ablation
  documented in the file showed ~15-20% of RULE-tier graduations
  miscalibrated by format-not-content; shipping the fix opt-in was
  textbook silent regression (council 5/7). GRADATA_BETA_LB_GATE=0
  preserves the override-off escape hatch.
- New tests/test_initial_confidence_invariant.py — locks the
  INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary that almost
  promoted every fresh lesson before the strict > comparison was wired in.
- New tests/test_score_obfuscation_gate.py — CI gate that fails the
  build if any raw confidence float in [0,1] leaks into the
  <brain-rules> prompt-bound payload. middleware/_core.py
  build_brain_rules_block() updated to obfuscate.

Why
- Each item is independently testable, low-risk, and clears the runway
  for the dual-write atomicity PR.
- Beta-LB default-on closes a known correctness hole that ships every
  release until flipped.
- Obfuscation gate converts a comment-level guarantee
  (security/score_obfuscation.py) into an enforced one.

Test plan
- pytest tests/test_initial_confidence_invariant.py
  tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py
  tests/test_rule_pipeline.py tests/test_middleware_core.py — 63 passed.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).
- ruff on changed files — clean.

Layering check
- No Layer 0 → 2 imports introduced.

Risk
- Beta-LB flip changes the default for graduation calibration. Users
  relying on miscalibrated PATTERN→RULE behavior will see fewer
  graduations until they set GRADATA_BETA_LB_GATE=0. This is the
  intended fix.

Council references
- council_2026-05-02T11-08-25.md (initial v4 review)
- council_2026-05-02T11-59-00.md (v4 rerun, all 7 lenses through
  fallback chain)
- council_2026-05-02T12-24-08.md (PR sequencing decision)

* fix(events): JSONL canonical, SQLite projection, reconcile-on-init, doctor --reconcile

Council v4 (council_2026-05-02T11-59-00.md) ranked dual-write atomicity
the #1 production blocker. Crash mid-write between events.jsonl append
and SQLite INSERT could leave the brain in silent split-brain state
with no recovery path.

What
- src/gradata/_events.py
  - JSONL is the canonical source of truth. Append + fsync FIRST,
    SQLite INSERT is now an idempotent projection derived from JSONL.
  - Added reconcile_jsonl_to_sqlite() that scans JSONL past the
    SQLite watermark and replays missing rows.
  - Single SQLite projection helper used by both the live write path
    and the retain orchestrator.
  - Env-gated crash-window delay for deterministic kill-9 testing
    only (no production effect).
- src/gradata/brain.py
  - Brain.__init__ runs JSONL → SQLite reconciliation after migrations.
  - Brain() resolves BRAIN_DIR / cwd when no explicit path is supplied.
  - observe(text, kind="correction") public event API used by the
    PR2 spec.
- src/gradata/cli.py + src/gradata/_doctor.py
  - New `gradata doctor --reconcile`: scans for drift, reports the
    count, replays missing JSONL rows into SQLite, exits non-zero on
    inconsistency that can't be healed.
- tests/test_dualwrite_atomicity.py
  - Path-agnostic public-API tests covering: happy path, kill-9 mid
    batch (JSONL must lead SQLite, never trail), reconcile replay,
    idempotency, doctor CLI drift report, concurrent-writer JSONL
    line integrity.
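The write-then-reconcile design above can be sketched as follows. This is a simplified illustration, not the real _events.py: the actual reconcile_jsonl_to_sqlite() scans only past a SQLite watermark, while this sketch does a full scan, and the schema and function names are assumptions:

```python
import json
import os
import sqlite3


def append_event(jsonl_path: str, db: sqlite3.Connection, event: dict) -> None:
    """Write path: JSONL leads, SQLite is an idempotent projection."""
    line = json.dumps(event, separators=(",", ":"))
    with open(jsonl_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable in the canonical log first
    _project(db, event)       # a crash before this line is healed by reconcile


def _project(db: sqlite3.Connection, event: dict) -> None:
    # INSERT OR IGNORE keyed on the event id makes replay idempotent.
    db.execute(
        "INSERT OR IGNORE INTO events (id, payload) VALUES (?, ?)",
        (event["id"], json.dumps(event)),
    )
    db.commit()


def reconcile(jsonl_path: str, db: sqlite3.Connection) -> int:
    """Replay JSONL rows missing from SQLite; returns rows healed."""
    known = {row[0] for row in db.execute("SELECT id FROM events")}
    healed = 0
    with open(jsonl_path, encoding="utf-8") as f:
        for raw in f:
            event = json.loads(raw)
            if event["id"] not in known:
                _project(db, event)
                healed += 1
    return healed
```

Because the projection is idempotent, the invariant jsonl_count >= sqlite_count holds across any crash point, and reconcile closes the gap on reopen.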

Why
- Before: dual-write claimed atomic in CLAUDE.md, no two-phase commit,
  no recovery. Crash → silent data loss or duplicate-on-replay.
- After: JSONL is the log, SQLite is the projection. Every reopen
  reconciles. doctor --reconcile is the operator escape hatch.
  Property: jsonl_count >= sqlite_count, always.

Test plan
- pytest tests/test_dualwrite_atomicity.py — 6 passed.
- Full focused regression on changed surface — 42 passed.
- Non-integration suite (excluding socket-bound daemon/plugin tests
  blocked by sandbox) — 4130 passed, 4 skipped.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).

Layering check
- _events.py is Layer 0. Brain.__init__ in Layer 2 calls into it.
  No upward imports introduced.

Risk
- Reconcile-on-init runs on every Brain open. For a brain with
  100k events this adds ~50ms-200ms one-time at startup. Watermark
  is incremental so subsequent opens are O(drift) not O(total).
- Concurrent writers serialize via JSONL append + advisory lock.
  Throughput trade-off is acceptable for correctness.
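The advisory-lock serialization mentioned above can be sketched with POSIX flock — an assumed shape, not the module's actual locking code:

```python
import fcntl
import os


def locked_append(path: str, line: str) -> None:
    """Serialize concurrent JSONL appenders with an advisory lock (POSIX only).

    flock ensures writers take turns, so no record is ever interleaved
    mid-line; the lock is released in the finally block even on error.
    """
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```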

Council references
- council_2026-05-02T11-59-00.md (v4 RISK class, all 7 lenses)
- council_2026-05-02T12-24-08.md (PR sequencing — TDD-first)

Stacks on #163.

* fix(cloud/client): coerce float ts and non-int session for /sync

The backfill script and incremental sync both crashed on real-world
events.jsonl rows that contain:
  - float ts (epoch seconds, e.g. 1776803751.89) — broke str-vs-str
    comparison against the watermark cursor with TypeError.
  - float or string session values (e.g. 4.5, UUID strings) — server
    schema rejects non-ints with HTTP 422.

Coerce ts to str and session to int|None at the format-event boundary.
Also surface the response body in HTTPError so 4xx/5xx debugging is
not opaque.
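The boundary coercion can be sketched as below — function and field names are illustrative, not the cloud client's actual API:

```python
def coerce_for_sync(event: dict) -> dict:
    """Coerce ts and session at the format-event boundary.

    - ts: float epoch seconds (e.g. 1776803751.89) become str, so the
      watermark-cursor comparison stays str-vs-str instead of raising
      TypeError.
    - session: values that convert cleanly to int are kept as int;
      anything else (UUID strings, None) becomes None, so the server
      schema no longer rejects the row with HTTP 422.
    """
    out = dict(event)
    if "ts" in out:
        out["ts"] = str(out["ts"])
    try:
        out["session"] = int(out.get("session"))
    except (TypeError, ValueError):
        out["session"] = None
    return out
```

Note that a float session like 4.5 truncates to 4 under int(); whether to truncate or drop such values is a judgment call this sketch makes one way.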

Discovered while running scripts/backfill_to_cloud.py against a
brain with ~28k events accumulated since 2026-03-22.

---------

Co-authored-by: Oliver Le <oliverle94@gmail.com>