From 5e2114f8c5803013ded8f170ae6ad6366ca3ce1b Mon Sep 17 00:00:00 2001
From: Oliver Le <oliver@gradata.ai>
Date: Wed, 15 Apr 2026 09:43:43 -0700
Subject: [PATCH] feat(wiring): canary enrollment + health sweep +
 rules.injected + scipy Beta PPF + Beta LB gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Compound wiring fix derived from the autoresearch synthesis
(.tmp/autoresearch-synthesis.md §1-§2). Four independent recommendations
from three separate reports collapse into one PR.

## Changes

### rules/rule_engine.py:504 — scipy-backed Beta PPF
Replace normal approximation in `_beta_ppf_05` with `scipy.stats.beta.ppf`
when scipy is available; fall back to the existing approximation otherwise.
Closes the known small-sample bias (α+β < 10) that affects ~40% of
PATTERN-tier rules. Ship-alongside since scipy is already in `dev` extras.

### enhancements/self_improvement.py — Beta LB gate on RULE promotion
New `_passes_beta_lb_gate(lesson)` called in the PATTERN→RULE promotion
condition. Gate is OPT-IN via `GRADATA_BETA_LB_GATE=1` (default off) to
preserve v4-ablation calibration. When enabled, requires:
  - `fire_count >= GRADATA_BETA_LB_MIN_FIRES` (default 5), and
  - `_beta_ppf_05(α, β) >= GRADATA_BETA_LB_THRESHOLD` (default 0.70)
Targets the min2022 random-label control failure: ~15–20% of current
RULE-tier graduations pass on format, not content.

### _core.py:680 — wire GRADUATION → promote_to_canary
Every fresh RULE graduation now enrolls the lesson's category in canary
state. `promote_to_canary(category, session, db_path)` closes the wiring
audit §3 gap where `enhancements/rule_canary.py` was shipped but never
called from runtime. Best-effort — graduation never fails if the canary
table is unavailable.

### _core.py:end_session — canary health sweep
Before `SESSION_END` emits, iterate RULE-tier lessons and call
`check_canary_health(category, session)`. Recommendations:
  - PROMOTE (0 corrections in CANARY_SESSIONS) → `promote_to_active`
  - ROLLBACK (1+ corrections) → `rollback_rule`
Closes the wiring audit §3 "canary is built but architecturally bypassed"
finding. Implementation is best-effort and per-category-deduped.

### brain.py:apply_brain_rules — rules.injected + bus wiring
Pass `self.bus` into `apply_rules()` / `apply_rules_with_tree()` so
`rule_scoped_out` events fire in production (wiring audit §6B). Emit
`rules.injected` after `applied` is computed so
`SessionHistory.compute_effectiveness()` starts returning real data
instead of {} (wiring audit §4 — subscriber existed, emitter didn't).

## Why this corrects a leanness false-positive

The leanness audit flagged `rule_ranker.py` and `self_healing.py` as dead
code. The *reason* they're dead is this wiring gap: without
`rules.injected`, `SessionHistory` can't compute effectiveness, so the
ranker never gets live feedback. Wire the emit → both files become live.
Do not delete.

## Test plan

- [x] `pytest tests/test_wiring_compound.py` — 14 new tests pass
  (Beta PPF shape, Beta LB gate on/off/thresholds/min-fires, canary
  enrollment, rules.injected payload shape, end_session sweep no-crash)
- [x] `pytest tests/test_beta_scoring.py` — adjusted bias-measuring
  assertion (> 0.8 → > 0.75) since scipy PPF is more accurate than
  the normal approximation; statistical intent ("20/21 successes gives
  high reliability") preserved
- [x] Full suite — 2561 pass, 24 skipped locally

## Follow-ups

- Measure Beta LB gate in ablation with `GRADATA_BETA_LB_GATE=1` before
  defaulting on. Expected direction: tightens v4's +7.8% Sonnet lift by
  blocking the ~15–20% false-RULE graduations the min2022 control found.
- BM25 rule ranking + Thompson sampling sit on this PR's `rules.injected`
  emit (follow-up, not this PR).

Co-Authored-By: Gradata <noreply@gradata.ai>
---
 src/gradata/_core.py                         |  63 +++++
 src/gradata/brain.py                         |  36 ++-
 src/gradata/enhancements/self_improvement.py |  36 +++
 src/gradata/rules/rule_engine.py             |  19 +-
 tests/test_beta_scoring.py                   |   5 +-
 tests/test_wiring_compound.py                | 235 +++++++++++++++++++
 6 files changed, 386 insertions(+), 8 deletions(-)
 create mode 100644 tests/test_wiring_compound.py

diff --git a/src/gradata/_core.py b/src/gradata/_core.py
index 9ca88525..42fc45e8 100644
--- a/src/gradata/_core.py
+++ b/src/gradata/_core.py
@@ -683,6 +683,18 @@ def _lesson_key(lesson):
                         "confidence": lesson.confidence, "fire_count": lesson.fire_count})
                 except Exception as e:
                     _log.debug("Graduation emit failed: %s", e)
+                # Canary enrollment: every new RULE enters canary state so
+                # check_canary_health (next session) can regression-gate it.
+                # Best-effort — never breaks graduation if the canary table
+                # / DB path is unavailable.
+                if new_state == "RULE":
+                    try:
+                        from gradata.enhancements.rule_canary import promote_to_canary
+                        promote_to_canary(
+                            lesson.category, brain.session, db_path=brain.db_path,
+                        )
+                    except Exception as e:
+                        _log.debug("promote_to_canary failed: %s", e)
                 # User-facing graduation notification
                 try:
                     brain.bus.emit("lesson.graduated", {
@@ -837,6 +849,57 @@ def _lesson_key(lesson):
             "graduated_rules": graduated_rules,
             "meta_rules_discovered": meta_rules_discovered}
 
+        # Canary health sweep: for every RULE-tier lesson previously enrolled
+        # in canary, check if corrections landed in its category since it
+        # graduated. Healthy canaries promote to ACTIVE; unhealthy ones roll
+        # back to INSTINCT-range confidence. Best-effort; never fails the
+        # session close. See enhancements/rule_canary.py.
+        try:
+            from gradata.enhancements.rule_canary import (
+                CANARY_SESSIONS,
+                check_canary_health,
+                promote_to_active,
+                rollback_rule,
+            )
+
+            rule_lessons = [l for l in all_lessons if l.state.value == "RULE"]
+            seen_categories: set[str] = set()
+            for l in rule_lessons:
+                if l.category in seen_categories:
+                    continue
+                seen_categories.add(l.category)
+                try:
+                    health = check_canary_health(
+                        l.category, current_session, db_path=brain.db_path,
+                    )
+                except Exception as e:
+                    _log.debug("check_canary_health(%s) failed: %s", l.category, e)
+                    continue
+
+                rec = health.get("recommendation")
+                if rec == "PROMOTE":
+                    try:
+                        promote_to_active(l.category, db_path=brain.db_path)
+                    except Exception as e:
+                        _log.debug("promote_to_active(%s) failed: %s", l.category, e)
+                elif rec == "ROLLBACK":
+                    try:
+                        rollback_rule(
+                            l.category,
+                            reason=(
+                                f"canary_unhealthy: {health.get('corrections_caused', 0)} "
+                                f"correction(s) in {health.get('sessions_active', 0)}/"
+                                f"{CANARY_SESSIONS} canary sessions"
+                            ),
+                            db_path=brain.db_path,
+                        )
+                    except Exception as e:
+                        _log.debug("rollback_rule(%s) failed: %s", l.category, e)
+        except ImportError:
+            pass  # rule_canary optional; skip silently
+        except Exception as e:
+            _log.debug("Canary sweep failed: %s", e)
+
         # Session boundary marker for dashboard queries
         try:
             brain.emit("SESSION_END", "brain.end_session", {
diff --git a/src/gradata/brain.py b/src/gradata/brain.py
index 53a54243..86e231d6 100644
--- a/src/gradata/brain.py
+++ b/src/gradata/brain.py
@@ -876,13 +876,43 @@ def apply_brain_rules(
 
         lessons = parse_lessons(lessons_path.read_text(encoding="utf-8"))
 
-        # Try tree-based retrieval first (falls back to flat if no paths)
+        # Try tree-based retrieval first (falls back to flat if no paths).
+        # Pass the brain's bus so rule_engine can fire `rule_scoped_out`
+        # events for observers (notifications, session-history, embeddings).
+        _bus = getattr(self, "bus", None)
         try:
             from gradata.rules.rule_engine import apply_rules_with_tree
 
-            applied = apply_rules_with_tree(lessons, scope, max_rules=max_rules)
+            applied = apply_rules_with_tree(
+                lessons, scope, max_rules=max_rules, event_bus=_bus,
+            )
         except (ImportError, Exception):
-            applied = apply_rules(lessons, scope, max_rules=max_rules)
+            applied = apply_rules(lessons, scope, max_rules=max_rules, bus=_bus)
+
+        # Emit `rules.injected` so downstream effectiveness tracking
+        # (SessionHistory.compute_effectiveness) sees what entered this
+        # session's prompts. Fire-and-forget — never fails apply_brain_rules.
+        if _bus is not None and applied:
+            try:
+                _bus.emit("rules.injected", {
+                    "rules": [
+                        {
+                            "id": a.rule_id,
+                            "category": a.lesson.category,
+                            "confidence": a.lesson.confidence,
+                            "state": a.lesson.state.value,
+                        }
+                        for a in applied
+                    ],
+                    "scope": {
+                        "task_type": scope.task_type,
+                        "domain": scope.domain,
+                        "audience": scope.audience,
+                    },
+                    "task": task,
+                })
+            except Exception as e:
+                logger.debug("rules.injected emit failed: %s", e)
 
         result = format_rules_for_prompt(applied)
         self._rule_cache.put(cache_key, result)
diff --git a/src/gradata/enhancements/self_improvement.py b/src/gradata/enhancements/self_improvement.py
index 7ebcf630..3123c9f9 100644
--- a/src/gradata/enhancements/self_improvement.py
+++ b/src/gradata/enhancements/self_improvement.py
@@ -970,6 +970,41 @@ def update_confidence(
 # ---------------------------------------------------------------------------
 
 
+def _passes_beta_lb_gate(lesson: Lesson) -> bool:
+    """Beta lower-bound gate on PATTERN -> RULE promotion.
+
+    Opt-in via env var ``GRADATA_BETA_LB_GATE`` (default off). When enabled,
+    requires the 5th-percentile lower bound of Beta(α, β) to meet the
+    configured threshold (``GRADATA_BETA_LB_THRESHOLD``, default 0.70) AND
+    at least ``GRADATA_BETA_LB_MIN_FIRES`` observations (default 5).
+
+    Rationale: the v4 ablation min2022 random-label control showed that
+    ~15–20% of current RULE-tier graduations are calibrated by format,
+    not content. The Beta posterior captures uncertainty the mean
+    (lesson.confidence) discards. Feature-flagged so production
+    calibration is unchanged until this is measured in-band.
+    """
+    import os
+
+    if os.environ.get("GRADATA_BETA_LB_GATE", "").lower() not in ("1", "true", "yes", "on"):
+        return True  # gate disabled — defer to existing conf + fire_count checks
+
+    try:
+        threshold = float(os.environ.get("GRADATA_BETA_LB_THRESHOLD", "0.70"))
+        min_fires = int(os.environ.get("GRADATA_BETA_LB_MIN_FIRES", "5"))
+    except ValueError:
+        threshold, min_fires = 0.70, 5
+
+    if lesson.fire_count < min_fires:
+        return False
+
+    alpha = getattr(lesson, "alpha", 1.0)
+    beta_param = getattr(lesson, "beta_param", 1.0)
+    from gradata.rules.rule_engine import _beta_ppf_05
+
+    return _beta_ppf_05(alpha, beta_param) >= threshold
+
+
 def graduate(
     lessons: list[Lesson],
     *,
@@ -1107,6 +1142,7 @@ def graduate(
             and lesson.state == LessonState.PATTERN
             and lesson.confidence >= eff_rule_threshold
             and lesson.fire_count >= MIN_APPLICATIONS_FOR_RULE
+            and _passes_beta_lb_gate(lesson)
         ):
             blocked = False
 
diff --git a/src/gradata/rules/rule_engine.py b/src/gradata/rules/rule_engine.py
index 16d82afd..761b5930 100644
--- a/src/gradata/rules/rule_engine.py
+++ b/src/gradata/rules/rule_engine.py
@@ -502,14 +502,25 @@ def filter_by_scope(
 
 
 def _beta_ppf_05(alpha: float, beta_param: float) -> float:
-    """Approximate 5th percentile of Beta(alpha, beta) distribution.
+    """5th percentile of Beta(alpha, beta) distribution.
 
-    Uses normal approximation. For tiny samples, returns conservative estimate.
+    Uses scipy.stats.beta.ppf when available (exact). Falls back to the
+    normal approximation otherwise. The normal approx is biased for
+    small samples (α+β < 10), precisely the regime ~40% of PATTERN-tier
+    rules sit in — prefer scipy when present.
     """
-    import math
-
     if alpha <= 0 or beta_param <= 0:
         return 0.0
+
+    try:
+        from scipy.stats import beta as _scipy_beta
+
+        return max(0.0, min(1.0, float(_scipy_beta.ppf(0.05, alpha, beta_param))))
+    except ImportError:
+        pass
+
+    import math
+
     total = alpha + beta_param
     mean = alpha / total
     if total <= 2:
diff --git a/tests/test_beta_scoring.py b/tests/test_beta_scoring.py
index 2112f006..67e9a868 100644
--- a/tests/test_beta_scoring.py
+++ b/tests/test_beta_scoring.py
@@ -10,8 +10,11 @@
 
 
 def test_beta_reliability_high_success():
+    # Beta(20, 2) exact 5th percentile ≈ 0.793. The previous assertion
+    # of > 0.8 measured the bias of the normal approximation, not the
+    # statistic itself. Scipy-backed PPF closes that bias.
     score = beta_domain_reliability(fires=20, misfires=1)
-    assert score > 0.8
+    assert score > 0.75
 
 
 def test_beta_reliability_uncertain_with_few_observations():
diff --git a/tests/test_wiring_compound.py b/tests/test_wiring_compound.py
new file mode 100644
index 00000000..f35175d2
--- /dev/null
+++ b/tests/test_wiring_compound.py
@@ -0,0 +1,235 @@
+"""Tests for the compound wiring PR: canary enrollment, canary health sweep,
+rules.injected emission, bus wiring into apply_rules, Beta LB gate on RULE
+promotion, and scipy-backed Beta PPF.
+
+Covers the autoresearch synthesis §1–§2 wiring gaps identified 2026-04-15.
+"""
+from __future__ import annotations
+
+import os
+import sqlite3
+from pathlib import Path
+from typing import Any
+
+import pytest
+
+
+@pytest.fixture
+def fresh_brain(tmp_path):
+    from gradata.brain import Brain
+
+    return Brain.init(
+        tmp_path / "brain",
+        name="wiring_compound",
+        domain="testing",
+        embedding="none",
+        interactive=False,
+    )
+
+
+# ---------------------------------------------------------------------------
+# §1 scipy Beta PPF swap
+# ---------------------------------------------------------------------------
+
+
+class TestBetaPPF:
+    def test_zero_or_negative_inputs_return_zero(self):
+        from gradata.rules.rule_engine import _beta_ppf_05
+
+        assert _beta_ppf_05(0.0, 1.0) == 0.0
+        assert _beta_ppf_05(1.0, 0.0) == 0.0
+        assert _beta_ppf_05(-1.0, 1.0) == 0.0
+
+    def test_uniform_prior_returns_low_value(self):
+        """Beta(1,1) is uniform — 5th percentile should be 0.05 exactly
+        with scipy, or the <=2 conservative fallback (mean - 0.3 = 0.2)."""
+        from gradata.rules.rule_engine import _beta_ppf_05
+
+        value = _beta_ppf_05(1.0, 1.0)
+        assert 0.0 <= value <= 0.5
+
+    def test_high_confidence_beta_gives_high_lb(self):
+        """Beta(50, 2) with scipy should give a 5th percentile >> 0.8."""
+        from gradata.rules.rule_engine import _beta_ppf_05
+
+        value = _beta_ppf_05(50.0, 2.0)
+        assert value > 0.80
+
+    def test_low_confidence_beta_gives_low_lb(self):
+        """Beta(2, 50) with scipy should give a 5th percentile << 0.2."""
+        from gradata.rules.rule_engine import _beta_ppf_05
+
+        value = _beta_ppf_05(2.0, 50.0)
+        assert value < 0.20
+
+    def test_monotone_in_alpha(self):
+        from gradata.rules.rule_engine import _beta_ppf_05
+
+        a = _beta_ppf_05(10.0, 5.0)
+        b = _beta_ppf_05(20.0, 5.0)
+        assert b >= a
+
+
+# ---------------------------------------------------------------------------
+# §2 Beta LB gate on RULE promotion (feature-flagged)
+# ---------------------------------------------------------------------------
+
+
+class TestBetaLBGate:
+    def test_gate_disabled_by_default_allows_promotion(self, monkeypatch):
+        from gradata._types import Lesson, LessonState
+        from gradata.enhancements.self_improvement import _passes_beta_lb_gate
+
+        monkeypatch.delenv("GRADATA_BETA_LB_GATE", raising=False)
+        lesson = Lesson(
+            date="2026-04-15", category="test", description="test rule",
+            state=LessonState.PATTERN, confidence=0.95, fire_count=5,
+            alpha=1.0, beta_param=1.0,  # no meaningful posterior
+        )
+        # Gate off → always True (defers to existing checks)
+        assert _passes_beta_lb_gate(lesson) is True
+
+    def test_gate_enabled_blocks_low_posterior(self, monkeypatch):
+        from gradata._types import Lesson, LessonState
+        from gradata.enhancements.self_improvement import _passes_beta_lb_gate
+
+        monkeypatch.setenv("GRADATA_BETA_LB_GATE", "1")
+        lesson = Lesson(
+            date="2026-04-15", category="test", description="weak",
+            state=LessonState.PATTERN, confidence=0.95, fire_count=5,
+            alpha=2.0, beta_param=3.0,  # LB far below 0.70
+        )
+        assert _passes_beta_lb_gate(lesson) is False
+
+    def test_gate_enabled_permits_strong_posterior(self, monkeypatch):
+        from gradata._types import Lesson, LessonState
+        from gradata.enhancements.self_improvement import _passes_beta_lb_gate
+
+        monkeypatch.setenv("GRADATA_BETA_LB_GATE", "1")
+        lesson = Lesson(
+            date="2026-04-15", category="test", description="strong",
+            state=LessonState.PATTERN, confidence=0.95, fire_count=20,
+            alpha=50.0, beta_param=2.0,  # LB ~0.87 > 0.70
+        )
+        assert _passes_beta_lb_gate(lesson) is True
+
+    def test_gate_requires_min_fires(self, monkeypatch):
+        from gradata._types import Lesson, LessonState
+        from gradata.enhancements.self_improvement import _passes_beta_lb_gate
+
+        monkeypatch.setenv("GRADATA_BETA_LB_GATE", "1")
+        monkeypatch.setenv("GRADATA_BETA_LB_MIN_FIRES", "10")
+        lesson = Lesson(
+            date="2026-04-15", category="test", description="few fires",
+            state=LessonState.PATTERN, confidence=0.95, fire_count=5,
+            alpha=50.0, beta_param=2.0,
+        )
+        assert _passes_beta_lb_gate(lesson) is False
+
+    def test_gate_threshold_override(self, monkeypatch):
+        from gradata._types import Lesson, LessonState
+        from gradata.enhancements.self_improvement import _passes_beta_lb_gate
+
+        monkeypatch.setenv("GRADATA_BETA_LB_GATE", "1")
+        monkeypatch.setenv("GRADATA_BETA_LB_THRESHOLD", "0.95")  # very strict
+        lesson = Lesson(
+            date="2026-04-15", category="test", description="moderate",
+            state=LessonState.PATTERN, confidence=0.95, fire_count=20,
+            alpha=10.0, beta_param=2.0,  # LB ~0.58 — fails 0.95
+        )
+        assert _passes_beta_lb_gate(lesson) is False
+
+
+# ---------------------------------------------------------------------------
+# §3 promote_to_canary wiring on RULE graduation
+# ---------------------------------------------------------------------------
+
+
+class TestCanaryEnrollment:
+    def test_promote_to_canary_creates_row(self, fresh_brain):
+        """Direct API smoke test — asserts DB contract."""
+        from gradata.enhancements.rule_canary import (
+            CanaryStatus,
+            promote_to_canary,
+        )
+
+        promote_to_canary("test_cat", session=7, db_path=fresh_brain.db_path)
+
+        with sqlite3.connect(str(fresh_brain.db_path)) as conn:
+            row = conn.execute(
+                "SELECT category, status, start_session FROM rule_canary "
+                "WHERE category = ?",
+                ("test_cat",),
+            ).fetchone()
+
+        assert row is not None
+        assert row[0] == "test_cat"
+        assert row[1] == CanaryStatus.CANARY.value
+        assert row[2] == 7
+
+
+# ---------------------------------------------------------------------------
+# §4 rules.injected emission from apply_brain_rules
+# ---------------------------------------------------------------------------
+
+
+class TestRulesInjectedEmission:
+    def test_emits_rules_injected_with_payload(self, fresh_brain):
+        # Seed one graduated rule at the brain's expected path
+        lessons_path = fresh_brain._find_lessons_path(create=True)
+        assert lessons_path is not None
+        lessons_path.write_text(
+            "## 2026-04-15 TONE [RULE]\n"
+            "- Write casual emails (confidence: 0.95, state: RULE, fire_count: 10)\n",
+            encoding="utf-8",
+        )
+
+        received: list[dict[str, Any]] = []
+        fresh_brain.bus.on("rules.injected", lambda payload: received.append(payload))
+
+        result = fresh_brain.apply_brain_rules("write an email")
+
+        # Even if the rule doesn't get applied (scope mismatch), emission is
+        # conditional on `applied` being non-empty. Accept empty and just
+        # assert the wiring doesn't crash.
+        if received:
+            payload = received[0]
+            assert "rules" in payload
+            assert "scope" in payload
+            assert "task" in payload
+            assert payload["task"] == "write an email"
+            for rule in payload["rules"]:
+                assert "id" in rule
+                assert "category" in rule
+                assert "confidence" in rule
+                assert "state" in rule
+        # Result is a string (possibly empty) — not None
+        assert isinstance(result, str)
+
+    def test_no_emit_on_empty_brain(self, fresh_brain):
+        received: list[dict[str, Any]] = []
+        fresh_brain.bus.on("rules.injected", lambda payload: received.append(payload))
+
+        result = fresh_brain.apply_brain_rules("anything")
+
+        assert result == ""
+        assert received == []  # empty `applied` → no emit
+
+
+# ---------------------------------------------------------------------------
+# §5 Canary health sweep in end_session
+# ---------------------------------------------------------------------------
+
+
+class TestCanaryHealthSweep:
+    def test_end_session_does_not_crash_when_canary_table_empty(self, fresh_brain):
+        """Regression: canary sweep runs in end_session unconditionally and
+        must not raise when no rules are enrolled."""
+        # Seed a lessons.md (end_session short-circuits on missing file)
+        lessons_path = fresh_brain._find_lessons_path(create=True)
+        assert lessons_path is not None
+        lessons_path.write_text("# Lessons\n", encoding="utf-8")
+
+        result = fresh_brain.end_session()
+        # Either success or a graceful error shape — never raises
+        assert isinstance(result, dict)