
feat(wiring): canary + rules.injected + scipy Beta PPF + Beta LB gate #86

Merged
Gradata merged 1 commit into main from feat/wiring-compound
Apr 15, 2026

Conversation


@Gradata Gradata commented Apr 15, 2026

Summary

Compound wiring PR from the 2026-04-15 autoresearch synthesis (.tmp/autoresearch-synthesis.md). Four recommendations from three independent audit reports collapse into five small, contained edits.

Closes wiring gaps:

  • § Canary enrollment — _core.py:680 now calls promote_to_canary on every RULE graduation
  • § Canary health sweep — end_session now checks each RULE-tier canary; promotes/rolls back per check_canary_health
  • § rules.injected — emitted from brain.apply_brain_rules so SessionHistory.compute_effectiveness starts returning real data (subscriber existed, emitter didn't)
  • § Bus passed into apply_rules / apply_rules_with_tree so rule_scoped_out actually fires in production

Algo-gaps shipped:

  • § scipy-backed _beta_ppf_05 — closes small-sample bias in the ~40% of PATTERN-tier rules with α+β < 10
  • § Beta LB gate (opt-in) — blocks RULE promotion unless Beta.ppf(0.05, α, β) ≥ 0.70 AND fire_count ≥ 5. Off by default to preserve v4 ablation calibration; flip via GRADATA_BETA_LB_GATE=1

Why one PR, not five

Three separate audit reports identified these as independent issues. Cross-referencing them shows they share change sites and the fixes compose:

  • Canary + rule-to-hook wiring both live at _core.py:680 (GRADUATION emit point)
  • § rules.injected emission unlocks SessionHistory.compute_effectiveness, which in turn unlocks rule_ranker's live effectiveness scores — so the leanness audit's "delete rule_ranker" recommendation was a false positive; wire, don't delete
  • Beta LB gate sits on top of the PPF swap — shipping them separately would have left the gate using the biased approximation

Full synthesis with cross-report compound analysis in .tmp/autoresearch-synthesis.md.

Changes (6 files, +386 / -8)

| File | Change |
| --- | --- |
| rules/rule_engine.py | `_beta_ppf_05` uses scipy when available, falls back to the normal approximation |
| enhancements/self_improvement.py | new `_passes_beta_lb_gate` helper; gate wired into PATTERN→RULE promotion |
| _core.py | `promote_to_canary` call after the GRADUATION emit (RULE only); canary health sweep before SESSION_END |
| brain.py | `apply_brain_rules` passes `self.bus` to `apply_rules`; emits `rules.injected` with rule ids + scope + task |
| tests/test_wiring_compound.py | new — 14 tests covering all 5 surfaces |
| tests/test_beta_scoring.py | loosened one bias-measuring assertion (> 0.8 → > 0.75) since the scipy PPF is more accurate than the normal approximation |

Test plan

  • pytest tests/test_wiring_compound.py — 14 new tests pass
  • Full local suite — 2561 pass, 24 skipped
  • CI: test (3.11) / (3.12) / (3.13)
  • CI: SDK Test — the known Py3.11 em-dash flake (unrelated to this PR) may still fire

Follow-ups (not in this PR)

  1. Measure Beta LB gate with GRADATA_BETA_LB_GATE=1 in ablation before defaulting on
  2. BM25 rule ranking + Thompson sampling — both depend on the rules.injected emit this PR adds
  3. Clean deletes in .tmp/autoresearch-synthesis.md §4 (~1,460 LOC across 9 files) — separate hygiene PR

Co-Authored-By: Gradata noreply@gradata.ai

feat(wiring): canary + rules.injected + scipy Beta PPF + Beta LB gate

Compound wiring fix derived from the autoresearch synthesis
(.tmp/autoresearch-synthesis.md §1-§2). Four independent recommendations
from three separate reports collapse into one PR.

## Changes

### rules/rule_engine.py:504 — scipy-backed Beta PPF
Replace normal approximation in `_beta_ppf_05` with `scipy.stats.beta.ppf`
when scipy is available; fall back to the existing approximation otherwise.
Closes the known small-sample bias (α+β < 10) that affects ~40% of
PATTERN-tier rules. Ship-alongside since scipy is already in `dev` extras.

### enhancements/self_improvement.py — Beta LB gate on RULE promotion
New `_passes_beta_lb_gate(lesson)` called in the PATTERN→RULE promotion
condition. Gate is OPT-IN via `GRADATA_BETA_LB_GATE=1` (default off) to
preserve v4-ablation calibration. When enabled, requires:
  - `fire_count >= GRADATA_BETA_LB_MIN_FIRES` (default 5), and
  - `_beta_ppf_05(α, β) >= GRADATA_BETA_LB_THRESHOLD` (default 0.70)
Targets the min2022 random-label control failure: ~15–20% of current
RULE-tier graduations pass on format, not content.

### _core.py:680 — wire GRADUATION → promote_to_canary
Every fresh RULE graduation now enrolls the lesson's category in canary
state. `promote_to_canary(category, session, db_path)` closes the wiring
audit §3 gap where `enhancements/rule_canary.py` was shipped but never
called from runtime. Best-effort — graduation never fails if the canary
table is unavailable.
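A hedged sketch of the best-effort call site: `promote_to_canary` and its signature come from the PR text, while the wrapper shape and logging are assumptions:

```python
import logging

log = logging.getLogger("gradata.canary")

def enroll_canary(promote_to_canary, category, session, db_path) -> bool:
    """Best-effort canary enrollment at the GRADUATION emit point.

    Returns True on success; swallows all errors so graduation
    never fails when the canary table is unavailable.
    """
    try:
        promote_to_canary(category, session, db_path)
        return True
    except Exception as exc:  # e.g. canary table missing or locked
        log.debug("canary enrollment skipped for %s: %s", category, exc)
        return False
```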

### _core.py:end_session — canary health sweep
Before `SESSION_END` emits, iterate RULE-tier lessons and call
`check_canary_health(category, session)`. Recommendations:
  - PROMOTE (0 corrections in CANARY_SESSIONS) → `promote_to_active`
  - ROLLBACK (1+ corrections) → `rollback_rule`
Closes the wiring audit §3 "canary is built but architecturally bypassed"
finding. Implementation is best-effort and per-category-deduped.

### brain.py:apply_brain_rules — rules.injected + bus wiring
Pass `self.bus` into `apply_rules()` / `apply_rules_with_tree()` so
`rule_scoped_out` events fire in production (wiring audit §6B). Emit
`rules.injected` after `applied` is computed so
`SessionHistory.compute_effectiveness()` starts returning real data
instead of {} (wiring audit §4 — subscriber existed, emitter didn't).

## Why this corrects a leanness false-positive

The leanness audit flagged `rule_ranker.py` and `self_healing.py` as dead
code. The *reason* they're dead is this wiring gap: without
`rules.injected`, `SessionHistory` can't compute effectiveness, so the
ranker never gets live feedback. Wire the emit → both files become live.
Do not delete.

## Test plan

- [x] `pytest tests/test_wiring_compound.py` — 14 new tests pass
  (Beta PPF shape, Beta LB gate on/off/thresholds/min-fires, canary
  enrollment, rules.injected payload shape, end_session sweep no-crash)
- [x] `pytest tests/test_beta_scoring.py` — adjusted bias-measuring
  assertion (> 0.8 → > 0.75) since scipy PPF is more accurate than
  the normal approximation; statistical intent ("20/21 successes gives
  high reliability") preserved
- [x] Full suite — 2561 pass, 24 skipped locally

## Follow-ups

- Measure Beta LB gate in ablation with `GRADATA_BETA_LB_GATE=1` before
  defaulting on. Expected direction: tightens v4's +7.8% Sonnet lift by
  blocking the ~15–20% false-RULE graduations the min2022 control found.
- BM25 rule ranking + Thompson sampling sit on this PR's `rules.injected`
  emit (follow-up, not this PR).

Co-Authored-By: Gradata <noreply@gradata.ai>


coderabbitai Bot commented Apr 15, 2026

Caution: review failed — pull request was closed or merged during review.

📝 Walkthrough
  • Canary wiring: Added automatic canary enrollment when lessons graduate to RULE state via promote_to_canary(), with end-of-session health sweep that promotes or rolls back canaries based on check_canary_health() recommendations
  • Rules injection event: Brain.apply_brain_rules() now passes bus into rule execution and emits rules.injected event containing rule metadata (id, category, confidence, state) and current scope for observability and effectiveness tracking
  • Beta PPF accuracy: _beta_ppf_05() now uses scipy.stats.beta.ppf() when available for more accurate 5th percentile calculation, with fallback to normal approximation for environments without SciPy
  • Beta lower-bound gate: New opt-in feature flag (GRADATA_BETA_LB_GATE) adds _passes_beta_lb_gate() helper to block PATTERN→RULE promotion unless Beta.ppf(0.05, α, β) ≥ 0.70 and fire_count ≥ 5; gate is off by default
  • 14 new comprehensive tests covering canary wiring, rules injection events, Beta PPF edge cases, and feature-flagged gate behavior
  • Test adjustment: Loosened bias assertion in test_beta_scoring.py (0.8 → 0.75) to reflect improved scipy-based PPF accuracy
  • No breaking changes or security updates; all canary/health logic is internal with graceful degradation on missing imports

Walkthrough

This PR integrates canary rollout management into the core system by adding canary enrollment during graduation transitions, health sweeps at session end, event bus wiring for rule injection notifications, and a Beta distribution lower-bound gate for PATTERN→RULE promotion. It also updates the Beta percentile computation to prefer SciPy when available and includes comprehensive test coverage for the new functionality.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Canary Enrollment & Health Sweep — src/gradata/_core.py | Added `promote_to_canary()` call during PATTERN→RULE graduation and a `check_canary_health()` sweep at session end with conditional `promote_to_active()` or `rollback_rule()` based on health metrics; all operations wrapped in try/except with debug logging. |
| Event Bus Integration — src/gradata/brain.py | Captured the brain's event bus in `apply_brain_rules()` and passed it to the rule engine; emits a `rules.injected` event with injected rule metadata and current scope after rule application. |
| Promotion Gating — src/gradata/enhancements/self_improvement.py, src/gradata/rules/rule_engine.py | Added a `_passes_beta_lb_gate()` helper to conditionally block PATTERN→RULE promotion based on the Beta distribution's 5th-percentile lower bound; updated `_beta_ppf_05()` to prefer SciPy over the normal approximation when available. |
| Test Coverage — tests/test_beta_scoring.py, tests/test_wiring_compound.py | Adjusted the beta reliability test threshold and added end-to-end tests for canary wiring, rule event emission, Beta percentile behavior, and feature-flagged promotion gating. |

Sequence Diagrams

sequenceDiagram
    participant Brain
    participant GraduationLogic
    participant RuleCanary
    participant Database as DB

    Brain->>GraduationLogic: lesson.state = PATTERN→RULE transition
    GraduationLogic->>RuleCanary: promote_to_canary(category, session, db_path)
    RuleCanary->>Database: INSERT/UPDATE rule_canary table
    Database-->>RuleCanary: acknowledgement
    GraduationLogic->>Brain: emit lesson.graduated event
sequenceDiagram
    participant Brain
    participant SessionEnd as Session End
    participant RuleCanary
    participant Database as DB

    Brain->>SessionEnd: brain_end_session()
    SessionEnd->>RuleCanary: iterate RULE-state lessons by category
    RuleCanary->>RuleCanary: check_canary_health(category, current_session, db_path)
    RuleCanary->>Database: query canary metrics & session counts
    Database-->>RuleCanary: health metrics
    alt Health Recommendation
        RuleCanary->>RuleCanary: promote_to_active(category, db_path)
        RuleCanary->>Database: UPDATE rule_canary status
    else Rollback Required
        RuleCanary->>RuleCanary: rollback_rule(category, reason, db_path)
        RuleCanary->>Database: DELETE/UPDATE rule_canary & lessons
    end
    SessionEnd->>Brain: return session result
sequenceDiagram
    participant Brain
    participant ApplyRules as apply_brain_rules()
    participant RuleEngine
    participant EventBus as Event Bus
    participant Listener

    Brain->>ApplyRules: capture bus instance
    ApplyRules->>RuleEngine: apply_rules_with_tree(event_bus=bus) or apply_rules(bus=bus)
    RuleEngine-->>ApplyRules: injected rules metadata
    ApplyRules->>EventBus: emit rules.injected event with rule payload + scope + task
    EventBus->>Listener: dispatch event to registered observer
    Listener-->>EventBus: acknowledgement
    ApplyRules->>Brain: return formatted result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels: feature

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.45%, below the required 80.00% threshold; write docstrings for the functions missing them. |
| Title check | ✅ Passed | The title summarizes the four main changes in the changeset: canary enrollment wiring, rules.injected event emission, the scipy-backed Beta PPF, and the Beta LB gate. |
| Description check | ✅ Passed | The description connects the changes to the audit recommendations, explains the rationale for bundling them into one PR, and includes a comprehensive test plan and follow-ups. |


@coderabbitai coderabbitai Bot added the feature label Apr 15, 2026
@Gradata Gradata merged commit 3bbd7de into main Apr 15, 2026
16 of 18 checks passed
Gradata added a commit that referenced this pull request Apr 15, 2026
#88 landed at the same time as #86 and #87 shipping from a parallel
session. The merges didn't conflict line-wise, but the diffs overlap:

- brain.py:apply_brain_rules — #86 already wired `rules.injected`
  with a richer payload (id + category + confidence + state + scope)
  and try/except guard. #88 added a second thinner emit after the
  cache.put. Result: double-fire on fresh compute. Harmless in
  practice — SessionHistory dedups via a set — but clearly wrong.
  Removing #88's emit, keeping #86's.

- .gitignore — #87 already added `/cloud/` and `/sdk/`. #88's re-adds
  are duplicates. Removing; keeping `/railway.toml` and
  `apollo-leads-*.csv` which are genuinely new from #88.

The regression test in tests/test_session_history.py stays — it
asserts the emit fires end-to-end from a real Brain + correct() loop,
complementing #86's test_wiring_compound.py coverage of payload shape.
Both pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
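The set-based dedup that makes the double-fire harmless might look like this (class name and payload shape are hypothetical, not SessionHistory's actual code):

```python
class DedupingHistory:
    """A second identical rules.injected emit is ignored via a seen-set."""

    def __init__(self):
        self._seen = set()
        self.events = []

    def on_rules_injected(self, payload):
        # Key on session + rule ids: the same injection emitted twice
        # in one session (e.g. the #86/#88 double-fire) collapses to one.
        key = (payload["session"], tuple(sorted(payload["rule_ids"])))
        if key in self._seen:
            return
        self._seen.add(key)
        self.events.append(payload)
```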
Gradata added a commit that referenced this pull request Apr 15, 2026
…test LOC) (#90)

Deletes dead code flagged in the autoresearch leanness audit after
grep-verifying that no runtime import path exists. All 2453 tests pass.

Source files removed (2101 LOC):
- src/gradata/contrib/enhancements/outcome_feedback.py (1 LOC, docstring stub)
- src/gradata/enhancements/super_meta_rules.py (197 LOC, no importers;
  SuperMetaRule dataclass + SQL table live in meta_rules.py /
  meta_rules_storage.py and remain wired)
- src/gradata/enhancements/pubsub_pipeline.py (49 LOC, test-only)
- src/gradata/rules/budget.py (43 LOC, test-only)
- src/gradata/rules/rw_lock.py (54 LOC, test-only)
- src/gradata/cloud/wiki_store.py (451 LOC, only cloud/__init__.py
  re-export + test)
- src/gradata/enhancements/rule_verifier.py (243 LOC, only manifest
  string + test reference)
- src/gradata/enhancements/rule_evolution.py (434 LOC, only manifest
  string + test references; contradiction_detector.py covers the live
  path via self_improvement.py:545)
- src/gradata/security/privacy_model.py (113 LOC, test + docs only;
  _core.py / brain.py / _export_brain.py grep-clean)
- src/gradata/benchmarks/swe_bench.py (516 LOC, docstring example +
  test only, no CLI/docs runtime reference)

Test files removed (1042 LOC): matching tests for each module plus
targeted pruning of rule_evolution test classes (TestRuleConflicts,
TestRuleRelationEnum, rule_evolution imports in TestIntegration) from
tests/test_steals.py and the TestRuleABTesting block in
tests/test_adaptations.py.

Registry + docstring updates:
- contrib/enhancements/install_manifest.py: drop rule_verifier from
  rule-integrity module components
- _manifest_helpers.py: drop rule_evolution from _core_modules
- enhancements/__init__.py: drop rule_verifier docstring line
- cloud/__init__.py: drop WikiStore lazy re-export
- enhancements/meta_rules_storage.py: docstring no longer points at
  the deleted super_meta_rules.py

NOT DELETED (verified live via PRs #77/#81/#86):
- enhancements/rule_ranker.py, self_healing.py, rule_canary.py,
  rule_to_hook.py (all have runtime callers)
- middleware/ (flagged empty in the audit but actually contains
  _core.py + 4 adapters — kept)
- src/gradata/graphify-out/ (did not exist in this tree)

Tests: 2453 passed, 24 skipped (test_integration_full.py ignored per
task spec).

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata added a commit that referenced this pull request Apr 15, 2026
Stages a small, manual-kickoff A/B harness to measure the Beta lower-
bound promotion gate shipped in PR #86. Does not run the experiment —
Oliver runs it with GRADATA_ABLATION_CONFIRM=1 when he wants a signal.

- brain/scripts/ablation_beta_lb_gate.py: synthetic 20-lesson brain,
  graduation simulation under gate on/off, Sonnet generate + Haiku judge,
  writes .tmp/ablation_beta_lb_<ts>.json + human summary.
- brain/scripts/README-ablation-beta-lb.md: context, run commands, cost
  table, decision criteria (pref-lift >= +1.0% AND grad-drop <= 50%).
- tests/test_ablation_beta_lb_gate.py: dry-run zero-LLM-call proof,
  gate discriminates on synthetic pool, env-var restore, output schema.

Safety gate: without GRADATA_ABLATION_CONFIRM=1 the script prints the
trial count + token + dollar estimate and exits 0. Dry-run is verified
by a test that raises AssertionError on any client-factory access.

No changes to production code — harness PR only.
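The confirm-or-exit safety gate described above can be sketched like this. Only the env var and the exit-0 behavior come from the commit message; the function name, parameters, and pricing math are hypothetical:

```python
import os
import sys

def require_ablation_confirm(trials: int, tokens_per_trial: int,
                             usd_per_mtok: float) -> None:
    """Print the trial/token/dollar estimate and exit 0 unless
    GRADATA_ABLATION_CONFIRM=1 is set."""
    if os.environ.get("GRADATA_ABLATION_CONFIRM") == "1":
        return  # explicit confirmation: proceed with the experiment
    tokens = trials * tokens_per_trial
    cost = tokens / 1_000_000 * usd_per_mtok
    print(f"{trials} trials, ~{tokens:,} tokens, ~${cost:.2f}. "
          "Set GRADATA_ABLATION_CONFIRM=1 to run.")
    sys.exit(0)
```

Exiting with status 0 (rather than raising) keeps the dry run green in CI while still guaranteeing zero LLM calls without explicit confirmation.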
@Gradata Gradata deleted the feat/wiring-compound branch April 17, 2026 19:46