feat(wiring): canary + rules.injected + scipy Beta PPF + Beta LB gate by Gradata · Pull Request #86 · Gradata/gradata

Gradata · 2026-04-15T16:45:15Z

Summary

Compound wiring PR from the 2026-04-15 autoresearch synthesis (.tmp/autoresearch-synthesis.md). Four recommendations from three independent audit reports collapse into five small, contained edits.

Closes wiring gaps:

§ Canary enrollment — _core.py:680 now calls promote_to_canary on every RULE graduation
§ Canary health sweep — end_session now checks each RULE-tier canary; promotes/rolls back per check_canary_health
§ rules.injected — emitted from brain.apply_brain_rules so SessionHistory.compute_effectiveness starts returning real data (subscriber existed, emitter didn't)
§ Bus passed into apply_rules / apply_rules_with_tree so rule_scoped_out actually fires in production
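
The "subscriber existed, emitter didn't" gap can be pictured with a minimal sketch. Only the event name `rules.injected` and the scope/task payload fields come from this PR; `EventBus`, `InjectionTracker`, `apply_rules_and_emit`, and the `rules` payload key are hypothetical stand-ins, not Gradata's actual classes.

```python
import logging
from collections import defaultdict

log = logging.getLogger(__name__)


class EventBus:
    """Hypothetical minimal pub/sub bus (not Gradata's real bus)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, payload):
        for handler in self._handlers[event]:
            handler(payload)


class InjectionTracker:
    """Hypothetical subscriber: records which rule ids were injected."""

    def __init__(self, bus):
        self.injected_ids = set()
        bus.on("rules.injected", self._on_injected)

    def _on_injected(self, payload):
        self.injected_ids.update(rule["id"] for rule in payload.get("rules", []))


def apply_rules_and_emit(bus, rules, scope, task):
    """Emit rules.injected after rules are applied; never fail the caller."""
    try:
        bus.emit("rules.injected", {"rules": rules, "scope": scope, "task": task})
    except Exception:
        # Best-effort: an observer error must not break rule application.
        log.debug("rules.injected emit failed", exc_info=True)
    return rules
```

Until something calls `emit`, the tracker's set stays empty, which is the situation the description attributes to `SessionHistory.compute_effectiveness` before this PR.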

Algo-gaps shipped:

§ scipy-backed _beta_ppf_05 — closes small-sample bias in the ~40% of PATTERN-tier rules with α+β < 10
§ Beta LB gate (opt-in) — blocks RULE promotion unless Beta.ppf(0.05, α, β) ≥ 0.70 AND fire_count ≥ 5. Off by default to preserve v4 ablation calibration; flip via GRADATA_BETA_LB_GATE=1
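
As a rough illustration of the PPF swap, the sketch below prefers `scipy.stats.beta.ppf` and otherwise falls back to a normal approximation. The function names and the exact form of the fallback (mean plus z times standard deviation, clamped to [0, 1]) are assumptions; the repository's `_beta_ppf_05` may differ.

```python
import math
from statistics import NormalDist

try:
    from scipy.stats import beta as _scipy_beta  # optional dependency
except ImportError:
    _scipy_beta = None


def normal_approx_ppf_05(alpha, beta):
    """Assumed fallback: 5th percentile from the Beta mean and variance."""
    n = alpha + beta
    mean = alpha / n
    var = alpha * beta / (n * n * (n + 1.0))
    z = NormalDist().inv_cdf(0.05)  # roughly -1.645
    return min(1.0, max(0.0, mean + z * math.sqrt(var)))


def beta_ppf_05(alpha, beta):
    """5th percentile of Beta(alpha, beta); exact when SciPy is installed."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if _scipy_beta is not None:
        return float(_scipy_beta.ppf(0.05, alpha, beta))
    return normal_approx_ppf_05(alpha, beta)
```

The normal approximation is symmetric, while a Beta distribution with small α+β is skewed, which is why the two paths can disagree most for low-evidence rules.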

Why one PR, not five

Three separate audit reports identified these as independent issues. Cross-referencing them shows they share change sites and the fixes compose:

Canary + rule-to-hook wiring both live at _core.py:680 (GRADUATION emit point)
rules.injected emission is the unlock for SessionHistory.compute_effectiveness which is the unlock for rule_ranker's live effectiveness scores — so the leanness audit's "delete rule_ranker" recommendation was a false positive; wire, don't delete
Beta LB gate sits on top of the PPF swap — shipping them separately would have left the gate using the biased approximation

Full synthesis with cross-report compound analysis in .tmp/autoresearch-synthesis.md.

Changes (6 files, +386 / -8)

| File | Change |
|---|---|
| `rules/rule_engine.py` | `_beta_ppf_05` uses scipy when available, falls back to normal approx |
| `enhancements/self_improvement.py` | new `_passes_beta_lb_gate` helper; gate wired into PATTERN→RULE promotion |
| `_core.py` | `promote_to_canary` call after GRADUATION emit (RULE only); canary health sweep before SESSION_END |
| `brain.py` | `apply_brain_rules` passes `self.bus` to apply_rules; emits `rules.injected` with rule ids + scope + task |
| `tests/test_wiring_compound.py` | new: 14 tests covering all 5 surfaces |
| `tests/test_beta_scoring.py` | loosened one bias-measuring assertion (> 0.8 → > 0.75) since scipy PPF is more accurate than the normal approximation |
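
The opt-in gate could look roughly like the sketch below. The three environment variable names and their defaults are taken from this PR's commit message; the `Lesson` dataclass, the `lower_bound_fn` parameter, and the `env` parameter are hypothetical conveniences so the example runs without the repository.

```python
import os
from dataclasses import dataclass


@dataclass
class Lesson:
    """Hypothetical stand-in for a PATTERN-tier lesson."""
    alpha: float
    beta: float
    fire_count: int


def passes_beta_lb_gate(lesson, lower_bound_fn, env=None):
    """Return True if the lesson may be promoted to RULE.

    lower_bound_fn(alpha, beta) is assumed to return the Beta 5th percentile.
    """
    env = os.environ if env is None else env
    if env.get("GRADATA_BETA_LB_GATE", "0") != "1":
        return True  # gate is off by default and never blocks
    min_fires = int(env.get("GRADATA_BETA_LB_MIN_FIRES", "5"))
    threshold = float(env.get("GRADATA_BETA_LB_THRESHOLD", "0.70"))
    if lesson.fire_count < min_fires:
        return False
    return lower_bound_fn(lesson.alpha, lesson.beta) >= threshold
```

Reading the flag on every call, rather than at import time, keeps the gate easy to toggle in an ablation run without reloading modules.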

Test plan

pytest tests/test_wiring_compound.py — 14 new tests pass
Full local suite — 2561 pass, 24 skipped
CI: test (3.11) / (3.12) / (3.13)
CI: SDK Test (the known Py3.11 em-dash flake, unrelated to this PR, may still fire)

Follow-ups (not in this PR)

Measure Beta LB gate with GRADATA_BETA_LB_GATE=1 in ablation before defaulting on
BM25 rule ranking + Thompson sampling — both depend on the rules.injected emit this PR adds
Clean deletes in .tmp/autoresearch-synthesis.md §4 (~1,460 LOC across 9 files) — separate hygiene PR

Co-Authored-By: Gradata noreply@gradata.ai

…py Beta PPF + Beta LB gate

Compound wiring fix derived from the autoresearch synthesis (.tmp/autoresearch-synthesis.md §1-§2). Four independent recommendations from three separate reports collapse into one PR.

## Changes

### rules/rule_engine.py:504 - scipy-backed Beta PPF

Replace normal approximation in `_beta_ppf_05` with `scipy.stats.beta.ppf` when scipy is available; fall back to the existing approximation otherwise. Closes the known small-sample bias (α+β < 10) that affects ~40% of PATTERN-tier rules. Ship-alongside since scipy is already in `dev` extras.

### enhancements/self_improvement.py - Beta LB gate on RULE promotion

New `_passes_beta_lb_gate(lesson)` called in the PATTERN→RULE promotion condition. Gate is OPT-IN via `GRADATA_BETA_LB_GATE=1` (default off) to preserve v4-ablation calibration. When enabled, requires:

- `fire_count >= GRADATA_BETA_LB_MIN_FIRES` (default 5), and
- `_beta_ppf_05(α, β) >= GRADATA_BETA_LB_THRESHOLD` (default 0.70)

Targets the min2022 random-label control failure: ~15–20% of current RULE-tier graduations pass on format, not content.

### _core.py:680 - wire GRADUATION → promote_to_canary

Every fresh RULE graduation now enrolls the lesson's category in canary state. `promote_to_canary(category, session, db_path)` closes the wiring audit §3 gap where `enhancements/rule_canary.py` was shipped but never called from runtime. Best-effort: graduation never fails if the canary table is unavailable.

### _core.py:end_session - canary health sweep

Before `SESSION_END` emits, iterate RULE-tier lessons and call `check_canary_health(category, session)`. Recommendations:

- PROMOTE (0 corrections in CANARY_SESSIONS) → `promote_to_active`
- ROLLBACK (1+ corrections) → `rollback_rule`

Closes the wiring audit §3 "canary is built but architecturally bypassed" finding. Implementation is best-effort and per-category-deduped.

### brain.py:apply_brain_rules - rules.injected + bus wiring

Pass `self.bus` into `apply_rules()` / `apply_rules_with_tree()` so `rule_scoped_out` events fire in production (wiring audit §6B). Emit `rules.injected` after `applied` is computed so `SessionHistory.compute_effectiveness()` starts returning real data instead of {} (wiring audit §4: subscriber existed, emitter didn't).

## Why this corrects a leanness false-positive

The leanness audit flagged `rule_ranker.py` and `self_healing.py` as dead code. The *reason* they're dead is this wiring gap: without `rules.injected`, `SessionHistory` can't compute effectiveness, so the ranker never gets live feedback. Wire the emit → both files become live. Do not delete.

## Test plan

- [x] `pytest tests/test_wiring_compound.py`: 14 new tests pass (Beta PPF shape, Beta LB gate on/off/thresholds/min-fires, canary enrollment, rules.injected payload shape, end_session sweep no-crash)
- [x] `pytest tests/test_beta_scoring.py`: adjusted bias-measuring assertion (> 0.8 → > 0.75) since scipy PPF is more accurate than the normal approximation; statistical intent ("20/21 successes gives high reliability") preserved
- [x] Full suite: 2561 pass, 24 skipped locally

## Follow-ups

- Measure Beta LB gate in ablation with `GRADATA_BETA_LB_GATE=1` before defaulting on. Expected direction: tightens v4's +7.8% Sonnet lift by blocking the ~15–20% false-RULE graduations the min2022 control found.
- BM25 rule ranking + Thompson sampling sit on this PR's `rules.injected` emit (follow-up, not this PR).

Co-Authored-By: Gradata <noreply@gradata.ai>
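
The end-of-session canary sweep described in this commit message (best-effort, deduplicated per category, PROMOTE or ROLLBACK per health check) might be sketched as follows. The callables are injected so the example runs standalone; the `Recommendation` enum, its `WAIT` member, and the function signature are assumptions, not the actual `rule_canary` API.

```python
import logging
from enum import Enum

log = logging.getLogger(__name__)


class Recommendation(Enum):
    PROMOTE = "promote"    # 0 corrections during the canary window
    ROLLBACK = "rollback"  # 1+ corrections
    WAIT = "wait"          # hypothetical: canary window not finished yet


def sweep_canaries(categories, check_health, promote, rollback):
    """Check each RULE-tier category once and act on the recommendation.

    Any failure for one category is logged and skipped so that ending the
    session never fails because of canary bookkeeping.
    """
    seen = set()
    outcomes = {}
    for category in categories:
        if category in seen:
            continue  # several lessons can share one category
        seen.add(category)
        try:
            rec = check_health(category)
            if rec is Recommendation.PROMOTE:
                promote(category)
            elif rec is Recommendation.ROLLBACK:
                rollback(category)
            outcomes[category] = rec
        except Exception:
            log.debug("canary sweep failed for %s", category, exc_info=True)
    return outcomes
```

Passing the health check and actions in as callables is only a testing convenience here; in the PR they are calls into `enhancements/rule_canary.py`.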


coderabbitai · 2026-04-15T16:45:30Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Canary wiring: Added automatic canary enrollment when lessons graduate to RULE state via promote_to_canary(), with end-of-session health sweep that promotes or rolls back canaries based on check_canary_health() recommendations
Rules injection event: Brain.apply_brain_rules() now passes bus into rule execution and emits rules.injected event containing rule metadata (id, category, confidence, state) and current scope for observability and effectiveness tracking
Beta PPF accuracy: _beta_ppf_05() now uses scipy.stats.beta.ppf() when available for more accurate 5th percentile calculation, with fallback to normal approximation for environments without SciPy
Beta lower-bound gate: New opt-in feature flag (GRADATA_BETA_LB_GATE) adds _passes_beta_lb_gate() helper to block PATTERN→RULE promotion unless Beta.ppf(0.05, α, β) ≥ 0.70 and fire_count ≥ 5; gate is off by default
14 new comprehensive tests covering canary wiring, rules injection events, Beta PPF edge cases, and feature-flagged gate behavior
Test adjustment: Loosened bias assertion in test_beta_scoring.py (0.8 → 0.75) to reflect improved scipy-based PPF accuracy
No breaking changes or security updates; all canary/health logic is internal with graceful degradation on missing imports

Walkthrough

This PR integrates canary rollout management into the core system by adding canary enrollment during graduation transitions, health sweeps at session end, event bus wiring for rule injection notifications, and a Beta distribution lower-bound gate for PATTERN→RULE promotion. It also updates the Beta percentile computation to prefer SciPy when available and includes comprehensive test coverage for the new functionality.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Canary Enrollment & Health Sweep `src/gradata/_core.py` | Added `promote_to_canary()` call during PATTERN→RULE graduation and `check_canary_health()` sweep at session end with conditional `promote_to_active()` or `rollback_rule()` based on health metrics; all operations wrapped in try/except with debug logging. |
| Event Bus Integration `src/gradata/brain.py` | Captured brain's event bus in `apply_brain_rules()` and passed it to rule engine; emits `rules.injected` event with injected rule metadata and current scope after rule application. |
| Promotion Gating `src/gradata/enhancements/self_improvement.py`, `src/gradata/rules/rule_engine.py` | Added `_passes_beta_lb_gate()` helper to conditionally block PATTERN→RULE promotion based on Beta distribution 5th-percentile lower bound; updated `_beta_ppf_05()` to prefer SciPy computation over normal approximation when available. |
| Test Coverage `tests/test_beta_scoring.py`, `tests/test_wiring_compound.py` | Adjusted beta reliability test threshold and added comprehensive end-to-end tests for canary wiring, rule event emission, Beta percentile behavior, and feature-flagged promotion gating. |

Sequence Diagrams

sequenceDiagram
    participant Brain
    participant GraduationLogic
    participant RuleCanary
    participant Database as DB

    Brain->>GraduationLogic: lesson.state = PATTERN→RULE transition
    GraduationLogic->>RuleCanary: promote_to_canary(category, session, db_path)
    RuleCanary->>Database: INSERT/UPDATE rule_canary table
    Database-->>RuleCanary: acknowledgement
    GraduationLogic->>Brain: emit lesson.graduated event

sequenceDiagram
    participant Brain
    participant SessionEnd as Session End
    participant RuleCanary
    participant Database as DB

    Brain->>SessionEnd: brain_end_session()
    SessionEnd->>RuleCanary: iterate RULE-state lessons by category
    RuleCanary->>RuleCanary: check_canary_health(category, current_session, db_path)
    RuleCanary->>Database: query canary metrics & session counts
    Database-->>RuleCanary: health metrics
    alt Health Recommendation
        RuleCanary->>RuleCanary: promote_to_active(category, db_path)
        RuleCanary->>Database: UPDATE rule_canary status
    else Rollback Required
        RuleCanary->>RuleCanary: rollback_rule(category, reason, db_path)
        RuleCanary->>Database: DELETE/UPDATE rule_canary & lessons
    end
    SessionEnd->>Brain: return session result

sequenceDiagram
    participant Brain
    participant ApplyRules as apply_brain_rules()
    participant RuleEngine
    participant EventBus as Event Bus
    participant Listener

    Brain->>ApplyRules: capture bus instance
    ApplyRules->>RuleEngine: apply_rules_with_tree(event_bus=bus) or apply_rules(bus=bus)
    RuleEngine-->>ApplyRules: injected rules metadata
    ApplyRules->>EventBus: emit rules.injected event with rule payload + scope + task
    EventBus->>Listener: dispatch event to registered observer
    Listener-->>EventBus: acknowledgement
    ApplyRules->>Brain: return formatted result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: meta-rule discovery pipeline + behavioral extraction #19: Modifies brain_end_session loop control flow where canary health sweep is injected; direct interaction with the same session-finalization function.
feat: Sim 9 engine hardening — rule scoping, convergence gate, efficiency metric #15: Modifies Brain.apply_brain_rules to integrate event bus parameter; shares the same method integration point for rule engine event wiring.
feat: S102 — MiroFish P0-P2 roadmap implementation (9 features) #24: Modifies _beta_ppf_05 and enhancements/self_improvement.py promotion logic; directly affects the Beta lower-bound gating functions added here.

Suggested labels

feature

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title precisely summarizes the five main changes in the changeset: canary enrollment wiring, rules.injected event emission, scipy-backed Beta PPF, and Beta LB gate. |
| Description check | ✅ Passed | The description provides detailed context connecting the changes to audit recommendations, explains the rationale for bundling them into one PR, and includes a comprehensive test plan and follow-ups. |


#88 landed at the same time as #86 and #87 shipping from a parallel session. The merges didn't conflict line-wise, but the diffs overlap:

- brain.py:apply_brain_rules: #86 already wired `rules.injected` with a richer payload (id + category + confidence + state + scope) and try/except guard. #88 added a second thinner emit after the cache.put. Result: double-fire on fresh compute. Harmless in practice (SessionHistory dedups via a set) but clearly wrong. Removing #88's emit, keeping #86's.
- .gitignore: #87 already added `/cloud/` and `/sdk/`. #88's re-adds are duplicates. Removing; keeping `/railway.toml` and `apollo-leads-*.csv` which are genuinely new from #88.

The regression test in tests/test_session_history.py stays: it asserts the emit fires end-to-end from a real Brain + correct() loop, complementing #86's test_wiring_compound.py coverage of payload shape. Both pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


…test LOC)

Deletes dead code flagged in the autoresearch leanness audit after grep-verifying that no runtime import path exists. All 2453 tests pass.

Source files removed (2101 LOC):

- src/gradata/contrib/enhancements/outcome_feedback.py (1 LOC, docstring stub)
- src/gradata/enhancements/super_meta_rules.py (197 LOC, no importers; SuperMetaRule dataclass + SQL table live in meta_rules.py / meta_rules_storage.py and remain wired)
- src/gradata/enhancements/pubsub_pipeline.py (49 LOC, test-only)
- src/gradata/rules/budget.py (43 LOC, test-only)
- src/gradata/rules/rw_lock.py (54 LOC, test-only)
- src/gradata/cloud/wiki_store.py (451 LOC, only cloud/__init__.py re-export + test)
- src/gradata/enhancements/rule_verifier.py (243 LOC, only manifest string + test reference)
- src/gradata/enhancements/rule_evolution.py (434 LOC, only manifest string + test references; contradiction_detector.py covers the live path via self_improvement.py:545)
- src/gradata/security/privacy_model.py (113 LOC, test + docs only; _core.py / brain.py / _export_brain.py grep-clean)
- src/gradata/benchmarks/swe_bench.py (516 LOC, docstring example + test only, no CLI/docs runtime reference)

Test files removed (1042 LOC): matching tests for each module plus targeted pruning of rule_evolution test classes (TestRuleConflicts, TestRuleRelationEnum, rule_evolution imports in TestIntegration) from tests/test_steals.py and the TestRuleABTesting block in tests/test_adaptations.py.

Registry + docstring updates:

- contrib/enhancements/install_manifest.py: drop rule_verifier from rule-integrity module components
- _manifest_helpers.py: drop rule_evolution from _core_modules
- enhancements/__init__.py: drop rule_verifier docstring line
- cloud/__init__.py: drop WikiStore lazy re-export
- enhancements/meta_rules_storage.py: docstring no longer points at the deleted super_meta_rules.py

NOT DELETED (verified live via PRs #77/#81/#86):

- enhancements/rule_ranker.py, self_healing.py, rule_canary.py, rule_to_hook.py (all have runtime callers)
- middleware/ (flagged empty in the audit but actually contains _core.py + 4 adapters; kept)
- src/gradata/graphify-out/ (did not exist in this tree)

Tests: 2453 passed, 24 skipped (test_integration_full.py ignored per task spec).

Co-Authored-By: Gradata <noreply@gradata.ai>


Stages a small, manual-kickoff A/B harness to measure the Beta lower-bound promotion gate shipped in PR #86. Does not run the experiment: Oliver runs it with GRADATA_ABLATION_CONFIRM=1 when he wants a signal.

- brain/scripts/ablation_beta_lb_gate.py: synthetic 20-lesson brain, graduation simulation under gate on/off, Sonnet generate + Haiku judge, writes .tmp/ablation_beta_lb_<ts>.json + human summary.
- brain/scripts/README-ablation-beta-lb.md: context, run commands, cost table, decision criteria (pref-lift >= +1.0% AND grad-drop <= 50%).
- tests/test_ablation_beta_lb_gate.py: dry-run zero-LLM-call proof, gate discriminates on synthetic pool, env-var restore, output schema.

Safety gate: without GRADATA_ABLATION_CONFIRM=1 the script prints the trial count + token + dollar estimate and exits 0. Dry-run is verified by a test that raises AssertionError on any client-factory access.

No changes to production code; harness PR only.
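
The safety gate and decision criteria described for the harness can be pictured with the sketch below. Only the `GRADATA_ABLATION_CONFIRM` variable and the two thresholds (pref-lift >= +1.0%, grad-drop <= 50%) come from the text above; the function names, parameters, and cost formula are hypothetical, and any prices passed in are placeholders rather than real rates.

```python
def plan_run(env, trials, tokens_per_trial, usd_per_million_tokens):
    """Estimate cost and decide whether the experiment may actually run.

    Without GRADATA_ABLATION_CONFIRM=1 the plan is a dry run: the caller
    should print the estimate and exit 0 without creating any LLM client.
    """
    tokens = trials * tokens_per_trial
    return {
        "trials": trials,
        "tokens": tokens,
        "usd": tokens / 1_000_000 * usd_per_million_tokens,
        "execute": env.get("GRADATA_ABLATION_CONFIRM") == "1",
    }


def meets_decision_criteria(pref_lift_pct, grad_drop_pct):
    """Gate looks worth enabling if preference lift is at least +1.0%
    and the drop in graduations is at most 50%."""
    return pref_lift_pct >= 1.0 and grad_drop_pct <= 50.0
```

Keeping the confirmation check in a pure function makes the zero-LLM-call property easy to assert in a test, since nothing in the dry-run path needs a client at all.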

greptile-apps Bot reviewed Apr 15, 2026


coderabbitai Bot added the feature label Apr 15, 2026

Gradata merged commit 3bbd7de into main Apr 15, 2026
16 of 18 checks passed

Gradata mentioned this pull request Apr 15, 2026

fix: undo #88 duplicates that collided with #86/#87 #89

Merged

This was referenced Apr 15, 2026

chore(sdk): prune 10 confirmed-dead modules (-3143 LOC) #90

Merged

feat(ranking): BM25 context relevance + Thompson sampling + unified ranker #91

Merged

Gradata mentioned this pull request Apr 15, 2026

exp(ablation): pilot harness for GRADATA_BETA_LB_GATE #92

Merged

4 tasks

coderabbitai Bot mentioned this pull request Apr 17, 2026

feat(jit,graduation): BM25 for JIT ranking + raise Beta LB default to 0.85 #101

Merged

3 tasks

Gradata deleted the feat/wiring-compound branch April 17, 2026 19:46

coderabbitai Bot mentioned this pull request May 2, 2026

cleanup(enhancements): move retrieval_fusion, flip Beta-LB gate, add invariant + obfuscation tests #163

Open

Gradata merged 1 commit into main from feat/wiring-compound
