cleanup(enhancements): move retrieval_fusion, flip Beta-LB gate, add invariant + obfuscation tests by Gradata · Pull Request #163 · Gradata/gradata

Gradata · 2026-05-02T21:33:10Z

Summary

Council-validated cleanup work, split out so the larger dual-write PR can be reviewed independently.

Moved retrieval_fusion.py under enhancements/scoring/ (council 5/7 — RRF is a ranking primitive).
Flipped GRADATA_BETA_LB_GATE default ON. Documented 2026-04 ablation showed ~15-20% of RULE-tier graduations miscalibrated by format-not-content; shipping the fix opt-in was the textbook silent-regression pattern. GRADATA_BETA_LB_GATE=0 preserves the override.
New test_initial_confidence_invariant.py locks the INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary.
New test_score_obfuscation_gate.py is a CI gate that fails the build if any raw confidence float in [0,1] leaks into the <brain-rules> prompt. middleware/_core.py updated to obfuscate.

Test plan

pytest tests/test_initial_confidence_invariant.py tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py tests/test_rule_pipeline.py tests/test_middleware_core.py — 63 passed.
pyright src/ — 0 errors, 27 warnings (unchanged baseline).
ruff on changed files — clean.

Layering check

No Layer 0 → 2 imports introduced.

Risk

Beta-LB flip changes default graduation calibration. Users relying on the miscalibrated path will see fewer PATTERN→RULE promotions until they set GRADATA_BETA_LB_GATE=0. Intended.

Council references

council_2026-05-02T11-08-25.md
council_2026-05-02T11-59-00.md (all 7 lenses via fallback chain)
council_2026-05-02T12-24-08.md (PR sequencing)

…B gate, add invariant + obfuscation tests

Council v4 verdict (council_2026-05-02T11-59-00.md) and v4-rerun (council_2026-05-02T11-59-00.md) flagged a small set of production-readiness items that don't depend on the larger dual-write work. This commit lands those independently so dual-write atomicity can ship as its own reviewable PR.

What

- Move src/gradata/enhancements/retrieval_fusion.py into enhancements/scoring/retrieval_fusion.py and update importers. Council vote 5/7 — RRF is a ranking primitive, lives more naturally with scoring/ than as a sibling.
- Flip GRADATA_BETA_LB_GATE default ON in enhancements/self_improvement/_graduation.py. The 2026-04 ablation documented in the file showed ~15-20% of RULE-tier graduations miscalibrated by format-not-content; shipping the fix opt-in was textbook silent regression (council 5/7). GRADATA_BETA_LB_GATE=0 preserves the override-off escape hatch.
- New tests/test_initial_confidence_invariant.py — locks the INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary that almost promoted every fresh lesson before strict-> was wired in.
- New tests/test_score_obfuscation_gate.py — CI gate that fails the build if any raw confidence float in [0,1] leaks into the <brain-rules> prompt-bound payload. middleware/_core.py build_brain_rules_block() updated to obfuscate.

Why

- Each item is independently testable, low-risk, and clears the runway for the dual-write atomicity PR.
- Beta-LB default-on closes a known correctness hole that ships every release until flipped.
- Obfuscation gate converts a comment-level guarantee (security/score_obfuscation.py) into an enforced one.

Test plan

- pytest tests/test_initial_confidence_invariant.py tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py tests/test_rule_pipeline.py tests/test_middleware_core.py — 63 passed.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).
- ruff on changed files — clean.

Layering check

- No Layer 0 → 2 imports introduced.

Risk

- Beta-LB flip changes the default for graduation calibration. Users relying on miscalibrated PATTERN→RULE behavior will see fewer graduations until they set GRADATA_BETA_LB_GATE=0. This is the intended fix.

Council references

- council_2026-05-02T11-08-25.md (initial v4 review)
- council_2026-05-02T11-59-00.md (v4 rerun, all 7 lenses through fallback chain)
- council_2026-05-02T12-24-08.md (PR sequencing decision)

coderabbitai · 2026-05-02T21:33:22Z

📝 Walkthrough

Summary

Module restructuring: Moved retrieval_fusion.py from enhancements/ to enhancements/scoring/ with updated imports across affected code and tests
Breaking change - GRADATA_BETA_LB_GATE now ON by default: Changed from opt-in to enabled by default; users relying on previous miscalibrated behavior must explicitly set GRADATA_BETA_LB_GATE=0 to restore old behavior
Prompt security enhancement: Added obfuscation to build_brain_rules_block() to prevent raw confidence float values from leaking into brain rules prompts
New tests for invariant enforcement: Added test_initial_confidence_invariant.py to lock INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 invariant across lesson promotion workflows
New obfuscation gate tests: Added test_score_obfuscation_gate.py to verify that no raw numeric confidence values appear in generated prompts; CI will fail if leakage occurs
Test suite updates: Modified existing tests in test_rule_pipeline.py, test_rule_graduated_events.py, test_rule_to_hook.py, test_rule_to_hook_promotion.py, test_safety_assertion.py, test_wiring_compound.py, and test_middleware_core.py to account for the new default-on Beta-LB gate behavior
Test coverage: 63 tests passing; no new type errors (pyright: 0 errors, 27 warnings unchanged); code quality clean (ruff)
No new public APIs introduced and no Layer 0 → 2 import violations introduced

Walkthrough

This PR reorganizes the retrieval fusion module structure, inverts the GRADATA_BETA_LB_GATE environment variable default from opt-in to enabled-by-default, obfuscates confidence values in brain rules output, and updates corresponding test fixtures and assertions.

Changes

Module Reorganization & Output Obfuscation

Layer / File(s)	Summary
Module Relocation `src/gradata/enhancements/rule_pipeline.py`, `tests/test_retrieval_fusion.py`	`retrieval_fusion` module import path updates from `gradata.enhancements.retrieval_fusion` to `gradata.enhancements.scoring.retrieval_fusion` across production and test files.
Output Obfuscation `src/gradata/middleware/_core.py`	`build_brain_rules_block()` now wraps each rendered lesson line with `obfuscate_instruction(...)` instead of emitting raw `[state:confidence]` markers, hiding numeric confidence values from the prompt output.
Test Assertions Update `tests/test_middleware_core.py`	Test assertions updated to expect obfuscated `[RULE]` markers instead of confidence-suffixed `[RULE:0.95]` markers; max rules count now checks for `[RULE]` presence.
Mock Path Alignment `tests/test_rule_pipeline.py`	Missing module mocking updated to patch `sys.modules["gradata.enhancements.scoring.retrieval_fusion"]` instead of the old module path.
Output Obfuscation Validation `tests/test_score_obfuscation_gate.py`	New test module verifies that `apply_brain_rules()` and `build_brain_rules_block()` output does not leak raw numeric confidence float literals in prompt text.
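
As a rough illustration of the marker change summarized in the table above (`[RULE:0.95]` becoming `[RULE]`), here is a minimal sketch. It is not the repository's `obfuscate_instruction(...)`; the helper name `strip_confidence_markers` and the marker regex are assumptions derived only from the examples quoted in this review.

```python
import re

# Hypothetical marker shape, inferred from the `[RULE:0.95]` example above:
# an upper-case state name, a colon, then a numeric score.
_SCORED_MARKER = re.compile(r"\[([A-Z]+):\d+(?:\.\d+)?\]")


def strip_confidence_markers(line: str) -> str:
    """Replace `[STATE:score]` with `[STATE]`, leaving the rule text untouched."""
    return _SCORED_MARKER.sub(r"[\1]", line)


if __name__ == "__main__":
    print(strip_confidence_markers("[RULE:0.95] Prefer concrete dates."))
```

Lines that already carry a bare `[RULE]` marker pass through unchanged, so a helper like this can be applied at render time without affecting rule selection.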

Beta LB Gate Default Behavior Inversion

Layer / File(s)	Summary
Gate Logic Inversion `src/gradata/enhancements/self_improvement/_graduation.py`	`_read_beta_lb_config()` changes from opt-in (default disabled) to enabled-by-default using `os.environ.get(..., "1")` and a denylist of disable values (`"0"`, `"false"`, `"no"`, `"off"`); docstring updated to reflect new behavior.
Test Configuration Updates `tests/test_rule_graduated_events.py`, `tests/test_rule_pipeline.py`, `tests/test_rule_to_hook.py`, `tests/test_rule_to_hook_promotion.py`, `tests/test_safety_assertion.py`	Multiple tests now explicitly set `GRADATA_BETA_LB_GATE="0"` via `monkeypatch` before graduation/promotion flows to disable the gate under the new default-enabled behavior.
Test Naming Alignment `tests/test_wiring_compound.py`	First test in `TestBetaLBGate` renamed from `test_gate_disabled_by_default_allows_promotion` to `test_gate_can_be_disabled_to_allow_promotion` and now explicitly sets `GRADATA_BETA_LB_GATE="0"` to reflect the inverted default.
Confidence Threshold Tests `tests/test_initial_confidence_invariant.py`	New test module with `_lesson()` factory helper validates `graduate()` behavior around the `PATTERN_THRESHOLD` and `MIN_APPLICATIONS_FOR_PATTERN` boundary conditions.
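
The default-on parsing described for `_read_beta_lb_config()` can be sketched as follows. This is an illustration only: the function name `beta_lb_gate_enabled` and the injectable `env` mapping are assumptions for testability, and the real function also returns threshold settings that are not modeled here.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Denylist of values that switch the gate off, as listed in the summary above.
_DISABLE_VALUES = frozenset({"0", "false", "no", "off"})


def beta_lb_gate_enabled(env: Mapping[str, str] | None = None) -> bool:
    """Return True unless GRADATA_BETA_LB_GATE is set to a disable value."""
    source = os.environ if env is None else env
    # Missing variable falls back to "1", i.e. the gate is on by default.
    return source.get("GRADATA_BETA_LB_GATE", "1").lower() not in _DISABLE_VALUES
```

Note that under a denylist of this shape, any value outside the list (including an empty string or a value with surrounding whitespace) leaves the gate enabled.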

Sequence Diagram(s)

Not applicable—these changes constitute module relocation, logic inversion, and obfuscation rather than introducing new multi-component interactions or significantly altering control flow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat(jit,graduation): BM25 for JIT ranking + raise Beta LB default to 0.85 #101: Both PRs modify _graduation.py and alter beta LB gate behavior and threshold logic.
feat(wiring): canary + rules.injected + scipy Beta PPF + Beta LB gate #86: Both PRs update beta LB gate parsing and gating logic across _graduation.py and related tests.
feat: wire LLM meta-rule synthesis (Gemma native) #97: Both PRs modify rule_pipeline.py and its Phase 1 pipeline import handling.

Suggested labels

refactor, breaking-change

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the four main changes: moving retrieval_fusion to scoring, flipping the Beta-LB gate default, and adding two new test suites for invariants and obfuscation.
Description check	✅ Passed	The description provides detailed context for all changes, including motivation, implementation approach, test results, and known risks.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/src/gradata/enhancements/self_improvement/_graduation.py`:
- Around line 110-115: The default-on gate (GRADATA_BETA_LB_GATE currently
defaulting to "1") causes lessons missing persisted alpha/beta_param to be
treated as Beta(1,1) in _passes_beta_lb_gate(), which denies legacy PATTERN
lessons; revert the compatibility break by making the gate default off (change
the default of GRADATA_BETA_LB_GATE to "0"/false) or alter
_passes_beta_lb_gate() to treat a Lesson lacking alpha or beta_param as passing
(i.e., skip the Beta(1,1) fallback and allow graduation) — update code
references to GRADATA_BETA_LB_GATE, _passes_beta_lb_gate(), Lesson.alpha, and
Lesson.beta_param accordingly.

In `@Gradata/tests/test_score_obfuscation_gate.py`:
- Around line 16-23: Replace the direct Brain.init(...) invocation in
test_apply_brain_rules_prompt_does_not_leak_raw_confidence with the test helpers
from conftest: either call brain = init_brain(tmp_path, name="ObfuscationGate",
domain="Testing", embedding="local", interactive=False) or switch the test to
use the fresh_brain fixture and adjust its parameters; this ensures BRAIN_DIR
and the _paths cache are handled the same way as other tests instead of calling
Brain.init directly.
- Around line 8-13: The regex _RAW_CONFIDENCE_FLOAT currently also matches
integers because the decimal portion is optional; update the pattern used by the
_RAW_CONFIDENCE_FLOAT constant so it requires an explicit decimal point and
digits (i.e., only match floats like 0.95 or 1.00), then keep
_assert_no_raw_confidence_float unchanged so it will only detect actual float
leaks and not plain integers like "1" or "0".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d6f52f71-6517-430a-b96c-70529a21ae9f

📥 Commits

Reviewing files that changed from the base of the PR and between b82a2dc and c12d358.

📒 Files selected for processing (14)

Gradata/src/gradata/enhancements/rule_pipeline.py
Gradata/src/gradata/enhancements/scoring/retrieval_fusion.py
Gradata/src/gradata/enhancements/self_improvement/_graduation.py
Gradata/src/gradata/middleware/_core.py
Gradata/tests/test_initial_confidence_invariant.py
Gradata/tests/test_middleware_core.py
Gradata/tests/test_retrieval_fusion.py
Gradata/tests/test_rule_graduated_events.py
Gradata/tests/test_rule_pipeline.py
Gradata/tests/test_rule_to_hook.py
Gradata/tests/test_rule_to_hook_promotion.py
Gradata/tests/test_safety_assertion.py
Gradata/tests/test_score_obfuscation_gate.py
Gradata/tests/test_wiring_compound.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: pytest ubuntu-latest / py3.11
GitHub Check: pytest windows-latest / py3.12
GitHub Check: pytest macos-latest / py3.11
GitHub Check: pytest macos-latest / py3.12
GitHub Check: pytest windows-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.12
GitHub Check: pytest (py3.12)

🧰 Additional context used

📓 Path-based instructions (2)

Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

Gradata/tests/test_retrieval_fusion.py
Gradata/tests/test_middleware_core.py
Gradata/tests/test_rule_to_hook.py
Gradata/tests/test_initial_confidence_invariant.py
Gradata/tests/test_rule_graduated_events.py
Gradata/tests/test_wiring_compound.py
Gradata/tests/test_score_obfuscation_gate.py
Gradata/tests/test_rule_to_hook_promotion.py
Gradata/tests/test_safety_assertion.py
Gradata/tests/test_rule_pipeline.py

Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

Gradata/src/gradata/enhancements/rule_pipeline.py
Gradata/src/gradata/middleware/_core.py
Gradata/src/gradata/enhancements/self_improvement/_graduation.py

🔇 Additional comments (8)

Gradata/tests/test_initial_confidence_invariant.py (2)

12-20: Helper fixture is minimal and correct for graduation-path tests.

Good construction of a deterministic Lesson object with only the fields needed for this invariant.

23-45: Boundary assertions for INSTINCT→PATTERN promotion are well covered.

The tests correctly lock the tie case (== threshold) as non-promoting and the above-threshold case as promoting, matching the strict comparison contract.
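
The strict-comparison contract mentioned above can be shown with a small sketch. The two constants take the 0.60 value stated in the PR description; the function `promotes_to_pattern` is hypothetical and ignores the `MIN_APPLICATIONS_FOR_PATTERN` condition that the real `graduate()` also checks.

```python
INITIAL_CONFIDENCE = 0.60  # value stated in the PR description
PATTERN_THRESHOLD = 0.60   # same boundary; the invariant depends on the tie


def promotes_to_pattern(confidence: float, threshold: float = PATTERN_THRESHOLD) -> bool:
    """Strict `>`: a lesson sitting exactly on the threshold is not promoted."""
    return confidence > threshold


if __name__ == "__main__":
    # A fresh lesson starts at the threshold and must stay INSTINCT.
    print(promotes_to_pattern(INITIAL_CONFIDENCE))
    print(promotes_to_pattern(0.61))
```

With `>=` in place of `>`, every fresh lesson would satisfy the confidence condition immediately, which is the failure mode the invariant test is meant to lock out.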

Gradata/src/gradata/middleware/_core.py (1)

280-283: Obfuscation is applied at the correct boundary.

Good change: Line 281 strips score suffixes at render time while preserving rule selection and XML structure.

Gradata/tests/test_middleware_core.py (2)

46-47: Assertion update matches new obfuscated marker format.

These checks correctly enforce [RULE] presence and confidence-suffixed marker absence.

59-59: Max-rules assertion stays aligned with obfuscated output.

Counting [RULE] here is the right adjustment for the new rendered format.

Gradata/tests/test_score_obfuscation_gate.py (1)

38-60: Great middleware-level gate coverage.

This test complements the core unit checks by asserting obfuscation on build_brain_rules_block(...) output directly.

Gradata/tests/test_rule_pipeline.py (2)

171-180: Gate override keeps this graduation test deterministic.

Good update: disabling GRADATA_BETA_LB_GATE in-test prevents default-flip drift and preserves intended threshold behavior coverage.

246-247: Missing-module mock now targets the correct relocated import.

This aligns the test with the new enhancements.scoring.retrieval_fusion path and keeps optional-dependency failure handling properly covered.

coderabbitai · 2026-05-02T21:37:51Z

+    enabled = os.environ.get("GRADATA_BETA_LB_GATE", "1").lower() not in (
+        "0",
+        "false",
+        "no",
+        "off",
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default-on Beta-LB now hard-blocks lessons that lack posterior state.

Enabling the gate by default turns any Lesson without persisted alpha / beta_param into an automatic PATTERN→RULE deny, because _passes_beta_lb_gate() falls back to Beta(1,1), whose 5th-percentile lower bound is far below the default 0.85. That means legacy or handcrafted PATTERN lessons will stop graduating entirely unless the env override is set, which is a much stronger behavior change than “tighter calibration.”
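
To make the size of that gap concrete, here is a small worked calculation using only the closed form for Beta(alpha, 1), whose CDF is x**alpha. It is a sketch under stated assumptions: beta is held at 1 purely for illustration, the helper names are hypothetical, and the repository's own `_beta_ppf_05` is not reproduced here.

```python
import math

LOWER_QUANTILE = 0.05  # the 5th percentile named in the comment above
THRESHOLD = 0.85       # default Beta-LB threshold quoted above


def beta_a1_ppf(q: float, alpha: float) -> float:
    """Quantile of Beta(alpha, 1): the CDF is x**alpha, so the inverse is q**(1/alpha)."""
    return q ** (1.0 / alpha)


def min_alpha_to_pass(q: float = LOWER_QUANTILE, threshold: float = THRESHOLD) -> int:
    """Smallest integer alpha with q**(1/alpha) >= threshold, holding beta at 1."""
    return math.ceil(math.log(q) / math.log(threshold))


if __name__ == "__main__":
    print(beta_a1_ppf(LOWER_QUANTILE, 1.0))  # Beta(1,1) is uniform: 0.05
    print(min_alpha_to_pass())               # 19
```

Beta(1,1) is the uniform distribution, so its 5th-percentile lower bound is 0.05. In this simplified beta = 1 case, alpha would have to reach 19 before the bound clears 0.85, which is why a lesson falling back to Beta(1,1) cannot pass the gate.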

Possible compatibility guard

 def _passes_beta_lb_gate(
     lesson: Lesson,
     config: tuple[bool, float, int] | None = None,
 ) -> bool:
@@
-    alpha = getattr(lesson, "alpha", 1.0)
-    beta_param = getattr(lesson, "beta_param", 1.0)
+    alpha = getattr(lesson, "alpha", None)
+    beta_param = getattr(lesson, "beta_param", None)
+    if alpha is None or beta_param is None:
+        return True  # keep legacy promotion behavior until posterior state is backfilled
+
     from gradata.rules.rule_engine import _beta_ppf_05

     return _beta_ppf_05(alpha, beta_param) >= threshold

Also applies to: 151-155

coderabbitai · 2026-05-02T21:37:51Z

+_RAW_CONFIDENCE_FLOAT = re.compile(r"(?<![\w.])(?:0(?:\.\d+)?|1(?:\.0+)?)(?![\w.])")
+
+
+def _assert_no_raw_confidence_float(prompt: str) -> None:
+    leaks = _RAW_CONFIDENCE_FLOAT.findall(prompt)
+    assert not leaks, f"raw confidence float leaked into prompt-bound text: {prompt}"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import re

current = re.compile(r"(?<![\w.])(?:0(?:\.\d+)?|1(?:\.0+)?)(?![\w.])")
proposed = re.compile(r"(?<![\w.])(?:0\.\d+|1\.0+)(?![\w.])")

samples = [
    "<brain-rules>\n[RULE] Use 1 sentence.\n</brain-rules>",
    "<brain-rules>\n[RULE] Keep 0 padding.\n</brain-rules>",
    "<brain-rules>\n[RULE:0.95] Prefer concrete dates.\n</brain-rules>",
    "<brain-rules>\n[RULE:1.00] Prefer concrete dates.\n</brain-rules>",
]

for s in samples:
    print("TEXT:", s.replace("\n", "\\n"))
    print(" current:", current.findall(s))
    print("proposed:", proposed.findall(s))
    print("-" * 60)
PY

Repository: Gradata/gradata

Length of output: 701

Regex matches plain integers and will cause false positives.

The optional decimal portion in the current pattern ((?:\.\d+)? and (?:\.0+)?) makes it match standalone 0 and 1. This means valid text like "use 1 sentence" or "keep 0 padding" will incorrectly trigger the gate.

Proposed fix

-_RAW_CONFIDENCE_FLOAT = re.compile(r"(?<![\w.])(?:0(?:\.\d+)?|1(?:\.0+)?)(?![\w.])")
+_RAW_CONFIDENCE_FLOAT = re.compile(r"(?<![\w.])(?:0\.\d+|1\.0+)(?![\w.])")

The proposed regex requires an explicit decimal point and digits, matching only actual floats like 0.95 and 1.00 while ignoring plain integers.

coderabbitai · 2026-05-02T21:37:51Z

+def test_apply_brain_rules_prompt_does_not_leak_raw_confidence(tmp_path) -> None:
+    brain = Brain.init(
+        tmp_path / "brain",
+        name="ObfuscationGate",
+        domain="Testing",
+        embedding="local",
+        interactive=False,
+    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Looking for test conftest and isolation hooks..."
fd -i 'conftest.py' tests
echo
echo "Searching for BRAIN_DIR and _paths cache refresh logic in test infra..."
rg -n -C3 'BRAIN_DIR|_paths|Brain\.init|reload|refresh' tests
echo
echo "Reference call site in this test file:"
rg -n -C2 'test_apply_brain_rules_prompt_does_not_leak_raw_confidence|Brain\.init\(' Gradata/tests/test_score_obfuscation_gate.py

Repository: Gradata/gradata

Length of output: 219

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all conftest.py files in the repository
echo "=== Locating conftest.py files ==="
git ls-files | grep -i conftest.py
echo
echo "=== Checking Gradata/tests directory structure ==="
git ls-files | grep "^Gradata/tests/" | head -20
echo
echo "=== Looking for the target test file ==="
git ls-files | grep "test_score_obfuscation_gate.py"

Repository: Gradata/gradata

Length of output: 953

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Reading conftest.py for test isolation setup ==="
cat -n Gradata/tests/conftest.py
echo
echo "=== Checking for test_score_obfuscation_gate.py ==="
git ls-files | grep "test_score_obfuscation_gate.py"
echo
echo "=== Reading test_score_obfuscation_gate.py around lines 16-23 ==="
head -30 Gradata/tests/test_score_obfuscation_gate.py

Repository: Gradata/gradata

Length of output: 9867

Use the fresh_brain fixture or init_brain() from conftest.py instead of calling Brain.init() directly.

This test calls Brain.init() directly without setting BRAIN_DIR or refreshing the _paths.py module cache. The conftest.py provides the init_brain() helper (and fresh_brain fixture) specifically for this purpose. Either replace the direct Brain.init() call with brain = init_brain(tmp_path, name="ObfuscationGate", domain="Testing") or use the fresh_brain fixture and customize it as needed. See conftest.py lines 26-92.
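
For readers unfamiliar with why the environment variable matters here, the isolation idea can be sketched with the standard library alone. This is not the `init_brain()` helper or the `fresh_brain` fixture from conftest.py, whose bodies are not shown in this review; `brain_dir_env` is a hypothetical name, and the sketch does not refresh the `_paths.py` module cache.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def brain_dir_env(path: Path) -> Iterator[Path]:
    """Point BRAIN_DIR at `path` for the duration of the block, then restore it."""
    previous = os.environ.get("BRAIN_DIR")
    os.environ["BRAIN_DIR"] = str(path)
    try:
        yield path
    finally:
        # Restore the caller's environment so later tests see no leftover state.
        if previous is None:
            os.environ.pop("BRAIN_DIR", None)
        else:
            os.environ["BRAIN_DIR"] = previous
```

In a pytest suite the same effect is usually achieved with `monkeypatch.setenv`, which handles the restore step automatically.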

* cleanup(enhancements): move retrieval_fusion to scoring/, flip Beta-LB gate, add invariant + obfuscation tests

* fix(events): JSONL canonical, SQLite projection, reconcile-on-init, doctor --reconcile

Council v4 (council_2026-05-02T11-59-00.md) ranked dual-write atomicity the #1 production blocker. Crash mid-write between events.jsonl append and SQLite INSERT could leave the brain in silent split-brain state with no recovery path.

What

- src/gradata/_events.py
  - JSONL is the canonical source of truth. Append + fsync FIRST, SQLite INSERT is now an idempotent projection derived from JSONL.
  - Added reconcile_jsonl_to_sqlite() that scans JSONL past the SQLite watermark and replays missing rows.
  - Single SQLite projection helper used by both the live write path and the retain orchestrator.
  - Env-gated crash-window delay for deterministic kill-9 testing only (no production effect).
- src/gradata/brain.py
  - Brain.__init__ runs JSONL → SQLite reconciliation after migrations.
  - Brain() resolves BRAIN_DIR / cwd when no explicit path is supplied.
  - observe(text, kind="correction") public event API used by the PR2 spec.
- src/gradata/cli.py + src/gradata/_doctor.py
  - New `gradata doctor --reconcile`: scans for drift, reports the count, replays missing JSONL rows into SQLite, exits non-zero on inconsistency that can't be healed.
- tests/test_dualwrite_atomicity.py
  - Path-agnostic public-API tests covering: happy path, kill-9 mid batch (JSONL must lead SQLite, never trail), reconcile replay, idempotency, doctor CLI drift report, concurrent-writer JSONL line integrity.

Why

- Before: dual-write claimed atomic in CLAUDE.md, no two-phase commit, no recovery. Crash → silent data loss or duplicate-on-replay.
- After: JSONL is the log, SQLite is the projection. Every reopen reconciles. doctor --reconcile is the operator escape hatch. Property: jsonl_count >= sqlite_count, always.

Test plan

- pytest tests/test_dualwrite_atomicity.py — 6 passed.
- Full focused regression on changed surface — 42 passed.
- Non-integration suite (excluding socket-bound daemon/plugin tests blocked by sandbox) — 4130 passed, 4 skipped.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).

Layering check

- _events.py is Layer 0. Brain.__init__ in Layer 2 calls into it. No upward imports introduced.

Risk

- Reconcile-on-init runs on every Brain open. For a brain with 100k events this adds ~50ms-200ms one-time at startup. Watermark is incremental so subsequent opens are O(drift) not O(total).
- Concurrent writers serialize via JSONL append + advisory lock. Throughput trade-off is acceptable for correctness.

Council references

- council_2026-05-02T11-59-00.md (v4 RISK class, all 7 lenses)
- council_2026-05-02T12-24-08.md (PR sequencing — TDD-first)

Stacks on #163.

* fix(cloud/client): coerce float ts and non-int session for /sync

The backfill script and incremental sync both crashed on real-world events.jsonl rows that contain:

- float ts (epoch seconds, e.g. 1776803751.89) — broke str-vs-str comparison against the watermark cursor with TypeError.
- float or string session values (e.g. 4.5, UUID strings) — server schema rejects non-ints with HTTP 422.

Coerce ts to str and session to int|None at the format-event boundary. Also surface the response body in HTTPError so 4xx/5xx debugging is not opaque.

Discovered while running scripts/backfill_to_cloud.py against a brain with ~28k events accumulated since 2026-03-22.

---------

Co-authored-by: Oliver Le <oliverle94@gmail.com>
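
The JSONL-first write path and reconcile step described in the fix(events) commit can be sketched roughly as below. This is not the code in `_events.py`: the table schema, the line-number watermark, and every function name here are assumptions chosen to keep the example self-contained.

```python
from __future__ import annotations

import json
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical projection table keyed by JSONL line number.
SCHEMA = "CREATE TABLE events (line_no INTEGER PRIMARY KEY, payload TEXT NOT NULL)"


def append_event(jsonl_path: Path, event: dict) -> None:
    """Append one event to the canonical log and fsync before returning."""
    with open(jsonl_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def project_event(conn: sqlite3.Connection, line_no: int, event: dict) -> None:
    """Idempotent projection: inserting an already-projected line is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO events (line_no, payload) VALUES (?, ?)",
        (line_no, json.dumps(event)),
    )


def reconcile(jsonl_path: Path, conn: sqlite3.Connection) -> int:
    """Replay JSONL lines past the SQLite watermark; return the number replayed."""
    watermark = conn.execute("SELECT COALESCE(MAX(line_no), 0) FROM events").fetchone()[0]
    replayed = 0
    with open(jsonl_path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no <= watermark or not line.strip():
                continue
            project_event(conn, line_no, json.loads(line))
            replayed += 1
    conn.commit()
    return replayed


def run_demo() -> tuple[int, int, int]:
    """Simulate a crash after one projection, then reconcile twice."""
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "events.jsonl"
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        events = [{"id": i} for i in range(3)]
        for event in events:
            append_event(log, event)
        project_event(conn, 1, events[0])  # only the first row reached SQLite
        first = reconcile(log, conn)
        second = reconcile(log, conn)
        total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        conn.close()
        return first, second, total
```

In this sketch `run_demo()` returns `(2, 0, 3)`: the first reconcile replays the two missing rows, the second finds no drift, and the projection ends with as many rows as the log. Because the log is always written first, the projection can trail the log but never lead it, which matches the `jsonl_count >= sqlite_count` property stated in the commit message.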

greptile-apps Bot reviewed May 2, 2026

coderabbitai Bot added breaking-change refactor labels May 2, 2026

coderabbitai Bot requested changes May 2, 2026

Gradata mentioned this pull request May 2, 2026

fix(events): JSONL canonical, SQLite projection, reconcile-on-init, doctor --reconcile #164

Open

Gradata wants to merge 1 commit into main from pr/cleanup-and-tests-2026-05-02
