cleanup(enhancements): move retrieval_fusion, flip Beta-LB gate, add invariant + obfuscation tests #163

Open
Gradata wants to merge 1 commit into main from pr/cleanup-and-tests-2026-05-02
Gradata (Owner) commented May 2, 2026

Summary

Council-validated cleanup work, split out so the larger dual-write PR can be reviewed independently.

  • Moved retrieval_fusion.py under enhancements/scoring/ (council 5/7 — RRF is a ranking primitive).
  • Flipped GRADATA_BETA_LB_GATE default ON. Documented 2026-04 ablation showed ~15-20% of RULE-tier graduations miscalibrated by format-not-content; shipping the fix opt-in was the textbook silent-regression pattern. GRADATA_BETA_LB_GATE=0 preserves the override.
  • New test_initial_confidence_invariant.py locks the INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary.
  • New test_score_obfuscation_gate.py is a CI gate that fails the build if any raw confidence float in [0,1] leaks into the <brain-rules> prompt. middleware/_core.py updated to obfuscate.

Test plan

  • pytest tests/test_initial_confidence_invariant.py tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py tests/test_rule_pipeline.py tests/test_middleware_core.py: 63 passed.
  • pyright src/ — 0 errors, 27 warnings (unchanged baseline).
  • ruff on changed files — clean.

Layering check

No Layer 0 → 2 imports introduced.

Risk

Beta-LB flip changes default graduation calibration. Users relying on the miscalibrated path will see fewer PATTERN→RULE promotions until they set GRADATA_BETA_LB_GATE=0. Intended.
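
A minimal sketch of the default-on parsing, based on the `os.environ.get(..., "1")` call and the disable values quoted in the review of `_read_beta_lb_config()`. The helper name `beta_lb_gate_enabled` and its `environ` parameter are hypothetical and exist only for this illustration; the real function also returns other configuration values that are not modelled here.

```python
import os
from typing import Mapping, Optional

# Values that switch the gate off, as listed in the review of this PR.
_DISABLE_VALUES = ("0", "false", "no", "off")


def beta_lb_gate_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True unless GRADATA_BETA_LB_GATE holds a disable value."""
    env = os.environ if environ is None else environ
    # An unset variable falls back to "1", so the gate is on by default.
    return env.get("GRADATA_BETA_LB_GATE", "1").lower() not in _DISABLE_VALUES


print(beta_lb_gate_enabled({}))                               # True
print(beta_lb_gate_enabled({"GRADATA_BETA_LB_GATE": "0"}))    # False
print(beta_lb_gate_enabled({"GRADATA_BETA_LB_GATE": "Off"}))  # False
```

Because the comparison is a denylist, any value outside those four strings keeps the gate enabled.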

Council references

  • council_2026-05-02T11-08-25.md
  • council_2026-05-02T11-59-00.md (all 7 lenses via fallback chain)
  • council_2026-05-02T12-24-08.md (PR sequencing)

…B gate, add invariant + obfuscation tests

Council v4 verdict (council_2026-05-02T11-08-25.md) and v4-rerun
(council_2026-05-02T11-59-00.md) flagged a small set of
production-readiness items that don't depend on the larger dual-write
work. This commit lands those independently so dual-write atomicity can
ship as its own reviewable PR.

What
- Move src/gradata/enhancements/retrieval_fusion.py into
  enhancements/scoring/retrieval_fusion.py and update importers.
  Council vote 5/7 — RRF is a ranking primitive, lives more naturally
  with scoring/ than as a sibling.
- Flip GRADATA_BETA_LB_GATE default ON in
  enhancements/self_improvement/_graduation.py. The 2026-04 ablation
  documented in the file showed ~15-20% of RULE-tier graduations
  miscalibrated by format-not-content; shipping the fix opt-in was
  textbook silent regression (council 5/7). GRADATA_BETA_LB_GATE=0
  preserves the override-off escape hatch.
- New tests/test_initial_confidence_invariant.py — locks the
  INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary that almost
  promoted every fresh lesson before the strict > comparison was wired in.
- New tests/test_score_obfuscation_gate.py — CI gate that fails the
  build if any raw confidence float in [0,1] leaks into the
  <brain-rules> prompt-bound payload. middleware/_core.py
  build_brain_rules_block() updated to obfuscate.

Why
- Each item is independently testable, low-risk, and clears the runway
  for the dual-write atomicity PR.
- Beta-LB default-on closes a known correctness hole that ships every
  release until flipped.
- Obfuscation gate converts a comment-level guarantee
  (security/score_obfuscation.py) into an enforced one.

Test plan
- pytest tests/test_initial_confidence_invariant.py
  tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py
  tests/test_rule_pipeline.py tests/test_middleware_core.py — 63 passed.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).
- ruff on changed files — clean.

Layering check
- No Layer 0 → 2 imports introduced.

Risk
- Beta-LB flip changes the default for graduation calibration. Users
  relying on miscalibrated PATTERN→RULE behavior will see fewer
  graduations until they set GRADATA_BETA_LB_GATE=0. This is the
  intended fix.

Council references
- council_2026-05-02T11-08-25.md (initial v4 review)
- council_2026-05-02T11-59-00.md (v4 rerun, all 7 lenses through
  fallback chain)
- council_2026-05-02T12-24-08.md (PR sequencing decision)

coderabbitai Bot commented May 2, 2026

📝 Walkthrough

Summary

  • Module restructuring: Moved retrieval_fusion.py from enhancements/ to enhancements/scoring/ with updated imports across affected code and tests

  • Breaking change - GRADATA_BETA_LB_GATE now ON by default: Changed from opt-in to enabled by default; users relying on previous miscalibrated behavior must explicitly set GRADATA_BETA_LB_GATE=0 to restore old behavior

  • Prompt security enhancement: Added obfuscation to build_brain_rules_block() to prevent raw confidence float values from leaking into brain rules prompts

  • New tests for invariant enforcement: Added test_initial_confidence_invariant.py to lock INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 invariant across lesson promotion workflows

  • New obfuscation gate tests: Added test_score_obfuscation_gate.py to verify that no raw numeric confidence values appear in generated prompts; CI will fail if leakage occurs

  • Test suite updates: Modified existing tests in test_rule_pipeline.py, test_rule_graduated_events.py, test_rule_to_hook.py, test_rule_to_hook_promotion.py, test_safety_assertion.py, test_wiring_compound.py, and test_middleware_core.py to account for the new default-on Beta-LB gate behavior

  • Test coverage: 63 tests passing; no new type errors (pyright: 0 errors, 27 warnings unchanged); code quality clean (ruff)

  • No new public APIs introduced and no Layer 0 → 2 import violations introduced

Walkthrough

This PR reorganizes the retrieval fusion module structure, inverts the GRADATA_BETA_LB_GATE environment variable default from opt-in to enabled-by-default, obfuscates confidence values in brain rules output, and updates corresponding test fixtures and assertions.

Changes

Module Reorganization & Output Obfuscation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Module Relocation | src/gradata/enhancements/rule_pipeline.py, tests/test_retrieval_fusion.py | retrieval_fusion module import path updates from gradata.enhancements.retrieval_fusion to gradata.enhancements.scoring.retrieval_fusion across production and test files. |
| Output Obfuscation | src/gradata/middleware/_core.py | build_brain_rules_block() now wraps each rendered lesson line with obfuscate_instruction(...) instead of emitting raw [state:confidence] markers, hiding numeric confidence values from the prompt output. |
| Test Assertions Update | tests/test_middleware_core.py | Test assertions updated to expect obfuscated [RULE] markers instead of confidence-suffixed [RULE:0.95] markers; max rules count now checks for [RULE] presence. |
| Mock Path Alignment | tests/test_rule_pipeline.py | Missing module mocking updated to patch sys.modules["gradata.enhancements.scoring.retrieval_fusion"] instead of the old module path. |
| Output Obfuscation Validation | tests/test_score_obfuscation_gate.py | New test module verifies that apply_brain_rules() and build_brain_rules_block() output does not leak raw numeric confidence float literals in prompt text. |

Beta LB Gate Default Behavior Inversion

| Layer | File(s) | Summary |
| --- | --- | --- |
| Gate Logic Inversion | src/gradata/enhancements/self_improvement/_graduation.py | _read_beta_lb_config() changes from opt-in (default disabled) to enabled-by-default using os.environ.get(..., "1") and a denylist of disable values ("0", "false", "no", "off"); docstring updated to reflect new behavior. |
| Test Configuration Updates | tests/test_rule_graduated_events.py, tests/test_rule_pipeline.py, tests/test_rule_to_hook.py, tests/test_rule_to_hook_promotion.py, tests/test_safety_assertion.py | Multiple tests now explicitly set GRADATA_BETA_LB_GATE="0" via monkeypatch before graduation/promotion flows to disable the gate under the new default-enabled behavior. |
| Test Naming Alignment | tests/test_wiring_compound.py | First test in TestBetaLBGate renamed from test_gate_disabled_by_default_allows_promotion to test_gate_can_be_disabled_to_allow_promotion and now explicitly sets GRADATA_BETA_LB_GATE="0" to reflect the inverted default. |
| Confidence Threshold Tests | tests/test_initial_confidence_invariant.py | New test module with _lesson() factory helper validates graduate() behavior around the PATTERN_THRESHOLD and MIN_APPLICATIONS_FOR_PATTERN boundary conditions. |

Sequence Diagram(s)

Not applicable—these changes constitute module relocation, logic inversion, and obfuscation rather than introducing new multi-component interactions or significantly altering control flow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

refactor, breaking-change

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the four main changes: moving retrieval_fusion to scoring, flipping the Beta-LB gate default, and adding two new test suites for invariants and obfuscation. |
| Description check | ✅ Passed | The description provides detailed context for all changes, including motivation, implementation approach, test results, and known risks. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/src/gradata/enhancements/self_improvement/_graduation.py`:
- Around line 110-115: The default-on gate (GRADATA_BETA_LB_GATE currently
defaulting to "1") causes lessons missing persisted alpha/beta_param to be
treated as Beta(1,1) in _passes_beta_lb_gate(), which denies legacy PATTERN
lessons; revert the compatibility break by making the gate default off (change
the default of GRADATA_BETA_LB_GATE to "0"/false) or alter
_passes_beta_lb_gate() to treat a Lesson lacking alpha or beta_param as passing
(i.e., skip the Beta(1,1) fallback and allow graduation) — update code
references to GRADATA_BETA_LB_GATE, _passes_beta_lb_gate(), Lesson.alpha, and
Lesson.beta_param accordingly.

In `@Gradata/tests/test_score_obfuscation_gate.py`:
- Around line 16-23: Replace the direct Brain.init(...) invocation in
test_apply_brain_rules_prompt_does_not_leak_raw_confidence with the test helpers
from conftest: either call brain = init_brain(tmp_path, name="ObfuscationGate",
domain="Testing", embedding="local", interactive=False) or switch the test to
use the fresh_brain fixture and adjust its parameters; this ensures BRAIN_DIR
and the _paths cache are handled the same way as other tests instead of calling
Brain.init directly.
- Around line 8-13: The regex _RAW_CONFIDENCE_FLOAT currently also matches
integers because the decimal portion is optional; update the pattern used by the
_RAW_CONFIDENCE_FLOAT constant so it requires an explicit decimal point and
digits (i.e., only match floats like 0.95 or 1.00), then keep
_assert_no_raw_confidence_float unchanged so it will only detect actual float
leaks and not plain integers like "1" or "0".
📥 Commits

Reviewing files that changed from the base of the PR and between b82a2dc and c12d358.

📒 Files selected for processing (14)
  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/enhancements/scoring/retrieval_fusion.py
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
  • Gradata/src/gradata/middleware/_core.py
  • Gradata/tests/test_initial_confidence_invariant.py
  • Gradata/tests/test_middleware_core.py
  • Gradata/tests/test_retrieval_fusion.py
  • Gradata/tests/test_rule_graduated_events.py
  • Gradata/tests/test_rule_pipeline.py
  • Gradata/tests/test_rule_to_hook.py
  • Gradata/tests/test_rule_to_hook_promotion.py
  • Gradata/tests/test_safety_assertion.py
  • Gradata/tests/test_score_obfuscation_gate.py
  • Gradata/tests/test_wiring_compound.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest (py3.12)
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_retrieval_fusion.py
  • Gradata/tests/test_middleware_core.py
  • Gradata/tests/test_rule_to_hook.py
  • Gradata/tests/test_initial_confidence_invariant.py
  • Gradata/tests/test_rule_graduated_events.py
  • Gradata/tests/test_wiring_compound.py
  • Gradata/tests/test_score_obfuscation_gate.py
  • Gradata/tests/test_rule_to_hook_promotion.py
  • Gradata/tests/test_safety_assertion.py
  • Gradata/tests/test_rule_pipeline.py
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/middleware/_core.py
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
🔇 Additional comments (8)
Gradata/tests/test_initial_confidence_invariant.py (2)

12-20: Helper fixture is minimal and correct for graduation-path tests.

Good construction of a deterministic Lesson object with only the fields needed for this invariant.


23-45: Boundary assertions for INSTINCT→PATTERN promotion are well covered.

The tests correctly lock the tie case (== threshold) as non-promoting and the above-threshold case as promoting, matching the strict comparison contract.
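
The strict-comparison contract described above can be sketched as follows. This is an illustration only: `promotes_to_pattern` is a hypothetical stand-in for the confidence check inside `graduate()`, both constants are assumed to equal 0.60 as the PR description states, and the separate MIN_APPLICATIONS_FOR_PATTERN condition is left out.

```python
# Assumed values: the PR description places both constants on the 0.60 boundary.
INITIAL_CONFIDENCE = 0.60
PATTERN_THRESHOLD = 0.60


def promotes_to_pattern(confidence: float, threshold: float = PATTERN_THRESHOLD) -> bool:
    """Hypothetical stand-in for the confidence check: a tie must not promote."""
    return confidence > threshold  # strict comparison


# A fresh lesson sits exactly on the threshold and therefore stays put.
print(promotes_to_pattern(INITIAL_CONFIDENCE))  # False
# Any confidence above the threshold promotes.
print(promotes_to_pattern(0.61))                # True
# With a non-strict comparison the fresh lesson would have been promoted.
print(INITIAL_CONFIDENCE >= PATTERN_THRESHOLD)  # True
```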

Gradata/src/gradata/middleware/_core.py (1)

280-283: Obfuscation is applied at the correct boundary.

Good change: Line 281 strips score suffixes at render time while preserving rule selection and XML structure.
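
As a rough illustration of what stripping a score suffix at render time might look like, the sketch below rewrites `[RULE:0.95]` to `[RULE]`. The function `strip_score_suffix` and its regular expression are hypothetical; the project's actual `obfuscate_instruction(...)` is not shown in this PR page and may work differently.

```python
import re

# Matches a state marker with a numeric suffix, for example "[RULE:0.95]".
_SCORED_MARKER = re.compile(r"\[([A-Z_]+):\d+(?:\.\d+)?\]")


def strip_score_suffix(line: str) -> str:
    """Hypothetical helper: rewrite "[STATE:score]" markers to "[STATE]"."""
    return _SCORED_MARKER.sub(r"[\1]", line)


print(strip_score_suffix("[RULE:0.95] Prefer concrete dates."))
# [RULE] Prefer concrete dates.
print(strip_score_suffix("[RULE] Use 1 sentence."))
# [RULE] Use 1 sentence.
```

Lines that carry no score suffix pass through unchanged, so rule selection and ordering are unaffected by the rewrite.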

Gradata/tests/test_middleware_core.py (2)

46-47: Assertion update matches new obfuscated marker format.

These checks correctly enforce [RULE] presence and confidence-suffixed marker absence.


59-59: Max-rules assertion stays aligned with obfuscated output.

Counting [RULE] here is the right adjustment for the new rendered format.

Gradata/tests/test_score_obfuscation_gate.py (1)

38-60: Great middleware-level gate coverage.

This test complements the core unit checks by asserting obfuscation on build_brain_rules_block(...) output directly.

Gradata/tests/test_rule_pipeline.py (2)

171-180: Gate override keeps this graduation test deterministic.

Good update: disabling GRADATA_BETA_LB_GATE in-test prevents default-flip drift and preserves intended threshold behavior coverage.


246-247: Missing-module mock now targets the correct relocated import.

This aligns the test with the new enhancements.scoring.retrieval_fusion path and keeps optional-dependency failure handling properly covered.

Comment on lines +110 to +115
enabled = os.environ.get("GRADATA_BETA_LB_GATE", "1").lower() not in (
    "0",
    "false",
    "no",
    "off",
)
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default-on Beta-LB now hard-blocks lessons that lack posterior state.

Enabling the gate by default turns any Lesson without persisted alpha / beta_param into an automatic PATTERN→RULE deny, because _passes_beta_lb_gate() falls back to Beta(1,1), whose 5th-percentile lower bound is far below the default 0.85. That means legacy or handcrafted PATTERN lessons will stop graduating entirely unless the env override is set, which is a much stronger behavior change than “tighter calibration.”
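
To make the Beta(1,1) point concrete: that prior is the uniform distribution on [0, 1], so its 5th percentile is 0.05. The sketch below checks this numerically. `beta_ppf` is a hypothetical pure-Python approximation (midpoint integration plus bisection, assuming both shape parameters are at least 1), not the project's `_beta_ppf_05`, and the Beta(60, 2) posterior is a made-up example of a well-evidenced lesson.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """Midpoint-rule approximation of the CDF (assumes a >= 1 and b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps))


def beta_ppf(q: float, a: float, b: float) -> float:
    """Quantile by bisection on the approximate CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


THRESHOLD = 0.85  # default lower-bound threshold quoted in the review comment

legacy_lb = beta_ppf(0.05, 1.0, 1.0)   # lesson with no persisted posterior
print(round(legacy_lb, 4))             # 0.05
print(legacy_lb >= THRESHOLD)          # False: the gate denies promotion

strong_lb = beta_ppf(0.05, 60.0, 2.0)  # made-up, well-evidenced posterior
print(strong_lb >= THRESHOLD)          # True
```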

Possible compatibility guard
 def _passes_beta_lb_gate(
     lesson: Lesson,
     config: tuple[bool, float, int] | None = None,
 ) -> bool:
@@
-    alpha = getattr(lesson, "alpha", 1.0)
-    beta_param = getattr(lesson, "beta_param", 1.0)
+    alpha = getattr(lesson, "alpha", None)
+    beta_param = getattr(lesson, "beta_param", None)
+    if alpha is None or beta_param is None:
+        return True  # keep legacy promotion behavior until posterior state is backfilled
+
     from gradata.rules.rule_engine import _beta_ppf_05
 
     return _beta_ppf_05(alpha, beta_param) >= threshold

Also applies to: 151-155

Comment on lines +8 to +13
_RAW_CONFIDENCE_FLOAT = re.compile(r"(?<![\w.])(?:0(?:\.\d+)?|1(?:\.0+)?)(?![\w.])")


def _assert_no_raw_confidence_float(prompt: str) -> None:
    leaks = _RAW_CONFIDENCE_FLOAT.findall(prompt)
    assert not leaks, f"raw confidence float leaked into prompt-bound text: {prompt}"
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
import re

current = re.compile(r"(?<![\w.])(?:0(?:\.\d+)?|1(?:\.0+)?)(?![\w.])")
proposed = re.compile(r"(?<![\w.])(?:0\.\d+|1\.0+)(?![\w.])")

samples = [
    "<brain-rules>\n[RULE] Use 1 sentence.\n</brain-rules>",
    "<brain-rules>\n[RULE] Keep 0 padding.\n</brain-rules>",
    "<brain-rules>\n[RULE:0.95] Prefer concrete dates.\n</brain-rules>",
    "<brain-rules>\n[RULE:1.00] Prefer concrete dates.\n</brain-rules>",
]

for s in samples:
    print("TEXT:", s.replace("\n", "\\n"))
    print(" current:", current.findall(s))
    print("proposed:", proposed.findall(s))
    print("-" * 60)
PY

Repository: Gradata/gradata

Length of output: 701


Regex matches plain integers and will cause false positives.

The optional decimal portion in the current pattern ((?:\.\d+)? and (?:\.0+)?) makes it match standalone 0 and 1. This means valid text like "use 1 sentence" or "keep 0 padding" will incorrectly trigger the gate.

Proposed fix
-_RAW_CONFIDENCE_FLOAT = re.compile(r"(?<![\w.])(?:0(?:\.\d+)?|1(?:\.0+)?)(?![\w.])")
+_RAW_CONFIDENCE_FLOAT = re.compile(r"(?<![\w.])(?:0\.\d+|1\.0+)(?![\w.])")

The proposed regex requires an explicit decimal point and digits, matching only actual floats like 0.95 and 1.00 while ignoring plain integers.

Comment on lines +16 to +23
def test_apply_brain_rules_prompt_does_not_leak_raw_confidence(tmp_path) -> None:
    brain = Brain.init(
        tmp_path / "brain",
        name="ObfuscationGate",
        domain="Testing",
        embedding="local",
        interactive=False,
    )
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Looking for test conftest and isolation hooks..."
fd -i 'conftest.py' tests

echo
echo "Searching for BRAIN_DIR and _paths cache refresh logic in test infra..."
rg -n -C3 'BRAIN_DIR|_paths|Brain\.init|reload|refresh' tests

echo
echo "Reference call site in this test file:"
rg -n -C2 'test_apply_brain_rules_prompt_does_not_leak_raw_confidence|Brain\.init\(' Gradata/tests/test_score_obfuscation_gate.py

Repository: Gradata/gradata

Length of output: 219


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all conftest.py files in the repository
echo "=== Locating conftest.py files ==="
git ls-files | grep -i conftest.py

echo
echo "=== Checking Gradata/tests directory structure ==="
git ls-files | grep "^Gradata/tests/" | head -20

echo
echo "=== Looking for the target test file ==="
git ls-files | grep "test_score_obfuscation_gate.py"

Repository: Gradata/gradata

Length of output: 953


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Reading conftest.py for test isolation setup ==="
cat -n Gradata/tests/conftest.py

echo
echo "=== Checking for test_score_obfuscation_gate.py ==="
git ls-files | grep "test_score_obfuscation_gate.py"

echo
echo "=== Reading test_score_obfuscation_gate.py around lines 16-23 ==="
head -30 Gradata/tests/test_score_obfuscation_gate.py

Repository: Gradata/gradata

Length of output: 9867


Use the fresh_brain fixture or init_brain() from conftest.py instead of calling Brain.init() directly.

This test calls Brain.init() directly without setting BRAIN_DIR or refreshing the _paths.py module cache. The conftest.py provides the init_brain() helper (and fresh_brain fixture) specifically for this purpose. Either replace the direct Brain.init() call with brain = init_brain(tmp_path, name="ObfuscationGate", domain="Testing") or use the fresh_brain fixture and customize it as needed. See conftest.py lines 26-92.

Gradata added a commit that referenced this pull request May 2, 2026
* cleanup(enhancements): move retrieval_fusion to scoring/, flip Beta-LB gate, add invariant + obfuscation tests

Council v4 verdict (council_2026-05-02T11-08-25.md) and v4-rerun
(council_2026-05-02T11-59-00.md) flagged a small set of
production-readiness items that don't depend on the larger dual-write
work. This commit lands those independently so dual-write atomicity can
ship as its own reviewable PR.

What
- Move src/gradata/enhancements/retrieval_fusion.py into
  enhancements/scoring/retrieval_fusion.py and update importers.
  Council vote 5/7 — RRF is a ranking primitive, lives more naturally
  with scoring/ than as a sibling.
- Flip GRADATA_BETA_LB_GATE default ON in
  enhancements/self_improvement/_graduation.py. The 2026-04 ablation
  documented in the file showed ~15-20% of RULE-tier graduations
  miscalibrated by format-not-content; shipping the fix opt-in was
  textbook silent regression (council 5/7). GRADATA_BETA_LB_GATE=0
  preserves the override-off escape hatch.
- New tests/test_initial_confidence_invariant.py — locks the
  INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary that almost
  promoted every fresh lesson before the strict > comparison was wired in.
- New tests/test_score_obfuscation_gate.py — CI gate that fails the
  build if any raw confidence float in [0,1] leaks into the
  <brain-rules> prompt-bound payload. middleware/_core.py
  build_brain_rules_block() updated to obfuscate.

Why
- Each item is independently testable, low-risk, and clears the runway
  for the dual-write atomicity PR.
- Beta-LB default-on closes a known correctness hole that ships every
  release until flipped.
- Obfuscation gate converts a comment-level guarantee
  (security/score_obfuscation.py) into an enforced one.

Test plan
- pytest tests/test_initial_confidence_invariant.py
  tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py
  tests/test_rule_pipeline.py tests/test_middleware_core.py — 63 passed.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).
- ruff on changed files — clean.

Layering check
- No Layer 0 → 2 imports introduced.

Risk
- Beta-LB flip changes the default for graduation calibration. Users
  relying on miscalibrated PATTERN→RULE behavior will see fewer
  graduations until they set GRADATA_BETA_LB_GATE=0. This is the
  intended fix.

Council references
- council_2026-05-02T11-08-25.md (initial v4 review)
- council_2026-05-02T11-59-00.md (v4 rerun, all 7 lenses through
  fallback chain)
- council_2026-05-02T12-24-08.md (PR sequencing decision)

* fix(events): JSONL canonical, SQLite projection, reconcile-on-init, doctor --reconcile

Council v4 (council_2026-05-02T11-59-00.md) ranked dual-write atomicity
the #1 production blocker. Crash mid-write between events.jsonl append
and SQLite INSERT could leave the brain in silent split-brain state
with no recovery path.

What
- src/gradata/_events.py
  - JSONL is the canonical source of truth. Append + fsync FIRST,
    SQLite INSERT is now an idempotent projection derived from JSONL.
  - Added reconcile_jsonl_to_sqlite() that scans JSONL past the
    SQLite watermark and replays missing rows.
  - Single SQLite projection helper used by both the live write path
    and the retain orchestrator.
  - Env-gated crash-window delay for deterministic kill-9 testing
    only (no production effect).
- src/gradata/brain.py
  - Brain.__init__ runs JSONL → SQLite reconciliation after migrations.
  - Brain() resolves BRAIN_DIR / cwd when no explicit path is supplied.
  - observe(text, kind="correction") public event API used by the
    PR2 spec.
- src/gradata/cli.py + src/gradata/_doctor.py
  - New `gradata doctor --reconcile`: scans for drift, reports the
    count, replays missing JSONL rows into SQLite, exits non-zero on
    inconsistency that can't be healed.
- tests/test_dualwrite_atomicity.py
  - Path-agnostic public-API tests covering: happy path, kill-9 mid
    batch (JSONL must lead SQLite, never trail), reconcile replay,
    idempotency, doctor CLI drift report, concurrent-writer JSONL
    line integrity.

Why
- Before: dual-write claimed atomic in CLAUDE.md, no two-phase commit,
  no recovery. Crash → silent data loss or duplicate-on-replay.
- After: JSONL is the log, SQLite is the projection. Every reopen
  reconciles. doctor --reconcile is the operator escape hatch.
  Property: jsonl_count >= sqlite_count, always.
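
The log-then-projection ordering described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the `events` table layout, the `id` field, and the function names `append_event`, `project_event`, and `reconcile` are hypothetical, and this version rescans the whole file rather than tracking the watermark the commit mentions.

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path


def append_event(jsonl_path: Path, event: dict) -> None:
    """Append one event to the canonical log and force it to disk first."""
    with jsonl_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def project_event(conn: sqlite3.Connection, event: dict) -> None:
    """Idempotent projection: replaying an already stored id is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO events (id, body) VALUES (?, ?)",
        (event["id"], json.dumps(event, sort_keys=True)),
    )
    conn.commit()


def reconcile(jsonl_path: Path, conn: sqlite3.Connection) -> int:
    """Replay JSONL rows missing from SQLite; return how many rows were added."""
    before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                project_event(conn, json.loads(line))
    after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return after - before


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, body TEXT)")

    first = {"id": "e1", "kind": "correction"}
    append_event(log, first)
    project_event(conn, first)

    # Simulate a crash after the JSONL append but before the SQLite INSERT.
    append_event(log, {"id": "e2", "kind": "correction"})

    replayed = reconcile(log, conn)     # heals the drift
    second_pass = reconcile(log, conn)  # nothing left to replay
    print(replayed, second_pass)        # 1 0
    conn.close()
```

Because the append always happens before the projection, a crash can only leave SQLite behind the log, which is the direction a replay can repair.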

Test plan
- pytest tests/test_dualwrite_atomicity.py — 6 passed.
- Full focused regression on changed surface — 42 passed.
- Non-integration suite (excluding socket-bound daemon/plugin tests
  blocked by sandbox) — 4130 passed, 4 skipped.
- pyright src/ — 0 errors, 27 warnings (unchanged baseline).

Layering check
- _events.py is Layer 0. Brain.__init__ in Layer 2 calls into it.
  No upward imports introduced.

Risk
- Reconcile-on-init runs on every Brain open. For a brain with
  100k events this adds ~50ms-200ms one-time at startup. Watermark
  is incremental so subsequent opens are O(drift) not O(total).
- Concurrent writers serialize via JSONL append + advisory lock.
  Throughput trade-off is acceptable for correctness.

Council references
- council_2026-05-02T11-59-00.md (v4 RISK class, all 7 lenses)
- council_2026-05-02T12-24-08.md (PR sequencing — TDD-first)

Stacks on #163.

* fix(cloud/client): coerce float ts and non-int session for /sync

The backfill script and incremental sync both crashed on real-world
events.jsonl rows that contain:
  - float ts (epoch seconds, e.g. 1776803751.89) — broke str-vs-str
    comparison against the watermark cursor with TypeError.
  - float or string session values (e.g. 4.5, UUID strings) — server
    schema rejects non-ints with HTTP 422.

Coerce ts to str and session to int|None at the format-event boundary.
Also surface the response body in HTTPError so 4xx/5xx debugging is
not opaque.
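
A minimal sketch of the coercion at the format-event boundary. The function names are hypothetical, and the policy for non-integer sessions (mapping values such as 4.5 or a UUID string to None) is an assumption; the commit message only states that session becomes int|None.

```python
from typing import Any, Optional


def coerce_session(session: Any) -> Optional[int]:
    """Return an int session id, or None when the value is not a whole number."""
    if isinstance(session, bool):
        return None  # bool is a subclass of int; treated as invalid here (assumption)
    if isinstance(session, int):
        return session
    if isinstance(session, float):
        return int(session) if session.is_integer() else None
    if isinstance(session, str):
        try:
            return int(session)
        except ValueError:
            return None
    return None


def format_event(raw: dict) -> dict:
    """Hypothetical formatter: copy the event with ts as str and session as int|None."""
    event = dict(raw)
    if "ts" in event:
        # Keeps watermark comparisons str-vs-str even for float epoch seconds.
        event["ts"] = str(event["ts"])
    event["session"] = coerce_session(event.get("session"))
    return event


print(format_event({"ts": 1776803751.89, "session": 4.5}))
# {'ts': '1776803751.89', 'session': None}
print(format_event({"ts": "2026-05-02T11:59:00", "session": "12"}))
# {'ts': '2026-05-02T11:59:00', 'session': 12}
```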

Discovered while running scripts/backfill_to_cloud.py against a
brain with ~28k events accumulated since 2026-03-22.

---------

Co-authored-by: Oliver Le <oliverle94@gmail.com>