feat(inject): session-start emits synthesized prose instead of N rule lines #140

Merged
Gradata merged 1 commit into main from feat/synthesizer-body-swap
Apr 22, 2026


@Gradata Gradata commented Apr 22, 2026

Summary

Wires prompt_synthesizer.synthesize_brain_injection into the session-start injection hook. Fixes the autoresearch regression where the 99.2% token optimization landed on the legacy N-rules path instead of the intended synthesizer.

  • Body swap in inject_brain_rules.py: the <brain-rules> block no longer contains per-rule [RULE:...] or cluster [CLUSTER:...] lines. It now carries a single slot-grouped prose block in Preston-Rhodes order (task → context → examples → persona → format → tone), with inline r:xxxx anchors preserved for capture_learning.py attribution.
  • Wrapper tag unchanged: <brain-rules>...</brain-rules> still wraps the content, so middleware, handoff, sanitize, and MCP contracts do not break.
  • Persona baseline: pulls from <brain_dir>/../domain/soul.md when present; otherwise empty.
  • Token budget: GRADATA_SYNTH_BUDGET env var (default 400) drops lowest-priority slots first.
  • Manifest unchanged: .last_injection.json still keys by 4-char anchor; the synthesizer emits the same anchors (first 4 chars of stable lesson id).
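The budget behaviour described in the bullets above can be sketched roughly as follows. This is an illustration only, not the code in prompt_synthesizer: the names SLOT_ORDER, estimate_tokens, and fit_to_budget are hypothetical, the whitespace token count stands in for whatever tokenizer the project uses, and treating later slots as lower priority is an assumption the PR does not confirm.

```python
from typing import Dict

# Slot order as listed in the PR description (task first, tone last).
SLOT_ORDER = ["task", "context", "examples", "persona", "format", "tone"]


def estimate_tokens(text: str) -> int:
    """Crude whitespace-based estimate; a stand-in for a real tokenizer."""
    return len(text.split())


def fit_to_budget(slots: Dict[str, str], budget: int) -> str:
    """Render non-empty slots in order, dropping trailing slots until the body fits.

    Assumption: slots later in SLOT_ORDER are lower priority.
    """
    kept = [name for name in SLOT_ORDER if slots.get(name)]
    while kept:
        body = "\n".join(f"{name}: {slots[name]}" for name in kept)
        if estimate_tokens(body) <= budget:
            return body
        kept.pop()  # drop the lowest-priority slot that is still present
    return ""
```

Under these assumptions, a body that exceeds the budget loses its tone slot first and its task slot last, so the most essential instructions survive the tightest budgets.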

Test plan

  • tests/test_cluster_injection.py — 9 tests migrated from legacy line-format assertions to the new contract (anchor count, slot labels, no [CLUSTER:]/[RULE:] prefixes). 14 pass.
  • tests/test_hooks_learning.py, tests/test_lesson_applications.py, tests/test_jit_inject.py — all pass unchanged; they test the wrapper tag, not the body.
  • Full suite: 3931 pass, 2 skip (pytest tests/).
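The new test contract (anchors survive, no legacy prefixes) might be checked along these lines. This is a sketch, not the contents of tests/test_cluster_injection.py: the helper names are hypothetical, and the assumption that anchors are four ASCII alphanumeric characters is mine; the PR only says they are the first 4 characters of the stable lesson id.

```python
import re
from typing import List

# Assumed anchor shape: "r:" followed by four alphanumeric characters.
ANCHOR_RE = re.compile(r"\br:([0-9A-Za-z]{4})\b")
BLOCK_RE = re.compile(r"<brain-rules>(.*?)</brain-rules>", re.DOTALL)


def anchor_for(lesson_id: str) -> str:
    """First 4 characters of the stable lesson id, as described in the PR."""
    return lesson_id[:4]


def extract_anchors(hook_output: str) -> List[str]:
    """Return the anchors found inside the <brain-rules> block, in order."""
    match = BLOCK_RE.search(hook_output)
    if match is None:
        return []
    return ANCHOR_RE.findall(match.group(1))


def uses_legacy_format(hook_output: str) -> bool:
    """True if the output still carries legacy per-rule or cluster prefixes."""
    return "[RULE:" in hook_output or "[CLUSTER:" in hook_output
```

A migrated test could then assert that the extracted anchors match anchor_for applied to each injected lesson id and that uses_legacy_format returns False.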

Follow-up: rerun autoresearch against the synthesizer with the correct metric.

Generated with Gradata

…rose

Session-start injection no longer emits per-rule [RULE:...] or cluster
[CLUSTER:...] lines inside <brain-rules>. Instead it feeds the ranked
lessons into prompt_synthesizer.synthesize_brain_injection, producing a
single slot-grouped prose block (task → context → examples → persona →
format → tone) with inline r:xxxx anchors for capture_learning
attribution.

The <brain-rules> wrapper tag stays so middleware, handoff, and sanitize
contracts remain intact; only the body format changes. Persona baseline
defaults to <brain_dir>/../domain/soul.md when present. Token budget
enforced via GRADATA_SYNTH_BUDGET (default 400).

Injection manifest (.last_injection.json) keeps the same schema — anchors
computed by the synthesizer are the first 4 chars of the stable lesson
id, matching what the manifest records.

test_cluster_injection.py: migrated 9 tests that asserted the legacy
line format to the new contract (anchors survive, slot labels present,
no [CLUSTER:]/[RULE:] prefixes). 80 injection-adjacent tests green;
full suite 3931 pass / 2 skip.
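The persona baseline and budget settings named in the commit message could be read roughly like this. The function names read_budget and load_persona are hypothetical, and falling back to the default on a missing, non-numeric, or non-positive value is an assumption; the PR states only the variable name, the default of 400, and the soul.md path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_BUDGET = 400  # default stated in the PR


def read_budget(env: Optional[Mapping[str, str]] = None) -> int:
    """Read GRADATA_SYNTH_BUDGET, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("GRADATA_SYNTH_BUDGET", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_BUDGET  # assumed fallback; not confirmed by the PR
    return value if value > 0 else DEFAULT_BUDGET


def load_persona(brain_dir: Path) -> str:
    """Return <brain_dir>/../domain/soul.md if readable, otherwise an empty string."""
    soul_path = brain_dir / ".." / "domain" / "soul.md"
    try:
        return soul_path.read_text(encoding="utf-8").strip()
    except OSError:
        return ""
```

Returning an empty string when the file is absent mirrors the "otherwise empty" behaviour in the PR summary and keeps the hook from failing on brains that have no domain directory.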



coderabbitai Bot commented Apr 22, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f7b4ddc4-90e8-41f3-817e-9911ac51fcb8

📥 Commits

Reviewing files that changed from the base of the PR and between e636bb4 and 222c524.

📒 Files selected for processing (2)
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/tests/test_cluster_injection.py

📝 Walkthrough

Summary

  • Synthesizer integration: Wired prompt_synthesizer.synthesize_brain_injection into the session-start injection hook to replace legacy per-rule line format with synthesized prose
  • Body format change: <brain-rules> block now contains a single prose body in Preston-Rhodes order (task → context → examples → persona → format → tone) instead of per-rule [RULE:...] or [CLUSTER:...] lines; inline r:xxxx anchors preserved for attribution
  • Persona baseline: Loaded from <brain_dir>/../domain/soul.md when available
  • Token budget control: New environment variable GRADATA_SYNTH_BUDGET (default 400) controls synthesis token limits; lowest-priority slots dropped first when needed
  • No breaking changes: Wrapper tag <brain-rules>...</brain-rules> unchanged; .last_injection.json schema preserved; middleware and MCP contracts unaffected
  • Tests migrated: 9 tests in test_cluster_injection.py updated from legacy line-format assertions to new prose contract; all other injection tests pass unchanged
  • Test results: Full suite: 3,931 pass, 2 skip

Walkthrough

Refactored brain rules injection from simple line concatenation to synthesized XML structure. New logic filters out rules covered by meta mutex, sanitizes rule descriptions for XML, injects per-rule metadata fields (rule_id, slot, example_draft, example_corrected), and applies optional persona baseline from domain/soul.md. Tests updated to validate prose-based XML output instead of legacy line formats and anchor presence using regex extraction.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Brain Rules Synthesis: Gradata/src/gradata/hooks/inject_brain_rules.py | Replaced simple concatenation of cluster and individual lines with call to synthesize_brain_injection(). Filters out rules covered by meta mutex, sanitizes XML, constructs structured payload with per-rule fields, and loads optional persona baseline from domain/soul.md. Omits <brain-rules> entirely if synthesis input is empty. |
| Test Assertions Update: Gradata/tests/test_cluster_injection.py | Transitioned test validations from legacy line-based formats ([RULE:], [PATTERN:], [CLUSTER:]) to prose-bounded XML structure (<brain-rules>...</brain-rules>). Replaced line-presence checks with regex anchor extraction (r:xxxx), asserts anchor survival/count in synthesized output, and strengthened metadata validation in manifest/attribution tests. |
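Two behaviours from the walkthrough, sanitizing rule descriptions for XML and omitting the block when there is nothing to inject, can be illustrated with the standard library. The function names below are hypothetical and this is not the code in inject_brain_rules.py; it only shows one plausible shape for those two steps.

```python
from xml.sax.saxutils import escape


def sanitize_description(text: str) -> str:
    """Escape &, <, and > so a rule description cannot break the wrapper tag."""
    return escape(text)


def wrap_brain_rules(prose: str) -> str:
    """Wrap synthesized prose in <brain-rules>, or return '' if there is no body."""
    body = prose.strip()
    if not body:
        return ""  # omit the block entirely when synthesis produced nothing
    return f"<brain-rules>\n{body}\n</brain-rules>"
```

Escaping before synthesis, rather than after wrapping, keeps the wrapper tag itself intact while neutralising any stray markup inside lesson text.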

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

feature


@Gradata Gradata merged commit 129c83f into main Apr 22, 2026
7 of 9 checks passed
@coderabbitai coderabbitai Bot added the feature label Apr 22, 2026
Gradata added a commit that referenced this pull request May 1, 2026
PR #136 "99.2% reduction (5513→42)" stacked legit format compressions
(strip YAML/XML wrappers, dedup, compact [P:0.83]→[P83], snippet/top_k
tuning) on top of 6 knob-cuts that quietly removed product behavior:

- GRADATA_WISDOM_MAX_RULES default 3 → 9 (undo 0bb2de9 + 5eabc48)
- GRADATA_WISDOM_FULL default 0 → 1 (undo d387de9 Active guidance strip)
- JIT DEFAULT_MAX_RULES 1 → 5 (undo 4a44+9582+dfab)
- JIT DEFAULT_MIN_CONFIDENCE 0.90 → 0.60 (undo 699827a)
- Restore [Pxx] state+confidence prefix on JIT output (undo 50b63d1)
- Restore [fb:neg,rem] implicit_feedback signal injection (undo 61b43c8)

Honest milestone: d372132 (last pure-compression commit) measured 1724
weighted tokens vs 5513 baseline = 69% reduction. The further jump to
42 came from defeaturing, not compression.

Post-revert measurement with synthesizer (PR #140) stacked:
  weighted=1179, session_once=154, per_turn=102.5
  = 79% honest reduction vs 5513 baseline, all 6 features restored.

Test updates: 3 implicit_feedback tests now assert returned signal
strings instead of None.

Co-authored-by: Gradata <noreply@gradata.ai>
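The reduction percentages quoted in the commit message above follow directly from the weighted token counts it reports. A quick check, using only those figures (the function name is illustrative):

```python
BASELINE = 5513  # weighted tokens before any optimization, per the commit message


def reduction_pct(measured: float, baseline: float = BASELINE) -> float:
    """Percentage reduction of `measured` relative to `baseline`."""
    return 100.0 * (1 - measured / baseline)


measurements = [
    ("PR #136 headline", 42),
    ("d372132, last pure-compression commit", 1724),
    ("post-revert with synthesizer (PR #140)", 1179),
]

for label, tokens in measurements:
    print(f"{label}: {reduction_pct(tokens):.1f}% reduction")
```

The three results round to 99.2%, 69%, and 79%, matching the figures in the commit message.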