revert(autoresearch): undo 6 defeaturing knob-cuts from PR #136#141
PR #136 "99.2% reduction (5513→42)" stacked legit format compressions (strip YAML/XML wrappers, dedup, compact [P:0.83]→[P83], snippet/top_k tuning) on top of 6 knob-cuts that quietly removed product behavior: - GRADATA_WISDOM_MAX_RULES default 3 → 9 (undo 0bb2de9 + 5eabc48) - GRADATA_WISDOM_FULL default 0 → 1 (undo d387de9 Active guidance strip) - JIT DEFAULT_MAX_RULES 1 → 5 (undo 4a44+9582+dfab) - JIT DEFAULT_MIN_CONFIDENCE 0.90 → 0.60 (undo 699827a) - Restore [Pxx] state+confidence prefix on JIT output (undo 50b63d1) - Restore [fb:neg,rem] implicit_feedback signal injection (undo 61b43c8) Honest milestone: d372132 (last pure-compression commit) measured 1724 weighted tokens vs 5513 baseline = 69% reduction. The further jump to 42 came from defeaturing, not compression. Post-revert measurement with synthesizer (PR #140) stacked: weighted=1179, session_once=154, per_turn=102.5 = 79% honest reduction vs 5513 baseline, all 6 features restored. Test updates: 3 implicit_feedback tests now assert returned signal strings instead of None. Co-Authored-By: Gradata <noreply@gradata.ai>
📝 Walkthrough

Breaking change: multiple hook functions in the gradata library are updated. Implicit feedback detection now returns inline formatted signal results instead of None; brain rule injection defaults expand rule retention and the maximum rule count; JIT rule injection parameters are relaxed to include more candidates, with a revised confidence threshold and rule formatting; tests are updated to verify the new return behaviors.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/hooks/implicit_feedback.py`:
- Around line 205-214: The code can emit both an approval and negative feedback
at once; update the signals handling in implicit_feedback.py (the block that
builds _SIG_ABBREV and sig_str and returns {"result": ...}) to resolve conflicts
by preferring approval: if any signal with type "approval" exists, filter out
negative signal types (e.g., "negation", "challenge", "gap") before constructing
sig_str so you won't return negative feedback alongside OUTPUT_ACCEPTED; keep
the existing _SIG_ABBREV mapping and sig_str construction but operate on a
cleaned signals list (or set) so the return only includes the resolved signal
types.
In `@Gradata/src/gradata/hooks/inject_brain_rules.py`:
- Line 185: Replace the raw int(...) parsing of GRADATA_WISDOM_MAX_RULES used to
set wisdom_max_rules with a defensive parse: catch ValueError/TypeError, fall
back to the default (9), and clamp the resulting value to a safe minimum (e.g.,
0 or 1) and optionally an upper bound; you can reuse or implement a small helper
like _env_int to perform parse-with-default-and-clamp. Update the code that
references wisdom_max_rules (in inject_brain_rules.py / SessionStart injection)
to use this safe value so malformed env input won’t raise and abort injection.
In `@Gradata/tests/test_hooks_intelligence.py`:
- Line 468: The current assertion "assert result is not None and 'chal' in
result['result']" is too loose and can yield false positives; update the test in
test_hooks_intelligence (the assertion referencing the variable result and its
"result" key, and the similar assertion around line 486) to assert exact
expected outputs — either compare result to the full expected dictionary payload
or assert result["result"] equals the exact expected string and also validate
any tag lists/sets (e.g., compare sets) to ensure no extra or missing tags are
present; make the assertions deterministic and strict instead of using a simple
substring membership.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 10576b45-ffe5-4ff2-8e2e-b5b3ed969fba
📒 Files selected for processing (4)
- Gradata/src/gradata/hooks/implicit_feedback.py
- Gradata/src/gradata/hooks/inject_brain_rules.py
- Gradata/src/gradata/hooks/jit_inject.py
- Gradata/tests/test_hooks_intelligence.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest (py3.12)
- GitHub Check: pytest (py3.11)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
Gradata/src/gradata/hooks/inject_brain_rules.py
🔇 Additional comments (4)
Gradata/src/gradata/hooks/inject_brain_rules.py (1)
167-170: Good default restoration for full wisdom context. Keeping Active guidance/disposition by default here matches the revert objective and preserves expected session-start behavior.
Gradata/src/gradata/hooks/jit_inject.py (2)
69-70: Defaults are correctly restored for broader JIT coverage. `DEFAULT_MAX_RULES=5` and `DEFAULT_MIN_CONFIDENCE=0.60` are consistent with the stated revert intent.
366-380: Compact `[state+confidence] description` emission looks good. This restores a useful ranking signal for the model while keeping output concise, and it preserves existing dedup behavior.
Gradata/tests/test_hooks_intelligence.py (1)
446-447: Good contract coverage for explicit signal payloads. The exact assertions on Line 446 and Line 457 correctly lock the new `implicit_feedback` return contract (`[fb:neg]`/`[fb:rem]`). Also applies to: 457-457
```python
if signals:
    _SIG_ABBREV = {
        "negation": "neg",
        "reminder": "rem",
        "challenge": "chal",
        "approval": "approv",
        "gap": "gap",
    }
    sig_str = ",".join(_SIG_ABBREV.get(str(s["type"]), str(s["type"])) for s in signals)
    return {"result": f"[fb:{sig_str}]"}
```
Resolve conflicting approval + negative signals before emitting/returning feedback.
Line 205 currently returns all detected signal types, but the current flow can classify a single message as both negative and approval (e.g., challenge phrasing containing “that’s correct”). That can emit OUTPUT_ACCEPTED (Line 188) and also return negative feedback in the same turn, which is contradictory.
Suggested fix
```diff
@@
-    has_negative = bool(signal_types & _NEGATIVE_SIGNAL_TYPES)
-    has_approval = "approval" in signal_types
+    has_negative = bool(signal_types & _NEGATIVE_SIGNAL_TYPES)
+    has_approval = "approval" in signal_types
+
+    # Negative feedback must take precedence over approval to avoid
+    # contradictory acceptance + correction in the same message.
+    if has_negative and has_approval:
+        signals = [s for s in signals if s["type"] != "approval"]
+        signal_types.discard("approval")
+        has_approval = False
```
+ has_approval = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/hooks/implicit_feedback.py` around lines 205 - 214, The
code can emit both an approval and negative feedback at once; update the signals
handling in implicit_feedback.py (the block that builds _SIG_ABBREV and sig_str
and returns {"result": ...}) to resolve conflicts by preferring approval: if any
signal with type "approval" exists, filter out negative signal types (e.g.,
"negation", "challenge", "gap") before constructing sig_str so you won't return
negative feedback alongside OUTPUT_ACCEPTED; keep the existing _SIG_ABBREV
mapping and sig_str construction but operate on a cleaned signals list (or set)
so the return only includes the resolved signal types.
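The precedence rule in the suggested fix above can be exercised standalone. This is a minimal sketch: `resolve_signals` and the signal-dict shape are illustrative, not the actual implicit_feedback.py API.

```python
# Signal types treated as negative feedback (assumed set, mirroring the review).
_NEGATIVE_SIGNAL_TYPES = {"negation", "challenge", "gap"}

def resolve_signals(signals: list[dict]) -> list[dict]:
    """Drop approval signals when any negative signal is also present,
    so a single message never yields acceptance plus a correction."""
    types = {s["type"] for s in signals}
    if types & _NEGATIVE_SIGNAL_TYPES and "approval" in types:
        return [s for s in signals if s["type"] != "approval"]
    return signals

mixed = [{"type": "challenge"}, {"type": "approval"}]
print(resolve_signals(mixed))  # -> [{'type': 'challenge'}]
```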
```diff
 # which address the highest-stakes errors. Mid-tier rules fire via JIT when
 # contextually relevant and are retrievable via brain.search(). Saves ~59 tok.
-wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "3"))
+wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "9"))
```
Harden GRADATA_WISDOM_MAX_RULES parsing to prevent SessionStart breakage.
Line 185 uses raw int(...) on environment input. A malformed value (e.g. "abc") raises ValueError and can abort injection instead of degrading safely.
Proposed defensive parse + clamp
```diff
-wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "9"))
+raw_max_rules = os.environ.get("GRADATA_WISDOM_MAX_RULES", "9").strip()
+try:
+    wisdom_max_rules = max(0, int(raw_max_rules))
+except ValueError:
+    wisdom_max_rules = 9
```

Based on learnings: _env_int default clamping to minimum.
```diff
 with patch("gradata.hooks.implicit_feedback.emit_hook_event") as mock_emit:
     result = feedback_main({"message": "Are you sure that's correct? It doesn't look right."})
-    assert result is None
+    assert result is not None and "chal" in result["result"]
```
🧹 Nitpick | 🔵 Trivial
Tighten these assertions to prevent false-positive passes.
Line 468 and Line 486 currently allow broad outputs, so conflicting or extra tags can slip through unnoticed. Prefer exact expected payloads for deterministic regression checks.
Suggested test tightening
```diff
-    assert result is not None and "chal" in result["result"]
+    assert result == {"result": "[fb:chal]"}
@@
-    assert result is not None and result["result"].startswith("[fb:")
+    assert result == {"result": "[fb:rem,chal]"}
```

Also applies to: 486-486
Critical:
- cloud/sync.py: fix double /api/v1 prefix on telemetry + corpus paths

Major:
- cli.py: resolve brain_root once for skill export consistency
- skill_export.py: escape backslashes in YAML descriptions
- skill_export.py: whitespace-only desc falls back to auto
- implicit_feedback.py: negative signals win over approval on conflict
- inject_brain_rules.py: harden MAX_RULES int parse against malformed env

Tests:
- update assertions for corrected /telemetry + /corpus paths
- add regression coverage for YAML backslash/newline/whitespace
- tighten loose assertions in hooks_intelligence

Co-authored-by: Oliver <oliver@spritesai.com>
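The YAML backslash-escaping fix mentioned above can be approximated as follows. `yaml_escape_description` is a hypothetical helper, not the actual skill_export.py code; it relies on JSON string syntax being a valid subset of YAML 1.2 double-quoted scalars:

```python
import json

def yaml_escape_description(desc: str) -> str:
    """Escape a description for embedding as a double-quoted YAML scalar.

    Hypothetical sketch of the skill_export.py behavior: whitespace-only
    descriptions signal the caller to fall back to an auto-generated one,
    and json.dumps handles backslash/newline/quote escaping.
    """
    if not desc.strip():
        return ""  # whitespace-only desc falls back to auto (caller's job)
    return json.dumps(desc)

print(yaml_escape_description("path C:\\tmp"))  # -> "path C:\\tmp"
```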
Summary
PR #136 advertised a "99.2% reduction (5513→42)" but stacked legit format compressions on top of 6 knob-cuts that quietly removed product behavior. This PR undoes the 6 defeaturing cuts while keeping all legit compressions (frontmatter strips, dedup, compact [P83] prefix, snippet/top_k tuning, etc.) and the synthesizer from PR #140.
What was defeatured (now restored)
| Knob | Defeatured (PR #136) | Restored (this PR) |
| --- | --- | --- |
| `GRADATA_WISDOM_MAX_RULES` default | 3 | 9 |
| `GRADATA_WISDOM_FULL` default | 0 | 1 |
| JIT `DEFAULT_MAX_RULES` | 1 | 5 |
| JIT `DEFAULT_MIN_CONFIDENCE` | 0.90 | 0.60 |
| JIT output format | `description` only | `[P83] description` (state + confidence) |
| `implicit_feedback` return | `None` (signals only logged) | `{"result": "[fb:neg,rem]"}` (model sees signal) |

Measurements (tiktoken cl100k_base, typical scenario: once + 10·per_turn)
| Point | Weighted tokens | Notes |
| --- | --- | --- |
| Baseline (da6bed43, verify-script introduction) | 5513 | |
| d3721320 | 1724 | last clean legit compression = 69% honest reduction |
| Post-revert (this PR) | 1179 | session_once=154, per_turn=102.5 → 79% honest reduction |

The synthesizer (PR #140) legitimately compresses N rules-lines into prose, which is why post-revert lands at 1179 (better than d372132's 1724) without any knob cuts.
Test plan
- `pytest tests/` → 3931 passed, 2 skipped
- `context_inject.main()` returns 401-char result; `inject_brain_rules.main()` returns 227-char MUST block + Active guidance + disposition
- `implicit_feedback` returns `{"result": "[fb:neg]"}` on "No, that's wrong"
- `[P83] description` format with top-5 rules at 0.60 threshold

Generated with Gradata