
revert(autoresearch): undo 6 defeaturing knob-cuts from PR #136#141

Merged
Gradata merged 1 commit into main from
fix/revert-autoresearch-defeaturing
May 1, 2026

Conversation

@Gradata
Owner

@Gradata Gradata commented Apr 22, 2026

Summary

PR #136 advertised a "99.2% reduction (5513→42)" but stacked legit format compressions on top of 6 knob-cuts that quietly removed product behavior. This PR undoes the 6 defeaturing cuts while keeping all legit compressions (frontmatter strips, dedup, compact [P83] prefix, snippet/top_k tuning, etc.) and the synthesizer from PR #140.

What was defeatured (now restored)

| Knob | Defeatured | Restored |
| --- | --- | --- |
| `GRADATA_WISDOM_MAX_RULES` default | 3 | 9 |
| `GRADATA_WISDOM_FULL` default | 0 (strip Active guidance/disposition) | 1 (keep them) |
| JIT `DEFAULT_MAX_RULES` | 1 | 5 |
| JIT `DEFAULT_MIN_CONFIDENCE` | 0.90 | 0.60 |
| JIT line format | description only | `[P83] description` (state + confidence) |
| `implicit_feedback` return | `None` (signals only logged) | `{"result": "[fb:neg,rem]"}` (model sees signal) |
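For illustration, the restored `implicit_feedback` contract can be sketched in isolation. `feedback_result` and the signal dicts are hypothetical stand-ins for the hook's internals; only the abbreviation mapping and the `[fb:...]` format come from this PR:

```python
# Sketch of the restored implicit_feedback return contract: detected
# signal types are abbreviated and joined into a compact "[fb:...]"
# string the model can see, instead of returning None.
_SIG_ABBREV = {
    "negation": "neg",
    "reminder": "rem",
    "challenge": "chal",
    "approval": "approv",
    "gap": "gap",
}

def feedback_result(signals):
    if not signals:
        return None  # no signals detected: stay silent
    sig_str = ",".join(_SIG_ABBREV.get(s["type"], s["type"]) for s in signals)
    return {"result": f"[fb:{sig_str}]"}

print(feedback_result([{"type": "negation"}, {"type": "reminder"}]))
# {'result': '[fb:neg,rem]'}
```

A "No, that's wrong" message would carry a single negation signal and yield `[fb:neg]`, matching the test plan below.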

Measurements (tiktoken cl100k_base, typical scenario: once + 10·per_turn)

  • 5513 — baseline (da6bed43, verify-script introduction)
  • 1724 — d372132 (last clean legit compression = 69% honest reduction)
  • 864 — pre-revert main (84% — but defeatured)
  • 1179 — this PR (79% honest reduction, all 6 features restored)
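The weighted figure is plain arithmetic over the stated scenario (one session-start injection plus ten per-turn injections); a minimal sketch, with the helper name invented for illustration:

```python
# Sketch of the weighted-token metric: one session-start injection
# plus ten per-turn injections (the "once + 10*per_turn" scenario).
def weighted_tokens(session_once: float, per_turn: float, turns: int = 10) -> float:
    return session_once + turns * per_turn

# Post-revert figures reported in this PR:
print(weighted_tokens(154, 102.5))     # 1179.0
print(round((1 - 1179 / 5513) * 100))  # 79 (% reduction vs baseline)
```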

The synthesizer (PR #140) legitimately compresses N rule lines into prose, which is why post-revert lands at 1179 (better than d372132's 1724) without any knob cuts.

Test plan

  • pytest tests/ → 3931 passed, 2 skipped
  • Hook smoke: context_inject.main() returns 401-char result; inject_brain_rules.main() returns 227-char MUST block + Active guidance + disposition
  • implicit_feedback returns {"result": "[fb:neg]"} on "No, that's wrong"
  • JIT emits [P83] description format with top-5 rules at 0.60 threshold

Generated with Gradata

PR #136 "99.2% reduction (5513→42)" stacked legit format compressions
(strip YAML/XML wrappers, dedup, compact [P:0.83]→[P83], snippet/top_k
tuning) on top of 6 knob-cuts that quietly removed product behavior:

- GRADATA_WISDOM_MAX_RULES default 3 → 9 (undo 0bb2de9 + 5eabc48)
- GRADATA_WISDOM_FULL default 0 → 1 (undo d387de9 Active guidance strip)
- JIT DEFAULT_MAX_RULES 1 → 5 (undo 4a44+9582+dfab)
- JIT DEFAULT_MIN_CONFIDENCE 0.90 → 0.60 (undo 699827a)
- Restore [Pxx] state+confidence prefix on JIT output (undo 50b63d1)
- Restore [fb:neg,rem] implicit_feedback signal injection (undo 61b43c8)

Honest milestone: d372132 (last pure-compression commit) measured 1724
weighted tokens vs 5513 baseline = 69% reduction. The further jump to
42 came from defeaturing, not compression.

Post-revert measurement with synthesizer (PR #140) stacked:
  weighted=1179, session_once=154, per_turn=102.5
  = 79% honest reduction vs 5513 baseline, all 6 features restored.

Test updates: 3 implicit_feedback tests now assert returned signal
strings instead of None.

Co-Authored-By: Gradata <noreply@gradata.ai>


@coderabbitai

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Summary

  • Reverts six defeaturing knob changes from PR #136: restores feature defaults while maintaining format compressions from that PR and the synthesizer from PR #140
  • implicit_feedback return behavior change: now returns {"result": "[fb:neg,rem,chal,approv,gap]"} instead of None when signals detected, enabling model visibility into feedback signals
  • GRADATA_WISDOM_MAX_RULES: default increased from 3 to 9, expanding non-negotiable rule inclusion
  • GRADATA_WISDOM_FULL: default changed from 0 to 1, keeping "Active guidance" and "Current disposition" sections
  • JIT DEFAULT_MAX_RULES: increased from 1 to 5 rules
  • JIT DEFAULT_MIN_CONFIDENCE: lowered from 0.90 to 0.60 threshold
  • JIT rule formatting: rules now emit with [Pxx] prefix (state abbreviation + zero-padded confidence %), replacing description-only format
  • Test updates: assertions updated to verify non-None feedback returns with signal abbreviations
  • Token efficiency: achieves 79% reduction vs baseline (1179 tokens vs 5513) while restoring features
  • All tests pass: 3931 passed, 2 skipped; hook smoke tests confirm expected outputs

Breaking change: implicit_feedback() return value changed from None to structured feedback dict

Walkthrough

Multiple hook functions in the gradata library are updated: implicit feedback detection now returns inline formatted signal results instead of None; brain rule injection defaults expand rule retention and maximum rule count; JIT rule injection parameters are relaxed to include more candidates with revised confidence thresholding and rule formatting; tests are updated to verify the new return behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Implicit Feedback Hook: `Gradata/src/gradata/hooks/implicit_feedback.py`, `Gradata/tests/test_hooks_intelligence.py` | The `main()` function now returns `{"result": "[fb:<sig_str>]"}` when implicit feedback signals are detected (mapping signal types to abbreviations: negation→neg, reminder→rem, challenge→chal, approval→approv, gap→gap), instead of returning `None`. Test assertions updated to validate the new return payloads. |
| Brain Rules Injection: `Gradata/src/gradata/hooks/inject_brain_rules.py` | Default behavior for section retention inverted: "Active guidance" and "Current disposition" sections now kept by default unless `GRADATA_WISDOM_FULL=0`. Maximum non-negotiable rule lines cap increased from 3 to 9 via the `GRADATA_WISDOM_MAX_RULES` environment variable default. |
| JIT Rule Injection: `Gradata/src/gradata/hooks/jit_inject.py` | `DEFAULT_MAX_RULES` increased from 1 to 5 and `DEFAULT_MIN_CONFIDENCE` decreased from 0.90 to 0.60, broadening candidate rule selection. Rule output formatting now includes a bracketed prefix with abbreviated state (P/I/R) and zero-padded confidence percentage instead of raw description. |
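The restored JIT line format can be sketched as follows. `format_rule` and the state names are illustrative stand-ins; only the P/I/R abbreviation and the zero-padded confidence percentage come from the summary above:

```python
# Sketch of the restored JIT line format: "[P83] description", where
# the bracketed prefix packs an abbreviated rule state (P/I/R) and a
# zero-padded confidence percentage. State names here are assumptions.
_STATE_ABBREV = {"proven": "P", "investigating": "I", "rejected": "R"}

def format_rule(state: str, confidence: float, description: str) -> str:
    pct = f"{round(confidence * 100):02d}"  # 0.83 -> "83", 0.05 -> "05"
    return f"[{_STATE_ABBREV.get(state, '?')}{pct}] {description}"

print(format_rule("proven", 0.83, "Prefer env-first brain resolution"))
# [P83] Prefer env-first brain resolution
```

The two-character prefix is what the PR description abbreviates as `[P83]`: one state letter plus a two-digit confidence.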

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

feature

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies this as a revert of 6 defeaturing knob-cuts from PR #136, which directly aligns with the main objective of restoring removed product behavior while keeping legitimate compressions. |
| Description check | ✅ Passed | The description provides comprehensive context on the six defeaturing changes being reverted, includes a detailed comparison table, token measurements, and test plan results that all relate directly to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot added the feature label Apr 22, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/src/gradata/hooks/implicit_feedback.py`:
- Around lines 205-214: The code can emit both an approval and negative feedback
at once; update the signals handling in implicit_feedback.py (the block that
builds _SIG_ABBREV and sig_str and returns {"result": ...}) to resolve conflicts
by preferring negative feedback: if any negative signal type (e.g., "negation",
"challenge", "gap") is present, filter out "approval" before constructing
sig_str so you won't return OUTPUT_ACCEPTED alongside a correction; keep
the existing _SIG_ABBREV mapping and sig_str construction but operate on a
cleaned signals list (or set) so the return only includes the resolved signal
types.

In `@Gradata/src/gradata/hooks/inject_brain_rules.py`:
- Line 185: Replace the raw int(...) parsing of GRADATA_WISDOM_MAX_RULES used to
set wisdom_max_rules with a defensive parse: catch ValueError/TypeError, fall
back to the default (9), and clamp the resulting value to a safe minimum (e.g.,
0 or 1) and optionally an upper bound; you can reuse or implement a small helper
like _env_int to perform parse-with-default-and-clamp. Update the code that
references wisdom_max_rules (in inject_brain_rules.py / SessionStart injection)
to use this safe value so malformed env input won’t raise and abort injection.

In `@Gradata/tests/test_hooks_intelligence.py`:
- Line 468: The current assertion "assert result is not None and 'chal' in
result['result']" is too loose and can yield false positives; update the test in
test_hooks_intelligence (the assertion referencing the variable result and its
"result" key, and the similar assertion around line 486) to assert exact
expected outputs — either compare result to the full expected dictionary payload
or assert result["result"] equals the exact expected string and also validate
any tag lists/sets (e.g., compare sets) to ensure no extra or missing tags are
present; make the assertions deterministic and strict instead of using a simple
substring membership.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10576b45-ffe5-4ff2-8e2e-b5b3ed969fba

📥 Commits

Reviewing files that changed from the base of the PR and between 129c83f and db05e08.

📒 Files selected for processing (4)
  • Gradata/src/gradata/hooks/implicit_feedback.py
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/tests/test_hooks_intelligence.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest (py3.12)
  • GitHub Check: pytest (py3.11)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Applied to files:

  • Gradata/src/gradata/hooks/inject_brain_rules.py
🔇 Additional comments (4)
Gradata/src/gradata/hooks/inject_brain_rules.py (1)

167-170: Good default restoration for full wisdom context.

Keeping Active guidance/disposition by default here matches the revert objective and preserves expected session-start behavior.

Gradata/src/gradata/hooks/jit_inject.py (2)

69-70: Defaults are correctly restored for broader JIT coverage.

DEFAULT_MAX_RULES=5 and DEFAULT_MIN_CONFIDENCE=0.60 are consistent with the stated revert intent.


366-380: Compact [state+confidence] description emission looks good.

This restores a useful ranking signal for the model while keeping output concise, and it preserves existing dedup behavior.

Gradata/tests/test_hooks_intelligence.py (1)

446-447: Good contract coverage for explicit signal payloads.

The exact assertions on Line 446 and Line 457 correctly lock the new implicit_feedback return contract ([fb:neg] / [fb:rem]).

Also applies to: 457-457

Comment on lines +205 to +214
if signals:
    _SIG_ABBREV = {
        "negation": "neg",
        "reminder": "rem",
        "challenge": "chal",
        "approval": "approv",
        "gap": "gap",
    }
    sig_str = ",".join(_SIG_ABBREV.get(str(s["type"]), str(s["type"])) for s in signals)
    return {"result": f"[fb:{sig_str}]"}


⚠️ Potential issue | 🟠 Major

Resolve conflicting approval + negative signals before emitting/returning feedback.

Line 205 currently returns all detected signal types, but the current flow can classify a single message as both negative and approval (e.g., challenge phrasing containing “that’s correct”). That can emit OUTPUT_ACCEPTED (Line 188) and also return negative feedback in the same turn, which is contradictory.

Suggested fix
@@
-        has_negative = bool(signal_types & _NEGATIVE_SIGNAL_TYPES)
-        has_approval = "approval" in signal_types
+        has_negative = bool(signal_types & _NEGATIVE_SIGNAL_TYPES)
+        has_approval = "approval" in signal_types
+
+        # Negative feedback must take precedence over approval to avoid
+        # contradictory acceptance + correction in the same message.
+        if has_negative and has_approval:
+            signals = [s for s in signals if s["type"] != "approval"]
+            signal_types.discard("approval")
+            has_approval = False
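As a standalone sketch of the precedence rule in the suggested fix (negative feedback wins over approval, matching the follow-up commit that merged this behavior); `resolve_conflicts` and the signal shapes are assumptions, not the module's actual API:

```python
# Sketch: drop "approval" whenever any negative signal type is present,
# so a single message never yields acceptance plus a correction.
_NEGATIVE_SIGNAL_TYPES = {"negation", "challenge", "gap"}

def resolve_conflicts(signals):
    types = {s["type"] for s in signals}
    if types & _NEGATIVE_SIGNAL_TYPES and "approval" in types:
        return [s for s in signals if s["type"] != "approval"]
    return signals

mixed = [{"type": "challenge"}, {"type": "approval"}]
print([s["type"] for s in resolve_conflicts(mixed)])  # ['challenge']
```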

 # which address the highest-stakes errors. Mid-tier rules fire via JIT when
 # contextually relevant and are retrievable via brain.search(). Saves ~59 tok.
-wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "3"))
+wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "9"))


⚠️ Potential issue | 🟠 Major

Harden GRADATA_WISDOM_MAX_RULES parsing to prevent SessionStart breakage.

Line 185 uses raw int(...) on environment input. A malformed value (e.g. "abc") raises ValueError and can abort injection instead of degrading safely.

Proposed defensive parse + clamp
-    wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "9"))
+    raw_max_rules = os.environ.get("GRADATA_WISDOM_MAX_RULES", "9").strip()
+    try:
+        wisdom_max_rules = max(0, int(raw_max_rules))
+    except ValueError:
+        wisdom_max_rules = 9

Based on learnings: _env_int default clamping to minimum.
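A minimal sketch of such an `_env_int`-style helper; the name, signature, and clamping bounds are assumptions following the parse-with-default-and-clamp pattern described above:

```python
# Sketch of an _env_int-style helper: parse an integer env var
# defensively, falling back to the default on malformed input
# (e.g. "abc") and clamping to a minimum, so bad input degrades
# instead of aborting SessionStart injection.
import os

def env_int(name: str, default: int, minimum: int = 0) -> int:
    raw = os.environ.get(name, str(default)).strip()
    try:
        value = int(raw)
    except (TypeError, ValueError):
        value = default
    return max(minimum, value)

os.environ["GRADATA_WISDOM_MAX_RULES"] = "abc"
print(env_int("GRADATA_WISDOM_MAX_RULES", 9))  # 9 (fallback)
os.environ["GRADATA_WISDOM_MAX_RULES"] = "-3"
print(env_int("GRADATA_WISDOM_MAX_RULES", 9))  # 0 (clamped)
```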


 with patch("gradata.hooks.implicit_feedback.emit_hook_event") as mock_emit:
     result = feedback_main({"message": "Are you sure that's correct? It doesn't look right."})
-    assert result is None
+    assert result is not None and "chal" in result["result"]


🧹 Nitpick | 🔵 Trivial

Tighten these assertions to prevent false-positive passes.

Line 468 and Line 486 currently allow broad outputs, so conflicting or extra tags can slip through unnoticed. Prefer exact expected payloads for deterministic regression checks.

Suggested test tightening
-    assert result is not None and "chal" in result["result"]
+    assert result == {"result": "[fb:chal]"}
@@
-    assert result is not None and result["result"].startswith("[fb:")
+    assert result == {"result": "[fb:rem,chal]"}

Also applies to: 486-486


@Gradata Gradata merged commit 8f57c14 into main May 1, 2026
9 checks passed
@Gradata Gradata deleted the fix/revert-autoresearch-defeaturing branch May 1, 2026 15:31
Gradata added a commit that referenced this pull request May 1, 2026
Critical:
- cloud/sync.py: fix double /api/v1 prefix on telemetry + corpus paths

Major:
- cli.py: resolve brain_root once for skill export consistency
- skill_export.py: escape backslashes in YAML descriptions
- skill_export.py: whitespace-only desc falls back to auto
- implicit_feedback.py: negative signals win over approval on conflict
- inject_brain_rules.py: harden MAX_RULES int parse against malformed env

Tests:
- update assertions for corrected /telemetry + /corpus paths
- add regression coverage for YAML backslash/newline/whitespace
- tighten loose assertions in hooks_intelligence

Co-authored-by: Oliver <oliver@spritesai.com>