
revert(autoresearch): undo 6 defeaturing knob-cuts from PR #136#141

Merged
Gradata merged 1 commit into main from
fix/revert-autoresearch-defeaturing
May 1, 2026

Conversation

@Gradata
Owner

@Gradata Gradata commented Apr 22, 2026

Summary

PR #136 advertised a "99.2% reduction (5513→42)" but stacked legit format compressions on top of 6 knob-cuts that quietly removed product behavior. This PR undoes the 6 defeaturing cuts while keeping all legit compressions (frontmatter strips, dedup, compact [P83] prefix, snippet/top_k tuning, etc.) and the synthesizer from PR #140.

What was defeatured (now restored)

| Knob | Defeatured | Restored |
| --- | --- | --- |
| `GRADATA_WISDOM_MAX_RULES` default | 3 | 9 |
| `GRADATA_WISDOM_FULL` default | 0 (strip Active guidance/disposition) | 1 (keep them) |
| JIT `DEFAULT_MAX_RULES` | 1 | 5 |
| JIT `DEFAULT_MIN_CONFIDENCE` | 0.90 | 0.60 |
| JIT line format | description only | `[P83] description` (state + confidence) |
| `implicit_feedback` return | `None` (signals only logged) | `{"result": "[fb:neg,rem]"}` (model sees signal) |
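For illustration, the restored `implicit_feedback` contract can be sketched in isolation. `feedback_result` and the signal dicts are hypothetical stand-ins for the hook's internals; only the abbreviation mapping and the `[fb:...]` format come from this PR:

```python
# Sketch of the restored implicit_feedback return contract: detected
# signal types are abbreviated and joined into a compact "[fb:...]"
# string the model can see, instead of returning None.
_SIG_ABBREV = {
    "negation": "neg",
    "reminder": "rem",
    "challenge": "chal",
    "approval": "approv",
    "gap": "gap",
}

def feedback_result(signals):
    if not signals:
        return None  # no signals detected: stay silent
    sig_str = ",".join(_SIG_ABBREV.get(s["type"], s["type"]) for s in signals)
    return {"result": f"[fb:{sig_str}]"}

print(feedback_result([{"type": "negation"}, {"type": "reminder"}]))
# {'result': '[fb:neg,rem]'}
```

A "No, that's wrong" message would carry a single negation signal and yield `[fb:neg]`, matching the test plan below.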

Measurements (tiktoken cl100k_base, typical scenario: once + 10·per_turn)

  • 5513 — baseline (da6bed43, verify-script introduction)
  • 1724 — d372132 (last clean legit compression = 69% honest reduction)
  • 864 — pre-revert main (84% — but defeatured)
  • 1179 — this PR (79% honest reduction, all 6 features restored)
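The weighted figure is plain arithmetic over the stated scenario (one session-start injection plus ten per-turn injections); a minimal sketch, with the helper name invented for illustration:

```python
# Sketch of the weighted-token metric: one session-start injection
# plus ten per-turn injections (the "once + 10*per_turn" scenario).
def weighted_tokens(session_once: float, per_turn: float, turns: int = 10) -> float:
    return session_once + turns * per_turn

# Post-revert figures reported in this PR:
print(weighted_tokens(154, 102.5))     # 1179.0
print(round((1 - 1179 / 5513) * 100))  # 79 (% reduction vs baseline)
```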

The synthesizer (PR #140) legitimately compresses N rule lines into prose, which is why post-revert lands at 1179 (better than d372132's 1724) without any knob cuts.

Test plan

  • pytest tests/ → 3931 passed, 2 skipped
  • Hook smoke: context_inject.main() returns 401-char result; inject_brain_rules.main() returns 227-char MUST block + Active guidance + disposition
  • implicit_feedback returns {"result": "[fb:neg]"} on "No, that's wrong"
  • JIT emits [P83] description format with top-5 rules at 0.60 threshold

Generated with Gradata

PR #136 "99.2% reduction (5513→42)" stacked legit format compressions
(strip YAML/XML wrappers, dedup, compact [P:0.83]→[P83], snippet/top_k
tuning) on top of 6 knob-cuts that quietly removed product behavior:

- GRADATA_WISDOM_MAX_RULES default 3 → 9 (undo 0bb2de9 + 5eabc48)
- GRADATA_WISDOM_FULL default 0 → 1 (undo d387de9 Active guidance strip)
- JIT DEFAULT_MAX_RULES 1 → 5 (undo 4a44+9582+dfab)
- JIT DEFAULT_MIN_CONFIDENCE 0.90 → 0.60 (undo 699827a)
- Restore [Pxx] state+confidence prefix on JIT output (undo 50b63d1)
- Restore [fb:neg,rem] implicit_feedback signal injection (undo 61b43c8)

Honest milestone: d372132 (last pure-compression commit) measured 1724
weighted tokens vs 5513 baseline = 69% reduction. The further jump to
42 came from defeaturing, not compression.

Post-revert measurement with synthesizer (PR #140) stacked:
  weighted=1179, session_once=154, per_turn=102.5
  = 79% honest reduction vs 5513 baseline, all 6 features restored.

Test updates: 3 implicit_feedback tests now assert returned signal
strings instead of None.

Co-Authored-By: Gradata <noreply@gradata.ai>


@coderabbitai

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Summary

  • Reverts six defeaturing knob changes from PR #136: restores feature defaults while maintaining format compressions from that PR and the synthesizer from PR #140
  • implicit_feedback return behavior change: now returns {"result": "[fb:neg,rem,chal,approv,gap]"} instead of None when signals detected, enabling model visibility into feedback signals
  • GRADATA_WISDOM_MAX_RULES: default increased from 3 to 9, expanding non-negotiable rule inclusion
  • GRADATA_WISDOM_FULL: default changed from 0 to 1, keeping "Active guidance" and "Current disposition" sections
  • JIT DEFAULT_MAX_RULES: increased from 1 to 5 rules
  • JIT DEFAULT_MIN_CONFIDENCE: lowered from 0.90 to 0.60 threshold
  • JIT rule formatting: rules now emit with [Pxx] prefix (state abbreviation + zero-padded confidence %), replacing description-only format
  • Test updates: assertions updated to verify non-None feedback returns with signal abbreviations
  • Token efficiency: achieves 79% reduction vs baseline (1179 tokens vs 5513) while restoring features
  • All tests pass: 3931 passed, 2 skipped; hook smoke tests confirm expected outputs

Breaking change: implicit_feedback() return value changed from None to structured feedback dict

Walkthrough

Multiple hook functions in the gradata library are updated: implicit feedback detection now returns inline formatted signal results instead of None; brain rule injection defaults expand rule retention and maximum rule count; JIT rule injection parameters are relaxed to include more candidates with revised confidence thresholding and rule formatting; tests are updated to verify the new return behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Implicit Feedback Hook: `Gradata/src/gradata/hooks/implicit_feedback.py`, `Gradata/tests/test_hooks_intelligence.py` | The `main()` function now returns `{"result": "[fb:<sig_str>]"}` when implicit feedback signals are detected (mapping signal types to abbreviations: negation→neg, reminder→rem, challenge→chal, approval→approv, gap→gap), instead of returning `None`. Test assertions updated to validate the new return payloads. |
| Brain Rules Injection: `Gradata/src/gradata/hooks/inject_brain_rules.py` | Default behavior for section retention inverted: "Active guidance" and "Current disposition" sections now kept by default unless `GRADATA_WISDOM_FULL=0`. Maximum non-negotiable rule lines cap increased from 3 to 9 via the `GRADATA_WISDOM_MAX_RULES` environment variable default. |
| JIT Rule Injection: `Gradata/src/gradata/hooks/jit_inject.py` | `DEFAULT_MAX_RULES` increased from 1 to 5 and `DEFAULT_MIN_CONFIDENCE` decreased from 0.90 to 0.60, broadening candidate rule selection. Rule output formatting now includes a bracketed prefix with abbreviated state (P/I/R) and zero-padded confidence percentage instead of raw description. |
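The restored JIT line format can be sketched as follows. `format_rule` and the state names are illustrative stand-ins; only the P/I/R abbreviation and the zero-padded confidence percentage come from the summary above:

```python
# Sketch of the restored JIT line format: "[P83] description", where
# the bracketed prefix packs an abbreviated rule state (P/I/R) and a
# zero-padded confidence percentage. State names here are assumptions.
_STATE_ABBREV = {"proven": "P", "investigating": "I", "rejected": "R"}

def format_rule(state: str, confidence: float, description: str) -> str:
    pct = f"{round(confidence * 100):02d}"  # 0.83 -> "83", 0.05 -> "05"
    return f"[{_STATE_ABBREV.get(state, '?')}{pct}] {description}"

print(format_rule("proven", 0.83, "Prefer env-first brain resolution"))
# [P83] Prefer env-first brain resolution
```

The two-character prefix is what the PR description abbreviates as `[P83]`: one state letter plus a two-digit confidence.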

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

feature

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies this as a revert of 6 defeaturing knob-cuts from PR #136, which directly aligns with the main objective of restoring removed product behavior while keeping legitimate compressions. |
| Description check | ✅ Passed | The description provides comprehensive context on the six defeaturing changes being reverted, includes a detailed comparison table, token measurements, and test plan results that all relate directly to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot added the feature label Apr 22, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/src/gradata/hooks/implicit_feedback.py`:
- Around lines 205-214: The code can emit both an approval and negative feedback
at once; update the signals handling in implicit_feedback.py (the block that
builds _SIG_ABBREV and sig_str and returns {"result": ...}) to resolve conflicts
by preferring negative feedback: if any negative signal type (e.g., "negation",
"challenge", "gap") is present, filter out "approval" before constructing
sig_str so you won't return OUTPUT_ACCEPTED alongside a correction; keep
the existing _SIG_ABBREV mapping and sig_str construction but operate on a
cleaned signals list (or set) so the return only includes the resolved signal
types.

In `@Gradata/src/gradata/hooks/inject_brain_rules.py`:
- Line 185: Replace the raw int(...) parsing of GRADATA_WISDOM_MAX_RULES used to
set wisdom_max_rules with a defensive parse: catch ValueError/TypeError, fall
back to the default (9), and clamp the resulting value to a safe minimum (e.g.,
0 or 1) and optionally an upper bound; you can reuse or implement a small helper
like _env_int to perform parse-with-default-and-clamp. Update the code that
references wisdom_max_rules (in inject_brain_rules.py / SessionStart injection)
to use this safe value so malformed env input won’t raise and abort injection.

In `@Gradata/tests/test_hooks_intelligence.py`:
- Line 468: The current assertion "assert result is not None and 'chal' in
result['result']" is too loose and can yield false positives; update the test in
test_hooks_intelligence (the assertion referencing the variable result and its
"result" key, and the similar assertion around line 486) to assert exact
expected outputs — either compare result to the full expected dictionary payload
or assert result["result"] equals the exact expected string and also validate
any tag lists/sets (e.g., compare sets) to ensure no extra or missing tags are
present; make the assertions deterministic and strict instead of using a simple
substring membership.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10576b45-ffe5-4ff2-8e2e-b5b3ed969fba

📥 Commits

Reviewing files that changed from the base of the PR and between 129c83f and db05e08.

📒 Files selected for processing (4)
  • Gradata/src/gradata/hooks/implicit_feedback.py
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/tests/test_hooks_intelligence.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest (py3.12)
  • GitHub Check: pytest (py3.11)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Applied to files:

  • Gradata/src/gradata/hooks/inject_brain_rules.py
🔇 Additional comments (4)
Gradata/src/gradata/hooks/inject_brain_rules.py (1)

167-170: Good default restoration for full wisdom context.

Keeping Active guidance/disposition by default here matches the revert objective and preserves expected session-start behavior.

Gradata/src/gradata/hooks/jit_inject.py (2)

69-70: Defaults are correctly restored for broader JIT coverage.

DEFAULT_MAX_RULES=5 and DEFAULT_MIN_CONFIDENCE=0.60 are consistent with the stated revert intent.


366-380: Compact [state+confidence] description emission looks good.

This restores a useful ranking signal for the model while keeping output concise, and it preserves existing dedup behavior.

Gradata/tests/test_hooks_intelligence.py (1)

446-447: Good contract coverage for explicit signal payloads.

The exact assertions on Line 446 and Line 457 correctly lock the new implicit_feedback return contract ([fb:neg] / [fb:rem]).

Also applies to: 457-457

Comment on lines +205 to +214
if signals:
    _SIG_ABBREV = {
        "negation": "neg",
        "reminder": "rem",
        "challenge": "chal",
        "approval": "approv",
        "gap": "gap",
    }
    sig_str = ",".join(_SIG_ABBREV.get(str(s["type"]), str(s["type"])) for s in signals)
    return {"result": f"[fb:{sig_str}]"}


⚠️ Potential issue | 🟠 Major

Resolve conflicting approval + negative signals before emitting/returning feedback.

Line 205 currently returns all detected signal types, but the current flow can classify a single message as both negative and approval (e.g., challenge phrasing containing “that’s correct”). That can emit OUTPUT_ACCEPTED (Line 188) and also return negative feedback in the same turn, which is contradictory.

Suggested fix
@@
-        has_negative = bool(signal_types & _NEGATIVE_SIGNAL_TYPES)
-        has_approval = "approval" in signal_types
+        has_negative = bool(signal_types & _NEGATIVE_SIGNAL_TYPES)
+        has_approval = "approval" in signal_types
+
+        # Negative feedback must take precedence over approval to avoid
+        # contradictory acceptance + correction in the same message.
+        if has_negative and has_approval:
+            signals = [s for s in signals if s["type"] != "approval"]
+            signal_types.discard("approval")
+            has_approval = False
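As a standalone sketch of the precedence rule in the suggested fix (negative feedback wins over approval, matching the follow-up commit that merged this behavior); `resolve_conflicts` and the signal shapes are assumptions, not the module's actual API:

```python
# Sketch: drop "approval" whenever any negative signal type is present,
# so a single message never yields acceptance plus a correction.
_NEGATIVE_SIGNAL_TYPES = {"negation", "challenge", "gap"}

def resolve_conflicts(signals):
    types = {s["type"] for s in signals}
    if types & _NEGATIVE_SIGNAL_TYPES and "approval" in types:
        return [s for s in signals if s["type"] != "approval"]
    return signals

mixed = [{"type": "challenge"}, {"type": "approval"}]
print([s["type"] for s in resolve_conflicts(mixed)])  # ['challenge']
```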

 # which address the highest-stakes errors. Mid-tier rules fire via JIT when
 # contextually relevant and are retrievable via brain.search(). Saves ~59 tok.
-wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "3"))
+wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "9"))


⚠️ Potential issue | 🟠 Major

Harden GRADATA_WISDOM_MAX_RULES parsing to prevent SessionStart breakage.

Line 185 uses raw int(...) on environment input. A malformed value (e.g. "abc") raises ValueError and can abort injection instead of degrading safely.

Proposed defensive parse + clamp
-    wisdom_max_rules = int(os.environ.get("GRADATA_WISDOM_MAX_RULES", "9"))
+    raw_max_rules = os.environ.get("GRADATA_WISDOM_MAX_RULES", "9").strip()
+    try:
+        wisdom_max_rules = max(0, int(raw_max_rules))
+    except ValueError:
+        wisdom_max_rules = 9

Based on learnings: _env_int default clamping to minimum.
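A minimal sketch of such an `_env_int`-style helper; the name, signature, and clamping bounds are assumptions following the parse-with-default-and-clamp pattern described above:

```python
# Sketch of an _env_int-style helper: parse an integer env var
# defensively, falling back to the default on malformed input
# (e.g. "abc") and clamping to a minimum, so bad input degrades
# instead of aborting SessionStart injection.
import os

def env_int(name: str, default: int, minimum: int = 0) -> int:
    raw = os.environ.get(name, str(default)).strip()
    try:
        value = int(raw)
    except (TypeError, ValueError):
        value = default
    return max(minimum, value)

os.environ["GRADATA_WISDOM_MAX_RULES"] = "abc"
print(env_int("GRADATA_WISDOM_MAX_RULES", 9))  # 9 (fallback)
os.environ["GRADATA_WISDOM_MAX_RULES"] = "-3"
print(env_int("GRADATA_WISDOM_MAX_RULES", 9))  # 0 (clamped)
```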


 with patch("gradata.hooks.implicit_feedback.emit_hook_event") as mock_emit:
     result = feedback_main({"message": "Are you sure that's correct? It doesn't look right."})
-    assert result is None
+    assert result is not None and "chal" in result["result"]


🧹 Nitpick | 🔵 Trivial

Tighten these assertions to prevent false-positive passes.

Line 468 and Line 486 currently allow broad outputs, so conflicting or extra tags can slip through unnoticed. Prefer exact expected payloads for deterministic regression checks.

Suggested test tightening
-    assert result is not None and "chal" in result["result"]
+    assert result == {"result": "[fb:chal]"}
@@
-    assert result is not None and result["result"].startswith("[fb:")
+    assert result == {"result": "[fb:rem,chal]"}

Also applies to: 486-486


@Gradata Gradata merged commit 8f57c14 into main May 1, 2026
9 checks passed
@Gradata Gradata deleted the fix/revert-autoresearch-defeaturing branch May 1, 2026 15:31
Gradata added a commit that referenced this pull request May 1, 2026
Critical:
- cloud/sync.py: fix double /api/v1 prefix on telemetry + corpus paths

Major:
- cli.py: resolve brain_root once for skill export consistency
- skill_export.py: escape backslashes in YAML descriptions
- skill_export.py: whitespace-only desc falls back to auto
- implicit_feedback.py: negative signals win over approval on conflict
- inject_brain_rules.py: harden MAX_RULES int parse against malformed env

Tests:
- update assertions for corrected /telemetry + /corpus paths
- add regression coverage for YAML backslash/newline/whitespace
- tighten loose assertions in hooks_intelligence

Co-authored-by: Oliver <oliver@spritesai.com>