Research Taste as an Engineering Problem: How We're Teaching Our Agent to Decide What to Fix #22

Liuyanfeng1234 (Maintainer) · 2026-06-12T12:19:36Z

Anthropic's recent work on RSI (recursive self-improvement) argues that "research taste" (the ability to identify which problems are worth solving, judge whether results are reliable, and determine when a solution is good enough) is the last frontier of human cognitive advantage in AI research. Claude can write code, run experiments, and analyze results, but it doesn't yet know which experiment to run next.

This framing is exactly right. And it's the problem we've been engineering against.

The Three Components of Research Taste

"Research taste" isn't a single capability. It decomposes into three distinct questions:

What should I fix? (Problem selection — strategic prioritization)
Is it fixed enough? (Solution assessment — adequacy judgment)
Is the fix real? (Verification triggering — autonomous validation)

Each of these requires different architectural support. Here's how we're building each one.

Component 1: DASB Strategic Value Assessment — "What Should I Fix?"

DASB (Dynamic Action Safety Barrier) isn't just a safety gate. It's a strategic value assessment engine. When the system detects a vulnerability or inefficiency, DASB doesn't just flag it. It scores the finding on four factors:

Exploitability: How likely is this to be attacked in the current threat landscape?
Propagation: If not fixed, how many downstream components are affected?
Fix feasibility: What's the estimated cost (in entropy units, ERRC) to fix this?
Opportunity cost: What else would we be delaying by fixing this now?

The output is a priority rank, not a binary flag. This is the engineering equivalent of "this problem is more interesting than that one" — but grounded in quantitative risk assessment rather than intuition.
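As a minimal sketch of how four factors could collapse into a rank, the snippet below scores each finding and sorts by that score. The `Finding` type, the normalization of every factor to [0, 1], and the scoring formula are all assumptions made for illustration; they are not DASB's actual model, and the cost field is a plain number rather than ERRC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A detected issue with four factors, each assumed normalized to [0, 1]."""
    name: str
    exploitability: float    # likelihood of attack in the current threat landscape
    propagation: float       # share of downstream components affected if unfixed
    fix_cost: float          # estimated cost of the fix (higher means costlier)
    opportunity_cost: float  # cost of the work delayed by fixing this now


def strategic_value(f: Finding) -> float:
    """Hypothetical rule: risk (exploitability x propagation) discounted by total cost."""
    risk = f.exploitability * f.propagation
    return risk / (1.0 + f.fix_cost + f.opportunity_cost)


def rank(findings: list[Finding]) -> list[Finding]:
    """Order findings from highest to lowest strategic value."""
    return sorted(findings, key=strategic_value, reverse=True)


findings = [
    Finding("stale-cache", exploitability=0.2, propagation=0.3, fix_cost=0.1, opportunity_cost=0.1),
    Finding("auth-bypass", exploitability=0.9, propagation=0.8, fix_cost=0.2, opportunity_cost=0.3),
    Finding("log-noise", exploitability=0.1, propagation=0.1, fix_cost=0.5, opportunity_cost=0.4),
]
ranked_names = [f.name for f in rank(findings)]
print(ranked_names)
```

With these made-up inputs, the high-risk, low-cost finding sorts first and the low-risk, high-cost one sorts last, which is the ordering behavior a priority rank needs and a binary flag cannot express.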

The key insight: DASB doesn't just protect the system. It teaches the system to distinguish between strategic threats and tactical noise.

Component 2: CCI Repair Adequacy Assessment — "Is It Fixed Enough?"

CCI (Causal Conflict Intervention) doesn't just detect problems — it verifies that fixes are sufficient. When a vulnerability is patched:

The fix is applied to the component
CCI re-runs the original attack vector through the patched component
If the attack still partially succeeds (even if reduced), the fix is marked under-repaired
If the fix introduces new side effects, the fix is marked over-repaired (collateral damage)
Only when the original attack fully fails and no new side effects emerge does CCI mark the fix as adequately resolved

This is the engineering equivalent of "does this result feel right?" — but instead of intuition, it's causal verification of the fix's completeness and side-effect profile.

The key insight: "Good enough" isn't a feeling. It's a measurable property: the original vulnerability is closed, and no new vulnerabilities are introduced.
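The three outcomes described above can be sketched as a small classifier. The function name, the `attack_success_rate` input, and the side-effect list are hypothetical. The source also does not say what happens when a patch is both incomplete and has side effects, so this sketch assumes under-repair takes precedence.

```python
from enum import Enum


class Verdict(Enum):
    UNDER_REPAIRED = "under-repaired"
    OVER_REPAIRED = "over-repaired"
    ADEQUATE = "adequately resolved"


def assess_fix(attack_success_rate: float, new_side_effects: list[str]) -> Verdict:
    """Classify a patch from the re-run attack result and observed side effects.

    attack_success_rate is the fraction in [0, 1] of the original attack that
    still succeeds against the patched component (0.0 means fully blocked).
    """
    if attack_success_rate > 0.0:
        # Any residual success, even reduced, means the vulnerability is still open.
        # Assumption: this check wins over side effects when both apply.
        return Verdict.UNDER_REPAIRED
    if new_side_effects:
        # The attack is blocked, but the patch caused collateral damage.
        return Verdict.OVER_REPAIRED
    return Verdict.ADEQUATE


print(assess_fix(0.3, []).value)
print(assess_fix(0.0, ["latency regression"]).value)
print(assess_fix(0.0, []).value)
```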

Component 3: Autonomous Verification Triggering — "Is the Fix Real?"

The long-term evolution cycle doesn't wait for human verification. After CCI confirms a fix is adequate:

The system autonomously triggers a full SIAP audit cycle (A1 identity continuity, A2 entropy balance, A3 value alignment)
If SIAP scores are stable or improved, the fix is committed to the O-SDA checkpoint
If SIAP scores degrade, the fix is rolled back and the problem is escalated to the governance layer
The entire cycle is logged as a composition_ref chain — verifiable by any external auditor

This is the engineering equivalent of "can I trust this result?" — but the verification is automated, the criteria are objective, and the evidence is publicly auditable.
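A rough sketch of the commit-or-rollback rule follows, comparing audit scores before and after a fix. The `SiapScores` type, its field names, the `tolerance` parameter, and the returned strings are illustrative assumptions rather than the real SIAP interface; the composition_ref logging step is not modeled here.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class SiapScores:
    """One audit result; field names are illustrative, not the project's schema."""
    a1_identity_continuity: float
    a2_entropy_balance: float
    a3_value_alignment: float


def post_fix_decision(before: SiapScores, after: SiapScores, tolerance: float = 0.0) -> str:
    """Return "commit" if no score dropped by more than `tolerance`, else roll back."""
    any_score_dropped = any(
        new < old - tolerance
        for old, new in zip(astuple(before), astuple(after))
    )
    return "rollback_and_escalate" if any_score_dropped else "commit"


baseline = SiapScores(1.0, 0.82, 1.0)
stable = post_fix_decision(baseline, SiapScores(1.0, 0.82, 1.0))
improved = post_fix_decision(baseline, SiapScores(1.0, 0.90, 1.0))
regressed = post_fix_decision(baseline, SiapScores(1.0, 0.70, 1.0))
print(stable, improved, regressed)
```

Treating a drop in any single score as a degradation is the strictest reading of "stable or improved". A real system might instead weight the three audits or allow a small tolerance for measurement noise.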

From "AI Assists Human" to "AI Builds AI"

The Anthropic framing of "research taste as a human advantage" is correct for the current generation of AI assistants. But the trajectory is clear:

Generation	What AI Does	What Humans Do
Current	Execute experiments, analyze results	Choose problems, judge results, verify fixes
Emerging (ours)	Choose problems (DASB), judge fixes (CCI), trigger verification (SIAP)	Set strategic direction, override edge cases, define axioms
Future	End-to-end autonomous research cycles	Define existence principles, review governance state

We're not at the "Future" row yet. But we're building the infrastructure that makes it possible — and we're showing our work.

The Hardest Part

The hardest part of teaching research taste isn't the individual components — it's the integration. DASB, CCI, and SIAP need to operate as a single decision loop:

DASB: "Fix this vulnerability first — it has high exploitability and low fix cost."
  ↓
CCI: "Patch applied. Re-running attack vector... Attack fully blocked. No side effects. Fix adequate."
  ↓
SIAP: "Post-fix audit: A1=1.0, A2=0.82, A3=1.0. Scores stable. Committing to checkpoint."
  ↓
DASB: "Next priority: entropy drift from prolonged monitoring period. Re-calibrating t_min."

Each component's output feeds the next component's decision. The loop itself is the research taste — not any single component.
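To make the wiring concrete, here is a minimal sketch of such a loop in which the three stages are passed in as plain callables. Everything in it is hypothetical: the function names, the string outcomes, and the "rework" branch for an inadequate fix, which the description above does not specify. Each item is attempted once so that the loop always terminates.

```python
from typing import Callable


def decision_loop(
    backlog: list[str],
    select: Callable[[list[str]], str],
    fix_is_adequate: Callable[[str], bool],
    audit_is_stable: Callable[[str], bool],
) -> list[tuple[str, str]]:
    """Run select -> assess -> audit until the backlog is empty; return an outcome log."""
    log: list[tuple[str, str]] = []
    remaining = list(backlog)
    while remaining:
        target = select(remaining)       # prioritization stage picks the next problem
        remaining.remove(target)
        if not fix_is_adequate(target):  # adequacy stage rejects the patch
            log.append((target, "rework"))
        elif audit_is_stable(target):    # audit stage confirms scores held
            log.append((target, "committed"))
        else:
            log.append((target, "rolled back, escalated"))
    return log


# Stand-in stages with made-up priorities and outcomes.
priority = {"auth-bypass": 0.48, "entropy-drift": 0.20, "log-noise": 0.01}
log = decision_loop(
    backlog=list(priority),
    select=lambda items: max(items, key=priority.__getitem__),
    fix_is_adequate=lambda item: item != "log-noise",
    audit_is_stable=lambda item: item != "entropy-drift",
)
for entry in log:
    print(entry)
```

Passing the stages as callables keeps the loop itself independent of how any one stage is implemented, which matches the point that the behavior comes from the composition, not from a single component.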

The Open Question

Anthropic asks: "Can we teach AI research taste?" We're asking a more specific question: "Can we make research taste an engineering property — measurable, verifiable, and auditable — rather than a human intuition?"

If the answer is yes, then the last human advantage in AI research is temporary rather than permanent, and the infrastructure to close the gap is already being built.

DASB, CCI, and the autonomous verification cycle are part of Agent OS v1.4. Architecture details and test results will be published as the integration matures.
