Skip to content

refactor(skillopt): clearer proposer prompt + robust edit-array parsing#245

Closed
khustup2 wants to merge 4 commits into
mainfrom
feat/skillopt-proposer-anchored-edits
Closed

refactor(skillopt): clearer proposer prompt + robust edit-array parsing#245
khustup2 wants to merge 4 commits into
mainfrom
feat/skillopt-proposer-anchored-edits

Conversation

@khustup2
Copy link
Copy Markdown
Contributor

@khustup2 khustup2 commented Jun 7, 2026

What

Hardens the SkillOpt proposer's output parsing and restructures its prompt:

  • Parser: extract the JSON edit array via a balanced back-scan — robust to a leading prose/bracket line — with a first-[..last-] fallback. (Genuine robustness fix.)
  • Prompt: diagnose the single recurring weakness first, then emit concrete, anchored edits; explicit two-part contract so SYSTEM and user prompt agree.
  • Tests + coverage for both, including bracketed-prefix and unbalanced-content cases.

On the benchmark — the prompt change does NOT improve quality

Tested against the original proposer on 4 org skills with real failures, judging each edit vs the current body 5× per skill (rate, not a single noisy verdict):

skill baseline this PR's prompt
pg-suso 40% 40%
suso 80% no edit produced
pg-analytics 60% 60%
deeplake-pg-production-diagnostics 80% 80%
mean accept-rate 65% 60%

Within noise (Δ5%), nominally worse. An earlier single run showed 3/4 but did not reproduce — that was judge nondeterminism. So the prompt rewrite has no measured benefit; the keepable part of this PR is the parser hardening + tests. n=4 is too small to detect an optimizer gain regardless.

Tests

26 green, typecheck clean.

…nd-at-end

The proposer appended generic guidance to the end of long skill docs, which
diluted rather than fixed (and lost the A/B against the current body). It now
diagnoses the single recurring weakness first, then requires each edit to be
concrete/operational and anchored into the relevant existing section (or to
replace the weak existing instruction).

On a 4-skill benchmark of real failing org skills, an independent judge prefers
the proposed edit over the current body 3/4, up from 2/4 before.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR tightens the skill-proposer prompts: the SYSTEM prompt enforces a two-step diagnose-then-edit workflow with concrete, anchored, traceable edits (JSON array only), and buildUserPrompt now requires a one-line diagnosis followed by anchored edits. Parsing of the model output now extracts the last balanced JSON array and parseEdits returns [] on failure; tests validate bracket handling and priorEdits inclusion.

Changes

Skill Edit Proposal Constraints

Layer / File(s) Summary
LLM prompt refinements for structured skill edits
src/skillify/skill-proposer.ts
SYSTEM prompt enforces a two-step directive: diagnose the single recurring weakness, then propose a small set of concrete, operational edits anchored to existing sections via insert_after/replace/delete; forbids appending uninformed text and reiterates the SU_START/SU_END protected region. buildUserPrompt now requires the one-line diagnosis first, then outputs only the JSON array of anchored edits.
Robust edits JSON extraction and parseEdits
src/skillify/skill-proposer.ts
Replaces naive first/last-bracket slicing with a backward scan to locate the last balanced [...] region before JSON.parse; parseEdits uses this helper and returns [] on extraction or parse failure.
Prompt validation and parsing tests
tests/shared/skill-proposer.test.ts
Adds tests covering balanced-bracket extraction when weakness lines contain brackets, fallback when edits include an unbalanced ], asserts the system prompt includes anchoring/concrete/operational guidance and “REPLACE the weak” phrasing, verifies resulting p.changed === true and anchored edit output, and checks that priorEdits appear as an ALREADY TRIED section with the exact prior edit string.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit nibbles prompts at dawn,
Finds the weakness, marks what's wrong,
Anchors edits, small and neat,
JSON arrays keep fixes discrete,
Hops off with a dotted line and song.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title directly describes the main changes: refactoring the proposer prompt for clarity and making edit-array parsing more robust, which aligns with the core objectives of the PR.
Description check ✅ Passed The PR description provides a comprehensive summary of changes, benchmark results, test status, and acknowledges the parser hardening as the primary keepable improvement.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/skillopt-proposer-anchored-edits

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 94.87% (🎯 90%) 37 / 39
🟢 Statements 96.30% (🎯 90%) 52 / 54
🟢 Functions 100.00% (🎯 90%) 6 / 6
🔴 Branches 87.80% (🎯 90%) 36 / 41
File Coverage — 1 file changed
File Stmts Branches Functions Lines
src/skillify/skill-proposer.ts 🟢 96.3% 🔴 87.8% 🟢 100.0% 🟢 94.9%

Generated for commit 2c9e4cd.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/skillify/skill-proposer.ts`:
- Around line 43-46: The SYSTEM prompt fragment using `${SU_END}` demands "Reply
with ONLY a JSON array..." but another prompt later requires a "one-line
weakness before JSON", creating a contradiction; reconcile them by choosing one
contract and making both prompt templates consistent: either remove the one-line
weakness requirement from the user prompt or change the SYSTEM/string built
around `SU_END` to allow an optional single-line weakness prior to the JSON.
Update the prompt-building code paths that reference `SU_END` and the other
prompt fragment (the one that currently asks for a one-line weakness) so both
emit the same instruction text, and run any prompt-formatting tests or callers
that parse the model output to ensure they accept the chosen format.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8fe5edfc-dd9b-4ef8-8d59-5f5f3c3f2e6e

📥 Commits

Reviewing files that changed from the base of the PR and between 0b35aec and 11021a3.

📒 Files selected for processing (1)
  • src/skillify/skill-proposer.ts

Comment thread src/skillify/skill-proposer.ts Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/shared/skill-proposer.test.ts`:
- Around line 44-47: The assertions in tests/shared/skill-proposer.test.ts use
broad regexes on the `system` string and `model.mock.calls[0][1]`; replace them
with exact prompt-fragment checks to avoid false positives. Update the three
`expect(system).toMatch(...)` lines to assert the precise expected substrings
(e.g., the exact instruction phrases you expect to appear in `system`) and
change `expect(model.mock.calls[0][1]).toMatch(...)` to assert the exact
fragment or full message from `model.mock.calls[0][1]` (use equality or
`toContain` with the specific text). Ensure the new assertions reference the
same symbols (`system` and `model.mock.calls[0][1]`) so the test still validates
the produced prompt fragments precisely.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b85b9fe4-0604-476a-b027-8738b5d9a683

📥 Commits

Reviewing files that changed from the base of the PR and between 11021a3 and 2840a1c.

📒 Files selected for processing (1)
  • tests/shared/skill-proposer.test.ts

Comment thread tests/shared/skill-proposer.test.ts Outdated
…en tests/coverage

- Resolve SYSTEM/user-prompt contradiction (CodeRabbit): output stays JSON-only;
  diagnosis is an internal step, not emitted (also avoids a bracketed weakness
  line breaking parseEdits).
- Assert exact prompt fragments instead of broad regexes.
- Cover the priorEdits branch.
@khustup2
Copy link
Copy Markdown
Contributor Author

khustup2 commented Jun 7, 2026

Addressed CodeRabbit in 1e5f58d6:

  • Output-format contradiction — output stays JSON-only; the weakness diagnosis is now an internal step (not emitted), which also avoids a bracketed weakness line breaking parseEdits.
  • Test assertions — switched to exact-fragment toContain checks.
  • Coverage — added a test for the priorEdits branch; proposer now 100% line/func, 89.7% branch.

…arse

The JSON-only reconcile removed the 'name the weakness first' step, which was
carrying the benchmark lift. Restore it as an explicit TWO-part contract (weakness
line, then JSON array) — SYSTEM and user prompt now agree — and harden parseEdits
to extract the array via a balanced back-scan (robust to a bracketed weakness
line), with a first-[..last-] fallback. Tests cover both paths.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/shared/skill-proposer.test.ts (1)

44-44: 💤 Low value

Consider using exact fragment assertion for consistency.

Line 44 uses a case-insensitive regex while the rest of the file uses exact toContain checks. For consistency with the coding guidelines:

-    expect(model.mock.calls[0][0]).toMatch(/recurring weakness/i);
+    expect(model.mock.calls[0][0]).toContain("SINGLE recurring weakness");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/skill-proposer.test.ts` at line 44, The assertion at
expect(model.mock.calls[0][0]).toMatch(/recurring weakness/i) uses a regex while
the rest of the file uses exact fragment checks; replace this regex assertion
with an exact fragment assertion (use toContain with the string "recurring
weakness") on the same value (model.mock.calls[0][0]) so it matches the file's
consistency pattern.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/shared/skill-proposer.test.ts`:
- Line 44: The assertion at expect(model.mock.calls[0][0]).toMatch(/recurring
weakness/i) uses a regex while the rest of the file uses exact fragment checks;
replace this regex assertion with an exact fragment assertion (use toContain
with the string "recurring weakness") on the same value (model.mock.calls[0][0])
so it matches the file's consistency pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 56b4207b-7fbc-4cd4-a5b1-0b5cd451d57b

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5f58d and a6055ca.

📒 Files selected for processing (2)
  • src/skillify/skill-proposer.ts
  • tests/shared/skill-proposer.test.ts

@khustup2 khustup2 changed the title feat(skillopt): proposer targets root-cause + anchors edits (2/4 → 3/4 on real-skill benchmark) refactor(skillopt): clearer proposer prompt + robust edit-array parsing Jun 7, 2026
@khustup2
Copy link
Copy Markdown
Contributor Author

khustup2 commented Jun 7, 2026

Closing — the premise didn't hold. A 5×-per-skill A/B showed the prompt change is within noise (baseline 65% vs 60%), so there's no optimizer improvement to ship. The earlier 3/4 was nondeterminism. Apologies for the churn.

@khustup2 khustup2 closed this Jun 7, 2026
@khustup2 khustup2 deleted the feat/skillopt-proposer-anchored-edits branch June 7, 2026 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant