dotnet-test: add unit-under-test + behaviors quality cue to code-testing-generator by YuliiaKovalova · Pull Request #646 · dotnet/skills

YuliiaKovalova · 2026-05-13T10:17:45Z

What

Adds a quality cue to the orchestrator agent (code-testing-generator) and small companion changes in code-testing-implementer and code-testing-fixer that ask each subagent to extract the unit-under-test contract and the behaviors to test before writing or fixing tests.

The cue is orchestrator-level (a verification gate after dispatch) with bounded re-dispatch, not new content rules baked into planner/researcher/implementer prompts. Same structural pattern as the existing research-cue.
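A minimal sketch of what a verification gate with bounded re-dispatch could look like, written in Python for illustration only. The real gate is prose inside the agent prompt, not code. The section markers, the `dispatch` callable, and the `max_redispatch` default are all hypothetical; the PR does not state the actual bound.

```python
from typing import Callable

# Hypothetical markers the gate looks for in the subagent's report.
REQUIRED_SECTIONS = ("unit-under-test", "behaviors")


def missing_sections(report: str) -> list[str]:
    """Return the required sections that do not appear in the report."""
    lowered = report.lower()
    return [section for section in REQUIRED_SECTIONS if section not in lowered]


def dispatch_with_gate(
    dispatch: Callable[[str], str],
    prompt: str,
    max_redispatch: int = 1,  # assumed bound, not taken from the PR
) -> tuple[str, int]:
    """Dispatch once, then re-dispatch while the report fails the gate.

    Returns the last report and the number of re-dispatches used.
    """
    report = dispatch(prompt)
    attempts = 0
    while missing_sections(report) and attempts < max_redispatch:
        attempts += 1
        missing = ", ".join(missing_sections(report))
        report = dispatch(f"{prompt}\n\nYour previous output was missing: {missing}.")
    return report, attempts
```

The gate only checks the subagent's output after the fact, so the subagent prompts themselves stay unchanged, and the bound keeps a persistently incomplete report from looping forever.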

Why

CTA was losing rubric quality (nice%, agg%) to the vanilla CLI baseline despite matching coverage. The proximate cause (verified via per-instance patch inspection) was that the planner under-specified what the test was supposed to demonstrate, and the implementer wrote weak assertions (e.g. assert.NotNil(err) instead of assert.Contains(err.Error(), "must not be empty")).

This cue makes the contract explicit before code is written, without prescribing assertion style — so it's framework-agnostic and not benchmark-specific.
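To illustrate the weak-versus-specific assertion gap described above, here is a small Python sketch. The function names and the regression are hypothetical and are not taken from the benchmark patches. It shows why a presence-only check (the analogue of assert.NotNil(err)) keeps passing after the contract is broken, while a check against the expected message does not.

```python
def validate_name(name: str) -> None:
    """Hypothetical unit under test: rejects empty names."""
    if not name:
        raise ValueError("name must not be empty")


def validate_name_regressed(name: str) -> None:
    """Hypothetical regression: still raises, but for the wrong reason."""
    raise ValueError("internal error")


def weak_check(fn) -> bool:
    """Passes if any ValueError is raised, regardless of its message."""
    try:
        fn("")
    except ValueError:
        return True
    return False


def specific_check(fn) -> bool:
    """Passes only if the error carries the message the contract promises."""
    try:
        fn("")
    except ValueError as err:
        return "must not be empty" in str(err)
    return False


if __name__ == "__main__":
    for fn in (validate_name, validate_name_regressed):
        print(fn.__name__, "weak:", weak_check(fn), "specific:", specific_check(fn))
```

The weak check cannot tell the two implementations apart. Only a test author who knows the intended behavior can write the specific check, and that knowledge is what the cue asks the subagents to extract first.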

Experiment branch to isolate the impact of "invoke prompted subagents more often" from the impact of "make those subagents do richer work."

Same baseline as dev/ykovalova/cta-prompt-tuning (main = 66628b6), but strips out every content/quality rule and keeps only the dispatch plumbing.

Comparison branch: dev/ykovalova/cta-prompt-tuning (HEAD: bd530be) which contains both the dispatch mechanics AND content/quality rules.

Files modified (3 vs 5 in cta-prompt-tuning):
- code-testing-generator.agent.md +110 lines
- code-testing-implementer.agent.md +11 lines
- code-testing-fixer.agent.md +1 line
- code-testing-researcher.agent.md UNTOUCHED (baseline)
- code-testing-planner.agent.md UNTOUCHED (baseline)

KEPT (invocation / dispatch mechanics):

Generator:
- Rule 1: every task() call MUST use agent_type "dotnet-test:code-testing-..." (without this, calls dispatch generic built-ins and never reach the named CTA agents)
- Rule 2: routing table -- which named agent for which job
- Rule 3: prefer one named-agent dispatch over many tool calls
- Rule 4: orchestrator MUST NOT edit/create test files itself (forces implementer dispatch)
- Rule 5: orchestrator MUST NOT run builds/tests via terminal (forces builder/tester dispatch)
- Rule 6: every run MUST dispatch the planner (no exceptions for "small" scope; Direct still goes through planner)
- Rule 7: every build/test failure MUST dispatch the fixer
- Step 1b: mandatory initial researcher dispatch (every strategy)
- Direct strategy rewritten: dispatches planner -> implementer -> builder -> tester -> fixer -> linter (was "Skip Steps 3-5, write tests inline")
- All Step 3/4/5/6/7/8/9 dispatches converted from runSubagent({agent:...}) to task({ agent_type: "dotnet-test:code-testing-...", name:..., prompt:...})
- Step 9 validator dispatch (forces builder dispatch for cleanup)
- Steps 6/7 mandatory builder/tester dispatch wrapper

Implementer:
- Section 5: "you MUST dispatch fixer for build errors" + no-inline-edit block (forces fixer dispatch on build failures)
- Section 6: "you MUST dispatch fixer for test failures" + no-inline-edit block (forces fixer dispatch on test failures)
- Section 7: "Format Code (mandatory if a lint command exists)" (was "Optional"; mandatory firing of linter)
- Rule 6: never declare SUCCESS while build/tests fail (gates SUCCESS on fixer dispatch)
- Rule 7: no inline test-file edits between failed dispatch and fixer

Fixer:
- Frontmatter description widened to advertise handling of failing tests (without this, the orchestrator's routing logic does not select the fixer for test failures, so even Rule 7's mandate produces no firing -- this is the change that took fixer firing from 0.00/inst to 0.39/inst in earlier iterations)

DROPPED (content / quality rules -- in cta-prompt-tuning, NOT here):

Generator:
- Test-strength rules embedded in implementer dispatch prompt
- Test-design rules embedded in implementer dispatch prompt (OFAT, mutation self-check, never mock subject under test)
- File-location rules embedded in implementer dispatch prompt
- TARGET ENTITIES / PHASE CHECKLIST / TEST TRACEABILITY blocks in implementer dispatch prompt
- CHECKLIST format spec in planner dispatch prompt
- Step 9 validator's detailed cleanup classification

Implementer:
- Section 4b "Verify CHECKLIST coverage" pre-completion check
- Section 8 "CHECKLIST COVERAGE" report block
- "Honor the CHECKLIST" rule

Fixer:
- "Process -- Failing Tests" section (5-step diagnosis flow)
- All anti-weakening / anti-skipping rules
- "Re-derive expected from production source" guidance

Planner:
- CHECKLIST format ("one item per TARGET BEHAVIOR, Source/Variants/Expected mandatory")
- "Test name from research.md conventions" rule
- "At least 2 phases" rule

Researcher:
- Section 8 "Extract Local Test Naming & Style Conventions"
- TARGET ENTITIES / TARGET BEHAVIORS / TEST INFRASTRUCTURE structure in research.md
- Test naming pattern extraction

WHAT THE SUBAGENTS WILL ACTUALLY DO:

The researcher / planner / implementer / fixer all operate at baseline behavior -- they receive the same prompts they receive in the upstream "vanilla" runs. The only difference vs vanilla is that the orchestrator ACTUALLY DISPATCHES THEM (where vanilla often inlines the work or skips sub-agent dispatch entirely).

EXPECTED COMPARISON:

- If quality on this branch is similar to or higher than dev/ykovalova/cta-prompt-tuning (bd530be), then "more dispatches" is the dominant quality lever and the content/quality rules in cta-prompt-tuning are adding marginal or noise-level value.
- If quality on this branch is materially lower than cta-prompt-tuning, then the content/quality rules are doing the heavy lifting and the dispatch mechanics alone are insufficient.
- If quality on this branch matches or exceeds vanilla but trails cta-prompt-tuning, then the dispatch mechanics provide a baseline lift and the content rules add an incremental quality layer on top.

Rubber-duck check passed (validated dispatch-vs-content classification; fixer frontmatter is routing metadata not a runtime gate; surviving dispatch prompts contain no dangling references to removed CHECKLIST / TARGET ENTITIES / TEST STRENGTH / naming-convention concepts).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
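Rule 1 in the commit message above requires every task() call to carry a namespaced agent_type. The Python sketch below shows how such a call could be checked mechanically. It is an illustration only: the agents are prompt files, and the PR contains no validator like this. The `is_named_cta_dispatch` helper and the dict shape of a call are assumptions. Only the "dotnet-test:code-testing-" prefix comes from the source.

```python
# Prefix quoted in Rule 1; the suffix after it names the specific agent.
AGENT_PREFIX = "dotnet-test:code-testing-"


def is_named_cta_dispatch(call: dict) -> bool:
    """Return True if the call targets a named CTA agent.

    `call` is a hypothetical dict mirroring the task({...}) arguments.
    A missing or bare-prefix agent_type is treated as a generic dispatch.
    """
    agent_type = call.get("agent_type", "")
    return agent_type.startswith(AGENT_PREFIX) and len(agent_type) > len(AGENT_PREFIX)


if __name__ == "__main__":
    named = {"agent_type": "dotnet-test:code-testing-fixer", "prompt": "fix build"}
    generic = {"prompt": "fix build"}
    print(is_named_cta_dispatch(named), is_named_cta_dispatch(generic))
```

According to the commit message, a call that fails a check of this kind would be routed to a generic built-in agent and never reach the named CTA agents.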

Copilot

Pull request overview

Updates the dotnet-test CTA agent prompt set to improve test-quality outcomes by making the unit-under-test contract and behaviors explicit earlier in the pipeline, and by tightening dispatch/verification discipline across the orchestrator and sub-agents.

Changes:

- Add orchestrator-level dispatch discipline rules and a new “unit-under-test + behaviors” verification gate in code-testing-generator.
- Strengthen “no inline band-aid fixes” guidance in code-testing-implementer (mandatory fixer dispatch on failures; lint/format expectation).
- Expand code-testing-fixer metadata to include failing-test assertion correction (in addition to compilation errors).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| plugins/dotnet-test/agents/code-testing-generator.agent.md | Adds dispatch discipline rules and a researcher verification gate intended to improve test intent/behavior specificity. |
| plugins/dotnet-test/agents/code-testing-implementer.agent.md | Tightens implementer behavior on build/test failures (must dispatch fixer; no inline edits after failures) and makes linting conditional-mandatory. |
| plugins/dotnet-test/agents/code-testing-fixer.agent.md | Updates fixer front-matter to explicitly include failing-test fixes. |

Comments suppressed due to low confidence (1)

plugins/dotnet-test/agents/code-testing-generator.agent.md:165

Step 1b already mandates an initial researcher dispatch that writes .testagent/research.md, but Step 3 then starts another researcher phase that also writes to .testagent/research.md. This duplicates work and can overwrite the contract/behaviors you just verified; clarify the flow by removing one of these steps or making Step 3 conditional / reference Step 1b’s output instead of re-dispatching unconditionally.

### Step 3: Research Phase

```text
task({
  agent_type: "dotnet-test:code-testing-researcher",
```

- Step 1b prompt: explicitly request unit-under-test (file:line) and behaviors so the verification gate rarely needs re-dispatch.
- Step 3: rename to 'Deep Research Phase', mark skipped for Direct strategy, and switch from overwriting research.md to extending it (no double-research).
- Step references: update '6-9' -> '6-10' and 'Step 9' -> 'Steps 9-10' in the strategy table and the All-strategies-MUST line, since reporting is Step 10.
- Step 6 builder prompt: drop '*.sln' glob (could expand to multiple args); use 'dotnet build --no-incremental' (auto-discovers .sln) per dotnet.md.
- Step 9: stop overloading the builder agent; perform diff/cleanup directly in the orchestrator (Rule 5 forbids inline build/test, not git/fs hygiene).
- Fixer agent: update mission text to cover failing tests and assertion correction (front-matter description already mentioned this; body now matches), with explicit no-Ignore/no-Skip/no-production-rewrite guardrails.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
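The Step 6 item above notes that a '*.sln' glob could expand to multiple arguments. A rough Python sketch of that failure mode follows, assuming a directory that happens to contain two solution files. The `expand_build_args` helper is hypothetical and only imitates shell-style expansion. It does not model what a shell does when nothing matches, and it does not run dotnet.

```python
import glob
import os


def expand_build_args(directory: str, pattern: str = "*.sln") -> list[str]:
    """Imitate shell glob expansion for `dotnet build <pattern>`.

    Every matching file becomes its own argument, so a directory with
    several .sln files yields a command line with several positional
    arguments instead of one.
    """
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    return ["dotnet", "build", *[os.path.basename(m) for m in matches]]
```

Dropping the glob, as the commit does, removes this ambiguity from the prompt and leaves solution discovery to the build tool.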

Evangelink · 2026-05-13T11:36:29Z

/evaluate

github-actions · 2026-05-13T11:38:01Z

⏭️ No skills to evaluate — no changed skills with tests were found in this PR. View workflow run

YuliiaKovalova and others added 2 commits May 11, 2026 11:21

CTA: extract unit-under-test + behaviors quality cue (no caps)

0ae8cfd

YuliiaKovalova marked this pull request as ready for review May 13, 2026 10:20

YuliiaKovalova requested review from JanKrivanek and Copilot May 13, 2026 10:20

Copilot started reviewing on behalf of YuliiaKovalova May 13, 2026 10:21 View session

Merge branch 'main' into dev/ykovalova/cta-quality-cue

a7aef2b

Copilot AI reviewed May 13, 2026

Evangelink approved these changes May 13, 2026

YuliiaKovalova removed the request for review from JanKrivanek May 13, 2026 11:34

Evangelink merged commit f1b09eb into dotnet:main May 13, 2026
37 checks passed

github-actions Bot mentioned this pull request May 14, 2026

🏥 Repository Health Dashboard #288

Open

YuliiaKovalova mentioned this pull request May 14, 2026

Revert "dotnet-test: add unit-under-test + behaviors quality cue to code-testing-generator" #651

Merged

Evangelink merged 4 commits into dotnet:main from YuliiaKovalova:dev/ykovalova/cta-quality-cue
3 participants