Add agentic-workflow infrastructure for code review and onboarding#66964
Draft
lewing wants to merge 5 commits into
Draft
Add agentic-workflow infrastructure for code review and onboarding#66964lewing wants to merge 5 commits into
lewing wants to merge 5 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds new agentic-workflow infrastructure under .github/ to support automated code review, test-quality evaluation, guidance self-improvement, PAT-pool token selection, and validation of skills/agents.
Changes:
- Introduces a shared Copilot PAT pool import + a scheduled PAT-health validation workflow.
- Adds new gh-aw workflows/skills/agents for PR code review, PR test evaluation, and “learn from PR” guidance updates.
- Adds CI workflows to validate skills/agents and post validation results back to PRs.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| PROPOSAL-README.md | Proposal/operational notes for introducing agentic workflows and onboarding patterns. |
| .github/AGENTIC-WORKFLOWS.md | Rubric/tenets and required structure for adding gh-aw workflows in this repo. |
| .github/workflows/shared/pat_pool.md | Shared gh-aw import job that selects a random PAT index from a secret pool. |
| .github/workflows/shared/pat_pool.README.md | Operator documentation for configuring and using the PAT pool import. |
| .github/workflows/validate-pat-pool.yml | Daily check that verifies PAT pool entries can successfully run a Copilot CLI request. |
| .github/workflows/code-review.md | New gh-aw PR-triggered dispatcher workflow that routes to area reviewer agents and aggregates results. |
| .github/skills/code-review/SKILL.md | Dispatcher skill defining routing rules + general review conventions fallback. |
| .github/agents/blazor-expert-reviewer.agent.md | Blazor/Razor/JSInterop-specific review agent with principles, dimensions, and posting process. |
| .github/workflows/evaluate-pr-tests.md | New gh-aw slash-command/dispatch workflow to evaluate tests added/changed in a PR (with a bash pre-gate). |
| .github/skills/evaluate-pr-tests/SKILL.md | Rubric for judging adequacy of tests in PRs (including Blazor-specific dimensions). |
| .github/skills/verify-tests-fail-without-fix/SKILL.md | Skill guidance for validating new tests fail when the production fix is reverted. |
| .github/workflows/learn-from-pr.md | New gh-aw workflow to analyze completed PRs and optionally open a draft PR with guidance updates. |
| .github/skills/learn-from-pr/SKILL.md | Analysis-only skill to extract reusable lessons from completed PRs and recommend guidance updates. |
| .github/agents/learn-from-pr.agent.md | Apply-mode agent that turns high/medium recommendations into a single draft PR updating guidance files. |
| .github/skills/assessing-breaking-changes/SKILL.md | Checklist/decision framework for compatibility and breaking-change risk in aspnetcore (with Blazor specifics). |
| .github/workflows/skill-validation.yml | Validates changed skills/agents on PRs (and all on push) using skill-validator from dotnet/skills nightly. |
| .github/workflows/skill-validation-comment.yml | workflow_run companion that posts/updates a PR comment summarizing skill-validator results. |
Comment on lines
+1
to
+5
| --- | ||
| description: Evaluates test quality, coverage, and appropriateness on PRs that add or modify tests in dotnet/aspnetcore | ||
| on: | ||
| slash_command: | ||
| name: evaluate-tests |
Comment on lines
+1
to
+6
| --- | ||
| description: "Analyze a completed PR for lessons learned about codebase conventions, agent failures, and reusable patterns. Use to keep .github/instructions/ and agentic guidance current as the codebase and contributor pool evolves." | ||
| on: | ||
| workflow_dispatch: | ||
| inputs: | ||
| pr_number: |
Comment on lines
+84
to
+87
| if [ "$STATE" != "OPEN" ]; then | ||
| echo "⏭️ PR #$PR_NUMBER is $STATE — skipping evaluation." | ||
| exit 1 | ||
| fi |
Comment on lines
+105
to
+108
| if [ -z "$TEST_FILES" ]; then | ||
| echo "⏭️ No test source files (.cs/.razor) found in PR diff. Nothing to evaluate." | ||
| exit 1 | ||
| fi |
Comment on lines
+222
to
+225
| if [ $valid -eq 0 ]; then | ||
| echo "::error::The PAT pool is empty — no PATs are available" | ||
| exit 1 | ||
| fi |
cc00e83 to
941f05a
Compare
Contributor
|
Hey @dotnet/aspnet-build, looks like this PR is something you want to take a look at. |
lewing
added a commit
to lewing/aspnetcore
that referenced
this pull request
Jun 5, 2026
Six unique findings from the Copilot reviewer (each posted twice across
two review passes):
1-3. Missing .lock.yml files for code-review, evaluate-pr-tests, learn-from-pr
→ Compiled via 'gh aw compile'. Lock files + .github/aw/actions-lock.json
delta now included.
4-5. evaluate-pr-tests.md gate exits 1 for legitimate skip cases (PR not OPEN;
no test files in diff), causing the workflow run to show as failed.
→ Changed to 'exit 0' with ::notice:: emission. The downstream agent
still activates and calls 'noop' when there's no work — one wasted
activation per skip is acceptable cost to avoid false CI failures.
Genuine infra failures (gh API errors) still exit 1.
6. validate-pat-pool.yml summary text said empty pool would 'fall back to
COPILOT_GITHUB_TOKEN' but the workflow exits 1 on empty pool.
→ Reworded the empty-pool summary to acknowledge both behaviors:
runtime falls back (correctly), but the validator surfaces empty as a
failure to prompt operators to seed the pool. Kept exit 1 — alerting
on empty pool is the desired behavior.
Additional fixes uncovered by compilation:
- learn-from-pr.md: dropped 'imports: shared/pat_pool.md' and the engine
PAT-pool case() expression. The shared pat_pool job needs a pre_activation
job that workflow_dispatch-only triggers don't auto-generate. learn-from-pr
is manually invoked, so default COPILOT_GITHUB_TOKEN is fine (same pattern
as cswin32-update.md).
- evaluate-pr-tests.md: changed workflow_dispatch input pr_number from
'required: true' to 'required: false' — slash_command triggers can't
enforce required manual inputs (gh aw compile rejects this combination).
- evaluate-pr-tests.md: removed 'bots: copilot-swe-agent[bot]' — gh aw warns
that bots + slash_command can cause a bot's comment starting with the
command to occupy the concurrency slot, blocking manual invocations.
SKILL.md improvements (informed by simulating the workflow against real
recently-opened PRs):
- evaluate-pr-tests/SKILL.md: added dimension D0 (Substantive value), weighted
highest. The rubric was rewarding tests that score well on style/convention
even when they duplicated pre-existing coverage or asserted non-existent
invariants. D0 leads the evaluation with 'what does this test exercise that
wasn't covered before, and does the assertion match the source?'
- Added an 'Assertion fabrication' check explicitly: agents must trace
assertions back to the production source line(s) and verify the relationship
before scoring D4 ✅. Documents four common failure modes (cargo-cult
invariant, convenient enumeration, tautology, self-referential setup) at
severity 🚩 above ⚠️ .
- Added 'Coverage-burst pattern recognition' for multi-author 'improve test
coverage' waves common with new-contributor onboarding bursts: extra
scrutiny to D0, look for 'ISSUE #N FIX' comments, don't let headline test
counts overstate real coverage gain, calibrate tone direct rather than
encouraging.
- Added dimension D9 (Consolidation opportunities) — recognize when 5+
near-identical [Fact] tests differing only in input value should be one
[Theory] with [InlineData] rows. Surfaced as the headline finding on a real
simulated PR (PR dotnet#67019, InputDate) where 14 tests collapse cleanly to 2.
- Step 1 now explicitly requires fetching the pre-existing version of each
changed test file (via merge-base) for duplicate detection, and fetching
the production source for assertion-fabrication checks.
This change adds a code-review surface, test-quality enforcement, self-improving guidance plumbing, and supporting infrastructure for agentic workflows in aspnetcore. Builds on patterns proven in other .NET org repos (runtime, msbuild, maui) and the cross-agent AGENTS.md convention. ## Files added (16 plus PROPOSAL-README.md reviewer notes) ### Cross-cutting infrastructure - .github/workflows/shared/pat_pool.md — shared PAT-pool import used by new agentic workflows (avoids duplicating the case() expression per workflow). - .github/workflows/shared/pat_pool.README.md — operator docs for the PAT pool. - .github/workflows/validate-pat-pool.yml — daily PAT health check that catches expired tokens before they break a real workflow. - .github/AGENTIC-WORKFLOWS.md — one-page rubric for adding new gh-aw workflows (tenets, required structure, references to public reference implementations). ### Code-review surface (Blazor-aware) - .github/workflows/code-review.md — thin dispatcher workflow. On PR open or sync, routes to area-specific reviewer agents in parallel and aggregates findings into a single review submission. - .github/skills/code-review/SKILL.md — routing table + general aspnetcore conventions fallback for areas without a dedicated agent. Follows the skill-as-shim pattern: the skill is for discoverability; the agent is the source of truth for content. - .github/agents/blazor-expert-reviewer.agent.md — 12 principles + 12 review dimensions covering render-mode parity, pre-rendering + PersistentComponentState, trim/AOT safety, disposal contracts, JS interop discipline, public API surface, server-circuit thread safety, accessibility, localization, Razor source generator, and hot-reload determinism. Four-wave review process (Discover → Validate → Post → Summary) with proof-or-disprove validation before posting to minimize false positives. Severity ladder: BLOCKING / MAJOR / MODERATE / MINOR. Inline review comments for diff-line findings via create_pull_request_review_comment; design-level concerns via add_comment. Never APPROVE (the agent must not count as a PR approval). ### Test-contract enforcement - .github/workflows/evaluate-pr-tests.md — slash-command (/evaluate-tests) and workflow_dispatch triggered. Pre-gates deterministically in bash on "no test files in diff = exit early" before the agent activates. - .github/skills/evaluate-pr-tests/SKILL.md — 8-dimension rubric for assessing whether tests added in a PR actually exercise the change. Includes Blazor-specific dimensions for render-mode coverage at the E2E level. Calls out the TestRenderer pattern (shared infra from src/Components/Shared/test/) as the canonical component-unit-test approach. Selenium is the incumbent E2E framework; prefer Playwright for new E2E projects (reference pattern in src/ProjectTemplates/test/Templates.Blazor.Tests/). - .github/skills/verify-tests-fail-without-fix/SKILL.md — reverts the production change in a PR, reruns the new tests, and verifies they fail. Strongest objective signal that a regression test exercises the bug it claims to fix. ### Compatibility & breaking-change discipline - .github/skills/assessing-breaking-changes/SKILL.md — blast-radius checklist, deprecation protocol, decision framework. aspnetcore- specific: API review board (not ChangeWave); render-mode parity as a compat dimension; wire-compat considerations for SignalR / DataProtection / Antiforgery / Authentication / PersistentComponentState; WASM trim-safety + NativeAOT regression checks. The Warnings-as- Errors rule (new warnings break consumers that set <TreatWarningsAsErrors>true</TreatWarningsAsErrors>). ### Self-improving guidance - .github/workflows/learn-from-pr.md — manual dispatch with apply: false (report only) or apply: true (open draft PR with applied guidance updates). allowed-files restricted to .github/. - .github/skills/learn-from-pr/SKILL.md — analysis pass: extract reusable lessons from a completed PR, classify them by destination (general copilot-instructions.md, per-area AGENTS.md, area instructions.md shim, skill enhancement, agent dimension addition), and prioritize High / Medium / Low. - .github/agents/learn-from-pr.agent.md — apply pass: takes the skill's High/Medium recommendations and opens a single draft PR with the matching edits. ### Skill validation - .github/workflows/skill-validation.yml — validates skills and agents on every PR touching .github/skills/** or .github/agents/**. Downloads the skill-validator binary from dotnet/skills releases, caches daily, smart-filters to only validate changed skills. - .github/workflows/skill-validation-comment.yml — workflow_run- triggered companion that posts results as a PR comment. Safe for fork PRs (never checks out PR code). ## Conventions adopted ### Per-area guidance: AGENTS.md is the canonical destination Follows the cross-agent AGENTS.md convention (https://agents.md) already used in this repo at src/Components/AGENTS.md and eng/common/AGENTS.md. Closest-to-the-edited-file wins; works with Copilot, Claude Code, Cursor, Codex, and others. When Copilot needs to auto-load guidance via applyTo: glob, the .github/instructions/<area>.instructions.md shim delegates to the AGENTS.md (the existing aspnetcore pattern). ### Test frameworks - Razor component unit logic uses the TestRenderer pattern from src/Components/Shared/test/, brought in via $(ComponentsSharedSourceRoot)test/**/*.cs. bUnit is not used in aspnetcore source. - Selenium is the incumbent E2E framework for components (src/Components/test/E2ETest/, imports $(SharedSourceRoot)E2ETesting/E2ETesting.props). For new E2E projects or surfaces, prefer Playwright (reference pattern in src/ProjectTemplates/test/Templates.Blazor.Tests/). Don't mix frameworks within an existing Selenium project. - Render-mode parity (Server / WASM / Auto) is proven at the E2E level — TestRenderer is in-process and does not model render modes. ## Not included (intentional) - .lock.yml files for the new workflows — must be regenerated with 'gh aw compile' before push. - Changes to existing workflows (community-pr-issue-check, issue-triage-agent, cswin32-update, browsertesting-deps-update, test-quarantine) — kept stable; augmentation is a separate follow-up. - .github/copilot/settings.json and .github/copilot-instructions.md — kept stable; the learn-from-pr agent will extend the latter over time. - .github/agents/aspnetcore-expert-reviewer.agent.md for non-Blazor areas — left as forward-reference for the maintainer to author. ## See PROPOSAL-README.md for reviewer notes including open questions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Six unique findings from the Copilot reviewer (each posted twice across
two review passes):
1-3. Missing .lock.yml files for code-review, evaluate-pr-tests, learn-from-pr
→ Compiled via 'gh aw compile'. Lock files + .github/aw/actions-lock.json
delta now included.
4-5. evaluate-pr-tests.md gate exits 1 for legitimate skip cases (PR not OPEN;
no test files in diff), causing the workflow run to show as failed.
→ Changed to 'exit 0' with ::notice:: emission. The downstream agent
still activates and calls 'noop' when there's no work — one wasted
activation per skip is acceptable cost to avoid false CI failures.
Genuine infra failures (gh API errors) still exit 1.
6. validate-pat-pool.yml summary text said empty pool would 'fall back to
COPILOT_GITHUB_TOKEN' but the workflow exits 1 on empty pool.
→ Reworded the empty-pool summary to acknowledge both behaviors:
runtime falls back (correctly), but the validator surfaces empty as a
failure to prompt operators to seed the pool. Kept exit 1 — alerting
on empty pool is the desired behavior.
Additional fixes uncovered by compilation:
- learn-from-pr.md: dropped 'imports: shared/pat_pool.md' and the engine
PAT-pool case() expression. The shared pat_pool job needs a pre_activation
job that workflow_dispatch-only triggers don't auto-generate. learn-from-pr
is manually invoked, so default COPILOT_GITHUB_TOKEN is fine (same pattern
as cswin32-update.md).
- evaluate-pr-tests.md: changed workflow_dispatch input pr_number from
'required: true' to 'required: false' — slash_command triggers can't
enforce required manual inputs (gh aw compile rejects this combination).
- evaluate-pr-tests.md: removed 'bots: copilot-swe-agent[bot]' — gh aw warns
that bots + slash_command can cause a bot's comment starting with the
command to occupy the concurrency slot, blocking manual invocations.
SKILL.md improvements (informed by simulating the workflow against real
recently-opened PRs):
- evaluate-pr-tests/SKILL.md: added dimension D0 (Substantive value), weighted
highest. The rubric was rewarding tests that score well on style/convention
even when they duplicated pre-existing coverage or asserted non-existent
invariants. D0 leads the evaluation with 'what does this test exercise that
wasn't covered before, and does the assertion match the source?'
- Added an 'Assertion fabrication' check explicitly: agents must trace
assertions back to the production source line(s) and verify the relationship
before scoring D4 ✅. Documents four common failure modes (cargo-cult
invariant, convenient enumeration, tautology, self-referential setup) at
severity 🚩 above ⚠️ .
- Added 'Coverage-burst pattern recognition' for multi-author 'improve test
coverage' waves common with new-contributor onboarding bursts: extra
scrutiny to D0, look for 'ISSUE #N FIX' comments, don't let headline test
counts overstate real coverage gain, calibrate tone direct rather than
encouraging.
- Added dimension D9 (Consolidation opportunities) — recognize when 5+
near-identical [Fact] tests differing only in input value should be one
[Theory] with [InlineData] rows. Surfaced as the headline finding on a real
simulated PR (PR dotnet#67019, InputDate) where 14 tests collapse cleanly to 2.
- Step 1 now explicitly requires fetching the pre-existing version of each
changed test file (via merge-base) for duplicate detection, and fetching
the production source for assertion-fabrication checks.
48e8a8b to
267802f
Compare
3 tasks
javiercn
reviewed
Jun 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change adds a code-review surface, test-quality enforcement, self-improving guidance plumbing, and supporting infrastructure for agentic workflows in aspnetcore. Builds on patterns proven in other .NET org repos (runtime, msbuild, maui) and the cross-agent AGENTS.md convention.
Files added (16 plus PROPOSAL-README.md reviewer notes)
Cross-cutting infrastructure
Code-review surface (Blazor-aware)
Test-contract enforcement
Compatibility & breaking-change discipline
Self-improving guidance
Skill validation
Conventions adopted
Per-area guidance: AGENTS.md is the canonical destination Follows the cross-agent AGENTS.md convention (https://agents.md) already used in this repo at src/Components/AGENTS.md and eng/common/AGENTS.md. Closest-to-the-edited-file wins; works with Copilot, Claude Code, Cursor, Codex, and others. When Copilot needs to auto-load guidance via applyTo: glob, the
.github/instructions/.instructions.md shim delegates to the AGENTS.md (the existing aspnetcore pattern).
Test frameworks
Not included (intentional)
{PR title}
Summary of the changes (Less than 80 chars)
Description
{Detail}
Fixes #{bug number} (in this specific format)