Reduce code-review skill token usage (~29% smaller) #126779
Conversation
Remove 150 indented blockquote lines containing real maintainer quotes used as examples under each review guideline bullet point. The bullet points already state the rules clearly; the quotes add ~18.8 KB (~4,700 tokens) of low-signal content that doesn't improve LLM instruction-following in a dedicated skill context.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the API Approval Verification procedure (~3KB / ~750 tokens) from SKILL.md into api-approval-check.md. This section is only needed when a PR introduces new public API surface, so it is now loaded on demand rather than always present in the skill context. Both Step 0 and Step 2 references are updated with explicit MUST-load instructions pointing to the extracted file path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR reduces the always-loaded size of the code-review skill documentation to help avoid timeouts during automated review runs. It removes low-value quoted examples and moves the API approval procedure to an on-demand file, while keeping the same review rules and procedures.
Changes:
- Removed large sets of maintainer-quote examples to reduce SKILL.md size.
- Extracted “API Approval Verification” into a standalone, on-demand markdown file.
- Updated SKILL.md to require loading the extracted procedure when new public API surface is detected, including a fail-closed instruction if the file can’t be loaded.
Summary per file
| File | Description |
|---|---|
| .github/skills/code-review/SKILL.md | Removes quote examples and updates API approval instructions to reference the extracted procedure file. |
| .github/skills/code-review/api-approval-check.md | New standalone API approval verification procedure to be loaded only when relevant. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
Other extractions are possible, e.g., parts related to native code/runtime could be extracted so they aren't loaded for PRs that don't touch those areas. But this is the biggest improvement; no need to churn more without reason.
Native C/C++ code must explicitly initialize fields and locals — the JIT coding conventions doc says the opposite of CA1805. Add scope qualifier to prevent misapplication to native code reviews.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## 🤖 Copilot Code Review — PR #126779

> [!NOTE]
> This review was generated by Copilot.

### Holistic Assessment

**Motivation:** This PR reduces token bloat in the code-review skill by removing ~4,700 tokens of illustrative maintainer quotes, extracting the API approval procedure for on-demand loading (~750 tokens saved for non-API PRs), and fixing a scoping ambiguity in the CA1805 rule that could cause false positives on native code. All three changes are well-motivated and address real issues with AI skill consumption.

**Approach:** The approach is sound — the quotes were redundant with the rule statements, the API procedure extraction uses clear MUST-load semantics with a defined fallback, and the CA1805 clarification is precise and accurate.

**Summary:** ✅ LGTM. All three changes are clean, well-scoped, and improve the skill document without losing any rule content.

### Detailed Findings

**✅ Quote Removal — No rule content lost**

Verified that all 150+ removed blockquote lines were purely illustrative quotes under existing rule bullets. Compared the rule bullets between the original (164) and new (156) SKILL.md — the 8 missing bullets are exactly the items from the extracted API Approval Verification section, not lost rules. Every review guideline is preserved.

**✅ API Approval Extraction — Content integrity preserved**

The extracted `api-approval-check.md` preserves the procedure content.

**✅ CA1805 Clarification — Correct and necessary**

The original "Do not initialize fields to default values" was ambiguous across managed/native boundaries. The new wording ("managed fields") with the explicit parenthetical exception for native C/C++ code prevents false positives. This aligns with the existing native code rules in the same document (e.g., "Zero-initialize arrays and buffers that may be partially used" under Native Code).

**✅ Markdown quality — No trailing whitespace**

Both modified and new files have no trailing whitespace, consistent with the repo's markdown conventions.
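For anyone wanting to reproduce the bullet comparison described above, here is a minimal sketch. It assumes the review-rule bullets are top-level `- ` markdown list items and uses placeholder file names (`SKILL.before.md` / `SKILL.after.md`); it is not the tooling actually used for this review.

```python
# verify_bullets.py -- illustrative sketch for reproducing the bullet comparison above.
# Assumes rule bullets are top-level markdown list items ("- " at column 0);
# the before/after file paths are placeholders, not part of the PR.
from pathlib import Path

def rule_bullets(path: str) -> set[str]:
    """Collect top-level '- ' bullets from a markdown file."""
    return {
        line.rstrip()
        for line in Path(path).read_text(encoding="utf-8").splitlines()
        if line.startswith("- ")
    }

old = rule_bullets("SKILL.before.md")  # e.g. SKILL.md from the base commit
new = rule_bullets("SKILL.after.md")   # e.g. SKILL.md from this PR's head

print(f"{len(old)} bullets before, {len(new)} bullets after")
for missing in sorted(old - new):
    # Expectation from the review: only bullets moved to api-approval-check.md appear here.
    print("moved or removed:", missing)
```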
Validated separately that all of the value in the quotes was already present in the bullets. Nothing of substance relevant to the agent was lost.
> [!NOTE]
> This PR was developed with Copilot assistance based on analysis of workflow run logs and duration data.

## Problem

~20% of custom code review agent runs hit the 20-minute workflow timeout. Analysis of all 43 timeout runs from the last 1000 workflow executions shows:

- **93% of timeouts** are caused by GPT-5.4 sub-agents that never return
- GPT-5.4 is present in **100% of timeout runs** (24/24 checked in detail)
- GPT-5.2-only runs have **0 timeouts** in 6+ successful runs
- 86% of timed-out runs had already posted the review — they time out waiting for hung sub-agents
- Each timeout causes the agent job to fail, making the overall workflow **red in CI** (even though the `conclusion` job succeeds) — PR authors must manually rerun

The current SKILL.md rule "pick the highest version number" causes gpt-5.4 to always be selected when available.

## Changes (SKILL.md only)

1. **Block gpt-5.4** — it has known reliability issues. Recommend `gpt-5.3-codex` as the GPT-family pick instead. If that also exhibits hangs, we can block the GPT family entirely with no expected quality loss.
2. **Exit after posting** — the agent was lingering 2-3 minutes after successfully posting the review comment, waiting for hung sub-agents. Now it exits immediately once the comment is visible.
3. **Reduce max sub-agents from 4 to 3** — with only 2-3 model families available in practice, 4 was never fully utilized.

## What this does NOT change

- The 10-minute sub-agent timeout instruction (already in place, appropriate for agents that do return)
- The overall workflow `timeout-minutes: 20` (hardcoded in the compiled `.lock.yml`)
- The review methodology, severity definitions, or quality bar
- Any CCR (Copilot Code Review) configuration

## Expected impact

- Eliminates the dominant timeout cause (GPT-5.4 hangs)
- Saves 2-3 min per run from exit-after-post
- No expected quality regression: GPT contributed unique blocking findings in 0% of sampled runs

## Data

| Metric | Value |
|--------|-------|
| Runs analyzed | 1000 workflow runs, 420 non-skipped, 218 with CLI data |
| Timeout rate | 19.7% (43/218) |
| GPT-5.4 in timeouts | 100% (24/24 detailed) |
| GPT-5.2 timeouts | 0% (0/6+ successful GPT-5.2 runs) |
| Reviews with GPT-unique findings | <8% |
| GPT-only blocking bugs found | 0 |
| MCP add_comment missing | 12/43 timeout runs (~6% of all runs) — platform issue, not addressed here |

## Why not increase the 20-minute timeout?

The GPT-5.4 sub-agent hangs indefinitely — there is no evidence it would eventually complete if given more time. Increasing the timeout would just delay the inevitable and waste more compute.

## Duration distribution (218 runs with CLI data)

| Bucket | Runs | % |
|--------|------|---|
| 0–2m | 18 | 8.3% |
| 2–4m | 24 | 11.0% |
| 4–6m | 29 | 13.3% |
| 6–8m | 25 | 11.5% |
| 8–10m | 26 | 11.9% |
| 10–12m | 21 | 9.6% |
| 12–14m | 9 | 4.1% |
| 14–16m | 8 | 3.7% |
| 16–18m | 9 | 4.1% |
| 18–20m | 6 | 2.8% |
| 20m+ (timeout) | 43 | 19.7% |

The bimodal distribution (healthy hump at 4–10m, spike at 20m wall) confirms these are hangs, not slow completions.

Related: #126779 (not a fix for that, but general efficiency improvement to the review agent)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
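As a rough illustration of how buckets like the table above can be produced: the sketch below assumes one duration in minutes per run (the sample values are made up, not the real data) and a fixed 2-minute bin width with a separate bucket at the 20-minute wall. It is not a description of the actual analysis tooling.

```python
# bucket_durations.py -- illustrative sketch of 2-minute bucketing with a timeout-wall bucket.
# The sample durations are made-up values, not the analyzed data.
from collections import Counter

def bucket(durations_min, bin_width=2, timeout_min=20):
    """Group run durations into fixed-width buckets, with a separate bucket at the timeout wall."""
    counts = Counter()
    for d in durations_min:
        if d >= timeout_min:
            counts["20m+ (timeout)"] += 1
        else:
            lo = int(d // bin_width) * bin_width
            counts[f"{lo}-{lo + bin_width}m"] += 1
    return counts

if __name__ == "__main__":
    sample = [1.5, 5.0, 7.2, 9.8, 13.1, 20.0, 20.0]  # placeholder durations in minutes
    total = len(sample)
    for name, n in bucket(sample).items():
        print(f"{name:>16}: {n:3d} ({n / total:.1%})")
```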
> [!NOTE]
> This PR description was generated with the assistance of Copilot.
Reduce code-review skill token usage (~29% smaller)
We've been seeing several 20-minute timeouts during code review runs. To my knowledge this is a hard limit for GitHub agentic workflows. The `code-review` SKILL.md is the largest skill file in dotnet/runtime at ~68KB (~17K tokens), and reducing its size should help the model work more efficiently within the available time. This PR removes ~20KB (~5K tokens) of low-value content without losing any review rules or procedures.

## Changes
### 1. Remove developer quote examples (−19KB)
The skill contained ~150 indented blockquote lines (`> "..."`) with real maintainer review quotes used as phrasing examples under each rule bullet. These illustrated how to phrase feedback but did not define what to check. All actual review rules, severity guidance, and actionable instructions are preserved.
### 2. Extract API Approval Verification into a separate file (−3KB from always-loaded context)

The API Approval Verification procedure is a self-contained sub-procedure only needed when a PR introduces new public API surface. It is now in `api-approval-check.md` alongside SKILL.md, loaded on demand.
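The actual wording lives in SKILL.md and is not reproduced here; as a rough sketch, the on-demand reference described in this PR (explicit MUST-load with a fail-closed fallback) might read something like the following. The phrasing is illustrative; only the file path is taken from the PR.

```markdown
<!-- Illustrative sketch only; the real instruction text is in SKILL.md. -->
If the PR adds or changes public API surface, you MUST load and follow
`.github/skills/code-review/api-approval-check.md` before continuing.
If that file cannot be loaded, treat API approval as unverified and flag it
in the review (fail closed) rather than skipping the check.
```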
### 3. Tweak one line to note that the field-initialization rule (CA1805) is managed-only

## Impact

SKILL.md shrinks from ~68KB (~17K tokens) to ~48.7KB (~12K tokens) of always-loaded context, roughly a 29% reduction.
## Context: skill sizes in dotnet/skills

For reference, the 78 SKILL.md files in `dotnet/skills` average ~10.8KB (~2,700 tokens), with the largest at ~36KB (~9,000 tokens). The runtime code-review skill at 48.7KB post-trim is still larger than any of those, but it covers a comprehensive multi-step review procedure rather than a single-purpose task.
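The KB-to-token figures quoted above are consistent with roughly 4 bytes per token for English markdown. As a quick back-of-the-envelope check (the 4 bytes/token ratio is a heuristic assumption, not a measured value for these files):

```python
# Rough sanity check of the sizes quoted above; ~4 bytes/token is a heuristic, not a measurement.
BYTES_PER_TOKEN = 4  # assumed average for English markdown

for label, kb in [("SKILL.md before", 68), ("SKILL.md after", 48.7), ("dotnet/skills average", 10.8)]:
    tokens = kb * 1024 / BYTES_PER_TOKEN
    print(f"{label}: ~{kb} KB is about {tokens / 1000:.1f}K tokens")
# Output lines up with the quoted figures: ~17K, ~12K, and ~2.7K tokens respectively.
```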