Reduce code-review skill token usage (~29% smaller) #126779
Conversation
Remove 150 indented blockquote lines containing real maintainer quotes used as examples under each review guideline bullet point. The bullet points already state the rules clearly; the quotes add ~18.8 KB (~4,700 tokens) of low-signal content that doesn't improve LLM instruction-following in a dedicated skill context.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the API Approval Verification procedure (~3KB / ~750 tokens) from SKILL.md into api-approval-check.md. This section is only needed when a PR introduces new public API surface, so it is now loaded on demand rather than always present in the skill context. Both Step 0 and Step 2 references are updated with explicit MUST-load instructions pointing to the extracted file path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR reduces the always-loaded size of the code-review skill documentation to help avoid timeouts during automated review runs. It removes low-value quoted examples and moves the API approval procedure to an on-demand file, while keeping the same review rules and procedures.
Changes:
- Removed large sets of maintainer-quote examples to reduce SKILL.md size.
- Extracted “API Approval Verification” into a standalone, on-demand markdown file.
- Updated SKILL.md to require loading the extracted procedure when new public API surface is detected, including a fail-closed instruction if the file can’t be loaded.
Summary per file
| File | Description |
|---|---|
| .github/skills/code-review/SKILL.md | Removes quote examples and updates API approval instructions to reference the extracted procedure file. |
| .github/skills/code-review/api-approval-check.md | New standalone API approval verification procedure to be loaded only when relevant. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
Other extractions are possible, e.g., parts related to native code/runtime could be extracted so they aren't loaded for PRs that don't touch those areas. But this is the biggest improvement; no need to churn more without reason.
Native C/C++ code must explicitly initialize fields and locals — the JIT coding conventions doc says the opposite of CA1805. Add scope qualifier to prevent misapplication to native code reviews.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## 🤖 Copilot Code Review — PR #126779

> [!NOTE]
> This review was generated by Copilot.

### Holistic Assessment

**Motivation:** This PR reduces token bloat in the code-review skill by removing ~4,700 tokens of illustrative maintainer quotes, extracting the API approval procedure for on-demand loading (~750 tokens saved for non-API PRs), and fixing a scoping ambiguity in the CA1805 rule that could cause false positives on native code. All three changes are well-motivated and address real issues with AI skill consumption.

**Approach:** The approach is sound — the quotes were redundant with the rule statements, the API procedure extraction uses clear MUST-load semantics with a defined fallback, and the CA1805 clarification is precise and accurate.

**Summary:** ✅ LGTM. All three changes are clean, well-scoped, and improve the skill document without losing any rule content.

### Detailed Findings

**✅ Quote Removal — No rule content lost**

Verified that all 150+ removed blockquote lines were purely illustrative quotes under existing rule bullets. Compared the rule bullets between the original (164) and new (156) SKILL.md — the 8 missing bullets are exactly the items from the extracted API Approval Verification section, not lost rules. Every review guideline is preserved.

**✅ API Approval Extraction — Content integrity preserved**

The extracted `api-approval-check.md` preserves the procedure content.

**✅ CA1805 Clarification — Correct and necessary**

The original "Do not initialize fields to default values" was ambiguous across managed/native boundaries. The new wording ("managed fields") with the explicit parenthetical exception for native C/C++ code prevents false positives. This aligns with the existing native code rules in the same document (e.g., "Zero-initialize arrays and buffers that may be partially used" under Native Code).

**✅ Markdown quality — No trailing whitespace**

Both modified and new files have no trailing whitespace, consistent with the repo's markdown conventions.
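For anyone wanting to reproduce the bullet comparison described above, here is a minimal sketch. It assumes the review-rule bullets are top-level `- ` markdown list items and uses placeholder file names (`SKILL.before.md` / `SKILL.after.md`); it is not the tooling actually used for this review.

```python
# verify_bullets.py -- illustrative sketch for reproducing the bullet comparison above.
# Assumes rule bullets are top-level markdown list items ("- " at column 0);
# the before/after file paths are placeholders, not part of the PR.
from pathlib import Path

def rule_bullets(path: str) -> set[str]:
    """Collect top-level '- ' bullets from a markdown file."""
    return {
        line.rstrip()
        for line in Path(path).read_text(encoding="utf-8").splitlines()
        if line.startswith("- ")
    }

old = rule_bullets("SKILL.before.md")  # e.g. SKILL.md from the base commit
new = rule_bullets("SKILL.after.md")   # e.g. SKILL.md from this PR's head

print(f"{len(old)} bullets before, {len(new)} bullets after")
for missing in sorted(old - new):
    # Expectation from the review: only bullets moved to api-approval-check.md appear here.
    print("moved or removed:", missing)
```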
Validated separately that all of the value in the quotes was already present in the bullets. Nothing of substance relevant to the agent was lost.
> [!NOTE]
> This PR was developed with Copilot assistance based on analysis of workflow run logs and duration data.

## Problem

~20% of custom code review agent runs hit the 20-minute workflow timeout. Analysis of all 43 timeout runs from the last 1000 workflow executions shows:

- **93% of timeouts** are caused by GPT-5.4 sub-agents that never return
- GPT-5.4 is present in **100% of timeout runs** (24/24 checked in detail)
- GPT-5.2-only runs have **0 timeouts** in 6+ successful runs
- 86% of timed-out runs had already posted the review — they time out waiting for hung sub-agents
- Each timeout causes the agent job to fail, making the overall workflow **red in CI** (even though the `conclusion` job succeeds) — PR authors must manually rerun

The current SKILL.md rule "pick the highest version number" causes gpt-5.4 to always be selected when available.

## Changes (SKILL.md only)

1. **Block gpt-5.4** — it has known reliability issues. Recommend `gpt-5.3-codex` as the GPT-family pick instead. If that also exhibits hangs, we can block the GPT family entirely with no expected quality loss.
2. **Exit after posting** — the agent was lingering 2-3 minutes after successfully posting the review comment, waiting for hung sub-agents. Now it exits immediately once the comment is visible.
3. **Reduce max sub-agents from 4 to 3** — with only 2-3 model families available in practice, 4 was never fully utilized.

## What this does NOT change

- The 10-minute sub-agent timeout instruction (already in place, appropriate for agents that do return)
- The overall workflow `timeout-minutes: 20` (hardcoded in the compiled `.lock.yml`)
- The review methodology, severity definitions, or quality bar
- Any CCR (Copilot Code Review) configuration

## Expected impact

- Eliminates the dominant timeout cause (GPT-5.4 hangs)
- Saves 2-3 min per run from exit-after-post
- No expected quality regression: GPT contributed unique blocking findings in 0% of sampled runs

## Data

| Metric | Value |
|--------|-------|
| Runs analyzed | 1000 workflow runs, 420 non-skipped, 218 with CLI data |
| Timeout rate | 19.7% (43/218) |
| GPT-5.4 in timeouts | 100% (24/24 detailed) |
| GPT-5.2 timeouts | 0% (0/6+ successful GPT-5.2 runs) |
| Reviews with GPT-unique findings | <8% |
| GPT-only blocking bugs found | 0 |
| MCP add_comment missing | 12/43 timeout runs (~6% of all runs) — platform issue, not addressed here |

## Why not increase the 20-minute timeout?

The GPT-5.4 sub-agent hangs indefinitely — there is no evidence it would eventually complete if given more time. Increasing the timeout would just delay the inevitable and waste more compute.

## Duration distribution (218 runs with CLI data)

| Bucket | Runs | % |
|--------|------|---|
| 0–2m | 18 | 8.3% |
| 2–4m | 24 | 11.0% |
| 4–6m | 29 | 13.3% |
| 6–8m | 25 | 11.5% |
| 8–10m | 26 | 11.9% |
| 10–12m | 21 | 9.6% |
| 12–14m | 9 | 4.1% |
| 14–16m | 8 | 3.7% |
| 16–18m | 9 | 4.1% |
| 18–20m | 6 | 2.8% |
| 20m+ (timeout) | 43 | 19.7% |

The bimodal distribution (healthy hump at 4–10m, spike at 20m wall) confirms these are hangs, not slow completions.

Related: #126779 (not a fix for that, but general efficiency improvement to the review agent)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
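As a rough illustration of how buckets like the table above can be produced: the sketch below assumes one duration in minutes per run (the sample values are made up, not the real data) and a fixed 2-minute bin width with a separate bucket at the 20-minute wall. It is not a description of the actual analysis tooling.

```python
# bucket_durations.py -- illustrative sketch of 2-minute bucketing with a timeout-wall bucket.
# The sample durations are made-up values, not the analyzed data.
from collections import Counter

def bucket(durations_min, bin_width=2, timeout_min=20):
    """Group run durations into fixed-width buckets, with a separate bucket at the timeout wall."""
    counts = Counter()
    for d in durations_min:
        if d >= timeout_min:
            counts["20m+ (timeout)"] += 1
        else:
            lo = int(d // bin_width) * bin_width
            counts[f"{lo}-{lo + bin_width}m"] += 1
    return counts

if __name__ == "__main__":
    sample = [1.5, 5.0, 7.2, 9.8, 13.1, 20.0, 20.0]  # placeholder durations in minutes
    total = len(sample)
    for name, n in bucket(sample).items():
        print(f"{name:>16}: {n:3d} ({n / total:.1%})")
```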
> [!NOTE]
> This PR description was generated with the assistance of Copilot.
Reduce code-review skill token usage (~29% smaller)
We've been seeing several 20-minute timeouts during code review runs. To my knowledge this is a hard limit for GitHub agentic workflows. The `code-review` SKILL.md is the largest skill file in dotnet/runtime at ~68KB (~17K tokens), and reducing its size should help the model work more efficiently within the available time. This PR removes ~20KB (~5K tokens) of low-value content without losing any review rules or procedures.

## Changes
### 1. Remove developer quote examples (−19KB)
The skill contained ~150 indented blockquote lines (`> "..."`) with real maintainer review quotes used as phrasing examples under each rule bullet. These illustrated how to phrase feedback but did not define what to check. All actual review rules, severity guidance, and actionable instructions are preserved.
### 2. Extract API Approval Verification into a separate file (−3KB from always-loaded context)

The API Approval Verification procedure is a self-contained sub-procedure only needed when a PR introduces new public API surface. It is now in `api-approval-check.md` alongside SKILL.md, loaded on demand.
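The actual wording lives in SKILL.md and is not reproduced here; as a rough sketch, the on-demand reference described in this PR (explicit MUST-load with a fail-closed fallback) might read something like the following. The phrasing is illustrative; only the file path is taken from the PR.

```markdown
<!-- Illustrative sketch only; the real instruction text is in SKILL.md. -->
If the PR adds or changes public API surface, you MUST load and follow
`.github/skills/code-review/api-approval-check.md` before continuing.
If that file cannot be loaded, treat API approval as unverified and flag it
in the review (fail closed) rather than skipping the check.
```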
### 3. Tweak one line to note that the field-initialization rule (CA1805) is managed-only

## Impact

SKILL.md shrinks from ~68KB (~17K tokens) to ~48.7KB (~12K tokens) of always-loaded context, roughly a 29% reduction.
## Context: skill sizes in dotnet/skills

For reference, the 78 SKILL.md files in `dotnet/skills` average ~10.8KB (~2,700 tokens), with the largest at ~36KB (~9,000 tokens). The runtime code-review skill at 48.7KB post-trim is still larger than any of those, but it covers a comprehensive multi-step review procedure rather than a single-purpose task.
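The KB-to-token figures quoted above are consistent with roughly 4 bytes per token for English markdown. As a quick back-of-the-envelope check (the 4 bytes/token ratio is a heuristic assumption, not a measured value for these files):

```python
# Rough sanity check of the sizes quoted above; ~4 bytes/token is a heuristic, not a measurement.
BYTES_PER_TOKEN = 4  # assumed average for English markdown

for label, kb in [("SKILL.md before", 68), ("SKILL.md after", 48.7), ("dotnet/skills average", 10.8)]:
    tokens = kb * 1024 / BYTES_PER_TOKEN
    print(f"{label}: ~{kb} KB is about {tokens / 1000:.1f}K tokens")
# Output lines up with the quoted figures: ~17K, ~12K, and ~2.7K tokens respectively.
```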