
Reduce code-review skill token usage (~29% smaller) #126779

Merged
danmoseley merged 3 commits into main from remove-skill-quotes on Apr 11, 2026

Conversation

@danmoseley (Member) commented Apr 10, 2026

Note

This PR description was generated with the assistance of Copilot.

Reduce code-review skill token usage (~29% smaller)

We've been seeing several 20-minute timeouts during code review runs. To my knowledge this is a hard limit for GitHub agentic workflows. The code-review SKILL.md is the largest skill file in dotnet/runtime at ~68KB (~17K tokens), and reducing its size should help the model work more efficiently within the available time. This PR removes ~20KB (~5K tokens) of low-value content without losing any review rules or procedures.

Changes

1. Remove developer quote examples (−19KB)

The skill contained ~150 indented blockquote lines (> "...") with real maintainer review quotes used as phrasing examples under each rule bullet. These illustrated how to phrase feedback but did not define what to check. All actual review rules, severity guidance, and actionable instructions are preserved.

2. Extract API Approval Verification into a separate file (−3KB from always-loaded context)

The API Approval Verification procedure is a self-contained sub-procedure only needed when a PR introduces new public API surface. It is now in api-approval-check.md alongside SKILL.md, loaded on demand.

  • Step 0 and Step 2 both contain explicit MUST-load instructions with the exact file path
  • A fail-closed clause ensures that if the file cannot be loaded, the agent reports an error and blocks the review rather than silently skipping verification
  • The extracted file is fully self-contained with all three verification steps
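A fail-closed on-demand load instruction of this shape might look like the following. This is an illustrative sketch only; the actual SKILL.md wording, step numbering, and file layout may differ:

```markdown
<!-- Illustrative sketch; not the actual SKILL.md text. -->
If the PR adds new public API surface, you MUST load
`.github/skills/code-review/api-approval-check.md` and follow its steps.
If that file cannot be loaded, report an error and block the review;
do NOT silently skip API approval verification.
```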

3. Tweak one line to note that the "always initialize" guidance is a managed-only rule

Impact

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| SKILL.md size | 68.9 KB | 48.7 KB | −29.3% |
| Always-loaded tokens | ~17.2K | ~12.2K | −5K |
| Review rules/procedures | All | All | None removed |
| API verification | Inline | On-demand file | Loaded when needed |
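The headline percentage in the table follows directly from the before/after byte counts quoted above; a quick sanity check:

```python
# Sanity-check the size-reduction figures quoted in the PR description.
before_kb = 68.9   # SKILL.md before trimming
after_kb = 48.7    # SKILL.md after trimming

reduction_kb = before_kb - after_kb
reduction_pct = reduction_kb / before_kb * 100

print(f"removed {reduction_kb:.1f} KB ({reduction_pct:.1f}% smaller)")
# -> removed 20.2 KB (29.3% smaller)
```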

Context: skill sizes in dotnet/skills

For reference, the 78 SKILL.md files in dotnet/skills average ~10.8KB (~2,700 tokens), with the largest at ~36KB (~9,000 tokens). The runtime code-review skill at 48.7KB post-trim is still larger than any of those, but it covers a comprehensive multi-step review procedure rather than a single-purpose task.

danmoseley and others added 2 commits April 10, 2026 16:29
Remove 150 indented blockquote lines containing real maintainer
quotes used as examples under each review guideline bullet point.
The bullet points already state the rules clearly; the quotes add
~18.8 KB (~4,700 tokens) of low-signal content that doesn't
improve LLM instruction-following in a dedicated skill context.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the API Approval Verification procedure (~3KB / ~750 tokens) from
SKILL.md into api-approval-check.md. This section is only needed when a
PR introduces new public API surface, so it is now loaded on demand
rather than always present in the skill context.

Both Step 0 and Step 2 references are updated with explicit MUST-load
instructions pointing to the extracted file path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 10, 2026 23:52
@github-actions github-actions Bot added the area-skills Agent Skills label Apr 10, 2026
@danmoseley danmoseley requested a review from jeffhandley April 10, 2026 23:52
Copilot AI (Contributor) left a comment

Pull request overview

This PR reduces the always-loaded size of the code-review skill documentation to help avoid timeouts during automated review runs. It keeps the same review rules and procedures by removing low-value quoted examples and moving the API approval procedure to an on-demand file.

Changes:

  • Removed large sets of maintainer-quote examples to reduce SKILL.md size.
  • Extracted “API Approval Verification” into a standalone, on-demand markdown file.
  • Updated SKILL.md to require loading the extracted procedure when new public API surface is detected, including a fail-closed instruction if the file can’t be loaded.
| File | Description |
|------|-------------|
| .github/skills/code-review/SKILL.md | Removes quote examples and updates API approval instructions to reference the extracted procedure file. |
| .github/skills/code-review/api-approval-check.md | New standalone API approval verification procedure, loaded only when relevant. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@danmoseley (Member, Author) commented

Other extractions are possible, e.g., parts related to native code/runtime could be extracted so they don't load for PRs that don't touch those areas. But this is the biggest improvement; no need to churn more without reason.

Native C/C++ code must explicitly initialize fields and locals — the
JIT coding conventions doc says the opposite of CA1805. Add scope
qualifier to prevent misapplication to native code reviews.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread .github/skills/code-review/SKILL.md
@github-actions (Contributor) commented

🤖 Copilot Code Review — PR #126779

Note

This review was generated by Copilot.

Holistic Assessment

Motivation: This PR reduces token bloat in the code-review skill by removing ~4,700 tokens of illustrative maintainer quotes, extracting the API approval procedure for on-demand loading (~750 tokens saved for non-API PRs), and fixing a scoping ambiguity in the CA1805 rule that could cause false positives on native code. All three changes are well-motivated and address real issues with AI skill consumption.

Approach: The approach is sound — the quotes were redundant with the rule statements, the API procedure extraction uses clear MUST-load semantics with a defined fallback, and the CA1805 clarification is precise and accurate.

Summary: ✅ LGTM. All three changes are clean, well-scoped, and improve the skill document without losing any rule content.


Detailed Findings

✅ Quote Removal — No rule content lost

Verified that all 150+ removed blockquote lines were purely illustrative quotes under existing rule bullets. Compared the rule bullets between the original (164) and new (156) SKILL.md — the 8 missing bullets are exactly the items from the extracted API Approval Verification section, not lost rules. Every review guideline is preserved.

✅ API Approval Extraction — Content integrity preserved

The extracted api-approval-check.md faithfully reproduces the original "API Approval Verification" section content. Both Step 0 (line 35) and Step 2 (line 57) references are updated with explicit MUST-load instructions pointing to the file path, and a fallback error is defined if the file cannot be loaded. The reference in the API Design section (line 285) is also updated. No broken internal anchor links remain anywhere in the repository.

✅ CA1805 Clarification — Correct and necessary

The original "Do not initialize fields to default values" was ambiguous across managed/native boundaries. The new wording ("managed fields") with the explicit parenthetical exception for native C/C++ code prevents false positives. This aligns with the existing native code rules in the same document (e.g., "Zero-initialize arrays and buffers that may be partially used" under Native Code).

✅ Markdown quality — No trailing whitespace

Both modified and new files have no trailing whitespace, consistent with the repo's markdown conventions.

Generated by Code Review for issue #126779

@danmoseley (Member, Author) commented

Validated separately that all value in the quotes was already in the bullets. Nothing lost of substance relevant to the agent.

@danmoseley danmoseley enabled auto-merge (squash) April 11, 2026 00:29
@danmoseley danmoseley requested a review from hoyosjs April 11, 2026 01:31
@danmoseley danmoseley merged commit 25f7804 into main Apr 11, 2026
25 checks passed
@danmoseley danmoseley deleted the remove-skill-quotes branch April 11, 2026 02:17
danmoseley added a commit that referenced this pull request Apr 11, 2026
> [!NOTE]
> This PR was developed with Copilot assistance based on analysis of
workflow run logs and duration data.

## Problem

~20% of custom code review agent runs hit the 20-minute workflow
timeout. Analysis of all 43 timeout runs from the last 1000 workflow
executions shows:

- **93% of timeouts** are caused by GPT-5.4 sub-agents that never return
- GPT-5.4 is present in **100% of timeout runs** (24/24 checked in
detail)
- GPT-5.2-only runs have **0 timeouts** in 6+ successful runs
- 86% of timed-out runs had already posted the review — they time out
waiting for hung sub-agents
- Each timeout causes the agent job to fail, making the overall workflow
**red in CI** (even though the `conclusion` job succeeds) — PR authors
must manually rerun

The current SKILL.md rule "pick the highest version number" causes
gpt-5.4 to always be selected when available.

## Changes (SKILL.md only)

1. **Block gpt-5.4** — it has known reliability issues. Recommend
`gpt-5.3-codex` as the GPT-family pick instead. If that also exhibits
hangs, we can block the GPT family entirely with no expected quality
loss.
2. **Exit after posting** — the agent was lingering 2-3 minutes after
successfully posting the review comment, waiting for hung sub-agents.
Now it exits immediately once the comment is visible.
3. **Reduce max sub-agents from 4 to 3** — with only 2-3 model families
available in practice, 4 was never fully utilized.

## What this does NOT change

- The 10-minute sub-agent timeout instruction (already in place,
appropriate for agents that do return)
- The overall workflow `timeout-minutes: 20` (hardcoded in the compiled
`.lock.yml`)
- The review methodology, severity definitions, or quality bar
- Any CCR (Copilot Code Review) configuration

## Expected impact

- Eliminates the dominant timeout cause (GPT-5.4 hangs)
- Saves 2-3 min per run from exit-after-post
- No expected quality regression: GPT contributed unique blocking
findings in 0% of sampled runs

## Data

| Metric | Value |
|--------|-------|
| Runs analyzed | 1000 workflow runs, 420 non-skipped, 218 with CLI data |
| Timeout rate | 19.7% (43/218) |
| GPT-5.4 in timeouts | 100% (24/24 detailed) |
| GPT-5.2 timeouts | 0% (0/6+ successful GPT-5.2 runs) |
| Reviews with GPT-unique findings | <8% |
| GPT-only blocking bugs found | 0 |
| MCP add_comment missing | 12/43 timeout runs (~6% of all runs); platform issue, not addressed here |

## Why not increase the 20-minute timeout?

The GPT-5.4 sub-agent hangs indefinitely — there is no evidence it would
eventually complete if given more time. Increasing the timeout would
just delay the inevitable and waste more compute.

## Duration distribution (218 runs with CLI data)

| Bucket | Runs | % |
|--------|------|---|
| 0–2m | 18 | 8.3% |
| 2–4m | 24 | 11.0% |
| 4–6m | 29 | 13.3% |
| 6–8m | 25 | 11.5% |
| 8–10m | 26 | 11.9% |
| 10–12m | 21 | 9.6% |
| 12–14m | 9 | 4.1% |
| 14–16m | 8 | 3.7% |
| 16–18m | 9 | 4.1% |
| 18–20m | 6 | 2.8% |
| 20m+ (timeout) | 43 | 19.7% |

The bimodal distribution (healthy hump at 4–10m, spike at 20m wall)
confirms these are hangs, not slow completions.
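The bucket counts above can be recomputed to confirm the headline numbers (218 runs, 19.7% timeout rate):

```python
# Recompute the headline numbers from the duration-distribution table.
buckets = {
    "0-2m": 18, "2-4m": 24, "4-6m": 29, "6-8m": 25, "8-10m": 26,
    "10-12m": 21, "12-14m": 9, "14-16m": 8, "16-18m": 9, "18-20m": 6,
    "timeout": 43,
}
total = sum(buckets.values())
timeout_rate = buckets["timeout"] / total * 100
print(f"{total} runs, {timeout_rate:.1f}% timed out")
# -> 218 runs, 19.7% timed out
```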

Related: #126779 (not a fix for that, but general efficiency improvement
to the review agent)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

area-skills Agent Skills


3 participants