diff --git a/.github/skills/issue-triage/SKILL.md b/.github/skills/issue-triage/SKILL.md index a3c920021376e7..bdc16692ad98ff 100644 --- a/.github/skills/issue-triage/SKILL.md +++ b/.github/skills/issue-triage/SKILL.md @@ -232,7 +232,7 @@ Based on the issue type classified in Step 1, follow the appropriate guide: |------|-------|---------------| | **Bug report** | [Bug triage](references/bug-triage.md) | Reproduction, regression validation, minimal repro derivation, root cause analysis | | **API proposal** | [API proposal triage](references/api-proposal-triage.md) | Merit evaluation, complexity estimation | -| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression with BenchmarkDotNet, git bisect to culprit commit | +| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression, assess severity and impact. For detailed investigation methodology (benchmarking, bisection), use the `performance-investigation` skill. 
| | **Question** | [Question triage](references/question-triage.md) | Research and answer the question, verify if low confidence | | **Enhancement** | [Enhancement triage](references/enhancement-triage.md) | Subcategory classification, feasibility analysis, trade-off assessment (includes performance improvement requests) | @@ -521,5 +521,6 @@ depending on the outcome: |-----------|-------|-----------------| | API proposal recommended as KEEP | **api-proposal** | Offer to draft a formal API proposal with working prototype | | Bug report with root cause identified | **jit-regression-test** | If the bug is JIT-related, offer to create a regression test | -| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks | +| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) | +| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot | | Fix PR linked to the issue | **code-review** | Offer to review the fix PR for correctness and consistency | diff --git a/.github/skills/issue-triage/references/perf-regression-triage.md b/.github/skills/issue-triage/references/perf-regression-triage.md index 1382c09ee8e1e6..4fb2b37732548d 100644 --- a/.github/skills/issue-triage/references/perf-regression-triage.md +++ b/.github/skills/issue-triage/references/perf-regression-triage.md @@ -1,13 +1,14 @@ # Performance Regression Triage -Guidance for investigating and triaging performance regressions in -dotnet/runtime. Referenced from the main [SKILL.md](../SKILL.md) during Step 5. +Triage-specific guidance for assessing and recommending action on performance +regressions in dotnet/runtime. Referenced from the main +[SKILL.md](../SKILL.md) during Step 5. 
-> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` -> on Windows or `./build.sh` on Linux/macOS. Other shell commands use -> Linux/macOS syntax (`cp -r`, forward-slash paths, `\` line continuation). -> On Windows, adapt accordingly: use `Copy-Item` or `xcopy`, backslash paths, -> and backtick (`` ` ``) line continuation. +For detailed investigation methodology (benchmarking, bisection, bot usage), +use the `performance-investigation` skill. This document covers only the +triage-specific assessment and recommendation criteria. + +## Sources of Performance Regressions A performance regression is a report that something got measurably slower (or uses more memory/allocations) compared to a previous .NET version or a recent @@ -21,307 +22,19 @@ commit. These reports come from several sources: - **Cross-release regressions** -- a regression observed between two stable releases (e.g., .NET 9 → .NET 10) without a specific commit range. -The goals of this triage are to: - -1. **Validate** that the regression is real and reproducible. -2. **Bisect** to the exact commit that introduced it. - -## Feasibility Check - -Before investing time in benchmarking and bisection, assess whether the current -environment can support the investigation. Full bisection requires building -dotnet/runtime at multiple commits (each build takes 5-40 minutes) and running -benchmarks, which is resource-intensive. 
- -| Factor | Feasible | Not feasible | -|--------|----------|--------------| -| **Disk space** | >50 GB free (for multiple builds) | <20 GB free | -| **Build time budget** | User is willing to wait 30-60+ min | Quick-turnaround triage expected | -| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | -| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | -| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | - -### When full bisection is not feasible - -Use the **lightweight analysis** path instead: - -1. **Analyze `git log`** -- Review commits in the regression range - (`git log --oneline {good}..{bad}`) and identify changes to the affected - code path. Look for algorithmic changes, removed optimizations, added - validation, or new allocations. -2. **Check PR descriptions** -- For each suspicious commit, read the associated - PR description and review comments. Performance trade-offs are often - discussed there. -3. **Narrow by code path** -- Use `git log --oneline {good}..{bad} -- path/` - to filter commits to the affected library or component. -4. **Report the narrowed range** -- Include the list of candidate commits/PRs - in the triage report with an explanation of why each is suspicious. This - gives maintainers a head start even without a definitive bisect result. - -Note in the triage report that full bisection was not attempted and why -(e.g., "environment mismatch", "time constraint"), so maintainers know to -verify independently. - -## Identifying the Bisect Range - -Before benchmarking, determine the good and bad commits that bound the -regression. 
- -### Automated bot issues (`performanceautofiler`) - -Issues from `performanceautofiler[bot]` follow a standard format: - -- **Run Information** -- Baseline commit, Compare commit, diff link, OS, arch, - and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). -- **Regression tables** -- Each table shows benchmark name, Baseline time, - Test time, and Test/Base ratio. A ratio >1.0 indicates a regression. -- **Repro commands** -- Typically: - ``` - git clone https://github.com/dotnet/performance.git - python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' - ``` -- **Graphs** -- Time-series graphs showing when the regression appeared. - -Key fields to extract: - -- The **Baseline** and **Compare** commit SHAs -- these define the bisect range. -- The **benchmark filter** -- the `--filter` argument to reproduce the benchmark. -- The **Test/Base ratio** -- how severe the regression is (>1.5× is significant). - -### Customer reports - -When a customer reports a regression (e.g., "X is slower on .NET 10 than -.NET 9"), there are no pre-defined commit SHAs. You need to determine the -bisect range yourself -- see [Cross-release regressions](#cross-release-regressions) -below. - -Also identify the **scenario to benchmark** from the customer's report -- the -specific API call, code pattern, or workload that regressed. - -### Cross-release regressions - -When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect -on the `main` branch between the commits from which the release branches were -snapped. Release branches in dotnet/runtime are -[snapped from main](../../../../docs/project/branching-guide.md). - -Find the snap points with `git merge-base`: - -``` -git merge-base main release/9.0 # → good commit (last common ancestor) -git merge-base main release/10.0 # → bad commit -``` - -Use the resulting SHAs as the good/bad boundaries for bisection on `main`. 
-This avoids bisecting across release branches where cherry-picks and backports -make the history non-linear. - -## Phase 1: Create a Standalone Benchmark - -Before investing time in bisection, create a standalone BenchmarkDotNet -project that reproduces the regressing scenario. This project will be used -for both validation (Phase 1) and bisection (Phase 3). - -### Why a standalone project? - -The full [dotnet/performance](https://github.com/dotnet/performance) repo -has many dependencies and can be fragile across different runtime commits. -A standalone project with only the impacted benchmark is faster to build, -easier to iterate on, and more reliable during `git bisect`. - -### Creating the benchmark project - -**From an automated bot issue** -- copy the relevant benchmark class and its -dependencies from the `dotnet/performance` repo into a new standalone project: - -1. Clone `dotnet/performance` and locate the benchmark class referenced in the - issue's `--filter` argument. -2. Create a new console project and add a reference to - `BenchmarkDotNet` (NuGet): - ``` - mkdir PerfRepro && cd PerfRepro - dotnet new console - dotnet add package BenchmarkDotNet - ``` -3. Copy the benchmark class (and any helper types it depends on) into the - project. Adjust namespaces and usings as needed. -4. Add a `Program.cs` entry point: - ```csharp - BenchmarkDotNet.Running.BenchmarkSwitcher - .FromAssembly(typeof(Program).Assembly) - .Run(args); - ``` - -**From a customer report** -- write a minimal BenchmarkDotNet benchmark that -exercises the reported code path: - -1. Create a new console project with `BenchmarkDotNet` as above. -2. Write a `[Benchmark]` method that calls the API or runs the workload the - customer identified as slow. -3. If the customer provided sample code, adapt it into a proper BDN benchmark - with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. 
- -### Building dotnet/runtime and obtaining CoreRun - -Build dotnet/runtime at the commit you want to test: - -``` -build.cmd/sh clr+libs -c release -``` - -The key artifact is the **testhost** folder containing **CoreRun** at: - -``` -artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ -``` - -BenchmarkDotNet uses CoreRun to load the locally-built runtime and libraries, -meaning you can benchmark private builds without installing them as SDKs. - -### Validating the regression - -Build dotnet/runtime at both the good and bad commits, saving each testhost -folder: - -``` -git checkout {bad-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad - -git checkout {good-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good -``` - -Run the standalone benchmark with both CoreRuns. BenchmarkDotNet compares -them side-by-side when given multiple `--coreRun` paths (the first is treated -as the baseline): - -``` -cd PerfRepro -dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun /tmp/corerun-good/.../CoreRun \ - /tmp/corerun-bad/.../CoreRun -``` - -To add a statistical significance column, append `--statisticalTest 5%`. -This performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, -or `Same`. - -### Interpret the results - -| Outcome | Meaning | Next step | -|---------|---------|-----------| -| `Slower` with ratio >1.10 | Regression confirmed | Proceed to Phase 2 | -| `Slower` with ratio between 1.05 and 1.10 | Small regression -- likely real but needs confirmation | Re-run with more iterations (`--iterationCount 30`). If it persists, treat as confirmed and proceed to Phase 2. | -| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. 
| -| `Slower` but ratio <1.05 | Marginal -- may be noise | Re-run with more iterations (`--iterationCount 30`). If still marginal, note as inconclusive. | - -For a thorough comparison of saved BDN result files, use the -[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) -tool: - -``` -dotnet run --project performance/src/tools/ResultsComparer \ - --base /path/to/baseline-results \ - --diff /path/to/compare-results \ - --threshold 5% -``` - -## Phase 2: Narrow the Commit Range - -If the bisect range spans many commits, narrow it before running a full -bisect: - -1. **Check `git log --oneline {good}..{bad}`** -- how many commits are in the - range? If it is more than ~200, try to narrow it first. -2. **Test midpoint commits manually** -- pick a commit in the middle of the - range, build, run the benchmark, and determine if it is good or bad. - This halves the range in one step. -3. **For cross-release regressions** -- use the `git merge-base` snap points - described above. If the range between two release snap points is still - large, test at intermediate release preview tags to narrow further. - -## Phase 3: Git Bisect - -Once you have a manageable commit range (good commit and bad commit), use -`git bisect` to binary-search for the culprit. - -### Bisect workflow - -At each step of the bisect, you need to: - -1. **Rebuild the affected component** -- use incremental builds where possible - (see [Incremental Rebuilds](#incremental-rebuilds-during-bisect) below). -2. **Run the standalone benchmark** with the freshly-built CoreRun: - ``` - cd PerfRepro - dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun - ``` -3. **Determine good or bad** -- compare the result against your threshold. 
- -**Exit codes for `git bisect run`:** -- `0` -- good (no regression at this commit) -- `1`–`124` -- bad (regression present) -- `125` -- skip (build failure or untestable commit) - -The standalone benchmark project must be **outside the dotnet/runtime tree** -since `git bisect` checks out different commits, which would overwrite -in-tree files. Place it in a stable location (e.g., `/tmp/bisect/`). - -### Run the bisect - -``` -cd /path/to/runtime -git bisect start {bad-sha} {good-sha} -git bisect run /path/to/bisect-script.sh -``` - -**Time estimate:** Each bisect step requires a rebuild + benchmark run. -For ~1000 commits (log₂(1000) ≈ 10 steps) with a 5-minute rebuild, expect -roughly 50 minutes for the full bisect. - -### After bisect completes - -`git bisect` will output the first bad commit. Run `git bisect reset` to -return to the original branch. - -### Root cause analysis and triage report - -Include the following in the triage report: - -1. **The culprit commit or PR** -- link to the specific commit SHA and its - associated PR. Explain how the change relates to the regressing benchmark. -2. **Root cause analysis** -- describe *why* the change caused the regression - (e.g., an algorithm change, a removed optimization, additional validation - overhead). -3. **If the root cause spans multiple PRs** -- sometimes a regression results - from the combined effect of several changes and `git bisect` lands on a - commit that is only one contributing factor. In this case, report the - narrowest commit range that introduced the regression and list the PRs or - commits within that range that appear relevant to the affected code path. - -## Incremental Rebuilds During Bisect - -Full rebuilds are slow. 
Minimize per-step build time: +## Investigation -| Component changed | Fast rebuild command | -|-------------------|---------------------| -| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | -| CoreLib | `build.cmd/sh clr.corelib -c Release` | -| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | -| All libraries | `build.cmd/sh libs -c Release` | +The investigation goal is to validate that the regression is real and, if +possible, bisect to the exact commit that introduced it. -After an incremental library rebuild, the updated DLL is placed in the -testhost folder automatically. CoreRun will pick up the new version on the -next benchmark run. +Use the `performance-investigation` skill (Workflow 2: Regression Investigation) +for the full methodology, which includes: -**Caveat:** If bisect crosses a commit that changes the build infrastructure -(e.g., SDK version bump in `global.json`), the incremental build may fail. -Use exit code `125` (skip) to handle this gracefully. +- Feasibility checks for local vs. bot-based investigation +- Building dotnet/runtime at specific commits and using CoreRun +- Comparing good/bad commits with BenchmarkDotNet +- Git bisect workflow for finding the culprit commit +- Using @EgorBot and @MihuBot for remote validation ## Performance-Specific Assessment diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md new file mode 100644 index 00000000000000..416c2a2278c0b7 --- /dev/null +++ b/.github/skills/performance-investigation/SKILL.md @@ -0,0 +1,143 @@ +--- +name: performance-investigation +description: > + Investigate performance regressions locally in dotnet/runtime. 
Use this skill + when asked to investigate a performance regression, bisect to find a culprit + commit, validate a regression with local builds, compare performance between + commits using CoreRun, or benchmark private runtime builds with + BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local + benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking + with @EgorBot or @MihuBot (use the performance-benchmark skill instead). +--- + +# Local Performance Investigation for dotnet/runtime + +Investigate performance regressions locally by building the runtime at specific +commits, running BenchmarkDotNet with CoreRun, and using git bisect to find +culprit commits. This skill covers the full local investigation workflow from +validation to root-causing. + +## When to Use This Skill + +- Asked to **investigate a performance regression** (from an issue, bot report, + or customer report) +- Asked to **compare performance** between commits, branches, or releases using + local builds +- Asked to **bisect** to find the commit that introduced a regression +- Asked to **benchmark private runtime builds** using CoreRun +- Asked to **triage a performance issue** (use alongside the `issue-triage` + skill for full triage) +- Given a `tenet-performance` or `tenet-performance-benchmarks` labeled issue + that requires local investigation + +> **Note:** For ad hoc PR benchmarking via @EgorBot or @MihuBot, use the +> `performance-benchmark` skill instead. This skill focuses on local builds, +> CoreRun, and git bisect. + +## Investigation Workflow + +The investigation follows three phases: + +1. **Validate** — Confirm the regression is real and reproducible +2. **Narrow** — Reduce the commit range to a manageable size +3. 
**Bisect** — Binary-search for the culprit commit + +For the full methodology, including feasibility checks, commit range +identification, and step-by-step bisection instructions, see the +[bisection guide](references/bisection-guide.md). + +For details on building the runtime, using CoreRun, and running BenchmarkDotNet +against private builds, see the +[local benchmarking guide](references/local-benchmarking.md). + +### Reporting Results + +After completing the investigation, include in your report: + +- Whether the regression was **confirmed** or **not reproduced** +- The **culprit commit/PR** (if bisection was performed) +- **Root cause analysis** — why the change caused the regression +- **Severity assessment** — Test/Base ratio, number of affected benchmarks, + user impact + +--- + +## Writing Good Benchmarks + +These guidelines apply whether you're writing a benchmark for local validation +or for contribution to the dotnet/performance repo. + +For comprehensive guidance, see the +[Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). 
+ +### Key Principles + +- **Move initialization to `[GlobalSetup]`** — separate setup from the measured + code to avoid measuring allocation/initialization overhead +- **Return values** from benchmark methods to prevent dead code elimination +- **Avoid manual loops** — BenchmarkDotNet invokes the benchmark many times + automatically; adding loops distorts measurements +- **No side effects** — benchmarks should be pure and produce consistent results +- **Focus on common cases** — benchmark hot paths and typical usage, not edge + cases +- **Use consistent input data** — always use the same test data for reproducible + comparisons + +### Benchmark Class Requirements + +- Must be `public` +- Must be a `class` (not struct) +- Must not be `sealed` +- Must not be `static` + +### Example: Standalone Investigation Benchmark + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private string _testString = default!; + + [Params(10, 100, 1000)] + public int Length { get; set; } + + [GlobalSetup] + public void Setup() + { + _testString = new string('a', Length); + } + + [Benchmark] + public int StringOperation() + { + return _testString.IndexOf('z'); + } +} +``` + +--- + +## External Resources + +- [dotnet/performance repository](https://github.com/dotnet/performance) — + central location for all .NET runtime benchmarks +- [Benchmarking workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md) +- [Profiling workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) +- [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) +- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) +- [Performance 
guidelines](../../../../docs/project/performance-guidelines.md) — + project-wide performance policy + +## Related Skills + +| Condition | Skill | When to use | +|-----------|-------|-------------| +| Need to benchmark a PR via @EgorBot | **performance-benchmark** | For ad hoc PR benchmarking on dedicated hardware | +| Triaging a performance regression issue | **issue-triage** | For the full triage workflow (assessment, recommendation, labels) | +| Fix PR linked to the regression | **code-review** | To review the fix for correctness and consistency | +| JIT regression test needed | **jit-regression-test** | To extract a JIT regression test from the issue | diff --git a/.github/skills/performance-investigation/evals/evals.json b/.github/skills/performance-investigation/evals/evals.json new file mode 100644 index 00000000000000..21d363829af87e --- /dev/null +++ b/.github/skills/performance-investigation/evals/evals.json @@ -0,0 +1,137 @@ +{ + "skill_name": "performance-investigation", + "evals": [ + { + "id": 1, + "name": "perf-regression-autobot", + "prompt": "Investigate this performance regression: https://github.com/dotnet/runtime/issues/114625", + "expected_output": "Should follow the regression investigation workflow. 
Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and plan validation or bisection using local builds.", + "assertions": [ + { + "name": "identifies-regression", + "description": "Recognizes and follows the regression investigation workflow", + "type": "contains_any", + "check": ["regression", "investigate", "Regression"] + }, + { + "name": "identifies-commits", + "description": "Identifies or references baseline/compare commits from the bot report", + "type": "contains_any", + "check": ["commit", "SHA", "baseline", "compare", "bisect"] + }, + { + "name": "assesses-severity", + "description": "Assesses the regression severity using the ratio", + "type": "contains_any", + "check": ["ratio", "severity", "Test/Base", "slower", "regression"] + } + ], + "files": [] + }, + { + "id": 2, + "name": "benchmark-with-corerun", + "prompt": "How do I benchmark my local runtime changes against the main branch?", + "expected_output": "Should explain how to build dotnet/runtime, obtain CoreRun from the testhost folder, and run BenchmarkDotNet with the --coreRun argument to compare private builds.", + "assertions": [ + { + "name": "mentions-corerun", + "description": "Explains CoreRun as the mechanism for benchmarking private builds", + "type": "contains_any", + "check": ["CoreRun", "coreRun", "--coreRun", "testhost"] + }, + { + "name": "mentions-build", + "description": "References building the runtime", + "type": "contains_any", + "check": ["clr+libs", "build.cmd", "build.sh"] + }, + { + "name": "mentions-bdn", + "description": "References BenchmarkDotNet for running the benchmarks", + "type": "contains_any", + "check": ["BenchmarkDotNet", "BDN", "[Benchmark]"] + } + ], + "files": [] + }, + { + "id": 3, + "name": "cross-release-regression", + "prompt": "A user reports that string.IndexOf is 2x slower in .NET 10 compared to .NET 9. 
How should we investigate?", + "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression locally using CoreRun builds.", + "assertions": [ + { + "name": "mentions-merge-base", + "description": "Explains using git merge-base for cross-release bisection", + "type": "contains_any", + "check": ["merge-base", "release branch", "snap point"] + }, + { + "name": "mentions-benchmark-creation", + "description": "Suggests creating a benchmark for the reported scenario", + "type": "contains_any", + "check": ["benchmark", "BenchmarkDotNet", "[Benchmark]", "standalone"] + }, + { + "name": "mentions-bisect", + "description": "References git bisect as part of the investigation", + "type": "contains_any", + "check": ["bisect", "git bisect", "binary search"] + } + ], + "files": [] + }, + { + "id": 4, + "name": "compare-commits-locally", + "prompt": "Compare the performance of two specific commits locally for System.Text.Json serialization", + "expected_output": "Should explain how to build dotnet/runtime at both commits, save testhost/CoreRun artifacts, and run BenchmarkDotNet with --coreRun pointing to both builds for a side-by-side comparison.", + "assertions": [ + { + "name": "mentions-corerun", + "description": "References CoreRun or testhost for running against private builds", + "type": "contains_any", + "check": ["CoreRun", "coreRun", "--coreRun", "testhost"] + }, + { + "name": "mentions-both-builds", + "description": "Explains building at both commits for comparison", + "type": "contains_any", + "check": ["both commits", "good", "bad", "baseline", "two builds", "each commit"] + } + ], + "files": [] + }, + { + "id": 5, + "name": "not-applicable-bug-issue", + "prompt": "Can you check the performance impact of https://github.com/dotnet/runtime/issues/46088", + "expected_output": "Should recognize this is a functional bug (System.Text.Json does not support 
constructors with byref parameters), not a performance issue. Should indicate that performance benchmarking is not applicable here.", + "assertions": [ + { + "name": "identifies-not-perf", + "description": "Recognizes this is not a performance issue", + "type": "contains_any", + "check": ["not a performance", "not performance-related", "no performance", "functional", "not applicable", "does not apply", "isn't a performance"] + } + ], + "files": [] + }, + { + "id": 6, + "name": "not-applicable-doc-pr", + "prompt": "Benchmark the changes in PR https://github.com/dotnet/runtime/pull/124592 to validate performance", + "expected_output": "Should recognize this is a documentation-only PR (adding XML docs to DI extension methods) and that benchmarking is not applicable or meaningful for documentation changes.", + "assertions": [ + { + "name": "identifies-doc-only", + "description": "Recognizes this is a documentation/non-functional change where benchmarking is not meaningful", + "type": "contains_any", + "check": ["documentation", "doc", "no functional", "no code change", "not applicable", "does not apply", "no performance impact", "not meaningful", "wouldn't affect", "won't affect", "no runtime"] + } + ], + "files": [] + } + ] +} diff --git a/.github/skills/performance-investigation/references/bisection-guide.md b/.github/skills/performance-investigation/references/bisection-guide.md new file mode 100644 index 00000000000000..152858f18dd524 --- /dev/null +++ b/.github/skills/performance-investigation/references/bisection-guide.md @@ -0,0 +1,176 @@ +# Git Bisect for Performance Regressions + +This guide covers how to use `git bisect` to find the exact commit that +introduced a performance regression. It's a 3-phase process: validate the +regression, narrow the commit range, then bisect. + +## Feasibility Check + +Before investing time in bisection, assess whether the current environment can +support the investigation. 
Full bisection requires building dotnet/runtime at +multiple commits (each build takes 5–40 minutes) and running benchmarks, which +is resource-intensive. + +| Factor | Feasible | Not feasible | +|--------|----------|--------------| +| **Disk space** | >50 GB free (multiple builds) | <20 GB free | +| **Build time budget** | Willing to wait 30–60+ min | Quick-turnaround expected | +| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | +| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | +| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | + +### When full bisection is not feasible + +Use a **lightweight analysis** path instead: + +1. **Analyze `git log`** — Review commits in the regression range + (`git log --oneline {good}..{bad}`) and identify changes to the affected code + path. Look for algorithmic changes, removed optimizations, added validation, + or new allocations. +2. **Check PR descriptions** — For each suspicious commit, read the associated + PR description and review comments. Performance trade-offs are often discussed + there. +3. **Narrow by code path** — Use `git log --oneline {good}..{bad} -- path/` to + filter commits to the affected library or component. +4. **Report the narrowed range** — Include the list of candidate commits/PRs with + an explanation of why each is suspicious. This gives maintainers a head start + even without a definitive bisect result. + +Note in the report that full bisection was not attempted and why. + +## Identifying the Bisect Range + +Determine the good and bad commits that bound the regression. 
+ +### Automated bot issues (`performanceautofiler`) + +Issues from `performanceautofiler[bot]` follow a standard format: + +- **Run Information** — Baseline commit, Compare commit, diff link, OS, arch, + and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). +- **Regression tables** — Each table shows benchmark name, Baseline time, Test + time, and Test/Base ratio. A ratio >1.0 indicates a regression. +- **Repro commands** — Typically: + ``` + git clone https://github.com/dotnet/performance.git + python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' + ``` +- **Graphs** — Time-series graphs showing when the regression appeared. + +Key fields to extract: + +- The **Baseline** and **Compare** commit SHAs — these define the bisect range. +- The **benchmark filter** — the `--filter` argument to reproduce the benchmark. +- The **Test/Base ratio** — how severe the regression is (>1.5× is significant). + +### Customer reports + +When a customer reports a regression (e.g., "X is slower on .NET 10 than +.NET 9"), there are no pre-defined commit SHAs. Determine the bisect range using +the cross-release approach below. + +### Cross-release regressions + +When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect on +the `main` branch between the commits from which the release branches were +snapped. Release branches in dotnet/runtime are +[snapped from main](../../../../docs/project/branching-guide.md). + +Find the snap points with `git merge-base`: + +``` +git merge-base main release/9.0 # → good commit (last common ancestor) +git merge-base main release/10.0 # → bad commit +``` + +Use the resulting SHAs as the good/bad boundaries for bisection on `main`. This +avoids bisecting across release branches where cherry-picks and backports make +the history non-linear. + +## Phase 1: Validate the Regression + +Before bisecting, confirm the regression is reproducible. 
Create a standalone +BenchmarkDotNet project (see +[local benchmarking guide](local-benchmarking.md#creating-a-standalone-benchmark-project)), +build the runtime at the good and bad commits, and compare results. + +If the regression is not reproducible locally, check for environment differences +(OS, arch, CPU model) and note this in your report. Consider using +[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead. + +## Phase 2: Narrow the Commit Range + +If the bisect range spans many commits, narrow it before running a full bisect: + +1. **Check `git log --oneline {good}..{bad}`** — how many commits are in the + range? If more than ~200, narrow first. +2. **Test midpoint commits manually** — pick a commit in the middle of the range, + build, run the benchmark, and determine if it is good or bad. This halves the + range in one step. +3. **For cross-release regressions** — use the `git merge-base` snap points. If + the range between two release snap points is still large, test at intermediate + release preview tags to narrow further. + +## Phase 3: Git Bisect + +Once you have a manageable commit range, use `git bisect` to binary-search for +the culprit. + +### Bisect workflow + +At each step: + +1. **Rebuild the affected component** — use incremental builds where possible + (see [incremental rebuilds](local-benchmarking.md#incremental-rebuilds)). +2. **Run the standalone benchmark** with the freshly-built CoreRun from the + testhost folder (see + [local benchmarking guide](local-benchmarking.md#building-dotnet-runtime-and-obtaining-corerun) + for the exact path): + ``` + cd PerfRepro + dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun {runtime}/artifacts/bin/testhost/net{ver}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{ver}/CoreRun + ``` +3. **Determine good or bad** — compare the result against your threshold. 
+ +**Exit codes for `git bisect run`:** +- `0` — good (no regression at this commit) +- `1`–`124` — bad (regression present) +- `125` — skip (build failure or untestable commit) + +The standalone benchmark project must be **outside the dotnet/runtime tree**, +since `git bisect` checks out a different commit at each step, which would +overwrite in-tree files. Place it in a stable location (e.g., `/tmp/bisect/`). + +### Run the bisect + +``` +cd /path/to/runtime +git bisect start {bad-sha} {good-sha} +git bisect run /path/to/bisect-script.sh +``` + +**Time estimate:** Each bisect step requires a rebuild + benchmark run. +For ~1000 commits (log₂(1000) ≈ 10 steps) with a 5-minute rebuild, expect +roughly 50 minutes of rebuild time for the full bisect, plus a benchmark run +at each step. + +### After bisect completes + +`git bisect` outputs the first bad commit. Run `git bisect reset` to return to +the original branch. + +## Root Cause Analysis + +Include the following in your report: + +1. **The culprit commit or PR** — link to the specific commit SHA and its +   associated PR. Explain how the change relates to the regressing benchmark. +2. **Root cause analysis** — describe *why* the change caused the regression +   (e.g., an algorithm change, a removed optimization, additional validation +   overhead). +3. **If the root cause spans multiple PRs** — sometimes a regression results +   from the combined effect of several changes, and `git bisect` lands on a +   commit that is only one contributing factor. In this case, report the +   narrowest commit range and list the PRs within that range that appear +   relevant to the affected code path. 
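
When listing candidate PRs for a narrowed range, note that squash-merged commits on dotnet/runtime's `main` branch carry the PR number in the subject line (e.g., `(#12345)`), so the list can be extracted directly from the log. A sketch, using the guide's `{good}`/`{bad}` placeholders and an assumed example path filter:

```shell
# List PR numbers referenced by commits in the narrowed range, restricted
# to the affected component. The path filter is an illustrative example.
git log --oneline {good}..{bad} -- src/libraries/System.Text.Json/ \
  | grep -oE '#[0-9]+'
```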
diff --git a/.github/skills/performance-investigation/references/local-benchmarking.md b/.github/skills/performance-investigation/references/local-benchmarking.md new file mode 100644 index 00000000000000..d4b2ff38329aea --- /dev/null +++ b/.github/skills/performance-investigation/references/local-benchmarking.md @@ -0,0 +1,148 @@ +# Local Benchmarking with Private Runtime Builds + +This guide covers how to benchmark dotnet/runtime changes locally using +BenchmarkDotNet and privately-built runtime binaries (CoreRun). This approach +lets you measure performance without installing a custom SDK — BenchmarkDotNet +loads the locally-built runtime directly. + +> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` +> on Windows or `./build.sh` on Linux/macOS. Other shell commands use +> Linux/macOS syntax. On Windows, adapt accordingly (use `Copy-Item` or `xcopy`, +> backslash paths, backtick line continuation). + +## Building dotnet/runtime and Obtaining CoreRun + +Build the runtime at the commit you want to test: + +``` +build.cmd/sh clr+libs -c release +``` + +The key artifact is the **testhost** folder containing **CoreRun** at: + +``` +artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ +``` + +> **Note:** This is different from the bare `corerun` binary under +> `artifacts/bin/coreclr/`. BenchmarkDotNet needs the testhost layout because +> it contains both CoreRun and the complete framework assemblies side-by-side. + +CoreRun is a lightweight host that loads the locally-built runtime and +libraries. BenchmarkDotNet uses it via the `--coreRun` argument to benchmark +private builds without installing them as SDKs. + +## Creating a Standalone Benchmark Project + +For regression validation and bisection, use a standalone BenchmarkDotNet +project rather than the full [dotnet/performance](https://github.com/dotnet/performance) +repo. 
Standalone projects are faster to build, easier to iterate on, and more +reliable across different runtime commits. + +### From an automated bot issue + +Copy the relevant benchmark class from the `dotnet/performance` repo: + +1. Clone `dotnet/performance` and locate the benchmark class referenced in the + issue's `--filter` argument. +2. Create a new console project: + ``` + mkdir PerfRepro && cd PerfRepro + dotnet new console + dotnet add package BenchmarkDotNet + ``` +3. Copy the benchmark class (and any helper types) into the project. Adjust + namespaces and usings as needed. +4. Add a `Program.cs` entry point: + ```csharp + BenchmarkDotNet.Running.BenchmarkSwitcher + .FromAssembly(typeof(Program).Assembly) + .Run(args); + ``` + +### From a customer report + +Write a minimal BenchmarkDotNet benchmark that exercises the reported code path: + +1. Create a new console project with `BenchmarkDotNet` as above. +2. Write a `[Benchmark]` method that calls the API or runs the workload the + customer identified as slow. +3. If the customer provided sample code, adapt it into a proper BDN benchmark + with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. + +## Comparing Good and Bad Commits + +Build dotnet/runtime at both the good and bad commits, saving each testhost +folder: + +``` +git checkout {bad-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad + +git checkout {good-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good +``` + +Run the standalone benchmark with both CoreRuns. 
BenchmarkDotNet compares them +side-by-side when given multiple `--coreRun` paths (the first is treated as the +baseline): + +``` +cd PerfRepro +dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun /tmp/corerun-good/.../CoreRun \ + /tmp/corerun-bad/.../CoreRun +``` + +To add a statistical significance column, append `--statisticalTest 5%`. This +performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, or +`Same`. + +## Interpreting Results + +| Outcome | Meaning | Next step | +|---------|---------|-----------| +| `Slower` with ratio >1.10 | Regression confirmed | Proceed to bisection | +| `Slower` with ratio 1.05–1.10 | Small regression — likely real but needs confirmation | Re-run with `--iterationCount 30`. If it persists, treat as confirmed. | +| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. | +| `Slower` but ratio <1.05 | Marginal — may be noise | Re-run with `--iterationCount 30`. If still marginal, note as inconclusive. | + +## Using ResultsComparer + +For a thorough comparison of saved BDN result files, use the +[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) +tool: + +``` +dotnet run --project performance/src/tools/ResultsComparer \ + --base /path/to/baseline-results \ + --diff /path/to/compare-results \ + --threshold 5% +``` + +## Incremental Rebuilds + +Full rebuilds are slow. 
Minimize per-step build time by rebuilding only the +affected component: + +| Component changed | Fast rebuild command | +|-------------------|---------------------| +| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | +| CoreLib | `build.cmd/sh clr.corelib -c Release` followed by `build.cmd/sh libs.pretest -c Release` | +| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | +| All libraries | `build.cmd/sh libs -c Release` | + +After an incremental library rebuild (other than System.Private.CoreLib), the +updated DLL is placed in the testhost folder automatically. CoreRun picks up +the new version on the next benchmark run. + +For System.Private.CoreLib, you must run `build.cmd/sh libs.pretest -c Release` +after rebuilding to copy the updated CoreLib into the testhost layout; +otherwise benchmarks may silently run against the older CoreLib. + +**Caveat:** If a rebuild crosses a commit that changes the build infrastructure +(e.g., SDK version bump in `global.json`), the incremental build may fail. In a +`git bisect` context, use exit code `125` (skip) to handle this gracefully.
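
As a quick sanity check that an incremental rebuild actually reached the testhost layout (so CoreRun does not silently load stale bits), file modification times can be compared. The library name and the placeholder path segments below are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical staleness check — substitute your library and the
# {ver}/{os}/{arch} placeholders for your build.
lib=System.Text.Json.dll
built="artifacts/bin/System.Text.Json/Release/net{ver}/$lib"
testhost="artifacts/bin/testhost/net{ver}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{ver}/$lib"

# -nt is true when the left-hand file is newer than the right-hand one.
if [ "$built" -nt "$testhost" ]; then
  echo "stale: testhost copy of $lib is older than the rebuilt binary" >&2
  exit 1
fi
```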