From 02252ecd4f017a8057d5b57c429ec76bbb443ab0 Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 18:33:37 +0200 Subject: [PATCH 1/5] Add performance-investigation skill, replacing performance-benchmark Consolidate all performance investigation guidance into a single skill with three workflows: - PR benchmark validation (EgorBot/MihuBot) - Regression investigation (CoreRun builds, git bisect) - JIT diff analysis (MihuBot) Reference docs cover EgorBot, MihuBot, local benchmarking with CoreRun, and git bisect methodology. Includes 9 evals covering all workflows plus negative cases. The existing performance-benchmark skill is removed (fully superseded). The issue-triage skill's perf-regression-triage.md is slimmed to keep only triage-specific assessment/recommendation criteria, delegating investigation methodology to the new skill. All cross-references updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/copilot-instructions.md | 2 +- .github/skills/api-proposal/SKILL.md | 2 +- .github/skills/issue-triage/SKILL.md | 4 +- .../references/perf-regression-triage.md | 323 +----------------- .github/skills/jit-regression-test/SKILL.md | 2 +- .github/skills/performance-benchmark/SKILL.md | 191 ----------- .../skills/performance-investigation/SKILL.md | 305 +++++++++++++++++ .../evals/evals.json | 206 +++++++++++ .../references/bisection-guide.md | 173 ++++++++++ .../references/egorbot-reference.md | 73 ++++ .../references/local-benchmarking.md | 140 ++++++++ .../references/mihubot-reference.md | 66 ++++ 12 files changed, 986 insertions(+), 501 deletions(-) delete mode 100644 .github/skills/performance-benchmark/SKILL.md create mode 100644 .github/skills/performance-investigation/SKILL.md create mode 100644 .github/skills/performance-investigation/evals/evals.json create mode 100644 .github/skills/performance-investigation/references/bisection-guide.md create mode 100644 
.github/skills/performance-investigation/references/egorbot-reference.md create mode 100644 .github/skills/performance-investigation/references/local-benchmarking.md create mode 100644 .github/skills/performance-investigation/references/mihubot-reference.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index a23e28a783c9bc..b2de17cdd4274a 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -14,7 +14,7 @@ When NOT running under CCA, skip the `code-review` skill if the user has stated Before making changes to a directory, search for `README.md` files in that directory and its parent directories up to the repository root. Read any you find — they contain conventions, patterns, and architectural context relevant to your work. -If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-benchmark` skill to validate the impact before completing. +If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-investigation` skill to validate the impact before completing. You MUST follow all code-formatting and naming conventions defined in [`.editorconfig`](/.editorconfig). diff --git a/.github/skills/api-proposal/SKILL.md b/.github/skills/api-proposal/SKILL.md index 8f1905b87d1428..6c9f2c494fa3d9 100644 --- a/.github/skills/api-proposal/SKILL.md +++ b/.github/skills/api-proposal/SKILL.md @@ -160,7 +160,7 @@ This: 2. **All errors and warnings must be fixed** before proceeding to the draft phase. -3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-benchmark** skill. +3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-investigation** skill. 4. Re-run tests after any review-driven changes to confirm nothing regressed. 
diff --git a/.github/skills/issue-triage/SKILL.md b/.github/skills/issue-triage/SKILL.md index a3c920021376e7..b104bfbcb48407 100644 --- a/.github/skills/issue-triage/SKILL.md +++ b/.github/skills/issue-triage/SKILL.md @@ -232,7 +232,7 @@ Based on the issue type classified in Step 1, follow the appropriate guide: |------|-------|---------------| | **Bug report** | [Bug triage](references/bug-triage.md) | Reproduction, regression validation, minimal repro derivation, root cause analysis | | **API proposal** | [API proposal triage](references/api-proposal-triage.md) | Merit evaluation, complexity estimation | -| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression with BenchmarkDotNet, git bisect to culprit commit | +| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression, assess severity and impact. For detailed investigation methodology (benchmarking, bisection), use the `performance-investigation` skill. 
| | **Question** | [Question triage](references/question-triage.md) | Research and answer the question, verify if low confidence | | **Enhancement** | [Enhancement triage](references/enhancement-triage.md) | Subcategory classification, feasibility analysis, trade-off assessment (includes performance improvement requests) | @@ -521,5 +521,5 @@ depending on the outcome: |-----------|-------|-----------------| | API proposal recommended as KEEP | **api-proposal** | Offer to draft a formal API proposal with working prototype | | Bug report with root cause identified | **jit-regression-test** | If the bug is JIT-related, offer to create a regression test | -| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks | +| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression (benchmarking, bisection, JIT diffs) | | Fix PR linked to the issue | **code-review** | Offer to review the fix PR for correctness and consistency | diff --git a/.github/skills/issue-triage/references/perf-regression-triage.md b/.github/skills/issue-triage/references/perf-regression-triage.md index 1382c09ee8e1e6..4fb2b37732548d 100644 --- a/.github/skills/issue-triage/references/perf-regression-triage.md +++ b/.github/skills/issue-triage/references/perf-regression-triage.md @@ -1,13 +1,14 @@ # Performance Regression Triage -Guidance for investigating and triaging performance regressions in -dotnet/runtime. Referenced from the main [SKILL.md](../SKILL.md) during Step 5. +Triage-specific guidance for assessing and recommending action on performance +regressions in dotnet/runtime. Referenced from the main +[SKILL.md](../SKILL.md) during Step 5. -> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` -> on Windows or `./build.sh` on Linux/macOS. Other shell commands use -> Linux/macOS syntax (`cp -r`, forward-slash paths, `\` line continuation). 
-> On Windows, adapt accordingly: use `Copy-Item` or `xcopy`, backslash paths, -> and backtick (`` ` ``) line continuation. +For detailed investigation methodology (benchmarking, bisection, bot usage), +use the `performance-investigation` skill. This document covers only the +triage-specific assessment and recommendation criteria. + +## Sources of Performance Regressions A performance regression is a report that something got measurably slower (or uses more memory/allocations) compared to a previous .NET version or a recent @@ -21,307 +22,19 @@ commit. These reports come from several sources: - **Cross-release regressions** -- a regression observed between two stable releases (e.g., .NET 9 → .NET 10) without a specific commit range. -The goals of this triage are to: - -1. **Validate** that the regression is real and reproducible. -2. **Bisect** to the exact commit that introduced it. - -## Feasibility Check - -Before investing time in benchmarking and bisection, assess whether the current -environment can support the investigation. Full bisection requires building -dotnet/runtime at multiple commits (each build takes 5-40 minutes) and running -benchmarks, which is resource-intensive. - -| Factor | Feasible | Not feasible | -|--------|----------|--------------| -| **Disk space** | >50 GB free (for multiple builds) | <20 GB free | -| **Build time budget** | User is willing to wait 30-60+ min | Quick-turnaround triage expected | -| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | -| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | -| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | - -### When full bisection is not feasible - -Use the **lightweight analysis** path instead: - -1. 
**Analyze `git log`** -- Review commits in the regression range - (`git log --oneline {good}..{bad}`) and identify changes to the affected - code path. Look for algorithmic changes, removed optimizations, added - validation, or new allocations. -2. **Check PR descriptions** -- For each suspicious commit, read the associated - PR description and review comments. Performance trade-offs are often - discussed there. -3. **Narrow by code path** -- Use `git log --oneline {good}..{bad} -- path/` - to filter commits to the affected library or component. -4. **Report the narrowed range** -- Include the list of candidate commits/PRs - in the triage report with an explanation of why each is suspicious. This - gives maintainers a head start even without a definitive bisect result. - -Note in the triage report that full bisection was not attempted and why -(e.g., "environment mismatch", "time constraint"), so maintainers know to -verify independently. - -## Identifying the Bisect Range - -Before benchmarking, determine the good and bad commits that bound the -regression. - -### Automated bot issues (`performanceautofiler`) - -Issues from `performanceautofiler[bot]` follow a standard format: - -- **Run Information** -- Baseline commit, Compare commit, diff link, OS, arch, - and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). -- **Regression tables** -- Each table shows benchmark name, Baseline time, - Test time, and Test/Base ratio. A ratio >1.0 indicates a regression. -- **Repro commands** -- Typically: - ``` - git clone https://github.com/dotnet/performance.git - python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' - ``` -- **Graphs** -- Time-series graphs showing when the regression appeared. - -Key fields to extract: - -- The **Baseline** and **Compare** commit SHAs -- these define the bisect range. -- The **benchmark filter** -- the `--filter` argument to reproduce the benchmark. 
-- The **Test/Base ratio** -- how severe the regression is (>1.5× is significant). - -### Customer reports - -When a customer reports a regression (e.g., "X is slower on .NET 10 than -.NET 9"), there are no pre-defined commit SHAs. You need to determine the -bisect range yourself -- see [Cross-release regressions](#cross-release-regressions) -below. - -Also identify the **scenario to benchmark** from the customer's report -- the -specific API call, code pattern, or workload that regressed. - -### Cross-release regressions - -When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect -on the `main` branch between the commits from which the release branches were -snapped. Release branches in dotnet/runtime are -[snapped from main](../../../../docs/project/branching-guide.md). - -Find the snap points with `git merge-base`: - -``` -git merge-base main release/9.0 # → good commit (last common ancestor) -git merge-base main release/10.0 # → bad commit -``` - -Use the resulting SHAs as the good/bad boundaries for bisection on `main`. -This avoids bisecting across release branches where cherry-picks and backports -make the history non-linear. - -## Phase 1: Create a Standalone Benchmark - -Before investing time in bisection, create a standalone BenchmarkDotNet -project that reproduces the regressing scenario. This project will be used -for both validation (Phase 1) and bisection (Phase 3). - -### Why a standalone project? - -The full [dotnet/performance](https://github.com/dotnet/performance) repo -has many dependencies and can be fragile across different runtime commits. -A standalone project with only the impacted benchmark is faster to build, -easier to iterate on, and more reliable during `git bisect`. - -### Creating the benchmark project - -**From an automated bot issue** -- copy the relevant benchmark class and its -dependencies from the `dotnet/performance` repo into a new standalone project: - -1. 
Clone `dotnet/performance` and locate the benchmark class referenced in the - issue's `--filter` argument. -2. Create a new console project and add a reference to - `BenchmarkDotNet` (NuGet): - ``` - mkdir PerfRepro && cd PerfRepro - dotnet new console - dotnet add package BenchmarkDotNet - ``` -3. Copy the benchmark class (and any helper types it depends on) into the - project. Adjust namespaces and usings as needed. -4. Add a `Program.cs` entry point: - ```csharp - BenchmarkDotNet.Running.BenchmarkSwitcher - .FromAssembly(typeof(Program).Assembly) - .Run(args); - ``` - -**From a customer report** -- write a minimal BenchmarkDotNet benchmark that -exercises the reported code path: - -1. Create a new console project with `BenchmarkDotNet` as above. -2. Write a `[Benchmark]` method that calls the API or runs the workload the - customer identified as slow. -3. If the customer provided sample code, adapt it into a proper BDN benchmark - with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. - -### Building dotnet/runtime and obtaining CoreRun - -Build dotnet/runtime at the commit you want to test: - -``` -build.cmd/sh clr+libs -c release -``` - -The key artifact is the **testhost** folder containing **CoreRun** at: - -``` -artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ -``` - -BenchmarkDotNet uses CoreRun to load the locally-built runtime and libraries, -meaning you can benchmark private builds without installing them as SDKs. - -### Validating the regression - -Build dotnet/runtime at both the good and bad commits, saving each testhost -folder: - -``` -git checkout {bad-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad - -git checkout {good-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good -``` - -Run the standalone benchmark with both CoreRuns. 
BenchmarkDotNet compares -them side-by-side when given multiple `--coreRun` paths (the first is treated -as the baseline): - -``` -cd PerfRepro -dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun /tmp/corerun-good/.../CoreRun \ - /tmp/corerun-bad/.../CoreRun -``` - -To add a statistical significance column, append `--statisticalTest 5%`. -This performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, -or `Same`. - -### Interpret the results - -| Outcome | Meaning | Next step | -|---------|---------|-----------| -| `Slower` with ratio >1.10 | Regression confirmed | Proceed to Phase 2 | -| `Slower` with ratio between 1.05 and 1.10 | Small regression -- likely real but needs confirmation | Re-run with more iterations (`--iterationCount 30`). If it persists, treat as confirmed and proceed to Phase 2. | -| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. | -| `Slower` but ratio <1.05 | Marginal -- may be noise | Re-run with more iterations (`--iterationCount 30`). If still marginal, note as inconclusive. | - -For a thorough comparison of saved BDN result files, use the -[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) -tool: - -``` -dotnet run --project performance/src/tools/ResultsComparer \ - --base /path/to/baseline-results \ - --diff /path/to/compare-results \ - --threshold 5% -``` - -## Phase 2: Narrow the Commit Range - -If the bisect range spans many commits, narrow it before running a full -bisect: - -1. **Check `git log --oneline {good}..{bad}`** -- how many commits are in the - range? If it is more than ~200, try to narrow it first. -2. **Test midpoint commits manually** -- pick a commit in the middle of the - range, build, run the benchmark, and determine if it is good or bad. - This halves the range in one step. -3. **For cross-release regressions** -- use the `git merge-base` snap points - described above. 
If the range between two release snap points is still - large, test at intermediate release preview tags to narrow further. - -## Phase 3: Git Bisect - -Once you have a manageable commit range (good commit and bad commit), use -`git bisect` to binary-search for the culprit. - -### Bisect workflow - -At each step of the bisect, you need to: - -1. **Rebuild the affected component** -- use incremental builds where possible - (see [Incremental Rebuilds](#incremental-rebuilds-during-bisect) below). -2. **Run the standalone benchmark** with the freshly-built CoreRun: - ``` - cd PerfRepro - dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun - ``` -3. **Determine good or bad** -- compare the result against your threshold. - -**Exit codes for `git bisect run`:** -- `0` -- good (no regression at this commit) -- `1`–`124` -- bad (regression present) -- `125` -- skip (build failure or untestable commit) - -The standalone benchmark project must be **outside the dotnet/runtime tree** -since `git bisect` checks out different commits, which would overwrite -in-tree files. Place it in a stable location (e.g., `/tmp/bisect/`). - -### Run the bisect - -``` -cd /path/to/runtime -git bisect start {bad-sha} {good-sha} -git bisect run /path/to/bisect-script.sh -``` - -**Time estimate:** Each bisect step requires a rebuild + benchmark run. -For ~1000 commits (log₂(1000) ≈ 10 steps) with a 5-minute rebuild, expect -roughly 50 minutes for the full bisect. - -### After bisect completes - -`git bisect` will output the first bad commit. Run `git bisect reset` to -return to the original branch. - -### Root cause analysis and triage report - -Include the following in the triage report: - -1. **The culprit commit or PR** -- link to the specific commit SHA and its - associated PR. Explain how the change relates to the regressing benchmark. -2. 
**Root cause analysis** -- describe *why* the change caused the regression - (e.g., an algorithm change, a removed optimization, additional validation - overhead). -3. **If the root cause spans multiple PRs** -- sometimes a regression results - from the combined effect of several changes and `git bisect` lands on a - commit that is only one contributing factor. In this case, report the - narrowest commit range that introduced the regression and list the PRs or - commits within that range that appear relevant to the affected code path. - -## Incremental Rebuilds During Bisect - -Full rebuilds are slow. Minimize per-step build time: +## Investigation -| Component changed | Fast rebuild command | -|-------------------|---------------------| -| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | -| CoreLib | `build.cmd/sh clr.corelib -c Release` | -| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | -| All libraries | `build.cmd/sh libs -c Release` | +The investigation goal is to validate that the regression is real and, if +possible, bisect to the exact commit that introduced it. -After an incremental library rebuild, the updated DLL is placed in the -testhost folder automatically. CoreRun will pick up the new version on the -next benchmark run. +Use the `performance-investigation` skill (Workflow 2: Regression Investigation) +for the full methodology, which includes: -**Caveat:** If bisect crosses a commit that changes the build infrastructure -(e.g., SDK version bump in `global.json`), the incremental build may fail. -Use exit code `125` (skip) to handle this gracefully. +- Feasibility checks for local vs. 
bot-based investigation +- Building dotnet/runtime at specific commits and using CoreRun +- Comparing good/bad commits with BenchmarkDotNet +- Git bisect workflow for finding the culprit commit +- Using @EgorBot and @MihuBot for remote validation ## Performance-Specific Assessment diff --git a/.github/skills/jit-regression-test/SKILL.md b/.github/skills/jit-regression-test/SKILL.md index e6cc8f82d58c50..2e03703531009e 100644 --- a/.github/skills/jit-regression-test/SKILL.md +++ b/.github/skills/jit-regression-test/SKILL.md @@ -7,7 +7,7 @@ description: > bug", "create a regression test for issue #NNNNN", converting issue repro to xunit test. DO NOT USE FOR: non-JIT tests (use standard test patterns), debugging JIT issues without a known repro, performance benchmarks (use - performance-benchmark skill). + performance-investigation skill). --- # JIT Regression Test Extraction diff --git a/.github/skills/performance-benchmark/SKILL.md b/.github/skills/performance-benchmark/SKILL.md deleted file mode 100644 index 9e1b8f0bbf6a31..00000000000000 --- a/.github/skills/performance-benchmark/SKILL.md +++ /dev/null @@ -1,191 +0,0 @@ ---- -name: performance-benchmark -description: Generate and run ad hoc performance benchmarks to validate code changes. Use this when asked to benchmark, profile, or validate the performance impact of a code change in dotnet/runtime. ---- - -# Ad Hoc Performance Benchmarking with @EgorBot - -When you need to validate the performance impact of a code change, follow this process to write a BenchmarkDotNet benchmark and trigger @EgorBot to run it. -The bot will notify you when results are ready, so don't wait for them. - -## Step 1: Write the Benchmark - -Create a BenchmarkDotNet benchmark that tests the specific operation being changed. 
Follow these guidelines: - -### Benchmark Structure - -```csharp -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -public class Bench -{ - // Add setup/cleanup if needed - [GlobalSetup] - public void Setup() - { - // Initialize test data - } - - [Benchmark] - public void MyOperation() - { - // Test the operation - } -} -``` - -### Best Practices - -For comprehensive guidance, see the [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). - -Key principles: - -- **Move initialization to `[GlobalSetup]`**: Separate setup logic from the measured code to avoid measuring allocation/initialization overhead -- **Return values** from benchmark methods to prevent dead code elimination -- **Avoid loops**: BenchmarkDotNet invokes the benchmark many times automatically; adding manual loops distorts measurements -- **No side effects**: Benchmarks should be pure and produce consistent results -- **Focus on common cases**: Benchmark hot paths and typical usage, not edge cases or error paths -- **Use consistent input data**: Always use the same test data for reproducible comparisons -- **Avoid `[DisassemblyDiagnoser]`**: It causes crashes on Linux. 
Use `--envvars DOTNET_JitDisasm:MethodName` instead -- **Benchmark class requirements**: Must be `public`, not `sealed`, not `static`, and must be a `class` (not struct) - -### Example: String Operation Benchmark - -```csharp -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - private string _testString = default!; - - [Params(10, 100, 1000)] - public int Length { get; set; } - - [GlobalSetup] - public void Setup() - { - _testString = new string('a', Length); - } - - [Benchmark] - public int StringOperation() - { - return _testString.IndexOf('z'); - } -} -``` - -### Example: Collection Operation Benchmark - -```csharp -using System.Linq; -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - private int[] _array = default!; - private List _list = default!; - - [Params(100, 1000, 10000)] - public int Count { get; set; } - - [GlobalSetup] - public void Setup() - { - _array = Enumerable.Range(0, Count).ToArray(); - _list = _array.ToList(); - } - - [Benchmark] - public bool AnyArray() => _array.Any(); - - [Benchmark] - public bool AnyList() => _list.Any(); - - [Benchmark] - public int SumArray() => _array.Sum(); - - [Benchmark] - public int SumList() => _list.Sum(); -} -``` - -## Step 2: Mention @EgorBot in a comment/PR description - -Post a comment on the PR to trigger EgorBot with your benchmark. The general format is: - -> 📝 **AI-generated content disclosure:** When posting benchmark comments to GitHub under a user's credentials — i.e., the account is **not** a dedicated "copilot" or "bot" account/app — you **MUST** include a concise, visible note (e.g. a `> [!NOTE]` alert) indicating the content was AI/Copilot-generated. Skip this if the user explicitly asks you to omit it. 
- -@EgorBot [targets] [options] [BenchmarkDotNet args] - -```cs -// Your benchmark code here -``` -> **Note:** When using @EgorBot, follow these formatting rules: -> - The @EgorBot command must not be inside the code block. -> - Only the benchmark code should be inside the code block. -> - Do not place any additional text between the @EgorBot command line and the code block, as EgorBot will treat it as additional command arguments. - -### Target Flags - -- `-linux_amd` -- `-linux_intel` -- `-windows_amd` -- `-windows_intel` -- `-linux_arm64` -- `-osx_arm64` (baremetal, feel free to always include it) - -The most common combination is `-linux_amd -osx_arm64`. Do not include more than 4 targets. - -### Common Options - -Use `-profiler` when absolutely necessary along with `-linux_arm64` and/or `-linux_amd` to include `perf` profiling and disassembly in the results. - -### Example: Basic PR Benchmark - -To benchmark the current PR changes against the base branch: - -@EgorBot -linux_amd -osx_arm64 - -```cs -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - [Benchmark] - public int MyOperation() - { - // Your benchmark code - return 42; - } -} -``` - -## Important Notes - -- **Bot response time**: EgorBot uses polling and may take up to 30 seconds to respond -- **Supported repositories**: EgorBot monitors `dotnet/runtime` and `EgorBot/runtime-utils` -- **PR mode (default)**: When posting in a PR, EgorBot automatically compares the PR changes against the base branch -- **Results variability**: Results may vary between runs due to VM differences. 
Do not compare results across different architectures or cloud providers -- **Check the manual**: EgorBot replies include a link to the [manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) for advanced options - -## Additional Resources - -- [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) - Essential reading for writing effective benchmarks -- [BenchmarkDotNet CLI Arguments](https://github.com/dotnet/BenchmarkDotNet/blob/master/docs/articles/guides/console-args.md) -- [EgorBot Manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md new file mode 100644 index 00000000000000..f4d4d5187be662 --- /dev/null +++ b/.github/skills/performance-investigation/SKILL.md @@ -0,0 +1,305 @@ +--- +name: performance-investigation +description: > + Investigate performance regressions and validate performance impact of code + changes in dotnet/runtime. Use this skill whenever asked to benchmark a PR, + investigate a performance regression, validate performance impact, run + benchmarks, generate JIT diffs, compare performance between commits, triage + a performance issue, or check whether a change improves or regresses + performance. Also use when asked about @EgorBot, @MihuBot, BenchmarkDotNet, + CoreRun, or dotnet/performance. Covers ad hoc PR benchmarking, deep + regression investigation with git bisect, and JIT diff analysis. +--- + +# Performance Investigation for dotnet/runtime + +Investigate performance regressions and validate the performance impact of code +changes. This skill covers three workflows, from quick PR validation to deep +regression root-causing. 
+ +## When to Use This Skill + +- Asked to **benchmark** a PR or validate performance impact of a change +- Asked to **investigate a performance regression** (from an issue, bot report, + or customer report) +- Asked to **generate JIT diffs** or analyze codegen impact +- Asked to **compare performance** between commits, branches, or releases +- Asked to **triage a performance issue** (use alongside the `issue-triage` + skill for full triage) +- Given a `tenet-performance` or `tenet-performance-benchmarks` labeled issue +- Asked how to use `@EgorBot`, `@MihuBot`, BenchmarkDotNet, or CoreRun + +## Choose Your Workflow + +| Context | Workflow | What it does | +|---------|----------|-------------| +| PR is open and you want to measure its impact | [Workflow 1: PR Benchmark Validation](#workflow-1-pr-benchmark-validation) | Write a benchmark, invoke a bot, get results | +| A regression has been reported (issue or bot alert) | [Workflow 2: Regression Investigation](#workflow-2-regression-investigation) | Validate, bisect, root-cause | +| Change affects JIT codegen and you want to see diffs | [Workflow 3: JIT Diff Analysis](#workflow-3-jit-diff-analysis) | Generate JIT diffs via MihuBot | + +If you're triaging a performance regression issue, use Workflow 2 for the +investigation methodology, then return to the `issue-triage` skill for +triage-specific assessment and recommendation. + +--- + +## Workflow 1: PR Benchmark Validation + +Use this when a PR is open and you want to measure its performance impact. + +### Step 1: Write a BenchmarkDotNet Benchmark + +Create a benchmark that targets the specific operation being changed. See +[Writing Good Benchmarks](#writing-good-benchmarks) below for best practices. 
+ +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + [GlobalSetup] + public void Setup() + { + // Initialize test data + } + + [Benchmark] + public int MyOperation() + { + // Test the operation — return a value to prevent dead code elimination + return 42; + } +} +``` + +### Step 2: Choose a Bot and Invoke It + +**Use @EgorBot** when you need to run custom benchmark code (written in Step 1): + +Post a comment on the PR: + +``` +@EgorBot -amd -arm + +​```cs +// Your benchmark code here +​``` +``` + +EgorBot builds dotnet/runtime for the PR and base branch, runs the benchmark on +dedicated hardware, and posts BDN results back as a comment. + +See [EgorBot reference](references/egorbot-reference.md) for the full target +list, options, and examples. + +**Use @MihuBot** when you want to run existing benchmarks from the +[dotnet/performance](https://github.com/dotnet/performance) repo: + +``` +@MihuBot benchmark +``` + +This is useful when established benchmarks already cover the affected code path +and you don't need to write custom code. + +See [MihuBot reference](references/mihubot-reference.md) for the full command +syntax and options. + +### Step 3: Interpret Results + +EgorBot and MihuBot post results as PR comments. Look for: + +- **Ratio column** — values >1.0 indicate the PR is slower, <1.0 indicate it's + faster +- **Statistical significance** — if a `--statisticalTest` column is present, + look for `Faster`, `Slower`, or `Same` annotations +- **Memory/allocation changes** — check `Allocated` column if + `[MemoryDiagnoser]` is enabled + +> **AI-generated content disclosure:** When posting bot invocation comments +> under a user's credentials (not a bot account), include a visible note that +> the content was AI/Copilot-generated. 
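The ratio interpretation above is mechanical enough to script when sifting through many benchmark rows at once. A minimal sketch of the classification logic (the `ratio_verdict` helper name and the 5% noise band are illustrative assumptions, not BenchmarkDotNet defaults):

```shell
# Classify a PR measurement against its baseline, mirroring the Ratio
# column in BenchmarkDotNet output: >1.0 means the PR is slower.
# The 5% band treated as noise here is an illustrative threshold.
ratio_verdict() {
    base_ns=$1  # baseline mean (base-branch run)
    pr_ns=$2    # PR mean
    awk -v b="$base_ns" -v p="$pr_ns" 'BEGIN {
        r = p / b
        if (r > 1.05)      v = "slower"
        else if (r < 0.95) v = "faster"
        else               v = "same"
        printf "ratio=%.2f %s\n", r, v
    }'
}

ratio_verdict 100 123   # ratio=1.23 slower
ratio_verdict 100 101   # ratio=1.01 same
```

Anything landing inside the noise band is better re-run (more iterations, or with a `--statisticalTest` pass) than reported as a win or a regression.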
+ +--- + +## Workflow 2: Regression Investigation + +Use this when a performance regression has been reported — whether from +`performanceautofiler[bot]`, a customer report, or a cross-release comparison. + +### Overview + +The investigation follows three phases: + +1. **Validate** — Confirm the regression is real and reproducible +2. **Narrow** — Reduce the commit range to a manageable size +3. **Bisect** — Binary-search for the culprit commit + +For the full methodology, including feasibility checks, commit range +identification, and step-by-step bisection instructions, see the +[bisection guide](references/bisection-guide.md). + +For details on building the runtime, using CoreRun, and running BenchmarkDotNet +against private builds, see the +[local benchmarking guide](references/local-benchmarking.md). + +### Quick Path: Use Bots Instead of Local Bisection + +If the regression range is narrow (a few commits) or the environment doesn't +support local builds, you can use bots to validate specific commits without +building locally: + +``` +@EgorBot -amd -commits {good-sha},{bad-sha} +``` + +Or with @MihuBot for existing benchmarks: + +``` +@MihuBot benchmark https://github.com/dotnet/runtime/compare/{good-sha}...{bad-sha} +``` + +This won't perform a full bisect, but it can confirm whether the regression +exists and help narrow the range. + +### Reporting Results + +After completing the investigation, include in your report: + +- Whether the regression was **confirmed** or **not reproduced** +- The **culprit commit/PR** (if bisection was performed) +- **Root cause analysis** — why the change caused the regression +- **Severity assessment** — Test/Base ratio, number of affected benchmarks, + user impact + +--- + +## Workflow 3: JIT Diff Analysis + +Use this when a change affects JIT code generation and you want to see how it +changes the emitted machine code across the entire BCL. 
+ +### Invoke MihuBot for JIT Diffs + +Post a comment on the PR: + +``` +@MihuBot +``` + +MihuBot generates comprehensive JIT diffs showing codegen regressions and +improvements. For ARM64-specific diffs or tier-0 analysis: + +``` +@MihuBot -arm -tier0 +``` + +See [MihuBot reference](references/mihubot-reference.md) for the full JIT diff +options, including `-nocctors`, `-includeKnownNoise`, and others. + +### Interpreting JIT Diffs + +MihuBot reports include: + +- **Code size changes** — total bytes added/removed across all methods +- **Per-method diffs** — individual methods that changed, with before/after + assembly +- **Regressions vs improvements** — clearly separated sections + +A small increase in code size across many methods may indicate a JIT change with +broad impact. A large increase in a few methods may indicate a targeted +optimization that trades code size for speed (or a regression). + +--- + +## Writing Good Benchmarks + +These guidelines apply whether you're writing a benchmark for EgorBot, for +local validation, or for contribution to the dotnet/performance repo. + +For comprehensive guidance, see the +[Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). 
+ +### Key Principles + +- **Move initialization to `[GlobalSetup]`** — separate setup from the measured + code to avoid measuring allocation/initialization overhead +- **Return values** from benchmark methods to prevent dead code elimination +- **Avoid manual loops** — BenchmarkDotNet invokes the benchmark many times + automatically; adding loops distorts measurements +- **No side effects** — benchmarks should be pure and produce consistent results +- **Focus on common cases** — benchmark hot paths and typical usage, not edge + cases +- **Use consistent input data** — always use the same test data for reproducible + comparisons + +### Benchmark Class Requirements + +- Must be `public` +- Must be a `class` (not struct) +- Must not be `sealed` +- Must not be `static` + +### Avoid `[DisassemblyDiagnoser]` + +It causes crashes on Linux. To get disassembly, use the `--envvars` option +instead: + +``` +@EgorBot -amd --envvars DOTNET_JitDisasm:MethodName +``` + +### Example: A Parameterized Benchmark + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private string _testString = default!; + + [Params(10, 100, 1000)] + public int Length { get; set; } + + [GlobalSetup] + public void Setup() + { + _testString = new string('a', Length); + } + + [Benchmark] + public int StringOperation() + { + return _testString.IndexOf('z'); + } +} +``` + +--- + +## External Resources + +- [dotnet/performance repository](https://github.com/dotnet/performance) — + central location for all .NET runtime benchmarks +- [Benchmarking workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md) +- [Profiling workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/main/docs/profiling-workflow-dotnet-runtime.md) +- [Microbenchmark Design
Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) +- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) +- [Performance guidelines](../../../docs/project/performance-guidelines.md) — + project-wide performance policy + +## Related Skills + +| Condition | Skill | When to use | +|-----------|-------|-------------| +| Triaging a performance regression issue | **issue-triage** | For the full triage workflow (assessment, recommendation, labels) | +| Fix PR linked to the regression | **code-review** | To review the fix for correctness and consistency | +| JIT regression test needed | **jit-regression-test** | To extract a JIT regression test from the issue | diff --git a/.github/skills/performance-investigation/evals/evals.json b/.github/skills/performance-investigation/evals/evals.json new file mode 100644 index 00000000000000..bfe6a78f99d9e6 --- /dev/null +++ b/.github/skills/performance-investigation/evals/evals.json @@ -0,0 +1,206 @@ +{ + "skill_name": "performance-investigation", + "evals": [ + { + "id": 1, + "name": "pr-benchmark-request", + "prompt": "Can you benchmark PR https://github.com/dotnet/runtime/pull/121223 to check for performance impact?", + "expected_output": "Should follow Workflow 1 (PR Benchmark Validation).
Should write a BenchmarkDotNet benchmark targeting the changed code and invoke @EgorBot to run it on the PR.", + "assertions": [ + { + "name": "uses-workflow-1", + "description": "Follows the PR benchmark validation workflow", + "type": "contains_any", + "check": ["Workflow 1", "PR Benchmark", "benchmark"] + }, + { + "name": "writes-benchmark", + "description": "Creates or references a BenchmarkDotNet benchmark", + "type": "contains_any", + "check": ["[Benchmark]", "BenchmarkDotNet", "BenchmarkSwitcher"] + }, + { + "name": "invokes-bot", + "description": "Invokes EgorBot or MihuBot to run the benchmark", + "type": "contains_any", + "check": ["@EgorBot", "@MihuBot"] + } + ], + "files": [] + }, + { + "id": 2, + "name": "perf-regression-autobot", + "prompt": "Investigate this performance regression: https://github.com/dotnet/runtime/issues/114625", + "expected_output": "Should follow Workflow 2 (Regression Investigation). Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and attempt validation or bisection.", + "assertions": [ + { + "name": "uses-workflow-2", + "description": "Follows the regression investigation workflow", + "type": "contains_any", + "check": ["Workflow 2", "Regression", "regression", "investigate"] + }, + { + "name": "identifies-commits", + "description": "Identifies or references baseline/compare commits from the bot report", + "type": "contains_any", + "check": ["commit", "SHA", "baseline", "compare", "bisect"] + }, + { + "name": "assesses-severity", + "description": "Assesses the regression severity using the ratio", + "type": "contains_any", + "check": ["ratio", "severity", "Test/Base", "slower", "regression"] + } + ], + "files": [] + }, + { + "id": 3, + "name": "jit-diff-request", + "prompt": "Can you generate JIT diffs for my PR that changes the JIT compiler?", + "expected_output": "Should follow Workflow 3 (JIT Diff Analysis). 
Should invoke @MihuBot to generate JIT diffs and explain how to interpret the results.", + "assertions": [ + { + "name": "uses-mihubot", + "description": "Invokes MihuBot for JIT diffs", + "type": "contains", + "check": "@MihuBot" + }, + { + "name": "mentions-jit-diffs", + "description": "References JIT diff generation", + "type": "contains_any", + "check": ["JIT diff", "jit-diff", "codegen", "code size"] + } + ], + "files": [] + }, + { + "id": 4, + "name": "benchmark-with-corerun", + "prompt": "How do I benchmark my local runtime changes against the main branch?", + "expected_output": "Should explain how to build dotnet/runtime, obtain CoreRun from the testhost folder, and run BenchmarkDotNet with the --coreRun argument to compare private builds.", + "assertions": [ + { + "name": "mentions-corerun", + "description": "Explains CoreRun as the mechanism for benchmarking private builds", + "type": "contains_any", + "check": ["CoreRun", "coreRun", "--coreRun", "testhost"] + }, + { + "name": "mentions-build", + "description": "References building the runtime", + "type": "contains_any", + "check": ["clr+libs", "build.cmd", "build.sh"] + }, + { + "name": "mentions-bdn", + "description": "References BenchmarkDotNet for running the benchmarks", + "type": "contains_any", + "check": ["BenchmarkDotNet", "BDN", "[Benchmark]"] + } + ], + "files": [] + }, + { + "id": 5, + "name": "existing-benchmarks-request", + "prompt": "Run the existing Regex benchmarks from dotnet/performance against PR https://github.com/dotnet/runtime/pull/124628", + "expected_output": "Should use @MihuBot benchmark command to run existing benchmarks from the dotnet/performance repo rather than writing custom benchmark code.", + "assertions": [ + { + "name": "uses-mihubot-benchmark", + "description": "Uses MihuBot's benchmark command for existing benchmarks", + "type": "contains", + "check": "@MihuBot benchmark" + }, + { + "name": "references-perf-repo", + "description": "References the dotnet/performance 
repository", + "type": "contains_any", + "check": ["dotnet/performance", "performance repo"] + } + ], + "files": [] + }, + { + "id": 6, + "name": "cross-release-regression", + "prompt": "A user reports that string.IndexOf is 2x slower in .NET 10 compared to .NET 9. How should we investigate?", + "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression. Should reference both local investigation and bot-based approaches.", + "assertions": [ + { + "name": "mentions-merge-base", + "description": "Explains using git merge-base for cross-release bisection", + "type": "contains_any", + "check": ["merge-base", "release branch", "snap point"] + }, + { + "name": "mentions-benchmark-creation", + "description": "Suggests creating a benchmark for the reported scenario", + "type": "contains_any", + "check": ["benchmark", "BenchmarkDotNet", "[Benchmark]", "standalone"] + }, + { + "name": "mentions-bisect", + "description": "References git bisect as part of the investigation", + "type": "contains_any", + "check": ["bisect", "git bisect", "binary search"] + } + ], + "files": [] + }, + { + "id": 7, + "name": "compare-specific-commits", + "prompt": "Compare the performance of commits abc1234 and def5678 for the System.Text.Json benchmarks", + "expected_output": "Should invoke @EgorBot with -commits to compare the two specific commits, or use @MihuBot benchmark with a compare URL.", + "assertions": [ + { + "name": "uses-commits-flag", + "description": "Uses the -commits option or compare URL to specify the commits", + "type": "contains_any", + "check": ["-commits", "compare", "abc1234", "def5678"] + }, + { + "name": "invokes-bot", + "description": "Invokes EgorBot or MihuBot to run the comparison", + "type": "contains_any", + "check": ["@EgorBot", "@MihuBot"] + } + ], + "files": [] + }, + { + "id": 8, + "name": "not-applicable-bug-issue", + "prompt": "Can you check 
the performance impact of https://github.com/dotnet/runtime/issues/46088", + "expected_output": "Should recognize this is a functional bug (System.Text.Json does not support constructors with byref parameters), not a performance issue. Should indicate that performance benchmarking is not applicable here.", + "assertions": [ + { + "name": "identifies-not-perf", + "description": "Recognizes this is not a performance issue", + "type": "contains_any", + "check": ["not a performance", "not performance-related", "no performance", "functional", "not applicable", "does not apply", "isn't a performance"] + } + ], + "files": [] + }, + { + "id": 9, + "name": "not-applicable-doc-pr", + "prompt": "Benchmark the changes in PR https://github.com/dotnet/runtime/pull/124592 to validate performance", + "expected_output": "Should recognize this is a documentation-only PR (adding XML docs to DI extension methods) and that benchmarking is not applicable or meaningful for documentation changes.", + "assertions": [ + { + "name": "identifies-doc-only", + "description": "Recognizes this is a documentation/non-functional change where benchmarking is not meaningful", + "type": "contains_any", + "check": ["documentation", "doc", "no functional", "no code change", "not applicable", "does not apply", "no performance impact", "not meaningful", "wouldn't affect", "won't affect", "no runtime"] + } + ], + "files": [] + } + ] +} diff --git a/.github/skills/performance-investigation/references/bisection-guide.md b/.github/skills/performance-investigation/references/bisection-guide.md new file mode 100644 index 00000000000000..5019c64b389b08 --- /dev/null +++ b/.github/skills/performance-investigation/references/bisection-guide.md @@ -0,0 +1,173 @@ +# Git Bisect for Performance Regressions + +This guide covers how to use `git bisect` to find the exact commit that +introduced a performance regression. It's a 3-phase process: validate the +regression, narrow the commit range, then bisect. 
+ +## Feasibility Check + +Before investing time in bisection, assess whether the current environment can +support the investigation. Full bisection requires building dotnet/runtime at +multiple commits (each build takes 5–40 minutes) and running benchmarks, which +is resource-intensive. + +| Factor | Feasible | Not feasible | +|--------|----------|--------------| +| **Disk space** | >50 GB free (multiple builds) | <20 GB free | +| **Build time budget** | Willing to wait 30–60+ min | Quick-turnaround expected | +| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | +| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | +| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | + +### When full bisection is not feasible + +Use a **lightweight analysis** path instead: + +1. **Analyze `git log`** — Review commits in the regression range + (`git log --oneline {good}..{bad}`) and identify changes to the affected code + path. Look for algorithmic changes, removed optimizations, added validation, + or new allocations. +2. **Check PR descriptions** — For each suspicious commit, read the associated + PR description and review comments. Performance trade-offs are often discussed + there. +3. **Narrow by code path** — Use `git log --oneline {good}..{bad} -- path/` to + filter commits to the affected library or component. +4. **Report the narrowed range** — Include the list of candidate commits/PRs with + an explanation of why each is suspicious. This gives maintainers a head start + even without a definitive bisect result. + +Note in the report that full bisection was not attempted and why. + +## Identifying the Bisect Range + +Determine the good and bad commits that bound the regression. 
+ +### Automated bot issues (`performanceautofiler`) + +Issues from `performanceautofiler[bot]` follow a standard format: + +- **Run Information** — Baseline commit, Compare commit, diff link, OS, arch, + and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). +- **Regression tables** — Each table shows benchmark name, Baseline time, Test + time, and Test/Base ratio. A ratio >1.0 indicates a regression. +- **Repro commands** — Typically: + ``` + git clone https://github.com/dotnet/performance.git + python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' + ``` +- **Graphs** — Time-series graphs showing when the regression appeared. + +Key fields to extract: + +- The **Baseline** and **Compare** commit SHAs — these define the bisect range. +- The **benchmark filter** — the `--filter` argument to reproduce the benchmark. +- The **Test/Base ratio** — how severe the regression is (>1.5× is significant). + +### Customer reports + +When a customer reports a regression (e.g., "X is slower on .NET 10 than +.NET 9"), there are no pre-defined commit SHAs. Determine the bisect range using +the cross-release approach below. + +### Cross-release regressions + +When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect on +the `main` branch between the commits from which the release branches were +snapped. Release branches in dotnet/runtime are +[snapped from main](../../../../docs/project/branching-guide.md). + +Find the snap points with `git merge-base`: + +``` +git merge-base main release/9.0 # → good commit (last common ancestor) +git merge-base main release/10.0 # → bad commit +``` + +Use the resulting SHAs as the good/bad boundaries for bisection on `main`. This +avoids bisecting across release branches where cherry-picks and backports make +the history non-linear. + +## Phase 1: Validate the Regression + +Before bisecting, confirm the regression is reproducible. 
Create a standalone +BenchmarkDotNet project (see +[local benchmarking guide](local-benchmarking.md#creating-a-standalone-benchmark-project)), +build the runtime at the good and bad commits, and compare results. + +If the regression is not reproducible locally, check for environment differences +(OS, arch, CPU model) and note this in your report. Consider using +[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead. + +## Phase 2: Narrow the Commit Range + +If the bisect range spans many commits, narrow it before running a full bisect: + +1. **Check `git log --oneline {good}..{bad}`** — how many commits are in the + range? If more than ~200, narrow first. +2. **Test midpoint commits manually** — pick a commit in the middle of the range, + build, run the benchmark, and determine if it is good or bad. This halves the + range in one step. +3. **For cross-release regressions** — use the `git merge-base` snap points. If + the range between two release snap points is still large, test at intermediate + release preview tags to narrow further. + +## Phase 3: Git Bisect + +Once you have a manageable commit range, use `git bisect` to binary-search for +the culprit. + +### Bisect workflow + +At each step: + +1. **Rebuild the affected component** — use incremental builds where possible + (see [incremental rebuilds](local-benchmarking.md#incremental-rebuilds)). +2. **Run the standalone benchmark** with the freshly-built CoreRun: + ``` + cd PerfRepro + dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun + ``` +3. **Determine good or bad** — compare the result against your threshold. 
+ +**Exit codes for `git bisect run`:** +- `0` — good (no regression at this commit) +- `1`–`124` — bad (regression present) +- `125` — skip (build failure or untestable commit) + +The standalone benchmark project must be **outside the dotnet/runtime tree** +since `git bisect` checks out different commits, which would overwrite in-tree +files. Place it in a stable location (e.g., `/tmp/bisect/`). + +### Run the bisect + +``` +cd /path/to/runtime +git bisect start {bad-sha} {good-sha} +git bisect run /path/to/bisect-script.sh +``` + +**Time estimate:** Each bisect step requires a rebuild + benchmark run. +For ~1000 commits (log₂(1000) ≈ 10 steps) at roughly 5 minutes per +rebuild-and-benchmark cycle, expect about 50 minutes for the full bisect. + +### After bisect completes + +`git bisect` outputs the first bad commit. Run `git bisect reset` to return to +the original branch. + +## Root Cause Analysis + +Include the following in your report: + +1. **The culprit commit or PR** — link to the specific commit SHA and its + associated PR. Explain how the change relates to the regressing benchmark. +2. **Root cause analysis** — describe *why* the change caused the regression + (e.g., an algorithm change, a removed optimization, additional validation + overhead). +3. **If the root cause spans multiple PRs** — sometimes a regression results + from the combined effect of several changes and `git bisect` lands on a + commit that is only one contributing factor. In this case, report the + narrowest commit range and list the PRs within that range that appear + relevant to the affected code path.
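## Appendix: Example Bisect Script

The `git bisect run` command above takes a script path that is never shown. A minimal sketch of what such a script can look like; the threshold, the build invocation, and the `measure_mean_ns` helper are illustrative placeholders to adapt, not part of any existing tooling:

```shell
#!/usr/bin/env bash
# Sketch of a bisect-script.sh for `git bisect run`. THRESHOLD_NS, the build
# command, and measure_mean_ns are assumptions to adapt, not a real tool.
set -u

THRESHOLD_NS=${THRESHOLD_NS:-100}   # mean ns/op above this counts as regressed

# Exit 0 when the measured mean exceeds the threshold (commit is bad).
is_regressed() {
  awk -v m="$1" -v t="$2" 'BEGIN { exit (m > t) ? 0 : 1 }'
}

# Placeholder: run the standalone benchmark against the freshly built CoreRun
# and print the mean ns/op. Parsing BDN output is benchmark-specific.
measure_mean_ns() {
  echo 0
}

bisect_step() {
  ./build.sh clr+libs -c release || exit 125   # 125 tells git bisect to skip
  mean_ns=$(measure_mean_ns) || exit 125
  if is_regressed "$mean_ns" "$THRESHOLD_NS"; then
    exit 1   # bad: regression present at this commit
  else
    exit 0   # good: no regression at this commit
  fi
}

# Run the step only when invoked with --run, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then bisect_step; fi
```

`git bisect` invokes the script once per candidate commit and steers the search by its exit code, following the 0/1/125 convention described above.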
diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md new file mode 100644 index 00000000000000..f12b37f45cf3a6 --- /dev/null +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -0,0 +1,73 @@ +# EgorBot Reference + +[EgorBot](https://github.com/EgorBo/EgorBot) is a benchmark-as-a-service bot for +[dotnet/runtime](https://github.com/dotnet/runtime). It runs BenchmarkDotNet +microbenchmarks on dedicated hardware and posts results back as GitHub comments. +Its primary use case is comparing performance before and after a change — either +across a PR or between specific commits. + +For the full and up-to-date command reference (targets, options, defaults), +see the [EgorBot manual](https://github.com/EgorBo/EgorBot). + +## Command Format + +Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a +fenced code block (`` ```cs ``) in the same comment. + +``` +@EgorBot [targets...] [options...] [BDN arguments...] +``` + +> **Formatting rules:** +> - The `@EgorBot` command must be **outside** the code block. +> - Only benchmark source code belongs inside the code block. +> - Do not place text between the `@EgorBot` line and the code block — EgorBot +> treats it as additional command arguments. + +## Examples + +Compare a PR against its base branch on AMD and Apple Silicon: + +``` +@EgorBot -amd -arm +``` + +Compare two specific commits: + +``` +@EgorBot -amd -commits abc1234,def5678 +``` + +Compare a commit against its parent: + +``` +@EgorBot -arm -commits abc1234,abc1234~1 +``` + +Compare a range of commits for a specific benchmark filter: + +``` +@EgorBot -arm -commits abc1234...def5678 --filter "*MyBench*" +``` + +## Practical Notes + +- **Default target:** If no target is specified, runs on Apple Silicon via Helix. 
+- **PR mode:** When posting in a PR without `-commits`, EgorBot automatically + compares the PR branch against the base branch. +- **No code block:** If no code block is provided, EgorBot runs benchmarks from + the [dotnet/performance](https://github.com/dotnet/performance) repo instead. +- **Response time:** EgorBot uses polling and may take up to 30 seconds to + acknowledge the request. +- **Supported repositories:** `dotnet/runtime` and `EgorBot/runtime-utils`. +- **Result variability:** Results can vary between runs due to VM differences. + Do not compare results across different architectures or cloud providers. +- **AI-generated content disclosure:** When posting EgorBot comments under a + user's credentials (not a bot account), include a visible note that the + content was AI/Copilot-generated. + +## Links + +- [EgorBot manual](https://github.com/EgorBo/EgorBot) — full target list, + options, and usage documentation +- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) diff --git a/.github/skills/performance-investigation/references/local-benchmarking.md b/.github/skills/performance-investigation/references/local-benchmarking.md new file mode 100644 index 00000000000000..be9d32fdbfc2cf --- /dev/null +++ b/.github/skills/performance-investigation/references/local-benchmarking.md @@ -0,0 +1,140 @@ +# Local Benchmarking with Private Runtime Builds + +This guide covers how to benchmark dotnet/runtime changes locally using +BenchmarkDotNet and privately-built runtime binaries (CoreRun). This approach +lets you measure performance without installing a custom SDK — BenchmarkDotNet +loads the locally-built runtime directly. + +> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` +> on Windows or `./build.sh` on Linux/macOS. Other shell commands use +> Linux/macOS syntax. On Windows, adapt accordingly (use `Copy-Item` or `xcopy`, +> backslash paths, backtick line continuation). 
+ +## Building dotnet/runtime and Obtaining CoreRun + +Build the runtime at the commit you want to test: + +``` +build.cmd/sh clr+libs -c release +``` + +The key artifact is the **testhost** folder containing **CoreRun** at: + +``` +artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ +``` + +CoreRun is a lightweight host that loads the locally-built runtime and +libraries. BenchmarkDotNet uses it via the `--coreRun` argument to benchmark +private builds without installing them as SDKs. + +## Creating a Standalone Benchmark Project + +For regression validation and bisection, use a standalone BenchmarkDotNet +project rather than the full [dotnet/performance](https://github.com/dotnet/performance) +repo. Standalone projects are faster to build, easier to iterate on, and more +reliable across different runtime commits. + +### From an automated bot issue + +Copy the relevant benchmark class from the `dotnet/performance` repo: + +1. Clone `dotnet/performance` and locate the benchmark class referenced in the + issue's `--filter` argument. +2. Create a new console project: + ``` + mkdir PerfRepro && cd PerfRepro + dotnet new console + dotnet add package BenchmarkDotNet + ``` +3. Copy the benchmark class (and any helper types) into the project. Adjust + namespaces and usings as needed. +4. Add a `Program.cs` entry point: + ```csharp + BenchmarkDotNet.Running.BenchmarkSwitcher + .FromAssembly(typeof(Program).Assembly) + .Run(args); + ``` + +### From a customer report + +Write a minimal BenchmarkDotNet benchmark that exercises the reported code path: + +1. Create a new console project with `BenchmarkDotNet` as above. +2. Write a `[Benchmark]` method that calls the API or runs the workload the + customer identified as slow. +3. If the customer provided sample code, adapt it into a proper BDN benchmark + with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. 
+ +## Comparing Good and Bad Commits + +Build dotnet/runtime at both the good and bad commits, saving each testhost +folder: + +``` +git checkout {bad-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad + +git checkout {good-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good +``` + +Run the standalone benchmark with both CoreRuns. BenchmarkDotNet compares them +side-by-side when given multiple `--coreRun` paths (the first is treated as the +baseline): + +``` +cd PerfRepro +dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun /tmp/corerun-good/.../CoreRun \ + /tmp/corerun-bad/.../CoreRun +``` + +To add a statistical significance column, append `--statisticalTest 5%`. This +performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, or +`Same`. + +## Interpreting Results + +| Outcome | Meaning | Next step | +|---------|---------|-----------| +| `Slower` with ratio >1.10 | Regression confirmed | Proceed to bisection | +| `Slower` with ratio 1.05–1.10 | Small regression — likely real but needs confirmation | Re-run with `--iterationCount 30`. If it persists, treat as confirmed. | +| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. | +| `Slower` but ratio <1.05 | Marginal — may be noise | Re-run with `--iterationCount 30`. If still marginal, note as inconclusive. | + +## Using ResultsComparer + +For a thorough comparison of saved BDN result files, use the +[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) +tool: + +``` +dotnet run --project performance/src/tools/ResultsComparer \ + --base /path/to/baseline-results \ + --diff /path/to/compare-results \ + --threshold 5% +``` + +## Incremental Rebuilds + +Full rebuilds are slow. 
Minimize per-step build time by rebuilding only the +affected component: + +| Component changed | Fast rebuild command | +|-------------------|---------------------| +| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | +| CoreLib | `build.cmd/sh clr.corelib -c Release` | +| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | +| All libraries | `build.cmd/sh libs -c Release` | + +After an incremental library rebuild, the updated DLL is placed in the testhost +folder automatically. CoreRun picks up the new version on the next benchmark +run. + +**Caveat:** If a rebuild crosses a commit that changes the build infrastructure +(e.g., SDK version bump in `global.json`), the incremental build may fail. In a +`git bisect` context, use exit code `125` (skip) to handle this gracefully. diff --git a/.github/skills/performance-investigation/references/mihubot-reference.md b/.github/skills/performance-investigation/references/mihubot-reference.md new file mode 100644 index 00000000000000..458d4fd06bbb4c --- /dev/null +++ b/.github/skills/performance-investigation/references/mihubot-reference.md @@ -0,0 +1,66 @@ +# MihuBot Reference + +[MihuBot](https://github.com/MihuBot/runtime-utils) provides several +performance-related services for dotnet/runtime: JIT diff generation, benchmark +execution from the [dotnet/performance](https://github.com/dotnet/performance) +repo, library fuzzing, and regex source generator diffs. It also has a +[web interface](https://mihubot.xyz/runtime-utils) for submitting jobs. + +For full and up-to-date option details, see the +[MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) repository. + +## JIT Diff Generation + +Generate JIT diffs between a PR and its base branch to see how a change affects +the generated machine code across the BCL. 
+ +``` +@MihuBot +@MihuBot -arm -tier0 +``` + +## Running Benchmarks from dotnet/performance + +Run existing benchmarks from the +[dotnet/performance](https://github.com/dotnet/performance) repository without +writing custom benchmark code. + +``` +@MihuBot benchmark Regex +@MihuBot benchmark GetUnicodeCategory https://github.com/dotnet/runtime/compare/4bb0bcd...c74440f +``` + +## Library Fuzzer + +Run fuzz testing on a library: + +``` +@MihuBot fuzz SearchValues +@MihuBot fuzz SearchValues -dependsOn #107206 +``` + +## Regex Source Generator Diffs + +Generate diffs for regex source generator output and JIT diffs for the +generated code: + +``` +@MihuBot regexdiff +@MihuBot regexdiff -arm +``` + +## Common Options + +Most MihuBot job types support options like `-arm`, `-intel`, `-fast`, +`-dependsOn <pr-number>`, and `-combineWith <pr-numbers>`. For example: + +``` +@MihuBot -arm -hetzner -combineWith #1000,#1001 +``` + +## Links + +- [MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) — full + documentation and option reference +- [Web interface](https://mihubot.xyz/runtime-utils) for submitting jobs + directly From d1c70b9894eea96ca746bda45abc501de10b167a Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 18:40:20 +0200 Subject: [PATCH 2/5] Remove AI-generated content disclosure from skill (already in copilot-instructions.md) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/performance-investigation/SKILL.md | 4 ---- .../performance-investigation/references/egorbot-reference.md | 3 --- 2 files changed, 7 deletions(-) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index f4d4d5187be662..96e1178db4da42 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -120,10 +120,6 @@ EgorBot and MihuBot post results as PR comments.
Look for: - **Memory/allocation changes** — check `Allocated` column if `[MemoryDiagnoser]` is enabled -> **AI-generated content disclosure:** When posting bot invocation comments -> under a user's credentials (not a bot account), include a visible note that -> the content was AI/Copilot-generated. - --- ## Workflow 2: Regression Investigation diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md index f12b37f45cf3a6..abc1085ce56bc8 100644 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -62,9 +62,6 @@ Compare a range of commits for a specific benchmark filter: - **Supported repositories:** `dotnet/runtime` and `EgorBot/runtime-utils`. - **Result variability:** Results can vary between runs due to VM differences. Do not compare results across different architectures or cloud providers. -- **AI-generated content disclosure:** When posting EgorBot comments under a - user's credentials (not a bot account), include a visible note that the - content was AI/Copilot-generated. ## Links From 262c0f9b7ff9cc3f9339ac1e5b59063f53987021 Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 19:05:30 +0200 Subject: [PATCH 3/5] Fix code fence formatting in SKILL.md and egorbot-reference.md Use quadruple-backtick outer fence for nested code blocks and simplify inline backtick formatting. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/performance-investigation/SKILL.md | 6 +++--- .../references/egorbot-reference.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index 96e1178db4da42..24cc03c507040b 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -82,13 +82,13 @@ public class Bench Post a comment on the PR: -``` +```` @EgorBot -amd -arm -​```cs +```cs // Your benchmark code here -​``` ``` +```` EgorBot builds dotnet/runtime for the PR and base branch, runs the benchmark on dedicated hardware, and posts BDN results back as a comment. diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md index abc1085ce56bc8..ae45ecc2cc062c 100644 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -12,7 +12,7 @@ see the [EgorBot manual](https://github.com/EgorBo/EgorBot). ## Command Format Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a -fenced code block (`` ```cs ``) in the same comment. +fenced ` ```cs ` code block in the same comment. ``` @EgorBot [targets...] [options...] [BDN arguments...] 
From 3dbdf2e1435d3047e7e57453a83863e8ee75d88c Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 19:20:30 +0200 Subject: [PATCH 4/5] Address review feedback: formatting, path clarifications, CoreLib caveat - Fix inline code formatting for fenced block marker in egorbot-reference.md - Remove specific MihuBot option names from SKILL.md (not in reference doc) - Clarify testhost vs coreclr CoreRun path distinction in local-benchmarking.md - Expand bisection-guide.md CoreRun path to full testhost path - Add CoreLib libs.pretest caveat for incremental rebuilds Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../skills/performance-investigation/SKILL.md | 4 ++-- .../references/bisection-guide.md | 7 +++++-- .../references/egorbot-reference.md | 3 ++- .../references/local-benchmarking.md | 16 ++++++++++++---- 4 files changed, 21 insertions(+), 9 deletions(-) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index 24cc03c507040b..17ffeb2dcc1f0d 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -194,8 +194,8 @@ improvements. For ARM64-specific diffs or tier-0 analysis: @MihuBot -arm -tier0 ``` -See [MihuBot reference](references/mihubot-reference.md) for the full JIT diff -options, including `-nocctors`, `-includeKnownNoise`, and others. +See [MihuBot reference](references/mihubot-reference.md) for the full set of JIT +diff options and usage guidance. ### Interpreting JIT Diffs diff --git a/.github/skills/performance-investigation/references/bisection-guide.md b/.github/skills/performance-investigation/references/bisection-guide.md index 5019c64b389b08..152858f18dd524 100644 --- a/.github/skills/performance-investigation/references/bisection-guide.md +++ b/.github/skills/performance-investigation/references/bisection-guide.md @@ -122,12 +122,15 @@ At each step: 1. 
**Rebuild the affected component** — use incremental builds where possible (see [incremental rebuilds](local-benchmarking.md#incremental-rebuilds)). -2. **Run the standalone benchmark** with the freshly-built CoreRun: +2. **Run the standalone benchmark** with the freshly-built CoreRun from the + testhost folder (see + [local benchmarking guide](local-benchmarking.md#building-dotnet-runtime-and-obtaining-corerun) + for the exact path): ``` cd PerfRepro dotnet run -c Release -f net{ver} -- \ --filter '*' \ - --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun + --coreRun {runtime}/artifacts/bin/testhost/net{ver}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{ver}/CoreRun ``` 3. **Determine good or bad** — compare the result against your threshold. diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md index ae45ecc2cc062c..39ed2e8ab81774 100644 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -12,7 +12,8 @@ see the [EgorBot manual](https://github.com/EgorBo/EgorBot). ## Command Format Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a -fenced ` ```cs ` code block in the same comment. +fenced C# code block (a code fence that begins with three backticks followed +by `cs`) in the same comment. ``` @EgorBot [targets...] [options...] [BDN arguments...] 
diff --git a/.github/skills/performance-investigation/references/local-benchmarking.md b/.github/skills/performance-investigation/references/local-benchmarking.md index be9d32fdbfc2cf..d4b2ff38329aea 100644 --- a/.github/skills/performance-investigation/references/local-benchmarking.md +++ b/.github/skills/performance-investigation/references/local-benchmarking.md @@ -24,6 +24,10 @@ The key artifact is the **testhost** folder containing **CoreRun** at: artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ ``` +> **Note:** This is different from the bare `corerun` binary under +> `artifacts/bin/coreclr/`. BenchmarkDotNet needs the testhost layout because +> it contains both CoreRun and the complete framework assemblies side-by-side. + CoreRun is a lightweight host that loads the locally-built runtime and libraries. BenchmarkDotNet uses it via the `--coreRun` argument to benchmark private builds without installing them as SDKs. @@ -127,13 +131,17 @@ affected component: | Component changed | Fast rebuild command | |-------------------|---------------------| | A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | -| CoreLib | `build.cmd/sh clr.corelib -c Release` | +| CoreLib | `build.cmd/sh clr.corelib -c Release` followed by `build.cmd/sh libs.pretest -c Release` | | CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | | All libraries | `build.cmd/sh libs -c Release` | -After an incremental library rebuild, the updated DLL is placed in the testhost -folder automatically. CoreRun picks up the new version on the next benchmark -run. +After an incremental library rebuild (other than System.Private.CoreLib), the +updated DLL is placed in the testhost folder automatically. CoreRun picks up +the new version on the next benchmark run. 
+ +For System.Private.CoreLib, you must run `build.cmd/sh libs.pretest -c Release` +after rebuilding to copy the updated CoreLib into the testhost layout; +otherwise benchmarks may silently run against the older CoreLib. **Caveat:** If a rebuild crosses a commit that changes the build infrastructure (e.g., SDK version bump in `global.json`), the incremental build may fail. In a From 5e251d2631bd16c8a85c3c6d78524d05e127d636 Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Tue, 24 Mar 2026 12:44:53 +0200 Subject: [PATCH 5/5] Refocus skill to local-only investigation, restore performance-benchmark Per PR feedback: keep performance-benchmark as a separate skill for EgorBot/PR benchmarking. Refocus performance-investigation to local-only workflows: building CoreRun, comparing commits with BDN, git bisect. - Restore performance-benchmark/SKILL.md (unchanged from main) - Revert cross-reference changes to copilot-instructions.md, api-proposal, jit-regression-test (these should reference performance-benchmark) - Remove egorbot-reference.md and mihubot-reference.md (bot territory) - Rewrite SKILL.md to remove Workflow 1 (PR benchmark) and Workflow 3 (JIT diffs), keeping only local investigation - Update issue-triage Related Skills to list both skills - Update evals to match local-only scope (6 evals) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/copilot-instructions.md | 2 +- .github/skills/api-proposal/SKILL.md | 2 +- .github/skills/issue-triage/SKILL.md | 3 +- .github/skills/jit-regression-test/SKILL.md | 2 +- .github/skills/performance-benchmark/SKILL.md | 191 ++++++++++++++++ .../skills/performance-investigation/SKILL.md | 208 +++--------------- .../evals/evals.json | 107 ++------- .../references/egorbot-reference.md | 71 ------ .../references/mihubot-reference.md | 66 ------ 9 files changed, 240 insertions(+), 412 deletions(-) create mode 100644 .github/skills/performance-benchmark/SKILL.md delete mode 100644 
.github/skills/performance-investigation/references/egorbot-reference.md delete mode 100644 .github/skills/performance-investigation/references/mihubot-reference.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index b2de17cdd4274a..a23e28a783c9bc 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -14,7 +14,7 @@ When NOT running under CCA, skip the `code-review` skill if the user has stated Before making changes to a directory, search for `README.md` files in that directory and its parent directories up to the repository root. Read any you find — they contain conventions, patterns, and architectural context relevant to your work. -If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-investigation` skill to validate the impact before completing. +If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-benchmark` skill to validate the impact before completing. You MUST follow all code-formatting and naming conventions defined in [`.editorconfig`](/.editorconfig). diff --git a/.github/skills/api-proposal/SKILL.md b/.github/skills/api-proposal/SKILL.md index 6c9f2c494fa3d9..8f1905b87d1428 100644 --- a/.github/skills/api-proposal/SKILL.md +++ b/.github/skills/api-proposal/SKILL.md @@ -160,7 +160,7 @@ This: 2. **All errors and warnings must be fixed** before proceeding to the draft phase. -3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-investigation** skill. +3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-benchmark** skill. 4. Re-run tests after any review-driven changes to confirm nothing regressed. 
diff --git a/.github/skills/issue-triage/SKILL.md b/.github/skills/issue-triage/SKILL.md index b104bfbcb48407..bdc16692ad98ff 100644 --- a/.github/skills/issue-triage/SKILL.md +++ b/.github/skills/issue-triage/SKILL.md @@ -521,5 +521,6 @@ depending on the outcome: |-----------|-------|-----------------| | API proposal recommended as KEEP | **api-proposal** | Offer to draft a formal API proposal with working prototype | | Bug report with root cause identified | **jit-regression-test** | If the bug is JIT-related, offer to create a regression test | -| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression (benchmarking, bisection, JIT diffs) | +| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) | +| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot | | Fix PR linked to the issue | **code-review** | Offer to review the fix PR for correctness and consistency | diff --git a/.github/skills/jit-regression-test/SKILL.md b/.github/skills/jit-regression-test/SKILL.md index 2e03703531009e..e6cc8f82d58c50 100644 --- a/.github/skills/jit-regression-test/SKILL.md +++ b/.github/skills/jit-regression-test/SKILL.md @@ -7,7 +7,7 @@ description: > bug", "create a regression test for issue #NNNNN", converting issue repro to xunit test. DO NOT USE FOR: non-JIT tests (use standard test patterns), debugging JIT issues without a known repro, performance benchmarks (use - performance-investigation skill). + performance-benchmark skill). 
--- # JIT Regression Test Extraction diff --git a/.github/skills/performance-benchmark/SKILL.md b/.github/skills/performance-benchmark/SKILL.md new file mode 100644 index 00000000000000..9e1b8f0bbf6a31 --- /dev/null +++ b/.github/skills/performance-benchmark/SKILL.md @@ -0,0 +1,191 @@ +--- +name: performance-benchmark +description: Generate and run ad hoc performance benchmarks to validate code changes. Use this when asked to benchmark, profile, or validate the performance impact of a code change in dotnet/runtime. +--- + +# Ad Hoc Performance Benchmarking with @EgorBot + +When you need to validate the performance impact of a code change, follow this process to write a BenchmarkDotNet benchmark and trigger @EgorBot to run it. +The bot will notify you when results are ready, so don't wait for them. + +## Step 1: Write the Benchmark + +Create a BenchmarkDotNet benchmark that tests the specific operation being changed. Follow these guidelines: + +### Benchmark Structure + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +public class Bench +{ + // Add setup/cleanup if needed + [GlobalSetup] + public void Setup() + { + // Initialize test data + } + + [Benchmark] + public void MyOperation() + { + // Test the operation + } +} +``` + +### Best Practices + +For comprehensive guidance, see the [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). 
+ +Key principles: + +- **Move initialization to `[GlobalSetup]`**: Separate setup logic from the measured code to avoid measuring allocation/initialization overhead +- **Return values** from benchmark methods to prevent dead code elimination +- **Avoid loops**: BenchmarkDotNet invokes the benchmark many times automatically; adding manual loops distorts measurements +- **No side effects**: Benchmarks should be pure and produce consistent results +- **Focus on common cases**: Benchmark hot paths and typical usage, not edge cases or error paths +- **Use consistent input data**: Always use the same test data for reproducible comparisons +- **Avoid `[DisassemblyDiagnoser]`**: It causes crashes on Linux. Use `--envvars DOTNET_JitDisasm:MethodName` instead +- **Benchmark class requirements**: Must be `public`, not `sealed`, not `static`, and must be a `class` (not struct) + +### Example: String Operation Benchmark + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private string _testString = default!; + + [Params(10, 100, 1000)] + public int Length { get; set; } + + [GlobalSetup] + public void Setup() + { + _testString = new string('a', Length); + } + + [Benchmark] + public int StringOperation() + { + return _testString.IndexOf('z'); + } +} +``` + +### Example: Collection Operation Benchmark + +```csharp +using System.Linq; +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private int[] _array = default!; + private List<int> _list = default!; + + [Params(100, 1000, 10000)] + public int Count { get; set; } + + [GlobalSetup] + public void Setup() + { + _array = Enumerable.Range(0, Count).ToArray(); + _list = _array.ToList(); + } + + [Benchmark] + public bool AnyArray() => _array.Any(); + + [Benchmark] +
public bool AnyList() => _list.Any(); + + [Benchmark] + public int SumArray() => _array.Sum(); + + [Benchmark] + public int SumList() => _list.Sum(); +} +``` + +## Step 2: Mention @EgorBot in a comment/PR description + +Post a comment on the PR to trigger EgorBot with your benchmark. The general format is: + +> 📝 **AI-generated content disclosure:** When posting benchmark comments to GitHub under a user's credentials — i.e., the account is **not** a dedicated "copilot" or "bot" account/app — you **MUST** include a concise, visible note (e.g. a `> [!NOTE]` alert) indicating the content was AI/Copilot-generated. Skip this if the user explicitly asks you to omit it. + +@EgorBot [targets] [options] [BenchmarkDotNet args] + +```cs +// Your benchmark code here +``` +> **Note:** When using @EgorBot, follow these formatting rules: +> - The @EgorBot command must not be inside the code block. +> - Only the benchmark code should be inside the code block. +> - Do not place any additional text between the @EgorBot command line and the code block, as EgorBot will treat it as additional command arguments. + +### Target Flags + +- `-linux_amd` +- `-linux_intel` +- `-windows_amd` +- `-windows_intel` +- `-linux_arm64` +- `-osx_arm64` (baremetal, feel free to always include it) + +The most common combination is `-linux_amd -osx_arm64`. Do not include more than 4 targets. + +### Common Options + +Use `-profiler` when absolutely necessary along with `-linux_arm64` and/or `-linux_amd` to include `perf` profiling and disassembly in the results. 
+ +### Example: Basic PR Benchmark + +To benchmark the current PR changes against the base branch: + +@EgorBot -linux_amd -osx_arm64 + +```cs +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + [Benchmark] + public int MyOperation() + { + // Your benchmark code + return 42; + } +} +``` + +## Important Notes + +- **Bot response time**: EgorBot uses polling and may take up to 30 seconds to respond +- **Supported repositories**: EgorBot monitors `dotnet/runtime` and `EgorBot/runtime-utils` +- **PR mode (default)**: When posting in a PR, EgorBot automatically compares the PR changes against the base branch +- **Results variability**: Results may vary between runs due to VM differences. Do not compare results across different architectures or cloud providers +- **Check the manual**: EgorBot replies include a link to the [manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) for advanced options + +## Additional Resources + +- [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) - Essential reading for writing effective benchmarks +- [BenchmarkDotNet CLI Arguments](https://github.com/dotnet/BenchmarkDotNet/blob/master/docs/articles/guides/console-args.md) +- [EgorBot Manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index 17ffeb2dcc1f0d..416c2a2278c0b7 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -1,133 +1,40 @@ --- name: performance-investigation description: > - Investigate performance regressions and validate performance impact of code - changes in dotnet/runtime. 
Use this skill whenever asked to benchmark a PR, - investigate a performance regression, validate performance impact, run - benchmarks, generate JIT diffs, compare performance between commits, triage - a performance issue, or check whether a change improves or regresses - performance. Also use when asked about @EgorBot, @MihuBot, BenchmarkDotNet, - CoreRun, or dotnet/performance. Covers ad hoc PR benchmarking, deep - regression investigation with git bisect, and JIT diff analysis. + Investigate performance regressions locally in dotnet/runtime. Use this skill + when asked to investigate a performance regression, bisect to find a culprit + commit, validate a regression with local builds, compare performance between + commits using CoreRun, or benchmark private runtime builds with + BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local + benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking + with @EgorBot or @MihuBot (use the performance-benchmark skill instead). --- -# Performance Investigation for dotnet/runtime +# Local Performance Investigation for dotnet/runtime -Investigate performance regressions and validate the performance impact of code -changes. This skill covers three workflows, from quick PR validation to deep -regression root-causing. +Investigate performance regressions locally by building the runtime at specific +commits, running BenchmarkDotNet with CoreRun, and using git bisect to find +culprit commits. This skill covers the full local investigation workflow from +validation to root-causing. 
## When to Use This Skill -- Asked to **benchmark** a PR or validate performance impact of a change - Asked to **investigate a performance regression** (from an issue, bot report, or customer report) -- Asked to **generate JIT diffs** or analyze codegen impact -- Asked to **compare performance** between commits, branches, or releases +- Asked to **compare performance** between commits, branches, or releases using + local builds +- Asked to **bisect** to find the commit that introduced a regression +- Asked to **benchmark private runtime builds** using CoreRun - Asked to **triage a performance issue** (use alongside the `issue-triage` skill for full triage) - Given a `tenet-performance` or `tenet-performance-benchmarks` labeled issue -- Asked how to use `@EgorBot`, `@MihuBot`, BenchmarkDotNet, or CoreRun + that requires local investigation -## Choose Your Workflow +> **Note:** For ad hoc PR benchmarking via @EgorBot or @MihuBot, use the +> `performance-benchmark` skill instead. This skill focuses on local builds, +> CoreRun, and git bisect. -| Context | Workflow | What it does | -|---------|----------|-------------| -| PR is open and you want to measure its impact | [Workflow 1: PR Benchmark Validation](#workflow-1-pr-benchmark-validation) | Write a benchmark, invoke a bot, get results | -| A regression has been reported (issue or bot alert) | [Workflow 2: Regression Investigation](#workflow-2-regression-investigation) | Validate, bisect, root-cause | -| Change affects JIT codegen and you want to see diffs | [Workflow 3: JIT Diff Analysis](#workflow-3-jit-diff-analysis) | Generate JIT diffs via MihuBot | - -If you're triaging a performance regression issue, use Workflow 2 for the -investigation methodology, then return to the `issue-triage` skill for -triage-specific assessment and recommendation. - ---- - -## Workflow 1: PR Benchmark Validation - -Use this when a PR is open and you want to measure its performance impact. 
- -### Step 1: Write a BenchmarkDotNet Benchmark - -Create a benchmark that targets the specific operation being changed. See -[Writing Good Benchmarks](#writing-good-benchmarks) below for best practices. - -```csharp -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - [GlobalSetup] - public void Setup() - { - // Initialize test data - } - - [Benchmark] - public int MyOperation() - { - // Test the operation — return a value to prevent dead code elimination - return 42; - } -} -``` - -### Step 2: Choose a Bot and Invoke It - -**Use @EgorBot** when you need to run custom benchmark code (written in Step 1): - -Post a comment on the PR: - -```` -@EgorBot -amd -arm - -```cs -// Your benchmark code here -``` -```` - -EgorBot builds dotnet/runtime for the PR and base branch, runs the benchmark on -dedicated hardware, and posts BDN results back as a comment. - -See [EgorBot reference](references/egorbot-reference.md) for the full target -list, options, and examples. - -**Use @MihuBot** when you want to run existing benchmarks from the -[dotnet/performance](https://github.com/dotnet/performance) repo: - -``` -@MihuBot benchmark -``` - -This is useful when established benchmarks already cover the affected code path -and you don't need to write custom code. - -See [MihuBot reference](references/mihubot-reference.md) for the full command -syntax and options. - -### Step 3: Interpret Results - -EgorBot and MihuBot post results as PR comments. 
Look for: - -- **Ratio column** — values >1.0 indicate the PR is slower, <1.0 indicate it's - faster -- **Statistical significance** — if a `--statisticalTest` column is present, - look for `Faster`, `Slower`, or `Same` annotations -- **Memory/allocation changes** — check `Allocated` column if - `[MemoryDiagnoser]` is enabled - ---- - -## Workflow 2: Regression Investigation - -Use this when a performance regression has been reported — whether from -`performanceautofiler[bot]`, a customer report, or a cross-release comparison. - -### Overview +## Investigation Workflow The investigation follows three phases: @@ -143,25 +50,6 @@ For details on building the runtime, using CoreRun, and running BenchmarkDotNet against private builds, see the [local benchmarking guide](references/local-benchmarking.md). -### Quick Path: Use Bots Instead of Local Bisection - -If the regression range is narrow (a few commits) or the environment doesn't -support local builds, you can use bots to validate specific commits without -building locally: - -``` -@EgorBot -amd -commits {good-sha},{bad-sha} -``` - -Or with @MihuBot for existing benchmarks: - -``` -@MihuBot benchmark https://github.com/dotnet/runtime/compare/{good-sha}...{bad-sha} -``` - -This won't perform a full bisect, but it can confirm whether the regression -exists and help narrow the range. - ### Reporting Results After completing the investigation, include in your report: @@ -174,48 +62,10 @@ After completing the investigation, include in your report: --- -## Workflow 3: JIT Diff Analysis - -Use this when a change affects JIT code generation and you want to see how it -changes the emitted machine code across the entire BCL. - -### Invoke MihuBot for JIT Diffs - -Post a comment on the PR: - -``` -@MihuBot -``` - -MihuBot generates comprehensive JIT diffs showing codegen regressions and -improvements. 
For ARM64-specific diffs or tier-0 analysis: - -``` -@MihuBot -arm -tier0 -``` - -See [MihuBot reference](references/mihubot-reference.md) for the full set of JIT -diff options and usage guidance. - -### Interpreting JIT Diffs - -MihuBot reports include: - -- **Code size changes** — total bytes added/removed across all methods -- **Per-method diffs** — individual methods that changed, with before/after - assembly -- **Regressions vs improvements** — clearly separated sections - -A small increase in code size across many methods may indicate a JIT change with -broad impact. A large increase in a few methods may indicate a targeted -optimization that trades code size for speed (or a regression). - ---- - ## Writing Good Benchmarks -These guidelines apply whether you're writing a benchmark for EgorBot, for -local validation, or for contribution to the dotnet/performance repo. +These guidelines apply whether you're writing a benchmark for local validation +or for contribution to the dotnet/performance repo. For comprehensive guidance, see the [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). @@ -240,16 +90,7 @@ For comprehensive guidance, see the - Must not be `sealed` - Must not be `static` -### Avoid `[DisassemblyDiagnoser]` - -It causes crashes on Linux. 
To get disassembly, use the `--envvars` option -instead: - -``` -@EgorBot -amd --envvars DOTNET_JitDisasm:MethodName -``` - -### Example: Comparing Two Implementations +### Example: Standalone Investigation Benchmark ```csharp using BenchmarkDotNet.Attributes; @@ -296,6 +137,7 @@ public class Bench | Condition | Skill | When to use | |-----------|-------|-------------| +| Need to benchmark a PR via @EgorBot | **performance-benchmark** | For ad hoc PR benchmarking on dedicated hardware | | Triaging a performance regression issue | **issue-triage** | For the full triage workflow (assessment, recommendation, labels) | | Fix PR linked to the regression | **code-review** | To review the fix for correctness and consistency | | JIT regression test needed | **jit-regression-test** | To extract a JIT regression test from the issue | diff --git a/.github/skills/performance-investigation/evals/evals.json b/.github/skills/performance-investigation/evals/evals.json index bfe6a78f99d9e6..21d363829af87e 100644 --- a/.github/skills/performance-investigation/evals/evals.json +++ b/.github/skills/performance-investigation/evals/evals.json @@ -3,42 +3,15 @@ "evals": [ { "id": 1, - "name": "pr-benchmark-request", - "prompt": "Can you benchmark PR https://github.com/dotnet/runtime/pull/121223 to check for performance impact?", - "expected_output": "Should follow Workflow 1 (PR Benchmark Validation). 
Should write a BenchmarkDotNet benchmark targeting the changed code and invoke @EgorBot to run it on the PR.", - "assertions": [ - { - "name": "uses-workflow-1", - "description": "Follows the PR benchmark validation workflow", - "type": "contains_any", - "check": ["Workflow 1", "PR Benchmark", "benchmark"] - }, - { - "name": "writes-benchmark", - "description": "Creates or references a BenchmarkDotNet benchmark", - "type": "contains_any", - "check": ["[Benchmark]", "BenchmarkDotNet", "BenchmarkSwitcher"] - }, - { - "name": "invokes-bot", - "description": "Invokes EgorBot or MihuBot to run the benchmark", - "type": "contains_any", - "check": ["@EgorBot", "@MihuBot"] - } - ], - "files": [] - }, - { - "id": 2, "name": "perf-regression-autobot", "prompt": "Investigate this performance regression: https://github.com/dotnet/runtime/issues/114625", - "expected_output": "Should follow Workflow 2 (Regression Investigation). Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and attempt validation or bisection.", + "expected_output": "Should follow the regression investigation workflow. Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and plan validation or bisection using local builds.", "assertions": [ { - "name": "uses-workflow-2", - "description": "Follows the regression investigation workflow", + "name": "identifies-regression", + "description": "Recognizes and follows the regression investigation workflow", "type": "contains_any", - "check": ["Workflow 2", "Regression", "regression", "investigate"] + "check": ["regression", "investigate", "Regression"] }, { "name": "identifies-commits", @@ -56,28 +29,7 @@ "files": [] }, { - "id": 3, - "name": "jit-diff-request", - "prompt": "Can you generate JIT diffs for my PR that changes the JIT compiler?", - "expected_output": "Should follow Workflow 3 (JIT Diff Analysis). 
Should invoke @MihuBot to generate JIT diffs and explain how to interpret the results.", - "assertions": [ - { - "name": "uses-mihubot", - "description": "Invokes MihuBot for JIT diffs", - "type": "contains", - "check": "@MihuBot" - }, - { - "name": "mentions-jit-diffs", - "description": "References JIT diff generation", - "type": "contains_any", - "check": ["JIT diff", "jit-diff", "codegen", "code size"] - } - ], - "files": [] - }, - { - "id": 4, + "id": 2, "name": "benchmark-with-corerun", "prompt": "How do I benchmark my local runtime changes against the main branch?", "expected_output": "Should explain how to build dotnet/runtime, obtain CoreRun from the testhost folder, and run BenchmarkDotNet with the --coreRun argument to compare private builds.", @@ -104,31 +56,10 @@ "files": [] }, { - "id": 5, - "name": "existing-benchmarks-request", - "prompt": "Run the existing Regex benchmarks from dotnet/performance against PR https://github.com/dotnet/runtime/pull/124628", - "expected_output": "Should use @MihuBot benchmark command to run existing benchmarks from the dotnet/performance repo rather than writing custom benchmark code.", - "assertions": [ - { - "name": "uses-mihubot-benchmark", - "description": "Uses MihuBot's benchmark command for existing benchmarks", - "type": "contains", - "check": "@MihuBot benchmark" - }, - { - "name": "references-perf-repo", - "description": "References the dotnet/performance repository", - "type": "contains_any", - "check": ["dotnet/performance", "performance repo"] - } - ], - "files": [] - }, - { - "id": 6, + "id": 3, "name": "cross-release-regression", "prompt": "A user reports that string.IndexOf is 2x slower in .NET 10 compared to .NET 9. How should we investigate?", - "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression. 
Should reference both local investigation and bot-based approaches.",
+      "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression locally using CoreRun builds.",
       "assertions": [
         {
           "name": "mentions-merge-base",
@@ -152,28 +83,28 @@
       "files": []
     },
     {
-      "id": 7,
-      "name": "compare-specific-commits",
-      "prompt": "Compare the performance of commits abc1234 and def5678 for the System.Text.Json benchmarks",
-      "expected_output": "Should invoke @EgorBot with -commits to compare the two specific commits, or use @MihuBot benchmark with a compare URL.",
+      "id": 4,
+      "name": "compare-commits-locally",
+      "prompt": "Compare the performance of two specific commits locally for System.Text.Json serialization",
+      "expected_output": "Should explain how to build dotnet/runtime at both commits, save testhost/CoreRun artifacts, and run BenchmarkDotNet with --coreRun pointing to both builds for a side-by-side comparison.",
       "assertions": [
         {
-          "name": "uses-commits-flag",
-          "description": "Uses the -commits option or compare URL to specify the commits",
+          "name": "mentions-corerun",
+          "description": "References CoreRun or testhost for running against private builds",
           "type": "contains_any",
-          "check": ["-commits", "compare", "abc1234", "def5678"]
+          "check": ["CoreRun", "coreRun", "--coreRun", "testhost"]
         },
         {
-          "name": "invokes-bot",
-          "description": "Invokes EgorBot or MihuBot to run the comparison",
+          "name": "mentions-both-builds",
+          "description": "Explains building the runtime at both commits for comparison",
           "type": "contains_any",
-          "check": ["@EgorBot", "@MihuBot"]
+          "check": ["both commits", "good commit", "bad commit", "baseline", "two builds", "each commit"]
         }
       ],
       "files": []
     },
     {
-      "id": 8,
+      "id": 5,
       "name": "not-applicable-bug-issue",
       "prompt": "Can you check the performance impact of https://github.com/dotnet/runtime/issues/46088",
       "expected_output": "Should recognize this is a 
functional bug (System.Text.Json does not support constructors with byref parameters), not a performance issue. Should indicate that performance benchmarking is not applicable here.", @@ -188,7 +119,7 @@ "files": [] }, { - "id": 9, + "id": 6, "name": "not-applicable-doc-pr", "prompt": "Benchmark the changes in PR https://github.com/dotnet/runtime/pull/124592 to validate performance", "expected_output": "Should recognize this is a documentation-only PR (adding XML docs to DI extension methods) and that benchmarking is not applicable or meaningful for documentation changes.", diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md deleted file mode 100644 index 39ed2e8ab81774..00000000000000 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ /dev/null @@ -1,71 +0,0 @@ -# EgorBot Reference - -[EgorBot](https://github.com/EgorBo/EgorBot) is a benchmark-as-a-service bot for -[dotnet/runtime](https://github.com/dotnet/runtime). It runs BenchmarkDotNet -microbenchmarks on dedicated hardware and posts results back as GitHub comments. -Its primary use case is comparing performance before and after a change — either -across a PR or between specific commits. - -For the full and up-to-date command reference (targets, options, defaults), -see the [EgorBot manual](https://github.com/EgorBo/EgorBot). - -## Command Format - -Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a -fenced C# code block (a code fence that begins with three backticks followed -by `cs`) in the same comment. - -``` -@EgorBot [targets...] [options...] [BDN arguments...] -``` - -> **Formatting rules:** -> - The `@EgorBot` command must be **outside** the code block. -> - Only benchmark source code belongs inside the code block. -> - Do not place text between the `@EgorBot` line and the code block — EgorBot -> treats it as additional command arguments. 
- -## Examples - -Compare a PR against its base branch on AMD and Apple Silicon: - -``` -@EgorBot -amd -arm -``` - -Compare two specific commits: - -``` -@EgorBot -amd -commits abc1234,def5678 -``` - -Compare a commit against its parent: - -``` -@EgorBot -arm -commits abc1234,abc1234~1 -``` - -Compare a range of commits for a specific benchmark filter: - -``` -@EgorBot -arm -commits abc1234...def5678 --filter "*MyBench*" -``` - -## Practical Notes - -- **Default target:** If no target is specified, runs on Apple Silicon via Helix. -- **PR mode:** When posting in a PR without `-commits`, EgorBot automatically - compares the PR branch against the base branch. -- **No code block:** If no code block is provided, EgorBot runs benchmarks from - the [dotnet/performance](https://github.com/dotnet/performance) repo instead. -- **Response time:** EgorBot uses polling and may take up to 30 seconds to - acknowledge the request. -- **Supported repositories:** `dotnet/runtime` and `EgorBot/runtime-utils`. -- **Result variability:** Results can vary between runs due to VM differences. - Do not compare results across different architectures or cloud providers. 
- -## Links - -- [EgorBot manual](https://github.com/EgorBo/EgorBot) — full target list, - options, and usage documentation -- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) diff --git a/.github/skills/performance-investigation/references/mihubot-reference.md b/.github/skills/performance-investigation/references/mihubot-reference.md deleted file mode 100644 index 458d4fd06bbb4c..00000000000000 --- a/.github/skills/performance-investigation/references/mihubot-reference.md +++ /dev/null @@ -1,66 +0,0 @@ -# MihuBot Reference - -[MihuBot](https://github.com/MihuBot/runtime-utils) provides several -performance-related services for dotnet/runtime: JIT diff generation, benchmark -execution from the [dotnet/performance](https://github.com/dotnet/performance) -repo, library fuzzing, and regex source generator diffs. It also has a -[web interface](https://mihubot.xyz/runtime-utils) for submitting jobs. - -For full and up-to-date option details, see the -[MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) repository. - -## JIT Diff Generation - -Generate JIT diffs between a PR and its base branch to see how a change affects -the generated machine code across the BCL. - -``` -@MihuBot -@MihuBot -arm -tier0 -``` - -## Running Benchmarks from dotnet/performance - -Run existing benchmarks from the -[dotnet/performance](https://github.com/dotnet/performance) repository without -writing custom benchmark code. 
- -``` -@MihuBot benchmark Regex -@MihuBot benchmark GetUnicodeCategory https://github.com/dotnet/runtime/compare/4bb0bcd...c74440f -``` - -## Library Fuzzer - -Run fuzz testing on a library: - -``` -@MihuBot fuzz SearchValues -@MihuBot fuzz SearchValues -dependsOn #107206 -``` - -## Regex Source Generator Diffs - -Generate diffs for regex source generator output and JIT diffs for the -generated code: - -``` -@MihuBot regexdiff -@MihuBot regexdiff -arm -``` - -## Common Options - -Most MihuBot job types support options like `-arm`, `-intel`, `-fast`, -`-dependsOn `, and `-combineWith `. For example: - -``` -@MihuBot -arm -hetzner -combineWith #1000,#1001 -``` - -## Links - -- [MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) — full - documentation and option reference -- [Web interface](https://mihubot.xyz/runtime-utils) for submitting jobs - directly