From 02252ecd4f017a8057d5b57c429ec76bbb443ab0 Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 18:33:37 +0200 Subject: [PATCH 1/5] Add performance-investigation skill, replacing performance-benchmark Consolidate all performance investigation guidance into a single skill with three workflows: - PR benchmark validation (EgorBot/MihuBot) - Regression investigation (CoreRun builds, git bisect) - JIT diff analysis (MihuBot) Reference docs cover EgorBot, MihuBot, local benchmarking with CoreRun, and git bisect methodology. Includes 9 evals covering all workflows plus negative cases. The existing performance-benchmark skill is removed (fully superseded). The issue-triage skill's perf-regression-triage.md is slimmed to keep only triage-specific assessment/recommendation criteria, delegating investigation methodology to the new skill. All cross-references updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/copilot-instructions.md | 2 +- .github/skills/api-proposal/SKILL.md | 2 +- .github/skills/issue-triage/SKILL.md | 4 +- .../references/perf-regression-triage.md | 323 +----------------- .github/skills/jit-regression-test/SKILL.md | 2 +- .github/skills/performance-benchmark/SKILL.md | 191 ----------- .../skills/performance-investigation/SKILL.md | 305 +++++++++++++++++ .../evals/evals.json | 206 +++++++++++ .../references/bisection-guide.md | 173 ++++++++++ .../references/egorbot-reference.md | 73 ++++ .../references/local-benchmarking.md | 140 ++++++++ .../references/mihubot-reference.md | 66 ++++ 12 files changed, 986 insertions(+), 501 deletions(-) delete mode 100644 .github/skills/performance-benchmark/SKILL.md create mode 100644 .github/skills/performance-investigation/SKILL.md create mode 100644 .github/skills/performance-investigation/evals/evals.json create mode 100644 .github/skills/performance-investigation/references/bisection-guide.md create mode 100644 
.github/skills/performance-investigation/references/egorbot-reference.md create mode 100644 .github/skills/performance-investigation/references/local-benchmarking.md create mode 100644 .github/skills/performance-investigation/references/mihubot-reference.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index a23e28a783c9bc..b2de17cdd4274a 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -14,7 +14,7 @@ When NOT running under CCA, skip the `code-review` skill if the user has stated Before making changes to a directory, search for `README.md` files in that directory and its parent directories up to the repository root. Read any you find — they contain conventions, patterns, and architectural context relevant to your work. -If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-benchmark` skill to validate the impact before completing. +If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-investigation` skill to validate the impact before completing. You MUST follow all code-formatting and naming conventions defined in [`.editorconfig`](/.editorconfig). diff --git a/.github/skills/api-proposal/SKILL.md b/.github/skills/api-proposal/SKILL.md index 8f1905b87d1428..6c9f2c494fa3d9 100644 --- a/.github/skills/api-proposal/SKILL.md +++ b/.github/skills/api-proposal/SKILL.md @@ -160,7 +160,7 @@ This: 2. **All errors and warnings must be fixed** before proceeding to the draft phase. -3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-benchmark** skill. +3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-investigation** skill. 4. Re-run tests after any review-driven changes to confirm nothing regressed. 
diff --git a/.github/skills/issue-triage/SKILL.md b/.github/skills/issue-triage/SKILL.md index a3c920021376e7..b104bfbcb48407 100644 --- a/.github/skills/issue-triage/SKILL.md +++ b/.github/skills/issue-triage/SKILL.md @@ -232,7 +232,7 @@ Based on the issue type classified in Step 1, follow the appropriate guide: |------|-------|---------------| | **Bug report** | [Bug triage](references/bug-triage.md) | Reproduction, regression validation, minimal repro derivation, root cause analysis | | **API proposal** | [API proposal triage](references/api-proposal-triage.md) | Merit evaluation, complexity estimation | -| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression with BenchmarkDotNet, git bisect to culprit commit | +| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression, assess severity and impact. For detailed investigation methodology (benchmarking, bisection), use the `performance-investigation` skill. 
| | **Question** | [Question triage](references/question-triage.md) | Research and answer the question, verify if low confidence | | **Enhancement** | [Enhancement triage](references/enhancement-triage.md) | Subcategory classification, feasibility analysis, trade-off assessment (includes performance improvement requests) | @@ -521,5 +521,5 @@ depending on the outcome: |-----------|-------|-----------------| | API proposal recommended as KEEP | **api-proposal** | Offer to draft a formal API proposal with working prototype | | Bug report with root cause identified | **jit-regression-test** | If the bug is JIT-related, offer to create a regression test | -| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks | +| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression (benchmarking, bisection, JIT diffs) | | Fix PR linked to the issue | **code-review** | Offer to review the fix PR for correctness and consistency | diff --git a/.github/skills/issue-triage/references/perf-regression-triage.md b/.github/skills/issue-triage/references/perf-regression-triage.md index 1382c09ee8e1e6..4fb2b37732548d 100644 --- a/.github/skills/issue-triage/references/perf-regression-triage.md +++ b/.github/skills/issue-triage/references/perf-regression-triage.md @@ -1,13 +1,14 @@ # Performance Regression Triage -Guidance for investigating and triaging performance regressions in -dotnet/runtime. Referenced from the main [SKILL.md](../SKILL.md) during Step 5. +Triage-specific guidance for assessing and recommending action on performance +regressions in dotnet/runtime. Referenced from the main +[SKILL.md](../SKILL.md) during Step 5. -> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` -> on Windows or `./build.sh` on Linux/macOS. Other shell commands use -> Linux/macOS syntax (`cp -r`, forward-slash paths, `\` line continuation). 
-> On Windows, adapt accordingly: use `Copy-Item` or `xcopy`, backslash paths, -> and backtick (`` ` ``) line continuation. +For detailed investigation methodology (benchmarking, bisection, bot usage), +use the `performance-investigation` skill. This document covers only the +triage-specific assessment and recommendation criteria. + +## Sources of Performance Regressions A performance regression is a report that something got measurably slower (or uses more memory/allocations) compared to a previous .NET version or a recent @@ -21,307 +22,19 @@ commit. These reports come from several sources: - **Cross-release regressions** -- a regression observed between two stable releases (e.g., .NET 9 → .NET 10) without a specific commit range. -The goals of this triage are to: - -1. **Validate** that the regression is real and reproducible. -2. **Bisect** to the exact commit that introduced it. - -## Feasibility Check - -Before investing time in benchmarking and bisection, assess whether the current -environment can support the investigation. Full bisection requires building -dotnet/runtime at multiple commits (each build takes 5-40 minutes) and running -benchmarks, which is resource-intensive. - -| Factor | Feasible | Not feasible | -|--------|----------|--------------| -| **Disk space** | >50 GB free (for multiple builds) | <20 GB free | -| **Build time budget** | User is willing to wait 30-60+ min | Quick-turnaround triage expected | -| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | -| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | -| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | - -### When full bisection is not feasible - -Use the **lightweight analysis** path instead: - -1. 
**Analyze `git log`** -- Review commits in the regression range - (`git log --oneline {good}..{bad}`) and identify changes to the affected - code path. Look for algorithmic changes, removed optimizations, added - validation, or new allocations. -2. **Check PR descriptions** -- For each suspicious commit, read the associated - PR description and review comments. Performance trade-offs are often - discussed there. -3. **Narrow by code path** -- Use `git log --oneline {good}..{bad} -- path/` - to filter commits to the affected library or component. -4. **Report the narrowed range** -- Include the list of candidate commits/PRs - in the triage report with an explanation of why each is suspicious. This - gives maintainers a head start even without a definitive bisect result. - -Note in the triage report that full bisection was not attempted and why -(e.g., "environment mismatch", "time constraint"), so maintainers know to -verify independently. - -## Identifying the Bisect Range - -Before benchmarking, determine the good and bad commits that bound the -regression. - -### Automated bot issues (`performanceautofiler`) - -Issues from `performanceautofiler[bot]` follow a standard format: - -- **Run Information** -- Baseline commit, Compare commit, diff link, OS, arch, - and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). -- **Regression tables** -- Each table shows benchmark name, Baseline time, - Test time, and Test/Base ratio. A ratio >1.0 indicates a regression. -- **Repro commands** -- Typically: - ``` - git clone https://github.com/dotnet/performance.git - python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' - ``` -- **Graphs** -- Time-series graphs showing when the regression appeared. - -Key fields to extract: - -- The **Baseline** and **Compare** commit SHAs -- these define the bisect range. -- The **benchmark filter** -- the `--filter` argument to reproduce the benchmark. 
-- The **Test/Base ratio** -- how severe the regression is (>1.5× is significant). - -### Customer reports - -When a customer reports a regression (e.g., "X is slower on .NET 10 than -.NET 9"), there are no pre-defined commit SHAs. You need to determine the -bisect range yourself -- see [Cross-release regressions](#cross-release-regressions) -below. - -Also identify the **scenario to benchmark** from the customer's report -- the -specific API call, code pattern, or workload that regressed. - -### Cross-release regressions - -When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect -on the `main` branch between the commits from which the release branches were -snapped. Release branches in dotnet/runtime are -[snapped from main](../../../../docs/project/branching-guide.md). - -Find the snap points with `git merge-base`: - -``` -git merge-base main release/9.0 # → good commit (last common ancestor) -git merge-base main release/10.0 # → bad commit -``` - -Use the resulting SHAs as the good/bad boundaries for bisection on `main`. -This avoids bisecting across release branches where cherry-picks and backports -make the history non-linear. - -## Phase 1: Create a Standalone Benchmark - -Before investing time in bisection, create a standalone BenchmarkDotNet -project that reproduces the regressing scenario. This project will be used -for both validation (Phase 1) and bisection (Phase 3). - -### Why a standalone project? - -The full [dotnet/performance](https://github.com/dotnet/performance) repo -has many dependencies and can be fragile across different runtime commits. -A standalone project with only the impacted benchmark is faster to build, -easier to iterate on, and more reliable during `git bisect`. - -### Creating the benchmark project - -**From an automated bot issue** -- copy the relevant benchmark class and its -dependencies from the `dotnet/performance` repo into a new standalone project: - -1. 
Clone `dotnet/performance` and locate the benchmark class referenced in the - issue's `--filter` argument. -2. Create a new console project and add a reference to - `BenchmarkDotNet` (NuGet): - ``` - mkdir PerfRepro && cd PerfRepro - dotnet new console - dotnet add package BenchmarkDotNet - ``` -3. Copy the benchmark class (and any helper types it depends on) into the - project. Adjust namespaces and usings as needed. -4. Add a `Program.cs` entry point: - ```csharp - BenchmarkDotNet.Running.BenchmarkSwitcher - .FromAssembly(typeof(Program).Assembly) - .Run(args); - ``` - -**From a customer report** -- write a minimal BenchmarkDotNet benchmark that -exercises the reported code path: - -1. Create a new console project with `BenchmarkDotNet` as above. -2. Write a `[Benchmark]` method that calls the API or runs the workload the - customer identified as slow. -3. If the customer provided sample code, adapt it into a proper BDN benchmark - with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. - -### Building dotnet/runtime and obtaining CoreRun - -Build dotnet/runtime at the commit you want to test: - -``` -build.cmd/sh clr+libs -c release -``` - -The key artifact is the **testhost** folder containing **CoreRun** at: - -``` -artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ -``` - -BenchmarkDotNet uses CoreRun to load the locally-built runtime and libraries, -meaning you can benchmark private builds without installing them as SDKs. - -### Validating the regression - -Build dotnet/runtime at both the good and bad commits, saving each testhost -folder: - -``` -git checkout {bad-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad - -git checkout {good-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good -``` - -Run the standalone benchmark with both CoreRuns. 
BenchmarkDotNet compares -them side-by-side when given multiple `--coreRun` paths (the first is treated -as the baseline): - -``` -cd PerfRepro -dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun /tmp/corerun-good/.../CoreRun \ - /tmp/corerun-bad/.../CoreRun -``` - -To add a statistical significance column, append `--statisticalTest 5%`. -This performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, -or `Same`. - -### Interpret the results - -| Outcome | Meaning | Next step | -|---------|---------|-----------| -| `Slower` with ratio >1.10 | Regression confirmed | Proceed to Phase 2 | -| `Slower` with ratio between 1.05 and 1.10 | Small regression -- likely real but needs confirmation | Re-run with more iterations (`--iterationCount 30`). If it persists, treat as confirmed and proceed to Phase 2. | -| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. | -| `Slower` but ratio <1.05 | Marginal -- may be noise | Re-run with more iterations (`--iterationCount 30`). If still marginal, note as inconclusive. | - -For a thorough comparison of saved BDN result files, use the -[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) -tool: - -``` -dotnet run --project performance/src/tools/ResultsComparer \ - --base /path/to/baseline-results \ - --diff /path/to/compare-results \ - --threshold 5% -``` - -## Phase 2: Narrow the Commit Range - -If the bisect range spans many commits, narrow it before running a full -bisect: - -1. **Check `git log --oneline {good}..{bad}`** -- how many commits are in the - range? If it is more than ~200, try to narrow it first. -2. **Test midpoint commits manually** -- pick a commit in the middle of the - range, build, run the benchmark, and determine if it is good or bad. - This halves the range in one step. -3. **For cross-release regressions** -- use the `git merge-base` snap points - described above. 
If the range between two release snap points is still - large, test at intermediate release preview tags to narrow further. - -## Phase 3: Git Bisect - -Once you have a manageable commit range (good commit and bad commit), use -`git bisect` to binary-search for the culprit. - -### Bisect workflow - -At each step of the bisect, you need to: - -1. **Rebuild the affected component** -- use incremental builds where possible - (see [Incremental Rebuilds](#incremental-rebuilds-during-bisect) below). -2. **Run the standalone benchmark** with the freshly-built CoreRun: - ``` - cd PerfRepro - dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun - ``` -3. **Determine good or bad** -- compare the result against your threshold. - -**Exit codes for `git bisect run`:** -- `0` -- good (no regression at this commit) -- `1`–`124` -- bad (regression present) -- `125` -- skip (build failure or untestable commit) - -The standalone benchmark project must be **outside the dotnet/runtime tree** -since `git bisect` checks out different commits, which would overwrite -in-tree files. Place it in a stable location (e.g., `/tmp/bisect/`). - -### Run the bisect - -``` -cd /path/to/runtime -git bisect start {bad-sha} {good-sha} -git bisect run /path/to/bisect-script.sh -``` - -**Time estimate:** Each bisect step requires a rebuild + benchmark run. -For ~1000 commits (log₂(1000) ≈ 10 steps) with a 5-minute rebuild, expect -roughly 50 minutes for the full bisect. - -### After bisect completes - -`git bisect` will output the first bad commit. Run `git bisect reset` to -return to the original branch. - -### Root cause analysis and triage report - -Include the following in the triage report: - -1. **The culprit commit or PR** -- link to the specific commit SHA and its - associated PR. Explain how the change relates to the regressing benchmark. -2. 
**Root cause analysis** -- describe *why* the change caused the regression - (e.g., an algorithm change, a removed optimization, additional validation - overhead). -3. **If the root cause spans multiple PRs** -- sometimes a regression results - from the combined effect of several changes and `git bisect` lands on a - commit that is only one contributing factor. In this case, report the - narrowest commit range that introduced the regression and list the PRs or - commits within that range that appear relevant to the affected code path. - -## Incremental Rebuilds During Bisect - -Full rebuilds are slow. Minimize per-step build time: +## Investigation -| Component changed | Fast rebuild command | -|-------------------|---------------------| -| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | -| CoreLib | `build.cmd/sh clr.corelib -c Release` | -| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | -| All libraries | `build.cmd/sh libs -c Release` | +The investigation goal is to validate that the regression is real and, if +possible, bisect to the exact commit that introduced it. -After an incremental library rebuild, the updated DLL is placed in the -testhost folder automatically. CoreRun will pick up the new version on the -next benchmark run. +Use the `performance-investigation` skill (Workflow 2: Regression Investigation) +for the full methodology, which includes: -**Caveat:** If bisect crosses a commit that changes the build infrastructure -(e.g., SDK version bump in `global.json`), the incremental build may fail. -Use exit code `125` (skip) to handle this gracefully. +- Feasibility checks for local vs. 
bot-based investigation +- Building dotnet/runtime at specific commits and using CoreRun +- Comparing good/bad commits with BenchmarkDotNet +- Git bisect workflow for finding the culprit commit +- Using @EgorBot and @MihuBot for remote validation ## Performance-Specific Assessment diff --git a/.github/skills/jit-regression-test/SKILL.md b/.github/skills/jit-regression-test/SKILL.md index e6cc8f82d58c50..2e03703531009e 100644 --- a/.github/skills/jit-regression-test/SKILL.md +++ b/.github/skills/jit-regression-test/SKILL.md @@ -7,7 +7,7 @@ description: > bug", "create a regression test for issue #NNNNN", converting issue repro to xunit test. DO NOT USE FOR: non-JIT tests (use standard test patterns), debugging JIT issues without a known repro, performance benchmarks (use - performance-benchmark skill). + performance-investigation skill). --- # JIT Regression Test Extraction diff --git a/.github/skills/performance-benchmark/SKILL.md b/.github/skills/performance-benchmark/SKILL.md deleted file mode 100644 index 9e1b8f0bbf6a31..00000000000000 --- a/.github/skills/performance-benchmark/SKILL.md +++ /dev/null @@ -1,191 +0,0 @@ ---- -name: performance-benchmark -description: Generate and run ad hoc performance benchmarks to validate code changes. Use this when asked to benchmark, profile, or validate the performance impact of a code change in dotnet/runtime. ---- - -# Ad Hoc Performance Benchmarking with @EgorBot - -When you need to validate the performance impact of a code change, follow this process to write a BenchmarkDotNet benchmark and trigger @EgorBot to run it. -The bot will notify you when results are ready, so don't wait for them. - -## Step 1: Write the Benchmark - -Create a BenchmarkDotNet benchmark that tests the specific operation being changed. 
Follow these guidelines: - -### Benchmark Structure - -```csharp -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -public class Bench -{ - // Add setup/cleanup if needed - [GlobalSetup] - public void Setup() - { - // Initialize test data - } - - [Benchmark] - public void MyOperation() - { - // Test the operation - } -} -``` - -### Best Practices - -For comprehensive guidance, see the [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). - -Key principles: - -- **Move initialization to `[GlobalSetup]`**: Separate setup logic from the measured code to avoid measuring allocation/initialization overhead -- **Return values** from benchmark methods to prevent dead code elimination -- **Avoid loops**: BenchmarkDotNet invokes the benchmark many times automatically; adding manual loops distorts measurements -- **No side effects**: Benchmarks should be pure and produce consistent results -- **Focus on common cases**: Benchmark hot paths and typical usage, not edge cases or error paths -- **Use consistent input data**: Always use the same test data for reproducible comparisons -- **Avoid `[DisassemblyDiagnoser]`**: It causes crashes on Linux. 
Use `--envvars DOTNET_JitDisasm:MethodName` instead -- **Benchmark class requirements**: Must be `public`, not `sealed`, not `static`, and must be a `class` (not struct) - -### Example: String Operation Benchmark - -```csharp -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - private string _testString = default!; - - [Params(10, 100, 1000)] - public int Length { get; set; } - - [GlobalSetup] - public void Setup() - { - _testString = new string('a', Length); - } - - [Benchmark] - public int StringOperation() - { - return _testString.IndexOf('z'); - } -} -``` - -### Example: Collection Operation Benchmark - -```csharp -using System.Linq; -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - private int[] _array = default!; - private List _list = default!; - - [Params(100, 1000, 10000)] - public int Count { get; set; } - - [GlobalSetup] - public void Setup() - { - _array = Enumerable.Range(0, Count).ToArray(); - _list = _array.ToList(); - } - - [Benchmark] - public bool AnyArray() => _array.Any(); - - [Benchmark] - public bool AnyList() => _list.Any(); - - [Benchmark] - public int SumArray() => _array.Sum(); - - [Benchmark] - public int SumList() => _list.Sum(); -} -``` - -## Step 2: Mention @EgorBot in a comment/PR description - -Post a comment on the PR to trigger EgorBot with your benchmark. The general format is: - -> 📝 **AI-generated content disclosure:** When posting benchmark comments to GitHub under a user's credentials — i.e., the account is **not** a dedicated "copilot" or "bot" account/app — you **MUST** include a concise, visible note (e.g. a `> [!NOTE]` alert) indicating the content was AI/Copilot-generated. Skip this if the user explicitly asks you to omit it. 
- -@EgorBot [targets] [options] [BenchmarkDotNet args] - -```cs -// Your benchmark code here -``` -> **Note:** When using @EgorBot, follow these formatting rules: -> - The @EgorBot command must not be inside the code block. -> - Only the benchmark code should be inside the code block. -> - Do not place any additional text between the @EgorBot command line and the code block, as EgorBot will treat it as additional command arguments. - -### Target Flags - -- `-linux_amd` -- `-linux_intel` -- `-windows_amd` -- `-windows_intel` -- `-linux_arm64` -- `-osx_arm64` (baremetal, feel free to always include it) - -The most common combination is `-linux_amd -osx_arm64`. Do not include more than 4 targets. - -### Common Options - -Use `-profiler` when absolutely necessary along with `-linux_arm64` and/or `-linux_amd` to include `perf` profiling and disassembly in the results. - -### Example: Basic PR Benchmark - -To benchmark the current PR changes against the base branch: - -@EgorBot -linux_amd -osx_arm64 - -```cs -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - [Benchmark] - public int MyOperation() - { - // Your benchmark code - return 42; - } -} -``` - -## Important Notes - -- **Bot response time**: EgorBot uses polling and may take up to 30 seconds to respond -- **Supported repositories**: EgorBot monitors `dotnet/runtime` and `EgorBot/runtime-utils` -- **PR mode (default)**: When posting in a PR, EgorBot automatically compares the PR changes against the base branch -- **Results variability**: Results may vary between runs due to VM differences. 
Do not compare results across different architectures or cloud providers -- **Check the manual**: EgorBot replies include a link to the [manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) for advanced options - -## Additional Resources - -- [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) - Essential reading for writing effective benchmarks -- [BenchmarkDotNet CLI Arguments](https://github.com/dotnet/BenchmarkDotNet/blob/master/docs/articles/guides/console-args.md) -- [EgorBot Manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md new file mode 100644 index 00000000000000..f4d4d5187be662 --- /dev/null +++ b/.github/skills/performance-investigation/SKILL.md @@ -0,0 +1,305 @@ +--- +name: performance-investigation +description: > + Investigate performance regressions and validate performance impact of code + changes in dotnet/runtime. Use this skill whenever asked to benchmark a PR, + investigate a performance regression, validate performance impact, run + benchmarks, generate JIT diffs, compare performance between commits, triage + a performance issue, or check whether a change improves or regresses + performance. Also use when asked about @EgorBot, @MihuBot, BenchmarkDotNet, + CoreRun, or dotnet/performance. Covers ad hoc PR benchmarking, deep + regression investigation with git bisect, and JIT diff analysis. +--- + +# Performance Investigation for dotnet/runtime + +Investigate performance regressions and validate the performance impact of code +changes. This skill covers three workflows, from quick PR validation to deep +regression root-causing. 
+ +## When to Use This Skill + +- Asked to **benchmark** a PR or validate performance impact of a change +- Asked to **investigate a performance regression** (from an issue, bot report, + or customer report) +- Asked to **generate JIT diffs** or analyze codegen impact +- Asked to **compare performance** between commits, branches, or releases +- Asked to **triage a performance issue** (use alongside the `issue-triage` + skill for full triage) +- Given a `tenet-performance` or `tenet-performance-benchmarks` labeled issue +- Asked how to use `@EgorBot`, `@MihuBot`, BenchmarkDotNet, or CoreRun + +## Choose Your Workflow + +| Context | Workflow | What it does | +|---------|----------|-------------| +| PR is open and you want to measure its impact | [Workflow 1: PR Benchmark Validation](#workflow-1-pr-benchmark-validation) | Write a benchmark, invoke a bot, get results | +| A regression has been reported (issue or bot alert) | [Workflow 2: Regression Investigation](#workflow-2-regression-investigation) | Validate, bisect, root-cause | +| Change affects JIT codegen and you want to see diffs | [Workflow 3: JIT Diff Analysis](#workflow-3-jit-diff-analysis) | Generate JIT diffs via MihuBot | + +If you're triaging a performance regression issue, use Workflow 2 for the +investigation methodology, then return to the `issue-triage` skill for +triage-specific assessment and recommendation. + +--- + +## Workflow 1: PR Benchmark Validation + +Use this when a PR is open and you want to measure its performance impact. + +### Step 1: Write a BenchmarkDotNet Benchmark + +Create a benchmark that targets the specific operation being changed. See +[Writing Good Benchmarks](#writing-good-benchmarks) below for best practices. 
+ +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + [GlobalSetup] + public void Setup() + { + // Initialize test data + } + + [Benchmark] + public int MyOperation() + { + // Test the operation — return a value to prevent dead code elimination + return 42; + } +} +``` + +### Step 2: Choose a Bot and Invoke It + +**Use @EgorBot** when you need to run custom benchmark code (written in Step 1): + +Post a comment on the PR: + +``` +@EgorBot -amd -arm + +​```cs +// Your benchmark code here +​``` +``` + +EgorBot builds dotnet/runtime for the PR and base branch, runs the benchmark on +dedicated hardware, and posts BDN results back as a comment. + +See [EgorBot reference](references/egorbot-reference.md) for the full target +list, options, and examples. + +**Use @MihuBot** when you want to run existing benchmarks from the +[dotnet/performance](https://github.com/dotnet/performance) repo: + +``` +@MihuBot benchmark +``` + +This is useful when established benchmarks already cover the affected code path +and you don't need to write custom code. + +See [MihuBot reference](references/mihubot-reference.md) for the full command +syntax and options. + +### Step 3: Interpret Results + +EgorBot and MihuBot post results as PR comments. Look for: + +- **Ratio column** — values >1.0 indicate the PR is slower, <1.0 indicate it's + faster +- **Statistical significance** — if a `--statisticalTest` column is present, + look for `Faster`, `Slower`, or `Same` annotations +- **Memory/allocation changes** — check `Allocated` column if + `[MemoryDiagnoser]` is enabled + +> **AI-generated content disclosure:** When posting bot invocation comments +> under a user's credentials (not a bot account), include a visible note that +> the content was AI/Copilot-generated. 
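The ratio interpretation above is mechanical enough to script when sifting through many benchmark rows at once. A minimal sketch of the classification logic (the `ratio_verdict` helper name and the 5% noise band are illustrative assumptions, not BenchmarkDotNet defaults):

```shell
# Classify a PR measurement against its baseline, mirroring the Ratio
# column in BenchmarkDotNet output: >1.0 means the PR is slower.
# The 5% band treated as noise here is an illustrative threshold.
ratio_verdict() {
    base_ns=$1  # baseline mean (base-branch run)
    pr_ns=$2    # PR mean
    awk -v b="$base_ns" -v p="$pr_ns" 'BEGIN {
        r = p / b
        if (r > 1.05)      v = "slower"
        else if (r < 0.95) v = "faster"
        else               v = "same"
        printf "ratio=%.2f %s\n", r, v
    }'
}

ratio_verdict 100 123   # ratio=1.23 slower
ratio_verdict 100 101   # ratio=1.01 same
```

Anything landing inside the noise band is better re-run (more iterations, or with a `--statisticalTest` pass) than reported as a win or a regression.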
+ +--- + +## Workflow 2: Regression Investigation + +Use this when a performance regression has been reported — whether from +`performanceautofiler[bot]`, a customer report, or a cross-release comparison. + +### Overview + +The investigation follows three phases: + +1. **Validate** — Confirm the regression is real and reproducible +2. **Narrow** — Reduce the commit range to a manageable size +3. **Bisect** — Binary-search for the culprit commit + +For the full methodology, including feasibility checks, commit range +identification, and step-by-step bisection instructions, see the +[bisection guide](references/bisection-guide.md). + +For details on building the runtime, using CoreRun, and running BenchmarkDotNet +against private builds, see the +[local benchmarking guide](references/local-benchmarking.md). + +### Quick Path: Use Bots Instead of Local Bisection + +If the regression range is narrow (a few commits) or the environment doesn't +support local builds, you can use bots to validate specific commits without +building locally: + +``` +@EgorBot -amd -commits {good-sha},{bad-sha} +``` + +Or with @MihuBot for existing benchmarks: + +``` +@MihuBot benchmark https://github.com/dotnet/runtime/compare/{good-sha}...{bad-sha} +``` + +This won't perform a full bisect, but it can confirm whether the regression +exists and help narrow the range. + +### Reporting Results + +After completing the investigation, include in your report: + +- Whether the regression was **confirmed** or **not reproduced** +- The **culprit commit/PR** (if bisection was performed) +- **Root cause analysis** — why the change caused the regression +- **Severity assessment** — Test/Base ratio, number of affected benchmarks, + user impact + +--- + +## Workflow 3: JIT Diff Analysis + +Use this when a change affects JIT code generation and you want to see how it +changes the emitted machine code across the entire BCL. 
+ +### Invoke MihuBot for JIT Diffs + +Post a comment on the PR: + +``` +@MihuBot +``` + +MihuBot generates comprehensive JIT diffs showing codegen regressions and +improvements. For ARM64-specific diffs or tier-0 analysis: + +``` +@MihuBot -arm -tier0 +``` + +See [MihuBot reference](references/mihubot-reference.md) for the full JIT diff +options, including `-nocctors`, `-includeKnownNoise`, and others. + +### Interpreting JIT Diffs + +MihuBot reports include: + +- **Code size changes** — total bytes added/removed across all methods +- **Per-method diffs** — individual methods that changed, with before/after + assembly +- **Regressions vs improvements** — clearly separated sections + +A small increase in code size across many methods may indicate a JIT change with +broad impact. A large increase in a few methods may indicate a targeted +optimization that trades code size for speed (or a regression). + +--- + +## Writing Good Benchmarks + +These guidelines apply whether you're writing a benchmark for EgorBot, for +local validation, or for contribution to the dotnet/performance repo. + +For comprehensive guidance, see the +[Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). 
+ +### Key Principles + +- **Move initialization to `[GlobalSetup]`** — separate setup from the measured + code to avoid measuring allocation/initialization overhead +- **Return values** from benchmark methods to prevent dead code elimination +- **Avoid manual loops** — BenchmarkDotNet invokes the benchmark many times + automatically; adding loops distorts measurements +- **No side effects** — benchmarks should be pure and produce consistent results +- **Focus on common cases** — benchmark hot paths and typical usage, not edge + cases +- **Use consistent input data** — always use the same test data for reproducible + comparisons + +### Benchmark Class Requirements + +- Must be `public` +- Must be a `class` (not struct) +- Must not be `sealed` +- Must not be `static` + +### Avoid `[DisassemblyDiagnoser]` + +It causes crashes on Linux. To get disassembly, use the `--envvars` option +instead: + +``` +@EgorBot -amd --envvars DOTNET_JitDisasm:MethodName +``` + +### Example: A Parameterized Benchmark + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private string _testString = default!; + + [Params(10, 100, 1000)] + public int Length { get; set; } + + [GlobalSetup] + public void Setup() + { + _testString = new string('a', Length); + } + + [Benchmark] + public int StringOperation() + { + return _testString.IndexOf('z'); + } +} +``` + +--- + +## External Resources + +- [dotnet/performance repository](https://github.com/dotnet/performance) — + central location for all .NET runtime benchmarks +- [Benchmarking workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md) +- [Profiling workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/main/docs/profiling-workflow-dotnet-runtime.md) +- [Microbenchmark Design
Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) +- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) +- [Performance guidelines](../../../docs/project/performance-guidelines.md) — + project-wide performance policy + +## Related Skills + +| Condition | Skill | When to use | +|-----------|-------|-------------| +| Triaging a performance regression issue | **issue-triage** | For the full triage workflow (assessment, recommendation, labels) | +| Fix PR linked to the regression | **code-review** | To review the fix for correctness and consistency | +| JIT regression test needed | **jit-regression-test** | To extract a JIT regression test from the issue | diff --git a/.github/skills/performance-investigation/evals/evals.json b/.github/skills/performance-investigation/evals/evals.json new file mode 100644 index 00000000000000..bfe6a78f99d9e6 --- /dev/null +++ b/.github/skills/performance-investigation/evals/evals.json @@ -0,0 +1,206 @@ +{ + "skill_name": "performance-investigation", + "evals": [ + { + "id": 1, + "name": "pr-benchmark-request", + "prompt": "Can you benchmark PR https://github.com/dotnet/runtime/pull/121223 to check for performance impact?", + "expected_output": "Should follow Workflow 1 (PR Benchmark Validation).
Should write a BenchmarkDotNet benchmark targeting the changed code and invoke @EgorBot to run it on the PR.", + "assertions": [ + { + "name": "uses-workflow-1", + "description": "Follows the PR benchmark validation workflow", + "type": "contains_any", + "check": ["Workflow 1", "PR Benchmark", "benchmark"] + }, + { + "name": "writes-benchmark", + "description": "Creates or references a BenchmarkDotNet benchmark", + "type": "contains_any", + "check": ["[Benchmark]", "BenchmarkDotNet", "BenchmarkSwitcher"] + }, + { + "name": "invokes-bot", + "description": "Invokes EgorBot or MihuBot to run the benchmark", + "type": "contains_any", + "check": ["@EgorBot", "@MihuBot"] + } + ], + "files": [] + }, + { + "id": 2, + "name": "perf-regression-autobot", + "prompt": "Investigate this performance regression: https://github.com/dotnet/runtime/issues/114625", + "expected_output": "Should follow Workflow 2 (Regression Investigation). Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and attempt validation or bisection.", + "assertions": [ + { + "name": "uses-workflow-2", + "description": "Follows the regression investigation workflow", + "type": "contains_any", + "check": ["Workflow 2", "Regression", "regression", "investigate"] + }, + { + "name": "identifies-commits", + "description": "Identifies or references baseline/compare commits from the bot report", + "type": "contains_any", + "check": ["commit", "SHA", "baseline", "compare", "bisect"] + }, + { + "name": "assesses-severity", + "description": "Assesses the regression severity using the ratio", + "type": "contains_any", + "check": ["ratio", "severity", "Test/Base", "slower", "regression"] + } + ], + "files": [] + }, + { + "id": 3, + "name": "jit-diff-request", + "prompt": "Can you generate JIT diffs for my PR that changes the JIT compiler?", + "expected_output": "Should follow Workflow 3 (JIT Diff Analysis). 
Should invoke @MihuBot to generate JIT diffs and explain how to interpret the results.", + "assertions": [ + { + "name": "uses-mihubot", + "description": "Invokes MihuBot for JIT diffs", + "type": "contains", + "check": "@MihuBot" + }, + { + "name": "mentions-jit-diffs", + "description": "References JIT diff generation", + "type": "contains_any", + "check": ["JIT diff", "jit-diff", "codegen", "code size"] + } + ], + "files": [] + }, + { + "id": 4, + "name": "benchmark-with-corerun", + "prompt": "How do I benchmark my local runtime changes against the main branch?", + "expected_output": "Should explain how to build dotnet/runtime, obtain CoreRun from the testhost folder, and run BenchmarkDotNet with the --coreRun argument to compare private builds.", + "assertions": [ + { + "name": "mentions-corerun", + "description": "Explains CoreRun as the mechanism for benchmarking private builds", + "type": "contains_any", + "check": ["CoreRun", "coreRun", "--coreRun", "testhost"] + }, + { + "name": "mentions-build", + "description": "References building the runtime", + "type": "contains_any", + "check": ["clr+libs", "build.cmd", "build.sh"] + }, + { + "name": "mentions-bdn", + "description": "References BenchmarkDotNet for running the benchmarks", + "type": "contains_any", + "check": ["BenchmarkDotNet", "BDN", "[Benchmark]"] + } + ], + "files": [] + }, + { + "id": 5, + "name": "existing-benchmarks-request", + "prompt": "Run the existing Regex benchmarks from dotnet/performance against PR https://github.com/dotnet/runtime/pull/124628", + "expected_output": "Should use @MihuBot benchmark command to run existing benchmarks from the dotnet/performance repo rather than writing custom benchmark code.", + "assertions": [ + { + "name": "uses-mihubot-benchmark", + "description": "Uses MihuBot's benchmark command for existing benchmarks", + "type": "contains", + "check": "@MihuBot benchmark" + }, + { + "name": "references-perf-repo", + "description": "References the dotnet/performance 
repository", + "type": "contains_any", + "check": ["dotnet/performance", "performance repo"] + } + ], + "files": [] + }, + { + "id": 6, + "name": "cross-release-regression", + "prompt": "A user reports that string.IndexOf is 2x slower in .NET 10 compared to .NET 9. How should we investigate?", + "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression. Should reference both local investigation and bot-based approaches.", + "assertions": [ + { + "name": "mentions-merge-base", + "description": "Explains using git merge-base for cross-release bisection", + "type": "contains_any", + "check": ["merge-base", "release branch", "snap point"] + }, + { + "name": "mentions-benchmark-creation", + "description": "Suggests creating a benchmark for the reported scenario", + "type": "contains_any", + "check": ["benchmark", "BenchmarkDotNet", "[Benchmark]", "standalone"] + }, + { + "name": "mentions-bisect", + "description": "References git bisect as part of the investigation", + "type": "contains_any", + "check": ["bisect", "git bisect", "binary search"] + } + ], + "files": [] + }, + { + "id": 7, + "name": "compare-specific-commits", + "prompt": "Compare the performance of commits abc1234 and def5678 for the System.Text.Json benchmarks", + "expected_output": "Should invoke @EgorBot with -commits to compare the two specific commits, or use @MihuBot benchmark with a compare URL.", + "assertions": [ + { + "name": "uses-commits-flag", + "description": "Uses the -commits option or compare URL to specify the commits", + "type": "contains_any", + "check": ["-commits", "compare", "abc1234", "def5678"] + }, + { + "name": "invokes-bot", + "description": "Invokes EgorBot or MihuBot to run the comparison", + "type": "contains_any", + "check": ["@EgorBot", "@MihuBot"] + } + ], + "files": [] + }, + { + "id": 8, + "name": "not-applicable-bug-issue", + "prompt": "Can you check 
the performance impact of https://github.com/dotnet/runtime/issues/46088", + "expected_output": "Should recognize this is a functional bug (System.Text.Json does not support constructors with byref parameters), not a performance issue. Should indicate that performance benchmarking is not applicable here.", + "assertions": [ + { + "name": "identifies-not-perf", + "description": "Recognizes this is not a performance issue", + "type": "contains_any", + "check": ["not a performance", "not performance-related", "no performance", "functional", "not applicable", "does not apply", "isn't a performance"] + } + ], + "files": [] + }, + { + "id": 9, + "name": "not-applicable-doc-pr", + "prompt": "Benchmark the changes in PR https://github.com/dotnet/runtime/pull/124592 to validate performance", + "expected_output": "Should recognize this is a documentation-only PR (adding XML docs to DI extension methods) and that benchmarking is not applicable or meaningful for documentation changes.", + "assertions": [ + { + "name": "identifies-doc-only", + "description": "Recognizes this is a documentation/non-functional change where benchmarking is not meaningful", + "type": "contains_any", + "check": ["documentation", "doc", "no functional", "no code change", "not applicable", "does not apply", "no performance impact", "not meaningful", "wouldn't affect", "won't affect", "no runtime"] + } + ], + "files": [] + } + ] +} diff --git a/.github/skills/performance-investigation/references/bisection-guide.md b/.github/skills/performance-investigation/references/bisection-guide.md new file mode 100644 index 00000000000000..5019c64b389b08 --- /dev/null +++ b/.github/skills/performance-investigation/references/bisection-guide.md @@ -0,0 +1,173 @@ +# Git Bisect for Performance Regressions + +This guide covers how to use `git bisect` to find the exact commit that +introduced a performance regression. It's a 3-phase process: validate the +regression, narrow the commit range, then bisect. 
+ +## Feasibility Check + +Before investing time in bisection, assess whether the current environment can +support the investigation. Full bisection requires building dotnet/runtime at +multiple commits (each build takes 5–40 minutes) and running benchmarks, which +is resource-intensive. + +| Factor | Feasible | Not feasible | +|--------|----------|--------------| +| **Disk space** | >50 GB free (multiple builds) | <20 GB free | +| **Build time budget** | Willing to wait 30–60+ min | Quick-turnaround expected | +| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | +| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | +| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | + +### When full bisection is not feasible + +Use a **lightweight analysis** path instead: + +1. **Analyze `git log`** — Review commits in the regression range + (`git log --oneline {good}..{bad}`) and identify changes to the affected code + path. Look for algorithmic changes, removed optimizations, added validation, + or new allocations. +2. **Check PR descriptions** — For each suspicious commit, read the associated + PR description and review comments. Performance trade-offs are often discussed + there. +3. **Narrow by code path** — Use `git log --oneline {good}..{bad} -- path/` to + filter commits to the affected library or component. +4. **Report the narrowed range** — Include the list of candidate commits/PRs with + an explanation of why each is suspicious. This gives maintainers a head start + even without a definitive bisect result. + +Note in the report that full bisection was not attempted and why. + +## Identifying the Bisect Range + +Determine the good and bad commits that bound the regression. 
+ +### Automated bot issues (`performanceautofiler`) + +Issues from `performanceautofiler[bot]` follow a standard format: + +- **Run Information** — Baseline commit, Compare commit, diff link, OS, arch, + and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). +- **Regression tables** — Each table shows benchmark name, Baseline time, Test + time, and Test/Base ratio. A ratio >1.0 indicates a regression. +- **Repro commands** — Typically: + ``` + git clone https://github.com/dotnet/performance.git + python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' + ``` +- **Graphs** — Time-series graphs showing when the regression appeared. + +Key fields to extract: + +- The **Baseline** and **Compare** commit SHAs — these define the bisect range. +- The **benchmark filter** — the `--filter` argument to reproduce the benchmark. +- The **Test/Base ratio** — how severe the regression is (>1.5× is significant). + +### Customer reports + +When a customer reports a regression (e.g., "X is slower on .NET 10 than +.NET 9"), there are no pre-defined commit SHAs. Determine the bisect range using +the cross-release approach below. + +### Cross-release regressions + +When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect on +the `main` branch between the commits from which the release branches were +snapped. Release branches in dotnet/runtime are +[snapped from main](../../../../docs/project/branching-guide.md). + +Find the snap points with `git merge-base`: + +``` +git merge-base main release/9.0 # → good commit (last common ancestor) +git merge-base main release/10.0 # → bad commit +``` + +Use the resulting SHAs as the good/bad boundaries for bisection on `main`. This +avoids bisecting across release branches where cherry-picks and backports make +the history non-linear. + +## Phase 1: Validate the Regression + +Before bisecting, confirm the regression is reproducible. 
Create a standalone +BenchmarkDotNet project (see +[local benchmarking guide](local-benchmarking.md#creating-a-standalone-benchmark-project)), +build the runtime at the good and bad commits, and compare results. + +If the regression is not reproducible locally, check for environment differences +(OS, arch, CPU model) and note this in your report. Consider using +[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead. + +## Phase 2: Narrow the Commit Range + +If the bisect range spans many commits, narrow it before running a full bisect: + +1. **Check `git log --oneline {good}..{bad}`** — how many commits are in the + range? If more than ~200, narrow first. +2. **Test midpoint commits manually** — pick a commit in the middle of the range, + build, run the benchmark, and determine if it is good or bad. This halves the + range in one step. +3. **For cross-release regressions** — use the `git merge-base` snap points. If + the range between two release snap points is still large, test at intermediate + release preview tags to narrow further. + +## Phase 3: Git Bisect + +Once you have a manageable commit range, use `git bisect` to binary-search for +the culprit. + +### Bisect workflow + +At each step: + +1. **Rebuild the affected component** — use incremental builds where possible + (see [incremental rebuilds](local-benchmarking.md#incremental-rebuilds)). +2. **Run the standalone benchmark** with the freshly-built CoreRun: + ``` + cd PerfRepro + dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun + ``` +3. **Determine good or bad** — compare the result against your threshold. 
+ +**Exit codes for `git bisect run`:** +- `0` — good (no regression at this commit) +- `1`–`124` — bad (regression present) +- `125` — skip (build failure or untestable commit) + +The standalone benchmark project must be **outside the dotnet/runtime tree** +since `git bisect` checks out different commits, which would overwrite in-tree +files. Place it in a stable location (e.g., `/tmp/bisect/`). + +### Run the bisect + +``` +cd /path/to/runtime +git bisect start {bad-sha} {good-sha} +git bisect run /path/to/bisect-script.sh +``` + +**Time estimate:** Each bisect step requires a rebuild + benchmark run. +For ~1000 commits (log₂(1000) ≈ 10 steps) at roughly 5 minutes per +rebuild-and-benchmark cycle, expect about 50 minutes for the full bisect. + +### After bisect completes + +`git bisect` outputs the first bad commit. Run `git bisect reset` to return to +the original branch. + +## Root Cause Analysis + +Include the following in your report: + +1. **The culprit commit or PR** — link to the specific commit SHA and its + associated PR. Explain how the change relates to the regressing benchmark. +2. **Root cause analysis** — describe *why* the change caused the regression + (e.g., an algorithm change, a removed optimization, additional validation + overhead). +3. **If the root cause spans multiple PRs** — sometimes a regression results + from the combined effect of several changes and `git bisect` lands on a + commit that is only one contributing factor. In this case, report the + narrowest commit range and list the PRs within that range that appear + relevant to the affected code path.
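## Appendix: Example Bisect Script

The `git bisect run` command above takes a script path that is never shown. A minimal sketch of what such a script can look like; the threshold, the build invocation, and the `measure_mean_ns` helper are illustrative placeholders to adapt, not part of any existing tooling:

```shell
#!/usr/bin/env bash
# Sketch of a bisect-script.sh for `git bisect run`. THRESHOLD_NS, the build
# command, and measure_mean_ns are assumptions to adapt, not a real tool.
set -u

THRESHOLD_NS=${THRESHOLD_NS:-100}   # mean ns/op above this counts as regressed

# Exit 0 when the measured mean exceeds the threshold (commit is bad).
is_regressed() {
  awk -v m="$1" -v t="$2" 'BEGIN { exit (m > t) ? 0 : 1 }'
}

# Placeholder: run the standalone benchmark against the freshly built CoreRun
# and print the mean ns/op. Parsing BDN output is benchmark-specific.
measure_mean_ns() {
  echo 0
}

bisect_step() {
  ./build.sh clr+libs -c release || exit 125   # 125 tells git bisect to skip
  mean_ns=$(measure_mean_ns) || exit 125
  if is_regressed "$mean_ns" "$THRESHOLD_NS"; then
    exit 1   # bad: regression present at this commit
  else
    exit 0   # good: no regression at this commit
  fi
}

# Run the step only when invoked with --run, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then bisect_step; fi
```

`git bisect` invokes the script once per candidate commit and steers the search by its exit code, following the 0/1/125 convention described above.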
diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md new file mode 100644 index 00000000000000..f12b37f45cf3a6 --- /dev/null +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -0,0 +1,73 @@ +# EgorBot Reference + +[EgorBot](https://github.com/EgorBo/EgorBot) is a benchmark-as-a-service bot for +[dotnet/runtime](https://github.com/dotnet/runtime). It runs BenchmarkDotNet +microbenchmarks on dedicated hardware and posts results back as GitHub comments. +Its primary use case is comparing performance before and after a change — either +across a PR or between specific commits. + +For the full and up-to-date command reference (targets, options, defaults), +see the [EgorBot manual](https://github.com/EgorBo/EgorBot). + +## Command Format + +Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a +fenced code block (`` ```cs ``) in the same comment. + +``` +@EgorBot [targets...] [options...] [BDN arguments...] +``` + +> **Formatting rules:** +> - The `@EgorBot` command must be **outside** the code block. +> - Only benchmark source code belongs inside the code block. +> - Do not place text between the `@EgorBot` line and the code block — EgorBot +> treats it as additional command arguments. + +## Examples + +Compare a PR against its base branch on AMD and Apple Silicon: + +``` +@EgorBot -amd -arm +``` + +Compare two specific commits: + +``` +@EgorBot -amd -commits abc1234,def5678 +``` + +Compare a commit against its parent: + +``` +@EgorBot -arm -commits abc1234,abc1234~1 +``` + +Compare a range of commits for a specific benchmark filter: + +``` +@EgorBot -arm -commits abc1234...def5678 --filter "*MyBench*" +``` + +## Practical Notes + +- **Default target:** If no target is specified, runs on Apple Silicon via Helix. 
+- **PR mode:** When posting in a PR without `-commits`, EgorBot automatically + compares the PR branch against the base branch. +- **No code block:** If no code block is provided, EgorBot runs benchmarks from + the [dotnet/performance](https://github.com/dotnet/performance) repo instead. +- **Response time:** EgorBot uses polling and may take up to 30 seconds to + acknowledge the request. +- **Supported repositories:** `dotnet/runtime` and `EgorBot/runtime-utils`. +- **Result variability:** Results can vary between runs due to VM differences. + Do not compare results across different architectures or cloud providers. +- **AI-generated content disclosure:** When posting EgorBot comments under a + user's credentials (not a bot account), include a visible note that the + content was AI/Copilot-generated. + +## Links + +- [EgorBot manual](https://github.com/EgorBo/EgorBot) — full target list, + options, and usage documentation +- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) diff --git a/.github/skills/performance-investigation/references/local-benchmarking.md b/.github/skills/performance-investigation/references/local-benchmarking.md new file mode 100644 index 00000000000000..be9d32fdbfc2cf --- /dev/null +++ b/.github/skills/performance-investigation/references/local-benchmarking.md @@ -0,0 +1,140 @@ +# Local Benchmarking with Private Runtime Builds + +This guide covers how to benchmark dotnet/runtime changes locally using +BenchmarkDotNet and privately-built runtime binaries (CoreRun). This approach +lets you measure performance without installing a custom SDK — BenchmarkDotNet +loads the locally-built runtime directly. + +> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` +> on Windows or `./build.sh` on Linux/macOS. Other shell commands use +> Linux/macOS syntax. On Windows, adapt accordingly (use `Copy-Item` or `xcopy`, +> backslash paths, backtick line continuation). 
+ +## Building dotnet/runtime and Obtaining CoreRun + +Build the runtime at the commit you want to test: + +``` +build.cmd/sh clr+libs -c release +``` + +The key artifact is the **testhost** folder containing **CoreRun** at: + +``` +artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ +``` + +CoreRun is a lightweight host that loads the locally-built runtime and +libraries. BenchmarkDotNet uses it via the `--coreRun` argument to benchmark +private builds without installing them as SDKs. + +## Creating a Standalone Benchmark Project + +For regression validation and bisection, use a standalone BenchmarkDotNet +project rather than the full [dotnet/performance](https://github.com/dotnet/performance) +repo. Standalone projects are faster to build, easier to iterate on, and more +reliable across different runtime commits. + +### From an automated bot issue + +Copy the relevant benchmark class from the `dotnet/performance` repo: + +1. Clone `dotnet/performance` and locate the benchmark class referenced in the + issue's `--filter` argument. +2. Create a new console project: + ``` + mkdir PerfRepro && cd PerfRepro + dotnet new console + dotnet add package BenchmarkDotNet + ``` +3. Copy the benchmark class (and any helper types) into the project. Adjust + namespaces and usings as needed. +4. Add a `Program.cs` entry point: + ```csharp + BenchmarkDotNet.Running.BenchmarkSwitcher + .FromAssembly(typeof(Program).Assembly) + .Run(args); + ``` + +### From a customer report + +Write a minimal BenchmarkDotNet benchmark that exercises the reported code path: + +1. Create a new console project with `BenchmarkDotNet` as above. +2. Write a `[Benchmark]` method that calls the API or runs the workload the + customer identified as slow. +3. If the customer provided sample code, adapt it into a proper BDN benchmark + with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. 
+ +## Comparing Good and Bad Commits + +Build dotnet/runtime at both the good and bad commits, saving each testhost +folder: + +``` +git checkout {bad-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad + +git checkout {good-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good +``` + +Run the standalone benchmark with both CoreRuns. BenchmarkDotNet compares them +side-by-side when given multiple `--coreRun` paths (the first is treated as the +baseline): + +``` +cd PerfRepro +dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun /tmp/corerun-good/.../CoreRun \ + /tmp/corerun-bad/.../CoreRun +``` + +To add a statistical significance column, append `--statisticalTest 5%`. This +performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, or +`Same`. + +## Interpreting Results + +| Outcome | Meaning | Next step | +|---------|---------|-----------| +| `Slower` with ratio >1.10 | Regression confirmed | Proceed to bisection | +| `Slower` with ratio 1.05–1.10 | Small regression — likely real but needs confirmation | Re-run with `--iterationCount 30`. If it persists, treat as confirmed. | +| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. | +| `Slower` but ratio <1.05 | Marginal — may be noise | Re-run with `--iterationCount 30`. If still marginal, note as inconclusive. | + +## Using ResultsComparer + +For a thorough comparison of saved BDN result files, use the +[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) +tool: + +``` +dotnet run --project performance/src/tools/ResultsComparer \ + --base /path/to/baseline-results \ + --diff /path/to/compare-results \ + --threshold 5% +``` + +## Incremental Rebuilds + +Full rebuilds are slow. 
Minimize per-step build time by rebuilding only the +affected component: + +| Component changed | Fast rebuild command | +|-------------------|---------------------| +| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | +| CoreLib | `build.cmd/sh clr.corelib -c Release` | +| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | +| All libraries | `build.cmd/sh libs -c Release` | + +After an incremental library rebuild, the updated DLL is placed in the testhost +folder automatically. CoreRun picks up the new version on the next benchmark +run. + +**Caveat:** If a rebuild crosses a commit that changes the build infrastructure +(e.g., SDK version bump in `global.json`), the incremental build may fail. In a +`git bisect` context, use exit code `125` (skip) to handle this gracefully. diff --git a/.github/skills/performance-investigation/references/mihubot-reference.md b/.github/skills/performance-investigation/references/mihubot-reference.md new file mode 100644 index 00000000000000..458d4fd06bbb4c --- /dev/null +++ b/.github/skills/performance-investigation/references/mihubot-reference.md @@ -0,0 +1,66 @@ +# MihuBot Reference + +[MihuBot](https://github.com/MihuBot/runtime-utils) provides several +performance-related services for dotnet/runtime: JIT diff generation, benchmark +execution from the [dotnet/performance](https://github.com/dotnet/performance) +repo, library fuzzing, and regex source generator diffs. It also has a +[web interface](https://mihubot.xyz/runtime-utils) for submitting jobs. + +For full and up-to-date option details, see the +[MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) repository. + +## JIT Diff Generation + +Generate JIT diffs between a PR and its base branch to see how a change affects +the generated machine code across the BCL. 
+ +``` +@MihuBot +@MihuBot -arm -tier0 +``` + +## Running Benchmarks from dotnet/performance + +Run existing benchmarks from the +[dotnet/performance](https://github.com/dotnet/performance) repository without +writing custom benchmark code. + +``` +@MihuBot benchmark Regex +@MihuBot benchmark GetUnicodeCategory https://github.com/dotnet/runtime/compare/4bb0bcd...c74440f +``` + +## Library Fuzzer + +Run fuzz testing on a library: + +``` +@MihuBot fuzz SearchValues +@MihuBot fuzz SearchValues -dependsOn #107206 +``` + +## Regex Source Generator Diffs + +Generate diffs for regex source generator output and JIT diffs for the +generated code: + +``` +@MihuBot regexdiff +@MihuBot regexdiff -arm +``` + +## Common Options + +Most MihuBot job types support options like `-arm`, `-intel`, `-fast`, +`-dependsOn <pr-number>`, and `-combineWith <pr-numbers>`. For example: + +``` +@MihuBot -arm -hetzner -combineWith #1000,#1001 +``` + +## Links + +- [MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) — full + documentation and option reference +- [Web interface](https://mihubot.xyz/runtime-utils) for submitting jobs + directly From d1c70b9894eea96ca746bda45abc501de10b167a Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 18:40:20 +0200 Subject: [PATCH 2/5] Remove AI-generated content disclosure from skill (already in copilot-instructions.md) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/performance-investigation/SKILL.md | 4 ---- .../performance-investigation/references/egorbot-reference.md | 3 --- 2 files changed, 7 deletions(-) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index f4d4d5187be662..96e1178db4da42 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -120,10 +120,6 @@ EgorBot and MihuBot post results as PR comments.
Look for: - **Memory/allocation changes** — check `Allocated` column if `[MemoryDiagnoser]` is enabled -> **AI-generated content disclosure:** When posting bot invocation comments -> under a user's credentials (not a bot account), include a visible note that -> the content was AI/Copilot-generated. - --- ## Workflow 2: Regression Investigation diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md index f12b37f45cf3a6..abc1085ce56bc8 100644 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -62,9 +62,6 @@ Compare a range of commits for a specific benchmark filter: - **Supported repositories:** `dotnet/runtime` and `EgorBot/runtime-utils`. - **Result variability:** Results can vary between runs due to VM differences. Do not compare results across different architectures or cloud providers. -- **AI-generated content disclosure:** When posting EgorBot comments under a - user's credentials (not a bot account), include a visible note that the - content was AI/Copilot-generated. ## Links From 262c0f9b7ff9cc3f9339ac1e5b59063f53987021 Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 19:05:30 +0200 Subject: [PATCH 3/5] Fix code fence formatting in SKILL.md and egorbot-reference.md Use quadruple-backtick outer fence for nested code blocks and simplify inline backtick formatting. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/performance-investigation/SKILL.md | 6 +++--- .../references/egorbot-reference.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index 96e1178db4da42..24cc03c507040b 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -82,13 +82,13 @@ public class Bench Post a comment on the PR: -``` +```` @EgorBot -amd -arm -​```cs +```cs // Your benchmark code here -​``` ``` +```` EgorBot builds dotnet/runtime for the PR and base branch, runs the benchmark on dedicated hardware, and posts BDN results back as a comment. diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md index abc1085ce56bc8..ae45ecc2cc062c 100644 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -12,7 +12,7 @@ see the [EgorBot manual](https://github.com/EgorBo/EgorBot). ## Command Format Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a -fenced code block (`` ```cs ``) in the same comment. +fenced ` ```cs ` code block in the same comment. ``` @EgorBot [targets...] [options...] [BDN arguments...] 
From 3dbdf2e1435d3047e7e57453a83863e8ee75d88c Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Mon, 23 Mar 2026 19:20:30 +0200 Subject: [PATCH 4/5] Address review feedback: formatting, path clarifications, CoreLib caveat - Fix inline code formatting for fenced block marker in egorbot-reference.md - Remove specific MihuBot option names from SKILL.md (not in reference doc) - Clarify testhost vs coreclr CoreRun path distinction in local-benchmarking.md - Expand bisection-guide.md CoreRun path to full testhost path - Add CoreLib libs.pretest caveat for incremental rebuilds Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../skills/performance-investigation/SKILL.md | 4 ++-- .../references/bisection-guide.md | 7 +++++-- .../references/egorbot-reference.md | 3 ++- .../references/local-benchmarking.md | 16 ++++++++++++---- 4 files changed, 21 insertions(+), 9 deletions(-) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index 24cc03c507040b..17ffeb2dcc1f0d 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -194,8 +194,8 @@ improvements. For ARM64-specific diffs or tier-0 analysis: @MihuBot -arm -tier0 ``` -See [MihuBot reference](references/mihubot-reference.md) for the full JIT diff -options, including `-nocctors`, `-includeKnownNoise`, and others. +See [MihuBot reference](references/mihubot-reference.md) for the full set of JIT +diff options and usage guidance. ### Interpreting JIT Diffs diff --git a/.github/skills/performance-investigation/references/bisection-guide.md b/.github/skills/performance-investigation/references/bisection-guide.md index 5019c64b389b08..152858f18dd524 100644 --- a/.github/skills/performance-investigation/references/bisection-guide.md +++ b/.github/skills/performance-investigation/references/bisection-guide.md @@ -122,12 +122,15 @@ At each step: 1. 
**Rebuild the affected component** — use incremental builds where possible (see [incremental rebuilds](local-benchmarking.md#incremental-rebuilds)). -2. **Run the standalone benchmark** with the freshly-built CoreRun: +2. **Run the standalone benchmark** with the freshly-built CoreRun from the + testhost folder (see + [local benchmarking guide](local-benchmarking.md#building-dotnet-runtime-and-obtaining-corerun) + for the exact path): ``` cd PerfRepro dotnet run -c Release -f net{ver} -- \ --filter '*' \ - --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun + --coreRun {runtime}/artifacts/bin/testhost/net{ver}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{ver}/CoreRun ``` 3. **Determine good or bad** — compare the result against your threshold. diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md index ae45ecc2cc062c..39ed2e8ab81774 100644 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ b/.github/skills/performance-investigation/references/egorbot-reference.md @@ -12,7 +12,8 @@ see the [EgorBot manual](https://github.com/EgorBo/EgorBot). ## Command Format Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a -fenced ` ```cs ` code block in the same comment. +fenced C# code block (a code fence that begins with three backticks followed +by `cs`) in the same comment. ``` @EgorBot [targets...] [options...] [BDN arguments...] 
diff --git a/.github/skills/performance-investigation/references/local-benchmarking.md b/.github/skills/performance-investigation/references/local-benchmarking.md index be9d32fdbfc2cf..d4b2ff38329aea 100644 --- a/.github/skills/performance-investigation/references/local-benchmarking.md +++ b/.github/skills/performance-investigation/references/local-benchmarking.md @@ -24,6 +24,10 @@ The key artifact is the **testhost** folder containing **CoreRun** at: artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ ``` +> **Note:** This is different from the bare `corerun` binary under +> `artifacts/bin/coreclr/`. BenchmarkDotNet needs the testhost layout because +> it contains both CoreRun and the complete framework assemblies side-by-side. + CoreRun is a lightweight host that loads the locally-built runtime and libraries. BenchmarkDotNet uses it via the `--coreRun` argument to benchmark private builds without installing them as SDKs. @@ -127,13 +131,17 @@ affected component: | Component changed | Fast rebuild command | |-------------------|---------------------| | A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | -| CoreLib | `build.cmd/sh clr.corelib -c Release` | +| CoreLib | `build.cmd/sh clr.corelib -c Release` followed by `build.cmd/sh libs.pretest -c Release` | | CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | | All libraries | `build.cmd/sh libs -c Release` | -After an incremental library rebuild, the updated DLL is placed in the testhost -folder automatically. CoreRun picks up the new version on the next benchmark -run. +After an incremental library rebuild (other than System.Private.CoreLib), the +updated DLL is placed in the testhost folder automatically. CoreRun picks up +the new version on the next benchmark run. 
+ +For System.Private.CoreLib, you must run `build.cmd/sh libs.pretest -c Release` +after rebuilding to copy the updated CoreLib into the testhost layout; +otherwise benchmarks may silently run against the older CoreLib. **Caveat:** If a rebuild crosses a commit that changes the build infrastructure (e.g., SDK version bump in `global.json`), the incremental build may fail. In a From 5e251d2631bd16c8a85c3c6d78524d05e127d636 Mon Sep 17 00:00:00 2001 From: Eirik Tsarpalis Date: Tue, 24 Mar 2026 12:44:53 +0200 Subject: [PATCH 5/5] Refocus skill to local-only investigation, restore performance-benchmark Per PR feedback: keep performance-benchmark as a separate skill for EgorBot/PR benchmarking. Refocus performance-investigation to local-only workflows: building CoreRun, comparing commits with BDN, git bisect. - Restore performance-benchmark/SKILL.md (unchanged from main) - Revert cross-reference changes to copilot-instructions.md, api-proposal, jit-regression-test (these should reference performance-benchmark) - Remove egorbot-reference.md and mihubot-reference.md (bot territory) - Rewrite SKILL.md to remove Workflow 1 (PR benchmark) and Workflow 3 (JIT diffs), keeping only local investigation - Update issue-triage Related Skills to list both skills - Update evals to match local-only scope (6 evals) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/copilot-instructions.md | 2 +- .github/skills/api-proposal/SKILL.md | 2 +- .github/skills/issue-triage/SKILL.md | 3 +- .github/skills/jit-regression-test/SKILL.md | 2 +- .github/skills/performance-benchmark/SKILL.md | 191 ++++++++++++++++ .../skills/performance-investigation/SKILL.md | 208 +++--------------- .../evals/evals.json | 107 ++------- .../references/egorbot-reference.md | 71 ------ .../references/mihubot-reference.md | 66 ------ 9 files changed, 240 insertions(+), 412 deletions(-) create mode 100644 .github/skills/performance-benchmark/SKILL.md delete mode 100644 
.github/skills/performance-investigation/references/egorbot-reference.md delete mode 100644 .github/skills/performance-investigation/references/mihubot-reference.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index b2de17cdd4274a..a23e28a783c9bc 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -14,7 +14,7 @@ When NOT running under CCA, skip the `code-review` skill if the user has stated Before making changes to a directory, search for `README.md` files in that directory and its parent directories up to the repository root. Read any you find — they contain conventions, patterns, and architectural context relevant to your work. -If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-investigation` skill to validate the impact before completing. +If the changes are intended to improve performance, or if they could negatively impact performance, use the `performance-benchmark` skill to validate the impact before completing. You MUST follow all code-formatting and naming conventions defined in [`.editorconfig`](/.editorconfig). diff --git a/.github/skills/api-proposal/SKILL.md b/.github/skills/api-proposal/SKILL.md index 6c9f2c494fa3d9..8f1905b87d1428 100644 --- a/.github/skills/api-proposal/SKILL.md +++ b/.github/skills/api-proposal/SKILL.md @@ -160,7 +160,7 @@ This: 2. **All errors and warnings must be fixed** before proceeding to the draft phase. -3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-investigation** skill. +3. If the API change could affect performance (hot paths, allocations, new collection types), suggest running the **performance-benchmark** skill. 4. Re-run tests after any review-driven changes to confirm nothing regressed. 
diff --git a/.github/skills/issue-triage/SKILL.md b/.github/skills/issue-triage/SKILL.md index b104bfbcb48407..bdc16692ad98ff 100644 --- a/.github/skills/issue-triage/SKILL.md +++ b/.github/skills/issue-triage/SKILL.md @@ -521,5 +521,6 @@ depending on the outcome: |-----------|-------|-----------------| | API proposal recommended as KEEP | **api-proposal** | Offer to draft a formal API proposal with working prototype | | Bug report with root cause identified | **jit-regression-test** | If the bug is JIT-related, offer to create a regression test | -| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression (benchmarking, bisection, JIT diffs) | +| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) | +| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot | | Fix PR linked to the issue | **code-review** | Offer to review the fix PR for correctness and consistency | diff --git a/.github/skills/jit-regression-test/SKILL.md b/.github/skills/jit-regression-test/SKILL.md index 2e03703531009e..e6cc8f82d58c50 100644 --- a/.github/skills/jit-regression-test/SKILL.md +++ b/.github/skills/jit-regression-test/SKILL.md @@ -7,7 +7,7 @@ description: > bug", "create a regression test for issue #NNNNN", converting issue repro to xunit test. DO NOT USE FOR: non-JIT tests (use standard test patterns), debugging JIT issues without a known repro, performance benchmarks (use - performance-investigation skill). + performance-benchmark skill). 
--- # JIT Regression Test Extraction diff --git a/.github/skills/performance-benchmark/SKILL.md b/.github/skills/performance-benchmark/SKILL.md new file mode 100644 index 00000000000000..9e1b8f0bbf6a31 --- /dev/null +++ b/.github/skills/performance-benchmark/SKILL.md @@ -0,0 +1,191 @@ +--- +name: performance-benchmark +description: Generate and run ad hoc performance benchmarks to validate code changes. Use this when asked to benchmark, profile, or validate the performance impact of a code change in dotnet/runtime. +--- + +# Ad Hoc Performance Benchmarking with @EgorBot + +When you need to validate the performance impact of a code change, follow this process to write a BenchmarkDotNet benchmark and trigger @EgorBot to run it. +The bot will notify you when results are ready, so don't wait for them. + +## Step 1: Write the Benchmark + +Create a BenchmarkDotNet benchmark that tests the specific operation being changed. Follow these guidelines: + +### Benchmark Structure + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +public class Bench +{ + // Add setup/cleanup if needed + [GlobalSetup] + public void Setup() + { + // Initialize test data + } + + [Benchmark] + public void MyOperation() + { + // Test the operation + } +} +``` + +### Best Practices + +For comprehensive guidance, see the [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). 
+ +Key principles: + +- **Move initialization to `[GlobalSetup]`**: Separate setup logic from the measured code to avoid measuring allocation/initialization overhead +- **Return values** from benchmark methods to prevent dead code elimination +- **Avoid loops**: BenchmarkDotNet invokes the benchmark many times automatically; adding manual loops distorts measurements +- **No side effects**: Benchmarks should be pure and produce consistent results +- **Focus on common cases**: Benchmark hot paths and typical usage, not edge cases or error paths +- **Use consistent input data**: Always use the same test data for reproducible comparisons +- **Avoid `[DisassemblyDiagnoser]`**: It causes crashes on Linux. Use `--envvars DOTNET_JitDisasm:MethodName` instead +- **Benchmark class requirements**: Must be `public`, not `sealed`, not `static`, and must be a `class` (not struct) + +### Example: String Operation Benchmark + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private string _testString = default!; + + [Params(10, 100, 1000)] + public int Length { get; set; } + + [GlobalSetup] + public void Setup() + { + _testString = new string('a', Length); + } + + [Benchmark] + public int StringOperation() + { + return _testString.IndexOf('z'); + } +} +``` + +### Example: Collection Operation Benchmark + +```csharp +using System.Linq; +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private int[] _array = default!; + private List<int> _list = default!; + + [Params(100, 1000, 10000)] + public int Count { get; set; } + + [GlobalSetup] + public void Setup() + { + _array = Enumerable.Range(0, Count).ToArray(); + _list = _array.ToList(); + } + + [Benchmark] + public bool AnyArray() => _array.Any(); + + [Benchmark] +
public bool AnyList() => _list.Any(); + + [Benchmark] + public int SumArray() => _array.Sum(); + + [Benchmark] + public int SumList() => _list.Sum(); +} +``` + +## Step 2: Mention @EgorBot in a comment/PR description + +Post a comment on the PR to trigger EgorBot with your benchmark. The general format is: + +> 📝 **AI-generated content disclosure:** When posting benchmark comments to GitHub under a user's credentials — i.e., the account is **not** a dedicated "copilot" or "bot" account/app — you **MUST** include a concise, visible note (e.g. a `> [!NOTE]` alert) indicating the content was AI/Copilot-generated. Skip this if the user explicitly asks you to omit it. + +@EgorBot [targets] [options] [BenchmarkDotNet args] + +```cs +// Your benchmark code here +``` +> **Note:** When using @EgorBot, follow these formatting rules: +> - The @EgorBot command must not be inside the code block. +> - Only the benchmark code should be inside the code block. +> - Do not place any additional text between the @EgorBot command line and the code block, as EgorBot will treat it as additional command arguments. + +### Target Flags + +- `-linux_amd` +- `-linux_intel` +- `-windows_amd` +- `-windows_intel` +- `-linux_arm64` +- `-osx_arm64` (baremetal, feel free to always include it) + +The most common combination is `-linux_amd -osx_arm64`. Do not include more than 4 targets. + +### Common Options + +Use `-profiler` when absolutely necessary along with `-linux_arm64` and/or `-linux_amd` to include `perf` profiling and disassembly in the results. 
+ +### Example: Basic PR Benchmark + +To benchmark the current PR changes against the base branch: + +@EgorBot -linux_amd -osx_arm64 + +```cs +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + [Benchmark] + public int MyOperation() + { + // Your benchmark code + return 42; + } +} +``` + +## Important Notes + +- **Bot response time**: EgorBot uses polling and may take up to 30 seconds to respond +- **Supported repositories**: EgorBot monitors `dotnet/runtime` and `EgorBot/runtime-utils` +- **PR mode (default)**: When posting in a PR, EgorBot automatically compares the PR changes against the base branch +- **Results variability**: Results may vary between runs due to VM differences. Do not compare results across different architectures or cloud providers +- **Check the manual**: EgorBot replies include a link to the [manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) for advanced options + +## Additional Resources + +- [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) - Essential reading for writing effective benchmarks +- [BenchmarkDotNet CLI Arguments](https://github.com/dotnet/BenchmarkDotNet/blob/master/docs/articles/guides/console-args.md) +- [EgorBot Manual](https://github.com/EgorBo/EgorBot?tab=readme-ov-file#github-usage) diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md index 17ffeb2dcc1f0d..416c2a2278c0b7 100644 --- a/.github/skills/performance-investigation/SKILL.md +++ b/.github/skills/performance-investigation/SKILL.md @@ -1,133 +1,40 @@ --- name: performance-investigation description: > - Investigate performance regressions and validate performance impact of code - changes in dotnet/runtime. 
Use this skill whenever asked to benchmark a PR, - investigate a performance regression, validate performance impact, run - benchmarks, generate JIT diffs, compare performance between commits, triage - a performance issue, or check whether a change improves or regresses - performance. Also use when asked about @EgorBot, @MihuBot, BenchmarkDotNet, - CoreRun, or dotnet/performance. Covers ad hoc PR benchmarking, deep - regression investigation with git bisect, and JIT diff analysis. + Investigate performance regressions locally in dotnet/runtime. Use this skill + when asked to investigate a performance regression, bisect to find a culprit + commit, validate a regression with local builds, compare performance between + commits using CoreRun, or benchmark private runtime builds with + BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local + benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking + with @EgorBot or @MihuBot (use the performance-benchmark skill instead). --- -# Performance Investigation for dotnet/runtime +# Local Performance Investigation for dotnet/runtime -Investigate performance regressions and validate the performance impact of code -changes. This skill covers three workflows, from quick PR validation to deep -regression root-causing. +Investigate performance regressions locally by building the runtime at specific +commits, running BenchmarkDotNet with CoreRun, and using git bisect to find +culprit commits. This skill covers the full local investigation workflow from +validation to root-causing. 
## When to Use This Skill -- Asked to **benchmark** a PR or validate performance impact of a change - Asked to **investigate a performance regression** (from an issue, bot report, or customer report) -- Asked to **generate JIT diffs** or analyze codegen impact -- Asked to **compare performance** between commits, branches, or releases +- Asked to **compare performance** between commits, branches, or releases using + local builds +- Asked to **bisect** to find the commit that introduced a regression +- Asked to **benchmark private runtime builds** using CoreRun - Asked to **triage a performance issue** (use alongside the `issue-triage` skill for full triage) - Given a `tenet-performance` or `tenet-performance-benchmarks` labeled issue -- Asked how to use `@EgorBot`, `@MihuBot`, BenchmarkDotNet, or CoreRun + that requires local investigation -## Choose Your Workflow +> **Note:** For ad hoc PR benchmarking via @EgorBot or @MihuBot, use the +> `performance-benchmark` skill instead. This skill focuses on local builds, +> CoreRun, and git bisect. -| Context | Workflow | What it does | -|---------|----------|-------------| -| PR is open and you want to measure its impact | [Workflow 1: PR Benchmark Validation](#workflow-1-pr-benchmark-validation) | Write a benchmark, invoke a bot, get results | -| A regression has been reported (issue or bot alert) | [Workflow 2: Regression Investigation](#workflow-2-regression-investigation) | Validate, bisect, root-cause | -| Change affects JIT codegen and you want to see diffs | [Workflow 3: JIT Diff Analysis](#workflow-3-jit-diff-analysis) | Generate JIT diffs via MihuBot | - -If you're triaging a performance regression issue, use Workflow 2 for the -investigation methodology, then return to the `issue-triage` skill for -triage-specific assessment and recommendation. - ---- - -## Workflow 1: PR Benchmark Validation - -Use this when a PR is open and you want to measure its performance impact. 
- -### Step 1: Write a BenchmarkDotNet Benchmark - -Create a benchmark that targets the specific operation being changed. See -[Writing Good Benchmarks](#writing-good-benchmarks) below for best practices. - -```csharp -using BenchmarkDotNet.Attributes; -using BenchmarkDotNet.Running; - -BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); - -[MemoryDiagnoser] -public class Bench -{ - [GlobalSetup] - public void Setup() - { - // Initialize test data - } - - [Benchmark] - public int MyOperation() - { - // Test the operation — return a value to prevent dead code elimination - return 42; - } -} -``` - -### Step 2: Choose a Bot and Invoke It - -**Use @EgorBot** when you need to run custom benchmark code (written in Step 1): - -Post a comment on the PR: - -```` -@EgorBot -amd -arm - -```cs -// Your benchmark code here -``` -```` - -EgorBot builds dotnet/runtime for the PR and base branch, runs the benchmark on -dedicated hardware, and posts BDN results back as a comment. - -See [EgorBot reference](references/egorbot-reference.md) for the full target -list, options, and examples. - -**Use @MihuBot** when you want to run existing benchmarks from the -[dotnet/performance](https://github.com/dotnet/performance) repo: - -``` -@MihuBot benchmark -``` - -This is useful when established benchmarks already cover the affected code path -and you don't need to write custom code. - -See [MihuBot reference](references/mihubot-reference.md) for the full command -syntax and options. - -### Step 3: Interpret Results - -EgorBot and MihuBot post results as PR comments. 
Look for: - -- **Ratio column** — values >1.0 indicate the PR is slower, <1.0 indicate it's - faster -- **Statistical significance** — if a `--statisticalTest` column is present, - look for `Faster`, `Slower`, or `Same` annotations -- **Memory/allocation changes** — check `Allocated` column if - `[MemoryDiagnoser]` is enabled - ---- - -## Workflow 2: Regression Investigation - -Use this when a performance regression has been reported — whether from -`performanceautofiler[bot]`, a customer report, or a cross-release comparison. - -### Overview +## Investigation Workflow The investigation follows three phases: @@ -143,25 +50,6 @@ For details on building the runtime, using CoreRun, and running BenchmarkDotNet against private builds, see the [local benchmarking guide](references/local-benchmarking.md). -### Quick Path: Use Bots Instead of Local Bisection - -If the regression range is narrow (a few commits) or the environment doesn't -support local builds, you can use bots to validate specific commits without -building locally: - -``` -@EgorBot -amd -commits {good-sha},{bad-sha} -``` - -Or with @MihuBot for existing benchmarks: - -``` -@MihuBot benchmark https://github.com/dotnet/runtime/compare/{good-sha}...{bad-sha} -``` - -This won't perform a full bisect, but it can confirm whether the regression -exists and help narrow the range. - ### Reporting Results After completing the investigation, include in your report: @@ -174,48 +62,10 @@ After completing the investigation, include in your report: --- -## Workflow 3: JIT Diff Analysis - -Use this when a change affects JIT code generation and you want to see how it -changes the emitted machine code across the entire BCL. - -### Invoke MihuBot for JIT Diffs - -Post a comment on the PR: - -``` -@MihuBot -``` - -MihuBot generates comprehensive JIT diffs showing codegen regressions and -improvements. 
For ARM64-specific diffs or tier-0 analysis: - -``` -@MihuBot -arm -tier0 -``` - -See [MihuBot reference](references/mihubot-reference.md) for the full set of JIT -diff options and usage guidance. - -### Interpreting JIT Diffs - -MihuBot reports include: - -- **Code size changes** — total bytes added/removed across all methods -- **Per-method diffs** — individual methods that changed, with before/after - assembly -- **Regressions vs improvements** — clearly separated sections - -A small increase in code size across many methods may indicate a JIT change with -broad impact. A large increase in a few methods may indicate a targeted -optimization that trades code size for speed (or a regression). - ---- - ## Writing Good Benchmarks -These guidelines apply whether you're writing a benchmark for EgorBot, for -local validation, or for contribution to the dotnet/performance repo. +These guidelines apply whether you're writing a benchmark for local validation +or for contribution to the dotnet/performance repo. For comprehensive guidance, see the [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). @@ -240,16 +90,7 @@ For comprehensive guidance, see the - Must not be `sealed` - Must not be `static` -### Avoid `[DisassemblyDiagnoser]` - -It causes crashes on Linux. 
To get disassembly, use the `--envvars` option -instead: - -``` -@EgorBot -amd --envvars DOTNET_JitDisasm:MethodName -``` - -### Example: Comparing Two Implementations +### Example: Standalone Investigation Benchmark ```csharp using BenchmarkDotNet.Attributes; @@ -296,6 +137,7 @@ public class Bench | Condition | Skill | When to use | |-----------|-------|-------------| +| Need to benchmark a PR via @EgorBot | **performance-benchmark** | For ad hoc PR benchmarking on dedicated hardware | | Triaging a performance regression issue | **issue-triage** | For the full triage workflow (assessment, recommendation, labels) | | Fix PR linked to the regression | **code-review** | To review the fix for correctness and consistency | | JIT regression test needed | **jit-regression-test** | To extract a JIT regression test from the issue | diff --git a/.github/skills/performance-investigation/evals/evals.json b/.github/skills/performance-investigation/evals/evals.json index bfe6a78f99d9e6..21d363829af87e 100644 --- a/.github/skills/performance-investigation/evals/evals.json +++ b/.github/skills/performance-investigation/evals/evals.json @@ -3,42 +3,15 @@ "evals": [ { "id": 1, - "name": "pr-benchmark-request", - "prompt": "Can you benchmark PR https://github.com/dotnet/runtime/pull/121223 to check for performance impact?", - "expected_output": "Should follow Workflow 1 (PR Benchmark Validation). 
Should write a BenchmarkDotNet benchmark targeting the changed code and invoke @EgorBot to run it on the PR.", - "assertions": [ - { - "name": "uses-workflow-1", - "description": "Follows the PR benchmark validation workflow", - "type": "contains_any", - "check": ["Workflow 1", "PR Benchmark", "benchmark"] - }, - { - "name": "writes-benchmark", - "description": "Creates or references a BenchmarkDotNet benchmark", - "type": "contains_any", - "check": ["[Benchmark]", "BenchmarkDotNet", "BenchmarkSwitcher"] - }, - { - "name": "invokes-bot", - "description": "Invokes EgorBot or MihuBot to run the benchmark", - "type": "contains_any", - "check": ["@EgorBot", "@MihuBot"] - } - ], - "files": [] - }, - { - "id": 2, "name": "perf-regression-autobot", "prompt": "Investigate this performance regression: https://github.com/dotnet/runtime/issues/114625", - "expected_output": "Should follow Workflow 2 (Regression Investigation). Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and attempt validation or bisection.", + "expected_output": "Should follow the regression investigation workflow. Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and plan validation or bisection using local builds.", "assertions": [ { - "name": "uses-workflow-2", - "description": "Follows the regression investigation workflow", + "name": "identifies-regression", + "description": "Recognizes and follows the regression investigation workflow", "type": "contains_any", - "check": ["Workflow 2", "Regression", "regression", "investigate"] + "check": ["regression", "investigate", "Regression"] }, { "name": "identifies-commits", @@ -56,28 +29,7 @@ "files": [] }, { - "id": 3, - "name": "jit-diff-request", - "prompt": "Can you generate JIT diffs for my PR that changes the JIT compiler?", - "expected_output": "Should follow Workflow 3 (JIT Diff Analysis). 
Should invoke @MihuBot to generate JIT diffs and explain how to interpret the results.", - "assertions": [ - { - "name": "uses-mihubot", - "description": "Invokes MihuBot for JIT diffs", - "type": "contains", - "check": "@MihuBot" - }, - { - "name": "mentions-jit-diffs", - "description": "References JIT diff generation", - "type": "contains_any", - "check": ["JIT diff", "jit-diff", "codegen", "code size"] - } - ], - "files": [] - }, - { - "id": 4, + "id": 2, "name": "benchmark-with-corerun", "prompt": "How do I benchmark my local runtime changes against the main branch?", "expected_output": "Should explain how to build dotnet/runtime, obtain CoreRun from the testhost folder, and run BenchmarkDotNet with the --coreRun argument to compare private builds.", @@ -104,31 +56,10 @@ "files": [] }, { - "id": 5, - "name": "existing-benchmarks-request", - "prompt": "Run the existing Regex benchmarks from dotnet/performance against PR https://github.com/dotnet/runtime/pull/124628", - "expected_output": "Should use @MihuBot benchmark command to run existing benchmarks from the dotnet/performance repo rather than writing custom benchmark code.", - "assertions": [ - { - "name": "uses-mihubot-benchmark", - "description": "Uses MihuBot's benchmark command for existing benchmarks", - "type": "contains", - "check": "@MihuBot benchmark" - }, - { - "name": "references-perf-repo", - "description": "References the dotnet/performance repository", - "type": "contains_any", - "check": ["dotnet/performance", "performance repo"] - } - ], - "files": [] - }, - { - "id": 6, + "id": 3, "name": "cross-release-regression", "prompt": "A user reports that string.IndexOf is 2x slower in .NET 10 compared to .NET 9. How should we investigate?", - "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression. 
Should reference both local investigation and bot-based approaches.",
+      "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression locally using CoreRun builds.",
       "assertions": [
         {
           "name": "mentions-merge-base",
@@ -152,28 +83,28 @@
       "files": []
     },
     {
-      "id": 7,
-      "name": "compare-specific-commits",
-      "prompt": "Compare the performance of commits abc1234 and def5678 for the System.Text.Json benchmarks",
-      "expected_output": "Should invoke @EgorBot with -commits to compare the two specific commits, or use @MihuBot benchmark with a compare URL.",
+      "id": 4,
+      "name": "compare-commits-locally",
+      "prompt": "Compare the performance of two specific commits locally for System.Text.Json serialization",
+      "expected_output": "Should explain how to build dotnet/runtime at both commits, save testhost/CoreRun artifacts, and run BenchmarkDotNet with --coreRun pointing to both builds for a side-by-side comparison.",
       "assertions": [
         {
-          "name": "uses-commits-flag",
-          "description": "Uses the -commits option or compare URL to specify the commits",
+          "name": "mentions-corerun",
+          "description": "References CoreRun or testhost for running against private builds",
           "type": "contains_any",
-          "check": ["-commits", "compare", "abc1234", "def5678"]
+          "check": ["CoreRun", "coreRun", "--coreRun", "testhost"]
         },
         {
-          "name": "invokes-bot",
-          "description": "Invokes EgorBot or MihuBot to run the comparison",
+          "name": "mentions-both-builds",
+          "description": "Explains building the runtime at both commits for comparison",
           "type": "contains_any",
-          "check": ["@EgorBot", "@MihuBot"]
+          "check": ["both commits", "good commit", "bad commit", "baseline", "two builds", "each commit"]
         }
       ],
       "files": []
     },
     {
-      "id": 8,
+      "id": 5,
       "name": "not-applicable-bug-issue",
       "prompt": "Can you check the performance impact of https://github.com/dotnet/runtime/issues/46088",
       "expected_output": "Should recognize this is a 
functional bug (System.Text.Json does not support constructors with byref parameters), not a performance issue. Should indicate that performance benchmarking is not applicable here.", @@ -188,7 +119,7 @@ "files": [] }, { - "id": 9, + "id": 6, "name": "not-applicable-doc-pr", "prompt": "Benchmark the changes in PR https://github.com/dotnet/runtime/pull/124592 to validate performance", "expected_output": "Should recognize this is a documentation-only PR (adding XML docs to DI extension methods) and that benchmarking is not applicable or meaningful for documentation changes.", diff --git a/.github/skills/performance-investigation/references/egorbot-reference.md b/.github/skills/performance-investigation/references/egorbot-reference.md deleted file mode 100644 index 39ed2e8ab81774..00000000000000 --- a/.github/skills/performance-investigation/references/egorbot-reference.md +++ /dev/null @@ -1,71 +0,0 @@ -# EgorBot Reference - -[EgorBot](https://github.com/EgorBo/EgorBot) is a benchmark-as-a-service bot for -[dotnet/runtime](https://github.com/dotnet/runtime). It runs BenchmarkDotNet -microbenchmarks on dedicated hardware and posts results back as GitHub comments. -Its primary use case is comparing performance before and after a change — either -across a PR or between specific commits. - -For the full and up-to-date command reference (targets, options, defaults), -see the [EgorBot manual](https://github.com/EgorBo/EgorBot). - -## Command Format - -Mention `@EgorBot` in a PR or issue comment. The benchmark source goes in a -fenced C# code block (a code fence that begins with three backticks followed -by `cs`) in the same comment. - -``` -@EgorBot [targets...] [options...] [BDN arguments...] -``` - -> **Formatting rules:** -> - The `@EgorBot` command must be **outside** the code block. -> - Only benchmark source code belongs inside the code block. -> - Do not place text between the `@EgorBot` line and the code block — EgorBot -> treats it as additional command arguments. 
- -## Examples - -Compare a PR against its base branch on AMD and Apple Silicon: - -``` -@EgorBot -amd -arm -``` - -Compare two specific commits: - -``` -@EgorBot -amd -commits abc1234,def5678 -``` - -Compare a commit against its parent: - -``` -@EgorBot -arm -commits abc1234,abc1234~1 -``` - -Compare a range of commits for a specific benchmark filter: - -``` -@EgorBot -arm -commits abc1234...def5678 --filter "*MyBench*" -``` - -## Practical Notes - -- **Default target:** If no target is specified, runs on Apple Silicon via Helix. -- **PR mode:** When posting in a PR without `-commits`, EgorBot automatically - compares the PR branch against the base branch. -- **No code block:** If no code block is provided, EgorBot runs benchmarks from - the [dotnet/performance](https://github.com/dotnet/performance) repo instead. -- **Response time:** EgorBot uses polling and may take up to 30 seconds to - acknowledge the request. -- **Supported repositories:** `dotnet/runtime` and `EgorBot/runtime-utils`. -- **Result variability:** Results can vary between runs due to VM differences. - Do not compare results across different architectures or cloud providers. 
- -## Links - -- [EgorBot manual](https://github.com/EgorBo/EgorBot) — full target list, - options, and usage documentation -- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) diff --git a/.github/skills/performance-investigation/references/mihubot-reference.md b/.github/skills/performance-investigation/references/mihubot-reference.md deleted file mode 100644 index 458d4fd06bbb4c..00000000000000 --- a/.github/skills/performance-investigation/references/mihubot-reference.md +++ /dev/null @@ -1,66 +0,0 @@ -# MihuBot Reference - -[MihuBot](https://github.com/MihuBot/runtime-utils) provides several -performance-related services for dotnet/runtime: JIT diff generation, benchmark -execution from the [dotnet/performance](https://github.com/dotnet/performance) -repo, library fuzzing, and regex source generator diffs. It also has a -[web interface](https://mihubot.xyz/runtime-utils) for submitting jobs. - -For full and up-to-date option details, see the -[MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) repository. - -## JIT Diff Generation - -Generate JIT diffs between a PR and its base branch to see how a change affects -the generated machine code across the BCL. - -``` -@MihuBot -@MihuBot -arm -tier0 -``` - -## Running Benchmarks from dotnet/performance - -Run existing benchmarks from the -[dotnet/performance](https://github.com/dotnet/performance) repository without -writing custom benchmark code. 
- -``` -@MihuBot benchmark Regex -@MihuBot benchmark GetUnicodeCategory https://github.com/dotnet/runtime/compare/4bb0bcd...c74440f -``` - -## Library Fuzzer - -Run fuzz testing on a library: - -``` -@MihuBot fuzz SearchValues -@MihuBot fuzz SearchValues -dependsOn #107206 -``` - -## Regex Source Generator Diffs - -Generate diffs for regex source generator output and JIT diffs for the -generated code: - -``` -@MihuBot regexdiff -@MihuBot regexdiff -arm -``` - -## Common Options - -Most MihuBot job types support options like `-arm`, `-intel`, `-fast`, -`-dependsOn `, and `-combineWith `. For example: - -``` -@MihuBot -arm -hetzner -combineWith #1000,#1001 -``` - -## Links - -- [MihuBot runtime-utils](https://github.com/MihuBot/runtime-utils) — full - documentation and option reference -- [Web interface](https://mihubot.xyz/runtime-utils) for submitting jobs - directly