diff --git a/.github/skills/issue-triage/SKILL.md b/.github/skills/issue-triage/SKILL.md index a3c920021376e7..bdc16692ad98ff 100644 --- a/.github/skills/issue-triage/SKILL.md +++ b/.github/skills/issue-triage/SKILL.md @@ -232,7 +232,7 @@ Based on the issue type classified in Step 1, follow the appropriate guide: |------|-------|---------------| | **Bug report** | [Bug triage](references/bug-triage.md) | Reproduction, regression validation, minimal repro derivation, root cause analysis | | **API proposal** | [API proposal triage](references/api-proposal-triage.md) | Merit evaluation, complexity estimation | -| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression with BenchmarkDotNet, git bisect to culprit commit | +| **Performance regression** | [Performance regression triage](references/perf-regression-triage.md) | Validate regression, assess severity and impact. For detailed investigation methodology (benchmarking, bisection), use the `performance-investigation` skill. 
| | **Question** | [Question triage](references/question-triage.md) | Research and answer the question, verify if low confidence | | **Enhancement** | [Enhancement triage](references/enhancement-triage.md) | Subcategory classification, feasibility analysis, trade-off assessment (includes performance improvement requests) | @@ -521,5 +521,6 @@ depending on the outcome: |-----------|-------|-----------------| | API proposal recommended as KEEP | **api-proposal** | Offer to draft a formal API proposal with working prototype | | Bug report with root cause identified | **jit-regression-test** | If the bug is JIT-related, offer to create a regression test | -| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks | +| Performance regression confirmed | **performance-investigation** | Offer to investigate the regression locally (CoreRun builds, bisection) | +| Performance regression confirmed | **performance-benchmark** | Offer to validate the regression with ad hoc benchmarks via @EgorBot | | Fix PR linked to the issue | **code-review** | Offer to review the fix PR for correctness and consistency | diff --git a/.github/skills/issue-triage/references/perf-regression-triage.md b/.github/skills/issue-triage/references/perf-regression-triage.md index 1382c09ee8e1e6..4fb2b37732548d 100644 --- a/.github/skills/issue-triage/references/perf-regression-triage.md +++ b/.github/skills/issue-triage/references/perf-regression-triage.md @@ -1,13 +1,14 @@ # Performance Regression Triage -Guidance for investigating and triaging performance regressions in -dotnet/runtime. Referenced from the main [SKILL.md](../SKILL.md) during Step 5. +Triage-specific guidance for assessing and recommending action on performance +regressions in dotnet/runtime. Referenced from the main +[SKILL.md](../SKILL.md) during Step 5. 
-> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` -> on Windows or `./build.sh` on Linux/macOS. Other shell commands use -> Linux/macOS syntax (`cp -r`, forward-slash paths, `\` line continuation). -> On Windows, adapt accordingly: use `Copy-Item` or `xcopy`, backslash paths, -> and backtick (`` ` ``) line continuation. +For detailed investigation methodology (benchmarking, bisection, bot usage), +use the `performance-investigation` skill. This document covers only the +triage-specific assessment and recommendation criteria. + +## Sources of Performance Regressions A performance regression is a report that something got measurably slower (or uses more memory/allocations) compared to a previous .NET version or a recent @@ -21,307 +22,19 @@ commit. These reports come from several sources: - **Cross-release regressions** -- a regression observed between two stable releases (e.g., .NET 9 → .NET 10) without a specific commit range. -The goals of this triage are to: - -1. **Validate** that the regression is real and reproducible. -2. **Bisect** to the exact commit that introduced it. - -## Feasibility Check - -Before investing time in benchmarking and bisection, assess whether the current -environment can support the investigation. Full bisection requires building -dotnet/runtime at multiple commits (each build takes 5-40 minutes) and running -benchmarks, which is resource-intensive. 
- -| Factor | Feasible | Not feasible | -|--------|----------|--------------| -| **Disk space** | >50 GB free (for multiple builds) | <20 GB free | -| **Build time budget** | User is willing to wait 30-60+ min | Quick-turnaround triage expected | -| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | -| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | -| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | - -### When full bisection is not feasible - -Use the **lightweight analysis** path instead: - -1. **Analyze `git log`** -- Review commits in the regression range - (`git log --oneline {good}..{bad}`) and identify changes to the affected - code path. Look for algorithmic changes, removed optimizations, added - validation, or new allocations. -2. **Check PR descriptions** -- For each suspicious commit, read the associated - PR description and review comments. Performance trade-offs are often - discussed there. -3. **Narrow by code path** -- Use `git log --oneline {good}..{bad} -- path/` - to filter commits to the affected library or component. -4. **Report the narrowed range** -- Include the list of candidate commits/PRs - in the triage report with an explanation of why each is suspicious. This - gives maintainers a head start even without a definitive bisect result. - -Note in the triage report that full bisection was not attempted and why -(e.g., "environment mismatch", "time constraint"), so maintainers know to -verify independently. - -## Identifying the Bisect Range - -Before benchmarking, determine the good and bad commits that bound the -regression. 
- -### Automated bot issues (`performanceautofiler`) - -Issues from `performanceautofiler[bot]` follow a standard format: - -- **Run Information** -- Baseline commit, Compare commit, diff link, OS, arch, - and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). -- **Regression tables** -- Each table shows benchmark name, Baseline time, - Test time, and Test/Base ratio. A ratio >1.0 indicates a regression. -- **Repro commands** -- Typically: - ``` - git clone https://github.com/dotnet/performance.git - python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' - ``` -- **Graphs** -- Time-series graphs showing when the regression appeared. - -Key fields to extract: - -- The **Baseline** and **Compare** commit SHAs -- these define the bisect range. -- The **benchmark filter** -- the `--filter` argument to reproduce the benchmark. -- The **Test/Base ratio** -- how severe the regression is (>1.5× is significant). - -### Customer reports - -When a customer reports a regression (e.g., "X is slower on .NET 10 than -.NET 9"), there are no pre-defined commit SHAs. You need to determine the -bisect range yourself -- see [Cross-release regressions](#cross-release-regressions) -below. - -Also identify the **scenario to benchmark** from the customer's report -- the -specific API call, code pattern, or workload that regressed. - -### Cross-release regressions - -When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect -on the `main` branch between the commits from which the release branches were -snapped. Release branches in dotnet/runtime are -[snapped from main](../../../../docs/project/branching-guide.md). - -Find the snap points with `git merge-base`: - -``` -git merge-base main release/9.0 # → good commit (last common ancestor) -git merge-base main release/10.0 # → bad commit -``` - -Use the resulting SHAs as the good/bad boundaries for bisection on `main`. 
-This avoids bisecting across release branches where cherry-picks and backports -make the history non-linear. - -## Phase 1: Create a Standalone Benchmark - -Before investing time in bisection, create a standalone BenchmarkDotNet -project that reproduces the regressing scenario. This project will be used -for both validation (Phase 1) and bisection (Phase 3). - -### Why a standalone project? - -The full [dotnet/performance](https://github.com/dotnet/performance) repo -has many dependencies and can be fragile across different runtime commits. -A standalone project with only the impacted benchmark is faster to build, -easier to iterate on, and more reliable during `git bisect`. - -### Creating the benchmark project - -**From an automated bot issue** -- copy the relevant benchmark class and its -dependencies from the `dotnet/performance` repo into a new standalone project: - -1. Clone `dotnet/performance` and locate the benchmark class referenced in the - issue's `--filter` argument. -2. Create a new console project and add a reference to - `BenchmarkDotNet` (NuGet): - ``` - mkdir PerfRepro && cd PerfRepro - dotnet new console - dotnet add package BenchmarkDotNet - ``` -3. Copy the benchmark class (and any helper types it depends on) into the - project. Adjust namespaces and usings as needed. -4. Add a `Program.cs` entry point: - ```csharp - BenchmarkDotNet.Running.BenchmarkSwitcher - .FromAssembly(typeof(Program).Assembly) - .Run(args); - ``` - -**From a customer report** -- write a minimal BenchmarkDotNet benchmark that -exercises the reported code path: - -1. Create a new console project with `BenchmarkDotNet` as above. -2. Write a `[Benchmark]` method that calls the API or runs the workload the - customer identified as slow. -3. If the customer provided sample code, adapt it into a proper BDN benchmark - with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. 
- -### Building dotnet/runtime and obtaining CoreRun - -Build dotnet/runtime at the commit you want to test: - -``` -build.cmd/sh clr+libs -c release -``` - -The key artifact is the **testhost** folder containing **CoreRun** at: - -``` -artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ -``` - -BenchmarkDotNet uses CoreRun to load the locally-built runtime and libraries, -meaning you can benchmark private builds without installing them as SDKs. - -### Validating the regression - -Build dotnet/runtime at both the good and bad commits, saving each testhost -folder: - -``` -git checkout {bad-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad - -git checkout {good-sha} -build.cmd/sh clr+libs -c release -cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good -``` - -Run the standalone benchmark with both CoreRuns. BenchmarkDotNet compares -them side-by-side when given multiple `--coreRun` paths (the first is treated -as the baseline): - -``` -cd PerfRepro -dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun /tmp/corerun-good/.../CoreRun \ - /tmp/corerun-bad/.../CoreRun -``` - -To add a statistical significance column, append `--statisticalTest 5%`. -This performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, -or `Same`. - -### Interpret the results - -| Outcome | Meaning | Next step | -|---------|---------|-----------| -| `Slower` with ratio >1.10 | Regression confirmed | Proceed to Phase 2 | -| `Slower` with ratio between 1.05 and 1.10 | Small regression -- likely real but needs confirmation | Re-run with more iterations (`--iterationCount 30`). If it persists, treat as confirmed and proceed to Phase 2. | -| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. 
| -| `Slower` but ratio <1.05 | Marginal -- may be noise | Re-run with more iterations (`--iterationCount 30`). If still marginal, note as inconclusive. | - -For a thorough comparison of saved BDN result files, use the -[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) -tool: - -``` -dotnet run --project performance/src/tools/ResultsComparer \ - --base /path/to/baseline-results \ - --diff /path/to/compare-results \ - --threshold 5% -``` - -## Phase 2: Narrow the Commit Range - -If the bisect range spans many commits, narrow it before running a full -bisect: - -1. **Check `git log --oneline {good}..{bad}`** -- how many commits are in the - range? If it is more than ~200, try to narrow it first. -2. **Test midpoint commits manually** -- pick a commit in the middle of the - range, build, run the benchmark, and determine if it is good or bad. - This halves the range in one step. -3. **For cross-release regressions** -- use the `git merge-base` snap points - described above. If the range between two release snap points is still - large, test at intermediate release preview tags to narrow further. - -## Phase 3: Git Bisect - -Once you have a manageable commit range (good commit and bad commit), use -`git bisect` to binary-search for the culprit. - -### Bisect workflow - -At each step of the bisect, you need to: - -1. **Rebuild the affected component** -- use incremental builds where possible - (see [Incremental Rebuilds](#incremental-rebuilds-during-bisect) below). -2. **Run the standalone benchmark** with the freshly-built CoreRun: - ``` - cd PerfRepro - dotnet run -c Release -f net{ver} -- \ - --filter '*' \ - --coreRun {runtime}/artifacts/bin/testhost/.../CoreRun - ``` -3. **Determine good or bad** -- compare the result against your threshold. 
- -**Exit codes for `git bisect run`:** -- `0` -- good (no regression at this commit) -- `1`–`124` -- bad (regression present) -- `125` -- skip (build failure or untestable commit) - -The standalone benchmark project must be **outside the dotnet/runtime tree** -since `git bisect` checks out different commits, which would overwrite -in-tree files. Place it in a stable location (e.g., `/tmp/bisect/`). - -### Run the bisect - -``` -cd /path/to/runtime -git bisect start {bad-sha} {good-sha} -git bisect run /path/to/bisect-script.sh -``` - -**Time estimate:** Each bisect step requires a rebuild + benchmark run. -For ~1000 commits (log₂(1000) ≈ 10 steps) with a 5-minute rebuild, expect -roughly 50 minutes for the full bisect. - -### After bisect completes - -`git bisect` will output the first bad commit. Run `git bisect reset` to -return to the original branch. - -### Root cause analysis and triage report - -Include the following in the triage report: - -1. **The culprit commit or PR** -- link to the specific commit SHA and its - associated PR. Explain how the change relates to the regressing benchmark. -2. **Root cause analysis** -- describe *why* the change caused the regression - (e.g., an algorithm change, a removed optimization, additional validation - overhead). -3. **If the root cause spans multiple PRs** -- sometimes a regression results - from the combined effect of several changes and `git bisect` lands on a - commit that is only one contributing factor. In this case, report the - narrowest commit range that introduced the regression and list the PRs or - commits within that range that appear relevant to the affected code path. - -## Incremental Rebuilds During Bisect - -Full rebuilds are slow. 
Minimize per-step build time: +## Investigation -| Component changed | Fast rebuild command | -|-------------------|---------------------| -| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | -| CoreLib | `build.cmd/sh clr.corelib -c Release` | -| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | -| All libraries | `build.cmd/sh libs -c Release` | +The investigation goal is to validate that the regression is real and, if +possible, bisect to the exact commit that introduced it. -After an incremental library rebuild, the updated DLL is placed in the -testhost folder automatically. CoreRun will pick up the new version on the -next benchmark run. +Use the `performance-investigation` skill (Workflow 2: Regression Investigation) +for the full methodology, which includes: -**Caveat:** If bisect crosses a commit that changes the build infrastructure -(e.g., SDK version bump in `global.json`), the incremental build may fail. -Use exit code `125` (skip) to handle this gracefully. +- Feasibility checks for local vs. bot-based investigation +- Building dotnet/runtime at specific commits and using CoreRun +- Comparing good/bad commits with BenchmarkDotNet +- Git bisect workflow for finding the culprit commit +- Using @EgorBot and @MihuBot for remote validation ## Performance-Specific Assessment diff --git a/.github/skills/performance-investigation/SKILL.md b/.github/skills/performance-investigation/SKILL.md new file mode 100644 index 00000000000000..416c2a2278c0b7 --- /dev/null +++ b/.github/skills/performance-investigation/SKILL.md @@ -0,0 +1,143 @@ +--- +name: performance-investigation +description: > + Investigate performance regressions locally in dotnet/runtime. 
Use this skill + when asked to investigate a performance regression, bisect to find a culprit + commit, validate a regression with local builds, compare performance between + commits using CoreRun, or benchmark private runtime builds with + BenchmarkDotNet. Also use when asked about CoreRun, testhost, or local + benchmarking against private builds. DO NOT USE FOR ad hoc PR benchmarking + with @EgorBot or @MihuBot (use the performance-benchmark skill instead). +--- + +# Local Performance Investigation for dotnet/runtime + +Investigate performance regressions locally by building the runtime at specific +commits, running BenchmarkDotNet with CoreRun, and using git bisect to find +culprit commits. This skill covers the full local investigation workflow from +validation to root-causing. + +## When to Use This Skill + +- Asked to **investigate a performance regression** (from an issue, bot report, + or customer report) +- Asked to **compare performance** between commits, branches, or releases using + local builds +- Asked to **bisect** to find the commit that introduced a regression +- Asked to **benchmark private runtime builds** using CoreRun +- Asked to **triage a performance issue** (use alongside the `issue-triage` + skill for full triage) +- Given a `tenet-performance` or `tenet-performance-benchmarks` labeled issue + that requires local investigation + +> **Note:** For ad hoc PR benchmarking via @EgorBot or @MihuBot, use the +> `performance-benchmark` skill instead. This skill focuses on local builds, +> CoreRun, and git bisect. + +## Investigation Workflow + +The investigation follows three phases: + +1. **Validate** — Confirm the regression is real and reproducible +2. **Narrow** — Reduce the commit range to a manageable size +3. 
**Bisect** — Binary-search for the culprit commit + +For the full methodology, including feasibility checks, commit range +identification, and step-by-step bisection instructions, see the +[bisection guide](references/bisection-guide.md). + +For details on building the runtime, using CoreRun, and running BenchmarkDotNet +against private builds, see the +[local benchmarking guide](references/local-benchmarking.md). + +### Reporting Results + +After completing the investigation, include in your report: + +- Whether the regression was **confirmed** or **not reproduced** +- The **culprit commit/PR** (if bisection was performed) +- **Root cause analysis** — why the change caused the regression +- **Severity assessment** — Test/Base ratio, number of affected benchmarks, + user impact + +--- + +## Writing Good Benchmarks + +These guidelines apply whether you're writing a benchmark for local validation +or for contribution to the dotnet/performance repo. + +For comprehensive guidance, see the +[Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md). 
+ +### Key Principles + +- **Move initialization to `[GlobalSetup]`** — separate setup from the measured + code to avoid measuring allocation/initialization overhead +- **Return values** from benchmark methods to prevent dead code elimination +- **Avoid manual loops** — BenchmarkDotNet invokes the benchmark many times + automatically; adding loops distorts measurements +- **No side effects** — benchmarks should be pure and produce consistent results +- **Focus on common cases** — benchmark hot paths and typical usage, not edge + cases +- **Use consistent input data** — always use the same test data for reproducible + comparisons + +### Benchmark Class Requirements + +- Must be `public` +- Must be a `class` (not struct) +- Must not be `sealed` +- Must not be `static` + +### Example: Standalone Investigation Benchmark + +```csharp +using BenchmarkDotNet.Attributes; +using BenchmarkDotNet.Running; + +BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args); + +[MemoryDiagnoser] +public class Bench +{ + private string _testString = default!; + + [Params(10, 100, 1000)] + public int Length { get; set; } + + [GlobalSetup] + public void Setup() + { + _testString = new string('a', Length); + } + + [Benchmark] + public int StringOperation() + { + return _testString.IndexOf('z'); + } +} +``` + +--- + +## External Resources + +- [dotnet/performance repository](https://github.com/dotnet/performance) — + central location for all .NET runtime benchmarks +- [Benchmarking workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md) +- [Profiling workflow for dotnet/runtime](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) +- [Microbenchmark Design Guidelines](https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md) +- [BenchmarkDotNet CLI arguments](https://benchmarkdotnet.org/articles/guides/console-args.html) +- [Performance 
guidelines](../../../../docs/project/performance-guidelines.md) — + project-wide performance policy + +## Related Skills + +| Condition | Skill | When to use | +|-----------|-------|-------------| +| Need to benchmark a PR via @EgorBot | **performance-benchmark** | For ad hoc PR benchmarking on dedicated hardware | +| Triaging a performance regression issue | **issue-triage** | For the full triage workflow (assessment, recommendation, labels) | +| Fix PR linked to the regression | **code-review** | To review the fix for correctness and consistency | +| JIT regression test needed | **jit-regression-test** | To extract a JIT regression test from the issue | diff --git a/.github/skills/performance-investigation/evals/evals.json b/.github/skills/performance-investigation/evals/evals.json new file mode 100644 index 00000000000000..21d363829af87e --- /dev/null +++ b/.github/skills/performance-investigation/evals/evals.json @@ -0,0 +1,137 @@ +{ + "skill_name": "performance-investigation", + "evals": [ + { + "id": 1, + "name": "perf-regression-autobot", + "prompt": "Investigate this performance regression: https://github.com/dotnet/runtime/issues/114625", + "expected_output": "Should follow the regression investigation workflow. 
Should identify baseline/compare commits from the performanceautofiler report, assess severity from the Test/Base ratio, and plan validation or bisection using local builds.", + "assertions": [ + { + "name": "identifies-regression", + "description": "Recognizes and follows the regression investigation workflow", + "type": "contains_any", + "check": ["regression", "investigate", "Regression"] + }, + { + "name": "identifies-commits", + "description": "Identifies or references baseline/compare commits from the bot report", + "type": "contains_any", + "check": ["commit", "SHA", "baseline", "compare", "bisect"] + }, + { + "name": "assesses-severity", + "description": "Assesses the regression severity using the ratio", + "type": "contains_any", + "check": ["ratio", "severity", "Test/Base", "slower", "regression"] + } + ], + "files": [] + }, + { + "id": 2, + "name": "benchmark-with-corerun", + "prompt": "How do I benchmark my local runtime changes against the main branch?", + "expected_output": "Should explain how to build dotnet/runtime, obtain CoreRun from the testhost folder, and run BenchmarkDotNet with the --coreRun argument to compare private builds.", + "assertions": [ + { + "name": "mentions-corerun", + "description": "Explains CoreRun as the mechanism for benchmarking private builds", + "type": "contains_any", + "check": ["CoreRun", "coreRun", "--coreRun", "testhost"] + }, + { + "name": "mentions-build", + "description": "References building the runtime", + "type": "contains_any", + "check": ["clr+libs", "build.cmd", "build.sh"] + }, + { + "name": "mentions-bdn", + "description": "References BenchmarkDotNet for running the benchmarks", + "type": "contains_any", + "check": ["BenchmarkDotNet", "BDN", "[Benchmark]"] + } + ], + "files": [] + }, + { + "id": 3, + "name": "cross-release-regression", + "prompt": "A user reports that string.IndexOf is 2x slower in .NET 10 compared to .NET 9. 
How should we investigate?", + "expected_output": "Should explain how to identify the bisect range for cross-release regressions using git merge-base, create a standalone benchmark, and validate the regression locally using CoreRun builds.", + "assertions": [ + { + "name": "mentions-merge-base", + "description": "Explains using git merge-base for cross-release bisection", + "type": "contains_any", + "check": ["merge-base", "release branch", "snap point"] + }, + { + "name": "mentions-benchmark-creation", + "description": "Suggests creating a benchmark for the reported scenario", + "type": "contains_any", + "check": ["benchmark", "BenchmarkDotNet", "[Benchmark]", "standalone"] + }, + { + "name": "mentions-bisect", + "description": "References git bisect as part of the investigation", + "type": "contains_any", + "check": ["bisect", "git bisect", "binary search"] + } + ], + "files": [] + }, + { + "id": 4, + "name": "compare-commits-locally", + "prompt": "Compare the performance of two specific commits locally for System.Text.Json serialization", + "expected_output": "Should explain how to build dotnet/runtime at both commits, save testhost/CoreRun artifacts, and run BenchmarkDotNet with --coreRun pointing to both builds for a side-by-side comparison.", + "assertions": [ + { + "name": "mentions-corerun", + "description": "References CoreRun or testhost for running against private builds", + "type": "contains_any", + "check": ["CoreRun", "coreRun", "--coreRun", "testhost"] + }, + { + "name": "mentions-both-builds", + "description": "Explains building at both commits for comparison", + "type": "contains_any", + "check": ["both commits", "good", "bad", "baseline", "two builds", "each commit"] + } + ], + "files": [] + }, + { + "id": 5, + "name": "not-applicable-bug-issue", + "prompt": "Can you check the performance impact of https://github.com/dotnet/runtime/issues/46088", + "expected_output": "Should recognize this is a functional bug (System.Text.Json does not support 
constructors with byref parameters), not a performance issue. Should indicate that performance benchmarking is not applicable here.", + "assertions": [ + { + "name": "identifies-not-perf", + "description": "Recognizes this is not a performance issue", + "type": "contains_any", + "check": ["not a performance", "not performance-related", "no performance", "functional", "not applicable", "does not apply", "isn't a performance"] + } + ], + "files": [] + }, + { + "id": 6, + "name": "not-applicable-doc-pr", + "prompt": "Benchmark the changes in PR https://github.com/dotnet/runtime/pull/124592 to validate performance", + "expected_output": "Should recognize this is a documentation-only PR (adding XML docs to DI extension methods) and that benchmarking is not applicable or meaningful for documentation changes.", + "assertions": [ + { + "name": "identifies-doc-only", + "description": "Recognizes this is a documentation/non-functional change where benchmarking is not meaningful", + "type": "contains_any", + "check": ["documentation", "doc", "no functional", "no code change", "not applicable", "does not apply", "no performance impact", "not meaningful", "wouldn't affect", "won't affect", "no runtime"] + } + ], + "files": [] + } + ] +} diff --git a/.github/skills/performance-investigation/references/bisection-guide.md b/.github/skills/performance-investigation/references/bisection-guide.md new file mode 100644 index 00000000000000..152858f18dd524 --- /dev/null +++ b/.github/skills/performance-investigation/references/bisection-guide.md @@ -0,0 +1,176 @@ +# Git Bisect for Performance Regressions + +This guide covers how to use `git bisect` to find the exact commit that +introduced a performance regression. It's a 3-phase process: validate the +regression, narrow the commit range, then bisect. + +## Feasibility Check + +Before investing time in bisection, assess whether the current environment can +support the investigation. 
Full bisection requires building dotnet/runtime at +multiple commits (each build takes 5–40 minutes) and running benchmarks, which +is resource-intensive. + +| Factor | Feasible | Not feasible | +|--------|----------|--------------| +| **Disk space** | >50 GB free (multiple builds) | <20 GB free | +| **Build time budget** | Willing to wait 30–60+ min | Quick-turnaround expected | +| **OS/arch match** | Current environment matches the regression's OS/arch | Regression is Linux-only but running on Windows (or vice versa) | +| **SDK availability** | Can build dotnet/runtime at the relevant commits | Build infrastructure has changed too much between commits | +| **Benchmark complexity** | Simple, self-contained benchmark | Requires external services, databases, or specialized hardware | + +### When full bisection is not feasible + +Use a **lightweight analysis** path instead: + +1. **Analyze `git log`** — Review commits in the regression range + (`git log --oneline {good}..{bad}`) and identify changes to the affected code + path. Look for algorithmic changes, removed optimizations, added validation, + or new allocations. +2. **Check PR descriptions** — For each suspicious commit, read the associated + PR description and review comments. Performance trade-offs are often discussed + there. +3. **Narrow by code path** — Use `git log --oneline {good}..{bad} -- path/` to + filter commits to the affected library or component. +4. **Report the narrowed range** — Include the list of candidate commits/PRs with + an explanation of why each is suspicious. This gives maintainers a head start + even without a definitive bisect result. + +Note in the report that full bisection was not attempted and why. + +## Identifying the Bisect Range + +Determine the good and bad commits that bound the regression. 
+ +### Automated bot issues (`performanceautofiler`) + +Issues from `performanceautofiler[bot]` follow a standard format: + +- **Run Information** — Baseline commit, Compare commit, diff link, OS, arch, + and configuration (e.g., `CompilationMode:tiered`, `RunKind:micro`). +- **Regression tables** — Each table shows benchmark name, Baseline time, Test + time, and Test/Base ratio. A ratio >1.0 indicates a regression. +- **Repro commands** — Typically: + ``` + git clone https://github.com/dotnet/performance.git + python3 .\performance\scripts\benchmarks_ci.py -f net10.0 --filter 'SomeBenchmark*' + ``` +- **Graphs** — Time-series graphs showing when the regression appeared. + +Key fields to extract: + +- The **Baseline** and **Compare** commit SHAs — these define the bisect range. +- The **benchmark filter** — the `--filter` argument to reproduce the benchmark. +- The **Test/Base ratio** — how severe the regression is (>1.5× is significant). + +### Customer reports + +When a customer reports a regression (e.g., "X is slower on .NET 10 than +.NET 9"), there are no pre-defined commit SHAs. Determine the bisect range using +the cross-release approach below. + +### Cross-release regressions + +When a regression spans two .NET releases (e.g., .NET 9 → .NET 10), bisect on +the `main` branch between the commits from which the release branches were +snapped. Release branches in dotnet/runtime are +[snapped from main](../../../../docs/project/branching-guide.md). + +Find the snap points with `git merge-base`: + +``` +git merge-base main release/9.0 # → good commit (last common ancestor) +git merge-base main release/10.0 # → bad commit +``` + +Use the resulting SHAs as the good/bad boundaries for bisection on `main`. This +avoids bisecting across release branches where cherry-picks and backports make +the history non-linear. + +## Phase 1: Validate the Regression + +Before bisecting, confirm the regression is reproducible. 
Create a standalone +BenchmarkDotNet project (see +[local benchmarking guide](local-benchmarking.md#creating-a-standalone-benchmark-project)), +build the runtime at the good and bad commits, and compare results. + +If the regression is not reproducible locally, check for environment differences +(OS, arch, CPU model) and note this in your report. Consider using +[@EgorBot](egorbot-reference.md) to validate on dedicated hardware instead. + +## Phase 2: Narrow the Commit Range + +If the bisect range spans many commits, narrow it before running a full bisect: + +1. **Check `git log --oneline {good}..{bad}`** — how many commits are in the + range? If more than ~200, narrow first. +2. **Test midpoint commits manually** — pick a commit in the middle of the range, + build, run the benchmark, and determine if it is good or bad. This halves the + range in one step. +3. **For cross-release regressions** — use the `git merge-base` snap points. If + the range between two release snap points is still large, test at intermediate + release preview tags to narrow further. + +## Phase 3: Git Bisect + +Once you have a manageable commit range, use `git bisect` to binary-search for +the culprit. + +### Bisect workflow + +At each step: + +1. **Rebuild the affected component** — use incremental builds where possible + (see [incremental rebuilds](local-benchmarking.md#incremental-rebuilds)). +2. **Run the standalone benchmark** with the freshly-built CoreRun from the + testhost folder (see + [local benchmarking guide](local-benchmarking.md#building-dotnet-runtime-and-obtaining-corerun) + for the exact path): + ``` + cd PerfRepro + dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun {runtime}/artifacts/bin/testhost/net{ver}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{ver}/CoreRun + ``` +3. **Determine good or bad** — compare the result against your threshold. 
+ +**Exit codes for `git bisect run`:** +- `0` — good (no regression at this commit) +- `1`–`124` — bad (regression present) +- `125` — skip (build failure or untestable commit) + +The standalone benchmark project must be **outside the dotnet/runtime tree**, +since `git bisect` checks out a different commit at each step, which would +overwrite in-tree files. Place it in a stable location (e.g., `/tmp/bisect/`). + +### Run the bisect + +``` +cd /path/to/runtime +git bisect start {bad-sha} {good-sha} +git bisect run /path/to/bisect-script.sh +``` + +**Time estimate:** Each bisect step requires a rebuild + benchmark run. +For ~1000 commits (log₂(1000) ≈ 10 steps) with a 5-minute rebuild, expect +roughly 50 minutes of rebuild time for the full bisect, plus a benchmark run +at each step. + +### After bisect completes + +`git bisect` outputs the first bad commit. Run `git bisect reset` to return to +the original branch. + +## Root Cause Analysis + +Include the following in your report: + +1. **The culprit commit or PR** — link to the specific commit SHA and its +   associated PR. Explain how the change relates to the regressing benchmark. +2. **Root cause analysis** — describe *why* the change caused the regression +   (e.g., an algorithm change, a removed optimization, additional validation +   overhead). +3. **If the root cause spans multiple PRs** — sometimes a regression results +   from the combined effect of several changes, and `git bisect` lands on a +   commit that is only one contributing factor. In this case, report the +   narrowest commit range and list the PRs within that range that appear +   relevant to the affected code path. 
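
When listing candidate PRs for a narrowed range, note that squash-merged commits on dotnet/runtime's `main` branch carry the PR number in the subject line (e.g., `(#12345)`), so the list can be extracted directly from the log. A sketch, using the guide's `{good}`/`{bad}` placeholders and an assumed example path filter:

```shell
# List PR numbers referenced by commits in the narrowed range, restricted
# to the affected component. The path filter is an illustrative example.
git log --oneline {good}..{bad} -- src/libraries/System.Text.Json/ \
  | grep -oE '#[0-9]+'
```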
diff --git a/.github/skills/performance-investigation/references/local-benchmarking.md b/.github/skills/performance-investigation/references/local-benchmarking.md new file mode 100644 index 00000000000000..d4b2ff38329aea --- /dev/null +++ b/.github/skills/performance-investigation/references/local-benchmarking.md @@ -0,0 +1,148 @@ +# Local Benchmarking with Private Runtime Builds + +This guide covers how to benchmark dotnet/runtime changes locally using +BenchmarkDotNet and privately-built runtime binaries (CoreRun). This approach +lets you measure performance without installing a custom SDK — BenchmarkDotNet +loads the locally-built runtime directly. + +> **Note:** Build commands use the `build.cmd/sh` shorthand — run `build.cmd` +> on Windows or `./build.sh` on Linux/macOS. Other shell commands use +> Linux/macOS syntax. On Windows, adapt accordingly (use `Copy-Item` or `xcopy`, +> backslash paths, backtick line continuation). + +## Building dotnet/runtime and Obtaining CoreRun + +Build the runtime at the commit you want to test: + +``` +build.cmd/sh clr+libs -c release +``` + +The key artifact is the **testhost** folder containing **CoreRun** at: + +``` +artifacts/bin/testhost/net{version}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{version}/ +``` + +> **Note:** This is different from the bare `corerun` binary under +> `artifacts/bin/coreclr/`. BenchmarkDotNet needs the testhost layout because +> it contains both CoreRun and the complete framework assemblies side-by-side. + +CoreRun is a lightweight host that loads the locally-built runtime and +libraries. BenchmarkDotNet uses it via the `--coreRun` argument to benchmark +private builds without installing them as SDKs. + +## Creating a Standalone Benchmark Project + +For regression validation and bisection, use a standalone BenchmarkDotNet +project rather than the full [dotnet/performance](https://github.com/dotnet/performance) +repo. 
Standalone projects are faster to build, easier to iterate on, and more +reliable across different runtime commits. + +### From an automated bot issue + +Copy the relevant benchmark class from the `dotnet/performance` repo: + +1. Clone `dotnet/performance` and locate the benchmark class referenced in the + issue's `--filter` argument. +2. Create a new console project: + ``` + mkdir PerfRepro && cd PerfRepro + dotnet new console + dotnet add package BenchmarkDotNet + ``` +3. Copy the benchmark class (and any helper types) into the project. Adjust + namespaces and usings as needed. +4. Add a `Program.cs` entry point: + ```csharp + BenchmarkDotNet.Running.BenchmarkSwitcher + .FromAssembly(typeof(Program).Assembly) + .Run(args); + ``` + +### From a customer report + +Write a minimal BenchmarkDotNet benchmark that exercises the reported code path: + +1. Create a new console project with `BenchmarkDotNet` as above. +2. Write a `[Benchmark]` method that calls the API or runs the workload the + customer identified as slow. +3. If the customer provided sample code, adapt it into a proper BDN benchmark + with `[GlobalSetup]` for initialization and `[Benchmark]` for the hot path. + +## Comparing Good and Bad Commits + +Build dotnet/runtime at both the good and bad commits, saving each testhost +folder: + +``` +git checkout {bad-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-bad + +git checkout {good-sha} +build.cmd/sh clr+libs -c release +cp -r artifacts/bin/testhost/net{ver}-{os}-Release-{arch} /tmp/corerun-good +``` + +Run the standalone benchmark with both CoreRuns. 
BenchmarkDotNet compares them +side-by-side when given multiple `--coreRun` paths (the first is treated as the +baseline): + +``` +cd PerfRepro +dotnet run -c Release -f net{ver} -- \ + --filter '*' \ + --coreRun /tmp/corerun-good/.../CoreRun \ + /tmp/corerun-bad/.../CoreRun +``` + +To add a statistical significance column, append `--statisticalTest 5%`. This +performs a Mann–Whitney U test and marks results as `Faster`, `Slower`, or +`Same`. + +## Interpreting Results + +| Outcome | Meaning | Next step | +|---------|---------|-----------| +| `Slower` with ratio >1.10 | Regression confirmed | Proceed to bisection | +| `Slower` with ratio 1.05–1.10 | Small regression — likely real but needs confirmation | Re-run with `--iterationCount 30`. If it persists, treat as confirmed. | +| `Same` or within noise | Not reproduced locally | Check environment differences (OS, arch, CPU). Note in the report. | +| `Slower` but ratio <1.05 | Marginal — may be noise | Re-run with `--iterationCount 30`. If still marginal, note as inconclusive. | + +## Using ResultsComparer + +For a thorough comparison of saved BDN result files, use the +[ResultsComparer](https://github.com/dotnet/performance/tree/main/src/tools/ResultsComparer) +tool: + +``` +dotnet run --project performance/src/tools/ResultsComparer \ + --base /path/to/baseline-results \ + --diff /path/to/compare-results \ + --threshold 5% +``` + +## Incremental Rebuilds + +Full rebuilds are slow. 
Minimize per-step build time by rebuilding only the +affected component: + +| Component changed | Fast rebuild command | +|-------------------|---------------------| +| A single library (e.g., System.Text.Json) | `cd src/libraries/System.Text.Json/src && dotnet build -c Release --no-restore` | +| CoreLib | `build.cmd/sh clr.corelib -c Release` followed by `build.cmd/sh libs.pretest -c Release` | +| CoreCLR (JIT, GC, runtime) | `build.cmd/sh clr -c Release` | +| All libraries | `build.cmd/sh libs -c Release` | + +After an incremental library rebuild (other than System.Private.CoreLib), the +updated DLL is placed in the testhost folder automatically. CoreRun picks up +the new version on the next benchmark run. + +For System.Private.CoreLib, you must run `build.cmd/sh libs.pretest -c Release` +after rebuilding to copy the updated CoreLib into the testhost layout; +otherwise benchmarks may silently run against the older CoreLib. + +**Caveat:** If a rebuild crosses a commit that changes the build infrastructure +(e.g., SDK version bump in `global.json`), the incremental build may fail. In a +`git bisect` context, use exit code `125` (skip) to handle this gracefully.
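
As a quick sanity check that an incremental rebuild actually reached the testhost layout (so CoreRun does not silently load stale bits), file modification times can be compared. The library name and the placeholder path segments below are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical staleness check — substitute your library and the
# {ver}/{os}/{arch} placeholders for your build.
lib=System.Text.Json.dll
built="artifacts/bin/System.Text.Json/Release/net{ver}/$lib"
testhost="artifacts/bin/testhost/net{ver}-{os}-Release-{arch}/shared/Microsoft.NETCore.App/{ver}/$lib"

# -nt is true when the left-hand file is newer than the right-hand one.
if [ "$built" -nt "$testhost" ]; then
  echo "stale: testhost copy of $lib is older than the rebuilt binary" >&2
  exit 1
fi
```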