Malkovsky · Malkovsky · Apr 18, 2026 · Apr 18, 2026
diff --git a/.kilo/command/benchmarks-affected.md b/.kilo/command/benchmarks-affected.md
@@ -0,0 +1,34 @@
+---
+description: Scan current branch and report impacted benchmark targets/functions.
+---
+
+# Benchmarks Affected
+
+Identify which benchmark binaries and benchmark functions are affected by changes on the current branch.
+
+Use the `benchmarks-affected` skill as the single source of truth for workflow details and guardrails.
+Do not duplicate or override the skill instructions in this command.
+
+## Inputs
+
+- Optional `--baseline <ref>` (default: `main`)
+- Optional `--compile-commands <path>`
+- Optional `--no-include-working-tree`
+- Optional `--format <text|json>` (default: `text`)
+
+## Workflow
+
+1. Execute the `benchmarks-affected` skill workflow.
+2. Pass through command inputs to the analyzer invocation defined by the skill.
+3. Report results with these sections:
+   - Changed files
+   - Affected benchmark targets
+   - Affected benchmark functions
+   - Suggested `--benchmark_filter` regex
+   - Any warnings/failures
+
+## Output rules
+
+1. If `affected_benchmarks` is non-empty, prioritize those names.
+2. If `affected_benchmarks` is empty but benchmark targets are affected, mark result as partial and include target-level impact.
+3. Do not run full benchmark suites in this command; this command is for impact discovery only.
diff --git a/.kilo/command/perf-review.md b/.kilo/command/perf-review.md
@@ -19,119 +19,43 @@ Non-negotiable requirements:
 If arguments are omitted:
 - Default target branch to PR base branch from `gh pr view --json baseRefName` when available.
 - Fall back target branch to `main`.
-- Default filter must be **targeted**, not full-suite:
-  - Derive from changed files and changed symbols.
-  - If `include/pixie/bitvector.h` changed in select path, default to `BM_Select` and add `BM_RankNonInterleaved` as control.
-  - Run full selected suites only as last resort when mapping fails.
-
-## Step 1 - Resolve Branches and Revisions
-
-1. Identify contender branch and hash:
-   - Contender branch: current checked-out branch (or `HEAD` if detached).
-   - Contender hash: `git rev-parse --short HEAD`.
-2. Identify baseline branch:
-   - Use `--target` if provided.
-   - Else use PR base branch from GitHub CLI when available.
-   - Else use `main`.
-3. Resolve baseline hash with `git rev-parse --short <baseline-ref>`.
-4. Print branch and hash mapping before running benchmarks.
-
-## Step 2 - Select Relevant Benchmark Binaries
-
-Inspect changed files with:
-
-`git diff --name-only <baseline-ref>...HEAD`
-
-Map file paths to benchmark binaries:
-
-| Changed path pattern | Benchmark binary | Coverage |
-|---|---|---|
-| `include/pixie/bitvector*`, `include/*bit_vector*`, `include/interleaved*` | `benchmarks` | BitVector rank/select |
-| `include/rmm*` | `bench_rmm` | RmM tree operations |
-| `include/louds*` | `louds_tree_benchmarks` | LOUDS traversal |
-| `include/simd*`, `include/aligned*` | `alignment_comparison` | SIMD and alignment |
-| `include/misc/*` | all relevant | Differential helpers |
-| `CMakeLists.txt`, benchmark infra, broad/unknown changes | all benchmarks | Conservative full run |
-
-Available benchmark binaries:
-- `benchmarks`
-- `bench_rmm`
-- `bench_rmm_sdsl`
-- `louds_tree_benchmarks`
-- `alignment_comparison`
-
-If the mapping is ambiguous, run all benchmark binaries but still apply a focused filter first.
-If `--filter` is provided, pass it through as `--benchmark_filter`.
-Print selected binaries and why they were selected.
+
+Filter handling:
+- If `--filter` is provided, pass it through.
+- Else use the filter produced by `benchmarks-affected` through `benchmarks-compare-revisions`.
+- If no filter can be derived, run conservative full-binary compare for impacted binaries.
+
+## Step 1 - Resolve branches and hashes
+
+1. Resolve contender from current checkout (`HEAD`) and compute short hash.
+2. Resolve baseline branch using precedence: `--target` -> PR base from `gh pr view --json baseRefName` -> `main`.
+3. Resolve baseline short hash.
+4. Print branch/hash mapping before benchmark execution.
+
+## Step 2 - Run timing comparison via skill (single source of truth)
+
+Use `benchmarks-compare-revisions` as the single source of truth for revision builds, benchmark scope, compare.py flow, retry policy, and guardrails.
+
+Do not duplicate or override its internal build/run steps in this command.
+
+Pass-through inputs:
+- Baseline ref/hash from Step 1.
+- Contender ref/hash from Step 1.
+- Optional `--filter` override.
+
+Consume outputs from `benchmarks-compare-revisions`:
+- Baseline and contender benchmark JSON artifacts.
+- compare.py output per binary.
+- Effective filter used.
+- Scope metadata from `benchmarks-affected` (`affected_benchmark_targets`, `affected_benchmarks`) when available.
 
 Execution guardrails:
-- Do not use background jobs (`nohup`, `&`) for benchmark runs in CI.
-- Do not interleave multiple benchmark runs into one shell command stream.
-- Run one benchmark command at a time and wait for completion.
-
-## Step 3 - Build Both Revisions (Timing and Profiling Builds)
-
-Use isolated build directories per short hash.
-
-1. Capture original ref (`git rev-parse --abbrev-ref HEAD` or detached `HEAD`).
-2. If worktree is dirty, stash safely with untracked files:
-   - `git stash push -u -m "perf-review-auto-stash"`
-3. Build baseline revision:
-   - `git checkout <baseline-hash-or-ref>`
-   - Timing build (required):
-     - `cmake -B build/benchmarks-all_bench_<baseline_hash> -DCMAKE_BUILD_TYPE=Release -DPIXIE_BENCHMARKS=ON`
-     - `cmake --build build/benchmarks-all_bench_<baseline_hash> --config Release -j`
-   - Profiling build (Linux only, recommended):
-     - `cmake -B build/benchmarks-diagnostic_bench_<baseline_hash> -DCMAKE_BUILD_TYPE=RelWithDebInfo -DPIXIE_BENCHMARKS=ON -DBENCHMARK_ENABLE_LIBPFM=ON -DPIXIE_DIAGNOSTICS=ON`
-     - `cmake --build build/benchmarks-diagnostic_bench_<baseline_hash> --config RelWithDebInfo -j`
-4. Build contender revision:
-   - `git checkout <contender-hash-or-original-ref>`
-   - Repeat timing and profiling build with contender hash suffix.
-5. Restore original ref and restore stashed state if a stash was created.
-
-Critical guardrails:
-- Never use Debug binaries for timing review.
-- Timing comparisons must use `benchmarks-all` Release builds.
-- Profiling counters should use `benchmarks-diagnostic` RelWithDebInfo builds.
-
-## Step 4 - Resolve Binary Paths
-
-Support both generator layouts:
-
-- Multi-config: `build/<dir>/Release/<binary>` or `build/<dir>/RelWithDebInfo/<binary>`
-- Single-config: `build/<dir>/<binary>`
-
-For each needed binary, detect the existing executable path before running.
-If a required binary is missing, report failure and stop with a blocked verdict.
-
-## Step 5 - Run Timing Comparison (Primary Judgment)
-
-Use a deterministic JSON-first workflow. Do not rely on long-running `compare.py` binary-vs-binary mode.
-
-1. Verify Python benchmark tooling once before runs:
-   - `python3 -c "import numpy, scipy"`
-2. For each selected benchmark binary, run baseline then contender sequentially, each with explicit JSON out:
-   - `--benchmark_filter="<filter>"`
-   - `--benchmark_format=json`
-   - `--benchmark_out=<file>.json`
-   - `--benchmark_report_aggregates_only=true`
-   - `--benchmark_display_aggregates_only=true`
-3. Suppress benchmark stdout/stderr noise when generating JSON artifacts so files stay valid:
-   - `> <file>.log 2>&1`
-4. Validate both JSON files before comparison:
-   - `python3 -m json.tool <file>.json > /dev/null`
-5. Compare using one of:
-   - `python3 <compare.py> -a benchmarks <baseline.json> <contender.json>`
-   - or a deterministic local Python diff script over aggregate means.
-6. Keep raw JSON files and comparison output for auditability.
-
-Timeout and retry policy:
-- Use command timeouts that match benchmark scope.
-- If a run times out once, narrow filter immediately and retry once.
-- Maximum retry count per benchmark group: 1.
-- If still timing out, produce a blocked/partial verdict with explicit scope limitations.
-
-## Step 6 - Collect Hardware Counter Profiles (Linux Only)
+- Run benchmarks sequentially.
+- No background jobs (`nohup`, `&`).
+- Use Release timing builds only.
+- If timing comparison fails, return blocked verdict with exact failure points.
+
+## Step 3 - Collect hardware counter profiles (Linux only, optional)
 
 Run a preflight first to avoid wasted attempts:
 1. Execute one tiny benchmark with perf counters (e.g. one benchmark case) and inspect output for counter availability.
@@ -157,7 +81,7 @@ Compute derived metrics when denominators are non-zero:
 
 If profiling is unavailable (non-Linux, libpfm missing, or perf permissions blocked), continue with timing-only review and explicitly mark profiling as unavailable in the report.
 
-## Step 7 - Analyze Timing and Counter Data
+## Step 4 - Analyze timing and counter data
 
 Timing classification per benchmark entry:
 - Improvement: time delta < -5%
@@ -181,7 +105,7 @@ Noise-control expectations:
 - Include at least one control benchmark family expected to be unaffected by the code change.
 - Treat isolated swings without pattern as noise unless reproduced across related sizes/fill ratios.
 
-## Step 8 - Produce Final Markdown Report
+## Step 5 - Produce final markdown report
 
 Return a structured markdown report with this shape:
 

diff --git a/.kilo/skills/benchmarks-affected/SKILL.md b/.kilo/skills/benchmarks-affected/SKILL.md
@@ -0,0 +1,77 @@
+---
+name: benchmarks-affected
+description: Analyze current branch versus a baseline and extract affected benchmark targets and benchmark functions using compile_commands and clang AST.
+---
+
+# Benchmarks Affected Skill
+
+Use this skill to identify exactly which benchmark binaries and benchmark functions are affected by code changes on the current branch.
+
+It implements a two-stage workflow:
+
+1. `compile_commands.json` analysis to determine affected compile targets.
+2. Clang AST analysis to determine affected benchmark functions.
+
+## Goal
+
+Given `HEAD` and a baseline branch (default `main`), produce:
+
+- Changed files.
+- Affected targets (with emphasis on benchmark targets).
+- Exact benchmark functions impacted by the changes.
+- A ready-to-use Google Benchmark filter regex.
+
+## Prerequisites
+
+1. Build tree with benchmarks enabled and compile database exported:
+
+```bash
+BUILD_SUFFIX=local
+cmake -B build/benchmarks-all_${BUILD_SUFFIX} \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DPIXIE_BENCHMARKS=ON \
+  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
+cmake --build build/benchmarks-all_${BUILD_SUFFIX} --config Release -j
+```
+
+2. `clang++` must be available on `PATH` (used for AST dump).
+
+## Run
+
+```bash
+python3 .kilo/skills/benchmarks-affected/analyze_benchmarks_affected.py \
+  --baseline main \
+  --compile-commands build/benchmarks-all_local/compile_commands.json \
+  --format json
+```
+
+If `--compile-commands` is omitted, the script auto-selects the most recently modified `build/**/compile_commands.json`.
+Working tree changes are included by default. Use `--no-include-working-tree` to restrict analysis to `<baseline>...HEAD` only.
+
+## Output
+
+The analyzer reports:
+
+- `affected_targets`: impacted CMake targets inferred from compile dependency analysis.
+- `affected_benchmark_targets`: subset of benchmark binaries impacted.
+- `affected_benchmarks`: precise benchmark function names from AST-level call analysis.
+- `suggested_filter_regex`: regex to pass as `--benchmark_filter`.
+
+## How to Use Findings
+
+1. Build only impacted benchmark binaries where feasible.
+2. Run benchmark binaries with the suggested filter:
+
+```bash
+FILTER='^(BM_RankNonInterleaved|BM_SelectNonInterleaved)(/|$)'
+build/benchmarks-all_local/benchmarks --benchmark_filter="${FILTER}"
+```
+
+3. If impact mapping is broad/uncertain, run full binary for selected benchmark target(s).
+
+## Guardrails
+
+1. Keep baseline comparison at merge-base style diff: `<baseline>...HEAD`.
+2. Use Release binaries for timing runs.
+3. If AST parse fails for a TU, still trust compile target impact and mark benchmark-function scope as partial.
+4. If benchmark infra (`CMakeLists.txt`, benchmark source layout) changed, fall back to conservative benchmark selection.