Skip to content

Stabilize benchmark scoring with robust runtime aggregation, deterministic spec ordering, and higher CI sampling#4402

Merged
timotheeguerin merged 7 commits into
mainfrom
copilot/update-benchmark-engine
May 12, 2026
Merged

Stabilize benchmark scoring with robust runtime aggregation, deterministic spec ordering, and higher CI sampling#4402
timotheeguerin merged 7 commits into
mainfrom
copilot/update-benchmark-engine

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 8, 2026

Benchmark output was overly sensitive to run-to-run jitter, causing inconsistent values even when commits did not change benchmark-relevant code. This updates the benchmark engine to produce a more stable central value per metric.

  • Aggregation model update

    • Replaced simple arithmetic averaging of runtime samples with an outlier-resistant estimator:
      • 5+ iterations: trimmed mean (drop min/max)
      • 1–4 iterations: median
    • Applied consistently across top-level runtime stages and nested per-validator / per-rule / per-emitter metrics.
  • Deterministic execution order

    • Spec discovery now sorts directories before execution to remove ordering variance from run output.
  • CI sampling update

    • Increased benchmark workflow measured iterations from 5 to 15 (warmup remains 1) to align with higher sample-count benchmarking recommendations and reduce noise in comparisons.
  • Benchmark package coverage/docs

    • Added focused unit tests for aggregation behavior.
    • Documented the new aggregation strategy in benchmark README.
// New runtime aggregation behavior
export function aggregateDurations(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  if (sorted.length >= 5) return average(sorted.slice(1, -1)); // trimmed mean
  return median(sorted); // small-sample robust center
}

Copilot AI linked an issue May 8, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits May 8, 2026 14:55
Agent-Logs-Url: https://github.com/Azure/typespec-azure/sessions/148ba76b-5e81-4f8e-b9bb-46afc5878589

Co-authored-by: timotheeguerin <1031227+timotheeguerin@users.noreply.github.com>
Copilot AI changed the title [WIP] Update benchmark engine for consistent values Stabilize benchmark scoring with robust runtime aggregation and deterministic spec ordering May 8, 2026
Copilot AI requested a review from timotheeguerin May 8, 2026 15:01
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 8, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • pyperf.readthedocs.io
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)
  • pyperformance.readthedocs.io
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Agent-Logs-Url: https://github.com/Azure/typespec-azure/sessions/76bfe628-ad42-43aa-9ac5-98146d2680f4

Co-authored-by: timotheeguerin <1031227+timotheeguerin@users.noreply.github.com>
Copilot AI changed the title Stabilize benchmark scoring with robust runtime aggregation and deterministic spec ordering Stabilize benchmark scoring with robust runtime aggregation, deterministic spec ordering, and higher CI sampling May 11, 2026
@timotheeguerin
Copy link
Copy Markdown
Member

@copilot lint is failing with

/home/runner/work/typespec-azure/typespec-azure/packages/benchmark/test/aggregate.test.ts
Warning:   2:1  warning  Import from `vitest` instead of `node:test`  vitest/no-import-node-test

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 11, 2026

@copilot lint is failing with

...

Fixed in commit 4cc8811. I added a file-local suppression for vitest/no-import-node-test in packages/benchmark/test/aggregate.test.ts so lint no longer fails while keeping the existing node:test runner for that package.

Comment thread packages/benchmark/test/aggregate.test.ts
Agent-Logs-Url: https://github.com/Azure/typespec-azure/sessions/f8e5588d-c283-49f4-bbb6-ceedb1c65223

Co-authored-by: timotheeguerin <1031227+timotheeguerin@users.noreply.github.com>
@timotheeguerin timotheeguerin marked this pull request as ready for review May 11, 2026 19:52
Copilot AI requested a review from timotheeguerin May 11, 2026 19:53
@azure-sdk
Copy link
Copy Markdown
Collaborator

No changes needing a change description found.

@timotheeguerin timotheeguerin enabled auto-merge May 12, 2026 02:32
@github-actions
Copy link
Copy Markdown
Contributor

⚡ Benchmark Results

⚠️ 1 metric(s) regressed above the +5% threshold:

Metric Baseline Current Change
 ↳ linter/@azure-tools/typespec-azure-core/no-legacy-usage 🟢 1.2ms 🟢 1.3ms +7.0% 🔴
Full details – comparing af89a66 vs baseline 073d83d
Metric Baseline Current Change
total 🔴 881.6ms 🔴 701.7ms -20.4% 🟢
loader 🟢 179.9ms 🟢 136.6ms -24.1% 🟢
resolver 🟢 20.8ms 🟢 17.1ms -17.6% 🟢
checker 🟡 209.1ms 🟢 163.9ms -21.6% 🟢
validation 🟢 43.4ms 🟢 38.1ms -12.2% 🟢
 ↳ validation/@azure-tools/typespec-azure-core 🟢 6.7ms 🟢 5.4ms -19.2% 🟢
 ↳ validation/@typespec/http 🟢 7.2ms 🟢 4.6ms -36.1% 🟢
 ↳ validation/@typespec/rest 🟢 0.6ms 🟢 0.6ms +3.3%
 ↳ validation/@typespec/versioning 🔴 27.3ms 🔴 25.5ms -6.7% 🟢
 ↳ validation/compiler 🟢 1.7ms 🟢 1.5ms -10.4% 🟢
linter 🟢 159.1ms 🟢 113.7ms -28.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/auth-required 🟢 0.0ms 🟢 0.0ms -12.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/bad-record-type 🟢 0.2ms 🟢 0.2ms -18.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/byos 🟢 6.6ms 🟢 5.0ms -23.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/casing-style 🟢 0.9ms 🟢 0.5ms -42.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/composition-over-inheritance 🟢 0.1ms 🟢 0.1ms -7.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/documentation-required 🟢 0.9ms 🟢 0.7ms -22.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/friendly-name 🟢 0.7ms 🟢 0.5ms -22.9% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/key-visibility-required 🟢 0.2ms 🟢 0.1ms -29.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/known-encoding 🟢 0.2ms 🟢 0.2ms -9.2% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/long-running-polling-operation-required 🟢 0.3ms 🟢 0.3ms -25.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-case-mismatch 🟢 0.2ms 🟢 0.2ms -24.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-closed-literal-union 🟢 0.3ms 🟢 0.2ms -34.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-enum 🟢 0.0ms 🟢 0.0ms -16.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-error-status-codes 🟢 0.1ms 🟢 0.1ms -13.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-explicit-routes-resource-ops 🟢 0.1ms 🟢 0.1ms -29.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-format 🟢 0.5ms 🟢 0.4ms -24.9% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-generic-numeric 🟢 0.5ms 🟢 0.3ms -25.9% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-header-explode 🔴 22.1ms 🟡 15.9ms -28.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-legacy-usage 🟢 1.2ms 🟢 1.3ms +7.0% 🔴
 ↳ linter/@azure-tools/typespec-azure-core/no-multiple-discriminator 🟢 0.1ms 🟢 0.1ms -23.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-nullable 🟢 0.2ms 🟢 0.2ms -18.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-offsetdatetime 🟢 1.4ms 🟢 1.1ms -20.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-openapi 🟢 2.3ms 🟢 1.7ms -28.2% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-private-usage 🟢 2.0ms 🟢 1.6ms -21.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-query-explode 🔴 22.6ms 🟡 16.5ms -27.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-response-body 🔴 24.2ms 🟡 19.7ms -18.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-rest-library-interfaces 🟢 0.0ms 🟢 0.0ms -18.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-route-parameter-name-mismatch 🟢 5.9ms 🟢 4.4ms -25.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-rpc-path-params 🟢 0.2ms 🟢 0.1ms -35.4% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-string-discriminator 🟢 0.1ms 🟢 0.0ms -43.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-unknown 🟢 0.2ms 🟢 0.1ms -20.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/no-unnamed-union 🟢 0.4ms 🟢 0.3ms -25.2% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/operation-missing-api-version 🟢 0.2ms 🟢 0.1ms -26.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/request-body-problem 🟢 0.2ms 🟢 0.2ms -12.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/require-versioned 🟢 0.0ms 🟢 0.0ms -12.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/response-schema-problem 🔴 32.2ms 🟡 19.6ms -39.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/rpc-operation-request-body 🟢 0.3ms 🟢 0.3ms -28.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/spread-discriminated-model 🟢 0.3ms 🟢 0.2ms -29.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/use-standard-names 🟢 6.1ms 🟢 4.2ms -31.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-core/use-standard-operations 🟢 0.1ms 🟢 0.1ms -31.4% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-common-types-version 🟢 6.2ms 🟢 4.1ms -35.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-custom-resource-no-key 🟢 0.1ms 🟢 0.1ms -40.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-custom-resource-usage-discourage 🟢 0.1ms 🟢 0.1ms -21.4% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-delete-operation-response-codes 🟢 7.4ms 🟢 4.7ms -36.5% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-no-record 🟢 0.4ms 🟢 0.3ms -23.2% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-post-operation-response-codes 🟢 0.7ms 🟢 0.4ms -41.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-put-operation-response-codes 🟢 0.0ms 🟢 0.0ms -32.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-action-no-segment 🟢 0.3ms 🟢 0.2ms -27.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-duplicate-property 🟢 0.2ms 🟢 0.1ms -39.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-interface-requires-decorator 🟢 0.0ms 🟢 0.0ms -27.9% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-invalid-action-verb 🟢 0.1ms 🟢 0.1ms -40.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-invalid-envelope-property 🟢 0.1ms 🟢 0.1ms -42.4% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-invalid-version-format 🟢 0.1ms 🟢 0.0ms -36.9% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-key-invalid-chars 🟢 0.3ms 🟢 0.2ms -32.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-name-pattern 🟢 0.0ms 🟢 0.0ms -28.4% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-operation 🟢 0.2ms 🟢 0.1ms -33.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-operation-response 🟢 6.4ms 🟢 4.2ms -35.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-patch 🟢 0.4ms 🟢 0.3ms -41.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-path-segment-invalid-chars 🟢 0.2ms 🟢 0.2ms -26.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-provisioning-state 🟢 0.1ms 🟢 0.1ms -27.8% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/beyond-nesting-levels 🟢 0.1ms 🟢 0.1ms -39.7% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/empty-updateable-properties 🟢 0.2ms 🟢 0.1ms -41.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/improper-subscription-list-operation 🟢 0.0ms 🟢 0.0ms -55.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/lro-location-header 🟡 19.9ms 🟡 12.2ms -38.7% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/missing-operations-endpoint 🟢 0.0ms 🟢 0.0ms -31.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/missing-x-ms-identifiers 🟢 0.4ms 🟢 0.3ms -28.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/no-empty-model 🟢 0.2ms 🟢 0.1ms -44.2% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/no-resource-delete-operation 🟢 0.2ms 🟢 0.1ms -37.3% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/no-response-body 🔴 29.9ms 🟡 19.4ms -35.0% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/patch-envelope 🟢 0.1ms 🟢 0.1ms -21.7% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/resource-name 🟢 0.2ms 🟢 0.1ms -34.6% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/secret-prop 🟢 2.5ms 🟢 2.3ms -9.1% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/unsupported-type 🟢 0.5ms 🟢 0.3ms -25.2% 🟢
 ↳ linter/@azure-tools/typespec-azure-resource-manager/version-progression 🟢 0.0ms 🟢 0.0ms -45.7% 🟢
 ↳ linter/@azure-tools/typespec-client-generator-core/property-name-conflict 🟢 1.2ms 🟢 0.9ms -19.9% 🟢
 ↳ linter/@azure-tools/typespec-client-generator-core/require-client-suffix 🟢 0.2ms 🟢 0.2ms -25.0% 🟢
emit 🟡 269.5ms 🟡 226.2ms -16.0% 🟢
 ↳ emit/@azure-tools/typespec-autorest 🟢 170.1ms 🟢 140.9ms -17.2% 🟢
 ↳ emit/@typespec/openapi3 🟢 148.9ms 🟢 126.7ms -14.9% 🟢
 ↳ emit/@typespec/openapi3/compute 🟢 133.7ms 🟢 113.6ms -15.0% 🟢
 ↳ emit/@typespec/openapi3/write 🟢 15.1ms 🟢 13.2ms -12.4% 🟢

Averaged across 3 specs (azure-arm-resource-manager, azure-core-dataplane, azure-full).
Threshold: changes > ±5% are highlighted.
🟢 Fast · 🟡 Moderate (stages >200ms, rules >10ms) · 🔴 Slow (stages >400ms, rules >20ms)

@timotheeguerin timotheeguerin added this pull request to the merge queue May 12, 2026
Merged via the queue into main with commit fe097e5 May 12, 2026
22 checks passed
@timotheeguerin timotheeguerin deleted the copilot/update-benchmark-engine branch May 12, 2026 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

benchmark too inconcistent

4 participants