Update vLLM-SR RouterArena submission #131

Merged
yl231 merged 1 commit into RouteWorks:main from Xunzhuo:vllm/vllm-sr-v350-routerarena on Jun 4, 2026

@Xunzhuo Xunzhuo commented Jun 4, 2026

Summary

Update the vllm-sr RouterArena submission artifacts for the vLLM Semantic Router.

This submission updates:

  • router_inference/config/vllm-sr.json
  • router_inference/predictions/vllm-sr.json
  • router_inference/predictions/vllm-sr-robustness.json

Notes

  • The router is not trained, fit, or tuned on RouterArena data.
  • The routing policy is a general vLLM Semantic Router recipe using deterministic signals/projections; it does not encode RouterArena sample IDs, gold answers, or generated-result lookup tables.
  • Full prediction generated_result fields are populated for all 8,400 regular entries with success=true.
  • Robustness predictions include 420 entries; per README, no robustness generated_result fields are required.
  • The vLLM Semantic Router service used for generation was served on AMD
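
The completeness claims in the notes above (8,400 regular entries, each with a populated generated_result and success=true) could be checked with a small script along these lines. This is a minimal sketch, not the repository's own validator: it assumes the predictions file is a JSON list of objects and that `success` lives inside each `generated_result` object, neither of which is confirmed by this thread. The function name `check_predictions` is hypothetical.

```python
def check_predictions(entries, expected_count):
    """Return a list of problems found; an empty list means every check passed.

    Assumed layout (not confirmed by the PR): `entries` is a list of dicts,
    each holding a `generated_result` dict that carries a boolean `success`.
    """
    problems = []
    if len(entries) != expected_count:
        problems.append(f"expected {expected_count} entries, found {len(entries)}")
    for index, entry in enumerate(entries):
        result = entry.get("generated_result")
        if not isinstance(result, dict) or not result:
            problems.append(f"entry {index}: generated_result is missing or empty")
        elif result.get("success") is not True:
            problems.append(f"entry {index}: success is not true")
    return problems


# Tiny in-memory sample; a real run would load the predictions file with json.load.
sample = [
    {"generated_result": {"success": True}},
    {"generated_result": {"success": False}},
    {},
]
for problem in check_predictions(sample, expected_count=3):
    print(problem)
```

For the regular predictions file the call would use `expected_count=8400`; the robustness file is described as needing no generated_result fields, so only the entry count (420) would apply there.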

Copilot AI review requested due to automatic review settings June 4, 2026 06:46

Xunzhuo commented Jun 4, 2026

/evaluate

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the vLLM Semantic Router configuration to use a new RouterArena “recipe” with a refreshed model list and adds an explicit description, while removing endpoint and category-mapping fields.

Changes:

  • Replaces the previous model set with updated provider/model identifiers and changes the default model.
  • Removes router_endpoint, base_url, and category_model_mapping from the config.
  • Adds a descriptive description field clarifying the recipe and data-embedding constraints.

Comment thread router_inference/config/vllm-sr.json Outdated
Comment thread router_inference/config/vllm-sr.json Outdated
Comment thread router_inference/config/vllm-sr.json

github-actions Bot commented Jun 4, 2026

Router Evaluation Results

Router: vllm-sr
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7420 |
| Accuracy | 75.07% |
| Total Cost | $1.340293 |
| Avg Cost per Query | $0.000160 |
| Avg Cost per 1K Queries | $0.1596 |
| Number of Queries | 8400 |
| Abnormal Entries | 0 |
| Robustness Score | 0.7690 |
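
The two average-cost rows follow directly from Total Cost and Number of Queries. The sketch below reproduces them from the reported values; the helper name `average_costs` is illustrative, and the rounding (six and four decimal places) is inferred from how the figures are printed, not taken from the evaluation workflow's source.

```python
def average_costs(total_cost_usd, num_queries):
    """Return (average cost per query, average cost per 1,000 queries) in USD."""
    per_query = total_cost_usd / num_queries
    return per_query, per_query * 1000


# Total Cost and Number of Queries as reported in this evaluation comment.
avg_per_query, avg_per_1k = average_costs(1.340293, 8400)
print(f"Avg Cost per Query:      ${avg_per_query:.6f}")  # $0.000160
print(f"Avg Cost per 1K Queries: ${avg_per_1k:.4f}")     # $0.1596
```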

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.1809 |
| Opt.Cost (Cost Efficiency) | 0.2407 |
| Opt.Acc (Accuracy vs Optimal) | 0.8969 |

Evaluation completed by RouterArena automated workflow

Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
@Xunzhuo Xunzhuo force-pushed the vllm/vllm-sr-v350-routerarena branch from 16c1395 to 6f63fea Compare June 4, 2026 11:09

Xunzhuo commented Jun 4, 2026

/evaluate

github-actions Bot commented Jun 4, 2026

Router Evaluation Results

Router: vllm-sr
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7538 |
| Accuracy | 75.97% |
| Total Cost | $0.921463 |
| Avg Cost per Query | $0.000110 |
| Avg Cost per 1K Queries | $0.1097 |
| Number of Queries | 8400 |
| Abnormal Entries | 0 |
| Robustness Score | 0.7310 |

Optimality Metrics

| Metric | Value |
| --- | --- |
| Opt.Sel (Optimal Selection) | 0.2012 |
| Opt.Cost (Cost Efficiency) | 0.2452 |
| Opt.Acc (Accuracy vs Optimal) | 0.8987 |

Evaluation completed by RouterArena automated workflow
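
For readers comparing the two /evaluate runs in this thread (before and after the force-push), the sketch below computes the change in a few headline metrics. The numbers are copied from the two result comments; the dictionary keys and the `deltas` helper are illustrative names, not part of the RouterArena tooling.

```python
# Values copied from the two "Router Evaluation Results" comments in this thread.
first_run = {
    "routerarena_score": 0.7420,
    "accuracy_pct": 75.07,
    "total_cost_usd": 1.340293,
    "robustness_score": 0.7690,
}
second_run = {
    "routerarena_score": 0.7538,
    "accuracy_pct": 75.97,
    "total_cost_usd": 0.921463,
    "robustness_score": 0.7310,
}


def deltas(before, after):
    """Absolute change (after minus before) for every metric present in both runs."""
    return {name: after[name] - before[name] for name in before if name in after}


changes = deltas(first_run, second_run)
cost_change_pct = 100 * changes["total_cost_usd"] / first_run["total_cost_usd"]

for name, change in changes.items():
    print(f"{name}: {change:+.4f}")
print(f"total cost relative change: {cost_change_pct:+.1f}%")
```

On these figures the second run scores 0.0118 higher on RouterArena Score and costs roughly 31% less in total, while the Robustness Score drops by 0.0380.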

@yl231 yl231 merged commit b7bd454 into RouteWorks:main Jun 4, 2026
6 checks passed
yl231 added a commit that referenced this pull request Jun 4, 2026
vLLM Semantic Router resubmission (#131) re-evaluated at RouterArena
score 0.7538 (was 0.6723). Updated its row and re-sorted ranks 1-9:

  Arena    67.23 -> 75.38
  Accuracy 66.53 -> 75.97
  Cost/1K  $0.06 -> $0.11
  Opt.Sel  84.66 -> 20.12
  Opt.Cost 90.71 -> 24.52
  Opt.Acc  89.24 -> 89.87
  Robust   90.95 -> 73.10

At 75.38 vLLM-SR overtakes Sqwish (75.27) for #1; Sqwish, AgentForge,
Nadir, Weave, OrcaRouter-Adaptive, Azure, R2-Router and Auto each shift
down one rank. Ranks 10-20 unchanged. Metrics taken from the final
/evaluate run on the merged submission (verified byte-identical to main).

Co-authored-by: Louie Lu <yl231@datalab2.cs.rice.edu>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>