ENG-3780: use model id in body for routing by eexwhyzee · Pull Request #40 · PrimeIntellect-ai/router

eexwhyzee · 2026-05-28T18:01:22Z


Purpose

Route typed HTTP requests by the effective model ID, falling back to the request body's model when callers pass model_id = None.
Use the same effective model ID for per-model policy lookup, so LoRA/model-specific requests are selected from the correct worker pool (this avoids sending requests to inference servers that don't have the LoRA adapter loaded yet).
Add router-level regression coverage proving that a chat request with model: "rft-run-1" only reaches a worker indexed for that LoRA, not a base-model-only worker.
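
To make the regression scenario above concrete, here is a minimal sketch of model-filtered worker selection. The `Worker` struct, `eligible_workers`, `demo_workers`, and the worker URLs are hypothetical names used for illustration only; the router's real types and selection logic are not shown in this description.

```rust
use std::collections::HashSet;

/// Hypothetical worker record (assumption: the real router type differs).
struct Worker {
    url: &'static str,
    models: HashSet<&'static str>,
}

/// Return the URLs of workers eligible for a request.
/// With no filter every worker is eligible; with a filter only workers
/// indexed for that model are.
fn eligible_workers(workers: &[Worker], model_filter: Option<&str>) -> Vec<&'static str> {
    workers
        .iter()
        .filter(|w| model_filter.map_or(true, |m| w.models.contains(m)))
        .map(|w| w.url)
        .collect()
}

/// Two illustrative workers: one serving only the base model, one that has
/// also loaded the "rft-run-1" LoRA adapter.
fn demo_workers() -> Vec<Worker> {
    vec![
        Worker {
            url: "http://base-only:8000",
            models: HashSet::from(["base-model"]),
        },
        Worker {
            url: "http://with-lora:8000",
            models: HashSet::from(["base-model", "rft-run-1"]),
        },
    ]
}
```

Under these assumptions, `eligible_workers(&demo_workers(), Some("rft-run-1"))` yields only the LoRA-capable worker, while a `None` filter leaves both workers eligible.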

Test Plan

Able to reproduce the "model not found" issue on develop during scale-up events:

Location: cluster rft-e2e-spk, namespace rft-stack-develop-laguna-xs-2-e2e-spk.

  Error in env-server logs (wordle env_index=0):
  ERROR verifiers.envs.environment.TextArenaEnv
  Aborted rollout due to ModelError() -> NotFoundError("Error code: 404 -
    {'error': {'message': 'The model `rft-nib4mu3obkdaxgh9jrrq7vvu` does not exist.',
               'type': 'NotFoundError', 'param': 'model', 'code': 404}}")
  verifiers/envs/environment.py:657 in _render_stop

  The errors are continuous from ~06:54:26 through ~07:04:06 UTC on 2026-05-28 (hundreds of identical entries — every rollout aborted). The vLLM
  inference server on that namespace doesn't recognize the per-run served model name rft-nib4mu3obkdaxgh9jrrq7vvu.

Will test the changes on this branch against the same run setup on develop to confirm the fix.

Test Result

Ran the same run with the updated router tag (https://dev-cloud-run-abruy2xuia-ew.a.run.app/dashboard/training/zzf3t7zji3gcb2xc1ji0kmub) and had Claude monitor the logs for the entire run:

0 NotFoundError / "model does not exist" matches over the full run.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results

[!NOTE]
Medium Risk
Changes core inference request routing and worker/model indexing; misconfiguration could still misroute traffic, but behavior is guarded by registry checks and extensive new tests.

Overview
Typed HTTP routing now derives an effective model ID from the explicit route parameter or, when absent, from the JSON model field—using that value for worker selection and per-model load-balancing policy only when the worker registry already indexes that model (or when a run-scoped run_id is set). Unindexed body models no longer force a model filter, preserving generic upstream validation while LoRA / per-run adapter names route only to pods that have loaded them.
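
The filter resolution described in this overview could look roughly like the sketch below. `resolve_model_filter` and its parameters are hypothetical; it also assumes that an explicit route parameter is always honored and that only the body-derived model is gated on the registry, which the summary does not state unambiguously.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: decide which model, if any, should filter workers.
///
/// Assumptions (not confirmed by the PR text):
/// - an explicit model ID always becomes the filter;
/// - a body model becomes the filter only if it is already indexed
///   or the request carries a run_id.
fn resolve_model_filter<'a>(
    indexed: &HashSet<String>,
    explicit: Option<&'a str>,
    body_model: Option<&'a str>,
    run_id: Option<&str>,
) -> Option<&'a str> {
    if let Some(id) = explicit {
        return Some(id);
    }
    let candidate = body_model?;
    if run_id.is_some() || indexed.contains(candidate) {
        Some(candidate)
    } else {
        // Unindexed body model: route without a filter so the upstream
        // server performs its own model validation.
        None
    }
}
```

Returning `None` for an unindexed body model is what keeps generic upstream validation intact: the request still reaches some worker, which can answer with its own "model does not exist" error.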

Worker startup and add_worker paths call sync_worker_models with every model returned from /v1/models, not just the primary label, so secondary adapters are routable immediately after registration. /v1/rerank ignores the default model placeholder for this fallback.
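
As a rough illustration of indexing every advertised model rather than only the primary label, consider the sketch below. Only the name `sync_worker_models` comes from this PR; the `ModelIndex` type, the method signature, and the replace-on-sync semantics are assumptions made for the example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical index: model name -> worker URLs that advertise it.
#[derive(Default)]
struct ModelIndex {
    by_model: HashMap<String, BTreeSet<String>>,
}

impl ModelIndex {
    /// Replace the models recorded for `worker_url` with `models`
    /// (assumed semantics; the real function may merge instead).
    fn sync_worker_models(&mut self, worker_url: &str, models: &[&str]) {
        // Drop stale entries for this worker, then forget models
        // that no worker advertises any more.
        for workers in self.by_model.values_mut() {
            workers.remove(worker_url);
        }
        self.by_model.retain(|_, workers| !workers.is_empty());

        // Index every advertised model, not just the first one.
        for model in models {
            self.by_model
                .entry((*model).to_string())
                .or_default()
                .insert(worker_url.to_string());
        }
    }

    /// True if at least one worker advertises `model`.
    fn is_indexed(&self, model: &str) -> bool {
        self.by_model.contains_key(model)
    }
}
```

With an index like this, passing the full `/v1/models` list at registration time makes a secondary adapter such as `rft-run-1` visible to the filter check immediately, without waiting for a later refresh.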

Regression tests cover LoRA-only worker selection, filter resolution edge cases, end-to-end chat routing, and discovery-time indexing.

Reviewed by Cursor Bugbot for commit 0fc156a. Bugbot is set up for automated code reviews on this repo.

[!NOTE]

Route requests using model ID from request body when no explicit model ID is supplied

When no explicit model_id query parameter is present, the router now uses the body's model field as a routing filter, but only if that model is already indexed in the worker registry or the request is run-scoped.

Workers are immediately indexed with their discovered models on registration (both at Router::new time and during dynamic service-discovery), so model-filtered routing is available right away.

The DEFAULT_MODEL_NAME value in /v1/rerank request bodies is ignored as a routing filter to avoid misrouting.

Behavioral Change: requests with an unindexed body model and no run_id now route without a model filter instead of attempting to filter by that model.
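
The rerank handling mentioned above can be sketched as a small guard. The constant value `"default"` is taken from the commit message's description of the sentinel; `rerank_routing_model` is a hypothetical helper, and treating an empty string as unspecified is an extra assumption.

```rust
/// Assumed value of the placeholder the router fills in when a
/// /v1/rerank body omits the model.
const DEFAULT_MODEL_NAME: &str = "default";

/// Hypothetical helper: the model a /v1/rerank body contributes to
/// routing, or None when the body only carries the placeholder.
fn rerank_routing_model(body_model: &str) -> Option<&str> {
    if body_model.is_empty() || body_model == DEFAULT_MODEL_NAME {
        None
    } else {
        Some(body_model)
    }
}
```

Mapping the placeholder to `None` means a rerank request that never named a model is routed without a filter, instead of being matched against a model literally called `default`.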

Macroscope summarized 0fc156a.

macroscopeapp · 2026-05-28T18:08:09Z

Approvability

Verdict: Needs human review

This change modifies request routing behavior to filter workers based on the model specified in the request body. While framed as a bug fix, it changes which workers receive requests in production, affecting routing decisions that weren't previously model-filtered in this code path.


Use request body models as worker filters only when the router registry has indexed that model, while keeping run-scoped requests strict. Treat rerank's "default" sentinel as unspecified so omitted-model rerank requests continue to route normally. Also sync all advertised worker models during startup and add_worker so workers serving multiple models or LoRA adapters are routable immediately, before the next health refresh. Adds regression coverage for indexed LoRA routing, rerank default handling, and multi-model worker registration.

use model id in body for routing

9c14856

eexwhyzee changed the title ~~fix: use model id in body for routing~~ ENG-3780: use model id in body for routing May 28, 2026

eexwhyzee requested a review from JannikSt May 28, 2026 23:16

JannikSt approved these changes May 29, 2026


JannikSt merged commit c97c774 into main May 29, 2026
9 of 10 checks passed

JannikSt mentioned this pull request May 29, 2026

release: v0.1.27 #41

Merged


JannikSt merged 2 commits into main from fix/route-by-model-from-body


2 participants
