
ENG-3780: use model id in body for routing#40

Merged
JannikSt merged 2 commits into main from fix/route-by-model-from-body
May 29, 2026



@eexwhyzee eexwhyzee commented May 28, 2026


Purpose

  • Route typed HTTP requests by the effective model ID, falling back to the request body's model when callers pass model_id = None.
  • Use the same effective model ID for per-model policy lookup, so LoRA/model-specific requests are selected from the correct worker pool. This avoids sending requests to inference servers that don't have the LoRA adapter loaded yet.
  • Add router-level regression coverage proving a chat request with model: "rft-run-1" only reaches a worker indexed for that LoRA, not a base-model-only worker.
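
The fallback described above can be sketched roughly as follows. This is an illustration only, not the router's actual code: `Worker`, `effective_model_id`, `candidates`, and `demo_workers` are hypothetical names, and the worker URLs are made up.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a registered worker and the models it serves.
struct Worker {
    url: String,
    models: HashSet<String>,
}

/// Prefer the explicit model_id; fall back to the body's `model` field when
/// the caller passed `None`.
fn effective_model_id<'a>(
    explicit: Option<&'a str>,
    body_model: Option<&'a str>,
) -> Option<&'a str> {
    explicit.or(body_model)
}

/// Keep only workers that serve `model`; with no model, every worker qualifies.
fn candidates<'w>(workers: &'w [Worker], model: Option<&str>) -> Vec<&'w str> {
    workers
        .iter()
        .filter(|w| model.map_or(true, |m| w.models.contains(m)))
        .map(|w| w.url.as_str())
        .collect()
}

/// Two made-up workers: one base-model-only, one that also has the LoRA loaded.
fn demo_workers() -> Vec<Worker> {
    vec![
        Worker {
            url: "http://base-worker".to_string(),
            models: HashSet::from(["base".to_string()]),
        },
        Worker {
            url: "http://lora-worker".to_string(),
            models: HashSet::from(["base".to_string(), "rft-run-1".to_string()]),
        },
    ]
}
```

With these assumptions, a request that passes no explicit model ID but carries model: "rft-run-1" in its body is matched only against the second worker, which is the behavior the regression test is meant to pin down.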

Test Plan

Able to reproduce the model-not-found issue on develop during scale-up events:

Location: cluster rft-e2e-spk, namespace rft-stack-develop-laguna-xs-2-e2e-spk.

  Error in env-server logs (wordle env_index=0):
  ERROR verifiers.envs.environment.TextArenaEnv
  Aborted rollout due to ModelError() -> NotFoundError("Error code: 404 -
    {'error': {'message': 'The model `rft-nib4mu3obkdaxgh9jrrq7vvu` does not exist.',
               'type': 'NotFoundError', 'param': 'model', 'code': 404}}")
  verifiers/envs/environment.py:657 in _render_stop

  The errors are continuous from ~06:54:26 through ~07:04:06 UTC on 2026-05-28 (hundreds of identical entries — every rollout aborted). The vLLM
  inference server on that namespace doesn't recognize the per-run served model name rft-nib4mu3obkdaxgh9jrrq7vvu.
[Screenshot: 2026-05-28 at 11:03:53 AM]

Will test the changes on this branch against the same run setup on develop to confirm the fix.

Test Result

Ran the same run with the updated router tag (https://dev-cloud-run-abruy2xuia-ew.a.run.app/dashboard/training/zzf3t7zji3gcb2xc1ji0kmub) and had Claude monitor the logs for the entire run:

  • 0 NotFoundError / "model does not exist" matches over the full run.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results

[!NOTE]
Medium Risk
Changes core inference request routing and worker/model indexing; misconfiguration could still misroute traffic, but behavior is guarded by registry checks and extensive new tests.

Overview
Typed HTTP routing now derives an effective model ID from the explicit route parameter or, when absent, from the JSON model field—using that value for worker selection and per-model load-balancing policy only when the worker registry already indexes that model (or when a run-scoped run_id is set). Unindexed body models no longer force a model filter, preserving generic upstream validation while LoRA / per-run adapter names route only to pods that have loaded them.

Worker startup and add_worker paths call sync_worker_models with every model returned from /v1/models, not just the primary label, so secondary adapters are routable immediately after registration. /v1/rerank ignores the default model placeholder for this fallback.
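
A minimal sketch of what indexing every advertised model might look like is given below. `ModelIndex` and its methods are hypothetical; the method named `sync_worker_models` here is only a stand-in for the router function of the same name, whose real signature is not shown in this PR description.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model index: model name -> URLs of workers serving it.
#[derive(Default)]
struct ModelIndex {
    by_model: HashMap<String, HashSet<String>>,
}

impl ModelIndex {
    /// Replace the worker's entries with the full list it advertises
    /// (for example, every entry returned by /v1/models).
    fn sync_worker_models(&mut self, worker_url: &str, models: &[String]) {
        // Drop stale entries for this worker, then prune empty model buckets.
        for urls in self.by_model.values_mut() {
            urls.remove(worker_url);
        }
        self.by_model.retain(|_, urls| !urls.is_empty());
        // Index every advertised model, not just the primary label.
        for model in models {
            self.by_model
                .entry(model.clone())
                .or_default()
                .insert(worker_url.to_string());
        }
    }

    /// True when at least one worker is indexed for `model`.
    fn is_indexed(&self, model: &str) -> bool {
        self.by_model.contains_key(model)
    }

    /// Worker URLs serving `model`, sorted for deterministic output.
    fn workers_for(&self, model: &str) -> Vec<String> {
        let mut urls: Vec<String> = self
            .by_model
            .get(model)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        urls.sort();
        urls
    }
}
```

Under this sketch, a worker that advertises both a base model and an adapter becomes a candidate for the adapter as soon as it is registered, rather than after a later refresh.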

Regression tests cover LoRA-only worker selection, filter resolution edge cases, end-to-end chat routing, and discovery-time indexing.

Reviewed by Cursor Bugbot for commit 0fc156a.

[!NOTE]

Route requests using model ID from request body when no explicit model ID is supplied

  • When no explicit model_id query parameter is present, the router now uses the body's model field as a routing filter, but only if that model is already indexed in the worker registry or the request is run-scoped.
  • Workers are immediately indexed with their discovered models on registration (both at Router::new time and during dynamic service-discovery), so model-filtered routing is available right away.
  • The DEFAULT_MODEL_NAME value in /v1/rerank request bodies is ignored as a routing filter to avoid misrouting.
  • Behavioral Change: requests with an unindexed body model and no run_id now route without a model filter instead of attempting to filter by that model.
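
The rules in this list can be read as a small decision function. The sketch below is one possible reading, not the merged implementation: `RouteRequest`, `resolve_model_filter`, and the `is_rerank` flag are hypothetical, the "default" value for `DEFAULT_MODEL_NAME` is taken from the commit message's description of the rerank sentinel, and treating an explicit model_id as always honored is an assumption.

```rust
use std::collections::HashSet;

/// Assumed value of the rerank placeholder (the commit message calls it "default").
const DEFAULT_MODEL_NAME: &str = "default";

/// Hypothetical view of the request fields that influence routing.
struct RouteRequest<'a> {
    explicit_model: Option<&'a str>,
    body_model: Option<&'a str>,
    run_id: Option<&'a str>,
    is_rerank: bool,
}

/// Return the model to filter workers by, or `None` to route unfiltered.
fn resolve_model_filter<'a>(
    req: &RouteRequest<'a>,
    indexed_models: &HashSet<String>,
) -> Option<&'a str> {
    // Assumption: an explicit model_id is always used as the filter.
    if let Some(model) = req.explicit_model {
        return Some(model);
    }
    let body_model = req.body_model?;
    // The rerank placeholder means "no model specified".
    if req.is_rerank && body_model == DEFAULT_MODEL_NAME {
        return None;
    }
    // Run-scoped requests stay strict; otherwise filter only on indexed models.
    if req.run_id.is_some() || indexed_models.contains(body_model) {
        Some(body_model)
    } else {
        None
    }
}
```

Returning `None` for an unindexed body model leaves the request unfiltered, so the upstream server can still reject an unknown model with its own validation error.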

Macroscope summarized 0fc156a.


macroscopeapp Bot commented May 28, 2026

Approvability

Verdict: Needs human review

This change modifies request routing behavior to filter workers based on the model specified in the request body. While framed as a bug fix, it changes which workers receive requests in production, affecting routing decisions that weren't previously model-filtered in this code path.


Use request body models as worker filters only when the router registry
has indexed that model, while keeping run-scoped requests strict. Treat
rerank's "default" sentinel as unspecified so omitted-model rerank
requests continue to route normally.

Also sync all advertised worker models during startup and add_worker so
workers serving multiple models or LoRA adapters are routable
immediately, before the next health refresh.

Adds regression coverage for indexed LoRA routing, rerank default
handling, and multi-model worker registration.
@eexwhyzee eexwhyzee changed the title fix: use model id in body for routing ENG-3780: use model id in body for routing May 28, 2026
@eexwhyzee eexwhyzee requested a review from JannikSt May 28, 2026 23:16
@JannikSt JannikSt merged commit c97c774 into main May 29, 2026
9 of 10 checks passed
@JannikSt JannikSt mentioned this pull request May 29, 2026