feat: fallback targets for provider errors #899

@christso

Problem

When a target provider returns errors (rate limits, outages, transient failures), the eval fails immediately. The existing retry config (retry_initial_delay_ms, retry_backoff_factor, retry_status_codes) retries the same provider, but if that provider is down or heavily rate-limited, retries just burn time.

Proposal

Add a fallback_targets field to target definitions in targets.yaml:

- name: default
  provider: openai
  base_url: https://models.github.ai/inference/v1
  api_key: ${{ GH_MODELS_TOKEN }}
  model: ${{ GH_MODELS_MODEL }}
  fallback_targets:
    - gemini-flash
    - azure-llm
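
As a rough sketch of how a runner might turn this config into an ordered list of targets to try, the snippet below resolves the names in fallback_targets against the other entries in targets.yaml. The function name resolve_fallback_chain and the provider values for gemini-flash and azure-llm are hypothetical. It also assumes that only the primary's own fallback_targets list is followed (no chaining through a fallback's fallbacks), which the proposal does not specify.

```python
from typing import Any, Dict, List


def resolve_fallback_chain(targets: List[Dict[str, Any]], name: str) -> List[Dict[str, Any]]:
    """Return the primary target followed by its fallbacks, in declared order.

    Assumption: only the primary's own fallback_targets list is followed.
    """
    by_name = {t["name"]: t for t in targets}
    if name not in by_name:
        raise KeyError(f"unknown target: {name}")

    primary = by_name[name]
    chain = [primary]
    for fallback_name in primary.get("fallback_targets", []):
        if fallback_name == name:
            raise ValueError(f"target {name!r} lists itself as a fallback")
        if fallback_name not in by_name:
            raise KeyError(f"unknown fallback target: {fallback_name}")
        chain.append(by_name[fallback_name])
    return chain


# Parsed form of a targets.yaml like the one above.
# The provider values for the two fallbacks are placeholders.
targets = [
    {"name": "default", "provider": "openai",
     "fallback_targets": ["gemini-flash", "azure-llm"]},
    {"name": "gemini-flash", "provider": "gemini"},
    {"name": "azure-llm", "provider": "azure"},
]

chain_names = [t["name"] for t in resolve_fallback_chain(targets, "default")]
```

Failing fast on an unknown fallback name at load time would surface typos before an eval starts, rather than at the moment a provider is already failing.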

When the primary target returns a retryable error (429, 503, connection timeout), the runner should:

  1. Retry with exponential backoff on the primary (existing behavior)
  2. After exhausting retries, try fallback_targets in order
  3. Record which target actually served the response in the result JSONL
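
The three steps above could be sketched roughly as follows. This is an illustration, not the runner's actual code: ProviderError, call_with_fallback, and the send callback are hypothetical names, and the parameters only mirror the spirit of the existing retry settings. It assumes that only the primary gets the retry budget and each fallback gets a single attempt, which the proposal leaves open.

```python
import time
from typing import Any, Callable, Dict, List, Optional

# Retryable statuses from the proposal; a status of None stands for a connection timeout.
RETRYABLE_STATUS = {429, 503}


class ProviderError(Exception):
    """Hypothetical error type raised by a provider call."""

    def __init__(self, status: Optional[int]):
        super().__init__(f"provider error (status={status})")
        self.status = status


def is_retryable(err: ProviderError) -> bool:
    return err.status is None or err.status in RETRYABLE_STATUS


def call_with_fallback(
    chain: List[str],
    send: Callable[[str], Any],
    max_retries: int = 3,
    initial_delay_ms: int = 100,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Try chain[0] with backoff, then each fallback once, in order."""
    if not chain:
        raise ValueError("chain must contain at least the primary target")

    last_error: Optional[ProviderError] = None
    for index, target in enumerate(chain):
        # Assumption: only the primary (index 0) is retried with backoff.
        attempts = max_retries + 1 if index == 0 else 1
        delay = initial_delay_ms / 1000.0
        for attempt in range(attempts):
            try:
                # Step 3: report which target actually served the response.
                return {"response": send(target), "target_used": target}
            except ProviderError as err:
                if not is_retryable(err):
                    raise  # non-retryable errors do not trigger fallback
                last_error = err
                if attempt < attempts - 1:
                    sleep(delay)
                    delay *= backoff_factor
    assert last_error is not None
    raise last_error
```

Injecting sleep as a parameter keeps the backoff testable without real delays. Whether fallbacks should get their own retry budget is a design choice the sketch deliberately keeps simple.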

Similarly for the agent target:

- name: agent
  provider: ${{ AGENT_PROVIDER }}
  model: ${{ AGENT_MODEL }}
  grader_target: grader
  fallback_targets:
    - claude
    - copilot-cli

Design considerations

  • Grader fallback: The target referenced by grader_target should also support fallbacks, since LLM-as-judge calls hit the same rate limits
  • Status codes: Which errors trigger fallback should be configurable (default: 429, 502, 503)
  • Result attribution: The result JSONL should record target_used so users know which provider actually ran
  • Scope: Fallback applies per-request, not per-eval, so different test cases in the same eval could hit different targets
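
To make the status-code and attribution points concrete, here is a minimal sketch. Only target_used comes from the proposal; should_fall_back, result_line, and the other record fields (test_id, target, output) are hypothetical and do not reflect the current result schema.

```python
import json
from typing import Iterable, Optional

# Default trigger set from the proposal above.
DEFAULT_FALLBACK_STATUS_CODES = (429, 502, 503)


def should_fall_back(status: int, configured: Optional[Iterable[int]] = None) -> bool:
    """Return True if this status should trigger fallback; config overrides the default."""
    codes = DEFAULT_FALLBACK_STATUS_CODES if configured is None else tuple(configured)
    return status in codes


def result_line(test_id: str, target: str, target_used: str, output: str) -> str:
    """Serialize one result as a JSONL line, recording the target that actually ran."""
    record = {
        "test_id": test_id,          # hypothetical field
        "target": target,            # the target the eval asked for (hypothetical field)
        "target_used": target_used,  # the target that served the response
        "output": output,            # hypothetical field
    }
    return json.dumps(record, sort_keys=True)


# Fallback is per-request, so two cases in one eval can report different targets.
lines = [
    result_line("case-1", "default", "default", "..."),
    result_line("case-2", "default", "gemini-flash", "..."),
]
```

Recording both the requested target and target_used would let users filter or group results by the provider that actually produced them.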

Prior art

  • OpenRouter does provider fallback automatically across their model pool
  • Vercel AI SDK supports a fallback() provider wrapper
  • LiteLLM has fallbacks config for provider failover

Labels: in-progress (claimed by an agent; do not duplicate work)
