fix(llm-gateway): refresh provider sets when model_cost reloads#60634

Merged
richardsolomou merged 3 commits into master from posthog-code/fix-litellm-provider-sets-on-refresh on May 29, 2026

@richardsolomou
Member

Problem

LiteLLM's get_llm_provider resolves bare model names against module-level sets (anthropic_models, open_ai_chat_completion_models, etc.) populated only at import time. Refreshing litellm.model_cost doesn't touch them, so models added upstream after process start (e.g. claude-opus-4-8 added 7h ago) raise LLM Provider NOT provided on the gateway until a redeploy.

Changes

Call litellm.add_known_models(model_cost) after every litellm.model_cost = … reassignment in CostRefreshService.refresh and ModelCostService._refresh_cache. This is what LiteLLM itself does at import.
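
As a minimal sketch of the change described above, the snippet below uses a stand-in namespace rather than the real `litellm` module, so it runs without any dependency. `fake_litellm`, the simplified `add_known_models`, and `refresh` are hypothetical names for illustration only; the real refresh paths are `CostRefreshService.refresh` and `ModelCostService._refresh_cache`, whose bodies are not shown in this conversation.

```python
from types import SimpleNamespace

# Stand-in for the litellm module (hypothetical, simplified): a cost map plus
# a provider set that is only seeded when add_known_models runs.
fake_litellm = SimpleNamespace(model_cost={}, anthropic_models=set())


def add_known_models(model_cost_map):
    """Simplified analogue of litellm.add_known_models: index models by provider."""
    for name, info in model_cost_map.items():
        if info.get("litellm_provider") == "anthropic":
            fake_litellm.anthropic_models.add(name)


def refresh(new_cost_map):
    """Refresh boundary: reassign the cost map, then re-seed the provider sets."""
    fake_litellm.model_cost = new_cost_map
    add_known_models(new_cost_map)  # the kind of call this PR adds after reassignment


# "Import time": only the older model is known.
refresh({"claude-opus-4-7": {"litellm_provider": "anthropic"}})

# Later refresh: upstream has added a model since process start.
refresh({
    "claude-opus-4-7": {"litellm_provider": "anthropic"},
    "claude-opus-4-8": {"litellm_provider": "anthropic"},
})
```

Without the `add_known_models` call inside `refresh`, the second refresh would update `model_cost` but leave `anthropic_models` without the new model, which is the staleness described under Problem.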

How did you test this code?

Agent-authored. Ran uv run pytest tests/test_cost_refresh.py, plus the broader test_model_registry.py, test_models_api.py, and callbacks/test_rate_limiting.py suites (80 tests); all green. The new regression test asserts that a freshly fetched model lands in both litellm.anthropic_models and litellm.open_ai_chat_completion_models.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Docs update

N/A

🤖 Agent context

Authored by PostHog Code. Diagnosis path: traced the gateway's 400 from litellm.anthropic_messages for claude-opus-4-8 to get_llm_provider_logic.py:395, which checks litellm.anthropic_models rather than litellm.model_cost. Confirmed the v1.83.7-stable bundled model_prices_and_context_window_backup.json lacks both claude-opus-4-7 and claude-opus-4-8 (the live import-time fetch is what seeds known-good models today), and that neither refresh path called add_known_models after reassignment.

Considered prefixing anthropic/ on the model in _handle_anthropic_messages (mirroring the bedrock/ prefix already used). Rejected: only patches the Anthropic route; the same staleness affects any provider whose models land in the upstream JSON after startup. Fixing at the refresh boundary covers every provider.
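
To illustrate why the prefix workaround was rejected, here is a deliberately simplified model of bare-name provider resolution. It is not LiteLLM's actual `get_llm_provider` code: `resolve_provider`, `KNOWN_PROVIDERS`, and the model name `gpt-hypothetical-next` are assumptions made up for this sketch.

```python
# Hypothetical, simplified model of provider resolution (not LiteLLM's code).
KNOWN_PROVIDERS = {"anthropic", "bedrock", "openai"}

# Provider sets as seeded at "import time"; neither contains the newer models.
anthropic_models = {"claude-opus-4-7"}
open_ai_chat_completion_models = {"gpt-4o"}


def resolve_provider(model):
    """Return (provider, model), or raise if the provider cannot be inferred."""
    prefix, sep, rest = model.partition("/")
    if sep and prefix in KNOWN_PROVIDERS:
        return prefix, rest  # an explicit prefix bypasses the provider sets
    if model in anthropic_models:
        return "anthropic", model
    if model in open_ai_chat_completion_models:
        return "openai", model
    raise ValueError(f"LLM Provider NOT provided for {model!r}")


def check(model):
    """Report whether a model name resolves, without raising."""
    try:
        return resolve_provider(model)
    except ValueError:
        return None


# Prefixing rescues the Anthropic route only.
prefixed = check("anthropic/claude-opus-4-8")
# A bare name from another provider, added after startup, still fails.
other_provider = check("gpt-hypothetical-next")
```

In this sketch `prefixed` resolves while `other_provider` is `None`, which mirrors the argument above: a per-route prefix hides the stale sets for one provider, whereas re-seeding the sets at the refresh boundary addresses every provider.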

Aside for a future PR: CostRefreshService and ModelCostService are two singletons racing on the same litellm.model_cost global — worth consolidating, out of scope here.


Created with PostHog Code

litellm.get_llm_provider resolves bare model names against provider sets
(anthropic_models, open_ai_chat_completion_models, …), which are populated
only at import time. Refreshing litellm.model_cost without re-running
add_known_models leaves those sets stale, so models added upstream after
process start (e.g. claude-opus-4-8) raise "LLM Provider NOT provided".

Generated-By: PostHog Code
Task-Id: 35ad8683-ced4-4a98-9721-fa6606b6dca0
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team May 29, 2026 09:49
@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
services/llm-gateway/tests/test_cost_refresh.py:46-73
**Missing test coverage for `ModelCostService._refresh_cache`**

The regression test verifies `add_known_models` is called via `CostRefreshService.refresh()`, but the same change was made to `ModelCostService._refresh_cache()` with no corresponding test. If `ModelCostService` follows a different code path (e.g., its `_refresh_cache` silently swallows an error before or after `add_known_models`), there is no test to catch the regression.

Reviews (1): Last reviewed commit: "fix(llm-gateway): refresh provider sets ..." | Re-trigger Greptile

Comment thread services/llm-gateway/tests/test_cost_refresh.py Outdated
Comment thread services/llm-gateway/tests/test_cost_refresh.py Outdated
Address PR review:
- Add TestModelCostServiceRefresh covering the second refresh path.
- Snapshot/restore litellm.model_cost and provider sets in an autouse
  fixture so tests don't leak state into siblings.

Generated-By: PostHog Code
Task-Id: 35ad8683-ced4-4a98-9721-fa6606b6dca0
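
The snapshot/restore idea in the commit message above can be sketched with the standard library alone. This is an illustration under assumptions, not the fixture from the PR: `fake_litellm` stands in for the real module, and `preserved_globals` and `GLOBALS` are hypothetical names.

```python
import copy
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for the module-level state that tests mutate.
fake_litellm = SimpleNamespace(
    model_cost={"gpt-4o": {"litellm_provider": "openai"}},
    anthropic_models=set(),
    open_ai_chat_completion_models={"gpt-4o"},
)

GLOBALS = ("model_cost", "anthropic_models", "open_ai_chat_completion_models")


@contextmanager
def preserved_globals(module, names=GLOBALS):
    """Snapshot the named attributes on entry and restore them on exit."""
    # Deep copies protect the snapshot from in-place mutation during the test.
    snapshot = {name: copy.deepcopy(getattr(module, name)) for name in names}
    try:
        yield
    finally:
        for name, value in snapshot.items():
            setattr(module, name, value)


with preserved_globals(fake_litellm):
    # A test body that both reassigns and mutates shared state.
    fake_litellm.model_cost = {"claude-opus-4-8": {"litellm_provider": "anthropic"}}
    fake_litellm.anthropic_models.add("claude-opus-4-8")
```

In a pytest suite, the same body could live in a fixture declared with `@pytest.fixture(autouse=True)` that yields inside the `with` block, so every test in the module gets the restore step without opting in.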
@richardsolomou richardsolomou enabled auto-merge (squash) May 29, 2026 09:56
Previous regression test asserted on the post-state of litellm's provider
sets, which is sensitive to test ordering in CI (other tests had already
populated the same global). Mock add_known_models and assert it's invoked
with the freshly fetched map — that's the contract our change makes.

Generated-By: PostHog Code
Task-Id: 35ad8683-ced4-4a98-9721-fa6606b6dca0
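
The contract-style test described in the commit message above might look roughly like the following. It is a sketch under assumptions: `fake_litellm` and `refresh` are hypothetical stand-ins, and the real test would patch `add_known_models` on the actual `litellm` module and drive the real services.

```python
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for the litellm module.
fake_litellm = SimpleNamespace(
    model_cost={},
    add_known_models=lambda cost_map: None,
)


def refresh(fetch):
    """Hypothetical refresh path: fetch, reassign, then re-seed provider sets."""
    fetched = fetch()
    fake_litellm.model_cost = fetched
    fake_litellm.add_known_models(fetched)


def test_refresh_calls_add_known_models():
    fetched = {"claude-opus-4-8": {"litellm_provider": "anthropic"}}
    # Replace the global-mutating function with a mock for the test's duration.
    with patch.object(fake_litellm, "add_known_models") as mock_add:
        refresh(lambda: fetched)
    # Assert on the call itself, not on the shared provider sets.
    mock_add.assert_called_once_with(fetched)
    assert fake_litellm.model_cost is fetched


test_refresh_calls_add_known_models()
```

Asserting on the call rather than on the resulting sets keeps the test independent of whatever other tests have already written into the shared globals, which is the ordering sensitivity the commit message mentions.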
@richardsolomou richardsolomou merged commit f6ef9a3 into master May 29, 2026
144 checks passed
@richardsolomou richardsolomou deleted the posthog-code/fix-litellm-provider-sets-on-refresh branch May 29, 2026 10:26
@deployment-status-posthog

deployment-status-posthog Bot commented May 29, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-05-29 10:52 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-29 11:15 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-29 11:18 UTC | Run |
