fix(providers): mark allowlist mode as authoritative inventory in runtime status#326
Conversation
…time status When CONFIGURED_PROVIDER_MODELS_MODE=allowlist applies a configured model list and intentionally skips the upstream `/models` call, the registry left LastModelFetchSuccessAt unset. The admin status classifier interprets that combination (Registered + DiscoveredModelCount>0 + no fetch error + no success timestamp) as "still serving cached inventory while live refresh finishes" and reports `status: degraded, label: Starting`. The result was a fully functional allowlist-mode provider — serving real traffic against AWS Bedrock during smoke-testing of #324 — appearing unhealthy on dashboards and to any health-keyed monitoring. Set lastModelFetchSuccessAt for the allowlist-applied case too, because in that mode the allowlist IS the authoritative inventory: there is no pending refresh to wait for. Upstream-failure fallbacks (configured_models_upstream_ error/nil/empty) still leave SuccessAt unset, so health correctly surfaces "live refresh failed, serving configured fallback" for that distinct scenario. Tests: - Existing TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUses ConfiguredModels updated: it now asserts LastModelFetchSuccessAt is set, UsingCachedModels is false, and DiscoveredModelCount reflects the allowlist. The previous nil assertion was codifying the bug. - New TestClassifyProviderStatus_HealthyForAllowlistInventory in internal/admin/ pins the end-to-end classifier outcome: an allowlist provider with one model and a SuccessAt timestamp is healthy, not degraded. - All other registry / admin tests pass under -race. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR updates provider model fetch success tracking to treat both configured and allowlist inventory sources as authoritative populated states. The core logic now sets ChangesProvider Model Fetch and Health Classification for Allowlist Mode
🎯 2 (Simple) | ⏱️ ~10 minutes Possibly Related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes a false-degraded status for providers running in
Confidence Score: 5/5Safe to merge — the change is a one-line condition expansion that only widens when The fix is minimal and surgical: it touches exactly the one guard that was causing the misclassification, leaves every fallback/error branch untouched, and is backed by a direct unit test for the registry and an end-to-end classifier test. No API surface changes, no concurrency model changes, and no behavioral impact for operators running in the default No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[fetchProviderInventory] -->|allowlist mode + configured models| B[applyConfiguredProviderModels\nno upstream call]
A -->|any other mode| C[provider.ListModels]
B --> D{resp non-nil\nand non-empty?}
C --> E{err / nil / empty?}
D -->|yes, reason=allowlist| F[runtimeUpdate set]
D -->|no| G[failedProviders++\nor empty-list branch\nSuccessAt unset]
E -->|err != nil| G
E -->|nil resp| G
E -->|empty| G
E -->|success, reason=notApplied| F
E -->|fallback used, reason=upstreamError/Nil/Empty| H[runtimeUpdate set\nSuccessAt unset]
F --> I{configuredReason?}
I -->|notApplied or allowlist| J[lastModelFetchSuccessAt = fetchAt\nclassifier: healthy]
I -->|fallback reasons| K[lastModelFetchSuccessAt unset\nclassifier: degraded/Starting]
Reviews (1): Last reviewed commit: "fix(providers): mark allowlist mode as a..." | Re-trigger Greptile |
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Summary
When
CONFIGURED_PROVIDER_MODELS_MODE=allowlistapplies a configured model list and intentionally skips the upstream/modelscall, the registry leftlastModelFetchSuccessAtunset. The admin status classifier interprets that combination (Registered + DiscoveredModelCount > 0 + no fetch error + no success timestamp) as "still serving cached inventory while live refresh finishes" and reportsstatus: degraded, label: Starting. The result: a fully functional allowlist-mode provider — serving real traffic against AWS Bedrock during smoke-testing of #324 — appears unhealthy on dashboards and to any health-keyed monitoring.This PR sets
lastModelFetchSuccessAtfor the allowlist-applied case too, because in that mode the allowlist is the authoritative inventory: there is no pending refresh to wait for. Upstream-failure fallbacks (configured_models_upstream_error/nil/empty) still leave SuccessAt unset, so health correctly surfaces "live refresh failed, serving configured fallback" for that distinct scenario.Why this surfaces now
Smoke-testing the provider-naming PR (#324) with
BEDROCK_MODELS=us.amazon.nova-lite-v1:0andCONFIGURED_PROVIDER_MODELS_MODE=allowlist:/v1/modelscorrectly returnedbedrock/us.amazon.nova-lite-v1:0andbedrock-us/us.amazon.nova-lite-v1:0./admin/providers/statusreportedstatus: degraded, label: Startingfor both.The 0-model count was a separate misread on my part (jq lookup against the wrong field), but the
degradedstatus was real and reproducible.Why the existing test asserted the buggy behavior
TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels(registry_test.go:212) had an explicitLastModelFetchSuccessAt != nil → failassertion. That assertion was codifying the original design choice ("SuccessAt strictly means upstream succeeded") rather than catching a regression. Updated to reflect the corrected semantics: when allowlist mode authoritatively populates the inventory, that is a successful fetch.Test plan
TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModelsnow asserts:LastModelFetchSuccessAt != nilDiscoveredModelCount > 0UsingCachedModels == falseTestClassifyProviderStatus_HealthyForAllowlistInventoryininternal/admin/pins the end-to-end classifier outcome: an allowlist provider with one model and a SuccessAt timestamp ishealthy, notdegraded.ConfiguredModelsFallback*) still pass — they assertSuccessAt == nilbecause that case represents a real upstream failure, which this PR does not change.make test-race,make lint,go mod tidy,mint validate— all green via pre-commit hooks../internal/providers/and./internal/admin/test suites pass under-race.Compat
No API changes. The
lastModelFetchSuccessAtfield onProviderRuntimeSnapshotwas already present; only its population condition expands. Operators usingCONFIGURED_PROVIDER_MODELS_MODE=allowlistwill see their dashboards and/admin/providers/statusresponses flip fromdegraded/Startingtohealthy/Healthyonce this lands. Operators in the defaultfallbackmode see no change.🤖 Generated with Claude Code
Summary by CodeRabbit