
fix(providers): mark allowlist mode as authoritative inventory in runtime status #326

Merged
SantiagoDePolonia merged 1 commit into main from fix/provider-status-allowlist-models on May 12, 2026


@SantiagoDePolonia SantiagoDePolonia commented May 12, 2026

Summary

When CONFIGURED_PROVIDER_MODELS_MODE=allowlist applies a configured model list and intentionally skips the upstream /models call, the registry left lastModelFetchSuccessAt unset. The admin status classifier interprets that combination (Registered + DiscoveredModelCount > 0 + no fetch error + no success timestamp) as "still serving cached inventory while live refresh finishes" and reports status: degraded, label: Starting. The result: a fully functional allowlist-mode provider — serving real traffic against AWS Bedrock during smoke-testing of #324 — appears unhealthy on dashboards and to any health-keyed monitoring.

This PR sets lastModelFetchSuccessAt for the allowlist-applied case too, because in that mode the allowlist is the authoritative inventory: there is no pending refresh to wait for. Upstream-failure fallbacks (configured_models_upstream_error/nil/empty) still leave SuccessAt unset, so health correctly surfaces "live refresh failed, serving configured fallback" for that distinct scenario.

Why this surfaces now

Smoke-testing the provider-naming PR (#324) with BEDROCK_MODELS=us.amazon.nova-lite-v1:0 and CONFIGURED_PROVIDER_MODELS_MODE=allowlist:

  • /v1/models correctly returned bedrock/us.amazon.nova-lite-v1:0 and bedrock-us/us.amazon.nova-lite-v1:0.
  • Both providers served real Nova Lite chat completions through AWS.
  • /admin/providers/status reported status: degraded, label: Starting for both.

The 0-model count was a separate misread on my part (jq lookup against the wrong field), but the degraded status was real and reproducible.

Why the existing test asserted the buggy behavior

TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels (registry_test.go:212) had an explicit LastModelFetchSuccessAt != nil → fail assertion. That assertion was codifying the original design choice ("SuccessAt strictly means upstream succeeded") rather than catching a regression. Updated to reflect the corrected semantics: when allowlist mode authoritatively populates the inventory, that is a successful fetch.

Test plan

  • Updated TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels now asserts:
    • LastModelFetchSuccessAt != nil
    • DiscoveredModelCount > 0
    • UsingCachedModels == false
  • New TestClassifyProviderStatus_HealthyForAllowlistInventory in internal/admin/ pins the end-to-end classifier outcome: an allowlist provider with one model and a SuccessAt timestamp is healthy, not degraded.
  • Existing fallback-mode tests (ConfiguredModelsFallback*) still pass — they assert SuccessAt == nil because that case represents a real upstream failure, which this PR does not change.
  • make test-race, make lint, go mod tidy, mint validate — all green via pre-commit hooks.
  • Full ./internal/providers/ and ./internal/admin/ test suites pass under -race.

Compat

No API changes. The lastModelFetchSuccessAt field on ProviderRuntimeSnapshot was already present; only its population condition expands. Operators using CONFIGURED_PROVIDER_MODELS_MODE=allowlist will see their dashboards and /admin/providers/status responses flip from degraded/Starting to healthy/Healthy once this lands. Operators in the default fallback mode see no change.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Fixed provider inventory fetch success tracking to properly mark allowlist mode inventories as successfully populated when configured models are applied.


…time status

When CONFIGURED_PROVIDER_MODELS_MODE=allowlist applies a configured model list
and intentionally skips the upstream `/models` call, the registry left
LastModelFetchSuccessAt unset. The admin status classifier interprets that
combination (Registered + DiscoveredModelCount>0 + no fetch error + no success
timestamp) as "still serving cached inventory while live refresh finishes" and
reports `status: degraded, label: Starting`. The result was a fully functional
allowlist-mode provider — serving real traffic against AWS Bedrock during
smoke-testing of #324 — appearing unhealthy on dashboards and to any
health-keyed monitoring.

Set lastModelFetchSuccessAt for the allowlist-applied case too, because in
that mode the allowlist IS the authoritative inventory: there is no pending
refresh to wait for. Upstream-failure fallbacks
(configured_models_upstream_error/nil/empty) still leave SuccessAt unset, so
health correctly surfaces "live refresh failed, serving configured fallback"
for that distinct scenario.

Tests:
- Existing
  TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels
  updated: it now asserts LastModelFetchSuccessAt is set, UsingCachedModels is
  false, and DiscoveredModelCount reflects the allowlist. The previous nil
  assertion was codifying the bug.
- New TestClassifyProviderStatus_HealthyForAllowlistInventory in
  internal/admin/ pins the end-to-end classifier outcome: an allowlist
  provider with one model and a SuccessAt timestamp is healthy, not degraded.
- All other registry / admin tests pass under -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between d3d1c10 and 21116a1.

📒 Files selected for processing (3)
  • internal/admin/handler_providers_test.go
  • internal/providers/registry_init.go
  • internal/providers/registry_test.go

📝 Walkthrough


The PR updates provider model fetch success tracking so that an allowlist-applied inventory, like a successful upstream fetch, counts as an authoritative populated state. The core logic now sets lastModelFetchSuccessAt when configured allowlist models are applied, the registry test is updated to expect this timestamp, and a new test validates that the classifier reports a healthy status when the timestamp is present.

Changes

Provider Model Fetch and Health Classification for Allowlist Mode

| Layer | File(s) | Summary |
| --- | --- | --- |
| Core model fetch success tracking for allowlist mode | internal/providers/registry_init.go | fetchAllProviderModels now sets lastModelFetchSuccessAt when configuredReason is either configuredProviderModelsNotApplied or configuredProviderModelsAllowlist, with expanded comments explaining when fallback outcomes leave the timestamp unset. |
| Registry test expectations for allowlist success tracking | internal/providers/registry_test.go | TestConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels now expects LastModelFetchSuccessAt non-nil, DiscoveredModelCount non-zero, and UsingCachedModels false; comment for TestApplyProviderRuntimeUpdates_ClearsStaleErrorOnSuccessfulRefresh clarifies stale-error survival constraints for refresh paths. |
| Health classification test for allowlist inventory | internal/admin/handler_providers_test.go | New test TestClassifyProviderStatus_HealthyForAllowlistInventory validates that classifyProviderStatus returns "healthy" status and "Healthy" label when a provider runtime snapshot has LastModelFetchSuccessAt set. |

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly Related PRs

  • ENTERPILOT/GoModel#266: Updates registry to treat "configuredProviderModelsAllowlist" as an authoritative successful fetch by setting LastModelFetchSuccessAt, directly related to this PR's changes on the same provider model-fetching and allowlist code path.

🐰 The models now speak their truth with grace,
Allowlist and configured, both hold their place,
No more confusion when upstream sleeps,
The health status now its promises keeps!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: marking allowlist mode inventory as authoritative in runtime status. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR fixes a false-degraded status for providers running in CONFIGURED_PROVIDER_MODELS_MODE=allowlist. Because allowlist mode intentionally skips the upstream /models call, lastModelFetchSuccessAt was never set; the admin classifier then read that combination as "still loading cached inventory" and reported status=degraded / label=Starting.

  • registry_init.go: Extends the lastModelFetchSuccessAt population guard to include configuredProviderModelsAllowlist alongside the existing configuredProviderModelsNotApplied case; fallback/error-originated reasons remain unset, preserving the "live refresh failed" signal.
  • registry_test.go: Inverts the allowlist-mode assertion (SuccessAt != nil) and adds DiscoveredModelCount > 0 / UsingCachedModels == false checks; fallback tests are untouched.
  • handler_providers_test.go: New integration-level test pins the end-to-end classifier output for an allowlist snapshot to healthy/Healthy.

Confidence Score: 5/5

Safe to merge — the change is a one-line condition expansion that only widens when lastModelFetchSuccessAt is populated; all existing fallback/error paths are explicitly preserved.

The fix is minimal and surgical: it touches exactly the one guard that was causing the misclassification, leaves every fallback/error branch untouched, and is backed by a direct unit test for the registry and an end-to-end classifier test. No API surface changes, no concurrency model changes, and no behavioral impact for operators running in the default fallback mode.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| internal/providers/registry_init.go | Extends lastModelFetchSuccessAt population to cover configuredProviderModelsAllowlist; fallback/error cases are correctly left unset. Change is minimal, well-scoped, and safe. |
| internal/providers/registry_test.go | Flips the allowlist-mode assertion from SuccessAt == nil to SuccessAt != nil and adds DiscoveredModelCount > 0 / UsingCachedModels == false checks; existing fallback tests remain unchanged and still assert SuccessAt == nil. |
| internal/admin/handler_providers_test.go | New end-to-end test pins classifyProviderStatus output for an allowlist snapshot; the unused RegistryInitialized field in the snapshot is harmless since the classifier does not read it. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[fetchProviderInventory] -->|allowlist mode + configured models| B[applyConfiguredProviderModels\nno upstream call]
    A -->|any other mode| C[provider.ListModels]
    B --> D{resp non-nil\nand non-empty?}
    C --> E{err / nil / empty?}
    D -->|yes, reason=allowlist| F[runtimeUpdate set]
    D -->|no| G[failedProviders++\nor empty-list branch\nSuccessAt unset]
    E -->|err != nil| G
    E -->|nil resp| G
    E -->|empty| G
    E -->|success, reason=notApplied| F
    E -->|fallback used, reason=upstreamError/Nil/Empty| H[runtimeUpdate set\nSuccessAt unset]
    F --> I{configuredReason?}
    I -->|notApplied or allowlist| J[lastModelFetchSuccessAt = fetchAt\nclassifier: healthy]
    I -->|fallback reasons| K[lastModelFetchSuccessAt unset\nclassifier: degraded/Starting]

Reviews (1): Last reviewed commit: "fix(providers): mark allowlist mode as a..."

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.


@SantiagoDePolonia SantiagoDePolonia merged commit 2f13f68 into main May 12, 2026
19 checks passed