fix(providers): mark allowlist mode as authoritative inventory in runtime status by SantiagoDePolonia · Pull Request #326 · ENTERPILOT/GoModel

SantiagoDePolonia · 2026-05-12T14:39:47Z

Summary

When CONFIGURED_PROVIDER_MODELS_MODE=allowlist applies a configured model list and intentionally skips the upstream /models call, the registry left lastModelFetchSuccessAt unset. The admin status classifier interprets that combination (Registered + DiscoveredModelCount > 0 + no fetch error + no success timestamp) as "still serving cached inventory while live refresh finishes" and reports status: degraded, label: Starting. The result: a fully functional allowlist-mode provider — serving real traffic against AWS Bedrock during smoke-testing of #324 — appears unhealthy on dashboards and to any health-keyed monitoring.

This PR sets lastModelFetchSuccessAt for the allowlist-applied case too, because in that mode the allowlist is the authoritative inventory: there is no pending refresh to wait for. Upstream-failure fallbacks (configured_models_upstream_error/nil/empty) still leave SuccessAt unset, so health correctly surfaces "live refresh failed, serving configured fallback" for that distinct scenario.

Why this surfaces now

Smoke-testing the provider-naming PR (#324) with BEDROCK_MODELS=us.amazon.nova-lite-v1:0 and CONFIGURED_PROVIDER_MODELS_MODE=allowlist:

- /v1/models correctly returned bedrock/us.amazon.nova-lite-v1:0 and bedrock-us/us.amazon.nova-lite-v1:0.
- Both providers served real Nova Lite chat completions through AWS.
- /admin/providers/status reported status: degraded, label: Starting for both.

The 0-model count was a separate misread on my part (jq lookup against the wrong field), but the degraded status was real and reproducible.

Why the existing test asserted the buggy behavior

TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels (registry_test.go:212) had an explicit LastModelFetchSuccessAt != nil → fail assertion. That assertion was codifying the original design choice ("SuccessAt strictly means upstream succeeded") rather than catching a regression. Updated to reflect the corrected semantics: when allowlist mode authoritatively populates the inventory, that is a successful fetch.

Test plan

- Updated TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels now asserts:
  - LastModelFetchSuccessAt != nil
  - DiscoveredModelCount > 0
  - UsingCachedModels == false
- New TestClassifyProviderStatus_HealthyForAllowlistInventory in internal/admin/ pins the end-to-end classifier outcome: an allowlist provider with one model and a SuccessAt timestamp is healthy, not degraded.
- Existing fallback-mode tests (ConfiguredModelsFallback*) still pass. They assert SuccessAt == nil because that case represents a real upstream failure, which this PR does not change.
- make test-race, make lint, go mod tidy, mint validate: all green via pre-commit hooks.
- Full ./internal/providers/ and ./internal/admin/ test suites pass under -race.

Compat

No API changes. The lastModelFetchSuccessAt field on ProviderRuntimeSnapshot was already present; only its population condition expands. Operators using CONFIGURED_PROVIDER_MODELS_MODE=allowlist will see their dashboards and /admin/providers/status responses flip from degraded/Starting to healthy/Healthy once this lands. Operators in the default fallback mode see no change.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Fixed provider inventory fetch success tracking to properly mark allowlist mode inventories as successfully populated when configured models are applied.

…time status

When CONFIGURED_PROVIDER_MODELS_MODE=allowlist applies a configured model list and intentionally skips the upstream `/models` call, the registry left LastModelFetchSuccessAt unset. The admin status classifier interprets that combination (Registered + DiscoveredModelCount>0 + no fetch error + no success timestamp) as "still serving cached inventory while live refresh finishes" and reports `status: degraded, label: Starting`. The result was a fully functional allowlist-mode provider (serving real traffic against AWS Bedrock during smoke-testing of #324) appearing unhealthy on dashboards and to any health-keyed monitoring.

Set lastModelFetchSuccessAt for the allowlist-applied case too, because in that mode the allowlist IS the authoritative inventory: there is no pending refresh to wait for. Upstream-failure fallbacks (configured_models_upstream_error/nil/empty) still leave SuccessAt unset, so health correctly surfaces "live refresh failed, serving configured fallback" for that distinct scenario.

Tests:
- Existing TestModelRegistry/ConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels updated: it now asserts LastModelFetchSuccessAt is set, UsingCachedModels is false, and DiscoveredModelCount reflects the allowlist. The previous nil assertion was codifying the bug.
- New TestClassifyProviderStatus_HealthyForAllowlistInventory in internal/admin/ pins the end-to-end classifier outcome: an allowlist provider with one model and a SuccessAt timestamp is healthy, not degraded.
- All other registry / admin tests pass under -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-12T14:40:01Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 899c9332-c655-4035-bd48-920bccd3d5de

📥 Commits

Reviewing files that changed from the base of the PR and between d3d1c10 and 21116a1.

📒 Files selected for processing (3)

internal/admin/handler_providers_test.go
internal/providers/registry_init.go
internal/providers/registry_test.go

Walkthrough

The PR updates provider model fetch success tracking to treat both configured and allowlist inventory sources as authoritative populated states. The core logic now sets lastModelFetchSuccessAt for configured allowlist models, tests are updated to expect this timestamp, and a new test validates correct health classification when the timestamp is present.

Changes

Provider Model Fetch and Health Classification for Allowlist Mode

| Layer / File(s) | Summary |
| --- | --- |
| Core model fetch success tracking for allowlist mode (`internal/providers/registry_init.go`) | `fetchAllProviderModels` now sets `lastModelFetchSuccessAt` when `configuredReason` is either `configuredProviderModelsNotApplied` or `configuredProviderModelsAllowlist`, with expanded comments explaining when fallback outcomes leave the timestamp unset. |
| Registry test expectations for allowlist success tracking (`internal/providers/registry_test.go`) | `TestConfiguredModelsAllowlistModeSkipsUpstreamAndUsesConfiguredModels` now expects `LastModelFetchSuccessAt` non-nil, `DiscoveredModelCount` non-zero, and `UsingCachedModels` false; comment for `TestApplyProviderRuntimeUpdates_ClearsStaleErrorOnSuccessfulRefresh` clarifies stale-error survival constraints for refresh paths. |
| Health classification test for allowlist inventory (`internal/admin/handler_providers_test.go`) | New test `TestClassifyProviderStatus_HealthyForAllowlistInventory` validates that `classifyProviderStatus` returns `"healthy"` status and `"Healthy"` label when a provider runtime snapshot has `LastModelFetchSuccessAt` set. |

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly Related PRs

ENTERPILOT/GoModel#266: Updates registry to treat "configuredProviderModelsAllowlist" as an authoritative successful fetch by setting LastModelFetchSuccessAt, directly related to this PR's changes on the same provider model-fetching and allowlist code path.

🐰 The models now speak their truth with grace,
Allowlist and configured, both hold their place,
No more confusion when upstream sleeps,
The health status now its promises keeps!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: marking allowlist mode inventory as authoritative in runtime status. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


greptile-apps · 2026-05-12T14:41:35Z

Greptile Summary

This PR fixes a false-degraded status for providers running in CONFIGURED_PROVIDER_MODELS_MODE=allowlist. Because allowlist mode intentionally skips the upstream /models call, lastModelFetchSuccessAt was never set; the admin classifier then read that combination as "still loading cached inventory" and reported status=degraded / label=Starting.

- registry_init.go: Extends the lastModelFetchSuccessAt population guard to include configuredProviderModelsAllowlist alongside the existing configuredProviderModelsNotApplied case; fallback/error-originated reasons remain unset, preserving the "live refresh failed" signal.
- registry_test.go: Inverts the allowlist-mode assertion (SuccessAt != nil) and adds DiscoveredModelCount > 0 / UsingCachedModels == false checks; fallback tests are untouched.
- handler_providers_test.go: New integration-level test pins the end-to-end classifier output for an allowlist snapshot to healthy/Healthy.

Confidence Score: 5/5

Safe to merge — the change is a one-line condition expansion that only widens when lastModelFetchSuccessAt is populated; all existing fallback/error paths are explicitly preserved.

The fix is minimal and surgical: it touches exactly the one guard that was causing the misclassification, leaves every fallback/error branch untouched, and is backed by a direct unit test for the registry and an end-to-end classifier test. No API surface changes, no concurrency model changes, and no behavioral impact for operators running in the default fallback mode.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| internal/providers/registry_init.go | Extends `lastModelFetchSuccessAt` population to cover `configuredProviderModelsAllowlist`; fallback/error cases are correctly left unset. Change is minimal, well-scoped, and safe. |
| internal/providers/registry_test.go | Flips the allowlist-mode assertion from `SuccessAt == nil` to `SuccessAt != nil` and adds `DiscoveredModelCount > 0` / `UsingCachedModels == false` checks; existing fallback tests remain unchanged and still assert `SuccessAt == nil`. |
| internal/admin/handler_providers_test.go | New end-to-end test pins `classifyProviderStatus` output for an allowlist snapshot; the unused `RegistryInitialized` field in the snapshot is harmless since the classifier does not read it. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[fetchProviderInventory] -->|allowlist mode + configured models| B[applyConfiguredProviderModels\nno upstream call]
    A -->|any other mode| C[provider.ListModels]
    B --> D{resp non-nil\nand non-empty?}
    C --> E{err / nil / empty?}
    D -->|yes, reason=allowlist| F[runtimeUpdate set]
    D -->|no| G[failedProviders++\nor empty-list branch\nSuccessAt unset]
    E -->|err != nil| G
    E -->|nil resp| G
    E -->|empty| G
    E -->|success, reason=notApplied| F
    E -->|fallback used, reason=upstreamError/Nil/Empty| H[runtimeUpdate set\nSuccessAt unset]
    F --> I{configuredReason?}
    I -->|notApplied or allowlist| J[lastModelFetchSuccessAt = fetchAt\nclassifier: healthy]
    I -->|fallback reasons| K[lastModelFetchSuccessAt unset\nclassifier: degraded/Starting]


codecov-commenter · 2026-05-12T14:43:34Z


Codecov Report

✅ All modified and coverable lines are covered by tests.


SantiagoDePolonia merged commit 2f13f68 into main May 12, 2026
19 checks passed

SantiagoDePolonia merged 1 commit into main from fix/provider-status-allowlist-models
