
Model list endpoint caching #17

Merged

ScriptSmith merged 3 commits into main from model-list-cache on Mar 20, 2026

Conversation

@ScriptSmith
Owner

No description provided.

@greptile-apps
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR introduces an in-memory cache for the /v1/models endpoint that serves model lists from static (config-file) providers. The cache is warmed at startup in AppState::new, refreshed periodically via a background task in server.rs, and is configurable via features.static_models_cache in TOML. All four issues flagged in the prior review round have been addressed — stale provider entries are pruned with cache.retain(), the refresh task uses task_tracker.spawn, log fields are now consistent, and live-fetch fallback errors are warned on.
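Based on the summary and the defaults described in the file-by-file notes below, the TOML knob presumably looks something like this (the table name comes from the summary, the field name from the review comments; treat this as a sketch rather than the exact schema):

```toml
# Hypothetical example of the feature toggle described above.
# 300 is the stated default; per the review comments, a value of 0
# would make enabled() return false and disable the cache entirely.
[features.static_models_cache]
refresh_interval_secs = 300
```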

Key issues found:

  • The background refresh task runs an infinite loop with no cancellation path. Since it is registered with task_tracker, the 30-second task_tracker.wait() timeout will always expire on every graceful shutdown rather than completing cleanly.
  • Successful live-fetch results for cache-miss providers are not written back to static_models_cache. Under concurrent load, multiple requests can pile up on the same uncached provider between background refresh cycles (thundering herd).
  • In the WASM build path, AppState is constructed directly without ever calling warm_static_models_cache(), so the cache is perpetually empty despite caching being enabled by default. Every WASM request falls through to live-fetch with the extra overhead of the empty cache check.

Confidence Score: 3/5

  • Functionally correct for the happy path, but the infinite-loop background task will reliably degrade graceful shutdowns to the full 30-second timeout on every deploy.
  • The core caching logic in app.rs is sound and all prior review feedback was addressed. However, the infinite background loop registered with the task tracker is a consistent operational issue (every shutdown stalls), and the missing cache write-back creates a latency regression under failure conditions. The WASM gap is a correctness concern for that build target.
  • Files needing attention: src/cli/server.rs (infinite refresh loop, shutdown impact) and src/routes/api/models.rs (no write-back of live-fetched results).

Important Files Changed

| Filename | Overview |
| --- | --- |
| `src/app.rs` | Adds the `static_models_cache` field to `AppState` and implements `warm_static_models_cache()`. The implementation correctly addresses all prior review concerns: `cache.retain()` removes stale entries for deleted providers, the write lock covers the full update, and the total model count is computed across the whole cache. The startup warm in `AppState::new` is sound. |
| `src/cli/server.rs` | Spawns the background refresh task via `task_tracker.spawn` (addressing the prior review comment). However, the task body is an infinite loop with no exit condition, which will cause every graceful shutdown to block until the 30-second `task_tracker.wait()` timeout expires. |
| `src/routes/api/models.rs` | Refactors the models endpoint to serve from the in-memory cache on cache hits and live-fetch only on misses. Fallback warnings and the hit/miss logic are correct, but successful live-fetch results are not written back to the cache, creating a potential thundering herd on cache-miss providers between background refresh cycles. |
| `src/config/features.rs` | Adds `StaticModelsCacheConfig` with sensible defaults (300 s refresh interval), `enabled()` and `refresh_interval()` helpers, and TOML doc examples. A clean, well-documented addition. |
| `src/wasm.rs` | Adds `static_models_cache` to the WASM `AppState` construction with an empty default. Since `warm_static_models_cache()` is never called here and there is no background refresh task, the cache is always empty in WASM mode even though caching is enabled by default, causing all requests to fall through to live-fetch with extra overhead. |
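The retain-then-insert warm logic described for `src/app.rs` can be sketched in simplified form. This is a hypothetical, synchronous model: a std `RwLock` over a `HashMap` of model-name lists stands in for the real `tokio::sync::RwLock` and `ModelsResponse`, and `warm_cache` is an illustrative name, not the actual function signature.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

// Simplified sketch of the warm pass: prune entries for providers that no
// longer exist in the config, then insert the freshly fetched responses,
// all under a single write lock, and report the total model count.
fn warm_cache(
    cache: &RwLock<HashMap<String, Vec<String>>>,
    fresh: HashMap<String, Vec<String>>,
) -> usize {
    let current: HashSet<String> = fresh.keys().cloned().collect();
    let mut guard = cache.write().unwrap();
    // Drop stale entries for providers removed from the config.
    guard.retain(|name, _| current.contains(name));
    // Overwrite (or add) entries for every provider fetched this round.
    for (name, models) in fresh {
        guard.insert(name, models);
    }
    // Total model count computed across the whole cache, not just this batch.
    guard.values().map(|models| models.len()).sum()
}
```

Holding the write lock across both the `retain` and the inserts is what prevents readers from observing a half-pruned, half-updated cache between the two steps.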

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ModelsRoute as /v1/models handler
    participant Cache as static_models_cache (RwLock)
    participant Provider as Upstream Provider(s)
    participant BgTask as Background Refresh Task

    Note over BgTask: Spawned at server start (task_tracker)
    BgTask->>Provider: list_models_for_config (parallel)
    Provider-->>BgTask: ModelsResponse / Err
    BgTask->>Cache: write lock → retain + insert

    Client->>ModelsRoute: GET /v1/models
    ModelsRoute->>Cache: read lock → check each provider
    alt All hits
        Cache-->>ModelsRoute: cached ModelsResponse × N
    else Some misses
        Cache-->>ModelsRoute: hits (cached)
        ModelsRoute->>Provider: live-fetch miss providers (parallel)
        Provider-->>ModelsRoute: ModelsResponse / Err (warn on err)
        Note over ModelsRoute: ⚠️ results NOT written back to Cache
    end
    ModelsRoute-->>Client: CombinedModelsResponse (enriched)

    loop Every refresh_interval_secs (default 300s)
        BgTask->>Provider: warm_static_models_cache
        Provider-->>BgTask: updated responses
        BgTask->>Cache: write lock → retain + insert
    end
```

Last reviewed commit: "Review fixes"

@ScriptSmith
Owner Author

@greptile-apps

Comment on lines +344 to +351 in src/cli/server.rs:

```rust
task_tracker.spawn(async move {
    let mut ticker = tokio::time::interval(interval);
    ticker.tick().await; // skip the immediate first tick (already warmed)
    loop {
        ticker.tick().await;
        state_ref.warm_static_models_cache().await;
    }
});
```

**P1: Infinite loop blocks graceful shutdown for full 30-second timeout**

The background refresh task runs an unconditional `loop { ... }` with no exit condition. The server's shutdown handler calls `task_tracker.close()` followed by `task_tracker.wait()` with a 30-second timeout (`tokio::time::timeout(std::time::Duration::from_secs(30), task_tracker.wait())`). Since this task never exits on its own, every server shutdown will stall for the entire 30 seconds before timing out, regardless of whether the task is doing useful work at that moment.

Every other `task_tracker.spawn` usage in the codebase wraps short-lived, finite work (audit logging, usage flushing, etc.). This is the only infinite background loop registered with the tracker, and it breaks the graceful-shutdown contract the tracker is designed to enforce.

The fix is to drive the loop via a [`CancellationToken`](https://docs.rs/tokio-util/latest/tokio_util/sync/struct.CancellationToken.html) that is signalled during shutdown:

```rust
let cancel = state.cancellation_token.clone(); // or create a new one shared with shutdown
let state_ref = state.clone();
task_tracker.spawn(async move {
    let mut ticker = tokio::time::interval(interval);
    ticker.tick().await;
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break,
            _ = ticker.tick() => {}
        }
        state_ref.warm_static_models_cache().await;
    }
});
```

Comment on lines +76 to +81 in src/routes/api/models.rs:

```rust
for (name, result) in join_all(futures).await {
    match result {
        Ok(resp) => hits.push((name, resp)),
        Err(e) => tracing::warn!(provider = %name, error = %e, "Live-fetch fallback failed for cache-miss provider"),
    }
}
```

**P2: Successful live-fetch results not written back to cache**

When a cache miss triggers a live-fetch and the fetch succeeds, the result is added to `hits` for this single request but is never stored in `static_models_cache`. The next request for the same provider will be another cache miss and trigger yet another live-fetch, until the background refresh (up to 5 minutes later by default) eventually populates the entry.

Under concurrent load this becomes a thundering-herd problem: if the initial cache warm fails for a provider, every concurrent request to `/v1/models` will independently live-fetch from that same upstream provider until the background refresh catches up. This negates the latency benefit the cache is meant to provide.

Consider writing successful live-fetch results back to the cache:

```rust
let mut live_fetched: Vec<(String, crate::providers::ModelsResponse)> = Vec::new();
for (name, result) in join_all(futures).await {
    match result {
        Ok(resp) => {
            live_fetched.push((name.clone(), resp.clone()));
            hits.push((name, resp));
        }
        Err(e) => tracing::warn!(provider = %name, error = %e, "Live-fetch fallback failed for cache-miss provider"),
    }
}
// Write successful live-fetches back to the cache so future requests benefit
if !live_fetched.is_empty() {
    let mut cache = state.static_models_cache.write().await;
    for (name, resp) in live_fetched {
        cache.insert(name, resp);
    }
}
```

Comment on line 159 in src/wasm.rs:

```rust
    default_org_id,
    provider_metrics: Arc::new(services::ProviderMetricsService::new()),
    model_catalog: catalog::ModelCatalogRegistry::new(),
    static_models_cache: Arc::new(tokio::sync::RwLock::new(Default::default())),
```

**P2: WASM path: cache always empty despite feature being enabled by default**

The WASM `HadrianGateway` constructs `AppState` directly (bypassing `AppState::new`), so `warm_static_models_cache()` is never called. Since `StaticModelsCacheConfig` defaults to `refresh_interval_secs = 300`, `enabled()` returns `true`. Every call to `api_v1_models` from WASM will enter the `cache_enabled` branch, find all providers as cache misses, then live-fetch them — identical to the pre-PR behaviour but with extra overhead (acquiring a read lock and building a `misses` list before falling through to live-fetch).

If WASM is intentionally not expected to use this cache, consider either disabling caching by default in the WASM config, or calling the warm function from the WASM initializer:

```rust
// Option A: disable for WASM in config defaults
static_models_cache: StaticModelsCacheConfig { refresh_interval_secs: 0 },

// Option B: warm the cache in the WASM constructor (if async is available)
if state.config.features.static_models_cache.enabled() {
    state.warm_static_models_cache().await;
}
```

ScriptSmith merged commit 35aaca0 into main on Mar 20, 2026, with 18 of 19 checks passed.
ScriptSmith deleted the model-list-cache branch on March 20, 2026 at 12:43.