
Model list endpoint caching #17

Merged

ScriptSmith merged 3 commits into main from model-list-cache on Mar 20, 2026

Conversation

@ScriptSmith
Owner

No description provided.

@greptile-apps
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR introduces an in-memory cache for the /v1/models endpoint that serves model lists from static (config-file) providers. The cache is warmed at startup in AppState::new, refreshed periodically via a background task in server.rs, and is configurable via features.static_models_cache in TOML. All four issues flagged in the prior review round have been addressed — stale provider entries are pruned with cache.retain(), the refresh task uses task_tracker.spawn, log fields are now consistent, and live-fetch fallback errors are warned on.
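Based on the summary and the defaults described in the file-by-file notes below, the TOML knob presumably looks something like this (the table name comes from the summary, the field name from the review comments; treat this as a sketch rather than the exact schema):

```toml
# Hypothetical example of the feature toggle described above.
# 300 is the stated default; per the review comments, a value of 0
# would make enabled() return false and disable the cache entirely.
[features.static_models_cache]
refresh_interval_secs = 300
```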

Key issues found:

  • The background refresh task runs an infinite loop with no cancellation path. Since it is registered with task_tracker, the 30-second task_tracker.wait() timeout will always expire on every graceful shutdown rather than completing cleanly.
  • Successful live-fetch results for cache-miss providers are not written back to static_models_cache. Under concurrent load, multiple requests can pile up on the same uncached provider between background refresh cycles (thundering herd).
  • In the WASM build path, AppState is constructed directly without ever calling warm_static_models_cache(), so the cache is perpetually empty despite caching being enabled by default. Every WASM request falls through to live-fetch with the extra overhead of the empty cache check.

Confidence Score: 3/5

  • Functionally correct for the happy path, but the infinite-loop background task will reliably degrade graceful shutdowns to the full 30-second timeout on every deploy.
  • The core caching logic in app.rs is sound and all prior review feedback was addressed. However, the infinite background loop registered with the task tracker is a consistent operational issue (every shutdown stalls), and the missing cache write-back creates a latency regression under failure conditions. The WASM gap is a correctness concern for that build target.
  • Files needing attention: src/cli/server.rs (infinite refresh loop, shutdown impact) and src/routes/api/models.rs (no write-back of live-fetched results).

Important Files Changed

| Filename | Overview |
| --- | --- |
| `src/app.rs` | Adds the `static_models_cache` field to `AppState` and implements `warm_static_models_cache()`. The implementation correctly addresses all prior review concerns: `cache.retain()` removes stale entries for deleted providers, the write lock covers the full update, and the total model count is computed across the whole cache. The startup warm in `AppState::new` is sound. |
| `src/cli/server.rs` | Spawns the background refresh task via `task_tracker.spawn` (addressing the prior review comment). However, the task body is an infinite loop with no exit condition, which will cause every graceful shutdown to block until the 30-second `task_tracker.wait()` timeout expires. |
| `src/routes/api/models.rs` | Refactors the models endpoint to serve from the in-memory cache on cache hits and live-fetch only on misses. Fallback warnings and the hit/miss logic are correct, but successful live-fetch results are not written back to the cache, creating a potential thundering herd on cache-miss providers between background refresh cycles. |
| `src/config/features.rs` | Adds `StaticModelsCacheConfig` with sensible defaults (300 s refresh interval), `enabled()` and `refresh_interval()` helpers, and TOML doc examples. A clean, well-documented addition. |
| `src/wasm.rs` | Adds `static_models_cache` to the WASM `AppState` construction with an empty default. Since `warm_static_models_cache()` is never called here and there is no background refresh task, the cache is always empty in WASM mode even though caching is enabled by default, causing all requests to fall through to live-fetch with extra overhead. |
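The retain-then-insert warm logic described for `src/app.rs` can be sketched in simplified form. This is a hypothetical, synchronous model: a std `RwLock` over a `HashMap` of model-name lists stands in for the real `tokio::sync::RwLock` and `ModelsResponse`, and `warm_cache` is an illustrative name, not the actual function signature.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

// Simplified sketch of the warm pass: prune entries for providers that no
// longer exist in the config, then insert the freshly fetched responses,
// all under a single write lock, and report the total model count.
fn warm_cache(
    cache: &RwLock<HashMap<String, Vec<String>>>,
    fresh: HashMap<String, Vec<String>>,
) -> usize {
    let current: HashSet<String> = fresh.keys().cloned().collect();
    let mut guard = cache.write().unwrap();
    // Drop stale entries for providers removed from the config.
    guard.retain(|name, _| current.contains(name));
    // Overwrite (or add) entries for every provider fetched this round.
    for (name, models) in fresh {
        guard.insert(name, models);
    }
    // Total model count computed across the whole cache, not just this batch.
    guard.values().map(|models| models.len()).sum()
}
```

Holding the write lock across both the `retain` and the inserts is what prevents readers from observing a half-pruned, half-updated cache between the two steps.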

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ModelsRoute as /v1/models handler
    participant Cache as static_models_cache (RwLock)
    participant Provider as Upstream Provider(s)
    participant BgTask as Background Refresh Task

    Note over BgTask: Spawned at server start (task_tracker)
    BgTask->>Provider: list_models_for_config (parallel)
    Provider-->>BgTask: ModelsResponse / Err
    BgTask->>Cache: write lock → retain + insert

    Client->>ModelsRoute: GET /v1/models
    ModelsRoute->>Cache: read lock → check each provider
    alt All hits
        Cache-->>ModelsRoute: cached ModelsResponse × N
    else Some misses
        Cache-->>ModelsRoute: hits (cached)
        ModelsRoute->>Provider: live-fetch miss providers (parallel)
        Provider-->>ModelsRoute: ModelsResponse / Err (warn on err)
        Note over ModelsRoute: ⚠️ results NOT written back to Cache
    end
    ModelsRoute-->>Client: CombinedModelsResponse (enriched)

    loop Every refresh_interval_secs (default 300s)
        BgTask->>Provider: warm_static_models_cache
        Provider-->>BgTask: updated responses
        BgTask->>Cache: write lock → retain + insert
    end
```

Last reviewed commit: "Review fixes"

@ScriptSmith
Owner Author

@greptile-apps

Comment on lines +344 to +351 in src/cli/server.rs:

```rust
task_tracker.spawn(async move {
    let mut ticker = tokio::time::interval(interval);
    ticker.tick().await; // skip the immediate first tick (already warmed)
    loop {
        ticker.tick().await;
        state_ref.warm_static_models_cache().await;
    }
});
```

**P1: Infinite loop blocks graceful shutdown for full 30-second timeout**

The background refresh task runs an unconditional `loop { ... }` with no exit condition. The server's shutdown handler calls `task_tracker.close()` followed by `task_tracker.wait()` with a 30-second timeout (`tokio::time::timeout(std::time::Duration::from_secs(30), task_tracker.wait())`). Since this task never exits on its own, every server shutdown will stall for the entire 30 seconds before timing out, regardless of whether the task is doing useful work at that moment.

Every other `task_tracker.spawn` usage in the codebase wraps short-lived, finite work (audit logging, usage flushing, etc.). This is the only infinite background loop registered with the tracker, and it breaks the graceful-shutdown contract the tracker is designed to enforce.

The fix is to drive the loop via a [`CancellationToken`](https://docs.rs/tokio-util/latest/tokio_util/sync/struct.CancellationToken.html) that is signalled during shutdown:

```rust
let cancel = state.cancellation_token.clone(); // or create a new one shared with shutdown
let state_ref = state.clone();
task_tracker.spawn(async move {
    let mut ticker = tokio::time::interval(interval);
    ticker.tick().await;
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break,
            _ = ticker.tick() => {}
        }
        state_ref.warm_static_models_cache().await;
    }
});
```

Comment on lines +76 to +81 in src/routes/api/models.rs:

```rust
for (name, result) in join_all(futures).await {
    match result {
        Ok(resp) => hits.push((name, resp)),
        Err(e) => tracing::warn!(provider = %name, error = %e, "Live-fetch fallback failed for cache-miss provider"),
    }
}
```

**P2: Successful live-fetch results not written back to cache**

When a cache miss triggers a live-fetch and the fetch succeeds, the result is added to `hits` for this single request but is never stored in `static_models_cache`. The next request for the same provider will be another cache miss and trigger yet another live-fetch, until the background refresh (up to 5 minutes later by default) eventually populates the entry.

Under concurrent load this becomes a thundering-herd problem: if the initial cache warm fails for a provider, every concurrent request to `/v1/models` will independently live-fetch from that same upstream provider until the background refresh catches up. This negates the latency benefit the cache is meant to provide.

Consider writing successful live-fetch results back to the cache:

```rust
let mut live_fetched: Vec<(String, crate::providers::ModelsResponse)> = Vec::new();
for (name, result) in join_all(futures).await {
    match result {
        Ok(resp) => {
            live_fetched.push((name.clone(), resp.clone()));
            hits.push((name, resp));
        }
        Err(e) => tracing::warn!(provider = %name, error = %e, "Live-fetch fallback failed for cache-miss provider"),
    }
}
// Write successful live-fetches back to the cache so future requests benefit
if !live_fetched.is_empty() {
    let mut cache = state.static_models_cache.write().await;
    for (name, resp) in live_fetched {
        cache.insert(name, resp);
    }
}
```

Comment on line 159 in src/wasm.rs:

```rust
    default_org_id,
    provider_metrics: Arc::new(services::ProviderMetricsService::new()),
    model_catalog: catalog::ModelCatalogRegistry::new(),
    static_models_cache: Arc::new(tokio::sync::RwLock::new(Default::default())),
```

**P2: WASM path: cache always empty despite feature being enabled by default**

The WASM `HadrianGateway` constructs `AppState` directly (bypassing `AppState::new`), so `warm_static_models_cache()` is never called. Since `StaticModelsCacheConfig` defaults to `refresh_interval_secs = 300`, `enabled()` returns `true`. Every call to `api_v1_models` from WASM will enter the `cache_enabled` branch, find all providers as cache misses, then live-fetch them — identical to the pre-PR behaviour but with extra overhead (acquiring a read lock and building a `misses` list before falling through to live-fetch).

If WASM is intentionally not expected to use this cache, consider either disabling caching by default in the WASM config, or calling the warm function from the WASM initializer:

```rust
// Option A: disable for WASM in config defaults
static_models_cache: StaticModelsCacheConfig { refresh_interval_secs: 0 },

// Option B: warm the cache in the WASM constructor (if async is available)
if state.config.features.static_models_cache.enabled() {
    state.warm_static_models_cache().await;
}
```

ScriptSmith merged commit 35aaca0 into main on Mar 20, 2026, with 18 of 19 checks passed.
ScriptSmith deleted the model-list-cache branch on March 20, 2026 at 12:43.