fix: serve stale read data on backend errors #217

Merged: szibis merged 1 commit into main from ss/stale-on-error-read-path on Apr 20, 2026

szibis (Collaborator) commented Apr 20, 2026

Summary

  • stop masking backend failures for detected fields, detected labels, detected field values, volume, and volume range as empty successful responses
  • serve stale last-good local cache data for these read surfaces when a backend refresh fails, and otherwise return a real upstream-style error
  • honor non-success /select/logsql/hits responses in the volume helpers and add regressions for Sand-shaped no-data-after-refresh failures
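
The fallback order described above (fresh data, then last-good cache, then a real error) could look roughly like the sketch below. It is only an illustration: `staleCache`, `upstreamError`, and `readWithStaleFallback` are hypothetical names, not the actual `internal/cache` or `internal/proxy` API, and the real code may key, expire, and report staleness differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// staleCache is a hypothetical last-good store, not the project's cache type.
type staleCache struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newStaleCache() *staleCache { return &staleCache{data: make(map[string][]byte)} }

func (c *staleCache) get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *staleCache) put(key string, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = v
}

// upstreamError is an assumed error type that keeps the backend status code
// so the caller can return an upstream-style error instead of an empty success.
type upstreamError struct {
	Status int
	Body   string
}

func (e *upstreamError) Error() string {
	return fmt.Sprintf("upstream status %d: %s", e.Status, e.Body)
}

// readWithStaleFallback tries a refresh first. On success it stores and
// returns the fresh body. On failure it returns the last-good body with
// stale=true, or the refresh error when nothing has been cached yet.
func readWithStaleFallback(c *staleCache, key string, fetch func() ([]byte, error)) (body []byte, stale bool, err error) {
	fresh, err := fetch()
	if err == nil {
		c.put(key, fresh)
		return fresh, false, nil
	}
	if cached, ok := c.get(key); ok {
		return cached, true, nil
	}
	return nil, false, err
}

func main() {
	c := newStaleCache()
	ok := func() ([]byte, error) { return []byte(`{"fields":["level"]}`), nil }
	fail := func() ([]byte, error) {
		return nil, &upstreamError{Status: http.StatusBadGateway, Body: "backend unavailable"}
	}

	// Cold cache: the failure surfaces as an error, not as empty data.
	_, _, err := readWithStaleFallback(c, "detected_fields", fail)
	var ue *upstreamError
	if errors.As(err, &ue) {
		fmt.Println("cold cache, upstream status:", ue.Status)
	}

	// Warm the cache, then fail again: the last-good body is served.
	readWithStaleFallback(c, "detected_fields", ok)
	body, stale, _ := readWithStaleFallback(c, "detected_fields", fail)
	fmt.Println("warm cache:", string(body), "stale:", stale)
}
```

Returning the `stale` flag separately leaves it to the caller to decide whether to log the condition or mark the response.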

Validation

  • go test ./internal/cache
  • go test ./internal/proxy -run "Test(Drilldown_Detected(Fields|Labels|FieldValues)_|Contract_Volume(ResponseFormat|Range_ResponseFormat|_BypassesNearNowStaleCache|_ServesStaleCacheWhenNearNowRefreshFails|_ReturnsGatewayErrorWithoutCacheOnBackendFailure))"
  • go test ./internal/proxy
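
For readers unfamiliar with this kind of regression, the check that a non-success `/select/logsql/hits` response becomes an error might be exercised roughly as follows. This is a hedged sketch, not the tests named above: `fetchHits` and `newBackend` are hypothetical helpers, and the fake backend is a plain `httptest` server standing in for the real one.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newBackend starts a hypothetical fake backend that answers every request
// with the given status and body.
func newBackend(status int, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		io.WriteString(w, body)
	}))
}

// fetchHits is an assumed helper that requests /select/logsql/hits and treats
// any non-2xx status as an error instead of parsing the body as data.
func fetchHits(client *http.Client, baseURL string) ([]byte, error) {
	resp, err := client.Get(baseURL + "/select/logsql/hits")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Cap the read so a misbehaving backend cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, err
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, fmt.Errorf("hits request failed: status %d: %s", resp.StatusCode, body)
	}
	return body, nil
}

func main() {
	failing := newBackend(http.StatusServiceUnavailable, "backend unavailable")
	defer failing.Close()

	_, err := fetchHits(failing.Client(), failing.URL)
	fmt.Println("failing backend error:", err)

	healthy := newBackend(http.StatusOK, `{"hits":[]}`)
	defer healthy.Close()

	body, err := fetchHits(healthy.Client(), healthy.URL)
	fmt.Println("healthy backend body:", string(body), "err:", err)
}
```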

github-actions bot added the size/L (Large change) and scope/proxy (Proxy core) labels on Apr 20, 2026
szibis marked this pull request as ready for review on April 20, 2026, 10:24
github-actions bot added the scope/cache (Cache layer), scope/docs (Documentation), scope/tests (Tests), and bugfix (Bug fix) labels on Apr 20, 2026

github-actions bot (Contributor) commented:

PR Quality Report

Compared against base branch main.

Coverage and tests

| Signal | Base | PR | Delta |
| --- | --- | --- | --- |
| Test count | 1787 | 1794 | +7 |
| Coverage | 89.9% | 89.9% | 0.0% (stable) |

Compatibility

| Track | Base | PR | Delta |
| --- | --- | --- | --- |
| Loki API | 100.0% | 11/11 (100.0%) | 0.0% (stable) |
| Logs Drilldown | 100.0% | 17/17 (100.0%) | 0.0% (stable) |
| VictoriaLogs | 100.0% | 11/11 (100.0%) | 0.0% (stable) |

Compatibility components

| Track | Component | Base | PR | Delta |
| --- | --- | --- | --- | --- |
| Loki API | label_values | 2/2 (100.0%) | 2/2 (100.0%) | 0.0% (stable) |
| Loki API | labels | 4/4 (100.0%) | 4/4 (100.0%) | 0.0% (stable) |
| Loki API | metrics | 2/2 (100.0%) | 2/2 (100.0%) | 0.0% (stable) |
| Loki API | otel | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Loki API | query_range | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Loki API | series | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | detected_fields | 11/11 (100.0%) | 11/11 (100.0%) | 0.0% (stable) |
| Logs Drilldown | label_values | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | level_volume | 2/2 (100.0%) | 2/2 (100.0%) | 0.0% (stable) |
| Logs Drilldown | patterns | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | service_logs | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | service_selection | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | detected_fields | 4/4 (100.0%) | 4/4 (100.0%) | 0.0% (stable) |
| VictoriaLogs | field_values | 3/3 (100.0%) | 3/3 (100.0%) | 0.0% (stable) |
| VictoriaLogs | index_stats | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | stream_translation | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | synthetic_labels | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | volume_range | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |

Performance smoke

Lower CPU cost (ns/op) is better. Lower benchmark memory cost (B/op, allocs/op) is better. Higher throughput is better. Lower load-test memory growth is better. Benchmark rows are medians from repeated samples.

| Signal | Base | PR | Delta |
| --- | --- | --- | --- |
| QueryRange cache-hit CPU cost | 1348.0 ns/op | 1358.0 ns/op | +0.7% (stable) |
| QueryRange cache-hit memory | 200.0 B/op | 200.0 B/op | 0.0% (stable) |
| QueryRange cache-hit allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| QueryRange cache-bypass CPU cost | 1669.0 ns/op | 1678.0 ns/op | +0.5% (stable) |
| QueryRange cache-bypass memory | 272.0 B/op | 275.0 B/op | +1.1% (stable) |
| QueryRange cache-bypass allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| Labels cache-hit CPU cost | 678.4 ns/op | 673.1 ns/op | -0.8% (stable) |
| Labels cache-hit memory | 56.0 B/op | 56.0 B/op | 0.0% (stable) |
| Labels cache-hit allocations | 4.0 allocs/op | 4.0 allocs/op | 0.0% (stable) |
| Labels cache-bypass CPU cost | 842.9 ns/op | 832.4 ns/op | -1.2% (stable) |
| Labels cache-bypass memory | 84.0 B/op | 84.0 B/op | 0.0% (stable) |
| Labels cache-bypass allocations | 4.0 allocs/op | 4.0 allocs/op | 0.0% (stable) |
| High-concurrency throughput | 102019.0 req/s | 107535.0 req/s | +5.4% (stable) |
| High-concurrency memory growth | 0.8 MB | 0.8 MB | 0.0% (stable) |

State

  • Coverage, compatibility, and sampled performance are reported here from the same PR workflow.
  • This is a delta report, not a release gate by itself. Required checks still decide merge safety.
  • Performance is a smoke comparison, not a full benchmark lab run.
  • Delta states use the same noise guards as the quality gate (percent + absolute + low-baseline checks), so report labels match merge-gate behavior.
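
As a rough illustration of how combined noise guards (percent + absolute + low-baseline) can label a delta, consider the sketch below. Everything in it is an assumption: the `guard` type, the `classify` function, the state names other than "stable", and the threshold values are invented for the example, since the report does not state the real ones.

```go
package main

import (
	"fmt"
	"math"
)

// guard holds illustrative thresholds; all three are assumed to be positive.
type guard struct {
	MinPercent  float64 // relative change (percent) required to leave "stable"
	MinAbsolute float64 // absolute change required to leave "stable"
	LowBaseline float64 // baselines below this are too small for a percent check
}

// classify labels a base-to-PR change. A delta is reported as a change only
// if it clears the absolute, low-baseline, and percent guards together.
func classify(base, pr float64, g guard, lowerIsBetter bool) string {
	delta := pr - base
	if math.Abs(delta) < g.MinAbsolute {
		return "stable"
	}
	if math.Abs(base) < g.LowBaseline {
		return "stable"
	}
	pct := delta / math.Abs(base) * 100
	if math.Abs(pct) < g.MinPercent {
		return "stable"
	}
	// A decrease is good for cost metrics, an increase for throughput.
	if (delta < 0) == lowerIsBetter {
		return "improved"
	}
	return "regressed"
}

func main() {
	g := guard{MinPercent: 10, MinAbsolute: 1, LowBaseline: 1} // made-up values

	// Inputs taken from the performance table; the thresholds are not.
	fmt.Println("cache-hit CPU:", classify(1348.0, 1358.0, g, true))
	fmt.Println("throughput:", classify(102019.0, 107535.0, g, false))
}
```

Requiring all guards to pass keeps small baselines and tiny absolute shifts from being reported as large percentage swings.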

szibis merged commit 10e3f34 into main on Apr 20, 2026 (44 checks passed)
szibis deleted the ss/stale-on-error-read-path branch on April 20, 2026, 10:34
Labels

bugfix (Bug fix), scope/cache (Cache layer), scope/docs (Documentation), scope/proxy (Proxy core), scope/tests (Tests), size/L (Large change)
