Skip to content

fix(query): break streaming loop on Flush error so DuckDB result buffers free on client disconnect#424

Merged
xe-nvdk merged 5 commits into
mainfrom
fix/query-memory-retention
May 10, 2026
Merged

fix(query): break streaming loop on Flush error so DuckDB result buffers free on client disconnect#424
xe-nvdk merged 5 commits into
mainfrom
fix/query-memory-retention

Conversation

@xe-nvdk
Copy link
Copy Markdown
Member

@xe-nvdk xe-nvdk commented May 10, 2026

Summary

  • Heavy queries against `/api/v1/query` (and `/api/v1/query/arrow`) hold DuckDB result-set memory until query completion or the 5-min timeout when the client disconnects mid-stream — Grafana panel close, browser tab kill, dashboard refresh. The streaming goroutine has no way to learn the client closed the TCP connection because fasthttp's `RequestCtx.Done()` only fires on server shutdown (`server.go:2719-2745`), and `c.UserContext()` is already cancelled by the time the async stream writer runs.
  • The reliable signal in fasthttp's streaming model is the error returned by `bufio.Writer.Flush()` — when the underlying TCP connection is closed, the next flush surfaces EPIPE. The fix captures that error and breaks the streaming loop.
  • Same pattern applied to the three streaming write paths: Arrow IPC (`query_arrow.go`), Arrow-to-JSON (`query_arrow_json.go` — what the default `/query` endpoint uses on `-tags=duckdb_arrow` builds), and the pure database/sql JSON fallback (`query_json_writer.go`).
  • Aligned the four "stream truncated after headers committed" log sites to `Warn` (from `Error`). Headers are already committed when streamErr fires, so the client got a partial result — by definition the server worked. Client disconnect is expected ops noise.

Origin

Filed in response to a user-reported pattern: heavy time-bucket aggregations and wide GROUP BYs cause RSS to climb during query execution. Investigation found multiple potential leak sources; this PR fixes the one with the clearest causal mechanism and reproducer (client-disconnect path).

A throttled `DuckDB.ClearHTTPCache()` from the query path (parquet_metadata_cache eviction) was investigated and tested in the same branch but dropped from this PR — at 1200 queries on a small local-backend dataset, the with-fix and without-fix runs were indistinguishable (both converged to ~180 MB RSS post-settle). Without empirical evidence at the test scale, shipping it would be cargo-cult. May revisit at production scale.

Test plan

  • `go build ./internal/api/... ./cmd/arc/...` clean on both default and `-tags=duckdb_arrow` build configurations.
  • `go test -tags=duckdb_arrow ./internal/api/...` — all existing tests pass plus new `TestStreamArrowJSON_FlushErrorBreaksLoop` (7 tests in the Arrow JSON test file).
  • New regression test uses `errAfterNBytes` (an `io.Writer` that fails after N bytes) and asserts:
    • the returned error wraps the sentinel via `errors.Is`
    • the row count is bounded to within `jsonFlushInterval + batchRows` of the failure point (i.e. break fires at the next 5000-row Flush boundary, not later)
  • Spot-check on Grafana: open a heavy panel against the `/query` endpoint, kill the browser tab mid-load. Pre-fix: DuckDB result reader keeps draining for up to 5 min. Post-fix: reader stops within the next flush window.

Internal review

Four parallel reviewer agents (correctness, security, code-quality, performance) per the CLAUDE.md review protocol. Findings addressed:

  • Same bug in non-Arrow JSON path (code-quality HIGH) → applied identical fix in `query_json_writer.go:169`.
  • Asymmetric severity Error vs Warn (3 agents) → aligned all four streaming-truncation log sites to Warn.
  • Stale test comment + loose assertion (multiple agents) → fixed comment, tightened bound to `jsonFlushInterval + batchRows` instead of `>= totalRows`.
  • No IPC-path test coverage (code-quality MEDIUM) → noted as a follow-up. The IPC path uses structurally identical code to the JSON path that the test exercises. Extracting a testable helper from the Fiber handler is a separate refactor.

Pushed back on:

  • Convert reader.Release() to defer (correctness MEDIUM) — pre-existing pattern, not introduced by this fix, separate hardening.
  • Short-circuit envelope-close on streamErr (correctness HIGH) — the design is intentional ("emit valid JSON regardless" for clients parsing as bytes arrive); cost on disconnect is one no-op JSON marshal.
  • Cancel ctx immediately on flush-error break (performance MEDIUM) — re-read the code, this is already correct: the cancel() at end of closure fires after the break, and `reader.Next()` isn't called again because we broke the loop.
  • Add sentinel error / metric (multiple LOW) — separate observability PR.
  • Convert test to CheckedAllocator (perf NIT) — the existing `buildArrowBatch` helper leaks builder allocations on every test (pre-existing bug); fixing it would scope-creep. Filed mentally as a follow-up.

🤖 Generated with Claude Code

…result buffers on client disconnect

The Arrow-based streaming query handlers (/api/v1/query via the
duckdb_arrow build path, and /api/v1/query/arrow) write results to
the client via Fiber's async SetBodyStreamWriter. When a client
closed the connection mid-stream (Grafana panel close, browser tab
kill, dashboard refresh), the streaming goroutine had no way to
learn the client had gone away.

fasthttp's RequestCtx.Done() only fires on server shutdown, not per-
request disconnect (server.go:2719-2745, "RequestCtx.s.done is only
closed when the server is shutting down"). The streaming code uses
context.Background() deliberately because c.UserContext() is
cancelled when the handler returns — before the async stream writer
runs.

The streaming loop would keep calling reader.Next() on the Arrow
record reader, draining DuckDB result batches into a buffer nobody
was reading, until either the query naturally completed or the per-
request queryTimeout (default 300s) fired. For heavy time-bucket
aggregations and wide GROUP BYs, that's tens of MB of result-set
memory held per cancelled query.

The fix is mechanical: capture the error from bufio.Writer.Flush()
and break the streaming loop on the first failed flush. That's the
canonical signal in fasthttp's streaming model that the underlying
TCP connection has been closed.

Same pattern applied to three call sites:
- internal/api/query_arrow.go (Arrow IPC handler)
- internal/api/query_arrow_json.go (Arrow-to-JSON handler used by
  the default /query endpoint on duckdb_arrow builds)
- internal/api/query_json_writer.go (pure database/sql JSON fallback)

Also align stream-truncation log levels to Warn across all four
caller sites in query_arrow_json.go and query.go. The response
headers were already committed when streamErr fires — by definition
the client got a partial result. Client disconnect is the common
cause and is expected ops noise, not an Arc fault. Warn is the right
severity; Error inflates dashboards.

Regression test in query_arrow_json_test.go uses an errAfterNBytes
io.Writer that fails after 256 bytes and asserts the loop breaks
within jsonFlushInterval rows of the failure point. Verifies the
error propagation chain (errors.Is + wrap), the row-count bound, and
the break-on-Flush behavior.

The Arrow IPC and pure SQL JSON paths use the same code shape but
lack dedicated tests; their behavior is structurally identical to
the JSON-via-Arrow path that the test exercises. Filed as a follow-
up.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements logic to abort streaming query execution when a client disconnects by capturing errors from bufio.Writer.Flush(). This prevents unnecessary memory pressure and processing by breaking the streaming loops early when the underlying TCP connection is closed. Additionally, several log entries for truncated streams were downgraded from Error to Warn to reduce operational noise. The feedback suggests further refining this noise reduction by handling potential errors during ipcWriter.Close() and tightening the row count assertion in the new regression test to ensure the loop breaks exactly at the expected boundary.

Comment thread internal/api/query_arrow.go
Comment thread internal/api/query_arrow_json_test.go Outdated
Two MEDIUM findings from gemini round 1:

- query_arrow.go: ipcWriter.Close() logs failures at Error, but when
  the loop broke because the client disconnected, Close is guaranteed
  to fail flushing trailing IPC metadata over the already-closed
  connection. That's the same client-disconnect event we already
  downgraded to Warn — emitting Error here defeated the ops-noise
  reduction. Downgraded to Warn.

- query_arrow_json_test.go: tightened the test assertion from
  "rowCount > tighterBound" (jsonFlushInterval + batchRows) to
  "rowCount != jsonFlushInterval". Empirically verified that the
  break fires at exactly rowCount=5000: bufio's internal 4 KiB
  auto-flush hits the failing writer well before row 5000 and stores
  a sticky error; the WriteByte/WriteString calls then silently
  no-op until the explicit Flush at row 5000 returns the stored
  error. So the exact-boundary check is correct and gives the
  stronger guarantee.
@xe-nvdk
Copy link
Copy Markdown
Member Author

xe-nvdk commented May 10, 2026

@gemini-code-assist round 1 — both taken in 4109b43.

Taken: `ipcWriter.Close()` log downgrade (internal/api/query_arrow.go:201) — you're right. When the loop broke because the client disconnected, `ipcWriter.Close()` is guaranteed to fail flushing trailing IPC metadata over the already-closed connection. Logging that at `Error` defeated the ops-noise reduction. Now `Warn` with a comment explaining the cause/effect.

Taken: tighter test assertion (internal/api/query_arrow_json_test.go) — I added a temporary `t.Logf` to verify the actual rowCount before tightening:

```
query_arrow_json_test.go:415: actual rowCount=5000, jsonFlushInterval=5000
```

Empirically: bufio's internal 4 KiB auto-flush hits the failing writer well before row 5000 and stores a sticky error; subsequent `WriteByte`/`WriteString` calls silently no-op until the explicit `Flush()` at row 5000 returns the stored error. So the exact-boundary check holds. Changed `rowCount > tighterBound` to `rowCount != jsonFlushInterval`.

Ready for another pass.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces mechanisms to abort streaming query results upon client disconnection by monitoring flush errors, thereby preventing unnecessary memory consumption from draining result sets. It also addresses S3 endpoint normalization for DuckDB and fixes DELETE operations on non-TLS S3 storage. Review feedback suggests that the 5000-row interval for error checking may still allow significant wasted processing and recommends more frequent checks. Furthermore, the feedback notes that downgrading all stream-truncation logs to a warning level might obscure genuine server-side errors and points out inconsistencies in error reporting and metrics across different streaming handlers.

Comment thread internal/api/query_arrow_json.go
Comment thread internal/api/query_json_writer.go
Comment thread internal/api/query.go Outdated
Comment thread internal/api/query_arrow.go
…ymmetry (gemini r2)

Four findings from gemini round 2 on #424:

1. HIGH: jsonFlushInterval=5000 allowed up to 5000 rows of wasted CPU
between an internal bufio auto-flush failure and the next explicit
Flush check. Lowered to 1000 — 5x faster disconnect detection with
negligible overhead (Flush is no-op when no bytes are buffered). The
per-row Write-error check gemini suggested was considered but
rejected as too expensive for hot-path queries; the calibrated
interval change is the better tradeoff.

2. MEDIUM: downgrading ALL stream-truncation logs to Warn hid genuine
server-side stream failures (scanner errors, db iteration failures).
Introduced errClientDisconnected sentinel in query.go and wrapped
the Flush-failure sites with it. Caller log sites now disambiguate:
Warn for client-disconnect / context.DeadlineExceeded /
context.Canceled (expected ops noise), Error for everything else
(real server bug). Four call sites updated: query_arrow_json.go,
query.go (×3), query_arrow.go (new — see #3 below).

3. MEDIUM: Arrow IPC handler logged inline and broke without setting
a streamErr variable, so the closure always logged "Arrow streaming
query completed" even on disconnect, and the IncQueryErrors metric
was never incremented for these events. Refactored to mirror the
JSON handler pattern: capture all error conditions (ctx, decimal
cast, ipcWriter.Write, Flush) into streamErr, log/metric once at
closure end with severity disambiguation matching JSON path.

4. Test update: assert errClientDisconnected is in the error chain
(via errors.Is) so callers can rely on the sentinel for branching.
The existing underlying-sentinel assertion is preserved for future
debugging.

Closes #428 (sentinel introduced ahead of schedule because gemini's
log-level disambiguation finding made the need concrete).
@xe-nvdk
Copy link
Copy Markdown
Member Author

xe-nvdk commented May 10, 2026

@gemini-code-assist round 2 — all four findings addressed in 743736b. The change ended up bigger than expected because finding #2 required introducing the sentinel error to disambiguate cleanly.

HIGH 1+2: wasted work between flush boundaries (query_arrow_json.go:287, query_json_writer.go:176)

Considered per-row Write-error check but rejected as too expensive for hot-path queries — adds a branch per WriteByte across millions of rows, and `bufio.Writer` has no public API to inspect its stored error without forcing a Flush (which incurs a syscall when buffered bytes exist).

Lowered `jsonFlushInterval` from 5000 to 1000. Compromise: 5× faster disconnect detection, ~5× more no-op Flush calls (essentially free), worst-case wasted formatting work bounded to ~1000 rows (~10-100µs at typical query rates). Acceptable.

MEDIUM 1: log level disambiguation (query.go:1536)

You're right — downgrading every stream-truncation error to Warn hid genuine server-side failures (scanner errors, db iteration failures). Fixed by introducing `errClientDisconnected` sentinel (query.go:107-114) and wrapping the Flush-failure sites with it. Caller log sites now disambiguate:

```go
ev := h.logger.Error()
if errors.Is(streamErr, errClientDisconnected) ||
errors.Is(streamErr, context.DeadlineExceeded) ||
errors.Is(streamErr, context.Canceled) {
ev = h.logger.Warn()
}
```

Applied at four call sites: `query_arrow_json.go`, `query.go` (×3 — two streamTypedJSON callers and queryMeasurement), and the newly-refactored `query_arrow.go` (see MEDIUM 2 below).

MEDIUM 2: Arrow IPC handler inconsistency (query_arrow.go:201)

Caught a real asymmetry. The IPC handler logged inline and broke without setting a streamErr, so the closure always logged "Arrow streaming query completed" even on disconnect, AND `IncQueryErrors` was never incremented for those events. Refactored to mirror the JSON handler: capture all error conditions (ctx cancel, decimal cast, ipcWriter.Write, Flush) into `streamErr`, log/metric once at closure end with the same severity disambiguation.

Test update: added an `errors.Is(err, errClientDisconnected)` assertion alongside the existing underlying-I/O-sentinel check. Tests confirm both ancestors are in the error chain.

Also closes #428 (sentinel was deferred there as "defer until a caller actually needs to branch" — gemini's log-disambiguation finding made the need concrete).

Ready for another pass.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to abort streaming queries when a client disconnects by monitoring errors from bufio.Writer.Flush(). This prevents unnecessary resource consumption and memory usage for abandoned requests. The logging logic has been updated to categorize client-side disconnects and timeouts as warnings rather than errors. The PR also includes fixes for S3 endpoint scheme normalization and DELETE operations on non-TLS S3. Review feedback recommends extracting the repeated error-checking logic into a shared helper function and updating a stale comment in the test suite that references the old flush interval.

Comment thread internal/api/query.go
Comment thread internal/api/query.go Outdated
Comment thread internal/api/query.go Outdated
Comment thread internal/api/query.go Outdated
Comment thread internal/api/query_arrow.go Outdated
Comment thread internal/api/query_arrow_json.go Outdated
Comment thread internal/api/query_arrow_json_test.go Outdated
…test comment (gemini r3)

Both gemini round 3 findings taken:

1. Repeated client-error detection across 5 sites:
   - Five identical 4-line blocks of "errors.Is(err, errClientDisconnected)
     || errors.Is(err, context.DeadlineExceeded) || errors.Is(err,
     context.Canceled)" each followed by Warn-vs-Error level selection.
   - Extracted to two helpers in query.go:
     - isClientError(err) bool — the predicate
     - (h *QueryHandler).streamErrEvent(err) *zerolog.Event — picks Warn
       or Error and returns the event ready for .Err().X().Y().Msg()
   - Each site collapses from 5 lines to 1:
       h.streamErrEvent(streamErr).Err(streamErr).Int(...).Msg(...)
   - Net -11 lines across the diff. Same behaviour, less duplication.

2. Stale test comment referenced jsonFlushInterval=5000 after we
   lowered it to 1000 in round 2. Updated to reference the constant
   symbolically so it doesn't drift again.
@xe-nvdk
Copy link
Copy Markdown
Member Author

xe-nvdk commented May 10, 2026

@gemini-code-assist round 3 — both findings taken in aa70be1.

Taken: `isClientError` helper (internal/api/query.go:116) — you're right, the 4-line errors.Is block was repeated 5 times. Extracted two helpers:

```go
func isClientError(err error) bool {
return errors.Is(err, errClientDisconnected) ||
errors.Is(err, context.DeadlineExceeded) ||
errors.Is(err, context.Canceled)
}

func (h *QueryHandler) streamErrEvent(err error) *zerolog.Event {
if isClientError(err) {
return h.logger.Warn()
}
return h.logger.Error()
}
```

Each of the 5 sites now reads as a single line:

```go
h.streamErrEvent(streamErr).Err(streamErr).
Int("rows_sent", rowCount).
Float64("execution_time_ms", ...).
Msg(...)
```

Net -11 lines across the diff. Behaviour is identical (same severity disambiguation, same call shape), just expressed once instead of five times.

Taken: stale test comment — `jsonFlushInterval (5000)` → references the constant symbolically now so it doesn't drift again if we ever retune the interval.

Ready for another pass.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to detect client disconnects during streaming queries by monitoring bufio.Writer.Flush() errors, preventing unnecessary processing and memory usage for abandoned connections. It also includes logic to differentiate between client-side operational noise (logged as warnings) and server-side failures (logged as errors). Other changes include S3 endpoint normalization for DuckDB and a fix for DELETE operations on non-TLS S3. Review feedback suggests standardizing error messages across the various streaming handlers for better consistency.

Comment thread internal/api/query_arrow.go Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@xe-nvdk xe-nvdk merged commit 92dca54 into main May 10, 2026
5 checks passed
xe-nvdk added a commit that referenced this pull request May 10, 2026
Restores tab indentation on the streamErr assignment at line 150
that was lost when applying gemini's suggestion via the GitHub UI
during #424. Behaviour identical; gofmt --diff would otherwise flag
this on every future contributor's pre-commit.

Co-authored-by: Ignacio Van Droogenbroeck <ignacio@vandroogenbroeck.net>
xe-nvdk added a commit that referenced this pull request May 27, 2026
Three small follow-ups that landed alongside the larger v26.06.1
features and bug fixes:

- arc_query_client_disconnects_total Prometheus counter (PR #466,
  closes #426) — observability follow-up to PR #424's streaming-on-
  client-disconnect fix. Placed under the existing #424 entry.
- Arrow streaming tests now leak-detect with CheckedAllocator (PR
  #465, closes #427) — test-hygiene follow-up to the same #424 work;
  buildArrowBatch helper now releases builders + cols defensively so
  CheckedAllocator + AssertSize(t,0) catches future Arrow leaks.
- Line Protocol parser sub-slice indexing in splitOnDelimiter (PR
  #467, closes #354) — ingest hot-path perf: ParseLine -41% ns /
  -52% allocs, ParseBatch -31% ns / -50% allocs. Includes the bench
  table and notes the declined dynamic-cap-8 suggestion from Gemini
  review (caused a +9% regression on the full path).

Doc-only; +35 lines in RELEASE_NOTES_2026.06.1.md.

Co-authored-by: Ignacio Van Droogenbroeck <ignacio@vandroogenbroeck.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant