Streaming SQL API: BigQuery driver does not forward stream errors, killing pod on any mid-stream BQ failure #10875

@tlangton3

Summary

With CUBESQL_STREAM_MODE=true, any query whose backing BigQuery call returns an HTTP error mid-stream (e.g. No matching signature for operator = for argument types: TIMESTAMP, DATE) causes the cubejs-server / CubeSQL pod to die with a Node unhandled-rejection (processTicksAndRejections). Connected BI sessions are torn down. The underlying error never reaches the wire as a Postgres ErrorResponse — the client just sees server closed the connection unexpectedly. The exact same query with stream mode off returns a structured ErrorResponse carrying the verbatim BigQuery message and the server stays up.

Cube v1.6.46 (reproduces on master). Affects every BI tool talking to the SQL API.

Repro

Use a Cube model with any cube whose generated SQL can trigger an execution-time error from the backing database. For BigQuery specifically, a type: time dimension queried via the SQL API with this shape hits #10643's DATE() vs. TIMESTAMP coercion gap and crashes the pod:

SELECT created_at, MEASURE(count)
FROM orders
WHERE ((CAST(created_at AS DATE) = DATE('2024-01-08')) IS NULL)
   OR (CAST(created_at AS DATE) <> DATE('2024-01-08'))
   OR (CAST(created_at AS DATE) = DATE('2024-01-08'))
GROUP BY 1 ORDER BY 1 LIMIT 5;

Real cause (only visible in container stderr until the fix):

No matching signature for operator = for argument types: TIMESTAMP, DATE
Signature: T1 = T1
Unable to find common supertype for templated argument <T1>:
  Input types for <T1>: {TIMESTAMP, DATE}

Client observation:

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

Process exits with status 1, crash signature processTicksAndRejections.

Root cause (revised after empirical verification)

The original revision of this issue claimed the panic site was an .unwrap() at packages/cubejs-backend-native/src/stream.rs:258. That was wrong. A sentinel eprintln! placed at the top of js_stream_push_chunk and rebuilt into the native module never fires for the failing query — meaning the entire FFI bridge for streaming chunks is bypassed. The crash happens upstream.

The actual root cause is in packages/cubejs-bigquery-driver/src/BigQueryDriver.ts (stream method):

public async stream(query, values): Promise<StreamTableData> {
  const stream = await this.bigquery.createQueryStream({...});
  const rowStream = new HydrationStream();
  stream.pipe(rowStream);           // ← only forwards data/end, NOT error
  return { rowStream };
}

Mechanism:

  • The cube native bridge (packages/cubejs-backend-native/js/index.ts:325-328) listens on rowStream.on('error', ...) to surface streaming errors to the wire layer.
  • Node's stream.pipe(destination) does not forward 'error' events to the destination. When the underlying bigquery.createQueryStream source emits 'error', the event is never re-emitted on rowStream, so the bridge's listener never fires.
  • The source stream has no 'error' listener of its own in this code either, so Node falls back to its default handling of an unhandled 'error' event: the error is thrown, becomes an uncaught exception, and the process dies.
  • The bridge's outer try { await fn(...) } catch (e) cannot catch this because await fn(...) already resolved (with the { rowStream } object) before the BigQuery HTTP call asynchronously failed.

The non-streaming path (driver.query) avoids this because the rejection propagates through await and the bridge's try/catch catches it.
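
The pipe() behaviour described above can be reproduced without BigQuery. The following is a minimal sketch using only node:stream; the function name observePipeError and the simulated failure are illustrative assumptions, not code from the Cube repository. A plain Readable stands in for the bigquery.createQueryStream source and a PassThrough stands in for HydrationStream.

```typescript
import { PassThrough, Readable } from "node:stream";

// Hypothetical helper: reports which side of a plain pipe() saw the failure.
export function observePipeError(): Promise<{
  sourceSawError: boolean;
  destSawError: boolean;
}> {
  return new Promise((resolve) => {
    // Stand-in for the BigQuery source stream (assumption for illustration).
    const source = new Readable({ objectMode: true, read() {} });
    // Stand-in for the driver's HydrationStream.
    const dest = new PassThrough({ objectMode: true });

    let sourceSawError = false;
    let destSawError = false;

    // This listener only exists so the demo survives. Without it, the
    // 'error' event below is thrown as an uncaught exception.
    source.on("error", () => {
      sourceSawError = true;
    });
    // This mirrors the bridge listening on rowStream.
    dest.on("error", () => {
      destSawError = true;
    });

    source.pipe(dest);

    // Simulate a failure that arrives after stream() has already returned.
    source.destroy(new Error("simulated mid-stream failure"));

    // Give the event loop time to deliver any pending events.
    setTimeout(() => resolve({ sourceSawError, destSawError }), 20);
  });
}
```

In this sketch the listener on the destination is never called, which matches the mechanism above: the only place the error is observable is the source, and the driver code attaches nothing there.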

Fix

Replace stream.pipe(rowStream) with pipeline(stream, rowStream, () => {}) from node:stream. This:

  1. Auto-forwards source errors by destroying rowStream with the same error → bridge's rowStream.on('error', ...) fires → wire layer emits structured Postgres ErrorResponse (XX000) with verbatim BigQuery message.
  2. Propagates consumer-side cancellation back to the source — destroying rowStream now destroys the BigQuery source stream too. Without this, an aborted BI query left the driver paging into the void (a separate, also-real resource leak).
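
Both effects can be checked in isolation. This is a hedged sketch using only node:stream, not the actual driver patch or the tests from the PR; forwardsSourceError and propagatesDestroy are hypothetical names, and the Readable and PassThrough are stand-ins for the BigQuery source stream and HydrationStream.

```typescript
import { PassThrough, Readable, pipeline } from "node:stream";

// Effect 1: a source failure surfaces as 'error' on the returned rowStream.
export function forwardsSourceError(): Promise<string> {
  return new Promise((resolve) => {
    const source = new Readable({ objectMode: true, read() {} });
    const rowStream = new PassThrough({ objectMode: true });

    // Mirrors the bridge's rowStream.on('error', ...) listener.
    rowStream.on("error", (err) => resolve(err.message));

    // The no-op callback matches the shape used in the proposed fix.
    pipeline(source, rowStream, () => {});

    // Simulated failure (assumption for illustration).
    source.destroy(new Error("simulated BigQuery failure"));
  });
}

// Effect 2: destroying rowStream tears down the source as well.
export function propagatesDestroy(): Promise<boolean> {
  return new Promise((resolve) => {
    const source = new Readable({ objectMode: true, read() {} });
    const rowStream = new PassThrough({ objectMode: true });

    pipeline(source, rowStream, () => {});

    // Simulates a consumer abandoning the query.
    rowStream.destroy();

    // Give the event loop time to run pipeline's cleanup, then report.
    setTimeout(() => resolve(source.destroyed), 20);
  });
}
```

The empty callback is deliberate in this shape: errors reach consumers through the 'error' event on rowStream, so the pipeline callback has nothing further to do.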

Verification

End-to-end against real BigQuery via cube SQL API + psql (cube v1.6.46 with patched BigQueryDriver.js overlaid):

Path | Before | After
Successful 100k-row stream | works | works (no regression)
BigQuery TIMESTAMP=DATE error | container exits 1, server closed the connection unexpectedly | ERROR: XX000: Database Execution Error: No matching signature for operator = ... Signature: T1 = T1 ...; container alive

Two synthetic unit tests added (packages/cubejs-bigquery-driver/test/BigQueryDriverStreamError.test.ts) — verified to time out without the fix and pass with the fix:

  • forwards source-stream errors to the returned rowStream
  • propagates rowStream destruction back to the source stream

A separate defensive hardening

While investigating, I also noticed an unrelated FFI panic vector in packages/cubejs-backend-native/src/stream.rs:258: a raw .unwrap() on transform_response. It is not the cause of this BigQuery crash (verified by the sentinel above), but it is a latent panic-across-the-FFI-boundary that any future driver mis-shaping a chunk would detonate. The PR therefore also contains a second commit converting that .unwrap() to a clean reject() + wait_for_future_and_execute_callback path, with its own synthetic regression test (malformed-chunk fixture).

Happy to put up the PR — coming as cube-js/cube PR shortly.

A note for maintainers

The original root-cause section of this issue (claiming the .unwrap() at stream.rs:258) has been retracted above. I left the defensive hardening for that .unwrap() in the PR because it's a real (if narrow) bug, but the actual production fix is the BigQueryDriver.stream pipeline() change. Apologies for the noisy revision history on this issue — the diagnosis sharpened over the course of the investigation.
