[pull] master from cube-js:master #498
Merged
…llation promise for Athena (#10953)

* feat: support distributed query cancellation in query queue and cancellation promise for Athena

  Distributed Query Cancellation (QueryQueue):
  - During the heartbeat interval, the executing node now checks if the queue item was removed externally (by another node calling cancelQuery or reconcile removing orphaned/stalled queries)
  - If the queue item is gone and a cancel handler is registered locally, the local cancel handler is invoked, triggering actual DB query cancellation
  - This closes the gap where cancellation from one node would remove the queue entry but the actual DB query kept running on the executing node

  Athena Cancellation Promise (AthenaDriver):
  - query(), memory(), stream(), downloadQueryResults(), and loadPreAggregationIntoTable() now return MaybeCancelablePromise with a .cancel() method
  - Calling cancel() sets a flag that aborts the poll loop in waitForSuccess() and calls stopQueryExecution() on AWS Athena
  - waitForSuccess() now accepts an optional isCancelled callback to detect cancellation during the poll loop
  - The stream() release function now properly stops the Athena query
  - This enables the query orchestrator to cancel in-flight Athena queries on timeout, manual cancel, or pre-aggregation cancellation via cancelCombinator

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* refactor: use async/await instead of promise chaining in heartbeat callback

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* refactor: use processCancel for external cancellation handling

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat: add REST API endpoint to cancel query by request ID

  Add DELETE /cubejs-api/v1/running-query/:requestId endpoint that cancels in-flight queries matching the given request ID across all SQL and pre-aggregation queues.

  - QueryQueue.cancelQueryByRequestId: scans queued queries and cancels those matching the request ID
  - QueryOrchestrator.cancelQueryByRequestId: searches across all SQL query and pre-aggregation queues for all data sources
  - OrchestratorApi.cancelQueryByRequestId: delegates to orchestrator
  - ApiGateway: registers DELETE route with user auth middleware

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: add smoke test for query cancellation by request ID

  - Add SlowQuery cube model that uses pg_sleep(30) to create a long-running query
  - Add test that starts the slow query via HTTP load API with a known x-request-id, waits for it to enter the queue, then cancels it via DELETE /cubejs-api/v1/running-query/:requestId
  - Verifies the cancel endpoint returns 200 with a result array

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix: address review feedback - partial result bug, heartbeat error handling, sync throw

  - AthenaDriver: throw on cancellation during row iteration instead of break, which would silently return truncated data
  - AthenaDriver: move S3 config validation inside async IIFE in loadPreAggregationIntoTable to preserve rejected-promise semantics
  - QueryQueue: wrap updateHeartBeat in try/catch to prevent unhandled promise rejections from connection failures in the async heartbeat callback

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document x-request-id header on /v1/cubesql and new cancel endpoint

  - Add Headers section to /v1/cubesql documenting x-request-id header for custom request tracking and cancellation
  - Add /v1/running-query/{requestId} endpoint documentation with parameter description, response format, and examples

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* docs: document x-request-id header on /v1/load endpoint

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: use REST SQL API (/cubesql) for cancel smoke test

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix: resolve CI failures - remove test timeout arg and blank line padding

  - Remove timeout argument from test() call: jest.setTimeout at the describe level already applies, and the types don't support the 3-argument overload (TS2769)
  - Remove blank line before closing brace (padded-blocks lint error)

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* refactor: remove localCancelHandler from heartbeat cancellation detection

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* Revert "refactor: remove localCancelHandler from heartbeat cancellation detection"

  This reverts commit ae4de19.

* docs: add comment explaining why localCancelHandler is needed

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix: match cancel-by-request-id on UUID prefix, not exact span ID

  The SQL API appends -span-N suffixes to request IDs on each continue-wait retry, so the same logical query may have different span suffixes in the queue. Strip the -span-N suffix before comparing to match all queries from the same request.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix: remove -span-1 suffix from test - SQL API appends it internally

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: assert cancelled query resolves with error within 40s

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix: use /load API for cancel test and fix error assertion

  The /cubesql endpoint's native transport has an internal retry loop that re-submits cancelled queries with new span IDs, so cancellation doesn't surface as an error to the client. Use /load instead where the continue-wait loop is client-controlled.

  After cancellation, /load returns { error: 'Continue wait' } since the queue item is gone. Assert result.error is defined.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* feat: stop SQL API polling when HTTP client disconnects

  When a client closes the connection to /v1/cubesql, the native transport's continue-wait retry loop would keep polling indefinitely.

  Fix: listen for res 'close' event in the gateway and mark the request ID as closed. On each retry iteration, the sqlApiLoad callback checks if the request was closed and throws 'Client disconnected', a non-continue-wait error that breaks the transport retry loop. Closed request IDs are auto-cleaned after 5 minutes.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* Revert "feat: stop SQL API polling when HTTP client disconnects"

  This reverts commit 1a33564.

* feat: stop SQL API polling when HTTP client disconnects (Rust-side)

  When a client closes the HTTP connection to /v1/cubesql, the native transport's continue-wait retry loop would keep polling indefinitely.

  Fix: register a 'close' event listener on the Node.js response stream from the Rust side (OnCloseHandler, modeled after OnDrainHandler). When the stream closes, a oneshot channel fires. In handle_sql_query, tokio::select! races the execute() future against the close signal. If the client disconnects, the execute future is dropped, which cancels the transport retry loop and all downstream operations.

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: rewrite cancel test to use SQL API and verify pg_stat_activity

  - Start slow query via /v1/cubesql (REST SQL API)
  - Poll pg_stat_activity on the underlying Postgres to confirm pg_sleep is running
  - Abort the HTTP connection (triggers Rust-side close detection)
  - Cancel the query in the queue via DELETE /running-query/:requestId
  - Poll pg_stat_activity again to confirm pg_sleep has stopped, verifying the query is no longer executing in Postgres

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: verify query cancellation via SQL API with pg_stat_activity check

  - Start slow query via /v1/cubesql REST SQL API
  - Confirm pg_sleep is running via pg_stat_activity on the underlying Postgres
  - Abort HTTP connection (triggers Rust-side close detection)
  - Cancel query in queue via DELETE /running-query/:requestId
  - Verify second cancel returns empty (query is gone from queue)
  - Clean up lingering pg_sleep via pg_terminate_backend in finally

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: increase pg_sleep to 90 seconds for cancel test reliability

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* fix: cargo fmt - fix import ordering and formatting

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* test: add 2s wait between cancel calls to allow queue cleanup

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

* refactor: move extractRequestUUID to utils and reuse across QueryQueue and QueryCache

  Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
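The heartbeat-based detection described in the first commit can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `SharedQueueStore`, `ExecutingNode`, `startQuery`, and `heartbeat` are hypothetical names invented here, and the real QueryQueue talks to a shared queue backend rather than a `Set`.

```typescript
type CancelHandler = () => void;

// Hypothetical stand-in for the queue state shared by all nodes.
class SharedQueueStore {
  private active = new Set<string>();

  add(queryKey: string): void {
    this.active.add(queryKey);
  }

  // Called by any node (for example, one handling a cancel request).
  remove(queryKey: string): void {
    this.active.delete(queryKey);
  }

  has(queryKey: string): boolean {
    return this.active.has(queryKey);
  }
}

// Hypothetical model of the node that actually runs the DB query.
class ExecutingNode {
  private store: SharedQueueStore;

  private cancelHandlers = new Map<string, CancelHandler>();

  constructor(store: SharedQueueStore) {
    this.store = store;
  }

  startQuery(queryKey: string, onCancel: CancelHandler): void {
    this.store.add(queryKey);
    this.cancelHandlers.set(queryKey, onCancel);
  }

  // One heartbeat tick. Returns true if an external cancellation was
  // detected and the local cancel handler was invoked.
  heartbeat(queryKey: string): boolean {
    if (this.store.has(queryKey)) {
      return false; // item still queued; a real queue would refresh its heartbeat here
    }
    const handler = this.cancelHandlers.get(queryKey);
    if (!handler) {
      return false; // nothing running locally for this key
    }
    this.cancelHandlers.delete(queryKey);
    handler(); // triggers the actual DB-side cancellation
    return true;
  }
}
```

In this model, another node calling `store.remove('q1')` causes the executing node's next `heartbeat('q1')` to run the handler exactly once; later ticks are no-ops because the handler has been unregistered.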
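The cancelable-promise pattern described for the Athena driver (a flag checked inside the poll loop, plus a stop call to the backend) might look roughly like the following. It is a minimal sketch, not the driver's code: `pollUntilDone`, `startCancelableQuery`, and the `getStatus`/`stopQuery` callbacks are hypothetical stand-ins, and the zero default poll interval is chosen only to keep the example fast.

```typescript
type QueryStatus = 'RUNNING' | 'SUCCEEDED';

interface CancelablePromise<T> extends Promise<T> {
  cancel: () => Promise<void>;
}

// Poll until the query succeeds, checking the cancellation callback
// before every status request.
async function pollUntilDone(
  getStatus: () => Promise<QueryStatus>,
  isCancelled: () => boolean,
  pollIntervalMs: number,
): Promise<void> {
  for (;;) {
    if (isCancelled()) {
      // Throw rather than return, so callers never mistake a cancelled
      // query for a completed one.
      throw new Error('Query cancelled');
    }
    if ((await getStatus()) === 'SUCCEEDED') {
      return;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}

// Wrap the poll loop in a promise that also exposes cancel().
function startCancelableQuery(
  getStatus: () => Promise<QueryStatus>,
  stopQuery: () => Promise<void>,
  pollIntervalMs = 0,
): CancelablePromise<void> {
  let cancelled = false;
  const promise = pollUntilDone(getStatus, () => cancelled, pollIntervalMs);
  return Object.assign(promise, {
    cancel: async (): Promise<void> => {
      cancelled = true; // aborts the poll loop on its next iteration
      await stopQuery(); // asks the backend to stop the running query
    },
  });
}
```

Throwing on cancellation mirrors the review fix noted in the commit message, where breaking out of the loop would have silently returned truncated data.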
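The span-suffix matching from the "match cancel-by-request-id on UUID prefix" commit can be sketched as below. The helper names `stripSpanSuffix` and `findQueriesForRequest` and the `QueuedQuery` shape are assumptions made for illustration; the actual `extractRequestUUID` utility may be implemented differently.

```typescript
interface QueuedQuery {
  queryKey: string;
  requestId: string;
}

// Hypothetical helper: drop a trailing "-span-N" suffix, if present.
function stripSpanSuffix(requestId: string): string {
  return requestId.replace(/-span-\d+$/, '');
}

// Return every queued query that belongs to the same logical request,
// regardless of which retry span it was submitted under.
function findQueriesForRequest(
  queued: QueuedQuery[],
  requestId: string,
): QueuedQuery[] {
  const target = stripSpanSuffix(requestId);
  return queued.filter((q) => stripSpanSuffix(q.requestId) === target);
}
```

With queue entries tagged `abc-span-1` and `abc-span-2`, a cancel request for `abc` (or for `abc-span-1`) selects both, while an entry tagged `xyz-span-1` is left alone.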
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )