aitools: extract pollStatement helper and pin OnWaitTimeout #5092
Merged
simonfaltum merged 1 commit into main, Apr 28, 2026
This was referenced Apr 27, 2026
simonfaltum added a commit that referenced this pull request, Apr 27, 2026
Adds a low-level command tree for asynchronous SQL statement management, complementing the synchronous 'tools query':

    databricks experimental aitools tools statement submit "SELECT ..."
    databricks experimental aitools tools statement get <statement_id>
    databricks experimental aitools tools statement status <statement_id>
    databricks experimental aitools tools statement cancel <statement_id>

submit fires an ExecuteStatement with WaitTimeout=0s and OnWaitTimeout=CONTINUE, returning the statement_id immediately. get polls (via pollStatement from #5092) until terminal and emits rows on success or an error object on failure. status performs a single GET without polling. cancel sends CancelExecution.

All four subcommands emit a uniform JSON shape {statement_id, state, warehouse_id, columns, rows, error} with omitempty so the payload only includes fields that subcommand has.

Important UX nuance: 'statement get' Ctrl+C stops polling but does NOT cancel the server-side statement. Users that want server-side termination call 'statement cancel' explicitly. (This differs from 'tools query', which cancels server-side on Ctrl+C because the user invoked the synchronous path.) The pollStatement helper from #5092 is already designed to propagate ctx errors without touching the server, so 'get' inherits this behavior for free.

Co-authored-by: Isaac
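To make the "uniform JSON shape with omitempty" point concrete, here is a minimal sketch. The type names `statementOutput` and `statementError`, the Go field types, and the inner `message` field are assumptions for illustration; only the six top-level JSON keys come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statementError is a hypothetical error payload; its "message" field is an
// assumption, not taken from the commit.
type statementError struct {
	Message string `json:"message"`
}

// statementOutput is a hypothetical struct carrying the six keys named in the
// commit message. omitempty drops any field left at its zero value.
type statementOutput struct {
	StatementID string          `json:"statement_id,omitempty"`
	State       string          `json:"state,omitempty"`
	WarehouseID string          `json:"warehouse_id,omitempty"`
	Columns     []string        `json:"columns,omitempty"`
	Rows        [][]string      `json:"rows,omitempty"`
	Error       *statementError `json:"error,omitempty"`
}

// render marshals the output to a JSON string, or returns "" on failure.
func render(out statementOutput) string {
	b, err := json.Marshal(out)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// A submit-style result: no columns, rows, or error are known yet,
	// so those keys are absent from the payload.
	fmt.Println(render(statementOutput{StatementID: "abc", State: "PENDING"}))
	// prints {"statement_id":"abc","state":"PENDING"}
}
```

With one struct shared by all four subcommands, each subcommand only fills the fields it knows about and the encoder trims the rest.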
simonfaltum added a commit that referenced this pull request, Apr 27, 2026
discover-schema previously walked tables sequentially and ran each table's three probes (DESCRIBE, sample SELECT, null counts) one after the other. For ai-dev-kit's data-exploration phase that meant warehouse-bound work was idle most of the time. Same root cause as the multi-query exploration latency that PR 2 fixed; same fix.

Two layers of parallelism:

1. Tables fan out via errgroup with --concurrency (default 8). A failure on one table never aborts the others; it gets rendered inline as "Error discovering ...".
2. Within a table, DESCRIBE still runs first because the column list feeds the null-counts query. After DESCRIBE returns, the sample SELECT and null-counts probes run concurrently. The output text is assembled once both finish, preserving the existing column order (COLUMNS, SAMPLE DATA, NULL COUNTS).

Switch executeSQL from the SDK's ExecuteAndWait helper to ExecuteStatement + pollStatement (the helper extracted in #5092). This brings discover-schema in line with query.go and statement.go: explicit OnWaitTimeout=CONTINUE on every call, and any future polling-helper improvement (e.g. signal handling) lands here for free. Failed states now flow through checkFailedState, which yields more specific error messages (e.g. "query failed: SYNTAX_ERROR ...") than the previous hand-rolled branch. The user-visible "SAMPLE DATA: Error - %v" / "NULL COUNTS: Error - %v" wrapping is unchanged.

Add --concurrency validation matching the cmd/fs/cp.go and experimental/aitools/cmd/query.go pattern: PreRunE rejects values <= 0 with errInvalidBatchConcurrency.

Tests added in discover_schema_test.go:

- quoteTableName (table-driven across valid identifiers, missing parts, injection attempts, empty parts, leading-digit identifiers)
- parseDescribeResult skipping metadata rows
- executeSQL pins OnWaitTimeout=CONTINUE
- executeSQL propagates server-reported FAILED state
- executeSQL wraps transport errors
- discoverTable: sample and null-count probes run concurrently after DESCRIBE (atomic peak-counter assertion)
- discoverTable: a sample failure does not abort null counts
- --concurrency 0 and -1 rejected at PreRunE time
- invalid table name (not CATALOG.SCHEMA.TABLE) rejected at RunE validation before any API call

Co-authored-by: Isaac
No findings. 🔍 Reviewed by nitpicker
arsenyinfo approved these changes, Apr 28, 2026
Refactor `executeAndPoll` in `experimental/aitools/cmd/query.go` to extract a pure `pollStatement(ctx, api, resp)` helper. The helper polls until the statement reaches a terminal state and returns the response without any signal handling, spinner, or server-side cancellation; those concerns stay in `executeAndPoll` where they belong.

Also pin `OnWaitTimeout: CONTINUE` explicitly on the `ExecuteStatement` call. The SDK default happens to be CONTINUE today, but relying on it is a hidden coupling: a server-side default flip would silently break the poll loop by killing the statement before our first GET.

Behavior is unchanged for the existing `query` command. Follow-up PRs (parallel batch queries, statement lifecycle command tree) will reuse the helper.

Co-authored-by: Isaac
8e00440 to 79fc080
Stack

This PR is part of a 4-PR stack making `aitools` data exploration faster for ai-dev-kit. Each PR is independently reviewable; merge in order. This PR is the one based on `main`.

Use `git diff <base>...HEAD` or set the comparison base in the GitHub UI to see only this PR's changes; the default "Files changed" diff against `main` includes ancestor PRs.

Why

The query command in `experimental/aitools/cmd/query.go` works today, but two things make it fragile and hard to reuse:

- The polling loop lives inside `executeAndPoll`, tied to signal handling, the spinner, and server-side cancellation, so other commands cannot reuse it.
- The `ExecuteStatement` request sets `WaitTimeout: 0s` but does not set `OnWaitTimeout`. That relies on the SDK's default being `CONTINUE`. It is today, but a flip would silently break the command: the statement would be cancelled before our first GET and we'd never see the result.

This PR is a pure refactor + one explicit-default fix. No user-visible behavior change.

Changes

- Extract `pollStatement(ctx, api, resp)` from `executeAndPoll`. The helper polls until the statement reaches a terminal state and returns the response. It does not call `CancelExecution` on context cancellation; that's the caller's job (and a deliberate design choice for the upcoming `statement get` command, where Ctrl+C should stop polling without killing the server-side statement).
- Pin `OnWaitTimeout: CONTINUE` explicitly on the `ExecuteStatement` call.
- Refactor `executeAndPoll` to delegate to `pollStatement` and keep the existing signal-handling, spinner, and server-side cancel-on-Ctrl+C semantics intact.
- Update the `TestExecuteAndPollImmediateSuccess` matcher to assert `OnWaitTimeout == CONTINUE` so a future SDK default flip cannot regress us.

Test plan

- `go test ./experimental/aitools/...` passes (10 polling-related cases including the 5 new ones).
- `make checks` clean (tidy, whitespace, dead code).
- `make fmt` no drift.
- `make lint` 0 issues.
- Existing `executeAndPoll` tests (immediate success, immediate failure, polling, fail-during-poll, ctx-cancellation-calls-cancel-execution) all still pass without modification beyond the matcher tweak.