aitools: extract pollStatement helper and pin OnWaitTimeout by simonfaltum · Pull Request #5092 · databricks/cli

simonfaltum · 2026-04-27T13:05:17Z

Stack

This PR is part of a 4-PR stack making aitools data exploration faster for ai-dev-kit. Each PR is independently reviewable; merge in order.

aitools: extract pollStatement helper and pin OnWaitTimeout #5092 — aitools: extract pollStatement helper and pin OnWaitTimeout (base: main) — this PR
aitools: run multiple SQL queries in parallel from one invocation #5093 — aitools: run multiple SQL queries in parallel from one query invocation (base: aitools: extract pollStatement helper and pin OnWaitTimeout #5092)
aitools: add 'tools statement' lifecycle commands #5095 — aitools: add 'tools statement' lifecycle commands (base: aitools: run multiple SQL queries in parallel from one invocation #5093)
aitools: parallelize discover-schema across tables and probes #5097 — aitools: parallelize discover-schema across tables and probes (base: aitools: add 'tools statement' lifecycle commands #5095)

Use git diff <base>...HEAD or set the comparison base in the GitHub UI to see only this PR's changes; the default "Files changed" diff against main includes ancestor PRs.

Why

The query command in experimental/aitools/cmd/query.go works today, but two things make it fragile and hard to reuse:

The polling loop, signal handling, spinner, and server-side cancellation are entangled in one ~100-line function. Upcoming features (parallel batch queries, a statement lifecycle command tree) need pure polling without the signal-handler side effects, so the helper has to come out cleanly.
The ExecuteStatement request sets WaitTimeout: 0s but does not set OnWaitTimeout. That relies on the SDK's default being CONTINUE. It is today, but a flip would silently break the command: the statement would be cancelled before our first GET and we'd never see the result.

This PR is a pure refactor + one explicit-default fix. No user-visible behavior change.

Changes

Extract pollStatement(ctx, api, resp) from executeAndPoll. The helper polls until the statement reaches a terminal state and returns the response. It does not call CancelExecution on context cancellation, that's the caller's job (and a deliberate design choice for the upcoming statement get command, where Ctrl+C should stop polling without killing the server-side statement).
Pin OnWaitTimeout: CONTINUE explicitly on the ExecuteStatement call.
Update executeAndPoll to delegate to pollStatement and keep the existing signal-handling, spinner, and server-side cancel-on-Ctrl+C semantics intact.
Add five unit tests covering the new helper:
- Immediate terminal short-circuit (no Get calls)
- Failed terminal returned without error (caller decides)
- Eventual success across multiple polls
- Context cancellation returns ctx error and does NOT call CancelExecution
- GetStatement transport error is wrapped and propagated
Update the existing TestExecuteAndPollImmediateSuccess matcher to assert OnWaitTimeout == CONTINUE so a future SDK default flip cannot regress us.

Test plan

go test ./experimental/aitools/... passes (10 polling-related cases including the 5 new ones).
make checks clean (tidy, whitespace, dead code).
make fmt no drift.
make lint 0 issues.
Existing executeAndPoll tests (immediate success, immediate failure, polling, fail-during-poll, ctx-cancellation-calls-cancel-execution) all still pass without modification beyond the matcher tweak.

Adds a low-level command tree for asynchronous SQL statement management, complementing the synchronous 'tools query': databricks experimental aitools tools statement submit "SELECT ..." databricks experimental aitools tools statement get <statement_id> databricks experimental aitools tools statement status <statement_id> databricks experimental aitools tools statement cancel <statement_id> submit fires an ExecuteStatement with WaitTimeout=0s and OnWaitTimeout=CONTINUE, returning the statement_id immediately. get polls (via pollStatement from #5092) until terminal and emits rows on success or an error object on failure. status performs a single GET without polling. cancel sends CancelExecution. All four subcommands emit a uniform JSON shape {statement_id, state, warehouse_id, columns, rows, error} with omitempty so the payload only includes fields that subcommand has. Important UX nuance: 'statement get' Ctrl+C stops polling but does NOT cancel the server-side statement. Users that want server-side termination call 'statement cancel' explicitly. (This differs from 'tools query', which cancels server-side on Ctrl+C because the user invoked the synchronous path.) The pollStatement helper from #5092 is already designed to propagate ctx errors without touching the server, so 'get' inherits this behavior for free. Co-authored-by: Isaac

discover-schema previously walked tables sequentially and ran each table's three probes (DESCRIBE, sample SELECT, null counts) one after the other. For ai-dev-kit's data-exploration phase that meant warehouse-bound work was idle most of the time. Same root cause as the multi-query exploration latency that PR 2 fixed; same fix. Two layers of parallelism: 1. Tables fan out via errgroup with --concurrency (default 8). A failure on one table never aborts the others; it gets rendered inline as "Error discovering ...". 2. Within a table, DESCRIBE still runs first because the column list feeds the null-counts query. After DESCRIBE returns, the sample SELECT and null-counts probes run concurrently. The output text is assembled once both finish, preserving the existing column order (COLUMNS, SAMPLE DATA, NULL COUNTS). Switch executeSQL from the SDK's ExecuteAndWait helper to ExecuteStatement + pollStatement (the helper extracted in #5092). This brings discover-schema in line with query.go and statement.go: explicit OnWaitTimeout=CONTINUE on every call, and any future polling-helper improvement (e.g. signal handling) lands here for free. Failed states now flow through checkFailedState, which yields more specific error messages (e.g. "query failed: SYNTAX_ERROR ...") than the previous hand-rolled branch. The user-visible "SAMPLE DATA: Error - %v" / "NULL COUNTS: Error - %v" wrapping is unchanged. Add --concurrency validation matching the cmd/fs/cp.go and experimental/aitools/cmd/query.go pattern: PreRunE rejects values <= 0 with errInvalidBatchConcurrency. Tests added in discover_schema_test.go: - quoteTableName (table-driven across valid identifiers, missing parts, injection attempts, empty parts, leading-digit identifiers) - parseDescribeResult skipping metadata rows - executeSQL pins OnWaitTimeout=CONTINUE - executeSQL propagates server-reported FAILED state - executeSQL wraps transport errors - discoverTable: sample and null-count probes run concurrently after DESCRIBE (atomic peak-counter assertion) - discoverTable: a sample failure does not abort null counts - --concurrency 0 and -1 rejected at PreRunE time - invalid table name (not CATALOG.SCHEMA.TABLE) rejected at RunE validation before any API call Co-authored-by: Isaac

arsenyinfo · 2026-04-28T06:26:32Z

No findings.

🔍 Reviewed by nitpicker

Adds a low-level command tree for asynchronous SQL statement management, complementing the synchronous 'tools query': databricks experimental aitools tools statement submit "SELECT ..." databricks experimental aitools tools statement get <statement_id> databricks experimental aitools tools statement status <statement_id> databricks experimental aitools tools statement cancel <statement_id> submit fires an ExecuteStatement with WaitTimeout=0s and OnWaitTimeout=CONTINUE, returning the statement_id immediately. get polls (via pollStatement from #5092) until terminal and emits rows on success or an error object on failure. status performs a single GET without polling. cancel sends CancelExecution. All four subcommands emit a uniform JSON shape {statement_id, state, warehouse_id, columns, rows, error} with omitempty so the payload only includes fields that subcommand has. Important UX nuance: 'statement get' Ctrl+C stops polling but does NOT cancel the server-side statement. Users that want server-side termination call 'statement cancel' explicitly. (This differs from 'tools query', which cancels server-side on Ctrl+C because the user invoked the synchronous path.) The pollStatement helper from #5092 is already designed to propagate ctx errors without touching the server, so 'get' inherits this behavior for free. Co-authored-by: Isaac

discover-schema previously walked tables sequentially and ran each table's three probes (DESCRIBE, sample SELECT, null counts) one after the other. For ai-dev-kit's data-exploration phase that meant warehouse-bound work was idle most of the time. Same root cause as the multi-query exploration latency that PR 2 fixed; same fix. Two layers of parallelism: 1. Tables fan out via errgroup with --concurrency (default 8). A failure on one table never aborts the others; it gets rendered inline as "Error discovering ...". 2. Within a table, DESCRIBE still runs first because the column list feeds the null-counts query. After DESCRIBE returns, the sample SELECT and null-counts probes run concurrently. The output text is assembled once both finish, preserving the existing column order (COLUMNS, SAMPLE DATA, NULL COUNTS). Switch executeSQL from the SDK's ExecuteAndWait helper to ExecuteStatement + pollStatement (the helper extracted in #5092). This brings discover-schema in line with query.go and statement.go: explicit OnWaitTimeout=CONTINUE on every call, and any future polling-helper improvement (e.g. signal handling) lands here for free. Failed states now flow through checkFailedState, which yields more specific error messages (e.g. "query failed: SYNTAX_ERROR ...") than the previous hand-rolled branch. The user-visible "SAMPLE DATA: Error - %v" / "NULL COUNTS: Error - %v" wrapping is unchanged. Add --concurrency validation matching the cmd/fs/cp.go and experimental/aitools/cmd/query.go pattern: PreRunE rejects values <= 0 with errInvalidBatchConcurrency. Tests added in discover_schema_test.go: - quoteTableName (table-driven across valid identifiers, missing parts, injection attempts, empty parts, leading-digit identifiers) - parseDescribeResult skipping metadata rows - executeSQL pins OnWaitTimeout=CONTINUE - executeSQL propagates server-reported FAILED state - executeSQL wraps transport errors - discoverTable: sample and null-count probes run concurrently after DESCRIBE (atomic peak-counter assertion) - discoverTable: a sample failure does not abort null counts - --concurrency 0 and -1 rejected at PreRunE time - invalid table name (not CATALOG.SCHEMA.TABLE) rejected at RunE validation before any API call Co-authored-by: Isaac

Refactor `executeAndPoll` in `experimental/aitools/cmd/query.go` to extract a pure `pollStatement(ctx, api, resp)` helper. The helper polls until the statement reaches a terminal state and returns the response without any signal handling, spinner, or server-side cancellation; those concerns stay in `executeAndPoll` where they belong. Also pin `OnWaitTimeout: CONTINUE` explicitly on the `ExecuteStatement` call. The SDK default happens to be CONTINUE today, but relying on it is a hidden coupling: a server-side default flip would silently break the poll loop by killing the statement before our first GET. Behavior is unchanged for the existing `query` command. Follow-up PRs (parallel batch queries, statement lifecycle command tree) will reuse the helper. Co-authored-by: Isaac

Adds a low-level command tree for asynchronous SQL statement management, complementing the synchronous 'tools query': databricks experimental aitools tools statement submit "SELECT ..." databricks experimental aitools tools statement get <statement_id> databricks experimental aitools tools statement status <statement_id> databricks experimental aitools tools statement cancel <statement_id> submit fires an ExecuteStatement with WaitTimeout=0s and OnWaitTimeout=CONTINUE, returning the statement_id immediately. get polls (via pollStatement from #5092) until terminal and emits rows on success or an error object on failure. status performs a single GET without polling. cancel sends CancelExecution. All four subcommands emit a uniform JSON shape {statement_id, state, warehouse_id, columns, rows, error} with omitempty so the payload only includes fields that subcommand has. Important UX nuance: 'statement get' Ctrl+C stops polling but does NOT cancel the server-side statement. Users that want server-side termination call 'statement cancel' explicitly. (This differs from 'tools query', which cancels server-side on Ctrl+C because the user invoked the synchronous path.) The pollStatement helper from #5092 is already designed to propagate ctx errors without touching the server, so 'get' inherits this behavior for free. Co-authored-by: Isaac

discover-schema previously walked tables sequentially and ran each table's three probes (DESCRIBE, sample SELECT, null counts) one after the other. For ai-dev-kit's data-exploration phase that meant warehouse-bound work was idle most of the time. Same root cause as the multi-query exploration latency that PR 2 fixed; same fix. Two layers of parallelism: 1. Tables fan out via errgroup with --concurrency (default 8). A failure on one table never aborts the others; it gets rendered inline as "Error discovering ...". 2. Within a table, DESCRIBE still runs first because the column list feeds the null-counts query. After DESCRIBE returns, the sample SELECT and null-counts probes run concurrently. The output text is assembled once both finish, preserving the existing column order (COLUMNS, SAMPLE DATA, NULL COUNTS). Switch executeSQL from the SDK's ExecuteAndWait helper to ExecuteStatement + pollStatement (the helper extracted in #5092). This brings discover-schema in line with query.go and statement.go: explicit OnWaitTimeout=CONTINUE on every call, and any future polling-helper improvement (e.g. signal handling) lands here for free. Failed states now flow through checkFailedState, which yields more specific error messages (e.g. "query failed: SYNTAX_ERROR ...") than the previous hand-rolled branch. The user-visible "SAMPLE DATA: Error - %v" / "NULL COUNTS: Error - %v" wrapping is unchanged. Add --concurrency validation matching the cmd/fs/cp.go and experimental/aitools/cmd/query.go pattern: PreRunE rejects values <= 0 with errInvalidBatchConcurrency. Tests added in discover_schema_test.go: - quoteTableName (table-driven across valid identifiers, missing parts, injection attempts, empty parts, leading-digit identifiers) - parseDescribeResult skipping metadata rows - executeSQL pins OnWaitTimeout=CONTINUE - executeSQL propagates server-reported FAILED state - executeSQL wraps transport errors - discoverTable: sample and null-count probes run concurrently after DESCRIBE (atomic peak-counter assertion) - discoverTable: a sample failure does not abort null counts - --concurrency 0 and -1 rejected at PreRunE time - invalid table name (not CATALOG.SCHEMA.TABLE) rejected at RunE validation before any API call Co-authored-by: Isaac

simonfaltum temporarily deployed to test-trigger-is April 27, 2026 13:05 — with GitHub Actions Inactive

This was referenced Apr 27, 2026

aitools: run multiple SQL queries in parallel from one invocation #5093

Merged

aitools: add 'tools statement' lifecycle commands #5095

Merged

aitools: parallelize discover-schema across tables and probes #5097

Merged

arsenyinfo approved these changes Apr 28, 2026

View reviewed changes

simonfaltum force-pushed the simonfaltum/aitools-pr1-refactor branch from 8e00440 to 79fc080 Compare April 28, 2026 09:08

simonfaltum temporarily deployed to test-trigger-is April 28, 2026 09:09 — with GitHub Actions Inactive

simonfaltum added this pull request to the merge queue Apr 28, 2026

Merged via the queue into main with commit 7ff1a81 Apr 28, 2026
19 checks passed

simonfaltum deleted the simonfaltum/aitools-pr1-refactor branch April 28, 2026 11:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aitools: extract pollStatement helper and pin OnWaitTimeout#5092

aitools: extract pollStatement helper and pin OnWaitTimeout#5092
simonfaltum merged 1 commit intomainfrom
simonfaltum/aitools-pr1-refactor

simonfaltum commented Apr 27, 2026 •

edited

Loading

Uh oh!

arsenyinfo commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

simonfaltum commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stack

Why

Changes

Test plan

Uh oh!

arsenyinfo commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

simonfaltum commented Apr 27, 2026 •

edited

Loading