
Add CSV output format for SQL query results#4728

Merged
simonfaltum merged 11 commits into main from simonfaltum/sql-csv-output
Apr 16, 2026

@simonfaltum
Member

@simonfaltum simonfaltum commented Mar 12, 2026

Why

The SQL query command supports JSON and table output but not CSV. CSV is the most common format for data export and piping into tools like Excel, pandas, and database imports.

Changes

Before: databricks sql query only supports JSON and table output formats via the global --output flag (text/json).
Now: The query command shadows the global --output flag with a local version that also accepts csv, writing results as RFC 4180 CSV with column headers as the first row.

The local --output flag:

  • Accepts text, json, and csv (the global flag only accepts text and json)
  • Is case-insensitive, matching the global flag's behavior
  • Respects DATABRICKS_OUTPUT_FORMAT env var (invalid values silently ignored, matching root behavior)
  • Registers shell completions for all three values
  • Requires no changes to shared code (libs/flags/output.go, cmd/root/io.go, libs/cmdio/)
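
The resolution rules in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names, the error text, and the fallback to text when nothing is set are assumptions, and only the DATABRICKS_OUTPUT_FORMAT variable and the three accepted values come from the description.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Env var name taken from the PR description.
const envOutputFormat = "DATABRICKS_OUTPUT_FORMAT"

// defaultOutput is an assumption made for this sketch.
const defaultOutput = "text"

// isSupported reports whether v is one of the three formats the local
// --output flag accepts. v is expected to be lowercased already.
func isSupported(v string) bool {
	switch v {
	case "text", "json", "csv":
		return true
	}
	return false
}

// resolveOutput (hypothetical helper) applies the precedence described above:
// an explicitly set flag wins and must be valid, a valid env var value comes
// next, and an invalid env var value is silently ignored.
func resolveOutput(flagValue string, flagChanged bool, getenv func(string) string) (string, error) {
	if flagChanged {
		v := strings.ToLower(flagValue)
		if !isSupported(v) {
			return "", fmt.Errorf("unsupported output format %q: accepted values are text, json, csv", flagValue)
		}
		return v, nil
	}
	if v := strings.ToLower(getenv(envOutputFormat)); isSupported(v) {
		return v, nil
	}
	return defaultOutput, nil
}

func main() {
	// No flag given: the result depends on the environment of the caller.
	format, err := resolveOutput("", false, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("resolved output format:", format)
}
```

Passing the environment lookup as a function keeps the sketch easy to exercise in unit tests without mutating the process environment.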

Uses Go's encoding/csv package for proper escaping and quoting.
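
For reference, a minimal sketch of header-first CSV rendering with encoding/csv. The renderCSV name and the choice to pad short rows with empty fields are assumptions for illustration and may differ from what the PR does.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
)

// renderCSV (hypothetical helper) writes the column names as the first
// record, then one record per result row. Each record is sized to the
// header: rows shorter than the header are padded with empty fields and
// any extra trailing values are dropped. Both choices are assumptions.
func renderCSV(columns []string, rows [][]string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(columns); err != nil {
		return "", err
	}
	for _, row := range rows {
		record := make([]string, len(columns))
		copy(record, row)
		if err := w.Write(record); err != nil {
			return "", err
		}
	}
	// Flush buffered data and surface any write error.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderCSV(
		[]string{"id", "note"},
		[][]string{
			{"1", "hello, world"}, // comma forces quoting
			{"2", `say "hi"`},     // embedded quotes are doubled
			{"3"},                 // short row, padded with an empty field
		},
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

The csv.Writer handles quoting of commas, quotes, and newlines, so the renderer never needs to escape fields by hand.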

Test plan

  • Unit tests for CSV rendering (basic, special characters, empty results, short rows)
  • Unit test for unsupported output format error
  • Unit test for case-insensitive --output (e.g. --output JSON)
  • Unit test for env var override
  • Unit test for invalid env var silently ignored
  • Unit test for explicit flag overriding env var
  • Full aitools test suite passes
  • make checks passes

Add a --format csv flag to the query command for exporting results as CSV.
Uses Go's encoding/csv for proper escaping and quoting. Column headers
are included as the first row.

Co-authored-by: Isaac
@simonfaltum simonfaltum marked this pull request as ready for review March 13, 2026 10:33
@simonfaltum simonfaltum requested review from a team and lennartkats-db as code owners March 13, 2026 10:33
Contributor

@shreyas-goenka shreyas-goenka left a comment

Note: This review was posted by Claude (AI assistant). Shreyas will do a separate, more thorough review pass.

Priority: MEDIUM — Dual-flag design concern

MEDIUM: --format vs --output dual-flag confusion

The PR introduces a --format flag for SQL statement execution that is separate from the existing --output flag used everywhere else in the CLI. This creates two different ways to control output format, which may confuse users. Consider whether --output csv could be extended to handle this case instead, or at minimum document the distinction clearly.

Other Observations

  • CSV output implementation is clean and correct
  • Good handling of SQL result pagination
  • Proper escaping of CSV values with special characters
  • Missing test for interaction between --format and --output flags when both are set

The main thing to discuss is whether this warrants a new flag or should extend the existing --output flag.

@simonfaltum
Member Author

Re: --format vs --output flag concern:

We intentionally use a separate --format flag here rather than extending --output. The global --output flag is a PersistentFlag on the root command with a hard-coded Set() validator that only accepts json and text. There is no mechanism in Cobra/pflag to extend a parent's persistent flag with additional values on a per-command basis.

Adding csv to the global --output would require changes to libs/flags/output.go, the cmdio render pipeline, and every command would need to handle or reject csv. That is a large, invasive change for a feature that only applies to the SQL query command.

The PR already includes a mutual-exclusion guard that rejects using both flags together with a clear error message.
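
To illustrate the constraint described in this comment, here is a rough sketch of a flag value whose Set method validates against a fixed list. It only mirrors the shape of the pflag.Value interface (String, Set, Type); the globalOutput type and its error text are stand-ins, not the CLI's flags.Output implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// globalOutput is a hypothetical stand-in for a root-level output flag value.
// It has the same method set as a pflag.Value.
type globalOutput string

func (o *globalOutput) String() string { return string(*o) }

func (o *globalOutput) Type() string { return "string" }

// Set lowercases the input and accepts only the hard-coded values. Because
// the check lives inside Set, a subcommand sharing this flag cannot admit an
// extra value such as csv without changing this type.
func (o *globalOutput) Set(s string) error {
	v := strings.ToLower(s)
	switch v {
	case "text", "json":
		*o = globalOutput(v)
		return nil
	}
	return fmt.Errorf("accepted arguments are [text, json], got %q", s)
}

func main() {
	var out globalOutput = "text"
	for _, raw := range []string{"JSON", "csv"} {
		if err := out.Set(raw); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", out.String())
	}
}
```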

Contributor

@renaudhartert-db renaudhartert-db left a comment

> Re: --format vs --output flag concern:
>
> We intentionally use a separate --format flag here rather than extending --output. The global --output flag is a PersistentFlag on the root command with a hard-coded Set() validator that only accepts json and text. There is no mechanism in Cobra/pflag to extend a parent's persistent flag with additional values on a per-command basis.
>
> Adding csv to the global --output would require changes to libs/flags/output.go, the cmdio render pipeline, and every command would need to handle or reject csv. That is a large, invasive change for a feature that only applies to the SQL query command.
>
> The PR already includes a mutual-exclusion guard that rejects using both flags together with a clear error message.

Should they be mutually exclusive though? For example, what if I'd like the table to be in csv and my error message in json? Naively, I would expect --format (or maybe --table-format?) to override the value of --output for the table.

@simonfaltum simonfaltum requested review from renaudhartert-db and shreyas-goenka and removed request for a team, lennartkats-db and shreyas-goenka April 7, 2026 10:42
@simonfaltum
Member Author

> Should they be mutually exclusive though? For example, what if I'd like the table to be in csv and my error message in json? Naively, I would expect --format (or maybe --table-format?) to override the value of --output for the table.

We should probably address this eventually but right now we don't have any error formatting depending on output.

simonfaltum and others added 5 commits April 9, 2026 06:24
Shadow the root command's persistent --output flag with a local flag
on the query command that accepts text, json, and csv. This removes
the separate --format flag and the mutual-exclusion guard between
--format and --output.

The local flag handles DATABRICKS_OUTPUT_FORMAT env var fallback and
registers shell completions for all three values.

Co-authored-by: Isaac

Use flags.OutputText/OutputJSON constants and a local outputCSV
constant instead of bare string literals. Define envOutputFormat
constant matching the root command's env var name.

Co-authored-by: Isaac

Normalize --output flag value to lowercase before validation, matching
the root command's flags.Output.Set() behavior. This ensures --output JSON
works the same as on every other command.

Silently ignore invalid DATABRICKS_OUTPUT_FORMAT env var values instead of
hard-failing, matching the root command's behavior.

Add tests for case insensitivity, env var override, invalid env var
ignored, and explicit flag overriding env var.

Co-authored-by: Isaac
@simonfaltum simonfaltum added this pull request to the merge queue Apr 16, 2026
Merged via the queue into main with commit a2fb05f Apr 16, 2026
20 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/sql-csv-output branch April 16, 2026 08:59
4 participants