feat(cubestore-cli): Support Arrow format for responses by ovr · Pull Request #10915 · cube-js/cube

ovr · 2026-05-20T11:33:44Z

No description provided.

claude · 2026-05-20T11:34:03Z

Claude finished @ovr's task in 2m 15s.

PR Review: feat(cubestore-cli): Support Arrow format for responses

Gather context and read changed files
Review code quality and correctness
Review security considerations
Review performance implications
Review test coverage
Provide detailed feedback

Summary

Clean, well-structured PR that upgrades the CubeStore CLI's wire protocol from the legacy stringified-rows FlatBuffer format to Apache Arrow IPC. The design is backwards-compatible — the client requests Arrow, but gracefully handles Legacy responses from older servers via the ResultData enum.

What's good

Backwards compatibility: The ResultData enum + ResponseFormat design cleanly handles both wire formats. The comment on ResponseFormat correctly notes that an older server can still answer with Legacy — this is the right approach.
pub use arrow: Re-exporting the arrow crate from cubestore-ws-transport ensures the CLI uses the exact same Arrow version, avoiding potential type mismatches across crate boundaries.
Arrow rendering: render_arrow_batches iterates Arrow arrays directly rather than materializing to Vec<Vec<Option<String>>> first — good for memory on large results.
Test coverage: New decode_arrow_ipc_result_with_nulls test exercises the full encode→decode path for Arrow IPC including null handling. Existing tests properly updated for the new ResultData enum.
No API break for consumers: The QueryResult struct still has a single pub data field — callers access columns/rows through the accessor methods.

Issues found (see inline comments)

| Severity | File | Issue |
| --- | --- | --- |
| Minor | `format.rs:95-105` | `batch_formatters()` uses `.expect()`, which will panic on unsupported Arrow types; consider a graceful fallback |
| Nit | `result.rs:70-77` | `get_columns()` allocates `Vec<String>` on every call; `Cow` would avoid cloning for the Legacy path |
| Nit | `format.rs:45-79` | `batch_formatters()` called twice per batch (sizing + rendering); acknowledged in doc comment, fine for CLI |
| Nit | `.gitignore` | `.zed` is IDE-specific; better in a personal global gitignore |
| Question | `codec.rs:27` | Arrow is now requested unconditionally; should this be configurable via `ClientConfig`? |
| Question | `codec.rs:137-149` | `decode_arrow_ipc` eagerly collects all batches; fine for a CLI, but worth noting if the transport crate gets reused |

Verdict

The only substantive concern is the .expect() in batch_formatters — a malformed server response would crash the CLI instead of printing an error. Everything else is nits or forward-looking suggestions. Solid work overall.

claude · 2026-05-20T11:35:49Z

+    pub fn get_columns(&self) -> Vec<String> {
+        match &self.data {
+            ResultData::Legacy { columns, .. } => columns.clone(),
+            ResultData::Arrow { schema, .. } => {
+                schema.fields().iter().map(|f| f.name().clone()).collect()
+            }
+        }
    }


Nit (performance): get_columns() allocates a new Vec<String> on every call. In exec.rs it's called once, but in format.rs and codec.rs it's called in paths where the result is used immediately and then dropped.

For the Legacy variant this clones all strings unnecessarily — the caller just needs to read names and count them. Consider returning Cow<'_, [String]> or a dedicated iterator/slice wrapper so the Legacy path avoids the clone:

    pub fn get_columns(&self) -> Cow<'_, Vec<String>> {
        match &self.data {
            ResultData::Legacy { columns, .. } => Cow::Borrowed(columns),
            ResultData::Arrow { schema, .. } => {
                Cow::Owned(schema.fields().iter().map(|f| f.name().clone()).collect())
            }
        }
    }

Not a blocker — the column count is small — but worth noting for future optimization if this gets called in a hot path.
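The prose above names `Cow<'_, [String]>`, while the snippet returns `Cow<'_, Vec<String>>`. Both compile. The following is a minimal, self-contained sketch of the slice form, assuming simplified stand-ins for the PR's types: here the `Arrow` variant holds plain field names (`field_names` is hypothetical) rather than an Arrow schema and record batches.

```rust
use std::borrow::Cow;

// Hypothetical stand-ins for the PR's types. The real `ResultData::Arrow`
// holds an Arrow schema and record batches, not plain field names.
pub enum ResultData {
    Legacy {
        columns: Vec<String>,
        rows: Vec<Vec<Option<String>>>,
    },
    Arrow {
        field_names: Vec<&'static str>,
    },
}

pub struct QueryResult {
    pub data: ResultData,
}

impl QueryResult {
    /// Borrows the stored names on the Legacy path and builds an owned
    /// `Vec<String>` only on the Arrow path.
    pub fn get_columns(&self) -> Cow<'_, [String]> {
        match &self.data {
            ResultData::Legacy { columns, .. } => Cow::Borrowed(columns.as_slice()),
            ResultData::Arrow { field_names } => {
                Cow::Owned(field_names.iter().map(|n| n.to_string()).collect())
            }
        }
    }
}
```

Callers that only read names or count them can treat the result as a `&[String]` through deref, so they should not need to care which variant produced it.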

claude · 2026-05-20T11:35:54Z

+fn batch_formatters<'a>(
+    batch: &'a RecordBatch,
+    options: &'a FormatOptions,
+) -> Vec<ArrayFormatter<'a>> {
+    batch
+        .columns()
+        .iter()
+        .map(|col| ArrayFormatter::try_new(col.as_ref(), options))
+        .collect::<Result<_, _>>()
+        .expect("arrow ArrayFormatter init")
+}


Concern: .expect("arrow ArrayFormatter init") will panic if Arrow encounters an unsupported data type. While this is unlikely with standard CubeStore types, a malformed server response could crash the CLI rather than printing an error.

Consider propagating the error or falling back gracefully:

    fn batch_formatters<'a>(
        batch: &'a RecordBatch,
        options: &'a FormatOptions,
    ) -> Option<Vec<ArrayFormatter<'a>>> {
        batch
            .columns()
            .iter()
            .map(|col| ArrayFormatter::try_new(col.as_ref(), options))
            .collect::<Result<_, _>>()
            .ok()
    }

Then skip rendering that batch (or fall back to debug formatting) if None.
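The other option mentioned above, propagating the error, could look roughly like this. It is a sketch with stand-in types so it compiles without the arrow crate: `Column`, `Formatter`, `FormatError`, and `render_first_row` are hypothetical and only mimic the shape of a fallible per-column formatter constructor.

```rust
use std::fmt;

// Hypothetical stand-ins: `Column` plays the role of an Arrow array and
// `Formatter::try_new` the role of a fallible formatter constructor.
pub enum Column {
    Int(Vec<Option<i64>>),
    Unsupported(&'static str),
}

#[derive(Debug, PartialEq)]
pub struct FormatError(String);

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

pub struct Formatter<'a> {
    values: &'a [Option<i64>],
}

impl<'a> Formatter<'a> {
    pub fn try_new(col: &'a Column) -> Result<Self, FormatError> {
        match col {
            Column::Int(values) => Ok(Formatter { values: values.as_slice() }),
            Column::Unsupported(ty) => Err(FormatError(format!("unsupported type: {ty}"))),
        }
    }

    /// Returns `None` for a null or out-of-range cell.
    pub fn cell(&self, row: usize) -> Option<String> {
        self.values.get(row).copied().flatten().map(|v| v.to_string())
    }
}

/// Returns the first constructor error instead of panicking.
pub fn batch_formatters(columns: &[Column]) -> Result<Vec<Formatter<'_>>, FormatError> {
    columns.iter().map(|col| Formatter::try_new(col)).collect()
}

/// The caller decides how to report the failure, e.g. by printing it.
pub fn render_first_row(columns: &[Column]) -> Result<String, FormatError> {
    let formatters = batch_formatters(columns)?;
    let cells: Vec<String> = formatters
        .iter()
        .map(|f| f.cell(0).unwrap_or_else(|| "NULL".to_string()))
        .collect();
    Ok(cells.join(" | "))
}
```

Compared with the `Option` variant, returning `Result` keeps the underlying message, so the CLI could print why a batch was not rendered.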

claude · 2026-05-20T11:35:58Z

+fn render_arrow_batches(columns: &[String], batches: &[RecordBatch]) -> String {
+    let ncols = columns.len();
+    let fmt_options = FormatOptions::default().with_display_error(true);
+    let mut widths = header_widths(columns);
+
+    for batch in batches {
+        let formatters = batch_formatters(batch, &fmt_options);
+        let batch_cols = formatters.len().min(ncols);
+        for row_idx in 0..batch.num_rows() {
+            for col_idx in 0..batch_cols {
+                let cell = arrow_cell(batch, &formatters, col_idx, row_idx);
+                observe_width(&mut widths, col_idx, cell_len(cell.as_deref()));
            }
        }
    }

    let mut out = String::new();
+    write_header(&mut out, columns, &widths);
+    write_separator(&mut out, &widths);
+    for batch in batches {
+        let formatters = batch_formatters(batch, &fmt_options);
+        let batch_cols = formatters.len().min(ncols);
+        for row_idx in 0..batch.num_rows() {
+            out.push('\n');
+            for (i, &width) in widths.iter().enumerate() {
+                let cell = if i < batch_cols {
+                    arrow_cell(batch, &formatters, i, row_idx)
+                } else {
+                    None
+                };
+                write_cell(&mut out, i, cell.as_deref(), width);
+            }
+        }
+    }
+    out


Nit: batch_formatters() is called twice per batch — once in the sizing pass and once in the rendering pass. The doc comment acknowledges the trade-off (CPU vs. not materializing all rows into strings), which is fair.

If this ever becomes a bottleneck with large result sets, you could compute formatters once per batch and store (batch, formatters) pairs, but the current approach is clean and the overhead is likely negligible for CLI output.
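A sketch of that alternative, again with stand-in types rather than the arrow crate: `Batch`, `Formatter`, `render`, and `join_row` are hypothetical. The formatters are built once per batch, stored next to the batch, and reused by both the sizing and the rendering pass.

```rust
// Hypothetical stand-ins for `RecordBatch` and `ArrayFormatter`.
pub struct Batch {
    pub columns: Vec<Vec<Option<i64>>>,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}

pub struct Formatter<'a> {
    values: &'a [Option<i64>],
}

impl Formatter<'_> {
    fn cell(&self, row: usize) -> String {
        match self.values[row] {
            Some(v) => v.to_string(),
            None => "NULL".to_string(),
        }
    }
}

fn batch_formatters(batch: &Batch) -> Vec<Formatter<'_>> {
    batch
        .columns
        .iter()
        .map(|c| Formatter { values: c.as_slice() })
        .collect()
}

pub fn render(headers: &[&str], batches: &[Batch]) -> String {
    // Build the formatters once per batch and keep them with the batch.
    let prepared: Vec<(&Batch, Vec<Formatter<'_>>)> =
        batches.iter().map(|b| (b, batch_formatters(b))).collect();

    // Pass 1: column widths from the headers and every cell.
    let mut widths: Vec<usize> = headers.iter().map(|h| h.len()).collect();
    for (batch, formatters) in &prepared {
        for row in 0..batch.num_rows() {
            for (col, f) in formatters.iter().enumerate().take(widths.len()) {
                widths[col] = widths[col].max(f.cell(row).len());
            }
        }
    }

    // Pass 2: render with the same formatters.
    let mut lines = vec![join_row(headers.iter().map(|h| h.to_string()), &widths)];
    for (batch, formatters) in &prepared {
        for row in 0..batch.num_rows() {
            lines.push(join_row(formatters.iter().map(|f| f.cell(row)), &widths));
        }
    }
    lines.join("\n")
}

// Left-pads each cell to its column width and joins the cells.
fn join_row(cells: impl Iterator<Item = String>, widths: &[usize]) -> String {
    cells
        .zip(widths)
        .map(|(cell, &w)| format!("{cell:<w$}"))
        .collect::<Vec<_>>()
        .join(" | ")
}
```

Cells are still formatted twice here (once per pass), so this only saves the formatter construction; it does not materialize all rows as strings.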

claude · 2026-05-20T11:36:05Z

+fn decode_arrow_ipc(bytes: &[u8]) -> Result<QueryResult, TransportError> {
+    let reader = StreamReader::try_new(Cursor::new(bytes), None)
+        .map_err(|e| TransportError::Protocol(format!("arrow IPC open: {e}")))?;
+
+    let schema = reader.schema();
+    let batches: Vec<RecordBatch> = reader
+        .collect::<Result<_, _>>()
+        .map_err(|e| TransportError::Protocol(format!("arrow IPC read batch: {e}")))?;
+
+    Ok(QueryResult {
+        data: ResultData::Arrow { schema, batches },
+    })
+}


Question: decode_arrow_ipc collects all batches into a Vec<RecordBatch> eagerly. For very large result sets this means the entire result is held in memory. This is fine for a CLI tool, but if this transport crate is intended to be reused by other consumers, a streaming approach (yielding batches one at a time) might be worth considering in the future.

Also: is there a size limit consideration? The StreamReader will happily read whatever the server sends. The WS layer already has a 256MiB frame cap (from actor.rs), so that provides an implicit ceiling, which seems sufficient.
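As a rough illustration of the streaming idea, the sketch below wraps any fallible batch iterator and maps its errors lazily, so a consumer sees one batch at a time. It is generic so that it runs without the arrow crate; `stream_batches` and `total_rows` are hypothetical names, and `TransportError` here mirrors the PR's type in name only. In the diff above, `reader` is already consumed as an iterator of results, which is the shape this relies on.

```rust
use std::fmt::Display;

// Hypothetical stand-in that mirrors the PR's error type in name only.
#[derive(Debug, PartialEq)]
pub enum TransportError {
    Protocol(String),
}

/// Lazily converts reader errors into `TransportError`, yielding batches
/// one at a time instead of collecting them into a `Vec`.
pub fn stream_batches<B, E, I>(reader: I) -> impl Iterator<Item = Result<B, TransportError>>
where
    I: IntoIterator<Item = Result<B, E>>,
    E: Display,
{
    reader.into_iter().map(|res| {
        res.map_err(|e| TransportError::Protocol(format!("arrow IPC read batch: {e}")))
    })
}

/// Example consumer: only one batch is alive at a time, and the first
/// error stops the loop.
pub fn total_rows<E, I>(reader: I) -> Result<usize, TransportError>
where
    I: IntoIterator<Item = Result<Vec<i64>, E>>,
    E: Display,
{
    let mut total = 0;
    for batch in stream_batches(reader) {
        total += batch?.len();
    }
    Ok(total)
}
```

The eager version remains simpler for the CLI, which needs every batch for the sizing pass anyway; a streaming variant would mainly benefit other consumers of the transport crate.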

codecov · 2026-05-20T11:42:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.93%. Comparing base (0fa472b) to head (712b25f).

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10915      +/-   ##
==========================================
- Coverage   83.53%   78.93%   -4.60%     
==========================================
  Files         254      470     +216     
  Lines       75844    92837   +16993     
  Branches        0     3449    +3449     
==========================================
+ Hits        63353    73279    +9926     
- Misses      12491    19054    +6563     
- Partials        0      504     +504

| Flag | Coverage Δ | |
| --- | --- | --- |
| cube-backend | `58.44% <ø> (?)` | |
| cubesql | `83.52% <ø> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

feat(cubestore-cli): Support Arrow format for responses

43dde7b

github-actions Bot added the rust Pull requests that update Rust code label May 20, 2026

vercel Bot deployed to Preview May 20, 2026 11:35 View deployment

claude Bot reviewed May 20, 2026

Comment thread rust/cube/.gitignore

claude Bot reviewed May 20, 2026

Comment thread rust/cube/cubestore-ws-transport/src/codec.rs

ovr added 3 commits May 20, 2026 14:52

chore: use result in tests

abd5c93

chore: drop useless comment

8875b57

chore: better errors

712b25f

waralexrom approved these changes May 20, 2026

ovr merged commit e4c8c43 into master May 20, 2026
91 of 93 checks passed

ovr deleted the feat/cubestore-cli-arrow-format branch May 20, 2026 13:55

ovr merged 4 commits into master from feat/cubestore-cli-arrow-format
