feat: add Vortex columnar file format support as optional feature #260

Merged
XiaoHongbo-Hope merged 5 commits into apache:main from JingsongLi:vortex
Apr 20, 2026

@JingsongLi
Contributor

Purpose

Add read/write support for the Vortex file format behind a vortex feature flag. The writer uses a background task with kanal channel for streaming writes, and the reader supports predicate pushdown and row selection. Also introduces a configurable file.format option (default "parquet") to replace hardcoded file extensions across all writer paths.
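
To illustrate the `file.format` option described above, here is a minimal sketch of how a writer path might derive a file extension from table options instead of hardcoding `.parquet`. The helper names `file_format` and `data_file_name`, and the use of a plain `HashMap` for options, are assumptions made for illustration and are not taken from this PR; only the option key and its default come from the description.

```rust
use std::collections::HashMap;

/// Default used when `file.format` is not set (the PR description states "parquet").
const DEFAULT_FILE_FORMAT: &str = "parquet";

/// Hypothetical helper: read `file.format` from the table options and
/// normalize it so it can be used directly as a file extension.
fn file_format(options: &HashMap<String, String>) -> String {
    options
        .get("file.format")
        .map(|v| v.trim().to_ascii_lowercase())
        .filter(|v| !v.is_empty())
        .unwrap_or_else(|| DEFAULT_FILE_FORMAT.to_string())
}

/// Hypothetical helper: build a data file name from a prefix and the
/// configured format, rather than appending a fixed ".parquet" suffix.
fn data_file_name(prefix: &str, options: &HashMap<String, String>) -> String {
    format!("{}.{}", prefix, file_format(options))
}
```

With no option set, `data_file_name("data-0", &options)` yields a `.parquet` name; setting `file.format` to `vortex` switches the extension without touching the call sites.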

Brief change log

Tests

API and Format

Documentation

JingsongLi and others added 2 commits April 18, 2026 09:08
Add read/write support for the Vortex file format behind a `vortex` feature flag.
The writer uses a background task with kanal channel for streaming writes, and the
reader supports predicate pushdown and row selection. Also introduces a configurable
`file.format` option (default "parquet") to replace hardcoded file extensions across
all writer paths.
Remove eprintln! warning from VortexFormatWriter Drop impl, and update
stale "Open parquet writer" comment in kv_file_writer.rs to be format-agnostic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
})?;

// Apply column projection.
if !projected_names.is_empty() {
Contributor

@XiaoHongbo-Hope Apr 19, 2026

DataFusion can push projection=[] for SELECT COUNT(*). Here an empty projection is treated as no projection, so Vortex still returns the actual columns, and RecordBatch validation later fails against the empty requested schema.
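
One way to avoid conflating the two cases is to model the projection as an `Option`, so that "no projection" and "project zero columns" are distinct values. The sketch below uses only the standard library; `resolve_projection` and its signature are hypothetical and not the reader's actual API.

```rust
/// Hypothetical helper, not the PR's code: decide which columns to read.
/// `None` means no projection was requested, so every file column is read.
/// `Some(&[])` means zero columns were requested (e.g. SELECT COUNT(*)),
/// so the result must stay empty instead of falling back to all columns.
fn resolve_projection(requested: Option<&[&str]>, file_columns: &[&str]) -> Vec<String> {
    match requested {
        None => file_columns.iter().map(|c| c.to_string()).collect(),
        Some(names) => names
            .iter()
            // Keep only requested names that exist in the file.
            .filter(|n| file_columns.contains(*n))
            .map(|n| n.to_string())
            .collect(),
    }
}
```

Checking `is_empty()` on a plain list cannot tell these cases apart, which is the behavior the comment above describes.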

let batch = RecordBatch::try_new_with_options(
target_schema,
vec![],
&arrow_array::RecordBatchOptions::new().with_row_count(Some(row_count)),
Contributor

This early return ignores predicate filters for empty-projection reads, so such reads can return the full file row count instead of the filtered row count.

Contributor Author

We should consider row_selection. Predicates are inherently designed for coarse-grained filtering; unlike in Python, the returned data is not filtered on a per-row basis.
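
As a rough sketch of what honoring row_selection on the empty-projection path could look like, the row count for the column-less batch can be derived from the selected ranges instead of the full file length. The representation of a row selection as sorted, non-overlapping `Range<u64>` values and the helper name `selected_row_count` are assumptions for illustration; the PR's actual types may differ.

```rust
use std::ops::Range;

/// Hypothetical helper: number of rows to report for an empty-projection read.
/// `row_selection` is modeled as non-overlapping row ranges (an assumption);
/// `None` means no selection, so every row in the file counts.
fn selected_row_count(file_rows: u64, row_selection: Option<&[Range<u64>]>) -> u64 {
    match row_selection {
        None => file_rows,
        Some(ranges) => ranges
            .iter()
            .map(|r| {
                // Clamp each range to the file length so an out-of-bounds
                // range cannot inflate the count.
                let start = r.start.min(file_rows);
                let end = r.end.min(file_rows);
                end.saturating_sub(start)
            })
            .sum(),
    }
}
```

This only accounts for the row selection. Consistent with the comment above, predicates are treated as coarse-grained pruning and are not applied per row here.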

@XiaoHongbo-Hope
Contributor

+1

@XiaoHongbo-Hope XiaoHongbo-Hope merged commit 7b1008b into apache:main Apr 20, 2026
8 checks passed