Fix parquet compression handling in aws_s3 sink #37
Merged
sundaresanr merged 1 commit into v0.54.0-exaforce on Apr 21, 2026
Conversation
Two bugs caused parquet sinks to emit unreadable files unless users explicitly set `compression: none`:

1. The sink-level `compression` setting (default: gzip) wrapped the parquet bytes, so the S3 object started with gzip magic bytes instead of PAR1, and DuckDB and Snowflake rejected it.
2. Parquet's internal compression was hardcoded to UNCOMPRESSED, and the parquet crate was built without the snap/flate2/zstd features, so even trying SNAPPY would panic at runtime.

Fix: when `codec = parquet`, feed the sink-level compression into the parquet writer's `WriterProperties` and force transport-layer compression to `None`. Non-parquet codecs are unaffected.

Compression mapping (Vector -> parquet): None -> UNCOMPRESSED, Snappy -> SNAPPY, Gzip -> GZIP, Zstd -> ZSTD. Zlib is rejected at build time (no parquet equivalent).

Also fixes a broken test import (`vrl::value::btreemap` was made private) so the parquet test module actually compiles.
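For illustration, here is a minimal sketch of the mapping described above. `SinkCompression` is an assumed stand-in for Vector's sink-level compression option, not the project's real type, and the parquet `Compression` variants shown match recent versions of the parquet crate (where GZIP and ZSTD carry level structs); this is not the exact code merged in this PR.

```rust
use parquet::basic::{Compression, GzipLevel, ZstdLevel};
use parquet::file::properties::WriterProperties;

/// Hypothetical stand-in for the sink-level `compression` option.
enum SinkCompression {
    None,
    Snappy,
    Gzip,
    Zstd,
    Zlib,
}

/// Map the sink-level compression onto the parquet writer's internal codec.
/// Zlib has no parquet equivalent, so it is rejected with an error.
fn parquet_compression(c: &SinkCompression) -> Result<Compression, String> {
    match c {
        SinkCompression::None => Ok(Compression::UNCOMPRESSED),
        SinkCompression::Snappy => Ok(Compression::SNAPPY),
        SinkCompression::Gzip => Ok(Compression::GZIP(GzipLevel::default())),
        SinkCompression::Zstd => Ok(Compression::ZSTD(ZstdLevel::default())),
        SinkCompression::Zlib => Err("zlib has no parquet equivalent".into()),
    }
}

/// Build writer properties carrying the mapped internal compression.
fn writer_properties(c: &SinkCompression) -> Result<WriterProperties, String> {
    Ok(WriterProperties::builder()
        .set_compression(parquet_compression(c)?)
        .build())
}

fn main() {
    // Configured gzip at the sink level becomes GZIP inside the parquet file,
    // while the S3 object itself stays uncompressed (see the fix description).
    let _props = writer_properties(&SinkCompression::Gzip).expect("supported codec");
    // Zlib has no parquet counterpart, so it is rejected rather than silently dropped.
    assert!(parquet_compression(&SinkCompression::Zlib).is_err());
}
```

The SNAPPY/GZIP/ZSTD branches only work when the parquet crate is built with its corresponding cargo features (snap, flate2, zstd), which is why the missing features caused the runtime panics mentioned above.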
62ac478 to 5825924
LotharKAtt approved these changes on Apr 21, 2026
Parquet sinks were producing unreadable files unless users explicitly set `compression = none`, due to two issues:

1. The sink-level `compression` setting (default: gzip) wrapped the parquet output, causing S3 objects to start with gzip magic bytes instead of PAR1. This led to ingestion failures in systems like DuckDB and Snowflake.
2. Parquet's internal compression was hardcoded to UNCOMPRESSED, and the parquet crate was built without compression features (snappy, flate2, zstd), causing runtime panics when other compression types were attempted.

Fix: when `codec = parquet`, pass the sink-level compression into the parquet `WriterProperties` (internal compression) and set transport-layer compression to `None` to avoid double wrapping.

Compression mapping: None -> UNCOMPRESSED, Snappy -> SNAPPY, Gzip -> GZIP, Zstd -> ZSTD; Zlib is rejected (no parquet equivalent).
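To make the "avoid double wrapping" part concrete, here is a minimal sketch of the transport-side guard, using illustrative names (`Codec`, `SinkCompression`) rather than Vector's real identifiers: when the codec is parquet, the configured compression goes into the parquet writer instead, and the S3 object itself is written uncompressed so it still begins with the PAR1 magic bytes.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Codec {
    Parquet,
    Json,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum SinkCompression {
    None,
    Gzip,
    Snappy,
    Zstd,
}

/// Decide what compression to apply to the S3 object itself (the transport layer).
/// Parquet compresses internally, so wrapping the whole file in gzip again would
/// hide the PAR1 header from readers such as DuckDB and Snowflake.
fn transport_compression(codec: Codec, configured: SinkCompression) -> SinkCompression {
    match codec {
        Codec::Parquet => SinkCompression::None,
        _ => configured,
    }
}

fn main() {
    // The configured gzip default is honored for other codecs but overridden for parquet.
    assert_eq!(
        transport_compression(Codec::Parquet, SinkCompression::Gzip),
        SinkCompression::None
    );
    assert_eq!(
        transport_compression(Codec::Json, SinkCompression::Gzip),
        SinkCompression::Gzip
    );
}
```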