
Core: Implement LZ4 frame compression for Puffin format #16054

Open
laserninja wants to merge 3 commits into apache:main from laserninja:fix/16033-puffin-lz4-compression


@laserninja
Contributor

What

Implements LZ4 frame compression and decompression in PuffinFormat, fixing the UnsupportedOperationException thrown when attempting to use LZ4 compression (which is the default for Puffin footer compression).

Why

The compress() and decompress() methods in PuffinFormat had TODO stubs for the LZ4 case that fell through to throw new UnsupportedOperationException("Unsupported codec: lz4"). As a result, footer compression, which defaults to LZ4 per the Puffin spec, was unusable at runtime.

How

  • Added lz4-java (net.jpountz.lz4) as a dependency to iceberg-core. The library was already defined in the version catalog (gradle/libs.versions.toml) but was not wired as a dependency.
  • Implemented compressLz4() using LZ4FrameOutputStream with CONTENT_SIZE and BLOCK_INDEPENDENCE flags, conforming to the Puffin spec requirement of "LZ4 single compression frame with content size present".
  • Implemented decompressLz4() using LZ4FrameInputStream.
  • The existing aircompressor library only provides block-level LZ4, not the frame-level compression required by the Puffin spec. lz4-java provides the necessary frame format support.
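
To make the "content size present" requirement concrete, here is a minimal sketch of how those flags appear in an LZ4 frame header. It is not part of this PR and does not use lz4-java: the class name Lz4FrameHeaderSketch and its methods are hypothetical, and the byte layout (magic number, FLG byte, BD byte, optional 8-byte content size) is taken from the LZ4 frame format description. The header checksum is deliberately not verified.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: inspects the fixed part of an LZ4 frame header. */
public final class Lz4FrameHeaderSketch {
  // LZ4 frame magic number; stored little-endian, so the bytes are 04 22 4D 18.
  private static final int MAGIC = 0x184D2204;
  // Bit positions inside the FLG byte, per the LZ4 frame format description.
  private static final int FLG_BLOCK_INDEPENDENCE = 1 << 5;
  private static final int FLG_CONTENT_SIZE = 1 << 3;
  // Offsets: magic (4 bytes), FLG (1 byte), BD (1 byte), then optional content size.
  private static final int FLG_OFFSET = 4;
  private static final int CONTENT_SIZE_OFFSET = 6;

  private Lz4FrameHeaderSketch() {}

  /** Returns the FLG byte after checking the magic number. */
  private static int flg(byte[] frame) {
    if (frame.length < 6) {
      throw new IllegalArgumentException("Too short for an LZ4 frame header");
    }
    int magic = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    if (magic != MAGIC) {
      throw new IllegalArgumentException("Not an LZ4 frame");
    }
    return frame[FLG_OFFSET] & 0xFF;
  }

  /** True if the frame header declares the uncompressed content size. */
  public static boolean hasContentSize(byte[] frame) {
    return (flg(frame) & FLG_CONTENT_SIZE) != 0;
  }

  /** True if blocks in the frame can be decoded independently of each other. */
  public static boolean hasIndependentBlocks(byte[] frame) {
    return (flg(frame) & FLG_BLOCK_INDEPENDENCE) != 0;
  }

  /** Reads the declared uncompressed size; fails if the flag is not set. */
  public static long contentSize(byte[] frame) {
    if (!hasContentSize(frame)) {
      throw new IllegalStateException("Content size flag is not set");
    }
    if (frame.length < CONTENT_SIZE_OFFSET + Long.BYTES) {
      throw new IllegalArgumentException("Header truncated before content size");
    }
    return ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN).getLong(CONTENT_SIZE_OFFSET);
  }
}
```

A check along these lines could, in principle, be applied to the bytes produced by compressLz4() to confirm that the frame carries its content size, though the tests in this PR rely on round-trips rather than header inspection.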

Testing

  • Added round-trip tests in TestPuffinFormat for both non-empty and empty data.
  • Updated testEmptyFooterCompressed in TestPuffinWriter: it previously asserted UnsupportedOperationException and now verifies successful LZ4 footer compression and read-back.
  • Added testWriteAndReadMetricDataCompressedLz4 in TestPuffinWriter for full write + read verification with LZ4-compressed blobs.

All existing Puffin tests continue to pass.

Fixes #16033

@laserninja laserninja marked this pull request as draft April 20, 2026 05:30
@nastra nastra self-requested a review April 21, 2026 15:51
@github-actions github-actions Bot added the spark label Apr 24, 2026
@laserninja laserninja force-pushed the fix/16033-puffin-lz4-compression branch from 9062543 to 5ad0ae5 Compare April 24, 2026 23:31
Implement LZ4 frame compression and decompression in PuffinFormat
using lz4-java (net.jpountz.lz4), which was already defined in the
version catalog but not wired as a dependency of iceberg-core.

The Puffin spec requires LZ4 single compression frame with content
size present. The existing aircompressor library only provides
block-level LZ4, not frame-level, so lz4-java's LZ4FrameOutputStream
and LZ4FrameInputStream are used instead.

Previously, compress() and decompress() had TODO stubs for LZ4 that
threw UnsupportedOperationException at runtime, making footer
compression (which defaults to LZ4) unusable.

Fixes apache#16033

Add lz4-java dependency to runtime-deps.txt baselines for all runtime
modules (Spark v3.4/v3.5/v4.0, Flink v1.20/v2.0/v2.1, Kafka Connect).
Update corresponding LICENSE files with lz4-java attribution.
@laserninja laserninja force-pushed the fix/16033-puffin-lz4-compression branch from 5ad0ae5 to 3895aaa Compare April 24, 2026 23:43
@laserninja laserninja marked this pull request as ready for review April 25, 2026 19:24
@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label May 26, 2026

Development

Successfully merging this pull request may close these issues.

Core: Puffin LZ4 footer compression throws UnsupportedOperationException at runtime
