feat: add custom Avro OCF reader for manifest parsing with filtered decoding by JingsongLi · Pull Request #281 · apache/paimon-rust

JingsongLi · 2026-04-23T09:58:34Z

Purpose

Replace apache-avro's read path (Value intermediate representation) with a zero-copy custom decoder that reads Avro binary directly into target structs. Key optimizations:

- Custom AvroCursor for zero-copy Avro binary primitive decoding
- OCF parser with snappy/deflate/zstd decompression support
- Writer schema parsing with field index mapping for compatibility
- Two-pass filtered ManifestEntry decoding: lightweight fields first, skip expensive DataFileMeta for filtered-out entries
- SharedSchemaCache for cross-task schema reuse
- Zero-copy entry consumption (into_identifier, into_parts)
- Parallel base+delta manifest list reads via futures::try_join!
- Remove redundant exists() checks before file reads
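Below is a minimal std-only sketch of a zero-copy cursor over Avro binary data. It relies only on the Avro spec's encoding rules: int/long are zigzag base-128 varints, and bytes/string are a long length followed by raw bytes. The name mirrors the PR's AvroCursor, but the methods and the `DecodeError` type are hypothetical and are not the PR's actual API.

```rust
/// Errors this sketch can report while decoding (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnexpectedEof,
    VarintOverflow,
    NegativeLength(i64),
    InvalidUtf8,
}

/// Hypothetical cursor over a borrowed Avro buffer; `bytes` and `string`
/// reads return slices into the input instead of copying.
pub struct AvroCursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> AvroCursor<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    /// Reads an Avro `int`/`long`: a zigzag-encoded base-128 varint.
    pub fn read_long(&mut self) -> Result<i64, DecodeError> {
        let mut value: u64 = 0;
        let mut shift: u32 = 0;
        loop {
            let byte = *self.buf.get(self.pos).ok_or(DecodeError::UnexpectedEof)?;
            self.pos += 1;
            if shift >= 64 {
                return Err(DecodeError::VarintOverflow);
            }
            value |= u64::from(byte & 0x7F) << shift;
            if byte & 0x80 == 0 {
                break;
            }
            shift += 7;
        }
        // Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
        Ok((value >> 1) as i64 ^ -((value & 1) as i64))
    }

    /// Reads length-prefixed `bytes` as a slice borrowed from the input.
    pub fn read_bytes(&mut self) -> Result<&'a [u8], DecodeError> {
        let len = self.read_long()?;
        if len < 0 {
            return Err(DecodeError::NegativeLength(len));
        }
        let end = self
            .pos
            .checked_add(len as usize)
            .ok_or(DecodeError::UnexpectedEof)?;
        // Copy out the `&'a [u8]` so the returned slice outlives `&mut self`.
        let buf: &'a [u8] = self.buf;
        let slice = buf.get(self.pos..end).ok_or(DecodeError::UnexpectedEof)?;
        self.pos = end;
        Ok(slice)
    }

    /// Avro `string` is `bytes` holding UTF-8: validate, then borrow.
    pub fn read_str(&mut self) -> Result<&'a str, DecodeError> {
        let bytes = self.read_bytes()?;
        std::str::from_utf8(bytes).map_err(|_| DecodeError::InvalidUtf8)
    }

    /// Advances past a `bytes`/`string` field without validating it.
    pub fn skip_bytes(&mut self) -> Result<(), DecodeError> {
        self.read_bytes().map(|_| ())
    }
}
```

Because `read_str` returns a `&'a str` tied to the input buffer rather than to the cursor, a decoder built this way can keep borrowed fields while it continues reading. It would only allocate when it builds an owned struct.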

Benchmark for TableScan:

Full Scan: consistent ~4x speedup across all sizes. Each manifest has ~22,369 entries at ~7.66 MB. The custom Avro OCF decoder with zero-copy cursor and two-pass filtered decoding is paying off nicely.
Partial Scan: the speedup in partition-filtering scenarios is highly significant, at 14.2x.
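The benchmark notes credit two-pass filtered decoding for part of the speedup. That approach can be sketched as follows. This is a minimal std-only illustration that assumes a simplified, hypothetical entry layout: kind, partition, and bucket, followed by a three-field file record. `Cursor`, `Entry`, `FileMeta` and `decode_filtered` are illustrative names and do not reflect Paimon's actual ManifestEntry schema or the PR's code.

```rust
// Assumed, simplified layout (hypothetical, not Paimon's real schema):
//   kind: long, partition: bytes, bucket: long,
//   file: record { file_name: string, file_size: long, row_count: long }

struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// Zigzag varint (Avro int/long); `None` on truncated or oversized input.
    fn read_long(&mut self) -> Option<i64> {
        let (mut v, mut shift) = (0u64, 0u32);
        loop {
            let b = *self.buf.get(self.pos)?;
            self.pos += 1;
            if shift >= 64 {
                return None;
            }
            v |= u64::from(b & 0x7F) << shift;
            if b & 0x80 == 0 {
                return Some((v >> 1) as i64 ^ -((v & 1) as i64));
            }
            shift += 7;
        }
    }

    /// Length-prefixed bytes, borrowed from the input buffer.
    fn read_bytes(&mut self) -> Option<&'a [u8]> {
        let len = self.read_long()?;
        if len < 0 {
            return None;
        }
        let end = self.pos.checked_add(len as usize)?;
        let buf: &'a [u8] = self.buf;
        let slice = buf.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }
}

#[derive(Debug, PartialEq)]
struct FileMeta {
    file_name: String,
    file_size: i64,
    row_count: i64,
}

#[derive(Debug, PartialEq)]
struct Entry {
    kind: i64,
    partition: Vec<u8>,
    bucket: i64,
    file: FileMeta,
}

/// Outer `None`: malformed input. `Some(None)`: entry filtered out.
fn decode_filtered(c: &mut Cursor<'_>, keep: &dyn Fn(&[u8], i64) -> bool) -> Option<Option<Entry>> {
    // Pass 1: decode only the cheap key fields.
    let kind = c.read_long()?;
    let partition = c.read_bytes()?;
    let bucket = c.read_long()?;
    if !keep(partition, bucket) {
        // Avro records carry no length prefix, so we still walk the nested
        // record, but without allocating or validating UTF-8.
        c.read_bytes()?;
        c.read_long()?;
        c.read_long()?;
        return Some(None);
    }
    // Pass 2: materialize the expensive part only for surviving entries.
    let file_name = std::str::from_utf8(c.read_bytes()?).ok()?.to_owned();
    let file_size = c.read_long()?;
    let row_count = c.read_long()?;
    let file = FileMeta { file_name, file_size, row_count };
    Some(Some(Entry { kind, partition: partition.to_vec(), bucket, file }))
}
```

In this sketch, a filtered-out entry costs a few varint reads and no heap allocation. This is one plausible reason why selective partition filters benefit more than full scans.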

Brief change log

Tests

API and Format

Documentation


XiaoHongbo-Hope · 2026-04-24T09:53:01Z

+                source: None,
+            });
+        }
+    };


This doesn't handle deflate. I think we can support it later; it's not a blocker.
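The comment above concerns which codecs a given decode path handles. The following is a minimal std-only sketch of codec selection and block framing that follows the Avro spec: the `avro.codec` header key (absent means "null"), deflate as a raw RFC 1951 stream, and snappy blocks followed by a 4-byte big-endian CRC32. `Codec`, `codec_from_metadata`, `BlockFrame` and `frame_block` are hypothetical names, not the PR's code. The sketch assumes that actual decompression is delegated to external crates; for example, flate2's `DeflateDecoder` reads raw deflate.

```rust
/// Codecs from the Avro spec that this sketch recognizes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Null,
    Deflate,
    Snappy,
    Zstandard,
}

#[derive(Debug, PartialEq)]
enum CodecError {
    Unsupported(String),
    TruncatedBlock,
}

/// Maps the `avro.codec` header value to a codec; a missing key means "null".
fn codec_from_metadata(value: Option<&[u8]>) -> Result<Codec, CodecError> {
    match value {
        None | Some(b"null") => Ok(Codec::Null),
        Some(b"deflate") => Ok(Codec::Deflate),
        Some(b"snappy") => Ok(Codec::Snappy),
        Some(b"zstandard") => Ok(Codec::Zstandard),
        // e.g. "bzip2" or "xz": report an error instead of misreading data.
        Some(other) => Err(CodecError::Unsupported(
            String::from_utf8_lossy(other).into_owned(),
        )),
    }
}

/// The bytes a decompressor should consume, plus any codec trailer.
#[derive(Debug, PartialEq)]
struct BlockFrame<'a> {
    payload: &'a [u8],
    crc32: Option<u32>,
}

/// Splits one OCF block according to its codec's framing. The match is
/// exhaustive, so a new `Codec` variant will not compile until handled here.
fn frame_block(codec: Codec, block: &[u8]) -> Result<BlockFrame<'_>, CodecError> {
    match codec {
        // null: plain Avro data; deflate: raw RFC 1951 stream with no zlib
        // header; zstandard: a zstd frame. None of these add a trailer.
        Codec::Null | Codec::Deflate | Codec::Zstandard => Ok(BlockFrame {
            payload: block,
            crc32: None,
        }),
        // snappy: compressed bytes followed by a 4-byte big-endian CRC32
        // of the uncompressed data.
        Codec::Snappy => {
            let split = block.len().checked_sub(4).ok_or(CodecError::TruncatedBlock)?;
            let (payload, t) = block.split_at(split);
            Ok(BlockFrame {
                payload,
                crc32: Some(u32::from_be_bytes([t[0], t[1], t[2], t[3]])),
            })
        }
    }
}
```

Routing every codec through one exhaustive `match` is one way to turn a missing codec on some code path, like the deflate gap noted in this review, into a compile-time error instead of a runtime fallthrough.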

XiaoHongbo-Hope · 2026-04-24T09:53:20Z

+1

luoyuxia

Had a quick review of this PR and found no blocking problems. Nice optimization idea. LGTM!

JingsongLi closed this Apr 23, 2026

JingsongLi reopened this Apr 23, 2026

JingsongLi force-pushed the avro_super_fast branch from 98e6cc2 to 08b4576 on April 23, 2026 14:15

XiaoHongbo-Hope reviewed Apr 24, 2026


XiaoHongbo-Hope approved these changes Apr 24, 2026


luoyuxia approved these changes Apr 24, 2026


JingsongLi merged commit 553e4a3 into apache:main Apr 25, 2026
8 checks passed


JingsongLi merged 1 commit into apache:main from JingsongLi:avro_super_fast
