feat(parquet): add sparse-column writer benchmarks by HippoBaro · Pull Request #9654 · apache/arrow-rs

HippoBaro · 2026-04-02T04:26:49Z

Which issue does this PR close?

None, but relates to Parquet: level encoding cost should be proportional to RLE output size #9652

Rationale for this change

Measure writer performance on sparse and all-null columns in the arrow_writer benchmarks.
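To show why these shapes stress level encoding (the concern of the related issue on RLE output size), here is a minimal std-only Rust sketch. It is not taken from the arrow-rs writer. It derives definition levels for a flat nullable column and counts runs of equal values. The helpers `rle_runs` and `sparse_def_levels` are hypothetical. The "every 100th value is non-null" pattern is an assumed stand-in for the benchmark's actual null distribution. Parquet's real level encoder is the RLE/bit-packing hybrid, which bit-packs short runs, so the run count here is only a rough proxy for encoded size.

```rust
/// Hypothetical helper: collapse levels into (value, run_length) pairs,
/// i.e. the runs an RLE encoder would see (ignoring bit-packing).
fn rle_runs(levels: &[i16]) -> Vec<(i16, usize)> {
    let mut runs: Vec<(i16, usize)> = Vec::new();
    for &level in levels {
        // Extend the current run if the level repeats.
        if let Some((value, len)) = runs.last_mut() {
            if *value == level {
                *len += 1;
                continue;
            }
        }
        // Otherwise start a new run.
        runs.push((level, 1));
    }
    runs
}

/// Hypothetical helper: definition levels for a flat nullable column
/// (max_def_level = 1): 1 = value present, 0 = null. Every `stride`-th
/// slot is non-null, an assumed deterministic stand-in for 99% nulls.
fn sparse_def_levels(len: usize, stride: usize) -> Vec<i16> {
    assert!(stride > 0, "stride must be positive");
    (0..len)
        .map(|i| if i % stride == 0 { 1i16 } else { 0 })
        .collect()
}

fn main() {
    let n = 1_000_000;
    let sparse = sparse_def_levels(n, 100);
    // An all-null column has a uniform level of 0 everywhere.
    let all_null = vec![0i16; n];

    let sparse_runs = rle_runs(&sparse);
    let all_null_runs = rle_runs(&all_null);
    println!("sparse:   {} levels -> {} runs", n, sparse_runs.len());
    println!("all-null: {} levels -> {} runs", n, all_null_runs.len());
}
```

Under these assumptions, one million levels collapse into 20,000 runs for the sparse pattern and into a single run for the all-null column. An encoder whose cost tracks the number of runs, rather than the number of levels, does proportionally less work on these inputs.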

What changes are included in this PR?

Add three new benchmark cases to the arrow_writer benchmark suite for evaluating write performance on sparse and all-null data:

- `primitive_sparse_99pct_null`: a flat primitive column with 99% nulls, exercising long RLE runs in definition levels.
- `list_primitive_sparse_99pct_null`: a list-of-primitive column with 99% nulls, exercising null batching in the list level builder.
- `primitive_all_null`: a flat primitive column with 100% nulls, exercising the `uniform_levels` fast path for entirely-null columns.
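For the list case, here is a similar std-only sketch. It is hypothetical and is not the arrow-rs list level builder. Consider an `optional list<optional int32>` column, which has max definition level 3 and max repetition level 1. Each null list contributes exactly one (def = 0, rep = 0) pair. A run of consecutive null lists can therefore be appended with a single `extend` rather than one push per list. `ListLevels`, `push_null_lists`, `push_list`, and `sparse_lists` are illustrative names, and the fixed-stride null pattern is assumed.

```rust
use std::iter::repeat;

/// Hypothetical level buffers for an `optional list<optional int32>` column.
/// def: 0 = null list, 1 = empty list, 2 = null element, 3 = present element.
/// rep: 0 = first entry of a list, 1 = continuation of the same list.
#[derive(Default)]
struct ListLevels {
    def: Vec<i16>,
    rep: Vec<i16>,
}

impl ListLevels {
    /// Append `count` consecutive null lists in one batch: each null list
    /// contributes exactly one (def = 0, rep = 0) pair.
    fn push_null_lists(&mut self, count: usize) {
        self.def.extend(repeat(0).take(count));
        self.rep.extend(repeat(0).take(count));
    }

    /// Append one non-null list of `len` non-null elements.
    fn push_list(&mut self, len: usize) {
        if len == 0 {
            // An empty list still emits one level pair.
            self.def.push(1);
            self.rep.push(0);
            return;
        }
        self.def.extend(repeat(3).take(len));
        self.rep.push(0);
        self.rep.extend(repeat(1).take(len - 1));
    }
}

/// Every `stride`-th list is non-null with `list_len` elements; the null
/// lists between them are appended as one batch per run.
fn sparse_lists(num_lists: usize, stride: usize, list_len: usize) -> ListLevels {
    assert!(stride > 0, "stride must be positive");
    let mut levels = ListLevels::default();
    let mut i = 0;
    while i < num_lists {
        if i % stride == 0 {
            levels.push_list(list_len);
            i += 1;
        } else {
            // Length of the null run up to the next non-null list (or the end).
            let run_end = (i / stride + 1) * stride;
            let run = run_end.min(num_lists) - i;
            levels.push_null_lists(run);
            i += run;
        }
    }
    levels
}

fn main() {
    let levels = sparse_lists(1_000, 100, 2);
    println!("{} def levels, {} rep levels", levels.def.len(), levels.rep.len());
}
```

With 1,000 lists, 10 non-null lists of 2 elements each produce 20 levels and 990 null lists produce 990 levels. The sketch therefore prints "1010 def levels, 1010 rep levels" and issues only 10 batched calls for the null runs.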

Are these changes tested?

N/A

Are there any user-facing changes?

None.

Baseline results (Apple M1 Max):

primitive_sparse_99pct_null/default         40.3 ms
primitive_sparse_99pct_null/parquet_2       43.5 ms
primitive_sparse_99pct_null/zstd_parquet_2  44.4 ms
list_primitive_sparse_99pct_null/default    39.9 ms
list_primitive_sparse_99pct_null/parquet_2  39.9 ms
list_primitive_sparse_99pct_null/zstd_p2    40.7 ms
primitive_all_null/default                  38.0 ms
primitive_all_null/parquet_2                36.9 ms
primitive_all_null/zstd_parquet_2           36.1 ms

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>

etseidl

LGTM

etseidl · 2026-04-02T22:27:02Z

I'm holding off merging in case we want to add more cases.

github-actions bot added the parquet label (Changes to the parquet crate) on Apr 2, 2026

HippoBaro mentioned this pull request on Apr 2, 2026 in feat(parquet): fuse level encoding passes and batch null runs in column writer #9653 (Open)

etseidl approved these changes Apr 2, 2026


HippoBaro wants to merge 1 commit into apache:main from HippoBaro:sparse_benchmarks

HippoBaro commented Apr 2, 2026 (edited by alamb)
