feat(parquet): add sparse-column writer benchmarks #9654

Open
HippoBaro wants to merge 1 commit into apache:main from HippoBaro:sparse_benchmarks

HippoBaro commented Apr 2, 2026

Which issue does this PR close?

Rationale for this change

Measure write performance for sparse and all-null columns in the benchmarks.

What changes are included in this PR?

Add three new benchmark cases to the arrow_writer benchmark suite for evaluating write performance on sparse and all-null data:

  • primitive_sparse_99pct_null: a flat primitive column with 99% nulls, exercising long RLE runs in definition levels.
  • list_primitive_sparse_99pct_null: a list-of-primitive column with 99% nulls, exercising null batching in the list level builder.
  • primitive_all_null: a flat primitive column with 100% nulls, exercising the uniform_levels fast path for entirely-null columns.
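To make the flat cases above more concrete, here is a minimal, std-only Rust sketch of why sparse data yields long runs in definition levels and why an all-null column can be recognised as uniform. It is not the benchmark code from this PR: `XorShift`, `make_sparse_column`, `def_levels`, `rle_runs`, and `uniform_level` are hypothetical names chosen for illustration, and the real writer uses Parquet's hybrid RLE/bit-packed encoding rather than plain run pairs. The list case is not covered here.

```rust
// Illustrative sketch only; none of these names come from arrow-rs.

/// Small deterministic xorshift generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Build a flat optional column where roughly `null_pct` percent of slots are null.
fn make_sparse_column(len: usize, null_pct: u64, seed: u64) -> Vec<Option<i32>> {
    let mut rng = XorShift(seed);
    (0..len)
        .map(|i| if rng.next() % 100 < null_pct { None } else { Some(i as i32) })
        .collect()
}

/// Definition levels for a flat optional column: 0 = null, 1 = value present.
fn def_levels(column: &[Option<i32>]) -> Vec<i16> {
    column.iter().map(|v| if v.is_some() { 1 } else { 0 }).collect()
}

/// Collapse levels into (level, run_length) pairs, the basic idea behind RLE.
fn rle_runs(levels: &[i16]) -> Vec<(i16, usize)> {
    let mut runs: Vec<(i16, usize)> = Vec::new();
    for &level in levels {
        if let Some((last, count)) = runs.last_mut() {
            if *last == level {
                *count += 1;
                continue;
            }
        }
        runs.push((level, 1));
    }
    runs
}

/// Return the single level shared by every slot, if there is one.
fn uniform_level(levels: &[i16]) -> Option<i16> {
    let first = *levels.first()?;
    levels.iter().all(|&l| l == first).then_some(first)
}

fn main() {
    // 99% nulls: far fewer runs than levels, so each run is long on average.
    let sparse = def_levels(&make_sparse_column(100_000, 99, 42));
    let runs = rle_runs(&sparse);
    println!("sparse: {} levels in {} runs", sparse.len(), runs.len());

    // 100% nulls: every level is 0, so the column is uniform.
    let all_null = def_levels(&make_sparse_column(100_000, 100, 42));
    println!("all-null uniform level: {:?}", uniform_level(&all_null));
}
```

The fixed seed keeps the generated column reproducible between runs, which is also what a benchmark input would want.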

Are these changes tested?

N/A

Are there any user-facing changes?

None.

Baseline results (Apple M1 Max):

  primitive_sparse_99pct_null/default               40.3 ms
  primitive_sparse_99pct_null/parquet_2             43.5 ms
  primitive_sparse_99pct_null/zstd_parquet_2        44.4 ms
  list_primitive_sparse_99pct_null/default          39.9 ms
  list_primitive_sparse_99pct_null/parquet_2        39.9 ms
  list_primitive_sparse_99pct_null/zstd_parquet_2   40.7 ms
  primitive_all_null/default                        38.0 ms
  primitive_all_null/parquet_2                      36.9 ms
  primitive_all_null/zstd_parquet_2                 36.1 ms
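
For readers who want to compare the configurations, the following is a small sketch that computes the percentage change of each variant relative to `default`, using only the timings listed above. `pct_change` is a hypothetical helper, and single-run numbers like these may be within measurement noise.

```rust
/// Percentage change of `other_ms` relative to `baseline_ms` (positive means slower).
fn pct_change(baseline_ms: f64, other_ms: f64) -> f64 {
    (other_ms - baseline_ms) / baseline_ms * 100.0
}

fn main() {
    // (case, default, parquet_2, zstd + parquet_2) in milliseconds, copied from the results above.
    let results = [
        ("primitive_sparse_99pct_null", 40.3, 43.5, 44.4),
        ("list_primitive_sparse_99pct_null", 39.9, 39.9, 40.7),
        ("primitive_all_null", 38.0, 36.9, 36.1),
    ];
    for (case, default, parquet_2, zstd_parquet_2) in results {
        println!(
            "{case}: parquet_2 {:+.1}%, zstd_parquet_2 {:+.1}%",
            pct_change(default, parquet_2),
            pct_change(default, zstd_parquet_2)
        );
    }
}
```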

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>

etseidl left a comment

LGTM

etseidl commented Apr 2, 2026

I'm holding off merging in case we want to add more cases.

Labels

parquet Changes to the parquet crate