@jhorstmann
Contributor

Which issue does this PR close?

Rationale for this change

Improve the performance benchmark coverage.

What changes are included in this PR?

Are these changes tested?

Benchmark results on my machine, without target-specific optimizations:

arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs
                        time:   [8.5123 µs 8.5218 µs 8.5321 µs]

arrow_array_reader/Int32Array/plain encoded, optional, no NULLs
                        time:   [12.843 µs 12.863 µs 12.885 µs]

arrow_array_reader/Int32Array/plain encoded, optional, half NULLs
                        time:   [64.977 µs 65.108 µs 65.267 µs]

arrow_array_reader/struct/Int32Array/plain encoded, mandatory struct, optional data, no NULLs
                        time:   [14.456 µs 14.473 µs 14.491 µs]

arrow_array_reader/struct/Int32Array/plain encoded, optional struct, optional data, no NULLs
                        time:   [173.14 µs 173.68 µs 174.23 µs]

arrow_array_reader/struct/Int32Array/plain encoded, mandatory struct, optional data, half NULLs
                        time:   [65.636 µs 65.780 µs 65.906 µs]

arrow_array_reader/struct/Int32Array/plain encoded, optional struct, optional data, half NULLs
                        time:   [433.95 µs 434.29 µs 434.67 µs]

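I think these show that a mandatory struct has negligible overhead compared to a non-nested primitive column: about 14.5 µs versus 12.9 µs with no NULLs, and 65.8 µs versus 65.1 µs with half NULLs. But an optional struct, even if its definition level is always valid, has a significant overhead: about 173.7 µs versus 14.5 µs for the mandatory struct with no NULLs (roughly 12x), and 434.3 µs versus 65.8 µs with half NULLs (roughly 6.6x).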
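
To make the definition-level cost more concrete, here is a minimal sketch of how levels map to validity for a `struct { data: int32 }` column. It is not the parquet crate's reader code; `Repetition`, `max_definition_level`, and `decode_validity` are hypothetical names used only for illustration. The sketch relies on the Parquet rule that each optional field on a leaf's path adds one definition level, so an optional struct with optional data has a maximum level of 2 instead of 1.

```rust
/// Repetition of one field on the path from the schema root to a leaf.
/// Hypothetical helper type for this sketch, not a parquet crate type.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Repetition {
    Required,
    Optional,
}

/// Maximum definition level of a leaf: the number of optional fields on
/// its path. (Repeated fields also add a level in Parquet; they are left
/// out of this sketch.)
fn max_definition_level(path: &[Repetition]) -> i16 {
    path.iter().filter(|r| **r == Repetition::Optional).count() as i16
}

/// Turn the definition levels of a `struct { data }` column into
/// `(struct_valid, data_valid)` flags, one pair per row.
/// `path[0]` describes the struct, `path[1]` the leaf.
fn decode_validity(path: &[Repetition; 2], def_levels: &[i16]) -> Vec<(bool, bool)> {
    // Level a row must reach for the struct itself to be non-null.
    let struct_level = if path[0] == Repetition::Optional { 1 } else { 0 };
    let max_level = max_definition_level(path);
    def_levels
        .iter()
        .map(|&d| {
            let struct_valid = d >= struct_level;
            // The leaf value is present only at the maximum level.
            let data_valid = d == max_level;
            (struct_valid, data_valid)
        })
        .collect()
}
```

With an optional struct and optional data, `decode_validity(&[Repetition::Optional, Repetition::Optional], &[0, 1, 2])` yields a null struct, a valid struct with null data, and a fully valid row. Even when every level equals 2, a reader still has to decode the levels and derive two validity masks, which may be part of what the optional-struct benchmarks measure.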

Are there any user-facing changes?

no

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jan 17, 2026
Contributor

@alamb alamb left a comment

Makes sense to me -- thank you @jhorstmann

I also verified that the benchmark shows non-trivial time being spent in definition levels.

I can't wait to see what else is coming

@alamb alamb merged commit 9c6065c into apache:main Jan 18, 2026
16 checks passed
Development

Successfully merging this pull request may close these issues.

[Parquet] Add benchmarks for reading struct arrays from parquet