[Variant] When possible, cast perfectly shredded children #9862

Open
AdamGS wants to merge 2 commits into apache:main from AdamGS:adamg/perfect-shredding-with-cast

@AdamGS AdamGS commented Apr 30, 2026

Which issue does this PR close?

Rationale for this change

When we can cast the perfectly shredded array directly, doing so is a significant performance boost. We do that only for a subset of types where Arrow's casting matches the existing cast behavior in parquet-variant-compute.

One caveat is that the casting behavior is still slightly different, but that seems aligned with #8982 and other work.

What changes are included in this PR?

If an array is perfectly shredded AND can be cast according to the existing cast semantics, we cast it instead of falling back to the row builder.
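
To make the two paths concrete, here is a minimal, standard-library-only sketch. It is a toy model, not the arrow-rs implementation: `ShreddedColumn`, `extract_i64`, `decode_toy_i64`, and the 8-byte little-endian fallback encoding are all hypothetical, and it assumes "perfectly shredded" means that only the typed column is present.

```rust
/// Toy stand-in for a shredded variant column (hypothetical, not an arrow-rs type).
struct ShreddedColumn {
    /// Fallback binary-encoded values; `None` means the column is absent.
    value: Option<Vec<Option<Vec<u8>>>>,
    /// Strongly typed values; always 32-bit integers in this sketch.
    typed_value: Option<Vec<Option<i32>>>,
}

impl ShreddedColumn {
    /// Assumption: "perfectly shredded" means only `typed_value` is present.
    fn is_perfectly_shredded(&self) -> bool {
        self.value.is_none() && self.typed_value.is_some()
    }
}

/// Records which path produced the result, so the choice is observable.
#[derive(Debug, PartialEq)]
enum Path {
    Vectorized,
    RowWise,
}

/// Toy decoder for the fallback bytes: exactly 8 little-endian bytes.
fn decode_toy_i64(bytes: &[u8]) -> Option<i64> {
    let arr: [u8; 8] = bytes.try_into().ok()?;
    Some(i64::from_le_bytes(arr))
}

fn extract_i64(col: &ShreddedColumn) -> (Path, Vec<Option<i64>>) {
    if col.is_perfectly_shredded() {
        // Fast path: convert the whole typed column in one pass.
        let typed = col.typed_value.as_ref().unwrap();
        let out = typed.iter().map(|v| v.map(i64::from)).collect();
        return (Path::Vectorized, out);
    }
    // Fallback: inspect each row, preferring the typed value when present.
    let len = col
        .typed_value
        .as_ref()
        .map(|t| t.len())
        .or(col.value.as_ref().map(|v| v.len()))
        .unwrap_or(0);
    let rows = (0..len)
        .map(|i| match col.typed_value.as_ref().and_then(|t| t[i]) {
            Some(v) => Some(i64::from(v)),
            None => col
                .value
                .as_ref()
                .and_then(|v| v[i].as_ref())
                .and_then(|bytes| decode_toy_i64(bytes)),
        })
        .collect();
    (Path::RowWise, rows)
}
```

In this sketch both paths produce the same values for a widening integer conversion, which is the property the gate is meant to guarantee before the vectorized cast is used.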

Are these changes tested?

In addition to the existing tests, this PR adds a test that exercises the new cast path.

Are there any user-facing changes?

None.

@AdamGS AdamGS force-pushed the adamg/perfect-shredding-with-cast branch from 2de441e to 23b785d Compare April 30, 2026 15:44
@github-actions github-actions Bot added the parquet-variant parquet-variant* crates label Apr 30, 2026
@AdamGS AdamGS changed the title from "[Variant] When possible, cast perfectly shredded children if" to "[Variant] When possible, cast perfectly shredded children when possible" Apr 30, 2026
@AdamGS AdamGS force-pushed the adamg/perfect-shredding-with-cast branch from 23b785d to e154a22 Compare April 30, 2026 15:55

AdamGS commented Apr 30, 2026

OK, it seems some tests are not fully deterministic, and I missed that.

@AdamGS AdamGS force-pushed the adamg/perfect-shredding-with-cast branch from e154a22 to fb87e3d Compare May 11, 2026 15:55
@AdamGS AdamGS changed the title from "[Variant] When possible, cast perfectly shredded children when possible" to "[Variant] When possible, cast perfectly shredded children" May 15, 2026
@AdamGS AdamGS force-pushed the adamg/perfect-shredding-with-cast branch 2 times, most recently from b6e9d4c to 11538b5 Compare May 15, 2026 15:47
AdamGS added 2 commits May 19, 2026 12:21
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/perfect-shredding-with-cast branch from 11538b5 to e01c2da Compare May 19, 2026 11:32
}

None
match from_type {
Member

This seems to overlap with can_cast_types. Can we simplify it?
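
One possible reading of this suggestion, sketched with standard-library types only: let a generic castability check decide whether a cast exists at all, and keep only the variant-specific exclusions in the gate. Everything below is hypothetical, including `ToyType`, `generic_can_cast` (a stand-in for a check such as can_cast_types), `semantics_differ`, and the particular type pairs chosen; it does not reflect the rules this PR actually implements.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Stand-in for a generic "can these types be cast at all?" check.
fn generic_can_cast(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    match (from, to) {
        // Toy rule: strings are never parsed into numbers here.
        (Utf8, Int32 | Int64 | Float64) => false,
        _ => true,
    }
}

/// Assumption: pairs where the generic cast would behave differently from
/// row-wise extraction (lossy narrowing, number formatting).
fn semantics_differ(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    matches!(
        (from, to),
        (Float64, Int32 | Int64) | (Int64, Int32) | (Int32 | Int64 | Float64, Utf8)
    )
}

/// The gate only states what is specific to variants; general castability
/// is delegated to the generic check.
fn can_use_fast_path(from: ToyType, to: ToyType) -> bool {
    generic_can_cast(from, to) && !semantics_differ(from, to)
}
```

With this shape, the hand-written `match` shrinks to the exclusion list, and any pair the generic check already rejects needs no separate arm.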


#[test]
fn test_perfect_shredding_list_cast_gate_uses_variant_element_semantics() {
let int64_item = Arc::new(Field::new("item", DataType::Int64, true));
Member

Do we need to add more tests to cover the logic of can_use_perfect_shredding_arrow_cast here?

variant_array: &VariantArray,
as_field: &Field,
cast_options: &CastOptions,
) -> Result<Option<ArrayRef>> {
Member

Why do we need the Result in the return value? Is it OK to use Option<ArrayRef> here?


// Use Arrow's vectorized cast when it cleanly matches the shredded representation. If not,
// fall back to row-wise extraction to preserve the existing variant-specific semantics.
Ok(cast_with_options(target_array.as_ref(), as_field.data_type(), cast_options).ok())
Member

Does this mean that cast_with_options returned an error whenever the return value here is None? Do we need to add tests to cover this?
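
For reference, `Result::ok()` maps `Err(_)` to `None`, so an expression of the shape `Ok(cast(..).ok())` never produces an `Err` itself. The standard-library-only sketch below illustrates that shape and the kind of test that would cover the `None` branch. All names are hypothetical, `strict_cast_to_i32` is only a stand-in for a fallible vectorized cast, and the fallback semantics (out-of-range values become `None`) are an assumption made for illustration.

```rust
/// Stand-in for a fallible vectorized cast (not Arrow's `cast_with_options`).
fn strict_cast_to_i32(values: &[i64]) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| i32::try_from(v).map_err(|_| format!("{v} does not fit in i32")))
        .collect()
}

/// Same shape as `Ok(cast(..).ok())`: a cast error becomes `Ok(None)`,
/// so this line alone never yields `Err`.
fn try_fast_path(values: &[i64]) -> Result<Option<Vec<i32>>, String> {
    Ok(strict_cast_to_i32(values).ok())
}

/// Toy row-wise fallback: values that do not fit become `None`.
fn row_wise(values: &[i64]) -> Vec<Option<i32>> {
    values.iter().map(|&v| i32::try_from(v).ok()).collect()
}

/// Caller: use the fast result when available, otherwise fall back.
fn cast_column(values: &[i64]) -> Result<Vec<Option<i32>>, String> {
    match try_fast_path(values)? {
        Some(fast) => Ok(fast.into_iter().map(Some).collect()),
        None => Ok(row_wise(values)),
    }
}
```

Here `try_fast_path(&[1, i64::MAX])` is `Ok(None)` and `cast_column` falls back to the row-wise result. If no earlier statement in the real function propagates an error with `?`, the outer `Result` carries no information and a plain `Option` would express the same contract; the diff context shown here is not enough to tell.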
