
[Variant] Preserve UUID extension type metadata for Parquet writer#10015

Draft · sdf-jkl wants to merge 1 commit into apache:main from sdf-jkl:shred-preserve-uuid


@sdf-jkl (Contributor) commented May 24, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

Shredding into FixedSizeBinary(16) means shredding into the Parquet UUID logical type. However, shred_variant currently does not preserve the extension type metadata on the typed_value field.

UUID is the only valid Variant shredding type that requires an Arrow extension type (see https://github.com/apache/parquet-format/blob/master/VariantShredding.md).

Earlier, in #, @scovich mentioned:

Yeah, as long as shred_variant only takes a DataType instead of a Field, we are forced to assume 16-byte fixed binary is UUID. If it accepted a Field, we should additionally require the UUID extension type. Otherwise, we potentially run into problems because Decimal128 can also use 16-byte fixed binary!

This argues for making the as_type parameter of shred_variant a Field instead of a DataType. That should not be necessary: Arrow already has a dedicated Decimal128 data type (Decimal128Type) for the Parquet DECIMAL logical type, so FixedSizeBinary(16) never has to represent a decimal and can represent UUID without ambiguity. Switching as_type to Field is therefore unnecessary.
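To illustrate why a DataType alone is enough, here is a minimal sketch. It does not use arrow-rs: DataType, ParquetLogicalType, and logical_type_for are hypothetical, simplified stand-ins defined only for this example. Because decimals arrive as their own Decimal128 variant, the match can treat FixedSizeBinary(16) as UUID with no extra information from a Field.

```rust
/// Hypothetical, simplified stand-in for arrow_schema::DataType
/// (only the variants needed for this illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    Decimal128(u8, i8),
    FixedSizeBinary(i32),
}

/// Hypothetical subset of the Parquet logical types a shredded
/// typed_value column can carry.
#[derive(Debug, Clone, PartialEq)]
pub enum ParquetLogicalType {
    Integer { bit_width: u8, signed: bool },
    String,
    Decimal { precision: u8, scale: i8 },
    Uuid,
}

/// Pick the Parquet logical type for a shredding target using only its DataType.
/// Decimals are expressed as Decimal128, so FixedSizeBinary(16) never has to
/// stand in for a decimal and can be mapped to UUID directly.
pub fn logical_type_for(data_type: &DataType) -> Option<ParquetLogicalType> {
    match data_type {
        DataType::Int64 => Some(ParquetLogicalType::Integer { bit_width: 64, signed: true }),
        DataType::Utf8 => Some(ParquetLogicalType::String),
        DataType::Decimal128(precision, scale) => Some(ParquetLogicalType::Decimal {
            precision: *precision,
            scale: *scale,
        }),
        DataType::FixedSizeBinary(16) => Some(ParquetLogicalType::Uuid),
        // Other fixed widths are not treated as shredding targets in this sketch.
        DataType::FixedSizeBinary(_) => None,
    }
}
```

For example, logical_type_for(&DataType::Decimal128(38, 10)) yields a Decimal logical type, while logical_type_for(&DataType::FixedSizeBinary(16)) yields Uuid. The two never collide on the Arrow side, even though both can use a 16-byte fixed-length physical type in Parquet.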

What changes are included in this PR?

  • VariantArray::from_parts and ShreddedVariantFieldArray::from_parts now add UUID extension type metadata to the typed_value field when its DataType is FixedSizeBinary(16).
  • Re-enabled (uncommented) the UUID extension type metadata validation in a unit test.
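The rule in the first bullet can be sketched as follows. This is not the PR's code: Field, DataType, and typed_value_field are hypothetical, simplified stand-ins for the arrow-rs types and the from_parts logic. The metadata key ARROW:extension:name and the name arrow.uuid come from the Arrow format's extension-type convention and its canonical UUID extension type.

```rust
use std::collections::HashMap;

/// Field-metadata key the Arrow format uses to name an extension type.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
/// Name of Arrow's canonical UUID extension type (storage: FixedSizeBinary(16)).
pub const UUID_EXTENSION_NAME: &str = "arrow.uuid";

/// Hypothetical, simplified stand-in for arrow_schema::DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    FixedSizeBinary(i32),
}

/// Hypothetical, simplified stand-in for arrow_schema::Field.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical helper: build the nullable typed_value field and, when its
/// type is FixedSizeBinary(16), attach the UUID extension name so that a
/// downstream Parquet writer sees a UUID rather than opaque bytes.
pub fn typed_value_field(data_type: DataType) -> Field {
    let mut metadata = HashMap::new();
    if data_type == DataType::FixedSizeBinary(16) {
        metadata.insert(EXTENSION_NAME_KEY.to_string(), UUID_EXTENSION_NAME.to_string());
    }
    Field {
        name: "typed_value".to_string(),
        data_type,
        nullable: true,
        metadata,
    }
}
```

Calling typed_value_field(DataType::FixedSizeBinary(16)) produces a field whose metadata maps ARROW:extension:name to arrow.uuid, while any other type leaves the metadata empty. arrow-rs also has its own extension-type support that can produce this metadata; the sketch only shows the resulting key/value pair.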

Are these changes tested?

  • Yes, by a unit test.

Are there any user-facing changes?

  • Shredded UUID typed value fields now preserve UUID extension type metadata.

github-actions bot added the parquet-variant (parquet-variant* crates) label May 24, 2026
sdf-jkl marked this pull request as draft May 24, 2026 21:47