feat: add variant type support (Iceberg v3) - non shredding#932
feat: add variant type support (Iceberg v3) - non shredding#932nssalian wants to merge 5 commits intoapache:mainfrom
Conversation
laskoviymishka
left a comment
There was a problem hiding this comment.
Thanks! Solid contribution: pattern matches #594 and #605 cleanly, scope aligns with #929 (non-shredding only), integration test round-trips 5 variant shapes.
Mergeable after three small things:
- Fix unknownTypeValidator — error messages say "unknown type field" for Variant. Split the validator or generalize.
- Verify against the v3 spec whether Variant must be optional. Unknown must (means "no value"). Variant holds real data — PyIceberg allows required. Confirm and link the spec section.
- boundRef[[]byte] for Variant needs a comment explaining the invariant: it only works because ordered/equality predicates reject Variant upstream in createBoundLiteralPredicate / createBoundSetPredicate, so eval() is never called with a real variant value.
LGTM with those.
|
Thanks for taking the time to review @laskoviymishka. I'll wait for @zeroshade to have a look and then address the comments together. |
zeroshade
left a comment
There was a problem hiding this comment.
Thanks for this, in general this looks good. Just a bunch of nitpicks
|
Addressed all comments. Variant is no longer PrimitiveType. I added Variant() on all visitor interfaces. Replaced []byte with dedicated boundVariantRef/boundVariantUnaryPred. Split the validator so variant can be required per spec. colMapping skip now checks parent is actually a variant field. Extracted projectVariant. Added tests for unary predicates, nested types (struct/list/map), multi-variant schemas, and projection exclusion. |
There was a problem hiding this comment.
All three comments look addressed now, thanks!
The Variant/Unknown default error is clearer, the boundRef[[]byte] workaround is gone in favor of dedicated Variant refs, and the Parquet stats skip is now limited to Variant sub-columns with a spec comment.
LGTM from my side, i would like to get @zeroshade review before merging on this one.
Only tiny optional nit: maybe add a short spec link/comment for why required Variant is allowed, but I don’t think that should block this.
Part of #589 and #929
Changes
Add first-class support for the Iceberg v3 variant semi-structured type. The binary encoding is handled by Arrow-Go v18.5.2 (already in go.mod) via parquet/variant and arrow/extensions.VariantType - no new dependencies.
SchemaVisitorPerPrimitiveType[T]gainsVisitVariant() T(same pattern asVisitTimestampNs/VisitUnknownadditions in #594, #605).Non-shredded path only; shredding is a follow-up.
Testing
yielded
and reading
yielded