feat(table): variant shredded writer #987

@laskoviymishka

Description

Parent: #589

Once the shredded reader (#986) lands, the writer needs to decide, per row, what to shred, and emit the shredded STRUCT alongside the metadata / residual value. Java's posture is no shredding unless explicitly configured; mirror that with a new table property write.variant.shredding-paths carrying a list of $.path.expressions. This is the same property name as Java, parsed by the same property layer that already handles write.metadata.compression-codec etc.

Add table/internal/variant_shredder.go with ShredVariant(value variant.Value, schema ShreddingSchema) (typed any, residual []byte, err error), then hook it into the parquet writer in table/internal/parquet_files.go so shredded columns appear in the parquet schema and are populated per row. Statistics on shredded typed columns piggyback on the existing column-stats path: typed columns get min/max for free, which is the substrate the follow-up stats issue builds on.

Cross-client coverage: write a shredded variant via iceberg-go, read it via Java/pyiceberg, and assert equality with the source. A round trip through the iceberg-go reader is also required. Spec: Parquet Variant shredding.
