Add _file and _pos metadata columns to Avro readers #1019

@rdblue

Position-based delete files encode row deletes as a file and ordinal row position within that file. To write position-based delete files for Avro, Iceberg should produce the file and position for a given row as metadata columns.

The file metadata column is a constant and can reuse the id-to-constant map that sets identity-partition values.

The position metadata column will need to keep a counter from the first row in the file (starting at 0). When reading an Avro split that doesn't start at offset 0, the Avro reader will need to scan through Avro blocks from the start of the file. Each Avro block begins with a long count of objects in the block and a long size of the compressed block in bytes. Reconstructing the position of a row will require reading the header of each block before the start of a split: the object counts are summed to get the position of the split's first row, and the byte sizes are used to skip from one block header to the next.
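
The block scan could look roughly like the following. This is a minimal sketch, not Iceberg or Avro library code: `read_long`, `rows_before`, `write_long`, and `make_block` are hypothetical names, and the caller is assumed to already know the byte offset of the first block (that is, the file header has been parsed elsewhere) and the offset of the first block that belongs to the split. It relies only on the Avro container layout: each block is a zigzag varint object count, a zigzag varint byte size, the serialized objects, and a 16-byte sync marker.

```python
import io

SYNC_SIZE = 16  # Avro sync markers are 16 bytes long


def read_long(stream):
    """Decode one Avro long (zigzag, variable-length) from the stream."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of stream while reading a long")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    # Undo zigzag encoding.
    return (raw >> 1) ^ -(raw & 1)


def rows_before(stream, first_block_offset, target_offset):
    """Sum the object counts of every block that starts before target_offset.

    first_block_offset: assumed offset of the first block after the file header.
    target_offset: assumed offset of the first block that belongs to the split.
    """
    stream.seek(first_block_offset)
    position = 0
    while stream.tell() < target_offset:
        row_count = read_long(stream)
        block_size = read_long(stream)
        # Skip the serialized objects and the sync marker without decoding them.
        stream.seek(block_size + SYNC_SIZE, io.SEEK_CUR)
        position += row_count
    return position


def write_long(value):
    """Encode a long as a zigzag varint (helper for building synthetic data)."""
    raw = (value << 1) ^ (value >> 63)
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


def make_block(row_count, payload):
    """Build a synthetic block: count, size, payload, placeholder sync marker."""
    return (write_long(row_count) + write_long(len(payload))
            + payload + b"\x00" * SYNC_SIZE)
```

The value returned by `rows_before` would be the starting value of the position counter for the split, which the reader then increments for each row it produces. Because the scan only reads two longs per block and seeks past the payload, it never decompresses or decodes the skipped rows.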
