Description
Position-based delete files encode each deleted row as a data file path plus the row's ordinal position within that file. To write position-based delete files for Avro, Iceberg should produce the file and position for a given row as metadata columns.
The file metadata column is a constant and can reuse the id-to-constant map that sets identity-partition values.
The position metadata column will need to keep a counter from the first row in the file (starting at 0). When reading an Avro split that doesn't start at offset 0, the Avro reader will need to scan through Avro blocks from the start of the file. Each Avro block header contains a long count of objects in the block and a long size of the serialized (possibly compressed) block data in bytes. Reconstructing the position of a row will require reading each block header before the start of the split: the object counts are summed to get the position of the split's first row, and the byte sizes are used to skip over block data without decoding it.
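
As a rough illustration of the block-header scan described above, here is a minimal Python sketch that works directly on the Avro object container file layout (magic bytes, metadata map, 16-byte sync marker, then blocks of `count`, `size`, data, sync marker). It is not Iceberg's or Avro's reader code. The function names `read_long`, `skip_header`, and `rows_before` are hypothetical, and the sketch assumes the caller already knows the byte offset of the first block that the split will read; how a split's start offset is mapped to that block boundary is left out.

```python
import io

MAGIC = b"Obj\x01"   # Avro object container file magic
SYNC_SIZE = 16       # length of the sync marker after the header and each block


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of stream while reading a long")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def skip_header(stream):
    """Advance past the magic, the file metadata map, and the header sync marker."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")
    while True:
        count = read_long(stream)
        if count == 0:
            break  # end of the metadata map
        if count < 0:
            read_long(stream)  # byte size of this map block; not needed here
            count = -count
        for _ in range(count):
            stream.seek(read_long(stream), io.SEEK_CUR)  # skip key (string)
            stream.seek(read_long(stream), io.SEEK_CUR)  # skip value (bytes)
    stream.seek(SYNC_SIZE, io.SEEK_CUR)


def rows_before(stream, first_block_offset):
    """Sum the object counts of all blocks that start before first_block_offset.

    Assumption: first_block_offset is the offset of a block header, i.e. the
    first block the split will actually read.
    """
    skip_header(stream)
    rows = 0
    while stream.tell() < first_block_offset:
        row_count = read_long(stream)   # objects in this block
        byte_size = read_long(stream)   # size of the serialized block data
        stream.seek(byte_size + SYNC_SIZE, io.SEEK_CUR)  # skip data and sync
        rows += row_count
    return rows
```

A reader could seed its position counter with the value returned by `rows_before` and then increment it once per row it returns. Only block headers are read, so the cost grows with the number of blocks before the split, not with the number of rows.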