Description
Is your feature request related to a problem or challenge?
Hi team, this is half a question about Puffin / deletion vector progress and half a feature request for manifest support.
As stated in the spec:
Delete manifests track deletion vectors individually by the containing file location (file_path), starting offset of the DV blob (content_offset), and total length of the blob (content_size_in_bytes). Multiple deletion vectors can be stored in the same file. There are no restrictions on the data files that can be referenced by deletion vectors in the same Puffin file.
My understanding is that, apart from tracking data files, the manifest file also contains records for Puffin files. For example:
{
  "snapshot_id": 4439194908709239593,
  "sequence_number": null,
  "file_sequence_number": null,
  "data_file": {
    "content": 0,
    "file_path": "file:///tmp/iceberg-test/default/test_table/data/iceberg-data-00000.parquet",
    "file_format": "PARQUET",
    ...,
  },
  "puffin_file": {
    "file_path": "file:///tmp/dir/puffin.bin",
    "file_format": "PUFFIN",
    "content": DELETION_VECTOR_TYPE,
    "content_offset": ...,
    "content_size_in_bytes": ...,
  }
}
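To make the question concrete, here is a minimal sketch of how I would expect a reader to use the three fields quoted from the spec (file_path, content_offset, content_size_in_bytes) to pull one deletion vector blob out of a file that holds several. The `DeletionVectorRef` class, `read_dv_blob` helper, and the byte payloads are all hypothetical names and placeholder data made up for illustration; nothing here is the library's API or the real Puffin / deletion vector encoding.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeletionVectorRef:
    """Hypothetical stand-in for the three manifest fields quoted from the spec."""
    file_path: str              # location of the containing file
    content_offset: int         # starting offset of the DV blob
    content_size_in_bytes: int  # total length of the DV blob


def read_dv_blob(ref: DeletionVectorRef) -> bytes:
    """Return the raw bytes of one blob; no decoding is attempted."""
    with open(ref.file_path, "rb") as f:
        f.seek(ref.content_offset)
        blob = f.read(ref.content_size_in_bytes)
    if len(blob) != ref.content_size_in_bytes:
        raise ValueError("blob range extends past the end of the file")
    return blob


def demo() -> Tuple[bytes, bytes]:
    # Placeholder payloads only; this is NOT a valid Puffin file.
    blob_a, blob_b = b"dv-for-file-a", b"dv-for-file-b!!"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "puffin.bin")
        with open(path, "wb") as f:
            f.write(b"HEAD")  # arbitrary leading bytes before the blobs
            f.write(blob_a)
            f.write(blob_b)
        # Two references into the same file, one per blob.
        ref_a = DeletionVectorRef(path, 4, len(blob_a))
        ref_b = DeletionVectorRef(path, 4 + len(blob_a), len(blob_b))
        return read_dv_blob(ref_a), read_dv_blob(ref_b)
```

Calling `demo()` returns the two payloads independently, which is the property I am asking about: each manifest record only needs its own offset and length, regardless of what else lives in the same file.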
I'm aware there's an epic tracking Puffin progress, but I don't see any changes on the manifest side in the PRs.
I'm curious: am I misunderstanding the spec, is this already implemented and I'm just not aware of it, or are there plans to implement it in the future?
Thank you!
Describe the solution you'd like
No response
Willingness to contribute
None