Feature request: manifest file can track deletion vector #1272

@dentiny

Is your feature request related to a problem or challenge?

Hi team, this issue is half a question about Puffin / deletion vector progress and half a feature request for manifest support.

As stated in the spec:

Delete manifests track deletion vectors individually by the containing file location (file_path), starting offset of the DV blob (content_offset), and total length of the blob (content_size_in_bytes). Multiple deletion vectors can be stored in the same file. There are no restrictions on the data files that can be referenced by deletion vectors in the same Puffin file.
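To make the quoted passage concrete, here is a minimal sketch of how the three fields could locate one blob inside a shared file. It assumes the blob is simply the byte range `[content_offset, content_offset + content_size_in_bytes)` of the file at `file_path`. The names `DeletionVectorRef`, `read_dv_blob`, and `demo` are hypothetical and not taken from any Iceberg library, and the stand-in file below is not a valid Puffin file (no magic bytes or footer).

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionVectorRef:
    """Hypothetical record holding the three fields named in the spec quote."""
    file_path: str              # location of the containing file
    content_offset: int         # starting offset of the DV blob
    content_size_in_bytes: int  # total length of the DV blob


def read_dv_blob(ref: DeletionVectorRef) -> bytes:
    """Return the raw bytes of one blob; decoding the vector is out of scope."""
    with open(ref.file_path, "rb") as f:
        f.seek(ref.content_offset)
        blob = f.read(ref.content_size_in_bytes)
    if len(blob) != ref.content_size_in_bytes:
        raise ValueError("blob is shorter than content_size_in_bytes")
    return blob


def demo() -> tuple[bytes, bytes]:
    # Stand-in for a Puffin file: two placeholder blobs stored back to back,
    # illustrating "multiple deletion vectors can be stored in the same file".
    blob_a, blob_b = b"dv-for-file-a", b"dv-for-file-b!"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "puffin.bin")
        with open(path, "wb") as f:
            f.write(blob_a + blob_b)
        ref_a = DeletionVectorRef(path, 0, len(blob_a))
        ref_b = DeletionVectorRef(path, len(blob_a), len(blob_b))
        return read_dv_blob(ref_a), read_dv_blob(ref_b)
```

Calling `demo()` reads both placeholder blobs back from the same file using only their offset and length, which is the lookup the manifest entry would need to support.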

My understanding is that the manifest file, apart from tracking data files, also contains records for Puffin files. For example:

{
  "snapshot_id": 4439194908709239593,
  "sequence_number": null,
  "file_sequence_number": null,
  "data_file": {
    "content": 0,
    "file_path": "file:///tmp/iceberg-test/default/test_table/data/iceberg-data-00000.parquet",
    "file_format": "PARQUET",
    ...,
  },
  "puffin_file": {
    "file_path": "file:///tmp/dir/puffin.bin",
    "file_format": "PUFFIN",
    "content": DELETION_VECTOR_TYPE,
    "content_offset": ...,
    "content_size_in_bytes": ...,
  }
}

I'm aware there's an epic tracking Puffin progress, but I don't see any changes on the manifest side in its PRs.

I'm curious: am I misunderstanding the spec, is this already implemented and I'm just not aware of it, or are there plans to implement it in the future?

Thank you!

Describe the solution you'd like

No response

Willingness to contribute

None

Labels: enhancement
