[epic] address manifest reader feature gaps between rust and python implementations #1714

@kevinjqliu

Description

What's the feature you are trying to implement?

See apache/iceberg-python#2004 for the integration, in which pyiceberg uses the rust-based manifest reader.

Here's the error log from make integration, grouped by error type:
https://gist.github.com/kevinjqliu/db6352f0b6d0ab8a717af67a1b71355e

  • Convert raw literal (bytes) to binary type
    • "pyo3_runtime.PanicException: called Result::unwrap() on an Err value: DataInvalid => Unable to convert raw literal (bytes) fail convert to type binary for: todo: rust avro doesn't support deserialize any bytes representation now"
  • Convert raw literal (bytes) to decimal(5,2) type
    • "pyo3_runtime.PanicException: called Result::unwrap() on an Err value: DataInvalid => Unable to convert raw literal (bytes) fail convert to type decimal(5,2) for: todo: rust avro doesn't support deserialize any bytes representation now"
  • partition field with special string characters, e.g. special#string+field
  • partition field with uuid
  • V3 manifests
    • Fail to parse format version in manifest metadata
  • files metadata table lower_bounds
    • tests/integration/test_inspect_table.py::test_inspect_files[2] - AssertionError: Difference in column lower_bounds: {} != {2147483546: b's3://warehouse/default/table_metadata_files/data/00000-0-f5c93fd4-42af-481f-bcc0-140fad66f25a.parquet', 2147483545: b'\x00\x00\x00\x00\x00\x00\x00\x00'}
  • manifest file content after merge
    • tests/integration/test_writes/test_writes.py::test_merge_manifests_file_content[2] - AssertionError: assert [(2, 78), (4,...(8, 118), ...] == [(1, 49), (2,... (6, 94), ...]
  • equality_ids can be optional (fixed by refactor: Move equality-ids closer to the spec #1705)
  • uuid support (fixed by Add UUID support for the Avro schema #1706)
  • enable zstd (fixed by feat: Enable zstd #1692)
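
For reference, the raw byte literals in the errors above can be decoded by hand. The sketch below assumes the single-value binary encoding described in the Iceberg spec (decimal as a two's-complement big-endian unscaled value, long as 8 bytes little-endian, string as UTF-8). The helper names are hypothetical and are not part of pyiceberg or iceberg-rust.

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a decimal bound, assuming a two's-complement big-endian unscaled value."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits, e.g. 12345 -> 123.45 for scale 2.
    return Decimal(unscaled).scaleb(-scale)


def decode_long(raw: bytes) -> int:
    """Decode a long bound, assuming 8 bytes in little-endian order."""
    return int.from_bytes(raw, byteorder="little", signed=True)


def decode_string(raw: bytes) -> str:
    """Decode a string bound, assuming UTF-8 bytes with no length prefix."""
    return raw.decode("utf-8")


# A decimal(5,2) value stored as the unscaled integer 12345 (hypothetical sample bytes).
print(decode_decimal(b"\x30\x39", scale=2))

# The eight zero bytes from the expected lower_bounds in the test_inspect_files failure.
print(decode_long(b"\x00\x00\x00\x00\x00\x00\x00\x00"))
```

Under these assumptions, the eight zero bytes in the expected lower_bounds decode to 0, and the other expected bound is the UTF-8 encoding of the data file path. The rust reader returns an empty map for that column.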

Willingness to contribute

None
