Fail to read V1 Iceberg Spec #1996

@yingying-chen-cko

Description

Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

I have an Iceberg V1 table and I am trying to load the table using pyiceberg.table.StaticTable:

from pyiceberg.table import StaticTable

t = StaticTable.from_metadata(
    "gs://<project_id>/<table-name>/metadata/v1.metadata.json"
)
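To double-check that the table really is V1 independently of pyiceberg, the `format-version` field of the metadata JSON can be read directly. This is a minimal sketch using only the standard library; the helper name `iceberg_format_version` and the trimmed sample document are hypothetical and only for illustration.

```python
import json


def iceberg_format_version(metadata_json: str) -> int:
    """Return the format version declared in an Iceberg table metadata document."""
    metadata = json.loads(metadata_json)
    # `format-version` is the field name used in the table metadata JSON.
    return int(metadata["format-version"])


# Hypothetical, heavily trimmed metadata document (not from my real table).
sample = '{"format-version": 1, "location": "gs://example-bucket/example-table"}'
print(iceberg_format_version(sample))
```

In practice the string passed in would be the contents of the `v1.metadata.json` file referenced above.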

When I run t.inspect.manifests(), it gives the following error:

ResolveError: 504: added_files_count: required int is non-optional, and not part of the file schema

I believe this is because pyiceberg.manifest.DEFAULT_READ_VERSION is set to 2 while my table is V1. So I patched it with manifest.DEFAULT_READ_VERSION = 1, which gives me another error:

AttributeError: 'pyiceberg.manifest.ManifestFile' object has no attribute 'content'

I managed to resolve this error temporarily by adding the content attribute to pyiceberg.manifest.MANIFEST_LIST_FILE_SCHEMAS[1]. More errors then surface as I work through them one by one:

For pyiceberg.manifest.MANIFEST_ENTRY_SCHEMAS[1]:

AttributeError: 'pyiceberg.manifest.ManifestEntry' object has no attribute 'sequence_number'

When running t.inspect.files(), there is an error originating from pyiceberg.manifest.DATA_FILE_TYPE[1]:

AttributeError: 'pyiceberg.manifest.DataFile' object has no attribute 'content'

I am able to load the table after adding all of the missing attributes above. Is there a supported way to read a V1 table, or is this a bug in how V1 tables are loaded?
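For context, here is a rough sketch of the behaviour I would expect from a reader: fields that only exist in V2 get a default value when a V1 record lacks them. This is plain Python, not pyiceberg code. The names `V2_ONLY_DEFAULTS` and `fill_v1_defaults` are hypothetical, and the default values (0 for content, meaning data, and 0 for sequence numbers) are my reading of the "reading v1 metadata for v2" notes in the Iceberg spec, so please treat them as an assumption.

```python
from typing import Any, Dict

# Assumed defaults for V2-only fields when the underlying file is V1.
# Keys are hypothetical record kinds used only in this sketch.
V2_ONLY_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "manifest_file": {"content": 0, "sequence_number": 0, "min_sequence_number": 0},
    "manifest_entry": {"sequence_number": 0},
    "data_file": {"content": 0},
}


def fill_v1_defaults(kind: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with missing V2-only fields set to defaults."""
    filled = dict(record)
    for name, default in V2_ONLY_DEFAULTS[kind].items():
        # Only fill fields that are absent; values already present are kept.
        filled.setdefault(name, default)
    return filled


# Hypothetical V1 manifest-list record without any V2-only fields.
v1_manifest_file = {"manifest_path": "example.avro", "partition_spec_id": 0}
print(fill_v1_defaults("manifest_file", v1_manifest_file))
```

With something along these lines, accessing content or sequence_number on a record read from a V1 file would return a default instead of raising AttributeError.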

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
