Error creating table from pyarrow schema with pa.uuid() #1986

Open
@simw

Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

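Preamble: using a local SQLite-backed catalog:

import os

from pyiceberg.catalog import load_catalog

warehouse_path = "data/warehouse"
# The SQLite file lives inside the warehouse directory, so it must exist first.
os.makedirs(warehouse_path, exist_ok=True)

catalog = load_catalog(
    "default",
    **{
        "type": "sql",
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)
# The tables below are created in the "default" namespace.
catalog.create_namespace_if_not_exists("default")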

A pyiceberg UUID column works fine:

from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, UUIDType

schema = Schema(
    NestedField(field_id=1, name="uuid", field_type=UUIDType(), required=False),
)

catalog.create_table("default.test2", schema=schema)

But a pyarrow UUID column gives an error:

import pyarrow as pa

schema = pa.schema([pa.field("foo", pa.uuid(), nullable=True)])

catalog.create_table("default.test4", schema=schema)

The exception is:

File ~/Code/Projects/others/icebergs/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py:1032, in _(obj, visitor)
   1030     result = visit_pyarrow(field_type, visitor)
   1031 except TypeError as e:
-> 1032     raise UnsupportedPyArrowTypeException(obj, f"Column '{obj.name}' has an unsupported type: {field_type}") from e
   1033 visitor.after_field(obj)
   1035 return visitor.field(obj, result)

UnsupportedPyArrowTypeException: Column 'foo' has an unsupported type: extension<arrow.uuid>

Related to simw/pydantic-to-pyarrow#27

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
