as_arrow() fails on struct with required ListType #885

@raphaelauv

Description

Apache Iceberg version

0.6.1 (latest release)

Please describe the bug 🐞

from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StructType, DoubleType, ListType, StringType
import polars as pl

data = [{"score": 1.2, "name": "hello"}, {"score": 1.4, "name": "hello"}]
iceberg_schema = Schema(
    NestedField(
        1,
        "a",
        ListType(
            element_id=2,
            element=StructType(
                NestedField(3, "score", DoubleType(), required=True),
                NestedField(4, ",name", StringType(), required=True),
            ),
            element_required=True,
        ),
        required=True,
    ),
)

df = pl.DataFrame({}).with_columns([pl.lit(data).alias("a")])
df = df.to_arrow()

rst = df.cast(target_schema=iceberg_schema.as_arrow())

gives:

    rst = df.cast(target_schema=iceberg_schema.as_arrow())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4457, in pyarrow.lib.Table.cast
  File "pyarrow/table.pxi", line 574, in pyarrow.lib.ChunkedArray.cast
  File "/XXXXX/venv/lib/python3.11/site-packages/pyarrow/compute.py", line 404, in cast
    return call_function("cast", [arr], options, memory_pool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_compute.pyx", line 590, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 385, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: cannot cast nullable field to non-nullable field: struct<score: double, name: large_string> struct<score: double not null, ,name: large_string not null>
