Open
Description
Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Hi, thanks for writing pyiceberg.
The bug is pretty much described in the title: `table.scan(row_filter="x IN (0, 1)")` does not include the rows where `x=0` when `x` is a `DoubleType` and a partition column.
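For reference, here is what an identity-partitioned `x IN (...)` filter is expected to keep. This is a minimal pure-Python model, not pyiceberg's implementation. The helper `prune_partitions` is hypothetical, and the model assumes that the integer literals from the filter string are converted to floats when they are bound to the `DoubleType` column.

```python
from typing import Iterable, List, Union


def prune_partitions(
    partition_values: Iterable[float], literals: Iterable[Union[int, float]]
) -> List[float]:
    """Return the identity-partition values that `x IN (literals)` should keep.

    Hypothetical reference model, not pyiceberg code. Each literal is promoted
    to float (an assumption about how integer literals bind to a DoubleType).
    """
    wanted = {float(lit) for lit in literals}
    # A partition survives iff its value equals one of the promoted literals;
    # 0.0 must be kept exactly like any other matching value.
    return [value for value in partition_values if value in wanted]


partitions = [0.0, 1.0, 2.0]
print(prune_partitions(partitions, [0, 1, 2]))  # [0.0, 1.0, 2.0]
print(prune_partitions(partitions, [0]))        # [0.0]
```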
Here is a reproducer:
```shell
pip install "pyiceberg[sql-sqlite,pyarrow]"
```

```python
from tempfile import TemporaryDirectory

import pyarrow
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import DoubleType, NestedField

schema = Schema(
    NestedField(field_id=1, name="x", field_type=DoubleType()),
    NestedField(field_id=2, name="y", field_type=DoubleType()),
)
# Identity partition on the double column `x`.
partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1001, transform=IdentityTransform(), name="x")
)

with TemporaryDirectory() as tmpdir:
    # Throwaway SQLite-backed catalog with a local warehouse.
    catalog = SqlCatalog(
        "local",
        uri=f"sqlite:///{tmpdir}/catalog.db",
        warehouse=f"file://{tmpdir}/warehouse",
    )
    catalog.create_namespace("test")
    table = catalog.create_table(
        "test.test", schema=schema, partition_spec=partition_spec
    )

    # One row per partition value: x = 0.0, 1.0, 2.0.
    data = pyarrow.table(
        {
            "x": [0.0, 1.0, 2.0],
            "y": [0.0, 0.0, 0.0],
        }
    )
    table.overwrite(data)

    print("=== no filter ===")
    print(table.scan().to_arrow())
    print("=== x IN (0) ===")
    print(table.scan(row_filter="x IN (0)").to_arrow())
    print("=== x IN (0, 1, 2) ===")
    print(table.scan(row_filter="x IN (0, 1, 2)").to_arrow())
```
Output:
```
/tmp/tmp.l2MLQFjC7C-05duO9h5/lib/python3.13/site-packages/pyiceberg/table/__init__.py:686: UserWarning: Delete operation did not match any records
  warnings.warn("Delete operation did not match any records")
=== no filter ===
pyarrow.Table
x: double
y: double
----
x: [[0],[1],[2]]
y: [[0],[0],[0]]
=== x IN (0) ===
pyarrow.Table
x: double
y: double
----
x: [[0]]
y: [[0]]
=== x IN (0, 1, 2) ===
pyarrow.Table
x: double
y: double
----
x: [[1],[2]]
y: [[0],[0]]
```
I expect the output for `x IN (0, 1, 2)` to match that of the unfiltered scan.
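That expectation can be phrased as a check: apply the `IN` predicate client-side to the unfiltered rows and diff the result against what the filtered scan returned. Below is a minimal stdlib sketch. `missing_rows` is a hypothetical helper, and rows are modeled as plain dicts rather than pyarrow tables.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple, Union

Row = Dict[str, float]


def missing_rows(
    all_rows: Iterable[Row],
    filtered_rows: Iterable[Row],
    column: str,
    values: Iterable[Union[int, float]],
) -> List[Tuple[Tuple[str, float], ...]]:
    """Rows that match `column IN (values)` in the unfiltered scan but are
    absent from the filtered scan. Hypothetical helper; rows are plain dicts.
    """
    wanted = {float(v) for v in values}
    # Multiset of rows the filter should return, computed client-side.
    expected = Counter(
        tuple(sorted(row.items())) for row in all_rows if row[column] in wanted
    )
    # Multiset of rows the scan actually returned.
    actual = Counter(tuple(sorted(row.items())) for row in filtered_rows)
    # Counter subtraction keeps only positive counts, i.e. the missing rows.
    return list((expected - actual).elements())


# Rows transcribed from the reported output.
unfiltered = [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 0.0}, {"x": 2.0, "y": 0.0}]
filtered = [{"x": 1.0, "y": 0.0}, {"x": 2.0, "y": 0.0}]
print(missing_rows(unfiltered, filtered, "x", [0, 1, 2]))
# [(('x', 0.0), ('y', 0.0))]
```

Using a `Counter` instead of a set keeps duplicate rows distinct, so a scan that drops one of two identical rows would still be flagged.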
Note that I could not reproduce this when `x` is a `LongType` instead of a `DoubleType`.
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time