Open
Description
Apache Iceberg version
None
Please describe the bug 🐞
Description
Any operation that rewrites data (delete, overwrite, upsert) is extremely slow or unresponsive, even on very small tables. Appending rows works as expected.
Steps to Reproduce
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType

# Assumption: a REST catalog is configured under the name "default"
# (e.g. in ~/.pyiceberg.yaml); the original report does not show this step.
catalog = load_catalog("default")

# Create table
schema = Schema(
    NestedField(field_id=1, name="load_id", field_type=StringType(), required=True),
    NestedField(field_id=2, name="status", field_type=StringType(), required=True),
    identifier_field_ids=[1],
)
catalog.create_table(identifier="test.test_load", schema=schema)
tbl = catalog.load_table("test.test_load")

# Append two rows (completes quickly)
df = pa.Table.from_pylist(
    [
        {"load_id": "123.123", "status": "started"},
        {"load_id": "456.456", "status": "done"},
    ],
    schema=tbl.schema().as_arrow(),
)
tbl.append(df)

# Delete with dot in string key → hangs indefinitely
tbl.delete(delete_filter="load_id == '123.123'")
Observed Behavior
- append: completes quickly.
- delete with a string key containing a dot (e.g., '123.123'): runs indefinitely, or takes 10+ minutes without returning a response.
- overwrite and upsert with a string key containing a dot: same behavior; the operation never completes or is extremely slow.
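To put numbers on the behavior above, a small timing harness can help. This is only a sketch using the standard library, not part of PyIceberg: `time_call` is a hypothetical helper that runs a callable in a daemon thread and reports whether it finished within a timeout, so a hanging call does not block the script.

```python
import threading
import time


def time_call(fn, timeout_s=60.0):
    """Run fn() in a daemon thread and return (status, elapsed_seconds).

    status is "ok", "timeout", or "error: <repr of exception>".
    Hypothetical helper for reproducing the report; not a PyIceberg API.
    """
    outcome = {}

    def worker():
        try:
            fn()
            outcome["status"] = "ok"
        except Exception as exc:  # record the failure instead of losing it
            outcome["status"] = f"error: {exc!r}"

    # daemon=True so a call that never returns cannot keep the process alive
    thread = threading.Thread(target=worker, daemon=True)
    start = time.perf_counter()
    thread.start()
    thread.join(timeout_s)
    elapsed = time.perf_counter() - start

    if thread.is_alive():
        # The call is still running in the background; we only stop waiting.
        return "timeout", elapsed
    return outcome["status"], elapsed


# Usage with the table from the reproduction steps (not executed here):
# print(time_call(lambda: tbl.append(df)))
# print(time_call(lambda: tbl.delete(delete_filter="load_id == '123.123'"), timeout_s=120))
```

Note that a timed-out call is not cancelled, only abandoned. It may also be worth timing the same delete with an expression object, e.g. `EqualTo("load_id", "123.123")` from `pyiceberg.expressions`, to check whether the slowdown depends on how the filter is specified; this is a suggestion for narrowing the problem down, not a confirmed cause.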
Environment
- PyIceberg version: 0.10.0
- Catalog: REST
- Python version: 3.12
- Storage: AWS S3 Tables