Closed
Description
Apache Iceberg version
0.7.1 (latest release)
Please describe the bug 🐞
I'm seeing inconsistent behaviour with pyiceberg: the exact same code returns different row counts under different versions of the library. I cross-checked against Athena and can confirm that the 0.6.1 count is the correct one. Any ideas on where to look when debugging this?
I can confirm that the output of .plan_files() and the delete_files are identical across the two versions.
import pyiceberg
print(pyiceberg.__version__)
from pyiceberg import catalog as pyi_catalog
catalog = pyi_catalog.load_catalog(name="default", type="glue")
table = catalog.load_table("ml_recommendations.users_v2")
# kwargs is supplied by the calling code (not shown here)
scan = table.scan(
    row_filter=kwargs["row_filter"]
)
df_users = scan.to_duckdb("users")
df_users.sql("SELECT count(*) FROM users")
>> 0.6.1
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 6700635 │
└──────────────┘
>> 0.7.1
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 1973154 │
└──────────────┘