Closed
Labels
bug (Something isn't working)
Description
Describe the bug
When running the unit tests with pyarrow 19.0.0 installed, the test_write_compressed_parquet tests fail with the following error:
libc++abi: terminating due to uncaught exception of type parquet::ParquetException: Repetition level histogram size mismatch
To Reproduce
Clone repo.
Initialize submodules.
Install pyarrow 19.
Run unit tests.
Downgrade to pyarrow 18.
Run unit tests.
Expected behavior
The tests should pass. There have been no substantive changes in datafusion that would cause this failure.
The unit test fails at this line:
metadata = pq.ParquetFile(tmp_path / file).metadata.to_dict()
Specifically, the pq.ParquetFile() call is what triggers the error above.
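Because the error is an uncaught C++ exception that terminates the whole interpreter, it can be easier to investigate if the failing call runs in a child process. Below is a minimal sketch of that idea, not part of the test suite. The helper name `read_metadata_isolated` and the `_CHILD` script are hypothetical, and the child assumes pyarrow is importable in the same environment.

```python
import subprocess
import sys

# Child script (hypothetical helper): performs the same call as the failing
# test line, pq.ParquetFile(path).metadata.to_dict(), and prints the result.
_CHILD = (
    "import sys, pyarrow.parquet as pq\n"
    "print(pq.ParquetFile(sys.argv[1]).metadata.to_dict())\n"
)


def read_metadata_isolated(path):
    """Open a Parquet file in a child process.

    Returns (returncode, stdout, stderr) so that a hard abort in the child
    does not take down the calling process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD, str(path)],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

A return code of 0 means the metadata was read. On POSIX systems a negative return code means the child was killed by a signal, which is what an abort from an uncaught exception would look like, and stderr should then contain the libc++abi message. Running this against the file written by the test under pyarrow 18 and pyarrow 19 would show whether the read step alone reproduces the crash.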
Additional context
I am not sure whether this is a problem in datafusion, parquet, or pyarrow. It is unlikely the problem is in datafusion-python, but this is where it was identified, so it is worth tracking here IMO.