pyarrow-19.0.0 breaks unit test #1023

@timsaucer

Describe the bug

When running the unit tests with pyarrow-19.0.0 installed, the test_write_compressed_parquet tests fail with this error:

libc++abi: terminating due to uncaught exception of type parquet::ParquetException: Repetition level histogram size mismatch

To Reproduce

Clone repo.
Initialize submodules.
Install pyarrow 19.
Run unit tests.
Downgrade to pyarrow 18.
Run unit tests.
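
When switching between pyarrow 19 and 18 in the steps above, it may help to confirm which version is actually active before each test run. The following is a minimal sketch using only the standard library; the helper name `installed_major` is hypothetical and not part of the repo.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major(dist_name):
    """Return the major version of an installed distribution, or None if absent.

    Hypothetical helper for illustration; assumes the version string starts
    with a plain integer component such as "19.0.0".
    """
    try:
        return int(version(dist_name).split(".")[0])
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run once after installing pyarrow 19 and once after the downgrade.
    print("pyarrow major version:", installed_major("pyarrow"))
```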

Expected behavior

The tests should pass. There have been no substantive changes in datafusion that would have caused this failure.

The unit test fails at the line that reads

metadata = pq.ParquetFile(tmp_path / file).metadata.to_dict()

Specifically, the pq.ParquetFile() call is what triggers the error above.
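
Because the failure is an uncaught C++ exception that terminates the process, a Python try/except around pq.ParquetFile() is unlikely to catch it. One way to probe a file without killing the test runner is to open it in a child interpreter and inspect the exit code. This is only a sketch under that assumption: `read_metadata_isolated` and `CHILD_SCRIPT` are hypothetical names, and the child script needs pyarrow installed.

```python
import subprocess
import sys

# Child script: opens the Parquet file given as argv[1] using the same call
# that fails in the unit test. Requires pyarrow in the child interpreter.
CHILD_SCRIPT = """
import sys
import pyarrow.parquet as pq
metadata = pq.ParquetFile(sys.argv[1]).metadata.to_dict()
print("ok")
"""


def read_metadata_isolated(path, script=CHILD_SCRIPT):
    """Run `script` in a fresh interpreter so a process abort cannot kill the caller.

    Returns (returncode, stdout, stderr). A nonzero return code means the
    child exited abnormally; the abort message, if any, lands in stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-c", script, str(path)],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

For example, calling `read_metadata_isolated(tmp_path / file)` on a file written by the failing test and checking for a nonzero return code would show whether that specific file triggers the abort, with the ParquetException message available in the captured stderr.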

Additional context

I am not sure whether this is a problem in datafusion, parquet, or pyarrow. It is unlikely that the problem is in datafusion-python, but this is where it was identified, so it is worth tracking here IMO.

Labels: bug