Unpin pyarrow in CI #6754
cc @dask/io for visibility
I'll start looking into this. It seems that a new pyarrow package was released.
Just starting to look into this now, sorry for the delay. It looks like pyarrow 2.0 deprecated the pyarrow.FileSystem API, so we can no longer check that the fsspec file-system object is an instance of …
The two tests shown above can be updated. The root cause is actually a change in fsspec (triggered by the pyarrow 2.0 release): fsspec no longer inherits from the deprecated Python pyarrow.filesystem classes (see fsspec/filesystem_spec#411). In pyarrow 2.0, all relevant code paths should also accept plain fsspec filesystems, so subclassing from pyarrow.filesystem is no longer needed.
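To illustrate why the test assertion broke, here is a minimal sketch using stand-in classes. All class and function names are hypothetical, not the real fsspec or pyarrow classes. Once the fsspec class stops inheriting from the deprecated pyarrow base, an isinstance check against that base fails. Checking against fsspec's own base class keeps working under both versions. That second check is one possible way to update the tests, assuming they only need to confirm that an fsspec filesystem was passed through.

```python
class LegacyArrowFileSystem:
    """Stand-in for the deprecated pyarrow.filesystem.FileSystem base class."""


class AbstractFileSystem:
    """Stand-in for fsspec's own filesystem base class."""


class OldLocalFileSystem(AbstractFileSystem, LegacyArrowFileSystem):
    """Stand-in for fsspec's LocalFileSystem before the change (also subclassed pyarrow's base)."""


class NewLocalFileSystem(AbstractFileSystem):
    """Stand-in for fsspec's LocalFileSystem after the change (fsspec base only)."""


def check_old_style(fs) -> bool:
    # The kind of assertion the tests used to make: tied to the deprecated pyarrow base.
    return isinstance(fs, LegacyArrowFileSystem)


def check_new_style(fs) -> bool:
    # A version-independent alternative: check against fsspec's own base class.
    return isinstance(fs, AbstractFileSystem)


for fs in (OldLocalFileSystem(), NewLocalFileSystem()):
    print(type(fs).__name__, check_old_style(fs), check_new_style(fs))
# OldLocalFileSystem True True
# NewLocalFileSystem False True
```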
Agreed, the failure in those particular tests makes perfect sense. Unfortunately, 22 tests are failing, and at a high-level glance some of them seem to go beyond type checking.
I cannot reproduce the other failures in the parquet tests locally (I do see some others, related to the categorical dtype). We also run the dask tests nightly on Arrow's CI, where we don't see any failures.
Thanks, that is good to know. I also cannot reproduce the failures locally (they may be Windows-specific).
Ah, I missed the fact that they're on Windows. Yes, it's quite likely that they are Windows-specific. We have regularly had path issues on Windows, so we might have missed some.
I could reproduce the test failures on Windows and diagnose the issue: https://issues.apache.org/jira/browse/ARROW-10462. See the Arrow issue for the longer story, but in short: because we swapped fsspec's LocalFileSystem (passed by dask to ParquetDataset) for our own LocalFileSystem, we were generating …
(Referenced code: dask/dask/dataframe/io/parquet/arrow.py, lines 568 to 573 at commit 48a4d4a.)
When creating the …
So you see the mixture of path separators. We can fix that on the pyarrow side (and hopefully put that in a 2.0.1 release).
(fsspec does have a function in the LocalFileSystem module for making paths POSIX-style.)
Yep, and that's what dask uses to pass a POSIX-style base path to pyarrow. That's also why the base path in the example output above is still POSIX-style, but we still have a "bug" when joining the additional file path onto it. The bug was already present in older versions of pyarrow, but only surfaced in dask now because of a change in how we handle fsspec filesystems.
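The separator mix-up can be reproduced without a Windows machine, because Python's standard ntpath and posixpath modules implement Windows and POSIX path rules on any platform. In this sketch, the base path and file name are made-up examples, not taken from the failing tests. A POSIX-style base path, as dask passes it, is joined using Windows rules, which yields a mixed-separator path. Joining with POSIX rules, or normalizing afterwards, keeps the result consistent.

```python
import ntpath
import posixpath

# Hypothetical example values: a POSIX-style base directory and a relative
# file name found inside it.
base = "C:/Users/me/data.parquet"
piece = "part.0.parquet"

# Joining with Windows rules appends a backslash after the forward-slash base path.
mixed = ntpath.join(base, piece)

# Joining with POSIX rules keeps the separators consistent with the base path.
consistent = posixpath.join(base, piece)

# Alternatively, normalize any backslashes after the fact.
normalized = mixed.replace("\\", "/")

print(mixed)       # C:/Users/me/data.parquet\part.0.parquet
print(consistent)  # C:/Users/me/data.parquet/part.0.parquet
print(normalized)  # C:/Users/me/data.parquet/part.0.parquet
```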
Thank you for looking into this, @jorisvandenbossche! Getting the fix into 2.0.1 would be great :)
So the bug surfaced now because the new fsspec filesystems no longer inherit from pyarrow.filesystem.FileSystem. As a result, we take a different code path in which we swap fsspec's LocalFileSystem for our own LocalFileSystem (which uses a different path separator on Windows). Before, when fsspec subclassed pyarrow's filesystem, we didn't take this code path at all.
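The change in code path can be sketched with stand-in classes. Everything here is hypothetical and heavily simplified; it is not pyarrow's actual implementation. A filesystem that subclasses the legacy pyarrow base is used as-is. A plain fsspec local filesystem is swapped for an Arrow-side local filesystem whose separator follows the host OS, and that swap is where the Windows separator enters the joined path.

```python
import ntpath
import posixpath


class LegacyArrowFileSystem:
    """Stand-in for the deprecated pyarrow.filesystem.FileSystem base class."""


class OldFsspecLocalFileSystem(LegacyArrowFileSystem):
    """Stand-in for fsspec's LocalFileSystem before the change (subclassed pyarrow)."""
    pathmod = posixpath  # fsspec works with POSIX-style paths


class FsspecLocalFileSystem:
    """Stand-in for fsspec's LocalFileSystem after the change (no pyarrow base)."""
    pathmod = posixpath


class ArrowLocalFileSystem(LegacyArrowFileSystem):
    """Stand-in for pyarrow's own LocalFileSystem; its separator follows the host OS."""

    def __init__(self, windows: bool):
        self.pathmod = ntpath if windows else posixpath


def resolve_filesystem(fs, windows: bool):
    # Old fsspec objects passed this isinstance check and were used directly.
    if isinstance(fs, LegacyArrowFileSystem):
        return fs
    # New fsspec local filesystems are swapped for the Arrow-side one.
    if isinstance(fs, FsspecLocalFileSystem):
        return ArrowLocalFileSystem(windows)
    raise TypeError(f"unsupported filesystem: {type(fs).__name__}")


def piece_path(fs, base: str, name: str, windows: bool) -> str:
    # Join a file name onto a POSIX-style base using the resolved filesystem's rules.
    resolved = resolve_filesystem(fs, windows)
    return resolved.pathmod.join(base, name)


base, name = "C:/data/ds.parquet", "part.0.parquet"
print(piece_path(OldFsspecLocalFileSystem(), base, name, windows=True))
print(piece_path(FsspecLocalFileSystem(), base, name, windows=True))
```

With the old stand-in the joined path stays POSIX-style, while the new stand-in on "Windows" produces a trailing backslash before the file name, mirroring the mixture described in this thread.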
We've temporarily pinned pyarrow<2 in the CI environments as there were some changes that broke our tests. We should test against pyarrow 2.
https://github.com/dask/dask/runs/1281335438?check_suite_focus=true#step:6:211
Some additional failures, like test_append_with_partition[pyarrow], are further down, I think. This is perhaps related to pyarrow 2.0 being released yesterday?