ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems #8557

Closed
mariusvniekerk
Member

Since s3fs.S3FileSystem is an fsspec filesystem moving this check will
ensure that it is correctly detected as an fsspec filesystem and not
wrapped with S3FSWrapper.
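
To illustrate why the order of the two checks matters, here is a minimal sketch. It is not the actual pyarrow code: `AbstractFileSystem`, `S3FileSystem`, `OldS3FileSystem`, `Wrapper`, and both `resolve_*` functions are hypothetical stand-ins, and the name-based S3 check is an assumption made for the example.

```python
class AbstractFileSystem:
    """Hypothetical stand-in for the fsspec base class."""


class S3FileSystem(AbstractFileSystem):
    """Hypothetical stand-in for a recent s3fs filesystem (an fsspec subclass)."""


# Hypothetical stand-in for a very old s3fs class: same class name,
# but it does not inherit from the fsspec base class.
OldS3FileSystem = type("S3FileSystem", (), {})


class Wrapper:
    """Hypothetical stand-in for S3FSWrapper."""

    def __init__(self, fs):
        self.fs = fs


def _is_s3fs(fs):
    # Assumption for this sketch: S3 filesystems are recognised by class name.
    return type(fs).__name__ == "S3FileSystem"


def _is_fsspec(fs):
    return isinstance(fs, AbstractFileSystem)


def resolve_wrapper_first(fs):
    """Order before the change: the S3-specific check wins."""
    if _is_s3fs(fs):
        return Wrapper(fs)
    if _is_fsspec(fs):
        return fs
    raise TypeError(f"Unrecognized filesystem: {type(fs)!r}")


def resolve_fsspec_first(fs):
    """Order after the change: any fsspec filesystem is used directly."""
    if _is_fsspec(fs):
        return fs
    if _is_s3fs(fs):
        return Wrapper(fs)
    raise TypeError(f"Unrecognized filesystem: {type(fs)!r}")
```

With the fsspec check first, the wrapper branch in this sketch is only reachable for objects that are not fsspec subclasses, such as `OldS3FileSystem()`.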

@mariusvniekerk mariusvniekerk changed the title [ARROW-10433] Swopped the conditions for checking for fsspec filesystems ARROW-10433:[Python] Swopped the conditions for checking for fsspec filesystems Oct 30, 2020
@mariusvniekerk mariusvniekerk changed the title ARROW-10433:[Python] Swopped the conditions for checking for fsspec filesystems ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems Oct 30, 2020
@mariusvniekerk
Member Author

cc @jorisvandenbossche

@jorisvandenbossche
Member

Rebased this, and running the tests that I added in #8573 with latest s3fs locally (we currently still get s3fs 0.4 on CI)

@pitrou
Member

pitrou commented Nov 9, 2020

So what use is the S3FSWrapper after this PR?

@mariusvniekerk
Member Author

might be useful still for s3fs<0.4 ?

@jorisvandenbossche
Member

I was going to propose to actually completely remove the S3FSWrapper call here (instead of only moving it after the other check).

This class is from before my time, but as far as I understand, it was initially used to wrap s3fs filesystems (so they could be used in ParquetDataset). Later on, the fsspec filesystems (and thus s3fs as well) became actual pyarrow.filesystem.FileSystem subclasses, so they could be used directly in ParquetDataset. So in practice, this S3FSWrapper has not been in use for the recent s3fs / fsspec releases.

But this change to inherit from pyarrow is already more than 2 years old (e.g. fsspec/filesystem_spec@f461317), so I think in practice nobody is (or should be) using such an old fsspec version.

Member

@jorisvandenbossche jorisvandenbossche left a comment

I checked locally that the relevant parquet tests are now passing with this patch and s3fs master (released s3fs 0.5.1 still has some regressions causing failures in reading partitioned parquet datasets)

As mentioned above, I think we can remove the S3FSWrapper call here altogether, but I will deal with that in a separate PR that also deprecates the class directly.

yordan-pavlov pushed a commit to yordan-pavlov/arrow that referenced this pull request Nov 14, 2020
…ilesystems

Since s3fs.S3FileSystem is an fsspec filesystem moving this check will
ensure that it is correctly detected as an fsspec filesystem and not
wrapped with S3FSWrapper.

Closes apache#8557 from mariusvniekerk/arrow-10433

Authored-by: Marius van Niekerk <marius.v.niekerk@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@iNecas

iNecas commented Dec 15, 2020

I've just hit TypeError: 'coroutine' object is not iterable, and it turned out to be related to compatibility with s3fs. I was surprised not to find any pyarrow issue describing this problem, so I've filed one at https://issues.apache.org/jira/browse/ARROW-10921. For now, pinning s3fs to <0.5 works around the problem.
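
For readers unfamiliar with this error, the sketch below shows one generic way it can arise: code written against a synchronous listing helper iterates over the result of a method that has since become a coroutine. This is only an illustration under that assumption, not a diagnosis of the pyarrow or s3fs code; `AsyncListingFS` and `legacy_walk` are hypothetical names.

```python
import asyncio


class AsyncListingFS:
    """Hypothetical filesystem whose private listing helper is a coroutine."""

    async def _ls(self, path):
        return [f"{path}/a.parquet", f"{path}/b.parquet"]

    def ls(self, path):
        # Public synchronous entry point: runs the coroutine to completion.
        return asyncio.run(self._ls(path))


def legacy_walk(fs, path):
    """Hypothetical caller written when `_ls` was still a plain method."""
    # fs._ls(path) now returns a coroutine object instead of a list,
    # so iterating over it raises TypeError.
    return [entry for entry in fs._ls(path)]
```

Calling `legacy_walk(AsyncListingFS(), "bucket")` raises the `TypeError`, while the synchronous `ls("bucket")` entry point returns the listing as usual.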

@jorisvandenbossche jorisvandenbossche mentioned this pull request Jan 8, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021