Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13090: [Python] Fix create_dir() implementation in FSSpecHandler #10540

Closed
wants to merge 1 commit into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jun 16, 2021

Recent fsspec version have started raising FileExistsError if the target directory already exists. Ignore the error, as create_dir() is supposed to succeed in that case.

Recent fsspec version have started raising FileExistsError if the target directory
already exists.  Ignore the error, as create_dir() is supposed to succeed in that case.
@github-actions
Copy link

self.fs.mkdir(path, create_parents=recursive)
try:
self.fs.mkdir(path, create_parents=recursive)
except FileExistsError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the "FileExistsError" a bit strange, because I first thought this was about a directory that already existed (but apparently, fsspec's mkdir is fine with that as well and will pass silently if the target directory already exists, at least I checked this for their local filesystem). But so it is actually about a file already existing with that name.
And the current behaviour of our filesystem is then to silently not create a directory. But is that our desired behaviour? I would say that create_dir should guarantee that the target directory exists after calling that method (or otherwise error)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But so it is actually about a file already existing with that name.

Are you sure that's the case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from fsspec.implementations.local import LocalFileSystem as FSSpecLocalFileSystem
fs = FSSpecLocalFileSystem()
fs.mkdir("test_dir")
fs.touch("test_file")

# creating an existing dir again is fine
In [9]: fs.mkdir("test_dir")

# creating a dir that already exists as a file errors
In [10]: fs.mkdir("test_file")
...
FileExistsError: [Errno 17] File exists: '/home/joris/scipy/test_file'

(now, this was a manual test of the expected behaviour, will check what actually happens in the failing test)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but it's the MemoryFileSystem that fails with the exact same example from above .. So they are again not very consistent, that seems a bug in the MemoryFileSystem. Will open an issue about it.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this! (I just noticed we also have some failures in pandas with the new fsspec release)

@jorisvandenbossche
Copy link
Member

@pitrou so it's indeed about an existing dir here, but to get back to one other aspect of the above inline discussion: what is our expected behaviour for creating a dir that already exists as a file?

@pitrou
Copy link
Member Author

pitrou commented Jun 16, 2021

It's not specified currently. Ideally, if the information is available it should raise an error.

Edit: created https://issues.apache.org/jira/browse/ARROW-13092 for it

@pitrou
Copy link
Member Author

pitrou commented Jun 16, 2021

@jorisvandenbossche Besides opening an issue upstream, should we merge this workaround?

@jorisvandenbossche
Copy link
Member

Thanks for opening the JIRA, the upstream issue is fsspec/filesystem_spec#673

I think the answer depends on whether it's a bug or expected behaviour in fsspec, but maybe let's merge anyway for now to get our CI green?

@pitrou
Copy link
Member Author

pitrou commented Jun 16, 2021

Regardless, the added code seems harmless and it will fix our CI.

@jorisvandenbossche
Copy link
Member

It could potentially depend on ARROW-13092 (if we want to have it raise again for existing files).

Thanks for the fix!

@pitrou pitrou deleted the ARROW-13090-fsspec-create-dir branch June 16, 2021 12:42
sjperkins pushed a commit to sjperkins/arrow that referenced this pull request Jun 23, 2021
Recent fsspec version have started raising FileExistsError if the target directory already exists.  Ignore the error, as create_dir() is supposed to succeed in that case.

Closes apache#10540 from pitrou/ARROW-13090-fsspec-create-dir

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants