ARROW-13090: [Python] Fix create_dir() implementation in FSSpecHandler #10540
Recent fsspec versions have started raising FileExistsError if the target directory already exists. Ignore the error, as create_dir() is supposed to succeed in that case.
-        self.fs.mkdir(path, create_parents=recursive)
+        try:
+            self.fs.mkdir(path, create_parents=recursive)
+        except FileExistsError:
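The change in the diff above can be sketched as follows. This is a minimal illustration, not the actual pyarrow code: `StrictFS` and `HandlerSketch` are hypothetical stand-ins for an fsspec filesystem and for `FSSpecHandler`, and the `pass` in the `except` branch is an assumption based on the description ("Ignore the error"), since the body of that branch is not visible in the excerpt.

```python
class StrictFS:
    """Hypothetical stand-in for an fsspec filesystem whose mkdir() raises
    FileExistsError when the directory already exists."""

    def __init__(self):
        self.dirs = set()

    def mkdir(self, path, create_parents=True):
        if path in self.dirs:
            raise FileExistsError(path)
        self.dirs.add(path)


class HandlerSketch:
    """Simplified stand-in for FSSpecHandler; only create_dir() is modelled."""

    def __init__(self, fs):
        self.fs = fs

    def create_dir(self, path, recursive):
        try:
            self.fs.mkdir(path, create_parents=recursive)
        except FileExistsError:
            # Assumed body: create_dir() should succeed if the directory exists.
            pass


handler = HandlerSketch(StrictFS())
handler.create_dir("data", recursive=True)
handler.create_dir("data", recursive=True)  # second call does not raise
```

Note that this sketch swallows every FileExistsError, including the case where the path is an existing file, which is the concern raised in the review comments.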
I found the "FileExistsError" a bit strange, because I first thought this was about a directory that already existed. But apparently fsspec's mkdir is fine with that and passes silently if the target directory already exists (at least, I checked this for their local filesystem). So it is actually about a file already existing with that name.
And the current behaviour of our filesystem is then to silently not create a directory. But is that our desired behaviour? I would say that create_dir should guarantee that the target directory exists after calling that method (or otherwise raise an error).
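One way to get the guarantee described in this comment would be to re-raise when the path turns out not to be a directory. The sketch below is only an illustration of that idea, not what the PR implements: `FakeFS` and `create_dir_checked` are hypothetical names, and the only filesystem methods assumed are `mkdir()` and `isdir()`.

```python
class FakeFS:
    """Hypothetical in-memory filesystem: mkdir() raises FileExistsError
    both for existing directories and for existing files."""

    def __init__(self):
        self.dirs = set()
        self.files = set()

    def touch(self, path):
        self.files.add(path)

    def mkdir(self, path, create_parents=True):
        if path in self.dirs or path in self.files:
            raise FileExistsError(path)
        self.dirs.add(path)

    def isdir(self, path):
        return path in self.dirs


def create_dir_checked(fs, path, recursive=True):
    """Succeed only if `path` is a directory once the call returns."""
    try:
        fs.mkdir(path, create_parents=recursive)
    except FileExistsError:
        # An existing directory is fine; anything else (e.g. a file) is not.
        if not fs.isdir(path):
            raise


fs = FakeFS()
create_dir_checked(fs, "test_dir")
create_dir_checked(fs, "test_dir")  # existing directory: no error
fs.touch("test_file")
try:
    create_dir_checked(fs, "test_file")
    file_case_raised = False
except FileExistsError:
    file_case_raised = True
```

The extra `isdir()` call only happens on the error path, so the common case of a fresh directory costs nothing more than the plain `mkdir()`.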
> But so it is actually about a file already existing with that name.
Are you sure that's the case?
from fsspec.implementations.local import LocalFileSystem as FSSpecLocalFileSystem
fs = FSSpecLocalFileSystem()
fs.mkdir("test_dir")
fs.touch("test_file")
# creating an existing dir again is fine
In [9]: fs.mkdir("test_dir")
# creating a dir that already exists as a file errors
In [10]: fs.mkdir("test_file")
...
FileExistsError: [Errno 17] File exists: '/home/joris/scipy/test_file'
(now, this was a manual test of the expected behaviour, will check what actually happens in the failing test)
Ah, but it's the MemoryFileSystem that fails with the exact same example from above. So the fsspec implementations are again not very consistent; that seems to be a bug in the MemoryFileSystem. Will open an issue about it.
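The kind of inconsistency described here can be checked with a small probe that runs the same two calls against any filesystem object. This is a sketch under stated assumptions: `probe_mkdir`, `LenientFS` and `StrictFS` are hypothetical names, the two classes merely imitate the behaviours reported in this thread, and the probe only relies on `mkdir()` and `touch()`.

```python
def probe_mkdir(fs, dir_path, file_path):
    """Report whether mkdir() raises FileExistsError for an existing
    directory and for an existing file."""

    def raises(path):
        try:
            fs.mkdir(path)
        except FileExistsError:
            return True
        return False

    fs.mkdir(dir_path)
    fs.touch(file_path)
    return {"existing_dir": raises(dir_path), "existing_file": raises(file_path)}


class LenientFS:
    """Imitates the local filesystem behaviour shown above:
    an existing directory is accepted, an existing file is an error."""

    def __init__(self):
        self.dirs = set()
        self.files = set()

    def touch(self, path):
        self.files.add(path)

    def mkdir(self, path, create_parents=True):
        if path in self.files:
            raise FileExistsError(path)
        self.dirs.add(path)


class StrictFS(LenientFS):
    """Imitates the behaviour reported for the in-memory filesystem:
    an existing directory is an error as well."""

    def mkdir(self, path, create_parents=True):
        if path in self.dirs:
            raise FileExistsError(path)
        super().mkdir(path, create_parents)


lenient_result = probe_mkdir(LenientFS(), "test_dir", "test_file")
strict_result = probe_mkdir(StrictFS(), "test_dir", "test_file")
```

Passing a real fsspec filesystem instead of the stand-ins would create `test_dir` and `test_file` on that filesystem, so a scratch location is advisable.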
Thanks for looking into this! (I just noticed we also have some failures in pandas with the new fsspec release)
@pitrou so it's indeed about an existing dir here, but to get back to one other aspect of the above inline discussion: what is our expected behaviour for creating a dir that already exists as a file?
It's not specified currently. Ideally, if the information is available it should raise an error. Edit: created https://issues.apache.org/jira/browse/ARROW-13092 for it
@jorisvandenbossche Besides opening an issue upstream, should we merge this workaround?
Thanks for opening the JIRA, the upstream issue is fsspec/filesystem_spec#673. I think the answer depends on whether it's a bug or expected behaviour in fsspec, but maybe let's merge anyway for now to get our CI green?
Regardless, the added code seems harmless and it will fix our CI.
It could potentially depend on ARROW-13092 (if we want to have it raise again for existing files). Thanks for the fix!
Recent fsspec versions have started raising FileExistsError if the target directory already exists. Ignore the error, as create_dir() is supposed to succeed in that case.
Closes apache#10540 from pitrou/ARROW-13090-fsspec-create-dir
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>