-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-8766: [Python] Allow implementing filesystems in Python #7349
Conversation
A first test case is working, using fsspec's in-memory filesystem:
I only have a bit trouble finding a robust way to get a file handle for an open file from the fsspec filesystem, but so that is not related to this PR ;) For the rest it was quite easy to get this working! |
python/pyarrow/_fs.pyx
Outdated
self.info.set_mtime(PyDateTime_to_TimePoint( | ||
<PyDateTime_DateTime*> mtime)) | ||
else: | ||
self.info.set_mtime(TimePoint_from_ns(mtime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: should instead add a separate mtime_ns
argument.
The WIP attempt to create a Handler for fsspec is here (which the example I posted above is using): jorisvandenbossche@14ac33d. With that, reading files is working. Although there are some incosistencies in the file-like objects the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou looks all good to me. Went through the code, and tested it further with the fsspec handler, and all seems to be working nicely.
Only the open_append_stream
, I am not yet fully sure how to do this (if possible). The PythonFile
class might not yet support an append (mode="a") writable mode?
open_append_stream; | ||
}; | ||
|
||
class ARROW_PYTHON_EXPORT PyFileSystem : public arrow::fs::FileSystem { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need some doc comments? (I don't know how "public" this is, in the end it is only to be used in the python bindings I think, and there the PyFileSystem class has docstrings for python users, so that might be sufficient)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not public as a C++ API, indeed, but if there's some stuff there that needs explaining, feel free to point it out and I'll add some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just one-liner indicating this is only used to implement pyarrow.fs.PyFileSystem could be added then, but not that important
@abstractmethod | ||
def move(self, src, dest): | ||
""" | ||
Implement PyFileSystem.move(...). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose methods like this are expected to just raise the appropriate FileNotFoundError in case src
does not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
Recognizing append mode as a "write" mode seems to work: --- a/python/pyarrow/io.pxi
+++ b/python/pyarrow/io.pxi
@@ -654,6 +654,8 @@ cdef class PythonFile(NativeFile):
kind = 'w'
elif inferred_mode.startswith('r'):
kind = 'r'
+ elif inferred_mode.startswith('a'):
+ kind = 'w'
else:
raise ValueError('Invalid file mode: {0}'.format(mode)) (if that's a proper fix, can include that in my follow-up PR) |
You'll have to check that writing indeed appends at the end rather than e.g. truncating. More generally, while I originally added the append-open method, I'm not sure it will ever be useful, so in the meantime if an implementation wants to raise |
At least the tests pass (I added a fsspec local filesytem handler to the list of filesytem fixtures), and from a small local test it seems to be appending. Now, if it's not actually used within arrow at the moment (eg for enabling reading/writing parquet files or datasets with those filesystems, we won't need it), then it is also not very important to get this working. |
No description provided.