[Python] Appending to streamable table file format doesn't seem to work #18955
Comments
Rob Ambalu / @robambalu: You need a streamer with an "append" option to reproduce the issue. As far as I can see there's only FileOutputStream (broken) and Hadoop, which I'm not sure how to bootstrap. I hit this issue in my code because I wrote my own MMapStreamer that supports append. I've worked around this for now by creating new .idx extension files whenever I find an existing file and want to append (the reader then concatenates them all).
Tim Cooijmans: (A naive use of pyarrow's OSFile fails because it rejects the "ab" file mode, expecting either read or write, and a naive use of the cat utility results in only the most recent data being readable from the file.)
As far as I can tell, appending to the streaming file format isn't currently supported. Is that right?
RecordBatchStreamWriter always writes the schema up front, and it doesn't look like a schema is expected mid-file. Assuming I'm doing this append test correctly, this is the error I hit when I try to read the file back into Python:
Traceback (most recent call last):
  File "/home/ra7293/rba_arrow_mmap.py", line 9, in
    table = reader.read_all()
  File "ipc.pxi", line 302, in pyarrow.lib._RecordBatchReader.read_all
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Message not expected type: record batch, was: 1
This reader script works fine if I write once / don’t append.
Seeing as the IO interfaces support Append, streaming should support it as well. (If for whatever reason this can't be supported, RecordBatchStreamWriter should throw if configured with an OutputStreamer that is attempting to append.)
Reporter: Rob Ambalu / @robambalu
Note: This issue was originally created as ARROW-2579. Please see the migration documentation for further details.