
Error writing record batches to IPC streaming format #25984

Closed
asfimport opened this issue Sep 10, 2020 · 2 comments

Writing record batches to the Arrow IPC streaming format with on-the-fly compression generally raises errors of one kind or another when the stream is read back.

Please find attached the code producing each of the errors below. I can't reproduce the problem with smaller batch sizes, so it probably has to do with the size of each record batch. It does not seem specific to pyarrow, since I see a similar issue with the C GLib API.
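
For illustration, here is a minimal sketch of the write/read pattern that triggers these errors. The file name, compression codec, batch contents, and batch count below are placeholders for this sketch, not the contents of the attached scripts.

import pyarrow as pa

# Hypothetical stand-ins for the constants used in the attached scripts.
FILE = "batches.arrows.gz"
COMPRESSION_TYPE = "gzip"

# A batch large enough that part of it stays buffered in the compressor.
batch = pa.RecordBatch.from_pydict({"x": list(range(1_000_000))})

sink = pa.output_stream(FILE, compression=COMPRESSION_TYPE)
writer = pa.RecordBatchStreamWriter(sink, batch.schema)
for _ in range(5):
    writer.write_batch(batch)
# Neither writer.close() nor sink.close() is called, so the compressed
# stream is never finalized.

source = pa.input_stream(FILE, compression=COMPRESSION_TYPE)
reader = pa.ipc.open_stream(source)
batches = list(reader)  # fails with one of the errors shown below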

Error case 1:

~/py376/lib/python3.7/site-packages/pyarrow/ipc.pxi in pyarrow.lib._CRecordBatchReader.read_next_batch()

~/py376/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

OSError: Truncated compressed stream

Error case 2:

~/py376/lib/python3.7/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()

~/py376/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/py376/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Tried reading schema message, was null or length 0

Environment:
pyarrow - version 1.0.1
python - version 3.7.6
operating system - CentOS Linux release 7.8.2003 (Core)
Reporter: Ishan

Original Issue Attachments:

Note: This issue was originally created as ARROW-9958. Please see the migration documentation for further details.


Kouhei Sutou / @kou:
You need to ensure that writers and output streams are closed.

For example1.py:

sink = pa.output_stream(FILE, COMPRESSION_TYPE)
writer = pa.RecordBatchStreamWriter(sink, batch.schema)
for _ in range(5):
    writer.write_batch(batch)
writer.close()  # flush buffered IPC data and write the end-of-stream marker
sink.close()    # finalize the compressed output stream

For example2.py:

sink = pa.output_stream(FILE, COMPRESSION_TYPE)
writer = pa.RecordBatchStreamWriter(sink, batch.schema)
for _ in range(5):
    writer.write_batch(batch)
writer.close()  # flush buffered IPC data and write the end-of-stream marker
sink.close()    # finalize the compressed output stream
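
Beyond the explicit close() calls in the reply above, a sketch of the same fix using context managers may be useful: pyarrow's NativeFile and RecordBatchStreamWriter both support the with statement, which closes them automatically when the block exits.

with pa.output_stream(FILE, COMPRESSION_TYPE) as sink:
    with pa.RecordBatchStreamWriter(sink, batch.schema) as writer:
        for _ in range(5):
            writer.write_batch(batch)
# The writer is closed first, then the compressed sink, in that order.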


Ishan:
Thank you. With the C GLib API as well, I did close the writer but missed closing the sink.
