[Python] Make Buffered* IO classes available to Python, incorporate into input_stream, output_stream factory functions #19478

asfimport · 2018-08-28T00:41:11Z

Reporter: Wes McKinney / @wesm
Assignee: Krisztian Szucs / @kszucs

PRs and other links:

GitHub Pull Request #3252

Note: This issue was originally created as ARROW-3126. Please see the migration documentation for further details.

asfimport · 2018-08-28T10:29:23Z

Antoine Pitrou / @pitrou:
What would that do under the hood exactly? Any benchmarks to watch for?

asfimport · 2018-08-28T11:36:22Z

Uwe Korn / @xhochy:
As far as I understand the title, I would do the same as https://docs.python.org/3/library/io.html#io.BufferedReader internally does. Simply using the Python class in pyarrow already brought us great improvements in reading Parquet files from Azure.

asfimport · 2018-08-28T11:46:09Z

Antoine Pitrou / @pitrou:
Ok. The term "read ahead" is a bit misleading, because it implies that I/O is hidden in the background, which is not how a buffering layer works (the buffer is filled up synchronously when empty, it's not fed by a separate thread).

Can we reuse io.BufferedReader for this, or is the intention to have a similar primitive written in C++? Also, does it return an InputStream or a full-blown RandomAccessFile (the latter is quite a bit more difficult to get right and optimize)?
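To illustrate the point about synchronous buffering, here is a minimal sketch using only Python's standard `io` module. `CountingRaw` is a hypothetical helper written for this example (it is not part of pyarrow or Arrow C++); it counts how often the underlying source is actually hit. The buffer is refilled inline by whichever `read()` call finds it empty, and no background thread is involved.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts reads against the underlying source."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.raw_reads = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here stands for one trip to the slow source (disk, network).
        self.raw_reads += 1
        return self._src.readinto(b)


raw = CountingRaw(b"x" * 4096)
buffered = io.BufferedReader(raw, buffer_size=1024)

# 64 small reads of 16 bytes each; most are served from the in-memory buffer.
chunks = [buffered.read(16) for _ in range(64)]

print("read() calls:", len(chunks), "raw reads:", raw.raw_reads)
```

The raw source is hit far fewer times than the 64 `read()` calls, which is the benefit of a buffering layer, but each refill still blocks the caller that triggered it.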

asfimport · 2018-08-28T16:27:00Z

Wes McKinney / @wesm:
open_stream only uses InputStream (https://github.com/apache/arrow/blob/master/python/pyarrow/ipc.pxi#L247), so we should implement a buffering InputStream in C++.

asfimport · 2018-12-09T20:41:55Z

Wes McKinney / @wesm:
@kszucs would you be interested in working on this? My thinking is to add a buffer_size argument to both pyarrow.input_stream and output_stream. After a raw reader or writer is created, if this argument is set, it will be wrapped in either a BufferedInputStream or BufferedOutputStream as appropriate.
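The wrapping logic described above can be sketched with Python's standard `io` classes standing in for the C++ BufferedInputStream and BufferedOutputStream. The function names `open_input` and `open_output` are hypothetical and exist only for this illustration; this is an analogy under those assumptions, not pyarrow's actual implementation.

```python
import io


def open_input(raw, buffer_size=None):
    """Hypothetical factory: wrap the raw reader only when buffer_size is set."""
    if buffer_size is not None and buffer_size > 0:
        return io.BufferedReader(raw, buffer_size=buffer_size)
    return raw


def open_output(raw, buffer_size=None):
    """Hypothetical factory: wrap the raw writer only when buffer_size is set."""
    if buffer_size is not None and buffer_size > 0:
        return io.BufferedWriter(raw, buffer_size=buffer_size)
    return raw


# Without buffer_size the raw object is returned unchanged.
source = io.BytesIO(b"arrow")
unbuffered = open_input(source)

# With buffer_size the raw object is wrapped in a buffering layer.
buffered = open_input(io.BytesIO(b"arrow"), buffer_size=4096)

sink = io.BytesIO()
writer = open_output(sink, buffer_size=4096)
writer.write(b"arrow")
pending = sink.getvalue()   # the small write is still held in the buffer
writer.flush()
flushed = sink.getvalue()   # after flush() the bytes have reached the sink
```

Keeping the argument optional means existing callers see no behaviour change, while callers on high-latency storage can opt in to buffering.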

asfimport · 2018-12-18T01:42:46Z

Wes McKinney / @wesm:
If someone else could pick this up, I would be appreciative. If I finish the other work assigned to me this week and this is not done, I will pick it up.

asfimport · 2019-01-10T04:38:36Z

Wes McKinney / @wesm:
Issue resolved by pull request #3252.

asfimport closed this as completed Jan 10, 2019

asfimport assigned kszucs Jan 10, 2023

asfimport added this to the 0.12.0 milestone Jan 11, 2023
