Create record batches from in memory IPC without memory copies #189

Closed
alamb opened this issue Apr 26, 2021 · 2 comments
Labels
arrow Changes to the arrow crate

Comments

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11696

I have Arrow record batches in IPC format in memory (e.g. as `&[u8]`) and would like to create a vector of batches while avoiding as many memory copies as possible. It would be great if there were a way to create the vector without having to go through the file abstraction.

I might be misunderstanding how the file reader works, and maybe it does not incur memory copies. I think it does, though, since creating Arrow record batches from a larger Arrow buffer takes much longer.


alamb commented Apr 26, 2021

Comment from Andrew Lamb(alamb) @ 2021-03-08T22:10:56.189+0000:

@domoritz I wonder if you mean this reader: https://docs.rs/arrow/3.0.0/arrow/ipc/reader/struct.FileReader.html#method.try_new

If so, although it is called `FileReader`, I think the name is somewhat misleading. It requires something that implements `std::io::Read` -- which `&[u8]` does.

https://doc.rust-lang.org/std/io/trait.Read.html#impl-Read-2
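
To illustrate the point about `&[u8]` implementing `std::io::Read`, here is a minimal standard-library sketch. The helper `read_prefix` and the sample bytes are hypothetical and only stand in for a real reader; nothing here uses the arrow crate.

```rust
use std::io::Read;

/// Hypothetical helper: reads up to `n` bytes from any `Read` source.
fn read_prefix<R: Read>(mut source: R, n: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    // For a byte slice, a single `read` call copies min(n, remaining) bytes.
    let read = source.read(&mut buf)?;
    buf.truncate(read);
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    // Placeholder bytes, not a valid IPC payload.
    let bytes: &[u8] = b"ARROW1\0\0";
    // A plain byte slice can be passed wherever `Read` is required.
    let prefix = read_prefix(bytes, 6)?;
    assert_eq!(prefix, b"ARROW1".to_vec());
    Ok(())
}
```

Note that reading through `Read` copies bytes into the caller's buffer, so satisfying the trait bound does not by itself make a reader copy-free.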

Comment from Dominik Moritz(domoritz) @ 2021-03-09T08:12:41.105+0000:

But `&[u8]` does not seem to implement `Seek`, so `FileReader` does not work.

The error is:

    the trait bound `&[u8]: Seek` is not satisfied
    required by `FileReader::<R>::try_new`

If I switch to the `StreamReader`, I get an IO error at runtime:

    Io error: failed to fill whole buffer
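
For context, "failed to fill whole buffer" is the message the Rust standard library attaches to `read_exact` when the source runs out of bytes early. The sketch below reproduces that message with the standard library alone. Whether this is what happened inside `StreamReader` here is an assumption, and `read_four` is a hypothetical helper.

```rust
use std::io::{ErrorKind, Read};

/// Hypothetical helper: reads exactly four bytes or fails.
fn read_four(mut source: &[u8]) -> std::io::Result<[u8; 4]> {
    let mut buf = [0u8; 4];
    // `read_exact` errors with `UnexpectedEof` if fewer than 4 bytes remain.
    source.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() {
    // Only two bytes are available, so the read cannot be satisfied.
    let err = read_four(&[1, 2]).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
    println!("{}", err); // prints "failed to fill whole buffer"
}
```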

So what I implemented was:

    let cursor = std::io::Cursor::new(contents);
    let reader = match arrow::ipc::reader::FileReader::try_new(cursor) {
        Ok(reader) => reader,
        Err(error) => return Err(format!("{}", error).into()),
    };
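
The reason the `Cursor` workaround compiles can be sketched with the standard library alone: `std::io::Cursor<&[u8]>` implements both `Read` and `Seek`, and wrapping a slice in a cursor borrows it rather than copying it. The helper `read_tail` and the sample bytes are hypothetical stand-ins for a reader with a `Read + Seek` bound; this says nothing about copies made later while decoding.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper standing in for a reader that needs `Read + Seek`:
/// seeks to `n` bytes before the end and reads them.
fn read_tail<R: Read + Seek>(mut source: R, n: usize) -> std::io::Result<Vec<u8>> {
    source.seek(SeekFrom::End(-(n as i64)))?;
    let mut buf = vec![0u8; n];
    source.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    // Placeholder bytes, not a valid IPC file.
    let contents: &[u8] = b"header-body-footer";
    // `Cursor::new` only wraps the borrowed slice; the bytes are not copied here.
    let tail = read_tail(Cursor::new(contents), 6)?;
    assert_eq!(tail, b"footer".to_vec());
    Ok(())
}
```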

@tustvold

Closed by #2510
