[C++] How to load an arrow::Table from a buffer of data without intermediate file creation in C++? #37144
Comments
Does this work?

    auto read(std::span<const char> buffer) -> std::shared_ptr<arrow::Table> {
      ::arrow::Buffer arrow_buffer(buffer.data(), buffer.size());
      ::arrow::io::BufferReader buffer_reader(arrow_buffer);
      auto const ipc_reader = ::arrow::ipc::RecordBatchFileReader::Open(&buffer_reader);
      if (!ipc_reader.ok()) {
        return nullptr;
      }
      auto const reader = *ipc_reader;
      auto const creation = ::arrow::Table::FromRecordBatchReader(reader.get());
      if (!creation.ok()) {
        return nullptr;
      }
      return *creation;
    }
The code does not compile because `RecordBatchFileReader` is not a `RecordBatchReader`, which is what `Table::FromRecordBatchReader` expects. Using the buffer and reading the record batches one at a time:

    auto read(std::span<const char> buffer) -> std::shared_ptr<arrow::Table> {
      ::arrow::Buffer arrow_buffer(buffer.data(), buffer.size());
      ::arrow::io::BufferReader buffer_reader(arrow_buffer);
      auto const ipc_reader = ::arrow::ipc::RecordBatchFileReader::Open(&buffer_reader);
      if (!ipc_reader.ok()) {
        return nullptr;
      }
      auto const reader = ipc_reader.ValueOrDie();
      auto const num_record_batches = reader->num_record_batches();
      auto batches = std::vector<std::shared_ptr<::arrow::RecordBatch>>(num_record_batches);
      for (auto i = 0; i < num_record_batches; ++i) {
        auto const batch = reader->ReadRecordBatch(i);
        if (!batch.ok()) {
          return nullptr;
        }
        batches[i] = batch.ValueOrDie();
      }
      auto const creation = ::arrow::Table::FromRecordBatches(batches);
      if (!creation.ok()) {
        return nullptr;
      }
      return creation.ValueOrDie();
    }

I've also tried with a `RecordBatchStreamReader`:

    auto read(std::span<const char> buffer) -> std::shared_ptr<arrow::Table> {
      ::arrow::Buffer arrow_buffer(buffer.data(), buffer.size());
      ::arrow::io::BufferReader buffer_reader(arrow_buffer);
      auto const ipc_reader = ::arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader);
      if (!ipc_reader.ok()) {
        return nullptr;
      }
      auto const reader = *ipc_reader;
      auto const creation = ::arrow::Table::FromRecordBatchReader(reader.get());
      if (!creation.ok()) {
        return nullptr;
      }
      return *creation;
    }

But I do get this error:
Ah, you can't use `Table::FromRecordBatchReader` with a `RecordBatchFileReader`. If you change the in-memory data format from the IPC file format to the IPC streaming format, you can use `RecordBatchStreamReader`. If you still need to use the IPC file format, you need #37167.
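For anyone unsure which of the two IPC formats a buffer actually holds, a quick check of the magic bytes can help decide which reader to open. The sketch below relies on the fact that the Arrow IPC file format starts and ends with the ASCII string `ARROW1`. The helper name `LooksLikeIpcFile` is hypothetical and not part of Arrow; it only inspects the bytes and does not validate the footer or the contents.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the Arrow API.
// Returns true when the bytes start and end with the "ARROW1" magic string
// used by the Arrow IPC *file* format. A buffer that fails this check is not
// in the file format (it may be in the streaming format, or not Arrow at all).
inline bool LooksLikeIpcFile(const char* data, std::size_t size) {
  static constexpr char kMagic[] = "ARROW1";
  constexpr std::size_t kMagicLen = sizeof(kMagic) - 1;  // drop the trailing '\0'

  // A real file is much larger; this is only a lower bound that keeps the
  // two comparisons below inside the buffer.
  if (data == nullptr || size < 2 * kMagicLen) {
    return false;
  }
  return std::memcmp(data, kMagic, kMagicLen) == 0 &&
         std::memcmp(data + size - kMagicLen, kMagic, kMagicLen) == 0;
}
```

With the `std::span<const char> buffer` from the snippets above, the call would be `LooksLikeIpcFile(buffer.data(), buffer.size())`. If it returns true, the data needs `RecordBatchFileReader`; opening it with `RecordBatchStreamReader` is expected to fail.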
Thanks for the quick PR introducing the feature; As you mentioned in the PR,
…37167)

### Rationale for this change

`RecordBatchReader` has them but `RecordBatchFileReader` doesn't. They are convenient.

### What changes are included in this PR?

Add them.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #37144

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
I have a pipeline of processes that do different stuff. One of the pipes reads a file and de-compresses it into a buffer. The buffer in question contains an arrow Table. A component takes this buffer and returns a table with the data.
The code works fine, but there is a significant performance issue as the code needs to write the data into a file and then process it to read it with the Arrow file stream classes.
I've been looking through the documentation for a way to do this without writing the data to a file on disk. I was not able to make it work with a RecordBatchStreamReader or any of the alternatives I've found in the docs.
Would you happen to have any working examples showing how to avoid this disk write? Is this even possible?
Here's the code in question:
Component(s)
C++