Skip to content

Commit

Permalink
ARROW-7975: [C++] Preserve intended buffer size by default when writi…
Browse files Browse the repository at this point in the history
…ng to IPC format

I stumbled into this tangentially. While padding to a minimum of an 8-byte multiple is a requirement of the binary protocol, including the padding in the metadata causes buffer sizes to be modified. In some cases, the intention of the writer is for the receiver to receive padded buffers, but including the padding unconditionally leaves no possibility of recovering the buffer size prior to producing the IPC message. I think it would be better to give the writer the option of what to do. I opened ARROW-7976 about adding an option to make this behavior configurable.

Another possibility of course is to implement an option and have "include padding" turned on by default

To be clear in case there is concern, this change has no backward or forward compatibility implications.

Closes #6513 from wesm/ARROW-7975 and squashes the following commits:

ec8e37b <Wes McKinney> Add unit test to check that buffer sizes pass through IPC unmodified
ac8d8d9 <Wes McKinney> Do not include padding in Buffer IPC metadata

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
  • Loading branch information
wesm committed Mar 2, 2020
1 parent 9a805c1 commit 1156833
Show file tree
Hide file tree
Showing 2 changed files with 26 additions and 1 deletion.
25 changes: 25 additions & 0 deletions cpp/src/arrow/ipc/read_write_test.cc
Expand Up @@ -543,6 +543,31 @@ TEST_F(TestWriteRecordBatch, SliceTruncatesBuffers) {
CheckArray(a1);
}

TEST_F(TestWriteRecordBatch, RoundtripPreservesBufferSizes) {
// ARROW-7975
random::RandomArrayGenerator rg(/*seed=*/0);

int64_t length = 15;
auto arr = rg.String(length, 0, 10, 0.1);
auto batch = RecordBatch::Make(::arrow::schema({field("f0", utf8())}), length, {arr});

std::stringstream ss;
ss << "test-roundtrip-buffer-sizes";
ASSERT_OK_AND_ASSIGN(
mmap_, io::MemoryMapFixture::InitMemoryMap(/*buffer_size=*/1 << 20, ss.str()));
DictionaryMemo dictionary_memo;
std::shared_ptr<RecordBatch> result;
ASSERT_OK(DoStandardRoundTrip(*batch, &dictionary_memo, &result));

// Make sure that the validity bitmap is size 2 as expected
ASSERT_EQ(2, arr->data()->buffers[0]->size());

for (size_t i = 0; i < arr->data()->buffers.size(); ++i) {
ASSERT_EQ(arr->data()->buffers[i]->size(),
result->column(0)->data()->buffers[i]->size());
}
}

void TestGetRecordBatchSize(const IpcOptions& options,
std::shared_ptr<RecordBatch> batch) {
io::MockOutputStream mock;
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/arrow/ipc/writer.cc
Expand Up @@ -190,7 +190,7 @@ class RecordBatchSerializer : public ArrayVisitor {
padding = BitUtil::RoundUpToMultipleOf8(size) - size;
}

buffer_meta_.push_back({offset, size + padding});
buffer_meta_.push_back({offset, size});
offset += size + padding;
}

Expand Down

0 comments on commit 1156833

Please sign in to comment.