ARROW-7943: [C++][Parquet] Add code to generate rep/def levels for nested arrays

There will be follow-up code to integrate this with the
higher level writers.

This takes a slightly more OO approach than LevelBuilder in
writer.cc and also attempts to do more batching at each level
when possible. No benchmarks have been run yet.
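
For readers new to the terminology, the following is a minimal sketch of what repetition and definition levels look like for a single nullable list column. It is not the code added in path_internal.cc: `NullableList`, `Levels`, and `ComputeLevels` are hypothetical names, and the schema (an optional list of required int32 elements) is an assumption chosen for illustration. It processes one value at a time and does none of the batching mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical input type: each top-level value is either null
// (std::nullopt) or a list of required int32 elements.
using NullableList = std::optional<std::vector<int32_t>>;

// Hypothetical output type: one (def, rep) pair per emitted slot.
struct Levels {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
};

// Assumed schema:
//   optional group a (LIST) { repeated group list { required int32 element; } }
// Definition levels: 0 = list is null, 1 = list is empty, 2 = element present.
// Repetition levels: 0 = slot starts a new list, 1 = slot continues the list.
Levels ComputeLevels(const std::vector<NullableList>& column) {
  Levels out;
  for (const NullableList& list : column) {
    if (!list.has_value()) {
      // A null list still occupies one slot.
      out.def_levels.push_back(0);
      out.rep_levels.push_back(0);
      continue;
    }
    if (list->empty()) {
      // An empty list also occupies one slot, one level deeper than null.
      out.def_levels.push_back(1);
      out.rep_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < list->size(); ++i) {
      out.def_levels.push_back(2);
      out.rep_levels.push_back(static_cast<int16_t>(i == 0 ? 0 : 1));
    }
  }
  return out;
}
```

Under these assumptions, the column [[1, 2], null, [], [3]] yields definition levels 2, 2, 0, 1, 2 and repetition levels 0, 1, 0, 0, 0.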

There are likely a lot of typos given the hours that I've been
working on it (but hopefully no logic bugs).  I'm sorry.

Also allow TypedBufferBuilder/BufferBuilder to take an
initial ResizableBuffer to use so scratch can easily
be reused.

Closes #6490 from emkornfield/paths and squashes the following commits:

4c6892d <Wes McKinney> export symbols
23e6bfc <Wes McKinney> int16_t->int64_t
387adc8 <Wes McKinney> iwyu
3fc9e5c <Wes McKinney> Simplify schema statements in tests. Fix lint issues, typos
215f8a7 <emkornfield> fix another typo
30a0fbd <Micah Kornfield> fix typo
ad6f4bc <Micah Kornfield> Address code review comments.
5731b08 <emkornfield> Apply suggestions from code review
3cdf55c <emkornfield> Update cpp/src/parquet/arrow/path_internal.cc
bbabb6e <emkornfield> remove errant comment.
5ea8c07 <emkornfield> remove stale comments
5ac1d75 <Micah Kornfield> use next_range for tracking null count start
ec6bc27 <Micah Kornfield> remove stray comments
9e700cb <Micah Kornfield> fix lint
f0c0177 <Micah Kornfield> ARROW-7943:  Add code to generate levels for nested array

Lead-authored-by: Micah Kornfield <emkornfield@gmail.com>
Co-authored-by: emkornfield <emkornfield@gmail.com>
Co-authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
emkornfield and wesm committed Mar 7, 2020
1 parent f6a41a4 commit a5d267d
Showing 5 changed files with 1,547 additions and 0 deletions.
15 changes: 15 additions & 0 deletions cpp/src/arrow/buffer_builder.h
@@ -50,6 +50,17 @@ class ARROW_EXPORT BufferBuilder {
        capacity_(0),
        size_(0) {}

  /// \brief Constructs new Builder that will start using
  /// the provided buffer until Finish/Reset are called.
  /// The buffer is not resized.
  explicit BufferBuilder(std::shared_ptr<ResizableBuffer> buffer,
                         MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT)
      : buffer_(std::move(buffer)),
        pool_(pool),
        data_(buffer_->mutable_data()),
        capacity_(buffer_->capacity()),
        size_(buffer_->size()) {}

  /// \brief Resize the buffer to the nearest multiple of 64 bytes
  ///
  /// \param new_capacity the new capacity of the of the builder. Will be
@@ -187,6 +198,10 @@ class TypedBufferBuilder<
  explicit TypedBufferBuilder(MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT)
      : bytes_builder_(pool) {}

  explicit TypedBufferBuilder(std::shared_ptr<ResizableBuffer> buffer,
                              MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT)
      : bytes_builder_(std::move(buffer), pool) {}

  Status Append(T value) {
    return bytes_builder_.Append(reinterpret_cast<uint8_t*>(&value), sizeof(T));
  }
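
The scratch-reuse pattern these constructors enable can be sketched without Arrow itself. The block below is a simplified, hypothetical model: `ScratchBuffer`, `ToyBufferBuilder`, and `WriteLevels` are illustration-only names, and `std::vector<uint8_t>` stands in for `ResizableBuffer`. It mirrors only what the diff shows, namely that the builder adopts the caller's buffer without resizing it and starts from the buffer's current size and capacity.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::ResizableBuffer: a growable byte array.
using ScratchBuffer = std::vector<uint8_t>;

// Hypothetical, simplified model of a builder that can adopt an existing
// buffer. This is NOT arrow::BufferBuilder.
class ToyBufferBuilder {
 public:
  ToyBufferBuilder() : buffer_(std::make_shared<ScratchBuffer>()) {}

  // Adopt a caller-provided buffer. Nothing is resized here, and appends
  // continue after whatever bytes the buffer already holds.
  explicit ToyBufferBuilder(std::shared_ptr<ScratchBuffer> buffer)
      : buffer_(std::move(buffer)) {}

  // Copy `length` raw bytes onto the end of the buffer.
  void Append(const void* data, std::size_t length) {
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    buffer_->insert(buffer_->end(), bytes, bytes + length);
  }

  std::size_t size() const { return buffer_->size(); }
  std::size_t capacity() const { return buffer_->capacity(); }

  // Hand the buffer back to the caller and leave the builder empty.
  std::shared_ptr<ScratchBuffer> Finish() {
    std::shared_ptr<ScratchBuffer> out = std::move(buffer_);
    buffer_ = std::make_shared<ScratchBuffer>();
    return out;
  }

 private:
  std::shared_ptr<ScratchBuffer> buffer_;
};

// Append `count` copies of `value` and return the finished buffer.
std::shared_ptr<ScratchBuffer> WriteLevels(ToyBufferBuilder builder,
                                           int16_t value, int count) {
  for (int i = 0; i < count; ++i) {
    builder.Append(&value, sizeof(value));
  }
  return builder.Finish();
}
```

In this model a caller would reserve the scratch buffer once, pass it to a builder, and after `Finish()` clear it (which keeps the allocation) before handing it to the next builder, so repeated batches avoid reallocating.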
2 changes: 2 additions & 0 deletions cpp/src/parquet/CMakeLists.txt
@@ -185,6 +185,7 @@ add_custom_command(OUTPUT ${THRIFT_OUTPUT_FILES}
# Library config

set(PARQUET_SRCS
arrow/path_internal.cc
arrow/reader.cc
arrow/reader_internal.cc
arrow/schema.cc
@@ -367,6 +368,7 @@ add_parquet_test(arrow-test
SOURCES
arrow/arrow_reader_writer_test.cc
arrow/arrow_schema_test.cc
arrow/path_internal_test.cc
test_util.cc)

if(PARQUET_REQUIRE_ENCRYPTION)