ARROW-2998: [C++] Add unique_ptr versions of Allocate[Resizable]Buffer #2395
jbapple-cloudera wants to merge 5 commits into apache:master
wesm left a comment:
Thanks for adding this. Adding some basic unit tests would be a good idea. These could be written with TYPED_TEST to avoid code duplication in the test suite, if desired.
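A minimal sketch of the TYPED_TEST idea, written as a plain function template so it compiles without GoogleTest. ToyBuffer and ToyAllocate are hypothetical stand-ins, not Arrow's Buffer or AllocateBuffer. The shared_ptr overload is implemented on top of the unique_ptr one, which is one way to keep the two variants from drifting apart. With GoogleTest, the same body would sit in a TYPED_TEST over a type list containing both pointer types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Buffer.
struct ToyBuffer {
  std::vector<uint8_t> bytes;
};

// Hypothetical unique_ptr allocation function; returns false on a bad size.
bool ToyAllocate(int64_t size, std::unique_ptr<ToyBuffer>* out) {
  if (size < 0) return false;
  std::unique_ptr<ToyBuffer> buf(new ToyBuffer);
  buf->bytes.assign(static_cast<size_t>(size), 0);  // zero-filled contents
  *out = std::move(buf);
  return true;
}

// shared_ptr overload implemented in terms of the unique_ptr one.
bool ToyAllocate(int64_t size, std::shared_ptr<ToyBuffer>* out) {
  std::unique_ptr<ToyBuffer> buf;
  if (!ToyAllocate(size, &buf)) return false;
  *out = std::move(buf);  // a unique_ptr converts into a shared_ptr
  return true;
}

// One test body shared by both pointer types, as TYPED_TEST would do.
template <typename PtrType>
void CheckAllocate() {
  PtrType buf;
  const bool ok = ToyAllocate(100, &buf);
  assert(ok && buf != nullptr);
  assert(buf->bytes.size() == 100);
  assert(buf->bytes[99] == 0);
  const bool rejected = !ToyAllocate(-1, &buf);  // invalid size is refused
  assert(rejected);
  (void)ok;
  (void)rejected;  // silence unused-variable warnings under NDEBUG
}
```

Each instantiation, such as CheckAllocate<std::unique_ptr<ToyBuffer>>(), runs the same checks against one output pointer type.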
Review comment on cpp/src/arrow/buffer.cc (outdated):
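This could be templated, something like:

template <typename BufferPtr, typename OutPtr>
inline Status ReturnBufferSized(BufferPtr buffer, const int64_t size, OutPtr* out) {
  // Resize the freshly allocated buffer to the requested size
  RETURN_NOT_OK(buffer->Resize(size));
  // Zero the bytes between size and capacity so the padding is deterministic
  buffer->ZeroPadding();
  // Move into the output; this upcasts to Buffer, and also converts
  // unique_ptr to shared_ptr when OutPtr is std::shared_ptr<Buffer>
  *out = std::move(buffer);
  return Status::OK();
}

This could also be used to remove the code duplication that AllocateResizableBuffer currently has.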
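To check that a helper templated on the whole smart-pointer type covers both output variants, here is a self-contained sketch. ToyStatus, ToyBuffer and ToyResizableBuffer are hypothetical stand-ins for Arrow's Status, Buffer and resizable buffer, and the 64-byte padding is an assumption; only the overall shape of ReturnBufferSized follows the review suggestion. This sketch takes the source pointer by value, so callers pass a temporary or std::move an existing pointer, and the final move performs the upcast (plus the unique_ptr to shared_ptr conversion for a shared_ptr output).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status.
struct ToyStatus {
  bool ok;
  static ToyStatus OK() { return ToyStatus{true}; }
  static ToyStatus Invalid() { return ToyStatus{false}; }
};

// Hypothetical base buffer: padded storage plus a logical size.
struct ToyBuffer {
  virtual ~ToyBuffer() = default;
  std::vector<uint8_t> storage;
  int64_t size = 0;
};

// Hypothetical resizable buffer; capacity is rounded up to a 64-byte multiple.
struct ToyResizableBuffer : ToyBuffer {
  ToyStatus Resize(int64_t new_size) {
    if (new_size < 0) return ToyStatus::Invalid();
    const int64_t capacity = (new_size + 63) / 64 * 64;
    storage.resize(static_cast<size_t>(capacity), 0xAB);  // 0xAB mimics garbage
    size = new_size;
    return ToyStatus::OK();
  }
  // Zero the bytes between the logical size and the capacity.
  void ZeroPadding() {
    const size_t used = static_cast<size_t>(size);
    if (storage.size() > used) std::memset(&storage[used], 0, storage.size() - used);
  }
};

// Generic helper: BufferPtr and OutPtr may each be unique_ptr or shared_ptr.
template <typename BufferPtr, typename OutPtr>
ToyStatus ReturnBufferSized(BufferPtr buffer, int64_t size, OutPtr* out) {
  ToyStatus st = buffer->Resize(size);
  if (!st.ok) return st;  // leave *out untouched on failure
  buffer->ZeroPadding();
  *out = std::move(buffer);  // upcast, and unique_ptr -> shared_ptr if needed
  return ToyStatus::OK();
}
```

A single template body then serves both AllocateResizableBuffer variants, which is the duplication the comment is after.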

Hmm, why is this useful? This is basically duplicating the existing API. If we start applying this pattern everywhere, we'll end up maintaining two mostly similar APIs...

I don't think we should use …

@pitrou My rationale is that …

@wesm I don't see a thread-safe way to get a pointer out of a … (*) the output parameters at the end of the param list, usually.
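Assuming the lost inline code in this comment refers to taking ownership back out of a std::shared_ptr, the asymmetry can be checked with standard type traits: a std::unique_ptr can be moved into a std::shared_ptr, but std::shared_ptr has no release() and no conversion back to std::unique_ptr. Payload and Share are illustrative names only.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Illustrative payload type.
struct Payload {
  int value = 7;
};

// unique_ptr -> shared_ptr: moving hands sole ownership to a new control block.
std::shared_ptr<Payload> Share(std::unique_ptr<Payload> p) {
  return std::shared_ptr<Payload>(std::move(p));
}

// The forward conversion exists...
static_assert(std::is_constructible<std::shared_ptr<Payload>,
                                    std::unique_ptr<Payload>&&>::value,
              "unique_ptr can be moved into shared_ptr");
// ...but the standard library offers no way back.
static_assert(!std::is_constructible<std::unique_ptr<Payload>,
                                     std::shared_ptr<Payload>>::value,
              "shared_ptr cannot be turned into unique_ptr");
```

This is why a unique_ptr-returning allocator can serve shared_ptr callers, while the reverse is not possible.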

The runtime overhead is only when copying a … As for the fact that only one reference exists, that may not always be the case. For example, if you are asking for a 0-sized buffer, returning a shared 0-sized Buffer would be a valid optimization IMO.
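A sketch of the zero-size optimization described above, using a hypothetical AllocateShared function and ToyBuffer type (not Arrow's API). Every zero-size request receives a copy of one static empty buffer, which only a return type that shares ownership can express. Returning shared_ptr<const ToyBuffer> is a choice made here so callers cannot mutate the shared instance; each zero-size call costs a reference-count increment instead of an allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for arrow::Buffer.
struct ToyBuffer {
  std::vector<uint8_t> bytes;
};

// Hypothetical allocator: all zero-size requests share one immutable buffer.
std::shared_ptr<const ToyBuffer> AllocateShared(int64_t size) {
  static const std::shared_ptr<const ToyBuffer> kEmpty = std::make_shared<ToyBuffer>();
  if (size < 0) return nullptr;   // invalid request
  if (size == 0) return kEmpty;   // copy: bumps the count, no allocation
  auto buf = std::make_shared<ToyBuffer>();
  buf->bytes.resize(static_cast<size_t>(size));
  return buf;                     // converts to shared_ptr<const ToyBuffer>
}
```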

I think there's a small amount of overhead when the pointer is dereferenced. Since memory allocation is the lowest level of the stack, I'm fine with having …

Needs a rebase.

(Branch force-pushed from 100c7a3 to d8cbe2e.)
I doubt it's significantly different from …

@wesm In the libstdc++ that ships with GCC 5.4.0, there is no dereference overhead, but there is overhead in destruction, which must be thread-safe and therefore uses atomic operations. See https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rf-shared_ptr.
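The cost discussed here lives in the shared reference count. The sketch below, with a hypothetical Tracked type that counts live instances, shows which operations touch that count: copying a std::shared_ptr increments it (atomically in typical multithreaded builds), moving one transfers ownership without changing it, and the last owner's destruction decrements it and destroys the object. A std::unique_ptr carries no count at all.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type that tracks how many instances are alive.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Copies a shared_ptr once and moves it once; returns the resulting use_count.
long CopyThenMove(const std::shared_ptr<Tracked>& original) {
  std::shared_ptr<Tracked> copy = original;          // count + 1
  std::shared_ptr<Tracked> moved = std::move(copy);  // count unchanged
  return moved.use_count();                          // original + moved
}
```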

I'm familiar with the guidelines. What I'm saying is that this project is oriented largely around hierarchical shared memory references (e.g. to memory maps, POSIX shared memory, payloads coming over the wire), which explains our general preference for using …
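To illustrate a hierarchical shared reference, the sketch below uses a hypothetical ToyBuffer that either owns its bytes or views a range of a parent buffer while holding a std::shared_ptr to that parent. Because the slice shares ownership, the parent's memory stays valid after the caller drops its own handle and is freed only when the last slice goes away, which a std::unique_ptr could not express. This illustrates the pattern; it is not Arrow's Buffer class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical buffer: owns its bytes, or views a range of a parent buffer.
class ToyBuffer {
 public:
  explicit ToyBuffer(std::vector<uint8_t> bytes)
      : owned_(std::move(bytes)), data_(owned_.data()), size_(owned_.size()) {}
  ToyBuffer(std::shared_ptr<const ToyBuffer> parent, size_t offset, size_t length)
      : parent_(std::move(parent)), data_(parent_->data() + offset), size_(length) {}

  const uint8_t* data() const { return data_; }
  size_t size() const { return size_; }

 private:
  std::shared_ptr<const ToyBuffer> parent_;  // keeps the parent's memory alive
  std::vector<uint8_t> owned_;               // empty for slices
  const uint8_t* data_;
  size_t size_;
};

// Create a zero-copy view of [offset, offset + length) within parent.
std::shared_ptr<ToyBuffer> Slice(const std::shared_ptr<const ToyBuffer>& parent,
                                 size_t offset, size_t length) {
  assert(offset + length <= parent->size());
  return std::make_shared<ToyBuffer>(parent, offset, length);
}
```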

I think that API consistency trumps micro-optimization here. If we discover a hot internal code path where the …

I originally suggested that Jim submit this patch in apache/parquet-cpp#432.

Ah, right, parquet-cpp is using different internal conventions :-/ How do you plan to deal with that if we merge the codebases together? Would we migrate parquet-cpp to the same conventions as the Arrow C++ codebase?

The use case there was a private buffer that would not be exported outside the scope of the Bloom filter. I think it is OK for components to use …

In the case of a lot of the rest of Arrow, e.g. the columnar data structures, the memory could be shared or reused in many cases, so we need to use …
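A sketch of the private-buffer case: a component whose memory never leaves it, so a single owner suffices. ToyBloomFilter is hypothetical (not the parquet-cpp Bloom filter) and uses a std::unique_ptr<uint8_t[]> bit array in place of the std::unique_ptr<Buffer> that a unique_ptr AllocateBuffer variant would return. The double-hashing scheme over std::hash is illustrative only. Because the array is never handed out, no reference count is needed, and the filter is move-only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical Bloom filter that exclusively owns its bit array.
class ToyBloomFilter {
 public:
  explicit ToyBloomFilter(size_t num_bits)
      : num_bits_(num_bits), bits_(new uint8_t[(num_bits + 7) / 8]()) {}  // zeroed

  void Insert(uint64_t key) {
    for (int i = 0; i < kNumHashes; ++i) SetBit(Hash(key, i));
  }
  // False means definitely absent; true means possibly present.
  bool MayContain(uint64_t key) const {
    for (int i = 0; i < kNumHashes; ++i) {
      if (!GetBit(Hash(key, i))) return false;
    }
    return true;
  }

 private:
  static constexpr int kNumHashes = 3;

  // Double hashing: h1 + i * h2, with h2 forced odd.
  size_t Hash(uint64_t key, int i) const {
    const size_t h1 = std::hash<uint64_t>()(key);
    const size_t h2 = std::hash<uint64_t>()(key ^ 0x9E3779B97F4A7C15ULL) | 1;
    return (h1 + static_cast<size_t>(i) * h2) % num_bits_;
  }
  void SetBit(size_t b) { bits_[b / 8] |= static_cast<uint8_t>(1u << (b % 8)); }
  bool GetBit(size_t b) const { return ((bits_[b / 8] >> (b % 8)) & 1u) != 0; }

  size_t num_bits_;
  std::unique_ptr<uint8_t[]> bits_;  // sole owner; freed with the filter
};
```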

It's OK, but is it worth adding to our API surface? You'd be saving a tiny bit of memory and a tiny bit of overhead.

I'm in favor in this limited case. We have plenty of other APIs that could return both kinds of pointers, e.g. https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L48. I wouldn't be in favor of adding both variants of the functions in such cases. I agree that there is the slippery slope possibility of trying to "be all things to all people", but memory allocation is about as close to the metal as we get. I would rather see people reuse these abstractions (particularly since we deal with jemalloc interop, padding/alignment, and other issues) than roll their own.
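As an illustration of the padding bookkeeping that a shared allocator centralizes, here is a sketch assuming a 64-byte padding multiple (the alignment and padding the Arrow columnar format recommends). kPadding, RoundUpToPadding and PaddingBytes are hypothetical names, not Arrow identifiers. Rounding with a mask works only because the multiple is a power of two, which the static_assert checks.

```cpp
#include <cassert>
#include <cstdint>

// Assumed padding multiple; must be a power of two for the mask trick.
constexpr int64_t kPadding = 64;
static_assert((kPadding & (kPadding - 1)) == 0, "kPadding must be a power of two");

// Round a non-negative size up to the next multiple of kPadding.
constexpr int64_t RoundUpToPadding(int64_t n) {
  return (n + kPadding - 1) & ~(kPadding - 1);
}

// Number of trailing bytes an allocator would need to zero for a buffer of size n.
constexpr int64_t PaddingBytes(int64_t n) { return RoundUpToPadding(n) - n; }
```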

Maybe we should use a similar pattern directly in …

Rebased. The flaky build (a package manager timeout) should pass now.

+1. Per the above discussion, I think we should be conservative going forward about adding too many duplicate unique_ptr/shared_ptr APIs.

OK with me.
Codecov Report
@@ Coverage Diff @@
## master #2395 +/- ##
==========================================
+ Coverage 84.8% 86.89% +2.09%
==========================================
Files 296 237 -59
Lines 45641 42706 -2935
==========================================
- Hits 38705 37110 -1595
+ Misses 6891 5596 -1295
+ Partials 45 0 -45

thanks @jbapple-cloudera!
This could be improved in a couple of ways:

Remove duplication. I didn't do this yet because there is already duplication in buffer.cc and I wanted some feedback before proceeding.

Add tests. I didn't do this yet because testing of the existing shared_ptr AllocateBuffer functions is quite slim, so I wanted some feedback before proceeding.