New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-530: [C++/Python] Provide subpools for better memory allocation … #2057
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation-wise this looks good. Also for me it seems as this pool will already tracks only the allocations made for it. To complete this PR, please add some C++ unit tests that check the implementation.
Also have a look at Travis which reports some linting errors.
cpp/src/arrow/memory_pool.cc
Outdated
Status ProxyMemoryPool::Allocate(int64_t size, uint8_t** out) { | ||
proxy_bytes_allocated_ += size; | ||
Status s = pool_->Allocate(size, out); | ||
std::cout << "Allocate: size = " << size << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the calls to std::cout
. If the user wants this, he should explicitly chose the LoggingMemoryPool
.
@rok Code and tests looks good. Could you fix the linting errors mentioned in Travis? |
@xhochy - will do, traveling at the moment, better availability from Monday on. |
Codecov Report
@@ Coverage Diff @@
## master #2057 +/- ##
==========================================
- Coverage 87.44% 86.35% -1.09%
==========================================
Files 189 230 +41
Lines 29376 40513 +11137
==========================================
+ Hits 25687 34987 +9300
- Misses 3689 5526 +1837
Continue to review full report at Codecov.
|
cpp/src/arrow/memory_pool.cc
Outdated
proxy_bytes_allocated_ -= size; | ||
} | ||
|
||
int64_t ProxyMemoryPool::bytes_allocated() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realised that bytes_allocated
refers to the parent Pool. I would have expected that it would return only the allocation of the currrent scope (i.e. proxy_bytes_allocated_
).
@xhochy - agreed that is a more intuitive interface. Now the ProxyMemoryPool object should return bytes allocated through it with bytes_allocated() method. |
cpp/src/arrow/memory_pool.cc
Outdated
|
||
int64_t ProxyMemoryPool::bytes_allocated() const { | ||
int64_t nb_bytes = bytes_allocated_; | ||
return nb_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use num
or number
here instead of nb
cpp/src/arrow/memory_pool.cc
Outdated
|
||
int64_t ProxyMemoryPool::max_memory() const { | ||
int64_t mem = pool_->max_memory(); | ||
return mem; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return pool_->max_memory();
?
cpp/src/arrow/memory_pool.cc
Outdated
|
||
Status ProxyMemoryPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) { | ||
Status s = pool_->Reallocate(old_size, new_size, ptr); | ||
bytes_allocated_ += new_size - old_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pool_->Reallocate
fails, bytes_allocated_
will be incorrect
cpp/src/arrow/memory_pool.cc
Outdated
|
||
Status ProxyMemoryPool::Allocate(int64_t size, uint8_t** out) { | ||
Status s = pool_->Allocate(size, out); | ||
bytes_allocated_ += size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pool_->Allocate
fails, bytes_allocated_
will be incorrect
cpp/src/arrow/memory_pool.h
Outdated
@@ -86,6 +86,25 @@ class ARROW_EXPORT LoggingMemoryPool : public MemoryPool { | |||
MemoryPool* pool_; | |||
}; | |||
|
|||
class ARROW_EXPORT ProxyMemoryPool : public MemoryPool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consider some alternative names than "Proxy"? Can you add a doxygen comment describing what this class is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about SubMemoryPool? MemorySubPool? PartialMemoryPool?
cpp/src/arrow/memory_pool.h
Outdated
class ARROW_EXPORT ProxyMemoryPool : public MemoryPool { | ||
public: | ||
explicit ProxyMemoryPool(MemoryPool* pool); | ||
~ProxyMemoryPool() override = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dtor isn't needed
cpp/src/arrow/memory_pool.h
Outdated
|
||
private: | ||
MemoryPool* pool_; | ||
int64_t bytes_allocated_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Atomic?
@@ -44,6 +44,15 @@ cdef class LoggingMemoryPool(MemoryPool): | |||
self.init(self.logging_pool.get()) | |||
|
|||
|
|||
cdef class ProxyMemoryPool(MemoryPool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other classes are not documented in memory.pxi. Is there another location for documentation I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is the main location where we should document things. We were not so good with adding docstrings in the past. I'll probably make a pass over the old classes soon and add some documentation but we should start adding documentation to new classes once we add them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Would be happy to help with the old classes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be happy to help with the old classes as well.
That would be very much appreciated!
python/pyarrow/memory.pxi
Outdated
@@ -58,6 +67,8 @@ def set_memory_pool(MemoryPool pool): | |||
cdef MemoryPool _default_memory_pool = default_memory_pool() | |||
cdef LoggingMemoryPool _logging_memory_pool = ( | |||
LoggingMemoryPool(_default_memory_pool)) | |||
cdef ProxyMemoryPool _proxy_memory_pool = ( | |||
ProxyMemoryPool(_default_memory_pool)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this?
Cannot think of anything better at the moment, I think it's ok. |
Looks good except for a small formatting issue: https://travis-ci.org/apache/arrow/jobs/387368172#L1227 Would merge on a green build. |
It seems tests in Travis failed due to timeouts: https://travis-ci.org/apache/arrow/jobs/388699104#L4333, https://travis-ci.org/apache/arrow/jobs/388699106#L523. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
Travis tests passed after I retriggered them.
Thanks @rok . I assigned the jira to |
@xhochy - yeah, that's me, thanks! :) |
…tracking
Currently we can only track the amount of bytes allocated by the main memory pool or the alternative jemalloc implementation. To better understand certain situation, we should provide a MemoryPool proxy implementation that tracks only the amount of memory that was made through its direct calls but delegates the actual allocation to an underlying pool.
Authors: Rok Mihevc rok@mihevc.org and Alex Hagerman alex@unexpectedeof.net
Usage example:
import pyarrow as pa
def report(pool, proxy_pool):
print("Total: ", pa.total_allocated_bytes())
print("Default pool: ", pool.bytes_allocated())
print("Proxy allocated: ", proxy_pool.proxy_bytes_allocated())
pool = pa.default_memory_pool()
proxy_pool = pa.ProxyMemoryPool(pool)
report(pool, proxy_pool)
a1 = pa.array([0]*1000, memory_pool=pool)
report(pool, proxy_pool)
a2 = pa.array([0]*1, memory_pool=proxy_pool)
report(pool, proxy_pool)
a3 = pa.array([0]*1000, memory_pool=proxy_pool)
report(pool, proxy_pool)
a3 = pa.array([0]*1000, memory_pool=proxy_pool)
report(pool, proxy_pool)
Result:
Total: 0
Default pool: 0
bytes_allocated: 0
Proxy allocated: 0
Total: 8128
Default pool: 8128
bytes_allocated: 0
Proxy allocated: 0
Allocate: size = 64
Allocate: size = 256
Reallocate: old_size = 256 - new_size = 64 - proxy_allocated - 128
Total: 8256
Default pool: 8256
bytes_allocated: 128
Proxy allocated: 128
Allocate: size = 128
Allocate: size = 8192
Reallocate: old_size = 8192 - new_size = 8000 - proxy_allocated - 8256
Total: 16384
Default pool: 16384
bytes_allocated: 8256
Proxy allocated: 8256
Allocate: size = 128
Allocate: size = 8192
Reallocate: old_size = 8192 - new_size = 8000 - proxy_allocated - 16384
Free: size = 128
Free: size = 8000
Total: 16384
Default pool: 16384
bytes_allocated: 8256
Proxy allocated: 8256