Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-530: [C++/Python] Provide subpools for better memory allocation … #2057

Closed
wants to merge 9 commits into from

Conversation

@rok
Copy link
Contributor

commented May 17, 2018

…tracking

Currently we can only track the amount of bytes allocated by the main memory pool or the alternative jemalloc implementation. To better understand certain situation, we should provide a MemoryPool proxy implementation that tracks only the amount of memory that was made through its direct calls but delegates the actual allocation to an underlying pool.

Authors: Rok Mihevc rok@mihevc.org and Alex Hagerman alex@unexpectedeof.net

Usage example:

import pyarrow as pa

def report(pool, proxy_pool):
print("Total: ", pa.total_allocated_bytes())
print("Default pool: ", pool.bytes_allocated())
print("Proxy allocated: ", proxy_pool.proxy_bytes_allocated())

pool = pa.default_memory_pool()
proxy_pool = pa.ProxyMemoryPool(pool)

report(pool, proxy_pool)
a1 = pa.array([0]*1000, memory_pool=pool)
report(pool, proxy_pool)
a2 = pa.array([0]*1, memory_pool=proxy_pool)
report(pool, proxy_pool)
a3 = pa.array([0]*1000, memory_pool=proxy_pool)
report(pool, proxy_pool)
a3 = pa.array([0]*1000, memory_pool=proxy_pool)
report(pool, proxy_pool)

Result:

Total: 0
Default pool: 0
bytes_allocated: 0
Proxy allocated: 0
Total: 8128
Default pool: 8128
bytes_allocated: 0
Proxy allocated: 0
Allocate: size = 64
Allocate: size = 256
Reallocate: old_size = 256 - new_size = 64 - proxy_allocated - 128
Total: 8256
Default pool: 8256
bytes_allocated: 128
Proxy allocated: 128
Allocate: size = 128
Allocate: size = 8192
Reallocate: old_size = 8192 - new_size = 8000 - proxy_allocated - 8256
Total: 16384
Default pool: 16384
bytes_allocated: 8256
Proxy allocated: 8256
Allocate: size = 128
Allocate: size = 8192
Reallocate: old_size = 8192 - new_size = 8000 - proxy_allocated - 16384
Free: size = 128
Free: size = 8000
Total: 16384
Default pool: 16384
bytes_allocated: 8256
Proxy allocated: 8256

@xhochy
Copy link
Member

left a comment

Implementation-wise this looks good. Also for me it seems as this pool will already tracks only the allocations made for it. To complete this PR, please add some C++ unit tests that check the implementation.

Also have a look at Travis which reports some linting errors.

Status ProxyMemoryPool::Allocate(int64_t size, uint8_t** out) {
proxy_bytes_allocated_ += size;
Status s = pool_->Allocate(size, out);
std::cout << "Allocate: size = " << size << std::endl;

This comment has been minimized.

Copy link
@xhochy

xhochy May 17, 2018

Member

Please remove the calls to std::cout. If the user wants this, he should explicitly chose the LoggingMemoryPool.

@xhochy

This comment has been minimized.

Copy link
Member

commented May 19, 2018

@rok Code and tests looks good. Could you fix the linting errors mentioned in Travis?

@rok

This comment has been minimized.

Copy link
Contributor Author

commented May 19, 2018

@xhochy - will do, traveling at the moment, better availability from Monday on.

rok added some commits May 19, 2018

@codecov-io

This comment has been minimized.

Copy link

commented May 20, 2018

Codecov Report

Merging #2057 into master will decrease coverage by 1.08%.
The diff coverage is 68.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2057      +/-   ##
==========================================
- Coverage   87.44%   86.35%   -1.09%     
==========================================
  Files         189      230      +41     
  Lines       29376    40513   +11137     
==========================================
+ Hits        25687    34987    +9300     
- Misses       3689     5526    +1837
Impacted Files Coverage Δ
python/pyarrow/__init__.py 57.14% <ø> (ø)
python/pyarrow/memory.pxi 34.48% <0%> (ø)
cpp/src/arrow/memory_pool.h 100% <100%> (ø) ⬆️
cpp/src/arrow/memory_pool-test.cc 100% <100%> (ø) ⬆️
cpp/src/arrow/memory_pool.cc 65.09% <60.86%> (-1.18%) ⬇️
cpp/src/arrow/status.cc 44.64% <0%> (-1.79%) ⬇️
cpp/src/arrow/util/bit-util.h 98.01% <0%> (-1.23%) ⬇️
cpp/src/arrow/io/hdfs-internal.cc 33.62% <0%> (-0.45%) ⬇️
cpp/src/arrow/compare.cc 89.97% <0%> (-0.2%) ⬇️
cpp/src/arrow/ipc/json-internal.cc 89.79% <0%> (-0.12%) ⬇️
... and 103 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df20683...eabd8a4. Read the comment docs.

@rok rok changed the title ARROW-530: C++/Python: Provide subpools for better memory allocation … ARROW-530 [C++/Python]: Provide subpools for better memory allocation … May 20, 2018

@rok rok changed the title ARROW-530 [C++/Python]: Provide subpools for better memory allocation … ARROW-530: [C++/Python] Provide subpools for better memory allocation … May 20, 2018

proxy_bytes_allocated_ -= size;
}

int64_t ProxyMemoryPool::bytes_allocated() const {

This comment has been minimized.

Copy link
@xhochy

xhochy May 21, 2018

Member

Just realised that bytes_allocated refers to the parent Pool. I would have expected that it would return only the allocation of the currrent scope (i.e. proxy_bytes_allocated_).

@rok

This comment has been minimized.

Copy link
Contributor Author

commented May 22, 2018

@xhochy - agreed that is a more intuitive interface. Now the ProxyMemoryPool object should return bytes allocated through it with bytes_allocated() method.
By the way how do you feel about the naming here? ProxyMemoryPool might not be the best name.


int64_t ProxyMemoryPool::bytes_allocated() const {
int64_t nb_bytes = bytes_allocated_;
return nb_bytes;

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

Use num or number here instead of nb


int64_t ProxyMemoryPool::max_memory() const {
int64_t mem = pool_->max_memory();
return mem;

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

return pool_->max_memory(); ?


Status ProxyMemoryPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
Status s = pool_->Reallocate(old_size, new_size, ptr);
bytes_allocated_ += new_size - old_size;

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

If pool_->Reallocate fails, bytes_allocated_ will be incorrect


Status ProxyMemoryPool::Allocate(int64_t size, uint8_t** out) {
Status s = pool_->Allocate(size, out);
bytes_allocated_ += size;

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

If pool_->Allocate fails, bytes_allocated_ will be incorrect

@@ -86,6 +86,25 @@ class ARROW_EXPORT LoggingMemoryPool : public MemoryPool {
MemoryPool* pool_;
};

class ARROW_EXPORT ProxyMemoryPool : public MemoryPool {

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

Can we consider some alternative names than "Proxy"? Can you add a doxygen comment describing what this class is?

This comment has been minimized.

Copy link
@rok

rok Jun 2, 2018

Author Contributor

How about SubMemoryPool? MemorySubPool? PartialMemoryPool?

class ARROW_EXPORT ProxyMemoryPool : public MemoryPool {
public:
explicit ProxyMemoryPool(MemoryPool* pool);
~ProxyMemoryPool() override = default;

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

This dtor isn't needed


private:
MemoryPool* pool_;
int64_t bytes_allocated_ = 0;

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

Atomic?

@@ -44,6 +44,15 @@ cdef class LoggingMemoryPool(MemoryPool):
self.init(self.logging_pool.get())


cdef class ProxyMemoryPool(MemoryPool):

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

Docstring

This comment has been minimized.

Copy link
@rok

rok Jun 2, 2018

Author Contributor

Other classes are not documented in memory.pxi. Is there another location for documentation I'm missing?

This comment has been minimized.

Copy link
@xhochy

xhochy Jun 3, 2018

Member

No, this is the main location where we should document things. We were not so good with adding docstrings in the past. I'll probably make a pass over the old classes soon and add some documentation but we should start adding documentation to new classes once we add them.

This comment has been minimized.

Copy link
@rok

rok Jun 3, 2018

Author Contributor

Got it. Would be happy to help with the old classes as well.

This comment has been minimized.

Copy link
@xhochy

xhochy Jun 4, 2018

Member

Would be happy to help with the old classes as well.

That would be very much appreciated!

@@ -58,6 +67,8 @@ def set_memory_pool(MemoryPool pool):
cdef MemoryPool _default_memory_pool = default_memory_pool()
cdef LoggingMemoryPool _logging_memory_pool = (
LoggingMemoryPool(_default_memory_pool))
cdef ProxyMemoryPool _proxy_memory_pool = (
ProxyMemoryPool(_default_memory_pool))

This comment has been minimized.

Copy link
@wesm

wesm May 22, 2018

Member

What's the purpose of this?

@xhochy

This comment has been minimized.

Copy link
Member

commented Jun 3, 2018

By the way how do you feel about the naming here? ProxyMemoryPool might not be the best name.

Cannot think of anything better at the moment, I think it's ok.

@xhochy

This comment has been minimized.

Copy link
Member

commented Jun 4, 2018

Looks good except for a small formatting issue: https://travis-ci.org/apache/arrow/jobs/387368172#L1227

Would merge on a green build.

rok added some commits Jun 4, 2018

@rok

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2018

It seems tests in Travis failed due to timeouts: https://travis-ci.org/apache/arrow/jobs/388699104#L4333, https://travis-ci.org/apache/arrow/jobs/388699106#L523.
Is there a way to rerun them manually?

@xhochy

xhochy approved these changes Jun 7, 2018

Copy link
Member

left a comment

+1, LGTM

Travis tests passed after I retriggered them.

@xhochy xhochy closed this in e82a34a Jun 7, 2018

@xhochy

This comment has been minimized.

Copy link
Member

commented Jun 7, 2018

Thanks @rok . I assigned the jira to rokm, is that your user on https://issues.apache.org/jira ?

@rok

This comment has been minimized.

Copy link
Contributor Author

commented Jun 8, 2018

@xhochy - yeah, that's me, thanks! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.