Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-530: [C++/Python] Provide subpools for better memory allocation … #2057

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
21 changes: 21 additions & 0 deletions cpp/src/arrow/memory_pool-test.cc
Expand Up @@ -91,4 +91,25 @@ TEST(LoggingMemoryPool, Logging) {

ASSERT_EQ(200, pool->max_memory());
}

TEST(ProxyMemoryPool, Logging) {
MemoryPool* pool = default_memory_pool();

ProxyMemoryPool pp = ProxyMemoryPool(pool);

uint8_t* data;
ASSERT_OK(pool->Allocate(100, &data));

uint8_t* data2;
ASSERT_OK(pp.Allocate(300, &data2));

ASSERT_EQ(400, pool->bytes_allocated());
ASSERT_EQ(300, pp.bytes_allocated());

pool->Free(data, 100);
pp.Free(data2, 300);

ASSERT_EQ(0, pool->bytes_allocated());
ASSERT_EQ(0, pp.bytes_allocated());
}
} // namespace arrow
29 changes: 29 additions & 0 deletions cpp/src/arrow/memory_pool.cc
Expand Up @@ -201,4 +201,33 @@ int64_t LoggingMemoryPool::max_memory() const {
std::cout << "max_memory: " << mem << std::endl;
return mem;
}

ProxyMemoryPool::ProxyMemoryPool(MemoryPool* pool) : pool_(pool) {}

Status ProxyMemoryPool::Allocate(int64_t size, uint8_t** out) {
Status s = pool_->Allocate(size, out);
bytes_allocated_ += size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pool_->Allocate fails, bytes_allocated_ will be incorrect

return s;
}

Status ProxyMemoryPool::Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
Status s = pool_->Reallocate(old_size, new_size, ptr);
bytes_allocated_ += new_size - old_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pool_->Reallocate fails, bytes_allocated_ will be incorrect

return s;
}

void ProxyMemoryPool::Free(uint8_t* buffer, int64_t size) {
pool_->Free(buffer, size);
bytes_allocated_ -= size;
}

int64_t ProxyMemoryPool::bytes_allocated() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised that bytes_allocated refers to the parent Pool. I would have expected that it would return only the allocation of the currrent scope (i.e. proxy_bytes_allocated_).

int64_t nb_bytes = bytes_allocated_;
return nb_bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use num or number here instead of nb

}

int64_t ProxyMemoryPool::max_memory() const {
int64_t mem = pool_->max_memory();
return mem;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return pool_->max_memory(); ?

}
} // namespace arrow
19 changes: 19 additions & 0 deletions cpp/src/arrow/memory_pool.h
Expand Up @@ -86,6 +86,25 @@ class ARROW_EXPORT LoggingMemoryPool : public MemoryPool {
MemoryPool* pool_;
};

class ARROW_EXPORT ProxyMemoryPool : public MemoryPool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider some alternative names than "Proxy"? Can you add a doxygen comment describing what this class is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about SubMemoryPool? MemorySubPool? PartialMemoryPool?

public:
explicit ProxyMemoryPool(MemoryPool* pool);
~ProxyMemoryPool() override = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dtor isn't needed


Status Allocate(int64_t size, uint8_t** out) override;
Status Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) override;

void Free(uint8_t* buffer, int64_t size) override;

int64_t bytes_allocated() const override;

int64_t max_memory() const override;

private:
MemoryPool* pool_;
int64_t bytes_allocated_ = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atomic?

};

ARROW_EXPORT MemoryPool* default_memory_pool();

#ifdef ARROW_NO_DEFAULT_MEMORY_POOL
Expand Down
2 changes: 1 addition & 1 deletion python/pyarrow/__init__.py
Expand Up @@ -90,7 +90,7 @@ def parse_version(root):
from pyarrow.lib import (Buffer, ResizableBuffer, foreign_buffer, py_buffer,
compress, decompress, allocate_buffer)

from pyarrow.lib import (MemoryPool, total_allocated_bytes,
from pyarrow.lib import (MemoryPool, ProxyMemoryPool, total_allocated_bytes,
set_memory_pool, default_memory_pool,
log_memory_allocations)

Expand Down
4 changes: 4 additions & 0 deletions python/pyarrow/includes/libarrow.pxd
Expand Up @@ -192,6 +192,10 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil:
cdef cppclass CLoggingMemoryPool" arrow::LoggingMemoryPool"(CMemoryPool):
CLoggingMemoryPool(CMemoryPool*)

cdef cppclass CProxyMemoryPool" arrow::ProxyMemoryPool"(CMemoryPool):
CProxyMemoryPool(CMemoryPool*)
int64_t bytes_allocated()

cdef cppclass CBuffer" arrow::Buffer":
CBuffer(const uint8_t* data, int64_t size)
const uint8_t* data()
Expand Down
11 changes: 11 additions & 0 deletions python/pyarrow/memory.pxi
Expand Up @@ -44,6 +44,15 @@ cdef class LoggingMemoryPool(MemoryPool):
self.init(self.logging_pool.get())


cdef class ProxyMemoryPool(MemoryPool):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other classes are not documented in memory.pxi. Is there another location for documentation I'm missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is the main location where we should document things. We were not so good with adding docstrings in the past. I'll probably make a pass over the old classes soon and add some documentation but we should start adding documentation to new classes once we add them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Would be happy to help with the old classes as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be happy to help with the old classes as well.

That would be very much appreciated!

cdef:
unique_ptr[CProxyMemoryPool] proxy_pool

def __cinit__(self, MemoryPool pool):
self.proxy_pool.reset(new CProxyMemoryPool(pool.pool))
self.init(self.proxy_pool.get())


def default_memory_pool():
cdef:
MemoryPool pool = MemoryPool()
Expand All @@ -58,6 +67,8 @@ def set_memory_pool(MemoryPool pool):
cdef MemoryPool _default_memory_pool = default_memory_pool()
cdef LoggingMemoryPool _logging_memory_pool = (
LoggingMemoryPool(_default_memory_pool))
cdef ProxyMemoryPool _proxy_memory_pool = (
ProxyMemoryPool(_default_memory_pool))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this?



def log_memory_allocations(enable=True):
Expand Down