Add roaring_bitmap_memory_size_in_bytes(), with C++ interfaces #409

Open
wants to merge 1 commit into
base: master
8 changes: 7 additions & 1 deletion cpp/roaring.hh
@@ -529,7 +529,6 @@ public:
/**
* Read a bitmap from a serialized version, reading no more than maxbytes
* bytes. This is meant to be compatible with the Java and Go versions.
*
*/
static Roaring readSafe(const char *buf, size_t maxbytes) {
roaring_bitmap_t * r =
@@ -540,6 +539,13 @@ public:
return Roaring(r);
}

/**
* Bytes of memory used by this bitmap.
*/
size_t getMemorySizeInBytes() const {
return api::roaring_bitmap_memory_size_in_bytes(&roaring);
}

/**
* How many bytes are required to serialize this bitmap (meant to be
* compatible with Java and Go versions)
20 changes: 20 additions & 0 deletions cpp/roaring64map.hh
@@ -1051,6 +1051,26 @@ public:
return result;
}

    /**
     * Return the number of bytes of memory used by this bitmap.
     */
    size_t getMemorySizeInBytes() const {
        // Figuring out how much memory is used by a std::map is guesswork.
        // A common red/black tree implementation has 3 pointers plus 2 ints
        // per element, plus the size of the pair. The size of the Roaring
        // struct is already included in each entry's getMemorySizeInBytes(),
        // so subtract it here to avoid counting it twice.
        constexpr size_t perEntry =
            3 * sizeof(void*) + 2 * sizeof(int) +
            sizeof(std::pair<uint32_t, Roaring>) - sizeof(Roaring);
Contributor

I agree that counting bytes is hard. I wonder whether this calculation should also account for the allocator's per-allocation header (the "new" header). Since map nodes are relatively small, the header added by the storage allocator could be significant. I would imagine it is at least the size of a couple of pointers, but I don't actually know.

Contributor Author

Yes, deciding exactly what to count is not always simple. Do you count the header? Do you count all the allocated bytes rather than just the "usable" bytes (allocators often hand back a chunk of a set size rather than the exact number of bytes requested)? These values are essentially not countable, at least not in a portable way (some systems have malloc_usable_size(), which reports the usable size of an allocated block). I don't know that the header size is of much interest in general, although if you are allocating tons of small things it could be non-trivial. In that situation you would probably move to something like jemalloc, which uses a very different allocation model from the "traditional" one of allocating n+X bytes and returning a pointer into the buffer.
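
Part of the guesswork discussed in this thread can be measured rather than estimated. The following is a minimal sketch, assuming a C++11 standard library, that plugs a counting allocator into std::map and reports how many bytes the map requests per node. `AllocationTally`, `CountingAllocator`, and `measured_bytes_per_map_node` are hypothetical names that are not part of CRoaring or this PR. The sketch only sees the sizes the map asks for; any allocator header or size-class rounding stays invisible, which is exactly the part described above as not portably countable.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical helper (not part of CRoaring): running total of what a
// container has requested from its allocator.
struct AllocationTally {
    std::size_t bytes = 0;   // bytes currently requested
    std::size_t blocks = 0;  // live allocations
};

// Minimal C++11 allocator that forwards to operator new/delete and records
// every request in an AllocationTally.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    AllocationTally *tally;

    explicit CountingAllocator(AllocationTally *t) noexcept : tally(t) {}

    // std::map rebinds the allocator to its internal node type; this
    // converting constructor keeps the same tally across the rebind.
    template <typename U>
    CountingAllocator(const CountingAllocator<U> &other) noexcept
        : tally(other.tally) {}

    T *allocate(std::size_t n) {
        tally->bytes += n * sizeof(T);
        tally->blocks += 1;
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }

    void deallocate(T *p, std::size_t n) noexcept {
        tally->bytes -= n * sizeof(T);
        tally->blocks -= 1;
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &a,
                const CountingAllocator<U> &b) noexcept {
    return a.tally == b.tally;
}

template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &a,
                const CountingAllocator<U> &b) noexcept {
    return !(a == b);
}

// Average number of bytes std::map requests per allocation while holding
// `entries` elements of type Payload keyed by uint32_t. Returns 0 if the
// map made no allocations.
template <typename Payload>
std::size_t measured_bytes_per_map_node(std::uint32_t entries) {
    using Entry = std::pair<const std::uint32_t, Payload>;
    using Alloc = CountingAllocator<Entry>;
    using Map =
        std::map<std::uint32_t, Payload, std::less<std::uint32_t>, Alloc>;

    AllocationTally tally;
    Alloc alloc(&tally);
    std::less<std::uint32_t> cmp;
    Map m(cmp, alloc);
    for (std::uint32_t i = 0; i < entries; ++i) {
        m.emplace(i, Payload());
    }
    return tally.blocks == 0 ? 0 : tally.bytes / tally.blocks;
}
```

Comparing `measured_bytes_per_map_node<Roaring>(n)` with the `perEntry` heuristic plus `sizeof(Roaring)` would show how close the "3 pointers plus 2 ints" guess is on a given standard library; the result is implementation-specific.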


        return std::accumulate(
            roarings.cbegin(), roarings.cend(),
            sizeof(*this),
            [=](size_t previous,
                const std::pair<const uint32_t, Roaring> &map_entry) {
                // add bytes used by each Roaring std::map entry
                return previous + perEntry + map_entry.second.getMemorySizeInBytes();
            });
    }
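
The summation pattern used here can be tried in isolation. The sketch below is an illustration only: `MockBitmap` and `total_memory` are hypothetical stand-ins, not CRoaring types. It shows the same std::accumulate shape, where the initial value is a size_t so that the running total is carried as size_t rather than int.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <numeric>
#include <utility>

// Hypothetical stand-in for Roaring: reports a fixed memory footprint.
struct MockBitmap {
    std::size_t footprint;
    std::size_t getMemorySizeInBytes() const { return footprint; }
};

// Start from `base` bytes (the owning object itself), then add a fixed
// per-entry overhead plus each value's own footprint.
inline std::size_t total_memory(const std::map<std::uint32_t, MockBitmap> &m,
                                std::size_t base, std::size_t per_entry) {
    return std::accumulate(
        m.cbegin(), m.cend(), base,
        [per_entry](std::size_t previous,
                    const std::pair<const std::uint32_t, MockBitmap> &entry) {
            return previous + per_entry + entry.second.getMemorySizeInBytes();
        });
}
```

With two entries of 100 and 250 bytes, a base of 48 and a per-entry overhead of 40, the total is 48 + (40 + 100) + (40 + 250) = 478 bytes.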

/**
* Return the number of bytes required to serialize this bitmap (meant to
* be compatible with Java and Go versions)
8 changes: 8 additions & 0 deletions include/roaring/containers/array.h
@@ -202,6 +202,14 @@ int32_t array_container_write(const array_container_t *container, char *buf);
int32_t array_container_read(int32_t cardinality, array_container_t *container,
const char *buf);

/**
* Return the size in bytes of the memory used by the container.
*/
static inline size_t array_container_memory_size_in_bytes(
const array_container_t *container) {
return sizeof(*container) + (container->capacity * sizeof(container->array[0]));
}

/**
* Return the serialized size in bytes of a container (see
* bitset_container_write)
9 changes: 9 additions & 0 deletions include/roaring/containers/bitset.h
@@ -448,6 +448,15 @@ int32_t bitset_container_write(const bitset_container_t *container, char *buf);
*/
int32_t bitset_container_read(int32_t cardinality,
bitset_container_t *container, const char *buf);

/**
* Return the size in bytes of the memory used by the container.
*/
static inline size_t bitset_container_memory_size_in_bytes(
const bitset_container_t *container) {
return sizeof(*container) + BITSET_CONTAINER_SIZE_IN_WORDS * sizeof(uint64_t);
}

/**
* Return the serialized size in bytes of a container (see
* bitset_container_write).
21 changes: 21 additions & 0 deletions include/roaring/containers/containers.h
@@ -401,6 +401,27 @@ static inline int32_t container_write(
return 0; // unreached
}

/**
 * Get the size in bytes of memory used by the container, requires a
 * typecode
 */
static inline size_t container_memory_size_in_bytes(
    const container_t *c, uint8_t typecode
){
    c = container_unwrap_shared(c, &typecode);
    switch (typecode) {
        case BITSET_CONTAINER_TYPE:
            return bitset_container_memory_size_in_bytes(const_CAST_bitset(c));
        case ARRAY_CONTAINER_TYPE:
            return array_container_memory_size_in_bytes(const_CAST_array(c));
        case RUN_CONTAINER_TYPE:
            return run_container_memory_size_in_bytes(const_CAST_run(c));
    }
    assert(false);
    __builtin_unreachable();
    return 0;  // unreached
}

/**
* Get the container size in bytes under portable serialization (see
* container_write), requires a
8 changes: 8 additions & 0 deletions include/roaring/containers/run.h
@@ -474,6 +474,14 @@ int32_t run_container_write(const run_container_t *container, char *buf);
int32_t run_container_read(int32_t cardinality, run_container_t *container,
const char *buf);

/**
* Return the size in bytes of the memory used by the container.
*/
static inline size_t run_container_memory_size_in_bytes(
const run_container_t *container) {
return sizeof(*container) + (container->capacity * sizeof(container->runs[0]));
}
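
The three per-container formulas differ only in their payload term, which can be compared side by side. The sketch below is a rough illustration under stated assumptions: 1024 64-bit words per bitset, uint16_t array entries, and 4-byte runs made of two uint16_t fields. `kBitsetWords`, `Run16`, and the `*_payload_bytes` helpers are hypothetical names; the fixed `sizeof(*container)` header that each real function also adds is left out.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: a bitset container covers 2^16 values with 64-bit words.
constexpr std::size_t kBitsetWords = (1u << 16) / 64;  // 1024 words

// Assumption: a run is stored as a start value plus a length, both 16-bit.
struct Run16 {
    std::uint16_t value;
    std::uint16_t length;
};
static_assert(sizeof(Run16) == 4, "sketch assumes 4-byte runs");

// Payload of a bitset container: fixed, independent of cardinality.
constexpr std::size_t bitset_payload_bytes() {
    return kBitsetWords * sizeof(std::uint64_t);
}

// Payload of an array container: grows with allocated capacity, not with
// the number of values actually stored.
constexpr std::size_t array_payload_bytes(std::size_t capacity) {
    return capacity * sizeof(std::uint16_t);
}

// Payload of a run container: grows with allocated run capacity.
constexpr std::size_t run_payload_bytes(std::size_t capacity) {
    return capacity * sizeof(Run16);
}
```

Under these assumptions a bitset payload is 8192 bytes, the same as an array with capacity 4096 or a run container with capacity 2048. Because the array and run formulas use `capacity`, the reported size can exceed what the stored values strictly need.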

/**
* Return the serialized size in bytes of a container (see run_container_write).
* This is meant to be compatible with the Java and Go versions of Roaring.
5 changes: 5 additions & 0 deletions include/roaring/roaring.h
@@ -462,6 +462,11 @@ bool roaring_bitmap_run_optimize(roaring_bitmap_t *r);
*/
size_t roaring_bitmap_shrink_to_fit(roaring_bitmap_t *r);

/**
* Return the number of bytes of memory used by the bitmap.
*/
size_t roaring_bitmap_memory_size_in_bytes(const roaring_bitmap_t *r);

/**
* Write the bitmap to an output pointer, this output buffer should refer to
* at least `roaring_bitmap_size_in_bytes(r)` allocated bytes.
5 changes: 5 additions & 0 deletions include/roaring/roaring_array.h
@@ -202,6 +202,11 @@ void ra_to_uint32_array(const roaring_array_t *ra, uint32_t *ans);

bool ra_range_uint32_array(const roaring_array_t *ra, size_t offset, size_t limit, uint32_t *ans);

/**
 * Return the number of bytes of memory used by the roaring array,
 * including its containers.
 */
size_t ra_memory_size_in_bytes(const roaring_array_t *ra);

/**
* write a bitmap to a buffer. This is meant to be compatible with
* the
4 changes: 4 additions & 0 deletions src/roaring.c
@@ -1397,6 +1397,10 @@ bool roaring_bitmap_remove_run_compression(roaring_bitmap_t *r) {
return answer;
}

size_t roaring_bitmap_memory_size_in_bytes(const roaring_bitmap_t *r) {
return ra_memory_size_in_bytes(&r->high_low_container);
}

size_t roaring_bitmap_serialize(const roaring_bitmap_t *r, char *buf) {
size_t portablesize = roaring_bitmap_portable_size_in_bytes(r);
uint64_t cardinality = roaring_bitmap_get_cardinality(r);
12 changes: 12 additions & 0 deletions src/roaring_array.c
@@ -520,6 +520,18 @@ bool ra_has_run_container(const roaring_array_t *ra) {
return false;
}

size_t ra_memory_size_in_bytes(const roaring_array_t *ra) {
    size_t count = sizeof(*ra) + (
        ra->allocation_size * (sizeof(void*) + sizeof(uint16_t) + sizeof(uint8_t)));

    for (int32_t k = 0; k < ra->size; ++k) {
        count += container_memory_size_in_bytes(ra->containers[k],
                                                ra->typecodes[k]);
    }

    return count;
}
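
The bookkeeping term in this function is charged per allocated slot, not per container in use. The sketch below isolates that term; `MockRoaringArray` and `bookkeeping_bytes` are hypothetical stand-ins that only approximate the layout of roaring_array_t (three parallel arrays for container pointers, 16-bit keys, and 8-bit typecodes), so the absolute numbers are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for roaring_array_t: three parallel arrays that
// share one capacity (allocation_size) and one used length (size).
struct MockRoaringArray {
    std::int32_t size;             // containers currently in use
    std::int32_t allocation_size;  // slots allocated in each parallel array
    void **containers;
    std::uint16_t *keys;
    std::uint8_t *typecodes;
};

// Bytes for the struct itself plus the three parallel arrays. Every
// allocated slot costs one pointer, one key and one typecode, whether or
// not a container occupies it.
inline std::size_t bookkeeping_bytes(const MockRoaringArray &ra) {
    const std::size_t per_slot = sizeof(void *) + sizeof(std::uint16_t) +
                                 sizeof(std::uint8_t);
    return sizeof(ra) +
           static_cast<std::size_t>(ra.allocation_size) * per_slot;
}
```

On a typical 64-bit platform each slot costs 8 + 2 + 1 = 11 bytes, so an array with 4 allocated slots carries 44 bytes of bookkeeping beyond the struct, regardless of how many of those slots hold a container.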

uint32_t ra_portable_header_size(const roaring_array_t *ra) {
if (ra_has_run_container(ra)) {
if (ra->size <