-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor serialization buffer management and support memory tracking. #5231
Conversation
Fixes compile errors due to redefinition of the name `span`.
And port `array_serialize` as a proof of concept.
Only `SerializationBuffer` now implicitly converts to span.
This cleans-up code structure and will allow easily flowing the allocator from the list.
…cept an optional allocator.
`tiledb_buffer_list_t` was also updated to pass the allocator from the context's serialization memory tracker.
bool is_owned() const { | ||
return buffer_.data() == buffer_owner_.data(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought I could make this private. It has no use cases outside of this class. It could be used to guard for calling owned_mutable_span
, but you are supposed to call this function on buffers you created yourself, and you know whether they are owned.
And reminder for me to document it.
if (!is_owned()) | ||
throw BufferStatusException( | ||
"Cannot get a mutable span of a non-owned buffer."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just remove that or replace it with assert, which would always return an empty span for non-owned buffers.
} | ||
|
||
return num_read; | ||
return buffer_list->read_at_most(dest, max_nbytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should catch exceptions here and return CURL_READFUNC_ABORT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_at_most
does not throw. I can further make it noexcept
.
|
||
// Read from buffer | ||
const uint64_t bytes_in_src = src.size() - current_relative_offset_; | ||
const uint64_t bytes_from_src = std::min(bytes_in_src, bytes_left); | ||
// If the destination pointer is not null, then read into it | ||
// if it is null then we are just seeking | ||
if (dest_ptr != nullptr) | ||
RETURN_NOT_OK(src.read(dest_ptr + dest_offset, bytes_from_src)); | ||
memcpy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this get rid of checks that were previously performed by read to make sure we don't read OOB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The declaration of bytes_from_src
above ensures we don't read OOB.
TEST_CASE( | ||
"SerializationBuffer: Test memory tracking", | ||
"[serialization_buffer][memory_tracking]") { | ||
tiledb::sm::MemoryTrackerManager memory_tracker_manager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test what happens to copied/moved buffers related to memory tracking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote some tests locally and noticed the allocator does not propagate when copying a collection. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Something we need to figure out for sure.
9c8dcdf
to
5162e5c
Compare
5162e5c
to
dad6d08
Compare
@@ -2738,23 +2729,12 @@ Status query_est_result_size_serialize( | |||
case SerializationType::JSON: { | |||
::capnp::JsonCodec json; | |||
kj::String capnp_json = json.encode(est_result_size_builder); | |||
const auto json_len = capnp_json.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we loose the behavior of adding \0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I was sure I would miss a case. Fixed, thanks!
tiledb/sm/serialization/query.cc
Outdated
SerializationType serialize_type, | ||
bool clientside, | ||
CopyState* copy_state, | ||
Query* query, | ||
ThreadPool* compute_tp) { | ||
// Create an original, serialized copy of the 'query' that we will revert | ||
// to if we are unable to deserialize 'serialized_buffer'. | ||
BufferList original_bufferlist; | ||
BufferList original_bufferlist( | ||
query->resources().serialization_memory_tracker()->get_resource( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very convoluted way to get the memory tracker. We have to find a way to pass it in query_deserialize without getting it from another object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getting it from another object
My understanding is that this is how allocators propagate; from "parent" to "child" objects. We backup the state of a query in a temporary buffer list, and for me it makes sense to use the query's allocator.
We could get the allocator from the context argument passed in the C API, but I'd be against using it for cases where there are alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialized query buffer is not a child of the query IMO. We should do the work to properly pass the context into query_deserialize. It will require us to save the context instead of the threadpool in RestClientRemote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, we were already using the context in tiledb_deserialize_query
.
tiledb/sm/query/query.h
Outdated
@@ -918,6 +918,13 @@ class Query { | |||
*/ | |||
RestClient* rest_client() const; | |||
|
|||
/** | |||
* Returns the ContextResources associated to this query. | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove this. It was added as a way to get the serialization memory tracker but we should be able to get it from another source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said above, query is for me the best candidate to get the allocator.
8801934
to
300298a
Compare
09eedef
to
9d68068
Compare
9d68068
to
a3fbc9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KiterLuc's changes LGTM, thanks!
SC-50984
This PR adds support for tracking memory used by serialization buffers. The first step towards implementing this is the replacement of the
Buffer
class with the newSerializationBuffer
class, which has the following characteristics:Buffer
,SerializationBuffer
supports both owned and non-owned memory buffers. This is necessary to maintain the semantics of thetiledb_buffer_t
C API.SerializationBuffer
also allows swapping between owned and non-owned modes.SerializationBuffer
does not support thewrite
function which incrementally adds data by growing the buffer. Instead, it supports only theassign
operation, which replaces the whole existing data of the buffer.assign
copies the given data to a buffer owned bySerializationBuffer
, unless a special marker value is passed, which changes it to just store a span to the buffer.assign_null_terminated
method that sets an owned buffer and adds a null terminator at the end. This is used by JSON serialization for some reason (is it actually needed?).SerializationBuffer
does not store an offset and does not expose aread
function that copies to a user-provided buffer and advances the offset. Instead the class is implicitly convertible to a const span of bytes, which gives access to the entire buffer at once.SerializationBuffer
s are typically immutable betweenassign
ments, but some cases require subsequent modification. To cover these, theowned_mutable_span
function returns a mutable span to the buffer, but will throw if used in a non-owned buffer.SerializationBuffer
supports being used with polymorphic allocators, which enables integration with the memory tracking system. A newMemoryTracker
was added onContextResources
to track serialization-related allocations.BufferList
was also updated to storeSerializationBuffer
s. Creations of serialization buffers and buffer lists were updated to pass an allocator, and C API handles for both types were likewise updated.I also took up on the opportunity and converted all deserialization APIs that accepted a
Buffer
to accept aspan<const char>
, because they did not need any other facilities ofSerializationBuffer
.TYPE: NO_HISTORY