
Ensure CRubyBuffer respects max_size #1715

Merged
merged 4 commits from max-size-cruby into master on Oct 18, 2021
Conversation

@marcotc (Member) commented Oct 7, 2021

Fixes #1704

This PR ensures that CRubyBuffer continuously tries to keep the internal buffer to no more than @max_size items.

This is done by "clamping" the internal array to @max_size whenever we detect that the buffer is full. This eventually catches any racing modifications that would cause the buffer to exceed its maximum size.
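
A minimal sketch of the clamping idea, with illustrative names (the real lib/ddtrace/buffer.rb differs in detail):

class ClampedBuffer
  def initialize(max_size)
    @max_size = max_size
    @items = []
  end

  def add!(item)
    # Check-then-act: another thread can append between these two steps.
    full? ? replace!(item) : (@items << item)
  end

  private

  def full?
    @items.length >= @max_size
  end

  # The initial approach in this PR: evict a random item, append the new
  # one, then clamp away anything that raced in past @max_size.
  def replace!(item)
    @items.delete_at(rand(@items.length))
    @items << item
    @items.slice!(@max_size..-1)
  end
end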

@marcotc marcotc added the core Involves Datadog core libraries label Oct 7, 2021
@marcotc marcotc self-assigned this Oct 7, 2021
@marcotc marcotc requested a review from a team October 7, 2021 23:23
@marcotc (Member Author) left a comment

@ivoanjo I can think of a few ways to insert strategic Thread.pass around this code to make it exceed the internal buffer capacity, but surprisingly I wasn't able to make it exceed the limit.
Due to this, I'm thinking this implementation is now empirically thread-safe 😬, unless we can find a hole in it.

Do you see any possible issues with the implementation as of right now @ivoanjo? If you can find an artificial wait point that breaks it I would appreciate it, because I can't find one anymore.
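
As an illustration of such a wait point (hypothetical placement, not the actual buffer code), Thread.pass just yields to the scheduler, widening the check-then-act window:

def add!(item)
  if full?
    replace!(item)
  else
    Thread.pass # hypothetical wait point: other threads may fill the buffer here
    @items << item
  end
end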

@codecov-commenter commented Oct 7, 2021

Codecov Report

Merging #1715 (73c4630) into master (8f697cc) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #1715   +/-   ##
=======================================
  Coverage   98.18%   98.18%           
=======================================
  Files         934      934           
  Lines       45122    45121    -1     
=======================================
+ Hits        44301    44302    +1     
+ Misses        821      819    -2     
Impacted Files | Coverage Δ
lib/ddtrace/buffer.rb | 95.34% <100.00%> (ø)
spec/ddtrace/buffer_spec.rb | 98.22% <100.00%> (-0.02%) ⬇️
lib/ddtrace/diagnostics/environment_logger.rb | 100.00% <0.00%> (+1.94%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f697cc...73c4630.

@delner (Contributor) left a comment

I like that there's no "leniency" and that max size is max size. Looks good!

@ivoanjo (Member) left a comment

@ivoanjo I can think of a few ways to insert strategic Thread.pass around this code to make it exceed the internal buffer capacity, but surprisingly I wasn't able to make it exceed the limit.
Due to this, I'm thinking this implementation is now empirically thread-safe 😬, unless we can find a hole in it.

Hmm, initially I was going to say I was happy, but after staring at it for a while I'm not quite so sure anymore lol.

There are a few weird effects I can think of (and simulate with a few sleeps):

  1. If only adds are made, we can still get into trouble. E.g. if I run the test with thread_count = 200, max_size = 100, but each thread only adds a single item AND we add the sleep in #full?, then we're back to a similar issue as in #1704 (CRubyTraceBuffer can exceed its max size and that leads to flaky tests): all 200 threads decide to add, and all add and go on their merry way.

  2. Because replace is implemented as delete + add, during the execution of replace, between the delete and the add, if you additionally add a sleep there, we can again get into the issue in 1) (sketched after this list): the first thread observes a full buffer, decides to replace, removes an item, sleeps. All other threads observe a non-full buffer and decide to add.

  3. If all threads decide to replace at the same time, a third weird effect can happen: the buffer can actually be observed as empty, although it shouldn't be. E.g. imagine that thread_count = 100, max_size = 100, and the buffer starts full. Let's say that all 100 threads try to add a single item, decide that they need to replace, and then run the delete and then sleep. This means that we can catch them "in the act" and actually find an empty buffer, because all 100 threads deleted an item and none has yet added its replacement.
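
To make scenario 2 concrete, a sketch with a hypothetical sleep standing in for an unlucky preemption inside a delete + add replace:

def replace!(item)
  @items.delete_at(rand(@items.length)) # buffer momentarily holds max_size - 1 items
  sleep(0.01) # hypothetical: every other thread now sees full? == false and appends
  @items << item # the buffer can end up well past max_size again
end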

So the issues I still identify are:

a. Add can still go over -- may or may not be problematic; e.g. I think 1) above isn't very problematic, but 2) is
b. Replace using delete+add is not atomic, so weird things can happen in between (and between it and add).

My suggestion for a. is also adding slice! to add!; I'm still a bit unconvinced that a slice! that effectively does nothing is expensive enough to matter (but if we fix b. I think it's less of a problem).

My suggestion for b. is to not use delete+add, but instead just replace a given position. This would mean that the array size would become an invariant after all threads agree that the array is full. E.g. once no thread is inside an add codepath, if we always replace, the array size never changes and thus 2. and 3. cannot happen anymore.

The one downside here is that there's no array API to "replace an item in an array, and give me back the old item". But from my inspection, it doesn't seem that relevant for #replace! to return the old item (it's just a "nice to have"), so we can just change the method contract to not do that. (Because otherwise we could have a race where two threads decide to replace the same location and both claim to have replaced the same item).
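
Taken together, the suggestion amounts to something like this sketch (illustrative only, not the final ddtrace code): index assignment keeps the array length invariant once the buffer is full, and the displaced item is deliberately not returned:

def replace!(item)
  # Overwrite a random slot in place; there is no delete + add window,
  # so the array length never changes while all threads are replacing.
  @items[rand(@items.length)] = item
  # Contract change: don't return the old item, since two racing threads
  # could otherwise both claim to have displaced the same one.
  nil
end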

@marcotc (Member Author) commented Oct 15, 2021

My suggestion for a. is also adding slice! to add!; I'm still a bit unconvinced that a slice! that effectively does nothing is expensive enough to matter (but if we fix b. I think it's less of a problem).

I don't think we can have a "thread-safe" slice! operation that we can perform in add! that also does not mess with the fair trace sampling when the buffer is full (the options are sketched after this list):

  1. If we slice! before @items << item we are no better than checking for full? again.
  2. If we slice! after @items << item, we are possibly dropping the item we just inserted at the end of the array. If we slice! from the beginning of @items instead, the operation requires the whole array to be moved in memory every single time.
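
For reference, the two placements from the list above, with hypothetical helper names:

# Option 1: slicing before the append is just full? by another name.
def slice_then_add!(item)
  @items.slice!(@max_size..-1) # no-op unless the buffer already overflowed
  @items << item               # the race window between the two steps remains
end

# Option 2: slicing after the append may evict the item we just added,
# and slicing from the front instead forces CRuby to shift every
# remaining element left in memory on each call.
def add_then_slice!(item)
  @items << item
  @items.slice!(@max_size..-1)
end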

My suggestion for b. is to not use delete+add, but instead just replace a given position. This would mean that the array size would become an invariant after all threads agree that the array is full. E.g. once no thread is inside an add codepath, if we always replace, the array size never changes and thus 2. and 3. cannot happen anymore.

I've made this change as suggested.

The one downside here is that there's no array API to "replace an item in an array, and give me back the old item". But from my inspection, it doesn't seem that relevant for #replace! to return the old item (it's just a "nice to have"), so we can just change the method contract to not do that. (Because otherwise we could have a race where two threads decide to replace the same location and both claim to have replaced the same item).

I don't think this is a problematic downside. I've made changes to replace! so that it no longer returns the old item.

@marcotc (Member Author) commented Oct 15, 2021

I can get the internal @items to spike, like you said @ivoanjo. It does get corrected on the next item added to the buffer, but during a very short, intense burst of insertions the buffer can temporarily grow past the limit.

I don't think this is a large issue: the objects that the application wants to insert into the buffer so quickly are already taking up their place in the heap, and we'll reap them from the buffer at the very next opportunity, so they won't accumulate any more memory than they already occupy in the heap.

@marcotc marcotc requested a review from ivoanjo October 15, 2021 22:15
@ivoanjo (Member) left a comment

I don't think we can have a "thread-safe" slice! operation that we can perform in add! that also does not mess with the fair trace sampling when the buffer is full:
[...]
I don't think this is a large issue: the objects that the application wants to insert into the buffer so quickly are already taking up their place in the heap, and we'll reap them from the buffer at the very next opportunity, so they won't accumulate any more memory than they already occupy in the heap.

I don't think it's a big issue either -- I was about to click approve on the previous revision before I spotted the weird interaction during the replace.

I will say a few things though, just to mess with you :p

  1. I don't think this race would be common enough for us to care this much about fairness under the race conditions. E.g. since losing the fairness would be the corner case, I could live with that trade-off too.

  2. I don't think we're keeping the fairness as you intended. Yes, we don't slice at the end of add, but we're still appending to the end. Thus, once the first replace kicks in, we'll slice! all of it anyway. So the fairness only really gets preserved in that tiny window of opportunity between the last add running, and the first replace kicking in. 😉

  3. It would be nice if we were to document some of these hidden expectations of the class; I was searching a bit and I don't think we ever wrote down our intentions, which I think would be relevant once neither of us is around ;)

lib/ddtrace/buffer.rb (inline review comment, outdated and resolved)
@marcotc (Member Author) commented Oct 18, 2021

I documented the desire to fairly sample traces.

Thus, once the first replace kicks in, we'll slice! all of it anyway.

I agree. It doesn't seem like we have good performant options to strictly ensure correctness and fairness in Ruby today.

@marcotc marcotc merged commit 72892df into master Oct 18, 2021
@marcotc marcotc deleted the max-size-cruby branch October 18, 2021 19:29
@github-actions github-actions bot added this to the 0.54.0 milestone Oct 18, 2021
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CRubyTraceBuffer can exceed its max size and that leads to flaky tests (#1704)
4 participants