Avoid freezing ConcurrentQueue segments in Count #18035
Conversation
In .NET Core 2.0, we changed the implementation of ConcurrentQueue to allow segment reuse, making enqueues amortized allocation-free, whereas previously every enqueue had some allocation cost (even if it was batched as part of allocating a segment). However, some operations mark segments as being frozen, which then prevents subsequent enqueues into those segments; this is necessary for enumeration, but Count also caused this. Ideally Count isn't used on hot paths, but if it is, this can end up in degenerate situations where, for example, if Count is called after every enqueue, we'll end up creating a new segment for every enqueued item, which is incredibly inefficient both for the enqueues and for Count, which is O(N) in the number of segments.

It turns out, though, that we were overly cautious in implementing Count, and we don't actually need it to freeze the segments. Instead, when there are more than two segments (the case where we previously froze), we can take the cross-segment lock, which is the same lock that's held any time we update the head and tail segment pointers. Once that lock is held, we know that the internal segments won't change, because code can only enqueue/dequeue from the head and tail segments, so any segment that's not the head or the tail while the lock is held is effectively immutable. That means we can simply walk those segments and add up their counts, safe in knowing they won't change while we're enumerating.
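The approach described above can be sketched in miniature. This is an illustrative Python model, not the C# implementation: the class and field names are assumptions, and the real code must additionally take a consistent snapshot of the still-mutable head and tail segments, which is elided here.

```python
import threading

class Segment:
    """Illustrative stand-in for a queue segment (not the real
    ConcurrentQueueSegment<T> API)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0  # next slot to dequeue from
        self.tail = 0  # next slot to enqueue into

    def count(self):
        return self.tail - self.head

class SegmentedQueue:
    def __init__(self):
        self._cross_segment_lock = threading.Lock()
        # The linked head..tail chain of segments, flattened to a list.
        self._segments = [Segment(32)]

    def count(self):
        # While the cross-segment lock is held, dequeues can only touch the
        # head segment and enqueues the tail segment, so every internal
        # (non-head, non-tail) segment is effectively immutable and its
        # count can simply be summed, with no freezing required.
        with self._cross_segment_lock:
            return sum(seg.count() for seg in self._segments)
```

The key property is that holding the same lock used to update the head/tail segment pointers guarantees the middle of the chain cannot change out from under the summation.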
// based on those and add them. Note that this and the below additions to count may overflow: previous
// implementations allowed that, so we don't check, either, and it is theoretically possible for the
// queue to store more than int.MaxValue items.
int count = GetCount(head, headHead, headTail) + GetCount(tail, tailHead, tailTail);
GetCount
I am wondering why we don't store the segment's items count inside the segment so we don't have to recalculate it every time?
Because it changes on every enqueue and dequeue, which can happen concurrently (multiple of each), and it would be very expensive to maintain an accurate count in the segment.
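The cost being described can be made concrete with a hedged sketch: a hypothetical per-segment count field would turn every enqueue and dequeue into a write to one shared location, serializing otherwise-independent producers and consumers, whereas the real design derives the count on demand from the head and tail indices. All names here are illustrative.

```python
import threading

class CountedSegment:
    """Hypothetical variant that maintains an explicit item count.
    Keeping _count accurate forces every operation through shared,
    synchronized state, which is exactly what the hot paths avoid."""
    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()  # required to keep _count accurate

    def enqueue(self, item):
        with self._lock:  # extra synchronization on every enqueue
            self._count += 1

    def dequeue(self):
        with self._lock:  # and on every dequeue
            self._count -= 1

    @property
    def count(self):
        return self._count
```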
That is because we don't take any lock on enqueue/dequeue?
Yes
Should the head + tail counts be added last, after all the internal frozen segments have been counted?
Why?
The head and tail segments aren't frozen, so they can have items added or removed during the evaluation of the count? (E.g. the queue is not completely locked, only the intermediate segments or changing segments.)
So in the time elapsed while accumulating the intermediate segments' counts, the head and tail segments may have changed count? (Hence getting them last.)
Though I may be misunderstanding.
I.e. it returns a correct point-in-time count; however, it could return a more recent point-in-time count (e.g. if there were 2000 segments, it would take a slight amount of time to enumerate them all).
That makes sense to me.
It doesn't matter when the GetCount methods are called; those are deterministic based on the inputs to them, which are captured much earlier on and passed in here.
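The point about determinism can be shown directly: once the head and tail positions have been captured, the count is a pure function of those snapshots, so when the GetCount calls actually run is irrelevant. A minimal sketch (the real C# GetCount also compensates for a freeze offset, omitted here; names are illustrative):

```python
def get_count(head, tail):
    """Count as a pure function of previously captured positions; the
    segment's current state cannot influence the result."""
    return tail - head

# Positions captured earlier, under the validation checks:
head_snapshot, tail_snapshot = 2, 7
# Any amount of concurrent activity may occur before this call; the
# result depends only on the snapshot.
count = get_count(head_snapshot, tail_snapshot)  # 5
```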
// trying to compute the count. Once we've taken the lock, we know that any internal
// segments (not the head and not the tail) won't have their counts changed, as you
// can't dequeue from or enqueue to an internal segment. Thus, we only need to get
// a stable count for the head and tail segments,
Nit: comma.
@@ -276,26 +276,29 @@ public int Count
{
    get
    {
        ConcurrentQueueSegment<T> head, tail;
        int headHead, headTail, tailHead, tailTail;
        // Take the cross-segment lock to prevent _head and _tail from changing while we're
You're not taking the lock at this point but the comment says you are.
Oops, leftover from a refactoring. Will fix.
if (head == _head &&
    head == _tail &&
    tail == _tail &&
Was the previous comparison here a mistake?
No, head and tail contain the same value, as validated by the if statement above. I just felt it was clearer and more obvious to use tail instead of head.
It looks like this is changing the CoreLib version of ConcurrentQueue; there's another version in corefx in System.Collections.Concurrent. Isn't that the one that would be used by the public type?
Ah, OK, it looks like the intent was to use the CoreLib versions and remove the CoreFX versions, but that hasn't been done yet.
Yeah, it's in progress. Once corefx ingests the latest coreclr build, @maryamariyan will finish the type forwarding. So I figured it wasn't worth mucking with the about-to-be-deleted copy in corefx.
if (headHead == Volatile.Read(ref head._headAndTail.Head) &&
    headTail == Volatile.Read(ref head._headAndTail.Tail) &&
    tailHead == Volatile.Read(ref tail._headAndTail.Head) &&
    tailTail == Volatile.Read(ref tail._headAndTail.Tail))
Is it possible for the double-checking on the head and tail segments' head and tail here and above to cause this API to spin for a very long time if enqueue/dequeue are happening very actively?
Would it be better to:
- Double-check for 1 and 2 segment cases as above once
- But on failure of initial condition or the double-check, fall back to lock and inside the lock, don't double-check
Typically the lock should not be contended (once the queue stabilizes in size and the tail does not change frequently)
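The pattern being suggested can be sketched as follows. This is a hedged illustration of the proposal (not the actual implementation), and it assumes the relevant updates are serialized by the same lock, which is exactly the point debated below; all names are invented.

```python
import threading

class Snapshot:
    """Sketch of 'optimistic double-check once, then fall back to the
    lock': on the fast path we validate the snapshot by re-reading; on
    failure we take the lock instead of spinning further."""
    def __init__(self):
        self.head = 0
        self.tail = 0
        self.lock = threading.Lock()

    def read_positions(self):
        h = self.head
        t = self.tail
        if h == self.head:   # double-check: head didn't move, snapshot ok
            return h, t
        # Fallback: under the lock (assumed to serialize the writers of
        # interest), the values are stable and need no re-validation.
        with self.lock:
            return self.head, self.tail
```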
> Is it possible for the double-checking on the head and tail segments' head and tail here and above to cause this API to spin for a very long time if enqueue/dequeue are happening very actively?
Yes, it's possible. (That's true today as well.)
> inside the lock, don't double-check
Can you explain this?
> Yes, it's possible. (That's true today as well.)
For the 1 and 2 segment cases it's true today, but now it's also true when there are multiple segments (though that's a less interesting case). I wonder if even that is necessary: Count only provides a snapshot and can't be relied upon without appropriate synchronization by callers, so why double-check at all?
> inside the lock, don't double-check
I mean inside the lock, can we just read the values and compute the counts without double-checking new values with previous values?
Generally wondering if the spinning can be removed altogether.
> why double-check at all?
The problem is that you can end up with totally wrong values, e.g.
- The queue is empty.
- We read the head position as 0.
- An item is then enqueued and dequeued, 10 times.
- We read the tail position as 10.
- Without double-checking, we compute the Count as 10, even though at no point in the queue's history did it ever contain more than 1 item.
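The scenario above can be made concrete with a small model (illustrative Python, not the real data structure): without re-validating the snapshot, the computed count corresponds to a state the queue was never in.

```python
class Positions:
    """Minimal model of a segment's head/tail indices."""
    def __init__(self):
        self.head = 0
        self.tail = 0

q = Positions()
head_snapshot = q.head   # we read the head position as 0

for _ in range(10):      # an item is enqueued and dequeued, 10 times
    q.tail += 1
    q.head += 1

tail_snapshot = q.tail   # we read the tail position as 10

# Without double-checking, the computed count is 10, even though the
# queue never held more than 1 item; re-reading head (now 10) exposes
# the torn snapshot and forces a retry.
torn_count = tail_snapshot - head_snapshot
```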
> Count only provides a snapshot and it can't be relied upon without appropriate synchronization by callers, why double-check at all?
It does provide a correct snapshot, even though it may have changed after measuring; so you still need to either get the head+tail count while the segments are locked, or verify it's still valid. Otherwise it may not be a head+tail count that matches any state of the locked segments (i.e. a count that never existed)?
> The problem is that you can end up with totally wrong values

> otherwise it may not be a head+tail count that matches any state of the locked segments? (i.e. a count that never existed)
Aren't those true despite the double-checking? I figured the double-checking would only make it less likely but not impossible. Is it less likely enough to be worth it? If double-checking for the 1 and 2 segment cases is worth it, would it be better to fall back to a lock on failure, as I suggested above?
> Aren't those true despite the double-checking? I figured the double-checking would only make it less likely but not impossible.
For it to not be true, there would have to be more than ~4 billion items enqueued and dequeued between the checks, in order for the values to wrap around. And with the cross-segment lock being taken, that's not possible.
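The wraparound argument can be sketched by modeling the positions as wrapping 32-bit counters (an assumption for illustration; the sketch is not the real index representation): the double-check can pass falsely only if the position returns to the exact same 32-bit value, which requires a multiple of 2**32 (~4.29 billion) intervening operations.

```python
MASK = (1 << 32) - 1  # positions modeled as wrapping 32-bit values

def advance(pos, ops):
    """Position after `ops` enqueue/dequeue operations, wrapping at 2**32."""
    return (pos + ops) & MASK

start = 123
# A stale snapshot passes the double-check only after a full wraparound:
wrapped = advance(start, 1 << 32)  # back to the same 32-bit value
moved = advance(start, 10)         # any smaller count is detected
```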
One possible thing we could do in the locked 2+ case is have a loop within the lock. While holding the lock, the only operations that can be performed are dequeueing from the head segment and enqueueing to the tail segment, which means the max number of operations that can occur is the number of currently queued items in the head segment and the number of empty spaces in the tail segment. That's a finite number, worst case 2M enqueues and dequeues due to MaxSegmentLength being 1M. But, it's still a very big number.
Ah ok I see how that works, makes sense
…clr#18035)" This reverts commit 59edecc.
Fixes https://github.com/dotnet/corefx/issues/29759
cc: @kouvel, @benaadams, @tarekgh