
Avoid freezing ConcurrentQueue segments in Count #18035

Merged

stephentoub merged 2 commits into dotnet:master on May 19, 2018

Conversation

stephentoub (Member):

In .NET Core 2.0, we changed the implementation of ConcurrentQueue to allow segment reuse, making enqueues amortized allocation-free, whereas previously every enqueue had some allocation cost (even if it was batched as part of allocating a segment). However, some operations mark segments as frozen, which then prevents subsequent enqueues into those segments; this is necessary for enumeration, but Count also caused it. Ideally Count isn't used on hot paths, but if it is, this can end up in degenerate situations where, for example, if Count is called after every enqueue, we'll end up creating a new segment for every enqueued item, which is incredibly inefficient both for the enqueues and for Count, which is O(N) in the number of segments.

It turns out, though, that we were overly cautious in implementing Count, and we don't actually need it to freeze the segments. Instead, when there are more than two segments (the case where we previously froze), we can take the cross-segment lock, which is the same lock that's held any time we update the head and tail segment pointers. Once that lock is held, we know that the internal segments won't change, because code can only enqueue/dequeue from the head and tail segments, so any segment that's not head and tail while the lock is held is effectively immutable. That means that we can simply walk those segments and add up their counts, safe in knowing they won't change while we're enumerating.

Fixes https://github.com/dotnet/corefx/issues/29759
cc: @kouvel, @benaadams, @tarekgh
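
In rough code terms, the new multi-segment path looks something like the sketch below. Names like `_crossSegmentLock`, `GetCount`, `_headAndTail`, and `_nextSegment` mirror the excerpts quoted in the review comments; the snapshot validation that the real code performs for the head and tail segments is elided, so treat this as an outline rather than the merged code:

```csharp
// Simplified sketch of the more-than-two-segments path of Count (hedged:
// an outline of the approach described above, not the verbatim change).
lock (_crossSegmentLock)
{
    ConcurrentQueueSegment<T> head = _head, tail = _tail;

    // Head and tail are the only segments that can still be mutated, so their
    // positions are snapshotted (and, in the real code, double-checked).
    int count = GetCount(head, Volatile.Read(ref head._headAndTail.Head),
                               Volatile.Read(ref head._headAndTail.Tail)) +
                GetCount(tail, Volatile.Read(ref tail._headAndTail.Head),
                               Volatile.Read(ref tail._headAndTail.Tail));

    // Internal segments are effectively immutable while the lock is held:
    // enqueues go only to the tail segment and dequeues only from the head
    // segment, so we can walk and sum them without freezing anything.
    for (ConcurrentQueueSegment<T> s = head._nextSegment; s != tail; s = s._nextSegment)
    {
        count += s._headAndTail.Tail - s._headAndTail.Head;
    }

    return count;
}
```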

```csharp
// based on those and add them. Note that this and the below additions to count may overflow: previous
// implementations allowed that, so we don't check, either, and it is theoretically possible for the
// queue to store more than int.MaxValue items.
int count = GetCount(head, headHead, headTail) + GetCount(tail, tailHead, tailTail);
```
Member (commenting on `GetCount`):

I am wondering why we don't store the segment's item count inside the segment, so we don't have to recalculate it every time?

stephentoub (Author):

Because it changes on every enqueue and dequeue, which can happen concurrently (multiple of each), and it would then be very expensive to maintain an accurate count in the segment.
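
For illustration only: a hypothetical per-segment live counter (not anything in the actual implementation; the class and method names below are invented) would put an interlocked operation on every enqueue and dequeue, which is exactly the cost being avoided:

```csharp
using System.Threading;

// Hypothetical alternative, NOT the actual implementation: maintaining a live
// per-segment count means every enqueue and dequeue performs an interlocked
// update on a shared field - a contended atomic on a hot cache line.
internal sealed class CountedSegmentSketch
{
    private int _count;

    public void OnEnqueue() => Interlocked.Increment(ref _count); // extra cost per enqueue
    public void OnDequeue() => Interlocked.Decrement(ref _count); // extra cost per dequeue

    public int Count => Volatile.Read(ref _count);
}
```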

Member:

That is because we don't take any lock on enqueue/dequeue?

stephentoub (Author):

Yes.

Member:

Should the head + tail counts be added last, after all the internal frozen segments have been counted?

stephentoub (Author):

Why?

Member:

The head and tail segments aren't frozen, so they can have items added or removed during the evaluation of the count? (E.g. the queue is not completely locked; only the intermediate segments, or changes to the segment list, are.)

So in the time elapsed while accumulating the intermediate segments' counts, the head and tail segments may have changed count? (Hence getting them last.)

Though I may be misunderstanding.

Member:

i.e. it returns a correct point-in-time count; however, it could return a more recent point-in-time count (e.g. if there were 2000 segments, it would take a little time to enumerate them all).

Member:

That makes sense to me.

stephentoub (Author):

It doesn't matter when the GetCount methods are called; those are deterministic based on the inputs to them, which are captured much earlier on and passed in here.
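
To illustrate the determinism point: a helper that is a pure function of previously captured indices produces the same result no matter when it's invoked. A simplified sketch (an illustrative assumption, not the actual GetCount, which also deals with a freeze offset) might look like:

```csharp
// Simplified sketch (assumption, not the real GetCount): a pure function of
// captured indices. It reads no shared state, so *when* it runs is
// irrelevant - only the snapshot passed in matters.
private static int GetCountSketch(int head, int tail, int capacity)
{
    if (head == tail)
    {
        return 0; // segment was empty in this snapshot
    }

    // head/tail are monotonically increasing positions; map them into the
    // circular buffer (capacity assumed a power of two) and measure the span.
    head &= capacity - 1;
    tail &= capacity - 1;
    return head < tail ? tail - head : capacity - head + tail;
}
```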

```csharp
// trying to compute the count. Once we've taken the lock, we know that any internal
// segments (not the head and not the tail) won't have their counts changed, as you
// can't dequeue from or enqueue to an internal segment. Thus, we only need to get
// a stable count for the head and tail segments,
```
Member:

Nit: comma.

```csharp
@@ -276,26 +276,29 @@ public int Count
        {
            get
            {
                ConcurrentQueueSegment<T> head, tail;
                int headHead, headTail, tailHead, tailTail;

                // Take the cross-segment lock to prevent _head and _tail from changing while we're
```
Member:

You're not taking the lock at this point, but the comment says you are.

stephentoub (Author):

Oops, leftover from a refactoring. Will fix.

```csharp
if (head == _head &&
    head == _tail &&
    tail == _tail &&
```
Member:

Was the previous comparison here a mistake?

stephentoub (Author):

No, head and tail contain the same value, as validated by the if statement above. I just felt it was clearer and more obvious to use tail instead of head.

@kouvel (Member) commented May 18, 2018:

It looks like this is changing the CoreLib version of ConcurrentQueue; there's another version in corefx, in System.Collections.Concurrent. Isn't that the one that would be used by the public type?

@kouvel commented May 18, 2018:

Ah, OK, it looks like the intent was to use the CoreLib versions and remove the corefx versions, but that hasn't been done yet.

@stephentoub (Author):

> Ah, OK, it looks like the intent was to use the CoreLib versions and remove the corefx versions, but that hasn't been done yet.

Yeah, it's in progress. Once corefx ingests the latest coreclr build, @maryamariyan will finish the type forwarding. So I figured it wasn't worth mucking with the about-to-be-deleted copy in corefx.

```csharp
if (headHead == Volatile.Read(ref head._headAndTail.Head) &&
    headTail == Volatile.Read(ref head._headAndTail.Tail) &&
    tailHead == Volatile.Read(ref tail._headAndTail.Head) &&
    tailTail == Volatile.Read(ref tail._headAndTail.Tail))
```
@kouvel (Member) commented May 18, 2018:

Is it possible for the double-checking on the head and tail segments' head and tail, here and above, to cause this API to spin for a very long time if enqueues/dequeues are happening very actively?

Would it be better to:

  • Double-check once for the 1- and 2-segment cases, as above
  • But on failure of the initial condition or the double-check, fall back to the lock, and inside the lock, don't double-check

Typically the lock should not be contended (once the queue stabilizes in size and the tail does not change frequently).

stephentoub (Author):

> Is it possible for the double-checking on the head and tail segments' head and tail, here and above, to cause this API to spin for a very long time if enqueues/dequeues are happening very actively?

Yes, it's possible. (That's true today as well.)

> inside the lock, don't double-check

Can you explain this?

Member:

> Yes, it's possible. (That's true today as well.)

For the 1- and 2-segment cases it's true today, but now it's also true when there are multiple segments (though that's a less interesting case). I wonder if even that is necessary. Count only provides a snapshot, and it can't be relied upon without appropriate synchronization by callers, so why double-check at all?

> inside the lock, don't double-check

I mean: inside the lock, can we just read the values and compute the counts, without double-checking new values against previous values?

Generally wondering if the spinning can be removed altogether.

@stephentoub (Author) commented May 18, 2018:

> why double-check at all?

The problem is that you can end up with totally wrong values, e.g.:

  • The queue is empty.
  • We read the head position as 0.
  • An item is then enqueued and dequeued, 10 times.
  • We read the tail position as 10.
  • Without double-checking, we compute the Count as 10, even though at no point in the queue's history did it ever contain more than 1 item.
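
A minimal sketch of the snapshot-and-verify pattern under discussion (field and type names loosely mirror the excerpts above, but this is a simplified illustration; the SpinWait backoff is an assumption, not the exact implementation):

```csharp
using System.Threading;

// Hypothetical, simplified segment demonstrating the "double-check" snapshot:
// re-reading the positions proves the (head, tail) pair was simultaneously
// valid, so the returned count is one the segment actually had at some instant.
internal sealed class SegmentSketch
{
    internal PaddedHeadAndTail _headAndTail;

    internal int SnapshotCount()
    {
        var spinner = new SpinWait();
        while (true)
        {
            // Snapshot the positions.
            int head = Volatile.Read(ref _headAndTail.Head);
            int tail = Volatile.Read(ref _headAndTail.Tail);

            // Re-read both: if neither moved between the reads, the pair is
            // consistent and tail - head never counts phantom items like the
            // "Count == 10" example above.
            if (head == Volatile.Read(ref _headAndTail.Head) &&
                tail == Volatile.Read(ref _headAndTail.Tail))
            {
                return tail - head;
            }

            // A concurrent enqueue/dequeue raced with us; back off and retry.
            spinner.SpinOnce();
        }
    }
}

internal struct PaddedHeadAndTail
{
    internal int Head;
    internal int Tail;
}
```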

Member:

> Count only provides a snapshot, and it can't be relied upon without appropriate synchronization by callers, so why double-check at all?

It does provide a correct snapshot, even though it may have changed after measuring; so you still need to either get the head+tail count while the segments are locked, or verify it's still valid - otherwise it may not be a head+tail count that matches any state of the locked segments (i.e. a count that never existed).

@kouvel (Member) commented May 18, 2018:

> The problem is that you can end up with totally wrong values

> otherwise it may not be a head+tail count that matches any state of the locked segments (i.e. a count that never existed)

Aren't those true despite the double-checking? I figured the double-checking would only make it less likely, but not impossible. Is it less likely enough to be worth it? If double-checking for the 1- and 2-segment cases is worth it, would it be better to fall back to a lock on failure, as I suggested above?

@stephentoub (Author) commented May 18, 2018:

> Aren't those true despite the double-checking? I figured the double-checking would only make it less likely, but not impossible.

For it to not be true, there would have to be more than ~4 billion items enqueued and dequeued between the checks, in order for the int positions to wrap around. And with the cross-segment lock being taken, that's not possible.

stephentoub (Author):

One possible thing we could do in the locked case (more than two segments) is have a loop within the lock. While holding the lock, the only operations that can be performed are dequeuing from the head segment and enqueuing to the tail segment, which means the max number of operations that can occur is the number of currently queued items in the head segment plus the number of empty spaces in the tail segment. That's a finite number, worst case 2M enqueues and dequeues due to MaxSegmentLength being 1M. But it's still a very big number.
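
A hypothetical helper spelling out that bound (illustrative only; the name and parameters are invented for the sketch, not part of the implementation):

```csharp
// Hypothetical illustration of the retry bound described above: with the
// cross-segment lock held, only head-segment dequeues and tail-segment
// enqueues can proceed, so a snapshot-validation loop can be invalidated at
// most (items in head) + (free slots in tail) times.
private static int MaxRetriesWhileLocked(
    int headHead, int headTail,     // captured head-segment positions
    int tailHead, int tailTail,     // captured tail-segment positions
    int tailCapacity)               // tail segment's slot count
{
    int itemsInHead = headTail - headHead;                 // possible dequeues
    int freeInTail = tailCapacity - (tailTail - tailHead); // possible enqueues

    // Worst case with two 1M-slot segments (head full, tail empty): ~2M -
    // finite, but still a very big number.
    return itemsInHead + freeInTail;
}
```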

Member:

Ah, OK, I see how that works; makes sense.

@stephentoub merged commit 6e82320 into dotnet:master on May 19, 2018.
@stephentoub deleted the cqcount branch on May 19, 2018 at 02:07.
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request on May 19, 2018 (commits: "Avoid freezing ConcurrentQueue segments in Count", "Remove stale comment").
stephentoub added a commit to dotnet/corefx that referenced this pull request on May 19, 2018.
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request on May 19, 2018.
jkotas pushed a commit to dotnet/corert that referenced this pull request on May 19, 2018.
stephentoub added a commit to dotnet/corefx that referenced this pull request on May 21, 2018.