
Avoid freezing ConcurrentQueue segments in Count #18035

Merged

stephentoub merged 2 commits into dotnet:master on May 19, 2018

Conversation

stephentoub (Member):

In .NET Core 2.0, we changed the implementation of ConcurrentQueue to allow segment reuse, making enqueues amortized allocation-free, whereas previously every enqueue had some allocation cost (even if it was batched as part of allocating a segment). However, some operations mark segments as frozen, which then prevents subsequent enqueues into those segments; this is necessary for enumeration, but Count also caused it. Ideally Count isn't used on hot paths, but if it is, this can end up in degenerate situations where, for example, if Count is called after every enqueue, we'll end up creating a new segment for every enqueued item, which is incredibly inefficient both for the enqueues and for Count, which is O(N) in the number of segments.

It turns out, though, that we were overly cautious in implementing Count, and we don't actually need it to freeze the segments. Instead, when there are more than two segments (the case where we previously froze), we can take the cross-segment lock, which is the same lock that's held any time we update the head and tail segment pointers. Once that lock is held, we know that the internal segments won't change, because code can only enqueue/dequeue from the head and tail segments, so any segment that's not head and tail while the lock is held is effectively immutable. That means that we can simply walk those segments and add up their counts, safe in knowing they won't change while we're enumerating.

Fixes https://github.com/dotnet/corefx/issues/29759
cc: @kouvel, @benaadams, @tarekgh
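
In rough code terms, the new multi-segment path looks something like the sketch below. Names like `_crossSegmentLock`, `GetCount`, `_headAndTail`, and `_nextSegment` mirror the excerpts quoted in the review comments; the snapshot validation that the real code performs for the head and tail segments is elided, so treat this as an outline rather than the merged code:

```csharp
// Simplified sketch of the more-than-two-segments path of Count (hedged:
// an outline of the approach described above, not the verbatim change).
lock (_crossSegmentLock)
{
    ConcurrentQueueSegment<T> head = _head, tail = _tail;

    // Head and tail are the only segments that can still be mutated, so their
    // positions are snapshotted (and, in the real code, double-checked).
    int count = GetCount(head, Volatile.Read(ref head._headAndTail.Head),
                               Volatile.Read(ref head._headAndTail.Tail)) +
                GetCount(tail, Volatile.Read(ref tail._headAndTail.Head),
                               Volatile.Read(ref tail._headAndTail.Tail));

    // Internal segments are effectively immutable while the lock is held:
    // enqueues go only to the tail segment and dequeues only from the head
    // segment, so we can walk and sum them without freezing anything.
    for (ConcurrentQueueSegment<T> s = head._nextSegment; s != tail; s = s._nextSegment)
    {
        count += s._headAndTail.Tail - s._headAndTail.Head;
    }

    return count;
}
```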

```csharp
// based on those and add them. Note that this and the below additions to count may overflow: previous
// implementations allowed that, so we don't check, either, and it is theoretically possible for the
// queue to store more than int.MaxValue items.
int count = GetCount(head, headHead, headTail) + GetCount(tail, tailHead, tailTail);
```
Member (commenting on `GetCount`):

I am wondering why we don't store the segment's item count inside the segment, so we don't have to recalculate it every time?

stephentoub (Author):

Because it changes on every enqueue and dequeue, which can happen concurrently (multiple of each), and it would then be very expensive to maintain an accurate count in the segment.
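
For illustration only: a hypothetical per-segment live counter (not anything in the actual implementation; the class and method names below are invented) would put an interlocked operation on every enqueue and dequeue, which is exactly the cost being avoided:

```csharp
using System.Threading;

// Hypothetical alternative, NOT the actual implementation: maintaining a live
// per-segment count means every enqueue and dequeue performs an interlocked
// update on a shared field - a contended atomic on a hot cache line.
internal sealed class CountedSegmentSketch
{
    private int _count;

    public void OnEnqueue() => Interlocked.Increment(ref _count); // extra cost per enqueue
    public void OnDequeue() => Interlocked.Decrement(ref _count); // extra cost per dequeue

    public int Count => Volatile.Read(ref _count);
}
```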

Member:

That is because we don't take any lock on enqueue/dequeue?

stephentoub (Author):

Yes.

Member:

Should the head + tail counts be added last, after all the internal frozen segments have been counted?

stephentoub (Author):

Why?

Member:

The head and tail segments aren't frozen, so they can have items added or removed during the evaluation of the count? (E.g. the queue is not completely locked; only the intermediate segments, or changes to the segment list, are.)

So in the time elapsed while accumulating the intermediate segments' counts, the head and tail segments may have changed count? (Hence getting them last.)

Though I may be misunderstanding.

Member:

i.e. it returns a correct point-in-time count; however, it could return a more recent point-in-time count (e.g. if there were 2000 segments, it would take a little time to enumerate them all).

Member:

That makes sense to me.

stephentoub (Author):

It doesn't matter when the GetCount methods are called; those are deterministic based on the inputs to them, which are captured much earlier on and passed in here.
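
To illustrate the determinism point: a helper that is a pure function of previously captured indices produces the same result no matter when it's invoked. A simplified sketch (an illustrative assumption, not the actual GetCount, which also deals with a freeze offset) might look like:

```csharp
// Simplified sketch (assumption, not the real GetCount): a pure function of
// captured indices. It reads no shared state, so *when* it runs is
// irrelevant - only the snapshot passed in matters.
private static int GetCountSketch(int head, int tail, int capacity)
{
    if (head == tail)
    {
        return 0; // segment was empty in this snapshot
    }

    // head/tail are monotonically increasing positions; map them into the
    // circular buffer (capacity assumed a power of two) and measure the span.
    head &= capacity - 1;
    tail &= capacity - 1;
    return head < tail ? tail - head : capacity - head + tail;
}
```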

```csharp
// trying to compute the count. Once we've taken the lock, we know that any internal
// segments (not the head and not the tail) won't have their counts changed, as you
// can't dequeue from or enqueue to an internal segment. Thus, we only need to get
// a stable count for the head and tail segments,
```
Member:

Nit: comma.

```csharp
@@ -276,26 +276,29 @@ public int Count
        {
            get
            {
                ConcurrentQueueSegment<T> head, tail;
                int headHead, headTail, tailHead, tailTail;

                // Take the cross-segment lock to prevent _head and _tail from changing while we're
```
Member:

You're not taking the lock at this point, but the comment says you are.

stephentoub (Author):

Oops, leftover from a refactoring. Will fix.

```csharp
if (head == _head &&
    head == _tail &&
    tail == _tail &&
```
Member:

Was the previous comparison here a mistake?

stephentoub (Author):

No, head and tail contain the same value, as validated by the if statement above. I just felt it was clearer and more obvious to use tail instead of head.

@kouvel (Member) commented May 18, 2018:

It looks like this is changing the CoreLib version of ConcurrentQueue; there's another version in corefx, in System.Collections.Concurrent. Isn't that the one that would be used by the public type?

@kouvel commented May 18, 2018:

Ah, OK, it looks like the intent was to use the CoreLib versions and remove the corefx versions, but that hasn't been done yet.

@stephentoub (Author):

> Ah, OK, it looks like the intent was to use the CoreLib versions and remove the corefx versions, but that hasn't been done yet.

Yeah, it's in progress. Once corefx ingests the latest coreclr build, @maryamariyan will finish the type forwarding. So I figured it wasn't worth mucking with the about-to-be-deleted copy in corefx.

```csharp
if (headHead == Volatile.Read(ref head._headAndTail.Head) &&
    headTail == Volatile.Read(ref head._headAndTail.Tail) &&
    tailHead == Volatile.Read(ref tail._headAndTail.Head) &&
    tailTail == Volatile.Read(ref tail._headAndTail.Tail))
```
@kouvel (Member) commented May 18, 2018:

Is it possible for the double-checking on the head and tail segments' head and tail, here and above, to cause this API to spin for a very long time if enqueues/dequeues are happening very actively?

Would it be better to:

  • Double-check once for the 1- and 2-segment cases, as above
  • But on failure of the initial condition or the double-check, fall back to the lock, and inside the lock, don't double-check

Typically the lock should not be contended (once the queue stabilizes in size and the tail does not change frequently).

stephentoub (Author):

> Is it possible for the double-checking on the head and tail segments' head and tail, here and above, to cause this API to spin for a very long time if enqueues/dequeues are happening very actively?

Yes, it's possible. (That's true today as well.)

> inside the lock, don't double-check

Can you explain this?

Member:

> Yes, it's possible. (That's true today as well.)

For the 1- and 2-segment cases it's true today, but now it's also true when there are multiple segments (though that's a less interesting case). I wonder if even that is necessary. Count only provides a snapshot, and it can't be relied upon without appropriate synchronization by callers, so why double-check at all?

> inside the lock, don't double-check

I mean: inside the lock, can we just read the values and compute the counts, without double-checking new values against previous values?

Generally wondering if the spinning can be removed altogether.

@stephentoub (Author) commented May 18, 2018:

> why double-check at all?

The problem is that you can end up with totally wrong values, e.g.:

  • The queue is empty.
  • We read the head position as 0.
  • An item is then enqueued and dequeued, 10 times.
  • We read the tail position as 10.
  • Without double-checking, we compute the Count as 10, even though at no point in the queue's history did it ever contain more than 1 item.
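
A minimal sketch of the snapshot-and-verify pattern under discussion (field and type names loosely mirror the excerpts above, but this is a simplified illustration; the SpinWait backoff is an assumption, not the exact implementation):

```csharp
using System.Threading;

// Hypothetical, simplified segment demonstrating the "double-check" snapshot:
// re-reading the positions proves the (head, tail) pair was simultaneously
// valid, so the returned count is one the segment actually had at some instant.
internal sealed class SegmentSketch
{
    internal PaddedHeadAndTail _headAndTail;

    internal int SnapshotCount()
    {
        var spinner = new SpinWait();
        while (true)
        {
            // Snapshot the positions.
            int head = Volatile.Read(ref _headAndTail.Head);
            int tail = Volatile.Read(ref _headAndTail.Tail);

            // Re-read both: if neither moved between the reads, the pair is
            // consistent and tail - head never counts phantom items like the
            // "Count == 10" example above.
            if (head == Volatile.Read(ref _headAndTail.Head) &&
                tail == Volatile.Read(ref _headAndTail.Tail))
            {
                return tail - head;
            }

            // A concurrent enqueue/dequeue raced with us; back off and retry.
            spinner.SpinOnce();
        }
    }
}

internal struct PaddedHeadAndTail
{
    internal int Head;
    internal int Tail;
}
```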

Member:

> Count only provides a snapshot, and it can't be relied upon without appropriate synchronization by callers, so why double-check at all?

It does provide a correct snapshot, even though it may have changed after measuring; so you still need to either get the head+tail count while the segments are locked, or verify it's still valid - otherwise it may not be a head+tail count that matches any state of the locked segments (i.e. a count that never existed).

@kouvel (Member) commented May 18, 2018:

> The problem is that you can end up with totally wrong values

> otherwise it may not be a head+tail count that matches any state of the locked segments (i.e. a count that never existed)

Aren't those true despite the double-checking? I figured the double-checking would only make it less likely, but not impossible. Is it less likely enough to be worth it? If double-checking for the 1- and 2-segment cases is worth it, would it be better to fall back to a lock on failure, as I suggested above?

@stephentoub (Author) commented May 18, 2018:

> Aren't those true despite the double-checking? I figured the double-checking would only make it less likely, but not impossible.

For it to not be true, there would have to be more than ~4 billion items enqueued and dequeued between the checks, in order for the int positions to wrap around. And with the cross-segment lock being taken, that's not possible.

stephentoub (Author):

One possible thing we could do in the locked case (more than two segments) is have a loop within the lock. While holding the lock, the only operations that can be performed are dequeuing from the head segment and enqueuing to the tail segment, which means the max number of operations that can occur is the number of currently queued items in the head segment plus the number of empty spaces in the tail segment. That's a finite number, worst case 2M enqueues and dequeues due to MaxSegmentLength being 1M. But it's still a very big number.
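
A hypothetical helper spelling out that bound (illustrative only; the name and parameters are invented for the sketch, not part of the implementation):

```csharp
// Hypothetical illustration of the retry bound described above: with the
// cross-segment lock held, only head-segment dequeues and tail-segment
// enqueues can proceed, so a snapshot-validation loop can be invalidated at
// most (items in head) + (free slots in tail) times.
private static int MaxRetriesWhileLocked(
    int headHead, int headTail,     // captured head-segment positions
    int tailHead, int tailTail,     // captured tail-segment positions
    int tailCapacity)               // tail segment's slot count
{
    int itemsInHead = headTail - headHead;                 // possible dequeues
    int freeInTail = tailCapacity - (tailTail - tailHead); // possible enqueues

    // Worst case with two 1M-slot segments (head full, tail empty): ~2M -
    // finite, but still a very big number.
    return itemsInHead + freeInTail;
}
```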

Member:

Ah, OK, I see how that works; makes sense.

@stephentoub merged commit 6e82320 into dotnet:master on May 19, 2018.
@stephentoub deleted the cqcount branch on May 19, 2018 at 02:07.
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request on May 19, 2018 (commits: "Avoid freezing ConcurrentQueue segments in Count", "Remove stale comment").
stephentoub added a commit to dotnet/corefx that referenced this pull request on May 19, 2018.
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request on May 19, 2018.
jkotas pushed a commit to dotnet/corert that referenced this pull request on May 19, 2018.
stephentoub added a commit to dotnet/corefx that referenced this pull request on May 21, 2018.