BlockingCollection<T>.TryTakeFromAny throws InvalidOperationException when underlying collection is ConcurrentBag<T> #30781
When the underlying collection for
This behavior is present in .NET Core 2.0/2.1 but not in .NET Framework 4.6.1.
EDIT: This can be reproduced without
EDIT 2: This does not reproduce with an underlying collection of type
EDIT 3: Updated repro to use ThreadPool instead of tasks - it reproduces much more frequently now.
I believe the problem is that when we rewrote ConcurrentBag for .NET Core to significantly improve its performance, as one small part of that, we effectively removed this check:
The problem with that, as this repro highlights, is that if multiple threads are taking/stealing, it's possible that a thread may miss an item if it's taken by another thread. Consider a situation with four threads each with their own local queue, all of which are currently empty. Then consider this ordering of operations:
So, even though there was an item in the collection that could have been taken, it missed it.
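A miss of this kind can be modeled in a few lines. The single-threaded Python sketch below is hypothetical: `UnversionedBag`, `try_take`, and `try_steal` are invented names, not ConcurrentBag's API. It gives each thread its own queue and lets a thief scan every other queue exactly once. A hook injects two other threads' operations mid-scan: thread 1 adds B to a queue the thief has already passed, then thread 2 takes back its own item A. The bag holds at least one item at every instant, yet the steal reports empty.

```python
from collections import deque


class UnversionedBag:
    """Toy work-stealing bag: one deque per thread, no version check.

    Hypothetical model for illustration; not ConcurrentBag's actual code.
    """

    def __init__(self, num_threads):
        self.queues = [deque() for _ in range(num_threads)]

    def add(self, thread_id, item):
        # Each thread adds to its own local queue.
        self.queues[thread_id].append(item)

    def try_take(self, thread_id):
        # Take from the local queue first (LIFO), otherwise try to steal.
        local = self.queues[thread_id]
        if local:
            return local.pop()
        return self.try_steal(thread_id)

    def try_steal(self, thief_id, before_visit=None):
        # Visit every other thread's queue exactly once, then give up.
        n = len(self.queues)
        for offset in range(1, n):
            victim = (thief_id + offset) % n
            if before_visit is not None:
                before_visit(victim)  # lets the demo inject other threads' work
            if self.queues[victim]:
                return self.queues[victim].popleft()
        return None

    def count(self):
        return sum(len(q) for q in self.queues)


bag = UnversionedBag(4)
bag.add(2, "A")  # thread 2 adds A; the other queues are empty
state = {"fired": False, "taken": None}


def interleave(victim):
    # Runs just before thread 0 inspects queue 2; queue 1 was already seen empty.
    if victim == 2 and not state["fired"]:
        state["fired"] = True
        bag.add(1, "B")                   # thread 1 adds B behind the thief
        state["taken"] = bag.try_take(2)  # thread 2 takes its own A back


result = bag.try_steal(0, before_visit=interleave)
print(result, state["taken"], bag.count())  # None A 1
```

Thread 0's steal fails while B is still in the bag, which is precisely what a wrapper that trusts its own item count cannot tolerate.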
This sequence highlights why "Updated repro to use ThreadPool instead of tasks - it reproduces much more frequently now" made a difference: ThreadPool.QueueUserWorkItem(callback) puts work items into the global queue, whereas Task.Run called from a thread pool thread puts the task into that thread's local queue. With tasks, then, the thread doing the adds is very likely to keep doing adds rather than takes, so it is much less likely to get into a situation like steps (1) and (4) in the above sequence, where the same thread needed to add and then take.
Unfortunately, I think we're going to need to put back some kind of versioning check: a steal that fails checks the versions and, if anything has been added since it started, tries again.
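A minimal sketch of such a versioning check, written as a toy single-threaded Python model. All names are hypothetical, and a plain counter bumped on every add stands in for whatever version the real fix tracks. The thief snapshots the version before scanning and, if the scan comes up empty but the version has moved, scans again. The demo injects the same kind of race mid-scan (thread 1 adds B to a queue the thief already passed, then thread 2 takes its own A back). Because the model is single-threaded, it ignores the atomicity and memory-ordering concerns a real implementation has to handle.

```python
from collections import deque


class VersionedBag:
    """Toy work-stealing bag whose steals retry if an add raced with the scan.

    Hypothetical model: a single counter bumped on every add stands in for
    whatever version the real fix tracks.
    """

    def __init__(self, num_threads):
        self.queues = [deque() for _ in range(num_threads)]
        self.version = 0

    def add(self, thread_id, item):
        self.queues[thread_id].append(item)
        self.version += 1  # record that something was added

    def try_take(self, thread_id):
        # Take from the local queue first (LIFO), otherwise try to steal.
        local = self.queues[thread_id]
        if local:
            return local.pop()
        return self.try_steal(thread_id)

    def try_steal(self, thief_id, before_visit=None):
        n = len(self.queues)
        while True:
            start_version = self.version  # snapshot before scanning
            for offset in range(1, n):
                victim = (thief_id + offset) % n
                if before_visit is not None:
                    before_visit(victim)  # lets the demo inject other threads' work
                if self.queues[victim]:
                    return self.queues[victim].popleft()
            # Every queue looked empty. If nothing was added during the scan,
            # the failure is genuine; otherwise scan again.
            if self.version == start_version:
                return None

    def count(self):
        return sum(len(q) for q in self.queues)


bag = VersionedBag(4)
bag.add(2, "A")  # thread 2 adds A; the other queues are empty
state = {"fired": False, "taken": None}


def interleave(victim):
    # Fires once, just before thread 0 inspects queue 2 on its first scan.
    if victim == 2 and not state["fired"]:
        state["fired"] = True
        bag.add(1, "B")                   # thread 1 adds B behind the thief
        state["taken"] = bag.try_take(2)  # thread 2 takes its own A back


result = bag.try_steal(0, before_visit=interleave)
print(result, state["taken"], bag.count())  # B A 0
```

The retry only happens when an add raced with the scan, so a steal against a genuinely empty bag with no concurrent adds still fails after a single pass.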
Thank you for taking a look, @stephentoub.
One effect of this bug is that some items which are added to the collection cannot be retrieved even after successive calls to
Maybe this is because
This issue was referenced on Jul 10, 2018 and Jul 12, 2018.
@ReubenBond, you want https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/dogfooding.md; in particular, I think you need https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/dogfooding.md#option-1-framework-dependent (because this fix has not yet been built into a full product build).
It worked! This change fixes the issue we were experiencing.
As a bonus, I see a massive speed improvement in our little Orleans repro project when running netcoreapp3.0 with the latest nightly SDK compared to netcoreapp2.0 on SDK 2.1.301.
Before: 41.48 ms per iteration on average
EDIT: I'm not sure of the etiquette/workflow for corefx, but please feel free to close this.
Thanks for validating!
Did you try with netcoreapp2.1? My assumption is that's where the bulk of the wins are coming from, though there has already been some additional perf work for netcoreapp3.0, just not as much.
Thanks. We'll close it when we either close or merge the release/2.1 port PR.
ConcurrentBag.TryTake may fail to take an item from the collection even if it’s known to be there. This in turn causes problems for wrappers that assume if they know the collection contains an item that TryTake will succeed, like BlockingCollection. Race conditions can result in BlockingCollection throwing exceptions and getting into a corrupted state due to TryTake failing to return an item when it should have been able to.
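To illustrate why a spurious TryTake failure is so damaging to a wrapper like BlockingCollection, here is a simplified Python model. It is hypothetical: `CountingWrapper`, `FlakyBag`, and the use of `RuntimeError` in place of .NET's InvalidOperationException are illustrative only, not the actual .NET code. The wrapper tracks the item count with its own semaphore and, once it has claimed a slot, assumes the underlying collection will hand over an item.

```python
import threading


class FlakyBag:
    """Hypothetical underlying collection whose try_take can fail spuriously,
    standing in for a ConcurrentBag that loses a steal race."""

    def __init__(self, spurious_failures=0):
        self.items = []
        self.spurious_failures = spurious_failures

    def try_add(self, item):
        self.items.append(item)
        return True

    def try_take(self):
        if self.spurious_failures > 0:
            # Simulate a lost race: report "empty" even though items exist.
            self.spurious_failures -= 1
            return False, None
        if not self.items:
            return False, None
        return True, self.items.pop()


class CountingWrapper:
    """Hypothetical, simplified BlockingCollection-like wrapper: a semaphore
    counts items, and a claimed slot is trusted to yield an item."""

    def __init__(self, inner):
        self.inner = inner
        self.occupied = threading.Semaphore(0)

    def add(self, item):
        if not self.inner.try_add(item):
            raise RuntimeError("underlying add failed")
        self.occupied.release()  # one more item is known to be present

    def try_take(self):
        if not self.occupied.acquire(blocking=False):
            return False, None  # the count says empty: a legitimate miss
        ok, item = self.inner.try_take()
        if not ok:
            # The slot has already been consumed, so from here on the count
            # and the collection's contents disagree.
            raise RuntimeError("collection returned no item despite a positive count")
        return True, item


wrapper = CountingWrapper(FlakyBag(spurious_failures=1))
wrapper.add("item")

raised = False
try:
    wrapper.try_take()
except RuntimeError as exc:
    raised = True
    print("take failed:", exc)

print(wrapper.try_take())   # (False, None): the count is now zero
print(wrapper.inner.items)  # ['item'] is still inside, but unreachable
```

After the failed take, the wrapper's count reads zero while the item remains in the underlying collection, so no later take can retrieve it. This is the kind of corrupted state described above.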
Exceptions / corrupted data structures / deadlocks when multiple threads access a BlockingCollection wrapped around a ConcurrentBag and race in a manner that results in takes on the bag failing.
Regression from 1.x