Rewrite ConcurrentQueue<T> for better performance #14254

Merged
merged 3 commits into dotnet:master from stephentoub:new_concurrentqueue on Dec 8, 2016

Conversation

@stephentoub (Member) commented Dec 6, 2016

This commit rewrites ConcurrentQueue to provide better throughput and a much better allocation profile. As with the previous implementation, it's structured as a linked list of segments. Unlike the previous implementation, these segments are now ring buffers, allowing them to be reused as long as they're not full; if an enqueue does make a segment full, then a new segment is created for subsequent enqueues. Also unlike the previous implementation, segments are dynamic in size, so newly allocated segments follow a doubling scheme similar to that employed by other generic collections. The synchronization mechanism used is also lower-overhead than before, with a typical operation incurring just one CAS operation to either enqueue or dequeue.
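
To make the design concrete, here is a heavily simplified sketch of the kind of ring-buffer segment described above. This is illustrative only, not the actual corefx code: the type and member names (RingSegment, Slot, etc.) are made up, and the real implementation additionally handles freezing segments for observation, spinning, and padding against false sharing. It shows how per-slot sequence numbers let each enqueue or dequeue claim a slot with a single Interlocked.CompareExchange:

using System.Threading;

// Simplified, hypothetical sketch of a fixed-size ring-buffer segment.
// Capacity is a power of 2; each slot carries a sequence number that says
// whose turn it is (an enqueuer's or a dequeuer's) to use that slot.
internal sealed class RingSegment<T>
{
    private struct Slot { public T Item; public int SequenceNumber; }

    private readonly Slot[] _slots;
    private readonly int _mask;   // capacity - 1, used to wrap positions into slot indices
    private int _head, _tail;     // ever-increasing positions (real code pads these against false sharing)

    public RingSegment(int capacity) // capacity must be a power of 2
    {
        _slots = new Slot[capacity];
        _mask = capacity - 1;
        for (int i = 0; i < capacity; i++) _slots[i].SequenceNumber = i; // slot i starts free for position i
    }

    public bool TryEnqueue(T item)
    {
        while (true)
        {
            int tail = Volatile.Read(ref _tail);
            int index = tail & _mask;
            int seq = Volatile.Read(ref _slots[index].SequenceNumber);

            if (seq == tail)
            {
                // The slot is free for this position; claim it with one CAS on the tail.
                if (Interlocked.CompareExchange(ref _tail, tail + 1, tail) == tail)
                {
                    _slots[index].Item = item;
                    Volatile.Write(ref _slots[index].SequenceNumber, tail + 1); // publish to dequeuers
                    return true;
                }
            }
            else if (seq < tail)
            {
                // The slot hasn't been emptied since the previous lap (or a dequeue is in flight):
                // treat the segment as full and let the caller append a new, larger segment.
                return false;
            }
            // Otherwise another thread got in first; loop and retry.
        }
    }

    public bool TryDequeue(out T item)
    {
        while (true)
        {
            int head = Volatile.Read(ref _head);
            int index = head & _mask;
            int seq = Volatile.Read(ref _slots[index].SequenceNumber);

            if (seq == head + 1)
            {
                // The slot holds a published item for this position; claim it with one CAS on the head.
                if (Interlocked.CompareExchange(ref _head, head + 1, head) == head)
                {
                    item = _slots[index].Item;
                    _slots[index].Item = default(T);
                    // Mark the slot free for the enqueue that eventually wraps around to it;
                    // this is what lets a non-frozen segment be reused indefinitely.
                    Volatile.Write(ref _slots[index].SequenceNumber, head + _slots.Length);
                    return true;
                }
            }
            else if (seq < head + 1)
            {
                item = default(T);
                return false; // nothing published at this position yet: the segment is currently empty
            }
        }
    }
}

In this sketch, when TryEnqueue returns false the owning queue would append a new segment (twice the size, per the doubling scheme described above) and enqueue there.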

In a test that just enqueues 20M items serially:

Old - Time: 0.461s GC: 27
New - Time: 0.237s GC: 0

In a test that enqueues and dequeues 20M items serially (adding 1, taking 1, adding 1, taking 1, etc.):

Old - Time: 0.631s GC: 32
New - Time: 0.374s GC: 0

In a test that has 4 threads (quad core) each enqueueing 5M items in parallel:

Old - Time: 0.997s GC: 27
New - Time: 0.244s GC: 0

In a test that has 2 threads each enqueueing 5M items in parallel with 2 threads each dequeueing 5M items:

Old - Time: 0.621s GC: 17
New - Time: 0.121s GC: 0

Note that while most things get faster, there are a few operations that have a small downside. Any operation that needs to preserve items for observation (e.g. GetEnumerator, ToArray, TryPeek, etc.) essentially makes all current segments non-reusable. That's mostly no worse than before, where no segments were reusable, but since these segments get bigger, if you have a strange usage pattern, such as enqueueing one item and then enumerating the queue, enqueueing another and then enumerating the queue, repeatedly, this implementation will end up allocating more memory than before. For example, in a test that enqueues 10,000 items, but between adding each item enumerates the whole queue, I get these results:

Old - Time: 0.404s GC: 0
New - Time: 1.091s GC: 1
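
The pattern being exercised is roughly the following (a hypothetical reconstruction for illustration; the actual benchmark code isn't shown here):

var queue = new ConcurrentQueue<int>();
for (int i = 0; i < 10000; i++)
{
    queue.Enqueue(i);
    // Enumerating marks the current segments as preserved for observation,
    // so they can't be reused and new (larger) segments keep being allocated.
    foreach (int item in queue) { }
}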

I'm hopeful this is not a common pattern. If it is, we may be able to find some ways to mitigate the concern.

Fixes #14207
cc: @kouvel, @alexperovich, @ianhays, @benaadams

@i3arnon (Collaborator) commented Dec 6, 2016

@stephentoub these improvements to the concurrent collections are awesome. Will they also be ported back to the full framework? Or are they limited to corefx?

@benaadams (Collaborator) commented Dec 7, 2016

@i3arnon I assume you can just reference the Corefx versions from full framework using nuget? https://www.nuget.org/packages/System.Collections.Concurrent/ (might be wrong...)

@i3arnon (Collaborator) commented Dec 7, 2016

@benaadams there's no actual dll for net45 in that package. I assume it just takes the one in the installed FX (or the GAC?). Not sure though... assembly loading & precedence was never my strong suit.

[image: NuGet package contents]

@stephentoub (Member) commented Dec 7, 2016

@i3arnon, thanks. It's not automatically ported back to desktop, but I've labeled it as netfx-port-consider, which has the effect of opening an internal work item to consider doing so.

@benaadams (Collaborator) commented Dec 7, 2016

Awesome change! Does this also need to be ported to coreclr ConcurrentQueue? Not sure how the two types interact?

Aside: Comparing the results with ConcurrentBag #14126 does this still make ConcurrentQueue the preferred go to data structure of the two? (and more so after this change)

Are there any learnings that could be applied back to ConcurrentBag? In theory, since it is more weakly ordered and makes use of thread locals, it could be faster due to less contention. Or is the differential because the example perf tests are always hitting an enumeration or a steal, which is a worse case for Bag than for Queue - so is it more about the scenario tested?

@stephentoub (Member) commented Dec 7, 2016

Does this also need to be ported to coreclr ConcurrentQueue? Not sure how the two types interact?

The ConcurrentQueue that's internal to coreclr has minimal usage. I don't think it's worthwhile for performance reasons; the primary benefit would be in keeping them in sync (but it might be worthwhile for that reason alone).

Comparing the results with ConcurrentBag #14126 does this still make ConcurrentQueue the preferred go to data structure of the two? (and more so after this change)

"go to" for what purpose? First, the results I included in the two PRs aren't directly comparable. And which data structure makes more sense will depend on the situation. For example, if you're doing something producer/consumer, with dedicated producers and dedicated consumers, ConcurrentQueue is going to be better than ConcurrentBag, e.g. this test:

using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        const int Iters = 10000000;
        var sw = new Stopwatch();
        while (true)
        {
            var cb = new ConcurrentBag<int>();
            var cq = new ConcurrentQueue<int>();

            sw.Restart();
            Task.WaitAll(
                Task.Run(() =>
                {
                    for (int i = 0; i < Iters; i++) cb.Add(i);
                }),
                Task.Run(() =>
                {
                    int item;
                    for (int i = 0; i < Iters; i++) while(!cb.TryTake(out item));
                }));
            sw.Stop();
            Console.WriteLine("Bag  : " + sw.Elapsed);
            GC.Collect();

            sw.Restart();
            Task.WaitAll(
                Task.Run(() =>
                {
                    for (int i = 0; i < Iters; i++) cq.Enqueue(i);
                }),
                Task.Run(() =>
                {
                    int item;
                    for (int i = 0; i < Iters; i++) while(!cq.TryDequeue(out item));
                }));
            sw.Stop();
            Console.WriteLine("Queue: " + sw.Elapsed);
            GC.Collect();

            Console.WriteLine();
        }
    }
}

on my machine has ConcurrentQueue performing 5x better than ConcurrentBag. But in a situation where, for example, each thread is doing its own adds and takes but wants the whole collection available to it for takes if needed, ConcurrentBag scales much better, e.g. this test:

using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        const int Iters = 10000000;
        var sw = new Stopwatch();
        while (true)
        {
            var cb = new ConcurrentBag<int>();
            var cq = new ConcurrentQueue<int>();

            sw.Restart();
            Parallel.For(0, Iters, i =>
            {
                int item;
                cb.Add(i);
                cb.TryTake(out item);
            });
            sw.Stop();
            Console.WriteLine("Bag  : " + sw.Elapsed);
            GC.Collect();

            sw.Restart();
            Parallel.For(0, Iters, i =>
            {
                int item;
                cq.Enqueue(i);
                cq.TryDequeue(out item);
            });
            sw.Stop();
            Console.WriteLine("Queue: " + sw.Elapsed);
            GC.Collect();

            Console.WriteLine();
        }
    }
}

on my machine has ConcurrentBag performing 5x better than ConcurrentQueue.

@benaadams (Collaborator) commented Dec 7, 2016

The ConcurrentQueue that's internal to coreclr has minimal usage.

Wasn't sure if it was forwarded - never know these days 😉

depend on the situation... ConcurrentQueue performing 5x better than ConcurrentBag... ConcurrentBag performing 5x better than ConcurrentQueue.

K, hoping that was so - thank you for verifying.

Review comments on the following hunk from the ConcurrentQueue.cs diff:

// initial segment length; if these observations are happening frequently,
// this will help to avoid wasted memory, and if they're not, we'll
// relatively quickly grow again to a larger size.
int nextSize = tail._preservedForObservation ? InitialSegmentLength : tail.Capacity * 2;

@benaadams (Collaborator) commented Dec 8, 2016

Should this remain at the current capacity rather than dropping back to the initial size? e.g.

int nextSize = tail._preservedForObservation ? tail.Capacity : tail.Capacity * 2;

Otherwise freezing operations will be even more disruptive for large queues?

@benaadams (Collaborator) commented Dec 8, 2016

Hmm... maybe not - a freeze may suggest freezing operations are common, and then most of the larger segments would end up unusable? If they are not common, then it will grow again between the freezes. Makes sense... Is that the approach?

@benaadams (Collaborator) commented Dec 8, 2016

Which is kinda what it says in the comment 😝

@stephentoub (Member) commented Dec 8, 2016

Which is kinda what it says in the comment

😉

@benaadams (Collaborator) commented Dec 8, 2016

LGTM - the approach taken makes sense, and it's doing the right thing; however, I've not extensively verified it line by line.

@benaadams (Collaborator) commented Dec 8, 2016

Perhaps add something to the <remarks> comment of TryPeek noting that it is a more costly operation, and suggest using IsEmpty in preference if it's just to determine whether there is potentially a next item, as there is for Count?

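To illustrate the suggestion (hypothetical usage, not code from the PR): when the goal is only to learn whether an item might be available, IsEmpty avoids the cost TryPeek pays to preserve the observed item:

var queue = new ConcurrentQueue<int>();
queue.Enqueue(1);

// Cheaper: only answers "was the queue non-empty at the moment of the check?"
if (!queue.IsEmpty)
{
    // there was at least one item, though another thread may of course take it first
}

// More costly in this implementation: the peek preserves the observed item,
// which (per the PR description) can make the current segment non-reusable.
int head;
if (queue.TryPeek(out head))
{
    // head is a copy of the front item
}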

@benaadams (Collaborator) commented Dec 8, 2016

Thinking aloud...

I worry about increasing the cost of Count, as it has a high usage relative to Enqueue.

Some usage may be replaceable with IsEmpty; speculating, another use may be to apply backpressure?

However, introducing a "cheaper" way of counting, for example Interlocked.Increment/Decrement to approximate the count, would increase the cost of regular operations even when Count isn't used, which I don't think would be good. (And going for a more accurate count would likely be more costly.)

If Count is used for backpressure, then a wrapping type would need to be used to apply the backpressure; that type could also perform the counting and bear the cost - and there are other bounded producer/consumer data structures that already provide this.

So... wondering if a blog post or release note announcement item should go with this change? Just for awareness.

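On the backpressure point: a bounded wrapper already exists in the framework, e.g. BlockingCollection<T> over a ConcurrentQueue<T>. Shown here purely to illustrate the idea that the wrapping type, not the queue itself, bears the counting/bounding cost:

// Illustrative only (System.Collections.Concurrent). BlockingCollection<T> wraps an
// IProducerConsumerCollection<T> and enforces the bound itself, so ConcurrentQueue
// doesn't need a cheap exact Count for this scenario.
var bounded = new BlockingCollection<int>(new ConcurrentQueue<int>(), boundedCapacity: 1000);
bounded.Add(42);            // blocks once 1000 items are buffered (backpressure on producers)
int taken = bounded.Take(); // blocks until an item is available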

Rewrite ConcurrentQueue for better performance
This commit rewrites ConcurrentQueue<T> to provide better throughput and a much better allocation profile.  As with the previous implementation, it's structured as a linked list of segments.  Unlike the previous implementation, these segments are now ring buffers, allowing them to be reused as long as they're not full; if an enqueue does make a segment full, then a new segment is created for subsequent enqueues.  Also unlike the previous implementation, segments are dynamic in size, so newly allocated segments follow a doubling scheme similar to that employed by other generic collections.  The synchronization mechanism used is also lower-overhead than before, with a typical operation incurring just one CAS operation to either enqueue or dequeue.
@kouvel approved these changes Dec 8, 2016

Nice!

@benaadams (Collaborator) commented Dec 8, 2016

Note: as a follow-up, I don't think the performance should be generally compromised for Count, as there are a lot of use cases that don't use Count.

At high throughput it is a bit of a Schrödinger's value, and already different after measuring. It's either high velocity of change/throughput, point-in-time accuracy, or the user has a race condition...

It's just the high usage that apisof suggests that's troubling.

@stephentoub (Member) commented Dec 8, 2016

As a follow-up, I don't think the performance should be generally compromised for Count, as there are a lot of use cases that don't use Count.

Yes, I stayed away from doing anything that would hurt perf if Count isn't used. A very accurate moment-in-time Count could be produced, for example, by maintaining an interlocked count that's incremented on enqueue and decremented on dequeue, but that would be terrible for throughput and scalability. Other tricks could be played, like a per-thread count, where enqueues increment the local count and dequeues decrement it, and then accessing Count loops through and sums all of the TLS counts, but a) that still impacts throughput by adding TLS accesses on every operation, and b) you don't get an accurate count.

It's just the high usage that apisof suggests that's troubling.

Just because Count is accessed doesn't tell us anything about its patterns of access. It could be, for example, that Count is used at the end of an operation to determine how many elements are left, or at the beginning of usage when the queue is wrapped in some other data structure that will maintain its own count and needs to know how many elements are in the queue to begin with.

That said, I pushed another commit to try to limit the impact if Count is used. It follows the same general approach that the current implementation follows, trying to get a stable picture of the queue's state and then computing a count based on that:
http://source.dot.net/#System.Collections.Concurrent/System/Collections/Concurrent/ConcurrentQueue.cs,346
I limited it to the cases where there are one or two segments in the queue, which, other than when the queue is initially growing to a stable size or when lots of other freezing operations are being performed (in which case Count's additional freezing won't matter), should be the common cases.

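As a point of reference, here is a minimal sketch of the "interlocked count" alternative described (and deliberately avoided) above. The type is hypothetical and not part of the PR; the extra interlocked operation on every enqueue and dequeue is precisely the throughput/scalability cost being discussed:

using System.Collections.Concurrent;
using System.Threading;

// Hypothetical wrapper, not the approach taken by the PR: maintains a count with
// Interlocked operations, so every enqueue/dequeue contends on one shared counter.
public sealed class CountedQueue<T>
{
    private readonly ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();
    private int _count;

    public int Count => Volatile.Read(ref _count); // can transiently disagree with the queue's contents

    public void Enqueue(T item)
    {
        _queue.Enqueue(item);
        Interlocked.Increment(ref _count); // extra shared-memory traffic on every enqueue
    }

    public bool TryDequeue(out T item)
    {
        if (_queue.TryDequeue(out item))
        {
            Interlocked.Decrement(ref _count); // and on every dequeue
            return true;
        }
        return false;
    }
}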

@benaadams (Collaborator) commented Dec 8, 2016

Nice, all comments addressed 😄

Modify concurrent collections tests to detect tearing
Modifies one of the tests of concurrently using peeks and adds/takes to use a large value type that's prone to being torn in order to verify it's not.
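
For illustration, the kind of "large value type" such a tearing test might use could look like this (hypothetical; the actual test type in the commit may differ):

// Hypothetical example. A struct wider than a machine word: if a concurrent reader
// ever observes a torn read or write, its fields will disagree with each other.
struct LargeValue
{
    public long A, B, C, D;
    public LargeValue(long value) { A = B = C = D = value; }
    public bool IsTorn { get { return A != B || B != C || C != D; } }
}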

@stephentoub stephentoub merged commit 1e19076 into dotnet:master Dec 8, 2016

6 checks passed:

Innerloop CentOS7.1 Debug Build and Test - Build finished.
Innerloop CentOS7.1 Release Build and Test - Build finished.
Innerloop Ubuntu14.04 Debug Build and Test - Build finished.
Innerloop Ubuntu14.04 Release Build and Test - Build finished.
Innerloop Windows_NT Debug Build and Test - Build finished.
Innerloop Windows_NT Release Build and Test - Build finished.

@stephentoub stephentoub deleted the stephentoub:new_concurrentqueue branch Dec 8, 2016

@karelz karelz modified the milestone: 1.2.0 Dec 9, 2016

@MendelMonteiro commented Jan 2, 2017

It would be useful to add a constructor which initialises the first segment to an initial size. Was there a specific reason why this was not done, other than passing through the ctor(IEnumerable)?

@stephentoub (Member) commented Jan 2, 2017

It would be useful to add a constructor which initialises the first segment to an initial size

Because for such an initial size to be useful, it suggests a specific implementation. Case in point: the previous implementation used fixed-size segments that were not reusable, such that such a ctor could actually have been detrimental rather than helpful. While such a ctor could be useful in this implementation in some situations, it still suffers from some of the same issues in others, e.g. various operations on the queue make a segment non-reusable or even closed to subsequent enqueues. And I'm hesitant to see APIs added that lock the type into a specific implementation.

@MendelMonteiro commented Jan 2, 2017

I agree that the parameter should not be called firstSegmentSize but rather initialCapacity, and I can see that my comment was not clear on that point.

The implementation can then choose whether to create one first segment of initialCapacity or multiple segments of increasing size. This is exactly what happens when we use the ctor that accepts an IEnumerable<T> (one that also implements ICollection), so it's already part of the public API.

@stephentoub (Member) commented Jan 2, 2017

This is exactly what happens when we use the ctor that accepts an IEnumerable so it's already part of the public API.

That ctor doesn't guarantee anything about how much space is initially allocated, just that it'll initialize the collection to store the specified items; it could, for example, end up allocating much more space than that. In contrast, adding a ctor that's explicitly about specifying an initial capacity hints at a stronger guarantee that it won't allocate space to store more items, but that couldn't be satisfied, for example, with the previous implementation that used constant-sized segments... even in this implementation, it would still often end up needing to allocate a larger segment, as segments in this implementation are required to be powers of 2, so most values would need to be rounded up.

The two main reasons to specify a capacity to a collection are to avoid unnecessary resizing and to avoid over-allocating storage space. As noted, the latter reason doesn't really hold in this case, as we'd likely end up over-allocating anyway. It's true there are some situations with this implementation where the former could be useful, in that if you could guarantee that no one would ever call one of the operations that freezes a segment (e.g. enumeration, peeking, etc.) and if you could guarantee that the number of elements wouldn't exceed the specified capacity, then you could know that you were only ever allocating a single segment and you could avoid unnecessarily creating additional segments. But that's relatively rare, and with other implementations (like the previous one), it really doesn't buy you that much, due to the non-reusability of the segments.

I'd want to see a very strong performance-motivated use case where such pre-sizing in a real app really made a measurable difference before considering adding such an API.

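As a small illustration of the rounding-up point (hypothetical helper, not an API from this PR): because segment sizes are powers of 2, a requested capacity of, say, 1000 would become a 1024-slot segment anyway.

// Hypothetical helper showing the power-of-2 rounding described above.
static int RoundUpToPowerOf2(int value)
{
    int result = 2;
    while (result < value) result <<= 1;
    return result;
}
// e.g. RoundUpToPowerOf2(1000) == 1024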

@MendelMonteiro commented Jan 2, 2017

I understand and agree with your argument about the ambiguity that would be created by introducing a parameter that then does something slightly different.

I don't have any measurements yet, but I'm keeping in mind the goal of performing as much allocation as possible up-front (and not potentially on the critical path).
