Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: BCL additions to support C# 12's "Collection Literals" feature. #87569

Closed
CyrusNajmabadi opened this issue Jun 14, 2023 · 40 comments · Fixed by #88470
Closed
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections blocking Marks issues that we want to fast track in order to unblock other important work
Milestone

Comments

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jun 14, 2023

EDITED by @stephentoub on 6/26/2023 with updated remaining APIs for review

main...stephentoub:runtime:immutablecollectionsbuilder

namespace System.Runtime.CompilerServices
{
+   [AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface | AttributeTargets.Struct, Inherited = false, AllowMultiple = true)]
+   public sealed class CollectionBuilderAttribute : Attribute
+   {
+       public CollectionBuilderAttribute(Type builderType, string methodName);
+       public Type BuilderType { get; }
+       public string MethodName { get; }
+   }
}

namespace System.Collections.Immutable
{
+   [CollectionBuilder(typeof(ImmutableArray), "Create")]
    public readonly struct ImmutableArray<T> { ... }

+   [CollectionBuilder(typeof(ImmutableHashSet), "Create")]
    public sealed class ImmutableHashSet<T> { ... }

+   [CollectionBuilder(typeof(ImmutableList), "Create")]
    public sealed class ImmutableList<T> { ... }

+   [CollectionBuilder(typeof(ImmutableQueue), "Create")]
    public sealed class ImmutableQueue<T> { ... }

+   [CollectionBuilder(typeof(ImmutableSortedSet), "Create")]
    public sealed class ImmutableSortedSet<T> { ... }

+   [CollectionBuilder(typeof(ImmutableStack), "Create")]
    public sealed class ImmutableStack<T> { ... }
}

For C# 12 / .NET 8, the attribute will only recognize the pattern CollectionType Method(ReadOnlySpan). However, the compiler may special-case system types it cares about to do something more efficient based on its knowledge of how they work, in particular for List<T> (which is otherwise supported via its support for collection initializers) and ImmutableArray<T> (which is otherwise supported via copy via the attribute to use Create). We can add more supported patterns in the future.


Background and motivation

the C# design group is moving forward with a proposal for a lightweight syntax construct collections: csharplang/collection-literals.md

We intend to ship this in C# 12 for linear collections (like List<T>​, ImmutableArray<T>​, HashSet<T>​, etc.). We also intend to support this for map collections (like Dictionary<TKey, TValue>​) though that may only be in 'preview' in C# 12.

Part of this proposal involves being able to efficiently construct certain collections (like List<T>​), as well as construct immutable collections (which have generally never worked with the existing new ImmutableXXX<int>() { 1, 2, 3 }​ form). To that end, working with @stephentoub and @captainsafia , we've come up with a set of API proposals we'd like to work through with the runtime team to allow these types to "light up" with this language feature. This would be expected to align with the release of C#12.

API Proposal

The API shape is as follows (with all naming/shaping open to bike shedding):

A new attribute to be placed on a type to specify where to find the method responsible for constructing it:

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Interface, Inherited = false, AllowMultiple = true)]
    public sealed class CollectionLiteralsBuilderAttribute : Attribute
    {
        /// <summary>Initialize the attribute to refer to the <paramref name="methodName"/> method on the <paramref name="builderType"/> type.</summary>
        /// <param name="builderType">The type of the builder to use to construct the collection.</param>
        /// <param name="methodName">
        /// The method on the builder to use to construct the collection. This must refer to a static method,
        /// or if the <paramref name="builderType"/> is also the type of the collection being constructed, it
        /// may be an empty string to indicate a constructor should be used.
        /// </param>
        public CollectionLiteralsBuilderAttribute(Type builderType, string methodName)
        {
            BuilderType = builderType;
            MethodName = methodName;
        }

        /// <summary>Gets the type of the builder to use to construct the collection.</summary>
        public Type BuilderType { get; }

        /// <summary>Gets the name of the method on the builder to use to construct the collection.</summary>
        public string MethodName { get; }
    }
}

For the purposes of this proposal, you can consider this attribute added to every type whose factory method the proposal suggests.

Pattern1: Specialized entrypoints to efficiently construct List<T> and ImmutableArray<T>.

We would like to be able to construct these two types as efficiently as possible.

[CollectionLiteralsBuilder(typeof(CollectionsMarshal), Name = "Create")]
public class List<T> { }

[CollectionLiteralsBuilder(typeof(CollectionsMarshal), Name = "Create")]
public struct ImmutableArray<T> { }

public static class CollectionsMarshal
{
    public static void Create<T>(int capacity, out List<T> list, out Span<T> storage);
    public static void Create<T>(int capacity, out ImmutableArray<T> list, out Span<T> storage);
}

These apis are placed in CollectionsMarshal as they are viewed as too advanced to be something we want to expose directly on the types themselves.

Here, the compiler would ask the runtime to create an instance of these types with an explicit capacity, and also be given direct access to the underlying storage backing these types. The compiler would then translate code like so:

// User code:
ImmutableArray<string> values = ["a", "b", "c"];

// Translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<T> __storage);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";

This would be practically the lowest overhead possible, and would allow users to use immutable collections as easily as normal collections.

Note: the above api would incur a copy cost in the case of code involving async/await (as the compiler would need to make its own array that it populated, which it would then copy into the Span given by this api). We could avoid any copy overhead entirely if the above api were instead:

public static class CollectionsMarshal
{
    public static void Create<T>(int capacity, out List<T> list, out T[] storage);
    public static void Create<T>(int capacity, out ImmutableArray<T> list, out T[] storage);
}

However, there was some reservation about an API (even one in CollectionsMarshal) directly exposing the array to operate over. Language/Compiler will use whatever api is provided here. So if you are comfortable on your end exposing this, then we'll use it. If not, we can always use the Span approach (with the additional cost in async contexts).

Pattern2: ReadOnlySpan<T> entrypoints to construct existing collection types.

This would allow lower overhead construction of certain types (no need for intermediary heap allocations), as well as providing a mechanism for constructing Immutable types (without intermediary builder allocations).

// Note: attributes will be placed on the actual immutable types to point to these factory methods.
// These have been elided for brevity.

public static class ImmutableHashSet
{
+    public static System.Collections.Immutable.ImmutableHashSet<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableList
{
+    public static System.Collections.Immutable.ImmutableList<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableQueue
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableSortedSet
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableStack
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

+[CollectionLiteralsBuilder(typeof(Stack<>), ".ctor")] // demonstrates how the attribute can point to a constructor.
public class Stack<T>
{
+    public Stack(ReadOnlySpan<T> collection);
}

+[CollectionLiteralsBuilder(typeof(Queue<>), ".ctor")]
public class Queue<T>
{
+    public Stack(Queue<T> collection);
}

API Usage

// User code:
ImmutableHashSet<string> values = ["a", "b", "c"];

// Translation:
ReadOnlySpan<string> storage = ["a", "b", "c"];
ImmutableHashSet<string> values = ImmutableHashSet.Create<string>(storage);
// User code:
ImmutableArray<string> values = ["a", "b", "c"];

// Translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<T> __storage);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";

Alternative Designs

  1. In the List<T>/ImmutableArray<T> apis we have the options of returning the underlying array as a span or as an array. e.g.:
public Span<T> Create<T>(int capacity, out List<T> result);
public T[] Create<T>(int capacity, out List<T> result);

The latter would work better in async contexts, but has caused a small amount of concern about directly exposing values that could be placed on the heap. However, this capability is already possible today according to Stephen, so perhaps that is fine.

  1. In the List<T>/ImmutableArray<T> we could use multiple out-params instead of out+return. i.e.:
public void Create<T>(int capacity, out List<T> result, out Span<T> storage); // or
public void Create<T>(int capacity, out List<T> result, out T[] storage);

If there is only a single method exposed, this is mainly a stylistic difference. If we did want both methods, this would allow for overloads.

Risks

Adding overloads to existing IEnumerable<T> taking methods introduced ambiguity errors for existing compilers. Specifically:

void M(IEnumerable<int> values); // existing method
void M(ReadOnlySpan<int> values); // new overload

M(new int[] { 1, 2, 3 }); // ambiguity error.

The C# team is working through a proposal to make this not an error, and to prefer the ReadOnlySpan version as the preferred overload. This strongly matches the intuition we have that if you have overloads like this, they will have the same semantics, just that the latter will be lower overhead.

However, this may cause issues on other compilers that haven't updated, or on older compilers. To that end, we may want to emit the new ReadOnlySpan overload with a modreq so that older compilers do not consider it a viable option. Newer compilers will be ok with it and will then switch which overload they call on recompile.

This work is being tracked here: dotnet/csharplang#7276

@CyrusNajmabadi CyrusNajmabadi added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 14, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 14, 2023
@ghost
Copy link

ghost commented Jun 14, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

the C# design group is moving forward with a proposal for a lightweight syntax construct collections: csharplang/collection-literals.md

We intend to ship this in C# 12 for linear collections (like List​, ImmutableArray​, HashSet​, etc.). We also intend to support this for map collections (like Dictionary<K,V>​) though that may only be in 'preview' in C# 12.

Part of this proposal involves being able to efficiently construct certain collections (like List​), as well as construct immutable collections (which have generally never worked with the existing "new ImmutableXXX() { 1, 2, 3 }​" form). To that end, working with @stephentoub and @captainsafia , we've come up with a set of API proposals we'd like to work through with the runtime team to allow these types to "light up" with this language feature. This would be expected to align with the release of C#12.

API Proposal

The API shape is as follows (with all naming/shaping open to bike shedding):

A new attribute to be placed on a type to specify where to find the method responsible for constructing it:

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Interface, Inherited = false, AllowMultiple = true)]
    public sealed class CollectionLiteralsBuilderAttribute : Attribute
    {
        /// <summary>Initialize the attribute to refer to the <paramref name="methodName"/> method on the <paramref name="builderType"/> type.</summary>
        /// <param name="builderType">The type of the builder to use to construct the collection.</param>
        /// <param name="methodName">
        /// The method on the builder to use to construct the collection. This must refer to a static method,
        /// or if the <paramref name="builderType"/> is also the type of the collection being constructed, it
        /// may be an empty string to indicate a constructor should be used.
        /// </param>
        public CollectionLiteralsBuilderAttribute(Type builderType, string methodName)
        {
            BuilderType = builderType;
            MethodName = methodName;
        }

        /// <summary>Gets the type of the builder to use to construct the collection.</summary>
        public Type BuilderType { get; }

        /// <summary>Gets the name of the method on the builder to use to construct the collection.</summary>
        public string MethodName { get; }
    }
}

For the purposes of this proposal, you can consider this attribute added to every type whose factory method the proposal suggests.

Pattern1: Specialized entrypoints to efficiently construct List<T> and ImmutableArray<T>.

We would like to be able to construct these two types as efficiently as possible.

[CollectionLiteralsBuilder(typeof(CollectionsMarshal), Name = "Create")]
public class List<T> { }

[CollectionLiteralsBuilder(typeof(CollectionsMarshal), Name = "Create")]
public struct ImmutableArray<T> { }

public static class CollectionsMarshal
{
    public static void Create<T>(int capacity, out List<T> list, out Span<T> storage);
    public static void Create<T>(int capacity, out ImmutableArray<T> list, out Span<T> storage);
}

These apis are placed in CollectionsMarshal as they are viewed as too advanced to be something we want to expose directly on the types themselves.

Here, the compiler would ask the runtime to create an instance of these types with an explicit capacity, and also be given direct access to the underlying storage backing these types. The compiler would then translate code like so:

ImmutableArray<string> values = ["a", "b", "c"];

// Translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<T> __storage);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";

This would be practically the lowest overhead possible, and would allow users to use immutable collections as easily as normal collections.

Note: the above api would incur a copy cost in the case of code involving async/await (as the compiler would need to make its own array that it populated, which it would then copy into the Span given by this api). We could avoid any copy overhead entirely if the above api were instead:

public static class CollectionsMarshal
{
    public static void Create<T>(int capacity, out List<T> list, out T[] storage);
    public static void Create<T>(int capacity, out ImmutableArray<T> list, out T[] storage);
}

However, there was some reservation about an API (even one in CollectionsMarshal) directly exposing the array to operate over. Language/Compiler will use whatever api is provided here. So if you are comfortable on your end exposing this, then we'll use it. If not, we can always use the Span approach (with the additional cost in async contexts).

Pattern2: ReadOnlySpan<T> entrypoints to construct existing collection types.

This would allow lower overhead construction of certain types (no need for intermediary heap allocations), as well as providing a mechanism for constructing Immutable types (without intermediary builder allocations).

// Note: attributes will be placed on the actual immutable types to point to these factory methods.  I've elided that for brevity.

public static class ImmutableHashSet
{
+    public static System.Collections.Immutable.ImmutableHashSet<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableList
{
+    public static System.Collections.Immutable.ImmutableList<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableQueue
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableSortedSet
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableStack
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

+[CollectionLiteralsBuilder(typeof(Stack<>), ".ctor")] // demonstrates how the attribute can point to a constructor.
public class Stack<T>
{
+    public Stack(ReadOnlySpan<T> collection);
}

+[CollectionLiteralsBuilder(typeof(Queue<>), ".ctor")]
public class Queue<T>
{
+    public Stack(Queue<T> collection);
}

API Usage

ImmutableHashSet<string> values = ["a", "b", "c"];

// translation:
ReadOnlySpan<string> storage = ["a", "b", "c"];
ImmutableHashSet<string> values = ImmutableHashSet.Create<string>(storage);
ImmutableArray<string> values = ["a", "b", "c"];

// Translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<T> __storage);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";

Alternative Designs

  1. In the List<T>/ImmutableArray<T> apis we have the options of returning the underlying array as a span or as an array. e.g.:
public Span<T> Create<T>(int capacity, out List<T> result);
public T[] Create<T>(int capacity, out List<T> result);

The latter would work better in async contexts, but has caused a small amount of concern about directly exposing values that could be placed on the heap. However, this capability is already possible today according to Stephen, so perhaps that is fine.

  1. In the List<T>/ImmutableArray<T> we could use multiple out-params instead of out+return. i.e.:
public void Create<T>(int capacity, out List<T> result, out Span<T> storage); // or
public void Create<T>(int capacity, out List<T> result, out T[] storage);

If there is only a single method exposed, this is mainly a stylistic difference. If we did want both methods, this would allow for overloads.

Risks

Adding overloads to existing IEnumerable<T> taking methods introduced ambiguity errors for existing compilers. Specifically:

void M(IEnumerable<int> values); // existing method
void M(ReadOnlySpan<int> values); // new overload

M(new int[] { 1, 2, 3 }); // ambiguity error.

The C# team is working through a proposal to make this not an error, and to prefer the ReadOnlySpan version as the preferred overload. This strongly matches the intuition we have that if you have overloads like this, they will have the same semantics, just that the latter will be lower overhead.

However, this may cause issues on other compilers that haven't updated, or on older compilers. To that end, we may want to emit the new ReadOnlySpan overload with a modreq so that older compilers do not consider it a viable option. Newer compilers will be ok with it and will then switch which overload they call on recompile.

Author: CyrusNajmabadi
Assignees: -
Labels:

api-suggestion, area-System.Collections

Milestone: -

@stephentoub
Copy link
Member

stephentoub commented Jun 14, 2023

The exact set of APIs I have prototyped here:
main...stephentoub:runtime:collectionliterals
is this:

namespace System.Collections.Immutable
{
+   [CollectionLiteralsBuilder(typeof(ImmutableCollectionsMarshal), "Create")]
    public readonly struct ImmutableArray<T> { ... }

    public static class ImmutableHashSet
    {
+       public static ImmutableHashSet<T> Create<T>(ReadOnlySpan<T> items);
+       public static ImmutableHashSet<T> Create<T>(IEqualityComparer<T>? equalityComparer, ReadOnlySpan<T> items);
    }

+   [CollectionLiteralsBuilder(typeof(ImmutableHashSet), "Create")]
    public sealed class ImmutableHashSet<T> { ... }

    public static class ImmutableList
    {
+       public static ImmutableList<T> Create<T>(ReadOnlySpan<T> items);
    }

+   [CollectionLiteralsBuilder(typeof(ImmutableList), "Create")]
    public sealed class ImmutableList<T> { ... }

    public static class ImmutableQueue
    {
+       public static ImmutableQueue<T> Create<T>(ReadOnlySpan<T> items);
    }

+   [CollectionLiteralsBuilder(typeof(ImmutableQueue), "Create")]
    public sealed class ImmutableQueue<T> { ... }

    public static class ImmutableSortedSet
    {
+       public static ImmutableSortedSet<T> Create<T>(ReadOnlySpan<T> items);
+       public static ImmutableSortedSet<T> Create<T>(IComparer<T>? comparer, ReadOnlySpan<T> items);
    }

+   [CollectionLiteralsBuilder(typeof(ImmutableSortedSet), "Create")]
    public sealed class ImmutableSortedSet<T> { ... }

    public static class ImmutableStack
    {
+       public static ImmutableStack<T> Create<T>(ReadOnlySpan<T> items);
    }

+   [CollectionLiteralsBuilder(typeof(ImmutableStack), "Create")]
    public sealed class ImmutableStack<T> { ... }
}

namespace System.Runtime.InteropServices
{
    public static partial class CollectionsMarshal
    {
+       public static Span<T> Create<T>(int count, out List<T> list);
        // Alternative: List<T> AsList(T[])
    }

    public static class ImmutableCollectionsMarshal
    {
+        public static Span<T> Create<T>(int length, out ImmutableArray<T> array);
          // Alternative: don't add this if we don't add Create for List<T>
    }
}

namespace System.Runtime.CompilerServices
{
+   [AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface | AttributeTargets.Struct, Inherited = false, AllowMultiple = true)]
+   public sealed class CollectionLiteralsBuilderAttribute : Attribute
+   {
+       public CollectionLiteralsBuilderAttribute(Type builderType, string methodName);
+       public Type BuilderType { get; }
+       public string MethodName { get; }
+   }
}

Once dotnet/csharplang#7276 lands and we validate we're comfortable with any cross-language impact, we will also want:

namespace System.Collections.Generic
{
    // This isn't strictly necessary; it's only for efficiency.
    public class HashSet<T>
    {
+       public HashSet(ReadOnlySpan<T> collection);
+       public HashSet(ReadOnlySpan<T> collection, IEqualityComparer? comparer);
    }

    // With the current rules, these are necessary to use Queue/Stack<T> with collection literals at all
    public class Queue<T>
    {
+       public Queue(ReadOnlySpan<T> collection);
    }
    public class Stack<T>
    {
+       public Stack(ReadOnlySpan<T> collection);
    }

    // This would just be for consistency.  We should also consider moving the AddRange(ReadOnlySpan) extension back to being an instance method on List.
    public class List<T>
    {
+       public List(ReadOnlySpan<T> collection);
    }
}

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Jun 14, 2023
@stephentoub stephentoub added this to the 8.0.0 milestone Jun 14, 2023
@stephentoub
Copy link
Member

I've marked this as ready for review so we can work through any issues concurrently with everything flowing through language design.

Note that this doesn't include anything to do with dictionary collection literals, which we'll handle separately (and possibly not for .NET 8).

@huoyaoyuan
Copy link
Member

I'm really concerned for making these builders ordinal methods (somehow "hidden" though), instead of some special method marked with modreq.

@CyrusNajmabadi
Copy link
Member Author

@huoyaoyuan not sure what you mean by 'ordinal methods'. Can you expand?

@CyrusNajmabadi
Copy link
Member Author

@stephentoub thanks for that. I was sure I missed something:-)

@huoyaoyuan
Copy link
Member

not sure what you mean by 'ordinal methods'. Can you expand?

The init Construct method in original proposal should not be callable in non collection literal context.
Given we already have unsafe accessors in CollectionsMarshal, it's not breaking to add another one, but I think it's better to not provide so many unsafe accessors.

@stephentoub
Copy link
Member

Why shouldn't these Create methods be callable? The immutable collection ones other than immutable array are all just overloads of existing Create methods and aren't unsafe at all. Only two of the methods in the proposal are "unsafe", and both of them just expose a different shape on functionality already available. Further, if you look at my branch, you'll see they're quite useful in certain circunstances... eg it allowed me to remove a bespoke helper from LINQ and just use this method instead.

@Tornhoof
Copy link
Contributor

Is there a reason why the overloads with the equalitycomparer have that argument at the first position instead the last position?,
i.e. public static ImmutableHashSet<T> Create<T>(IEqualityComparer<T>? equalityComparer, ReadOnlySpan<T> items);
instead of public static ImmutableHashSet<T> Create<T>(ReadOnlySpan<T> items, IEqualityComparer<T> equalityComparer,);
Is that to support params style calls with ReadOnlySpan?

@dubiousconst282
Copy link
Contributor

Another alternative for the List and ImmutableArray constructors would be taking ownership of an array directly. This has the advantage of not creating address-taken variables, which I believe can (or could) trouble the JIT a bit depending on inlining and how they are used afterwards.

public static class CollectionsMarshal
{
    public static List<T> Create<T>(T[] storage);
    public static ImmutableArray<T> Create<T>(T[] storage);
}

@huoyaoyuan
Copy link
Member

Why shouldn't these Create methods be callable? The immutable collection ones other than immutable array are all just overloads of existing Create methods and aren't unsafe at all.

I was meaning the methods on CollectionsMarshal. It feels like making FastAllocateString public instead of string.Create. The latter is much harder to violate.
In the other hand, preventing violations of the span via delegate is costful.

@stephentoub
Copy link
Member

stephentoub commented Jun 15, 2023

Is there a reason why the overloads with the equalitycomparer have that argument at the first position instead the last position?

Mainly because that's the order of the existing array-based overloads. This is just creating overloads with spans where there are already arrays.

Another alternative for the List and ImmutableArray constructors would be taking ownership of an array directly

We could and it was discussed.

That already exists for ImmutableArray:
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.immutablecollectionsmarshal.asimmutablearray

For List it has its own complications. For example, the method would need to validate that the type of the array is actually T[] and not something derived from it, or else anywhere the implementation subsequently tries to create a Span would blow up. It's also likely to have it's own costs, eg for non-sealed reference types, writing into a span instead of array could avoid variance checking costs. And for List, while we've exposed the backing store via AsSpan, there's no way to persist that backing store beyond the stack; exposing it by assuming ownership transfer of another array means new avenues for misuse.

None of these are deal-breakers. If we decided we didn't care about List backing being exposed as an array, that could tip the balance. Then we'd add an AsList and remove both Create methods on {Immutable} CollectionsMarshal.

As a pattern, though, the span-based approach is more flexible. It can represent future cases the array one can't, eg where the storage is contiguous but not an array or not the entirety of an array.

I was meaning the methods on CollectionsMarshal

They have no power beyond what's already possible with CollectionsMarshal.AsSpan, CollectionsMarshal.SetCount, and ImmutableCollectionsMarshal.AsImmutableArray.

@eiriktsarpalis
Copy link
Member

We discussed this offline too, but I think that this feature has value beyond collection literals. For instance, deserializers should be able to take advantage of such annotations when determining how a specific collection type is meant to be hydrated. We might want to consider naming the attribute in such a way that reflects this.

@Enderlook
Copy link

If returning a Span<T> is bad for async, and returning a T[] would limit future usages for cases which aren't backed by an array or only use a part of it... why not return Memory<T>?
It may have the unfortunate problem of accidentally collecting the backing storage of the Memory<T>.Span when it's still being used as the Span<T> may not store a strong reference to the IMemoryOwner<T> for custom backings... but that is avoidable by using GC.KeepAlive(memory) which could the compiler insert automatically in such code.
If calling .Span is too expensive for the common case, then have both Span<T> and Memory<T> overloads, so the second is only used on async methods.

@alrz
Copy link
Member

alrz commented Jun 16, 2023

What would be the overhead of using ImmutableArray<T> Create<T>(ReadOnlySpan<T> items) for ImmutableArray and List creation instead of CollectionMarshal?

@stephentoub
Copy link
Member

What would be the overhead of using ImmutableArray Create(ReadOnlySpan items) for ImmutableArray and List creation instead of CollectionMarshal?

Rather than being able to allocate the backing array once and write all items directly into it, the compiler would need to first store all the items somewhere else and then call this method to allocate an array that the elements are all copied into. Best case, it's an extra copy. Worst case, it's double the allocation.

@stephentoub
Copy link
Member

If returning a Span is bad for async, and returning a T[] would limit future usages for cases which aren't backed by an array or only use a part of it... why not return Memory?

It could, and using Memory<T> is one of the considered options. But it's also more expensive. And it has the same cited issue for List<T> previously cited, that we're a bit more hesitant to surface the actual array object rather than a span which can't live past the stack lifetime.

Span<T> is problematic if the collection literal itself contains await. The conclusion is that's sufficiently rare that in the cases where it does occur, the compiler just falls back to doing the extra allocation/copy (i.e. it first builds up an array/list and then copies the data in via the span).

but that is avoidable by using GC.KeepAlive(memory)

GC.KeepAlive will box the struct.

@Enderlook
Copy link

GC.KeepAlive will box the struct.

I though the JIT avoided the boxing for GC.KeepAlive?

Anyway, I understand the first issue, so you don't want the exposed type to outlive the stack lifetime?
Then, in that case, there is no way to avoid a copy in an async context, as any exposed type would have to outlive the stack lifetime, am I right?

@stephentoub
Copy link
Member

Anyway, I understand the first issue, so you don't want the exposed type to outlive the stack lifetime? Then, in that case, there is no way to avoid a copy in an async context, as any exposed type would have to outlive the stack lifetime, am I right?

That is the concern, yes. We could choose to say it's not enough of a concern, and then as noted in my previous replies, that could shift the balance of what we decide to do.

@alrz
Copy link
Member

alrz commented Jun 16, 2023

What would be the overhead of using ImmutableArray Create(ReadOnlySpan items) for ImmutableArray and List creation instead of CollectionMarshal?

Rather than being able to allocate the backing array once and write all items directly into it, the compiler would need to first store all the items somewhere else and then call this method to allocate an array that the elements are all copied into. Best case, it's an extra copy. Worst case, it's double the allocation.

Thanks, so this is like string.Create((Span) => { .. }) for IA but without the lambda.

Could this pattern be generalized for any type without that allocation for use in collection literals?

string.Create(length, out Span<char>)
ImmutableArray<T>.Create(length, out Span<T>)

I'm not sure how that lambda in the string.Create API helps compared to out Span which saves an allocation.

Also notice that that is not considered an "advanced" API as OP mentions, since it lives in string.

@CyrusNajmabadi
Copy link
Member Author

The conclusion is that's sufficiently rare that in the cases where it does occur, the compiler just falls back to doing the extra allocation/copy (i.e. it first builds up an array/list and then copies the data in via the span).

I def want this point discussed with ldm though. My personal leaning is still preferring this case have no overhead.

@stephentoub
Copy link
Member

Could this pattern be generalized for any type without that allocation for use in collection literals?

The runtime needs string to actually be immutable after the runtime has declared that it's been created. Handing out the span would be like encouraging folks to use unsafe code to mutate immutable strings. ImmutableArray<T> isn't special like string is.

@Enderlook
Copy link

What would be the overhead of using ImmutableArray Create(ReadOnlySpan items) for ImmutableArray and List creation instead of CollectionMarshal?

Rather than being able to allocate the backing array once and write all items directly into it, the compiler would need to first store all the items somewhere else and then call this method to allocate an array that the elements are all copied into. Best case, it's an extra copy. Worst case, it's double the allocation.

Thanks, so this is like string.Create((Span) => { .. }) for IA but without the lambda.

Could this pattern be generalized for any type without that allocation for use in collection literals?

string.Create(length, out Span<char>)
ImmutableArray<T>.Create(length, out Span<T>)

I'm not sure how that lambda in the string.Create API helps compared to out Span which saves an allocation.

Also notice that that is not considered an "advanced" API as OP mentions, since it lives in string.

If the delegate supported a state parameter, it could be cached... but it still has the performance cost of a delegate call.

Also, unless the delegate returns ValueTask, it won't be very useful for async methods.

I'm not sure if a delegate would be the best approach for this problem.

@stephentoub
Copy link
Member

perhaps returning a span over FastStringAlloc should work?

For reasons I already mentioned, I don't forsee is exposing any APIs that encourage mutating String after it's already handed out. If we wanted to enable collection literals to produce strings, we'd do so via the ROS ctor.

#54412. I'm not sure if Mono and NativeAOT have this optimization though

The box is optimized away for NET 6+, but not on mono to my knowledge, and more relevant to my comment not on .NET Framework, which is still important, in particular from a pattern perspective (we don't want to use a pattern that'll actively be more expensive).

I def want this point discussed with ldm though. My personal leaning is still preferring this case have no overhead.

If this is important, we should enable the compiler to a) know how to construct collection-initializer collections with a capacity and for List we'll just point to that existing ctor, then populating it will happen with Add{Range} and b) know how to transfer ownership of an array, and for ImmutableArray we'll point it to that. Then we won't expose those two Create methods. This will be simpler, safer, and only incrementally less efficient in the list case for some scenarios. Or we augment rather than replace the kinds of patterns supported, allow for multiple attributes, and let the compiler decide which it wants to use on a case by case basis.

@CyrusNajmabadi
Copy link
Member Author

this is important,

Sounds good. Can you and Chuck discuss if this comes up while I'm away. I'd prefer to get the right long term solution in so we don't regret things we may end changing soon down the line. Thanks!

@bartonjs
Copy link
Member

bartonjs commented Jun 22, 2023

Video

We think that we can leave out the "Literal" token in CollectionLiteralBuilderAttribute, so just "CollectionBuilderAttribute".

There's still work to be done on the List creation pattern (exposing storage, ownership transferrence, etc).

The ImmutableCollections(ReadOnlySpan) are all good things to add, even without the language feature.

public static class ImmutableHashSet
{
+    public static System.Collections.Immutable.ImmutableHashSet<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableList
{
+    public static System.Collections.Immutable.ImmutableList<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableQueue
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableSortedSet
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

public static class ImmutableStack
{
+    public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}

@bartonjs bartonjs added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 22, 2023
@stephentoub stephentoub self-assigned this Jun 22, 2023
@CyrusNajmabadi
Copy link
Member Author

@bartonjs great! Would it make sense to break out the discussion (and the impl) between the "this is trivial, and we can just do it now" (like immutable collections) vs the more complex ownership apis which can happen a little later? It would be nice to have some of the APIs soon for the compiler to light up on.

@stephentoub
Copy link
Member

@CyrusNajmabadi, the easy ones have already merged:
#87945

What's left is "just" the attribute (which I didn't want to add until we agreed on exactly what patterns would be supported) and the one or two APIs for List / ImmutableArray depending on what we decide to do with them. Next step is you/me/etc. need to have some follow-up discussions.

@CyrusNajmabadi
Copy link
Member Author

Ok. I'll schedule something for next week!

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Jun 23, 2023
@RenderMichael
Copy link
Contributor

This feature would be greatly improved if ref structs like ReadOnlySpan<> were allowed in async methods as long as they don't cross an await.

@stephentoub
Copy link
Member

This feature would be greatly improved if ref structs like ReadOnlySpan<> were allowed in async methods as long as they don't cross an await.

While I would also like that restriction removed, how is it relevant to this issue? The C# compiler itself isn't subject to the same constraint for span-related code it itself manufactures.

@stephentoub stephentoub added the blocking Marks issues that we want to fast track in order to unblock other important work label Jul 6, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 6, 2023
@terrajobst
Copy link
Member

terrajobst commented Jul 13, 2023

Video

  • We should mark it as AllowMultiple = false for now
  • The proposal allows interfaces. Should we add attributes for the immutable interfaces? Are we OK with making a decision which concrete types are the canonical implementations for the attributed interface?
namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Class |
                    AttributeTargets.Interface |
                    AttributeTargets.Struct,
                    Inherited = false,
                    AllowMultiple = false)]
    public sealed class CollectionBuilderAttribute : Attribute
    {
        public CollectionBuilderAttribute(Type builderType, string methodName);
        public Type BuilderType { get; }
        public string MethodName { get; }
    }
}

namespace System.Collections.Immutable
{
    [CollectionBuilder(typeof(ImmutableArray), "Create")]
    public readonly partial struct ImmutableArray<T> { }

    [CollectionBuilder(typeof(ImmutableHashSet), "Create")]
    public sealed partial class ImmutableHashSet<T> { }

    [CollectionBuilder(typeof(ImmutableList), "Create")]
    public sealed partial class ImmutableList<T> { }

    [CollectionBuilder(typeof(ImmutableQueue), "Create")]
    public sealed partial class ImmutableQueue<T> { }

    [CollectionBuilder(typeof(ImmutableSortedSet), "Create")]
    public sealed partial class ImmutableSortedSet<T> { }

    [CollectionBuilder(typeof(ImmutableStack), "Create")]
    public sealed partial class ImmutableStack<T> { }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 13, 2023
@CyrusNajmabadi
Copy link
Member Author

Are we OK with making a decision which concrete types are the canonical implementations for the attributed interface?

Do you have cases where you even have multiple options? Or a case where you would feel bad guessing? For example, being able to write the following would be a totally mainline use case in C#:

public IImutableSet<Whatever> Values { get; } = [a, b, c];

Needing to be explicit about the concrete type here seems like it would just be an unnecessary burden most of the time. In the rare (IMO) case where the default you pick is wrong then the user can always sidestep by supplying the target themselves like so:

public IImutableSet<Whatever> Values { get; } = (ADifferentImmutableSetType<Whatever>)[a, b, c];

But that keeps the common case very reasonable, while making hte rare case the one that has to pay the price.

@bartonjs
Copy link
Member

bartonjs commented Jul 13, 2023

For the IImutable* interfaces it might be fine, when I asked my variant of the "are we OK with..." question I meant for collections interfaces in general.

Ignoring the fact that current layering says there's actually only one choice, ISet<T> would have several "reasonable" choices: SortedSet<T>, ImmutableSet<T>, FrozenSet<T>, and HashSet<T>. We'd have to pick one to be what we return from (e.g.) public ISet<T> CollectionMarshal::CreateSet(ReadOnlySpan<T> values), and then that's it, it is what it is forever (unless we were clever and made the actual returned instance have an opaque type so it wasn't castable, but the read-only vs writable details would still be locked in "forever").

To me, it seems fine to say

ISet<int> values = { 1, 2, 3, 4 };

doesn't compile. The code author can just type the variable to which of the four they want, or do a casting assignment. And now they get to decide "general purpose mutable", "fast readonly", "slow mutable but prints in a pretty way", and "whatever reason someone would pick immutable set over frozen set" (not dissing ImmutableSet, I just don't have domain knowledge here to pick one).

@CyrusNajmabadi
Copy link
Member Author

For the IImutable* interfaces it might be fine, when I asked my variant of the "are we OK with..." question I meant for collections interfaces in general.

Got it. To me the question is: "is there a reasonable default that is the right choice 95% of hte time". For things like ISet i'm not sure that exists. For IEnumerable/IList/IReadOnlyList/IDictionary (once we support dictionaries), then i think such a case would exist.

That said, even if you don't support things like IEnumerable/IList/IReadOnlyList, the Lang/Compiler will (and the interfaces on Dictionary as well), so it's not critical that you do anything there.

So really, it's just about IImmutableXXX. :)

@stephentoub
Copy link
Member

That said, even if you don't support things like IEnumerable/IList/IReadOnlyList, the Lang/Compiler will (and the interfaces on Dictionary as well), so it's not critical that you do anything there.
So really, it's just about IImmutableXXX. :)

Why would the compiler special-case IEnumerable/IList/IReadOnlyList but not IImmutableXXX? Why are the immutable interfaces important to handle but not important enough to special-case?

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 17, 2023
@dotnet dotnet locked as resolved and limited conversation to collaborators Aug 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections blocking Marks issues that we want to fast track in order to unblock other important work
Projects
None yet
Development

Successfully merging a pull request may close this issue.