Skip to content

Latest commit

 

History

History
161 lines (113 loc) · 7.53 KB

File metadata and controls

161 lines (113 loc) · 7.53 KB

Collection literals working group meeting for June 12, 2023

Agenda

  • API shape for BCL collections to opt-into collection literals support.

Discussion

LDM and Runtime members met and discussed the shapes we think we'll want in the runtime to allow a type to either opt into collection-literals support entirely, or to opt into an extremely efficient approach to constructing an instance of itself.

Note: for the following discussion, the code shown is a rough approximation of what we think is needed. Exact naming, order of arguments, return vs out discussion will happen later.

All the approaches require the following attribute to be defined:

[AttributeUsage(AttributeTargets.Interface | AttributeTargets.Class | AttributeTargets.Struct))
public sealed class CollectionBuilderAttribute(Type factoryType, string methodName) : System.Attribute
{
    public readonly Type FactoryType;
    public readonly string MethodName;
}

When present on a type T, this identifies a static factory method MethodName found on type FactoryType which can be used to construct an instance of T. Several factory shapes will be supported. Note: The type T and the FactoryType are allowed to be the same type. These shapes may also be expanded upon in the future if motivating scenarios exist for them.

Shape 1: ReadOnlySpan<T> as input

The first shape we support is the following:

[CollectionBuilder(typeof(ImmutableHashSet), "Create")]
public class ImmutableHashSet<T> { /*...*/ }

public static class ImmutableHashSet
{
    public static ImmutableHashSet<T> Create<T>(ReadOnlySpan<T> values);
}

Here, the factory method takes a ReadOnlySpan of values and returns an instance of the final collection. The compiler would then translate in the following fashion:

ImmutableHashSet<int> values = [1, 2, 3];

// translates to:
ReadOnlySpan<int> __values = [1, 2, 3];
ImmutableHashSet<int> values = ImmutableHashSet.Create<int>(__values);

This pattern will allow pointing at methods that already exist for this purpose, as well as allowing the runtime to add new methods to existing types to allow them to be created from spans. A similar shape that we want to support is practically the same as the above, except that the factory method can be a constructor of the type. Continuing the example above, that would be:

[CollectionBuilder(typeof(ImmutableHashSet<>), ".ctor")]
public class ImmutableHashSet<T>
{
    public ImmutableHashSet(ReadOnlySpan<T> values);
}

// code:
ImmutableHashSet<int> values = [1, 2, 3];

// translates to:
ReadOnlySpan<int> __values = [1, 2, 3];
ImmutableHashSet<int> values = new ImmutableHashSet<int>(__values);

Shape 1 consequences

The above approach is desired by the runtime so they can add overloads to their existing construction methods that allow the more efficient use of spans over more expensive collection types (arrays, IEnumerables, etc.). However, this raises a problem given that these types normally have existing construction methods that take an IEnumerable<T>. Consider, for example:

public class HashSet<T>
{
    // existing constructor
    public HashSet(IEnumerable<T> values);

    // newly added
    public HashSet(ReadOnlySpan<T> values);
}

// code:
new HashSet<int>(new int[] { 1, 2, 3 });

With the rules of the language today, the above is ambiguous. This is unfortunate as a prime reason to create Span overloads of methods is to allow for efficient options for callers that do not need to use the heap. We believe this should be fixed, and the WG will present a proposal to the LDM to add overload resolution rules that consider Spans better than items in the inheritance chain for arrays.

Shape 2: Writable Span<T> or T[] storage.

While the above pattern is good for collection types that own their own storage arrangement, it still involves producing the sequence one end, and then processing it to produce the final result within the factory method. This is unnecessary overhead for a couple of very important, widely used, collections in the .NET BCL. Specifically, List<T> and ImmutableArray<T>. Both of these types are thin wrappers around a backing array, and it is highly desirable to just be able to write directly into that array without any overhead at all. To that end, the above attribute can be used to point to a factory method with the following shape:

[CollectionBuilder(typeof(CollectionsMarshall), "Create")]
public struct ImmutableArray<T> { /*...*/ }

// In System.Runtime.CompilerServices
public static class CollectionsMarshall
{
    public static Span<T> Create<T>(int capacity, out ImmutableArray<T> result); // or:
    public static void Create<T>(int capacity, out ImmutableArray<T> result, out Span<T> storage);
}

// code:
ImmutableArray<string> values = ["a", "b", "c"];

// translation:
Span<string> __storage = CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";

Depending on how the compiler encodes constant information, this could also be a case where the data is stored in the read-only section of the dll and memcpy'ed to the destination.

Note: the runtime already has methods that expose this data for high performance scenarios. So this does not introduce any new safety concerns. The methods are intentionally kept out of the way in the CompilerServices namespace to help indicate they are not for normal use, and should be treated very carefully.

Shape 2 consequences

The above pattern will not work as efficiently in the case where the Span cannot be kept on the stack while evaluating the elements of the collection literal (for example, if there are await calls in the literal. An alternative/complimentary approach that could be offered would be to have the following:

public static class CollectionsMarshall
{
    public static void Create<T>(int capacity, out ImmutableArray<T> result, out Span<T> storage); // or:
    public static void Create<T>(int capacity, out ImmutableArray<T> result, out T[] storage);
}

// code:
ImmutableArray<string> values = [await a, await b, await c];

// translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out string[] __storage);
__storage[0] = await a;
__storage[1] = await b;
__storage[2] = await c;

During discussion, there was mixed feeling on this. Sentiment indicated that it somehow felt worse to be handing out the array directly, where any consumer could then hold onto it forever. While exposing the Span still gave access to all the data, it felt slightly more palatable as the Span was limited to the stack.

However, not having the array option means the compiler would have to translate like so:

// code:
ImmutableArray<string> values = [await a, await b, await c];

// translation:
string[] __temp = new string[] { await a, await b, await c };

CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<string> __storage);
__temp.CopyTo(__storage);

This array seems wasteful, and could occur reasonably often in normal coding patterns.

Builders

WG discussed if we should support a more flexible 'builder' pattern. Both WG and Runtime felt this was unnecessary. As per earlier decisions, it can also come in the future if there is a compelling case where they are needed.

Next steps

  1. LDM and Runtime to continue to drive the final shape for these APIs, and the work to get them into the runtime.
  2. WG to go back to LDM to request overload resolution tweaks for ReadOnlySpan<T> and IEnumerable<T>.