Skip to content

Proposal: Remove covariance on array #66376

@hez2010

Description

@hez2010

Proposal: Remove covariance on array

Prev discussion: dotnet/csharplang#5153

Basically covariance on array allows us to upcast an array. But it also leads to the following issue:

Base[] array = new Derived[10];
array[0] = new Base(); // exception here, type system is broken

class Base {}
class Derived : Base {}

It's breaking the type system: a type error occurred in runtime regardless the compile-time type check.

And also, an interesting case:

Random[] a = new Random[1];
object[] c = a;
MemoryMarshal.GetArrayDataReference(c) = "a string";
var l = a[0].Next(); // guess what method is actually being called
Console.WriteLine(l);

Covariance array is not only breaking the type system, but also has huge performance impact, let's see the benchmark (provided by @huoyaoyuan):

    class Base
    {
    }

    class Derived : Base
    {
    }

    [SimpleJob(RuntimeMoniker.Net60)]
    public class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<Program>();
        }

        private readonly Base[] array = new Base[256];
        private Derived derived = new();
        private Base @base = new();
        private readonly int[] intArray = new int[256];

        [Benchmark]
        public object Array()
        {
            for (int i = 0; i < array.Length; i++)
                array[i] = derived;
            return array;
        }

        [Benchmark]
        public object ArrayBase()
        {
            for (int i = 0; i < array.Length; i++)
                array[i] = @base;
            return array;
        }
        
        [Benchmark]
        public object ArrayInt()
        {
            for (int i = 0; i < intArray.Length; i++)
                intArray[i] = 19260817;
            return intArray;
        }

        [Benchmark]
        public object SpanIndex()
        {
            Span<Base> span = array;
            for (int i = 0; i < span.Length; i++)
                span[i] = derived;
            return array;
        }

        [Benchmark]
        public object SpanRef()
        {
            Span<Base> span = array;
            foreach (ref var x in span)
                x = derived;
            return array;
        }
    }

Benchmark result:

Method Mean Error StdDev
Array 1,490.6 ns 6.19 ns 5.17 ns
ArrayBase 600.3 ns 1.13 ns 1.06 ns
ArrayInt 113.1 ns 0.72 ns 0.67 ns
SpanIndex 435.9 ns 1.28 ns 1.00 ns
SpanRef 435.1 ns 0.43 ns 0.40 ns

The overhead of array covariance is significant.

As you can see, array covariance makes code completely unsafe and can lead to undefined behavior, and it also makes writes to array much slower. Its benefit-harm ratio is minus:

Benefits:

  • You can upcast an array

Harm:

  • It can easily bypass the compile-time type check without a line of unsafe code
  • It breaks the type system and leads to runtime exception of type mismatch despite successful compilation
  • It introduces unnecessary runtime check against type of elements in an array which introduces more runtime overhead
  • It makes JIT hard to optimize against arrays
  • It can lead to undefined behavior easily and can bypass the runtime check

To examine the usage of covariance array, I wrote an analyzer to help detect such usage in assembiles. I ran it against .NET + ASP.NET Core assembiles, and it turned out that the usage is rare.

The analyzer: https://github.com/hez2010/ArrayUpcastAnalyzer
Here is the analysis report: report.txt

Note that there may be some false positives, such as object[] array = (object[])obj; where obj is object.

Most usages can be avoided via casting the type of element in array initializer (i.e. change object[] x = new T[] { new T() ... } to object[] x = new object[] { (object)new T()... }), which can be done by Roslyn automatically, and after this change, the left usages are nearly none. So it's totally fine to remove covariance on array, since it has never been widely adopted.

Array is a fundamental thing, and its performance affects vast range of applications. Now that the usage of covariance on array is actually rare, let's remove covariance on array to deliver type-safe and high-performance array for everyone

Approaches

  • Make arrays invariant by default, and introduce a new array type which is covariant
  • Keep arrays covariant by default, and introduce a new array type which is invariant
  • Completely disallow covariance on arrays

I will prefer the first approach though it will introduce breaking changes, but we can have a guidance to help users migrate their code (basically find'n'replace). The second one seems doable but it requires users to adopt it manually, which makes no difference with today's Span<>: ton of code will still use the slow covariant arrays. While the third one is more aggressive, but it may make existing code hard to migrate.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions