-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Proposal: Remove covariance on array
Prev discussion: dotnet/csharplang#5153
Basically covariance on array allows us to upcast an array. But it also leads to the following issue:
Base[] array = new Derived[10];
array[0] = new Base(); // exception here, type system is broken
class Base {}
class Derived : Base {}It's breaking the type system: a type error occurred in runtime regardless the compile-time type check.
And also, an interesting case:
Random[] a = new Random[1];
object[] c = a;
MemoryMarshal.GetArrayDataReference(c) = "a string";
var l = a[0].Next(); // guess what method is actually being called
Console.WriteLine(l);Covariance array is not only breaking the type system, but also has huge performance impact, let's see the benchmark (provided by @huoyaoyuan):
class Base
{
}
class Derived : Base
{
}
[SimpleJob(RuntimeMoniker.Net60)]
public class Program
{
static void Main(string[] args)
{
BenchmarkRunner.Run<Program>();
}
private readonly Base[] array = new Base[256];
private Derived derived = new();
private Base @base = new();
private readonly int[] intArray = new int[256];
[Benchmark]
public object Array()
{
for (int i = 0; i < array.Length; i++)
array[i] = derived;
return array;
}
[Benchmark]
public object ArrayBase()
{
for (int i = 0; i < array.Length; i++)
array[i] = @base;
return array;
}
[Benchmark]
public object ArrayInt()
{
for (int i = 0; i < intArray.Length; i++)
intArray[i] = 19260817;
return intArray;
}
[Benchmark]
public object SpanIndex()
{
Span<Base> span = array;
for (int i = 0; i < span.Length; i++)
span[i] = derived;
return array;
}
[Benchmark]
public object SpanRef()
{
Span<Base> span = array;
foreach (ref var x in span)
x = derived;
return array;
}
}Benchmark result:
| Method | Mean | Error | StdDev |
|---|---|---|---|
| Array | 1,490.6 ns | 6.19 ns | 5.17 ns |
| ArrayBase | 600.3 ns | 1.13 ns | 1.06 ns |
| ArrayInt | 113.1 ns | 0.72 ns | 0.67 ns |
| SpanIndex | 435.9 ns | 1.28 ns | 1.00 ns |
| SpanRef | 435.1 ns | 0.43 ns | 0.40 ns |
The overhead of array covariance is significant.
As you can see, array covariance makes code completely unsafe and can lead to undefined behavior, and it also makes writes to array much slower. Its benefit-harm ratio is minus:
Benefits:
- You can upcast an array
Harm:
- It can easily bypass the compile-time type check without a line of unsafe code
- It breaks the type system and leads to runtime exception of type mismatch despite successful compilation
- It introduces unnecessary runtime check against type of elements in an array which introduces more runtime overhead
- It makes JIT hard to optimize against arrays
- It can lead to undefined behavior easily and can bypass the runtime check
To examine the usage of covariance array, I wrote an analyzer to help detect such usage in assembiles. I ran it against .NET + ASP.NET Core assembiles, and it turned out that the usage is rare.
The analyzer: https://github.com/hez2010/ArrayUpcastAnalyzer
Here is the analysis report: report.txt
Note that there may be some false positives, such as object[] array = (object[])obj; where obj is object.
Most usages can be avoided via casting the type of element in array initializer (i.e. change object[] x = new T[] { new T() ... } to object[] x = new object[] { (object)new T()... }), which can be done by Roslyn automatically, and after this change, the left usages are nearly none. So it's totally fine to remove covariance on array, since it has never been widely adopted.
Array is a fundamental thing, and its performance affects vast range of applications. Now that the usage of covariance on array is actually rare, let's remove covariance on array to deliver type-safe and high-performance array for everyone
Approaches
- Make arrays invariant by default, and introduce a new array type which is covariant
- Keep arrays covariant by default, and introduce a new array type which is invariant
- Completely disallow covariance on arrays
I will prefer the first approach though it will introduce breaking changes, but we can have a guidance to help users migrate their code (basically find'n'replace). The second one seems doable but it requires users to adopt it manually, which makes no difference with today's Span<>: ton of code will still use the slow covariant arrays. While the third one is more aggressive, but it may make existing code hard to migrate.