Casting to a variant generic interface is much slower than to non-variant #4090
Comments
Some thoughts. If I understand everything correctly, there are two possible optimizations in the casting code:
if (GetTypeDefRid() != pTargetMT->GetTypeDefRid() || GetModule() != pTargetMT->GetModule()) {...}
I'm not sure that this will help noticeably, but I believe the .NET team can dig out the true cause and fix the problem if that is possible.
Wow. That's a big difference.
This issue was fixed for CoreRT by caching the casting results: dotnet/corert@ede4733. We may consider implementing a similar cache for CoreCLR as well.
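To make the caching idea concrete, here is a minimal user-level sketch of memoizing cast results per runtime type. It is not the CoreRT change referenced above and not how CoreCLR implements casting; `CastCache<T>` and `IsCastable` are hypothetical names introduced only for illustration, and the sketch makes no claim about being faster than the runtime's own cast helpers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper (an assumption for illustration, not a runtime API):
// remembers, per runtime type, whether instances can be cast to T.
public static class CastCache<T> where T : class
{
    // One dictionary per target type T; the key is the object's runtime type.
    private static readonly ConcurrentDictionary<Type, bool> s_results =
        new ConcurrentDictionary<Type, bool>();

    public static bool IsCastable(object obj)
    {
        if (obj == null)
        {
            return false;
        }

        // Type.IsAssignableFrom honors generic variance, so the expensive
        // variance-aware check runs once per (source type, T) pair and the
        // stored answer is reused on later calls.
        return s_results.GetOrAdd(obj.GetType(), t => typeof(T).IsAssignableFrom(t));
    }
}
```

For example, `CastCache<IEnumerable<object>>.IsCastable(new List<string>())` computes the covariant check on the first call and answers from the dictionary afterwards. A cache inside the runtime would sit in the cast helpers themselves, which is where the benefit would actually come from.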
I stumbled into this myself recently and did some investigation. Here are the details I provided on dotnet/coreclr#11094 that relate to this issue.
Results Summary
Using BenchmarkDotNet I've found that covariant and contravariant casting is approximately:
The code to reproduce these experiments and my investigation is found in this blog post.
Using BenchmarkDotNet I've found that covariant and contravariant casting + method call is approximately:
The code to reproduce these experiments and my investigation is found in another blog post.
Note: these results exclude the first call to dynamic, which is ~1200x slower than the first call using covariant and contravariant casting. Also, see the comments on my blog posts, where readers have reproduced these results and conducted their own insightful experiments that led to my findings.
Test Environment and Raw .NET Core Results
Please note that my blog posts only include the results from .NET Framework on 64bit RyuJIT. The results above are from .NET Framework on 64bit RyuJIT and .NET Core on 64bit RyuJIT. I have also run tests on .NET Framework on 32bit LegacyJIT, and while there are absolute performance differences, the results are on the same order of magnitude as those presented above.
Here are the test environments as reported by BenchmarkDotNet for the results presented on my blog posts:
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128907 Hz, Resolution=319.6004 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128910 Hz, Resolution=319.6001 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0
Here are the results for .NET Core that are not on my blog post and the .NET Core test environment (the target framework is set to .NETCoreApp 1.1):
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows 10.0.14393
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128910 Hz, Resolution=319.6001 ns, Timer=TSC
dotnet cli version=1.0.3
[Host] : .NET Core 4.6.25009.03, 64bit RyuJIT
DefaultJob : .NET Core 4.6.25009.03, 64bit RyuJIT
Casting Results:
==================================================
Method | Mean | StdDev | Scaled | Scaled-StdDev |
------------------- |----------- |---------- |------- |-------------- |
ObjectCast | 0.4848 ns | 0.0008 ns | 0.42 | 0.00 |
ImplementationCast | 1.1616 ns | 0.0006 ns | 1.00 | 0.00 |
InterfaceCast | 2.8750 ns | 0.0049 ns | 2.48 | 0.00 |
GenericCast | 4.0288 ns | 0.0090 ns | 3.47 | 0.01 |
CovariantCast | 70.5734 ns | 0.0317 ns | 60.76 | 0.04 |
ContravariantCast | 71.1789 ns | 0.0116 ns | 61.28 | 0.03 |
Covariant Casting + Method Call Results:
==================================================
| Method | Mean | StdDev | Median | Scaled | Scaled-StdDev |
|--------- |----------- |---------- |----------- |------- |-------------- |
| Direct | 18.4310 ns | 0.1172 ns | 18.3872 ns | 1.00 | 0.00 |
| Implicit | 18.3568 ns | 0.1045 ns | 18.2858 ns | 1.00 | 0.01 |
| Explicit | 84.4047 ns | 0.2345 ns | 84.4041 ns | 4.58 | 0.03 |
| Dynamic | 30.6939 ns | 0.0023 ns | 30.6943 ns | 1.67 | 0.01 |
Contravariant Casting + Method Call Results:
==================================================
| Method | Mean | StdDev | Scaled | Scaled-StdDev |
|--------- |----------- |---------- |------- |-------------- |
| Direct | 17.7818 ns | 0.0041 ns | 1.00 | 0.00 |
| Implicit | 17.7725 ns | 0.0047 ns | 1.00 | 0.00 |
| Explicit | 83.4591 ns | 0.0097 ns | 4.69 | 0.00 |
| Dynamic | 28.2057 ns | 0.0670 ns | 1.59 | 0.00 |
Fixed in dotnet/coreclr#23548.
In the CoreFx issue (https://github.com/dotnet/corefx/issues/1182) there was an idea to explicitly check whether the IReadOnlyCollection<T> interface is supported, but it was rejected because casting to a variant generic interface is slow. According to my tests, it is ~20 times slower than casting to a non-variant interface.
Source code of the test: CastToInterfaceTest.cs
Test results:
| Object type | Cast target |
|--- |--- |
| List<int> | ICollection<int> |
| List<double> | ICollection<int> |
| Thread | IReadOnlyCollection<int> |
| List<int> | IReadOnlyCollection<int> |
| List<double> | IReadOnlyCollection<int> |
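The comparison described here can be sketched roughly as follows. This is not the original CastToInterfaceTest.cs, whose contents are not shown in this thread; `CastTimingSketch` and its method names are hypothetical, and a plain `Stopwatch` loop is only a rough approximation (the JIT may optimize the loop, and BenchmarkDotNet, as used elsewhere in this thread, gives more trustworthy numbers).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical timing harness (an assumption for illustration only).
public static class CastTimingSketch
{
    // Repeatedly type-checks obj against the non-variant ICollection<int>.
    // Returns how many checks succeeded; elapsed receives the loop time.
    public static int CountNonVariantCasts(object obj, int iterations, out TimeSpan elapsed)
    {
        int hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (obj is ICollection<int>)
            {
                hits++;
            }
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return hits;
    }

    // Same loop against IReadOnlyCollection<int>, which is declared as
    // IReadOnlyCollection<out T> and therefore takes the variant cast path.
    public static int CountVariantCasts(object obj, int iterations, out TimeSpan elapsed)
    {
        int hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (obj is IReadOnlyCollection<int>)
            {
                hits++;
            }
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return hits;
    }
}
```

Calling both methods with the same object (for example a `List<int>`) and the same iteration count, after a warm-up call to each, gives two elapsed times whose ratio approximates the variant versus non-variant cost. Returning the hit count keeps the type check from being discarded as dead code.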
It would be great to see some improvements here.