Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Microsoft.VisualBasic.Strings.InStr with CompareMethod.Text is 10x slower in .NET 7 than in .NET Framework 4.8 #88416

Closed
HeinziAT opened this issue Jul 5, 2023 · 10 comments
Labels

Comments

@HeinziAT
Copy link

HeinziAT commented Jul 5, 2023

Description

Microsoft.VisualBasic.Strings.InStr with CompareMethod.Text is 10x slower in .NET 7 than in .NET Framework 4.8.

Full repro:

using Microsoft.VisualBasic;
using System.Diagnostics;
using System;

// Warmup
Strings.InStr("a", "b", CompareMethod.Text);

var sw = new Stopwatch();
sw.Start();
for (int i = 1; i <= 10_000_000; i++)
{
    Strings.InStr("a", "b", CompareMethod.Text);
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

yields on my system:

  • ~500ms with .NET Framework 4.8,
  • ~400ms with .NET Core 3.1
  • ~22000ms [sic!] with .NET 5
  • ~5000ms with .NET 6 and 7.

csproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;netcoreapp3.1;net5.0;net6.0;net7.0</TargetFrameworks>
    <LangVersion>latest</LangVersion>
  </PropertyGroup>

  <ItemGroup Condition="'$(TargetFramework)' == 'net48'">
    <Reference Include="Microsoft.VisualBasic" />
  </ItemGroup>
</Project>

Configuration

  • Can be reproduced with x86 and x64 builds.

Regression?

  • This is a regression from .NET Framework 4.8/.NET Core 3.1. I'm aware that newly-written (performance-critical) code will probably not use InStr, but this could be a problem for legacy code ported from .NET Framework 4.8 (which is how we discovered this issue).
@HeinziAT HeinziAT added the tenet-performance Performance related issue label Jul 5, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 5, 2023
@ghost
Copy link

ghost commented Jul 5, 2023

Tagging subscribers to this area: @cston
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Microsoft.VisualBasic.Strings.InStr with CompareMethod.Text is 10x slower in .NET 7 than in .NET Framework 4.8.

Full repro:

using Microsoft.VisualBasic;
using System.Diagnostics;
using System;

// Warmup
Strings.InStr("a", "b", CompareMethod.Text);

var sw = new Stopwatch();
sw.Start();
for (int i = 1; i <= 10_000_000; i++)
{
    Strings.InStr("a", "b", CompareMethod.Text);
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

yields on my system:

  • ~500ms with .NET Framework 4.8,
  • ~400ms with .NET Core 3.1
  • ~22000ms [sic!] with .NET 5
  • ~5000ms with .NET 6 and 7.

csproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;netcoreapp3.1;net5.0;net6.0;net7.0</TargetFrameworks>
    <LangVersion>latest</LangVersion>
  </PropertyGroup>

  <ItemGroup Condition="'$(TargetFramework)' == 'net48'">
    <Reference Include="Microsoft.VisualBasic" />
  </ItemGroup>
</Project>

Configuration

  • Can be reproduced with x86 and x64 builds.

Regression?

  • This is a regression from .NET Framework 4.8/.NET Core 3.1.
Author: HeinziAT
Assignees: -
Labels:

area-Microsoft.VisualBasic, tenet-performance

Milestone: -

@stephentoub
Copy link
Member

I put your test into benchmarkdotnet, and I see the opposite, with .NET 7's throughput being 3x as fast as .NET Framework 4.8:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.VisualBasic;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

public class Tests
{
    [Benchmark]
    public int Compare() => Strings.InStr("a", "b", CompareMethod.Text);
}
Method Runtime Mean Error StdDev Ratio
Compare .NET 7.0 26.18 ns 0.207 ns 0.194 ns 0.32
Compare .NET Framework 4.8 82.29 ns 0.994 ns 0.930 ns 1.00

@GerardSmit
Copy link

Even without BenchmarkDotNet, .NET 7.0 is still faster on my machine (AMD Ryzen 9 5950X).

$ dotnet run -c Release -f net48
632

$ dotnet run -c Release -f net7.0
238

When running the project in Rider and Debug mode (with debugger attached):

.NET Framework 4.8: 898
.NET 7.0: 247

And with BenchmarkDotNet (I've added .NET 8.0 as well):

Method Runtime Mean Error StdDev Ratio
Compare .NET 5.0 21.78 ns 0.439 ns 0.488 ns 0.36
Compare .NET 6.0 22.22 ns 0.055 ns 0.049 ns 0.37
Compare .NET 7.0 21.71 ns 0.053 ns 0.047 ns 0.36
Compare .NET 8.0 11.98 ns 0.250 ns 0.234 ns 0.20
Compare .NET Framework 4.8 60.69 ns 0.280 ns 0.248 ns 1.00

@stephentoub stephentoub added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 5, 2023
@ghost
Copy link

ghost commented Jul 5, 2023

This issue has been marked needs-author-action and may be missing some important information.

@HeinziAT
Copy link
Author

HeinziAT commented Jul 5, 2023

That's interesting. Using Stephen's BenchmarkDotNet code, I get the following results:


.NET 7:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3086/22H2/2022Update)
12th Gen Intel Core i7-12700, 1 CPU, 20 logical and 12 physical cores
.NET SDK=7.0.304
  [Host]     : .NET 7.0.7 (7.0.723.27404), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.7 (7.0.723.27404), X64 RyuJIT AVX2
Method Mean Error StdDev
Compare 505.7 ns 1.78 ns 1.58 ns

.NET Framework 4.8:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3086/22H2/2022Update)
12th Gen Intel Core i7-12700, 1 CPU, 20 logical and 12 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4644.0), X64 RyuJIT VectorSize=256
  DefaultJob : .NET Framework 4.8 (4.8.4644.0), X64 RyuJIT VectorSize=256
Method Mean Error StdDev
Compare 50.50 ns 0.138 ns 0.122 ns

.NET 5: [Note the µs instead of ns]

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3086/22H2/2022Update)
12th Gen Intel Core i7-12700, 1 CPU, 20 logical and 12 physical cores
.NET SDK=7.0.304
  [Host]     : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2
  DefaultJob : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2
Method Mean Error StdDev
Compare 2.233 us 0.0053 us 0.0044 us

Is there any additional information I can provide to find out what's different about my system?

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 5, 2023
@stephentoub
Copy link
Member

stephentoub commented Jul 5, 2023

You're running in release, right?

Or maybe the particular code paths used by this are causing an issue in the version of ICU used by the version of Windows 10 you're on? What do you get if you set the DOTNET_SYSTEM_GLOBALIZATION_USENLS environment variable to true?

What's your default culture, i.e. if you output CultureInfo.CurrentCulture in a console app, what does it show?

@HeinziAT
Copy link
Author

HeinziAT commented Jul 6, 2023

You're running in release, right?

Yes, of course.

Or maybe the particular code paths used by this are causing an issue in the version of ICU used by the version of Windows 10 you're on? What do you get if you set the DOTNET_SYSTEM_GLOBALIZATION_USENLS environment variable to true?

What's your default culture, i.e. if you output CultureInfo.CurrentCulture in a console app, what does it show?

Bingo! Both NLS/ICU and the current culture are relevant. My default culture is de-AT.

dotnet run -c Release -f net7.0 --filter Tests*

|  Method | Culture |      Mean |    Error |   StdDev |
|-------- |-------- |----------:|---------:|---------:|
| Compare |         |  14.45 ns | 0.080 ns | 0.071 ns |
| Compare |   de-AT | 501.97 ns | 6.946 ns | 6.497 ns |
| Compare |   de-DE | 496.74 ns | 2.750 ns | 2.573 ns |
| Compare |   en-US |  14.44 ns | 0.054 ns | 0.050 ns |

powershell -Command { $env:DOTNET_SYSTEM_GLOBALIZATION_USENLS = 'true'; dotnet run -c Release -f net7.0 --filter Tests* }

|  Method | Culture |     Mean |    Error |   StdDev |
|-------- |-------- |---------:|---------:|---------:|
| Compare |         | 36.04 ns | 0.207 ns | 0.194 ns |
| Compare |   de-AT | 35.40 ns | 0.065 ns | 0.061 ns |
| Compare |   de-DE | 35.78 ns | 0.050 ns | 0.042 ns |
| Compare |   en-US | 35.52 ns | 0.107 ns | 0.089 ns |

Digging deeper into Strings.InStr, it appears that CultureInfo.CompareInfo.IndexOf is slow(er) with ICU (a) for non-English cultures when doing (b) a culture-aware search (i.e., without CompareOptions.Ordinal). Here's a test case without Microsoft.VisualBasic:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

public class Tests
{
    public IEnumerable<CultureInfo> CultureValues => new[]
    {
        CultureInfo.InvariantCulture,
        CultureInfo.GetCultureInfo("en-US"),
        CultureInfo.GetCultureInfo("en-GB"),
        CultureInfo.GetCultureInfo("de-DE"),
        CultureInfo.GetCultureInfo("fr-FR"),
        CultureInfo.GetCultureInfo("it-IT"),
    };

    [Benchmark]
    [ArgumentsSource(nameof(CultureValues))]
    public int CultureSensitive(CultureInfo culture) => culture.CompareInfo.IndexOf("a", "b");

    [Benchmark]
    [ArgumentsSource(nameof(CultureValues))]
    public int Ordinal(CultureInfo culture) => culture.CompareInfo.IndexOf("a", "b", CompareOptions.Ordinal);
}
|           Method | culture |       Mean |     Error |    StdDev |
|----------------- |-------- |-----------:|----------:|----------:|
| CultureSensitive |         |  11.388 ns | 0.0379 ns | 0.0316 ns |
|          Ordinal |         |   4.406 ns | 0.1019 ns | 0.0953 ns |
| CultureSensitive |   de-DE | 486.738 ns | 1.7168 ns | 1.5219 ns |
|          Ordinal |   de-DE |   4.347 ns | 0.0666 ns | 0.0590 ns |
| CultureSensitive |   en-GB |  12.522 ns | 0.0767 ns | 0.0718 ns |
|          Ordinal |   en-GB |   4.327 ns | 0.0384 ns | 0.0359 ns |
| CultureSensitive |   en-US |  12.536 ns | 0.0924 ns | 0.0864 ns |
|          Ordinal |   en-US |   4.313 ns | 0.0214 ns | 0.0190 ns |
| CultureSensitive |   fr-FR | 489.453 ns | 2.3341 ns | 2.0691 ns |
|          Ordinal |   fr-FR |   4.357 ns | 0.0331 ns | 0.0310 ns |
| CultureSensitive |   it-IT | 492.162 ns | 4.9068 ns | 4.0974 ns |
|          Ordinal |   it-IT |   4.376 ns | 0.0406 ns | 0.0317 ns |

I did a bit more reading on the NLS/ICU switch, and, apparently, worse performance for culture-aware string operations is a known issue.

Thanks for all your help in getting to the root of this (and for introducing me to DotNetBenchmark). Since this is already known, I will close this issue.

@HeinziAT HeinziAT closed this as completed Jul 6, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2023
@stephentoub
Copy link
Member

stephentoub commented Jul 6, 2023

@tarekgh, is this expected? That's a huge gap based on which culture is used for the comparison (vs each other, not vs ordinal)

@GerardSmit
Copy link

@stephentoub there is an issue about the performance loss of switching to ICU: #40942

I also did see that Windows 10 uses an old version of ICU (#40942 (comment)), so I was curious if my installation also has the same results.

My installation of Windows 11 22H2 uses 68.2.0.10.
When running the benchmark @HeinziAT provided I'm getting the following results (.NET 7.0.5):

Method culture Mean Error StdDev
CultureSensitive 14.356 ns 0.1878 ns 0.1568 ns
Ordinal 6.042 ns 0.0388 ns 0.0324 ns
CultureSensitive de-DE 584.932 ns 2.5675 ns 2.2760 ns
Ordinal de-DE 6.066 ns 0.0835 ns 0.0781 ns
CultureSensitive en-GB 14.324 ns 0.0573 ns 0.0508 ns
Ordinal en-GB 6.063 ns 0.0743 ns 0.0695 ns
CultureSensitive en-US 14.648 ns 0.3151 ns 0.3628 ns
Ordinal en-US 6.024 ns 0.0322 ns 0.0301 ns
CultureSensitive fr-FR 587.005 ns 2.2767 ns 2.1296 ns
Ordinal fr-FR 6.057 ns 0.0545 ns 0.0455 ns
CultureSensitive it-IT 583.164 ns 2.3537 ns 2.2016 ns
Ordinal it-IT 5.999 ns 0.0102 ns 0.0080 ns

So having a newer version of ICU sadly doesn't help.

@tarekgh
Copy link
Member

tarekgh commented Jul 6, 2023

is this expected? That's a huge gap based on which culture is used for the comparison (vs each other, not vs ordinal)

If you remember, we have added manual optimization code for en cultures when handling ASCII characters. This is because en cultures doesn't have any special behavior with these characters. So, the difference in the perf here is mostly because of this optimization.
We know ICU is slow in some of string search operations and we even worked with the owners to optimize it more. So far that is the best we have gotten. We are keeping the other perf issue open to look more if we can find a better solution or optimize more for other non-English cultures.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants