Investigate GetUnicodeCategory performance for non-ASCII characters#124320
Investigate GetUnicodeCategory performance for non-ASCII characters#124320
Conversation
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
|
@copilot Validate the performance improvement using performance-benchmark skill |
Running benchmark to validate the performance improvement: @EgorBot -x64 -arm using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);
[MemoryDiagnoser]
public class Bench
{
private char _hebrewChar;
private char _arabicChar;
private char _chineseChar;
private char _cyrillicChar;
[GlobalSetup]
public void Setup()
{
_hebrewChar = '\x05D0'; // Hebrew letter Aleph (from original issue)
_arabicChar = '\x0623'; // Arabic letter Alef with Hamza above
_chineseChar = '\u4E00'; // CJK Unified Ideograph
_cyrillicChar = '\u0410'; // Cyrillic Capital Letter A
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Hebrew()
{
return char.GetUnicodeCategory(_hebrewChar);
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Arabic()
{
return char.GetUnicodeCategory(_arabicChar);
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Chinese()
{
return char.GetUnicodeCategory(_chineseChar);
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Cyrillic()
{
return char.GetUnicodeCategory(_cyrillicChar);
}
}This will compare the PR against the base branch to confirm the ~1-3ns improvement for non-ASCII characters. |
|
@copilot Analyze why the fix has not produced the expected improvements. |
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
I analyzed why the initial fix didn't produce the expected improvement. The issue was that Root cause: While the fix correctly avoids the Solution: Added This ensures that:
The change is in commit ca339b9. Please re-run the benchmark to verify the improvement. |
|
@copilot Run the above microbenchmark #124320 (comment) using performance-benchmark skill |
Running the benchmark with the updated code (including AggressiveInlining): @EgorBot -x64 -arm using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);
[MemoryDiagnoser]
public class Bench
{
private char _hebrewChar;
private char _arabicChar;
private char _chineseChar;
private char _cyrillicChar;
[GlobalSetup]
public void Setup()
{
_hebrewChar = '\x05D0'; // Hebrew letter Aleph (from original issue)
_arabicChar = '\x0623'; // Arabic letter Alef with Hamza above
_chineseChar = '\u4E00'; // CJK Unified Ideograph
_cyrillicChar = '\u0410'; // Cyrillic Capital Letter A
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Hebrew()
{
return char.GetUnicodeCategory(_hebrewChar);
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Arabic()
{
return char.GetUnicodeCategory(_arabicChar);
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Chinese()
{
return char.GetUnicodeCategory(_chineseChar);
}
[Benchmark]
public UnicodeCategory GetUnicodeCategory_Cyrillic()
{
return char.GetUnicodeCategory(_cyrillicChar);
}
}This will verify the ~1-3ns improvement now that both fixes are in place (char overload + AggressiveInlining). |
|
Also, @EgorBo, it looks like the bot is including the wrong link in the comment it posts. |
The EgorBot benchmark results show the changes caused a 1.88x regression on x64 (AMD EPYC), not the expected improvement. Main: ~0.29ns, PR: ~0.55ns per call for non-ASCII chars. On ARM64 results were neutral (ratio ~1.00). Reverting both changes: - Char.cs: restore call to CharUnicodeInfo.GetUnicodeCategory((int)c) - CharUnicodeInfo.cs: remove AggressiveInlining attribute Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
I reviewed the benchmark results from both EgorBot runs (#612, #613). Both runs show the changes caused a regression, not an improvement: x64 (AMD EPYC 9V74): PR is 1.88x slower across all characters:
ARM64 (Neoverse-N2): Neutral (ratio ~0.98-1.02). Adding The JIT is already optimizing the I've reverted all changes in 8a07341. The original regression from the issue appears to have already been resolved by subsequent JIT/runtime optimizations. |
Description
Investigated the reported 1-3ns regression in
char.GetUnicodeCategory(char)for non-ASCII characters since .NET 5.0. The original issue suggested callingCharUnicodeInfo.GetUnicodeCategory(char)instead ofCharUnicodeInfo.GetUnicodeCategory((int)c)to avoid a redundantIsValidCodePointcheck.Changes
Two approaches were attempted and benchmarked via EgorBot:
Char.GetUnicodeCategory(char c)to callCharUnicodeInfo.GetUnicodeCategory(c)(char overload) instead ofCharUnicodeInfo.GetUnicodeCategory((int)c)(int overload)[MethodImpl(MethodImplOptions.AggressiveInlining)]toCharUnicodeInfo.GetUnicodeCategory(char)Both changes were reverted after benchmark results showed they caused a regression rather than an improvement.
Benchmark Results
EgorBot results (run 1, run 2):
x64 (AMD EPYC 9V74): PR was 1.88x slower across all non-ASCII characters:
ARM64 (Neoverse-N2): Neutral (ratio ~0.98-1.02).
Adding
AggressiveInliningmade no difference — both runs produced identical x64 results.Analysis
The JIT already optimizes the
(int)c→CharUnicodeInfo.GetUnicodeCategory(int)path extremely well. TheIsValidCodePointcheck is trivially eliminated for char-width values, and the resulting code is more efficient than routing through the char overload. Main's ~0.29ns is already ~10x faster than the 3-4ns the original issue reported for .NET 5.0, indicating the regression has already been resolved by subsequent JIT/runtime optimizations.All changes have been reverted. No code modifications remain in this PR.
Original prompt
This section details on the original issue you should resolve
<issue_title>Minor regression in System.Tests.Perf_Char.GetUnicodeCategory for non-ascii characters</issue_title>
<issue_description>I am now running an independent 3.1 vs 5.0 comparison and I've confirmed that it's actual regression introduced in 5.0.
The issue has been originally reported by a bot owned by @DrewScoggins in DrewScoggins/performance-2#574
Repro
Details
System.Tests.Perf_Char.GetUnicodeCategory(c: '?')
No regressions for ascii chars:
System.Tests.Perf_Char.GetUnicodeCategory(c: '.')
System.Tests.Perf_Char.GetUnico...
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.