New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ascii.Equals and Ascii.EqualsIgnoreCase #84886
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue Detailsnull
|
src/libraries/System.Net.Http/src/System/Net/Http/ByteArrayHelpers.cs
Outdated
Show resolved
Hide resolved
|
||
namespace System.Text.Tests | ||
{ | ||
public static class EqualsTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding a could tests that show real-world use cases like what we see in networking or aspnetcore, comparing against a constant. Such tests could use a non-ascii utf8 sequence that could match something provided by client input in one of those scenarios.
for (int i = 0; i < right.Length; i++) | ||
{ | ||
char c = right[i]; | ||
byte b = left[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care about the bound check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we've already confirmed left.Length == right.Length
above, I think we're OK without another bounds check here.
src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Equality.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Equality.cs
Outdated
Show resolved
Hide resolved
{ | ||
Debug.Assert(left.Length == right.Length); | ||
|
||
for (int i = 0; i < left.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is no vectorization needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps vectorization here would be a further perf opportunity that was missed. I recommend we proceed with mering this without it and potentially follow up with vectorizing it in a separate PR.
} | ||
else | ||
{ | ||
(Vector64<ushort> lower, Vector64<ushort> upper) = Vector64.Widen(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC Vector64
uses the software fallback on x64.
crossgen2 with main-branch shows that too.
I don't know if there are plans or WIP to hw-accelerate Vector64 in the .NET 8 timeframe.
Otherwise maybe read the value as long into a Vector128 and do the widening then. Something like
static Vector128<ushort> LoadAndWiden(ref byte ptr)
{
if (AdvSimd.IsSupported)
{
Vector64<byte> vec64 = Vector64.LoadUnsafe(ref ptr);
return AdvSimd.ZeroExtendWideningLower(vec64);
}
else if (Sse2.IsSupported)
{
ulong value = Unsafe.ReadUnaligned<ulong>(ref ptr);
Vector128<byte> vec = Vector128.CreateScalarUnsafe(value).AsByte();
return Sse2.UnpackLow(vec, Vector128<byte>.Zero).AsUInt16();
}
else
{
Vector64<byte> vec64 = Vector64.LoadUnsafe(ref ptr);
(Vector64<ushort> lower, Vector64<ushort> upper) = Vector64.Widen(vec64);
return Vector128.Create(lower, upper);
}
}
} | ||
|
||
private static bool VectorContainsNonAsciiChar(Vector64<byte> bytes) | ||
=> !Utf8Utility.AllBytesInUInt64AreAscii(bytes.AsUInt64().ToScalar()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern here for Vector64. Pass in ulong
as Utf8Utility.AllBytesInUInt64AreAscii
expects that.
So get rid of Vector64 where possible and use ulong
instead?
Edit: the vector is stored to stack, then the ulong
read from there. Not ideal, but not too bad.
Co-authored-by: Günther Foidl <gue@korporal.at>
@gfoidl thank you for your suggestions and ideas! I won't have the time to implement all perf improvements now (I started paternity leave very recently and now I just want to get this issue off my TODO list). Would you be interested in sending a PR with improvements once this PR is merged? |
Yep, I can do the follow up.
Congratulations! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @adamsitnik requested we merge this without the vectorization improvements recommended by @gfoidl so that we can get in ahead of the Preview 4 snap (and while Adam is unavailable), and then potentially follow up with an additional PR to improve the vectorization.
@gfoidl -- would you like to submit a follow-up PR with those recommendations you made?
for (int i = 0; i < right.Length; i++) | ||
{ | ||
char c = right[i]; | ||
byte b = left[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we've already confirmed left.Length == right.Length
above, I think we're OK without another bounds check here.
{ | ||
Debug.Assert(left.Length == right.Length); | ||
|
||
for (int i = 0; i < left.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps vectorization here would be a further perf opportunity that was missed. I recommend we proceed with mering this without it and potentially follow up with vectorizing it in a separate PR.
@jeffhandley yes I'll do a follow up PR to address the perf-points (maybe soon, otherwise in about 2 weeks as rough timeframe). |
Awesome; thank you! I'll merge when green. |
@jeffhandley follow up PR is here: #85926 |
fixes #84887