Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ascii.Equals and Ascii.EqualsIgnoreCase #84886

Merged
merged 6 commits into from Apr 22, 2023
Merged

Conversation

adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented Apr 15, 2023

fixes #84887

@ghost ghost assigned adamsitnik Apr 15, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners new-api-needs-documentation labels Apr 15, 2023
@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@adamsitnik adamsitnik added area-System.Text.Encoding and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 15, 2023
@ghost
Copy link

ghost commented Apr 15, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: adamsitnik
Assignees: adamsitnik
Labels:

area-System.Text.Encoding, new-api-needs-documentation

Milestone: -


namespace System.Text.Tests
{
public static class EqualsTests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a could tests that show real-world use cases like what we see in networking or aspnetcore, comparing against a constant. Such tests could use a non-ascii utf8 sequence that could match something provided by client input in one of those scenarios.

for (int i = 0; i < right.Length; i++)
{
char c = right[i];
byte b = left[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about the bound check here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've already confirmed left.Length == right.Length above, I think we're OK without another bounds check here.

{
Debug.Assert(left.Length == right.Length);

for (int i = 0; i < left.Length; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is no vectorization needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps vectorization here would be a further perf opportunity that was missed. I recommend we proceed with mering this without it and potentially follow up with vectorizing it in a separate PR.

}
else
{
(Vector64<ushort> lower, Vector64<ushort> upper) = Vector64.Widen(bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Vector64 uses the software fallback on x64.
crossgen2 with main-branch shows that too.

I don't know if there are plans or WIP to hw-accelerate Vector64 in the .NET 8 timeframe.
Otherwise maybe read the value as long into a Vector128 and do the widening then. Something like

static Vector128<ushort> LoadAndWiden(ref byte ptr)
{
    if (AdvSimd.IsSupported)
    {
        Vector64<byte> vec64 = Vector64.LoadUnsafe(ref ptr);
        return AdvSimd.ZeroExtendWideningLower(vec64);
    }
    else if (Sse2.IsSupported)
    {
        ulong value = Unsafe.ReadUnaligned<ulong>(ref ptr);
        Vector128<byte> vec = Vector128.CreateScalarUnsafe(value).AsByte();
        return Sse2.UnpackLow(vec, Vector128<byte>.Zero).AsUInt16();
    }
    else
    {
        Vector64<byte> vec64 = Vector64.LoadUnsafe(ref ptr);
        (Vector64<ushort> lower, Vector64<ushort> upper) = Vector64.Widen(vec64);
        return Vector128.Create(lower, upper);
    }
}

}

private static bool VectorContainsNonAsciiChar(Vector64<byte> bytes)
=> !Utf8Utility.AllBytesInUInt64AreAscii(bytes.AsUInt64().ToScalar());
Copy link
Member

@gfoidl gfoidl Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern here for Vector64. Pass in ulong as Utf8Utility.AllBytesInUInt64AreAscii expects that.
So get rid of Vector64 where possible and use ulong instead?

Edit: the vector is stored to stack, then the ulong read from there. Not ideal, but not too bad.

Co-authored-by: Günther Foidl <gue@korporal.at>
@adamsitnik adamsitnik marked this pull request as ready for review April 21, 2023 18:23
@adamsitnik
Copy link
Member Author

@gfoidl thank you for your suggestions and ideas! I won't have the time to implement all perf improvements now (I started paternity leave very recently and now I just want to get this issue off my TODO list). Would you be interested in sending a PR with improvements once this PR is merged?

@gfoidl
Copy link
Member

gfoidl commented Apr 21, 2023

Would you be interested in sending a PR with improvements once this PR is merged?

Yep, I can do the follow up.

paternity leave

Congratulations!

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @adamsitnik requested we merge this without the vectorization improvements recommended by @gfoidl so that we can get in ahead of the Preview 4 snap (and while Adam is unavailable), and then potentially follow up with an additional PR to improve the vectorization.

@gfoidl -- would you like to submit a follow-up PR with those recommendations you made?

for (int i = 0; i < right.Length; i++)
{
char c = right[i];
byte b = left[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've already confirmed left.Length == right.Length above, I think we're OK without another bounds check here.

{
Debug.Assert(left.Length == right.Length);

for (int i = 0; i < left.Length; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps vectorization here would be a further perf opportunity that was missed. I recommend we proceed with mering this without it and potentially follow up with vectorizing it in a separate PR.

@gfoidl
Copy link
Member

gfoidl commented Apr 21, 2023

@jeffhandley yes I'll do a follow up PR to address the perf-points (maybe soon, otherwise in about 2 weeks as rough timeframe).
Bring this PR in...

@jeffhandley
Copy link
Member

Awesome; thank you! I'll merge when green.

@jeffhandley jeffhandley merged commit 880d44c into dotnet:main Apr 22, 2023
167 checks passed
gfoidl added a commit to gfoidl/dotnet-runtime that referenced this pull request May 8, 2023
@gfoidl
Copy link
Member

gfoidl commented May 8, 2023

@jeffhandley follow up PR is here: #85926

@dotnet dotnet locked as resolved and limited conversation to collaborators Jun 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ascii.Equals
5 participants