Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement for Guid.Equals using SSE #53012

Closed
wants to merge 2 commits into from
Closed

Performance improvement for Guid.Equals using SSE #53012

wants to merge 2 commits into from

Conversation

bill-poole
Copy link

Fixes #52296.

Baseline performance:

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
EqualsSame 1.785 ns 0.0558 ns 0.0494 ns 1.791 ns 1.702 ns 1.849 ns - - - -
EqualsOperator 1.709 ns 0.0133 ns 0.0125 ns 1.711 ns 1.687 ns 1.730 ns - - - -
NotEqualsOperator 1.728 ns 0.0188 ns 0.0167 ns 1.727 ns 1.701 ns 1.749 ns - - - -

Updated performance:

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
EqualsSame 0.3771 ns 0.0073 ns 0.0061 ns 0.3786 ns 0.3650 ns 0.3852 ns - - - -
EqualsOperator 0.4260 ns 0.0101 ns 0.0095 ns 0.4285 ns 0.4060 ns 0.4413 ns - - - -
NotEqualsOperator 0.5709 ns 0.0349 ns 0.0327 ns 0.5577 ns 0.5437 ns 0.6605 ns - - - -

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 20, 2021
@dnfadmin
Copy link

dnfadmin commented May 20, 2021

CLA assistant check
All CLA requirements met.

Comment on lines +820 to +821
var result = Sse2.CompareEqual(g1, g2);
return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var result = Sse2.CompareEqual(g1, g2);
return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;
return g1.Equals(g2);

Results in same codegen and is simpler to read.
If this hinders inlining, etc. keep it as is (except var -> concrete type).

Ideally the JIT should emit code that takes SSE4.1's _mm_testz_si128 into account?
(Iif this variant is always faster on the supported cpus)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector128<T>.Equals(Vector128<T>) currently does not have a specific/optimized code path for SSE 4.1's _mm_testz_si128, which I why I called out an explicit code path for SSE 4.1 in Guid.EqualsCore.

But yes, under the SSE2 code path, it should be equivalent to invoke g1.Equals(g2) on the assumption that the JIT will inline the call, which it should because Vector128<T>.Equals(Vector128<T>) is decorated with "aggressive inlining".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently does not have a specific/optimized code path for SSE 4.1's

Yeah, my question was intented for @tannergooding (?) to have a look at this on the JIT-side 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the JIT should emit code that takes SSE4.1's _mm_testz_si128 into account?

There are a few improvements that could happen for Vector128.Equals. I've been waiting on #49397 before tackling any of them, however.

ref int rB = ref Unsafe.AsRef(in right._a);
if (Sse2.IsSupported)
{
var g1 = Unsafe.As<Guid, Vector128<byte>>(ref Unsafe.AsRef(left));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: use explicit types, not var, to follow the coding guidelines in this repo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated accordingly.


// Compare each element
static bool SoftwareFallback(in Guid left, in Guid right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re in: just to double-check the discussion in the issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to take direction on this either way. I have found in my performance tests thus far that passing a Guid by reference outperforms passing by value - I suspect because the JIT has difficulty keeping the Guid parameters in registers because they have 11 fields.

&& Unsafe.Add(ref rA, 3) == Unsafe.Add(ref rB, 3);
// Compare each element

return rA == rB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we special case 64-bits processes by comparing with longs?
E.g. on ARM 64 this code will be hit.
Cf. #35654

if (Environment.Is64BitProcess)
{
    // code with long
}
else
{
    // code with int
}

The "fast out behavior" should be given with longs too, but I'm not sure how about alignment on ARM 64.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to update with a special case for 64-bit software fallback if desired. The current Guid.EqualsCore method doesn't cater for this, but that's no reason to not cater for it now.

private static bool EqualsCore(in Guid left, in Guid right)
{
ref int rA = ref Unsafe.AsRef(in left._a);
ref int rB = ref Unsafe.AsRef(in right._a);
if (Sse2.IsSupported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ARM64? We care about ARM64 as much as we care about x64.

Copy link
Contributor

@FilipToth FilipToth May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if SSE2 is not supported, we can just use a method without SSE2? Much like what we have been doing so far. But still, in most cases SSE2is supported.

@jkotas
Copy link
Member

jkotas commented May 20, 2021

FWIW, I think that this is way too complex change for the benefit that it provides. I think it would be better to wait for the general Vector128 improvements.

@sandreenko sandreenko removed the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 24, 2021
@stephentoub stephentoub added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jun 11, 2021
@ghost
Copy link

ghost commented Jun 14, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #52296.

Baseline performance:

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
EqualsSame 1.785 ns 0.0558 ns 0.0494 ns 1.791 ns 1.702 ns 1.849 ns - - - -
EqualsOperator 1.709 ns 0.0133 ns 0.0125 ns 1.711 ns 1.687 ns 1.730 ns - - - -
NotEqualsOperator 1.728 ns 0.0188 ns 0.0167 ns 1.727 ns 1.701 ns 1.749 ns - - - -

Updated performance:

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
EqualsSame 0.3771 ns 0.0073 ns 0.0061 ns 0.3786 ns 0.3650 ns 0.3852 ns - - - -
EqualsOperator 0.4260 ns 0.0101 ns 0.0095 ns 0.4285 ns 0.4060 ns 0.4413 ns - - - -
NotEqualsOperator 0.5709 ns 0.0349 ns 0.0327 ns 0.5577 ns 0.5437 ns 0.6605 ns - - - -
Author: billpoole-mi
Assignees: -
Labels:

* NO MERGE *, area-System.Runtime

Milestone: -

@terrajobst terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021
@jeffhandley
Copy link
Member

FWIW, I think that this is way too complex change for the benefit that it provides. I think it would be better to wait for the general Vector128 improvements.

Based on this comment and the lack of activity on this PR since, I'm going to go ahead and close it. Thanks for submitting the PR @billpoole-mi; it led to an interesting and valuable discussion here.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime community-contribution Indicates that the PR has been added by a community member NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance improvement for Guid.Equals