Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use double VectorTableLookup on ARM in ProbabilisticMap #85189

Merged
merged 3 commits into from
Apr 24, 2023

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Apr 22, 2023

Contributes to #84328

Before

Method Mean Error
IndexOfAnyProb 3.776 µs 0.0024 µs

After

Method Mean Error
IndexOfAnyProb 2.576 µs 0.0011 µs

@MihaZupan MihaZupan added this to the 8.0.0 milestone Apr 22, 2023
@MihaZupan MihaZupan self-assigned this Apr 22, 2023
@ghost
Copy link

ghost commented Apr 22, 2023

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #84328

When I vectorized this path in #80963, I ran benchmarks on ARM with the initial version of the PR, but later regressed it with 2295370 by using the >>> operator, which apparently isn't turned into AdvSimd.ShiftRightLogical. The first commit 3bfd7d3 fixes that, changing perf from ~36 µs back to ~3.7 µs on this benchmark. cc: @tannergooding

The change to use VectorTableLookup with two table vectors further improves performance:

Before

Method Mean Error
IndexOfAnyProb 3.776 µs 0.0024 µs

After

Method Mean Error
IndexOfAnyProb 2.576 µs 0.0011 µs
Author: MihaZupan
Assignees: MihaZupan
Labels:

area-System.Buffers

Milestone: 8.0.0

@danmoseley
Copy link
Member

the >>> operator, which apparently isn't turned into AdvSimd.ShiftRightLogical.

Bug?

@MihaZupan
Copy link
Member Author

Likely just not implemented yet, #75770 may be the right tracking issue for that.

@kunalspathak
Copy link
Member

cc: @a74nh

@a74nh
Copy link
Contributor

a74nh commented Apr 24, 2023

Code looks good to me. Out of curiosity, do you know roughly how much of the performance gain comes from the VectorTableLookup and how much comes from the ShiftRightLogical ?

@tannergooding
Copy link
Member

but later regressed it with 2295370 by using the >>> operator, which apparently isn't turned into AdvSimd.ShiftRightLogical

Can you log a bug for this. Looks like a simple fix as we should just need to recognize the operators in addition to the named methods: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicarm64.cpp#L1520-L1555

@MihaZupan
Copy link
Member Author

how much of the performance gain comes from the VectorTableLookup

The 3.776 µs => 2.576 µs change on this benchmark is entirely due to VectorTableLookup.

Can you log a bug for this.

#85257

@kunalspathak
Copy link
Member

The 3.776 µs => 2.576 µs change on this benchmark is entirely due to VectorTableLookup.

That's 30% gain. Good to see that.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MihaZupan
Copy link
Member Author

Failures are known according to Build Analysis

@MihaZupan MihaZupan merged commit 28bd253 into dotnet:main Apr 24, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants