Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector.Sum(Vector<T>) API implementation for horizontal add. #53527

Merged
merged 14 commits into from
Jun 11, 2021

Conversation

zlatanov
Copy link
Contributor

@zlatanov zlatanov commented Jun 1, 2021

This PR adds implementation for the proposed Vector.Sum(Vector<T>) API.

At the moment the 128bit version depends on SSSE3 and falls back to software implementation if missing. I can add hardware accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it.

Closes #35626

//cc @tannergooding @Sergio0694

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Jun 1, 2021

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR adds implementation for the proposed Vector.Sum(Vector<T>) API.

At the moment the 128bit version depends on SSSE3 and falls back to software implementation if missing. I can add hardware accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it.

Closes #35626

//cc @tannergooding @Sergio0694

Author: zlatanov
Assignees: -
Labels:

area-System.Numerics, new-api-needs-documentation

Milestone: -

@tannergooding
Copy link
Member

At the moment the 128bit version depends on SSSE3 and falls back to software implementation if missing. I can add hardware accelerated version using SSE2 (shuffles and adds), but doing this will add at least 100 lines of code in the JIT. Let me know if it's worth it and I will add it.

I think requiring SSSE3 is fine, this will be basically any CPU from 2006 onwards, and at least according to the Steam Hardware Survey is 99.17% of all CPUs. I imagine this could be even higher for enterprise and cloud scenarios.

@zlatanov
Copy link
Contributor Author

zlatanov commented Jun 2, 2021

@tannergooding for some reason one of the tests fails on ARM64 for uint.

Assert failure(PID 93 [0x0000005d], Thread: 112 [0x0070]): Assertion failed 'isVectorRegister(reg1)' in 'System.Numerics.Tests.GenericVectorTests:TestSum(System.Func`2[[System.UInt32[], System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.UInt32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]])' during 'Generate code' (IL size 52)

File: /__w/1/s/src/coreclr/jit/emitarm64.cpp Line: 4387

I am unfamiliar with the assertions there. Can you provide assistance to figure out what is wrong, please?

@tannergooding
Copy link
Member

Thanks again for all the work here, this is looking like its near completion with the last couple comments and should hopefully pass CI here soon <3

@zlatanov
Copy link
Contributor Author

zlatanov commented Jun 3, 2021

FYI, the auto formatting settings and using Visual Studio constantly change a few of the lines that aren't related to this PR and are in conflict with CI checks. See this undo commit for example 1b484d7.

Is something I'm doing wrong? I have muscle memory for "Format Document" and use it from time to time to make sure the formatting is correct. Should I not use it when working here, and instead rely on "Format Section"? How do you work here and make sure the code is formatted accordingly to style rules?

@tannergooding
Copy link
Member

I have muscle memory for "Format Document" and use it from time to time to make sure the formatting is correct

This sounds like a disconnect between the clangformat settings and what VS is deciding. An issue should probably get logged so this can be fixed.

To setup things, I'd do:

  1. Clone https://github.com/dotnet/jitutils
  2. Build using bootstrap.cmd
  3. Add <jitutils-repo-root>/bin to your path

Then, any time you need to format you can simply run: jit-format -c <runtime-repo-root>\src\coreclr -f

@tannergooding
Copy link
Member

CC. @dotnet/jit-contrib, @echesakovMSFT; this community PR should be ready to review now.

@tannergooding
Copy link
Member

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@tannergooding
Copy link
Member

Ping @dotnet/jit-contrib, @echesakovMSFT. Community PR should be ready for review.

@echesakov echesakov self-requested a review June 9, 2021 00:14
@echesakov
Copy link
Contributor

@tannergooding Thanks for the ping. I will take a look tomorrow.

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your change @zlatanov!

I have some suggestions how to improve usage of arm64 intrinsics

src/coreclr/jit/simdashwintrinsic.cpp Show resolved Hide resolved
}
case TYP_LONG:
case TYP_ULONG:
case TYP_FLOAT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TYP_LONG TYP_ULONG and TYP_DOUBLE we should avoid cloning op1 by emitting AddPairwiseScalar intrinsics instead (these correspond to addp faddp (scalar) see https://developer.arm.com/architectures/instruction-sets/intrinsics/vpaddd_f64) that operates on a single 16-byte vector.

@zlatanov zlatanov requested a review from echesakov June 10, 2021 16:04
@ghost
Copy link

ghost commented Jun 10, 2021

Hello @tannergooding!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost removed the auto-merge label Jun 10, 2021
@zlatanov
Copy link
Contributor Author

zlatanov commented Jun 11, 2021

@tannergooding I don't think the pipeline failures are related to this PR.

Edit:

Actually, some of the failures do seem to be related here. See https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-53527-merge-2c257229936a4cea87/System.Numerics.Vectors.Tests/console.eb12607d.log?sv=2019-07-07&se=2021-07-01T08%3A05%3A12Z&sr=c&sp=rl&sig=vtS0xe9l4%2FFLeluPICLMiAigJBqjabpwigm46IuCIm0%3D

I don't see what could ne causing NullReferenceException in the tests though.

@tannergooding
Copy link
Member

I don't see what could ne causing NullReferenceException in the tests though.

Given it was on OSX and passed on rerun, it might be an issue with hardware that has AVX but not AVX2. However, I don't see anything that would be causing it in this PR.

I'll keep an eye on CI for additional flakiness and see if I can repro locally as well.

@tannergooding tannergooding merged commit 6afe03e into dotnet:main Jun 11, 2021
@zlatanov zlatanov deleted the vector-sum branch July 2, 2021 14:55
@ghost ghost locked as resolved and limited conversation to collaborators Aug 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Vector.Sum(Vector<T>) API for horizontal add
5 participants