-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Improvements to the "Scale" SIMD algorithm #1143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Results in some minor perf improvements for the
|
|
@tannergooding can you please review it ? |
|
@Anipik are before and after numbers backwards? They all got worse (albeit not necessarily significantly)\ Also are these perf tests of aligned or unaligned (or a random mixture) |
|
There is a lot of fiddly code -- how good is the code coverage (not sure how easy it is to measure in this repo -- worst case you can set lots of breakpoints and clear them as you run tests) |
|
Oh I see most of it is from Tanner's |
The data could be aligned or unaligned. The before algorithm was considering data to be always unaligned.
I have tested all the code paths using breakpoints. But I will add more tests before the MatMul change because in that change there are lot of cases. No much cases in scale algorithm.
Yes, tanners change acts as a blueprint. There will be always these four code paths. just the code and operations in them will slightly vary.
No the numbers are correct. yeah but they are within the error range |
tannergooding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM
|
@TomFinley @eerhardt can you also take a look ? I need one more approval |
|
@danmosemsft I updated the numbers, there is slight improvement in the numbers in most of the runs. |
| using System.Runtime.CompilerServices; | ||
| using System.Runtime.Intrinsics; | ||
| using System.Runtime.Intrinsics.X86; | ||
| using nuint = System.UInt64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we aren't just using ulong here. ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realistically, we should #ifdef this and use System.UInt32 on 32-bit systems and System.UInt64 on 64-bit systems (since C# still doesn't expose the appropriate operators for System.IntPtr and System.UIntPtr).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have the capability of building architecture-specific IL today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what does nuint stand for?
| remainder = length; | ||
| } | ||
|
|
||
| if (remainder != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be outside the initial if/else block so that it is always called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we can enforce this path for some tests, as this mistake was not picked up by the unittests ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure how to get "normal" .NET code into this situation.
Potentially making a byte[], and then casting a float* into an unaligned portion of that byte[] so the pointer is pointing to an odd memory address?
|
@tannergooding @eerhardt can you please take another look ? I have addressed the feedback |
eerhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
For inputs with fewer elements than can fit in the Vector type, it falls back to scalar code.
For inputs that are not naturally aligned (the alignment is not a multiple of 4), it does exclusively unaligned loads
For all other inputs, it will do at most two unaligned loads (one each for any leading/trailing unaligned elements) and all other loads will be aligned.
cc @eerhardt @tannergooding @danmosemsft