@Anipik
Contributor

@Anipik Anipik commented Oct 3, 2018

For inputs with fewer elements than can fit in the Vector type, it falls back to scalar code.
For inputs that are not naturally aligned (the address is not a multiple of 4 bytes), it does exclusively unaligned loads.
For all other inputs, it will do at most two unaligned loads (one each for any leading/trailing unaligned elements) and all other loads will be aligned.
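The three cases above can be sketched as a small classifier. This is only an illustration of the decision logic, not the code in this PR: the names `ScaleLoadPlan`, `Classify`, and `Kind` are hypothetical, and the scalar threshold (fewer elements than one vector) is taken from the description rather than from the actual implementation.

```csharp
using System;

public static class ScaleLoadPlan
{
    public enum Kind { Scalar, UnalignedOnly, AlignedWithEdges }

    // address:     start of the float buffer in bytes (hypothetical input for illustration)
    // length:      number of float elements
    // vectorBytes: 16 for Vector128<float> (SSE), 32 for Vector256<float> (AVX)
    // Returns the chosen path plus element/vector counts for each region.
    public static (Kind Path, int Leading, int FullVectors, int Trailing) Classify(
        ulong address, int length, int vectorBytes)
    {
        const int FloatBytes = sizeof(float);
        int elementsPerVector = vectorBytes / FloatBytes;

        // Case 1: not enough elements to fill one vector, so use scalar code.
        if (length < elementsPerVector)
        {
            return (Kind.Scalar, 0, 0, length);
        }

        // Case 2: the address is not a multiple of sizeof(float). Stepping by whole
        // floats can never reach a vector-aligned address, so every load is unaligned.
        if (address % (ulong)FloatBytes != 0)
        {
            return (Kind.UnalignedOnly, 0, length / elementsPerVector, length % elementsPerVector);
        }

        // Case 3: count the leading elements before the first vector-aligned address,
        // then the aligned full vectors, then whatever is left at the end.
        int misalignmentBytes = (int)(address % (ulong)vectorBytes);
        int leading = misalignmentBytes == 0 ? 0 : (vectorBytes - misalignmentBytes) / FloatBytes;
        int remaining = length - leading;
        return (Kind.AlignedWithEdges, leading, remaining / elementsPerVector, remaining % elementsPerVector);
    }
}
```

For example, a 16-element buffer starting 4 bytes past a 16-byte boundary would be split into 3 leading elements, 3 aligned SSE vectors, and 1 trailing element; the leading and trailing groups are where the two unaligned loads would go.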

cc @eerhardt @tannergooding @danmosemsft

@Anipik
Contributor Author

Anipik commented Oct 3, 2018

This results in some minor perf improvements for the Microsoft.ML.CpuMath.PerformanceTests.

Before

| Method | Mean | Error | StdDev |
|---|---|---|---|
| AvxScaleU | 182.9 us | 2.029 us | 1.898 us |
| NativeScaleU | 173.5 us | 0.3190 us | 0.2955 us |
| SseScaleU | 173.7 us | 0.3903 us | 0.3460 us |

After

| Method | Mean | Error | StdDev |
|---|---|---|---|
| AvxScale | 180.3 us | 2.860 us | 2.676 us |
| NativeScale | 173.1 us | 1.082 us | 0.8446 us |
| SseScale | 172.6 us | 0.9265 us | 0.8213 us |

@Anipik
Contributor Author

Anipik commented Oct 5, 2018

@tannergooding can you please review it?

@danmoseley
Member

@Anipik are the before and after numbers backwards? They all got worse (albeit not necessarily significantly).

Also, are these perf tests run on aligned or unaligned data (or a random mixture)?

@danmoseley
Member

There is a lot of fiddly code -- how good is the code coverage? (Not sure how easy it is to measure in this repo -- worst case you can set lots of breakpoints and clear them as you run tests.)

@danmoseley
Member

Oh, I see, most of it is from Tanner's change.

@Anipik
Contributor Author

Anipik commented Oct 5, 2018

> Also are these perf tests of aligned or unaligned (or a random mixture)

The data could be aligned or unaligned. The previous algorithm always treated the data as unaligned.

> how good is the code coverage

I have tested all the code paths using breakpoints. I will add more tests before the MatMul change, because that change has a lot of cases. There are not many cases in the scale algorithm.

> Oh I see most of it is from Tanner's

Yes, Tanner's change acts as a blueprint. There will always be these four code paths; only the code and operations in them will vary slightly.

> are before and after numbers backwards? They all got worse (albeit not necessarily significantly)

No, the numbers are correct, but the differences are within the error range.

Member

@tannergooding tannergooding left a comment

Overall, LGTM

@Anipik
Contributor Author

Anipik commented Oct 8, 2018

@TomFinley @eerhardt can you also take a look? I need one more approval.

@Anipik
Contributor Author

Anipik commented Oct 8, 2018

@danmosemsft I updated the numbers; there is a slight improvement in most of the runs.

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using nuint = System.UInt64;
Member

I don't understand why we aren't just using ulong here.

Member

Realistically, we should #ifdef this and use System.UInt32 on 32-bit systems and System.UInt64 on 64-bit systems (since C# still doesn't expose the appropriate operators for System.IntPtr and System.UIntPtr).

Member

We don't have the capability of building architecture-specific IL today.

Member

Also, what does nuint stand for?
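As a rough illustration of the alias being discussed in this thread: the sketch below uses a fixed 64-bit unsigned alias for address arithmetic, which also works in a 32-bit process because widening a pointer to `ulong` loses no bits. The names `NativeUInt`, `AddressMath`, and `MisalignmentBytes` are hypothetical and do not come from the PR, and the unsafe overload assumes the project is compiled with unsafe blocks enabled.

```csharp
using System;
using NativeUInt = System.UInt64; // hypothetical alias name, standing in for the PR's `nuint`

public static class AddressMath
{
    // Number of bytes the address sits past the previous vectorBytes boundary.
    public static int MisalignmentBytes(NativeUInt address, int vectorBytes)
        => (int)(address % (NativeUInt)vectorBytes);

    // Pointer overload: C# allows an explicit cast from a pointer to ulong in unsafe code.
    public static unsafe int MisalignmentBytes(void* pointer, int vectorBytes)
        => MisalignmentBytes((NativeUInt)pointer, vectorBytes);

    // A runtime check is possible even though one IL binary serves both
    // architectures: IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
    public static bool Is64BitProcess => IntPtr.Size == 8;
}
```

The trade-off is that 64-bit arithmetic on a 32-bit system may cost slightly more than native-width arithmetic would, which is presumably what the `#ifdef` suggestion was aiming to avoid.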

remainder = length;
}

if (remainder != 0)
Member

This should be outside the initial if/else block so that it is always called.

Contributor Author

Is there any way we can force this path in some tests? This mistake was not picked up by the unit tests.

Member

I'm not exactly sure how to get "normal" .NET code into this situation.

Potentially making a byte[], and then casting a float* into an unaligned portion of that byte[] so the pointer is pointing to an odd memory address?
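A minimal sketch of that idea, assuming unsafe code is enabled in the test project. `ScaleKernel` is a hypothetical scalar stand-in for the routine under test, and `UnalignedPointerTest` and `ScaleThroughMisalignedPointer` are illustrative names, not part of the repo. The offset is computed from the pinned address rather than hard-coded, since the alignment of a `byte[]` payload is not something the sketch relies on. Dereferencing a misaligned `float*` is fine on x86/x64; other architectures may behave differently.

```csharp
using System;

public static class UnalignedPointerTest
{
    // Hypothetical stand-in for the kernel under test: dst[i] *= scale.
    public static unsafe void ScaleKernel(float scale, float* dst, int length)
    {
        for (int i = 0; i < length; i++)
        {
            dst[i] *= scale;
        }
    }

    // Copies the input into a byte[] at an address that is NOT a multiple of
    // sizeof(float), runs the kernel through a float* to that address, and
    // copies the result back out.
    public static unsafe float[] ScaleThroughMisalignedPointer(float[] values, float scale)
    {
        int byteCount = values.Length * sizeof(float);
        byte[] backing = new byte[byteCount + sizeof(float)]; // slack for the offset
        float[] result = new float[values.Length];

        fixed (byte* pBase = backing)
        {
            // Pick an offset in 0..3 so that (address + offset) % 4 == 1.
            int baseMisalignment = (int)((ulong)pBase % sizeof(float));
            int offset = (5 - baseMisalignment) % sizeof(float);

            // Buffer.BlockCopy works in bytes, so it does not care about alignment.
            Buffer.BlockCopy(values, 0, backing, offset, byteCount);
            ScaleKernel(scale, (float*)(pBase + offset), values.Length);
            Buffer.BlockCopy(backing, offset, result, 0, byteCount);
        }

        return result;
    }
}
```

A unit test could then compare the returned array against a plain scalar computation on the original input, swapping `ScaleKernel` for the vectorized routine.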

@Anipik
Contributor Author

Anipik commented Oct 10, 2018

@tannergooding @eerhardt can you please take another look? I have addressed the feedback.

Member

@eerhardt eerhardt left a comment

:shipit:

@eerhardt eerhardt merged commit 0b84350 into dotnet:master Oct 10, 2018
@Anipik Anipik deleted the scale branch December 11, 2018 19:54
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022