Perf: Implement branchless compare #13187

thinkbeforecoding · 2022-05-23T23:15:53Z

This is an implementation of #13098.
It implements a branchless compare:
cgt x y - clt x y
When x > y: 1 - 0 = 1
When x < y: 0 - 1 = -1
When x = y: 0 - 0 = 0
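
The arithmetic above can be sketched in plain F#. This only illustrates the `cgt - clt` identity and is not the FSharp.Core implementation: `gt`, `lt` and `branchlessCompare` are hypothetical names, and the `if` expressions merely stand in for the IL `cgt`/`clt` instructions, so this sketch makes no promise about branch-free machine code.

```fsharp
// Hypothetical helpers: each returns 1 or 0, mirroring the value that the
// IL instructions cgt and clt push on the stack.
let gt (x: int) (y: int) = if x > y then 1 else 0
let lt (x: int) (y: int) = if x < y then 1 else 0

// compare expressed as the difference of the two flags: always -1, 0 or 1.
let branchlessCompare (x: int) (y: int) = gt x y - lt x y

// prints "1 -1 0"
printfn "%d %d %d" (branchlessCompare 2 1) (branchlessCompare 1 2) (branchlessCompare 1 1)
```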

Benchmarks show that it is slightly slower (10%) when branch predictions are always correct, but 3x faster on random values, where predictions fail more often.

The code emitted by the JIT is the same size.

thinkbeforecoding · 2022-05-24T00:36:04Z

Some IL tests are failing due to the change.

There are some build errors on my machine (they seem to come from new F# features like [x..] indexing). How can I build this locally?

vzarytovskii · 2022-05-24T08:23:14Z

> Some IL tests are failing due to the change.
>
> There are some build errors on my machine (they seem to come from new F# features like [x..] indexing). How can I build this locally?

The build scripts (from the devguide) should work just fine. You can use them with the noVisualStudio switch.

The devguide also describes how to update IL baselines (if the test uses baseline files).

dsyme · 2022-05-24T16:34:50Z

I'm curious if we know that this is the fastest implementation. Is there any information on this on the web e.g. for C++ or assembly code?

dsyme · 2022-05-24T16:54:55Z

I'd like to see a systematic correctness test matrix for comparisons on all the basic types affected by this, including

MaxValue
MaxValue - 1
MinValue
MinValue + 1
0
1
-1

Also don't forget the char and bool types
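
A minimal sketch of such a matrix for int32 follows. It is an assumption about how the check could be written, not the test added in this PR: `candidate` is a hypothetical stand-in for the implementation under test, and only the sign of the result is compared against a reference built from the ordering operators. Other types (including char and bool) would repeat the same pattern with their own boundary values.

```fsharp
// Hypothetical stand-in for the implementation under test.
let candidate (x: int) (y: int) =
    (if x > y then 1 else 0) - (if x < y then 1 else 0)

// Boundary values from the list above, for int32.
let values =
    [ System.Int32.MaxValue
      System.Int32.MaxValue - 1
      System.Int32.MinValue
      System.Int32.MinValue + 1
      0
      1
      -1 ]

// Reference answer derived from the ordering operators only.
let expectedSign (x: int) (y: int) =
    if x < y then -1
    elif x > y then 1
    else 0

// Collect every pair whose sign disagrees with the reference.
let failures =
    [ for x in values do
        for y in values do
            if sign (candidate x y) <> expectedSign x y then
                yield (x, y) ]

printfn "%d pairs checked, %d failures" (values.Length * values.Length) failures.Length
```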

thinkbeforecoding · 2022-05-24T18:21:02Z

There could be some improvements in the JIT.
For instance, the cmp is done twice.
Since setg and setl don't modify flags, both could be done with a single cmp.
And the double zero extension could be replaced by a single sign extension after the sub.

thinkbeforecoding · 2022-05-24T20:20:50Z

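I think the fastest version in x64 would be (with x in eax and y in ecx):
cmp eax, ecx
setg al        ; al = (x > y)
setl dl        ; dl = (x < y)
sub al, dl     ; al = (x > y) - (x < y)
movsx eax, al  ; sign-extend the result to 32 bits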

There is a 4-instruction version on Stack Overflow:

sub %1, %0
jno 1f
cmc
rcr %0
https://stackoverflow.com/questions/10996418/efficient-integer-compare-function

But it involves a conditional jump, which will be slower whenever the branch is mispredicted. Moreover, I see no way to make the JIT compile something like this.

thinkbeforecoding · 2022-05-24T20:25:21Z

I was also looking at cmovcc (conditional move) instructions, but the source must be a register or memory; it cannot be a constant. So it requires extra instructions to load the constants and won't be shorter.
It could be interesting for implementing min and max, though.

thinkbeforecoding · 2022-05-25T09:04:31Z

After a check, the requirement on IComparable.CompareTo is:

when x > y, result is > 0
when x < y, result is < 0
when x = y, result is 0

For shorter types (byte, short, char) it is implemented as:

int x - int y

This is valid since there is no risk of overflow. But the result is not necessarily 0, 1 or -1.

For longer types, it uses conditional jumps and always returns 0, 1 or -1.

Do we have that constraint on the compare function ?
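
To make the point about subtraction concrete, here is a small sketch. `compareByte` and `naiveCompareInt` are hypothetical names used only for illustration; they are not the BCL or FSharp.Core implementations.

```fsharp
// Hypothetical helper: compare two bytes by widening to int and subtracting.
// The difference always lies in -255..255, so it cannot overflow an int.
let compareByte (x: byte) (y: byte) = int x - int y

// The sign is right, but the value is not limited to -1, 0 or 1.
printfn "%d" (compareByte 200uy 10uy)          // 190
printfn "%d" (sign (compareByte 200uy 10uy))   // 1

// The same trick is not safe for int32: the subtraction wraps around.
let naiveCompareInt (x: int) (y: int) = x - y

// Positive result although MinValue < 1, which is why wider types need
// a different implementation.
printfn "%d" (naiveCompareInt System.Int32.MinValue 1)
```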

thinkbeforecoding · 2022-05-25T09:18:52Z

Since compare falls back to IComparer.Compare for other types, there is no guarantee that the output is always 0, 1 or -1.

https://docs.microsoft.com/en-us/dotnet/api/system.collections.icomparer.compare?view=net-6.0#system-collections-icomparer-compare(system-object-system-object)

thinkbeforecoding · 2022-05-25T09:37:32Z

This would not be a breaking change at the specification level, but it would be a breaking change at the implementation level...

Code like:
if compare x y = 1 then ...
would be broken.

Same thing for code like:
ls |> List.sumBy (fun (x,y) -> compare x y)

But such code is not following the specification of compare.
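
A short sketch of the difference between relying on the exact value and relying on the sign. `cmp` is a hypothetical comparer that respects the contract but returns values outside -1/0/1; it is not what `compare` does in FSharp.Core.

```fsharp
// Hypothetical comparer: follows the contract (sign only), but returns a
// plain difference rather than -1/0/1.
let cmp (x: byte) (y: byte) = int x - int y

let pairs = [ (200uy, 10uy); (3uy, 3uy); (1uy, 7uy) ]

// Fragile: assumes "greater than" is reported as exactly 1.
let fragile = pairs |> List.filter (fun (x, y) -> cmp x y = 1) |> List.length

// Robust: only looks at the sign of the result.
let robust = pairs |> List.filter (fun (x, y) -> cmp x y > 0) |> List.length

// Robust aggregate: normalise with sign before summing.
let balance = pairs |> List.sumBy (fun (x, y) -> sign (cmp x y))

// prints "0 1 0"
printfn "%d %d %d" fragile robust balance
```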

vzarytovskii · 2022-05-25T10:38:11Z

tests/fsharp/Compiler/Language/ComparisonOptimizationTest.fs

@@ -0,0 +1,381 @@
+namespace FSharp.Compiler.UnitTests


It's fine in this PR, but in the future we suggest creating new tests in FSharp.Compiler.ComponentTests. It has nicer APIs and better support for IL baseline tests: instead of inline IL, the baseline can be a separate file, which does not require recompiling the suite when it changes and is easier to regenerate using the environment variable.

thinkbeforecoding · 2022-05-25T14:10:11Z

There are a few tests that compare generated IL to expected IL that fail due to new code generation.

How do you usually regenerate expected IL ?

vzarytovskii · 2022-05-25T16:25:29Z

> There are a few tests that compare generated IL to expected IL that fail due to new code generation.
>
> How do you usually regenerate expected IL?

So, if they are inline (i.e. the IL is a string in the test case itself), then, unfortunately, they can only be updated one by one.
If they're verified as part of DirectoryAttribute-style testing, an environment variable can be set before running the tests locally:

On Windows (cmd):
set TEST_UPDATE_BSL=1
build.cmd -testCoreClr

On macOS/Linux:
export TEST_UPDATE_BSL=1
./build.sh --testcoreclr

(https://github.com/dotnet/fsharp/blob/main/DEVGUIDE.md#updating-baselines-in-tests)

thinkbeforecoding · 2022-05-26T12:07:24Z

There are a lot of culture-dependent tests that fail on my French machine. Fixed them.
There were also a lot of overly specific autogenerated tests for compare. They were checking for the exact value returned by compare... The check must be done on the sign, not the exact value. Fixed that too.

thinkbeforecoding · 2022-05-26T13:46:15Z

Some of the builds fail, but it doesn't seem related to the change. Any idea?

vzarytovskii · 2022-05-26T16:49:43Z

> Some of the builds fail, but it doesn't seem related to the change. Any idea?

The latest run failed because of the tests:
https://dev.azure.com/dnceng/public/_build/results?buildId=1791081&view=ms.vss-test-web.build-test-results-tab&runId=47885210&resultId=100293&paneView=debug

thinkbeforecoding · 2022-05-26T17:32:45Z

These are baseline tests, but they don't seem to be updated with the instructions above...

vzarytovskii · 2022-05-26T17:59:22Z

I've updated them and updated the guide. If I may ask, which shell were you using?

vzarytovskii · 2022-05-26T18:51:33Z

Great, now there are even more failures... I will take a look.

thinkbeforecoding · 2022-05-26T19:09:13Z

I was using powershell so I used
$env:TEST_UPDATE_BSL=1

vzarytovskii · 2022-05-26T19:34:31Z

> I was using powershell so I used
> $env:TEST_UPDATE_BSL=1

Hm, interesting. I will probably add a separate switch to the build scripts which runs tests and updates baselines.

vzarytovskii · 2022-05-27T15:08:13Z

@thinkbeforecoding I have updated the 2 remaining baselines for the net472 framework and updated the devguide to describe how it works.
Updating them successfully requires some knowledge of how IL baselines work. I will add a separate command to the build scripts for updating those (#13204).

The rest of the tests which are failing are checking the compare result.

thinkbeforecoding · 2022-06-07T21:45:11Z

When I used the command it said:
Commenter does not have sufficient privileges for PR 13187 in repo dotnet/fsharp

thinkbeforecoding · 2022-06-07T21:47:07Z

There are still some cancelled builds... strange

thinkbeforecoding · 2022-06-08T05:36:07Z

These canceled builds again.

KevinRansom

Nice, looks good. Thanks

vzarytovskii · 2022-06-08T11:49:02Z

> These canceled builds again.

Yeah, some infra hiccups, I suppose.

thinkbeforecoding · 2022-06-09T12:50:33Z

@dsyme , I think this just needs your review, but should be good.

KevinRansom · 2022-06-13T16:29:37Z

@dsyme, this is good to go. Do you want to review your change request?

dsyme · 2022-06-14T13:41:23Z

@thinkbeforecoding Thank you so much for this improvement!

forki · 2022-06-14T13:44:50Z

Wohooo! It landed. Awesome work!

EgorBo · 2024-02-18T11:37:40Z

NOTE: as of .NET 7.0 (or was it 8.0), the JIT is expected to use branchless instructions even when F#/C# emits branches (it recognizes cmov-like idioms).

thinkbeforecoding · 2024-02-18T12:39:01Z

Yes, I've noticed this in some tests in fasmi. I think it's in .NET 8.0.

EgorBo · 2024-02-18T12:51:31Z

Sure, I just wanted to note that it's better when some sub-optimal codegen is filed against JIT rather than silently (for JIT team) fixed 🙂

thinkbeforecoding force-pushed the perf/branchlesscompare branch 3 times, most recently from df4f4bc to 98aba34 on May 24, 2022 00:15

dsyme mentioned this pull request May 24, 2022

Emit cgt, likewise clt, for comparisons dotnet/roslyn#61483

Closed

dsyme reviewed May 24, 2022 (three review threads on src/FSharp.Core/prim-types.fs, now outdated and resolved)

vzarytovskii reviewed May 25, 2022

vzarytovskii force-pushed the perf/branchlesscompare branch from 9a53139 to d7021b3 on May 26, 2022 18:00

vzarytovskii reopened this Jun 7, 2022

thinkbeforecoding and others added 11 commits June 7, 2022 23:49

Perf: Implement branchless compare

de74c3e

Use simple subtraction for compare where possible

2696113

Fix typo in comments

003506e

Fix comparison for int16

b27c8e0

Fix baseline tests

6eefaf5

Updated tests baselines + guide for TEST_UPDATE_BSL

77b9893

Updated GenericComparison baselines + DEVGUIDE

463a2cf

Return -1/0/1 for compare

09f92a5

Use cgt-clt for byteOrder

c85b681

Fix compare tests to reflect actual emitted code

9787554

Constant optimizations for cgt/clt returning int value

f2bb4fa

thinkbeforecoding force-pushed the perf/branchlesscompare branch from 6f3ef91 to f2bb4fa on June 7, 2022 21:49

thinkbeforecoding closed this Jun 8, 2022

thinkbeforecoding reopened this Jun 8, 2022

KevinRansom approved these changes Jun 8, 2022

dsyme approved these changes Jun 13, 2022

Merge branch 'main' into perf/branchlesscompare

c694cee

dsyme merged commit a65ace7 into dotnet:main Jun 14, 2022
