Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: Implement branchless compare #13187

Merged
merged 12 commits into from Jun 14, 2022

Conversation

thinkbeforecoding
Copy link
Contributor

This is an implementation of #13098
It implements a branchless compare:
cgt x y - clt x y
When x > y: 1 - 0 = 1
When x < y: 0 - 1 = -1
When x = y: 0 - 0 = 0

Benchmarks show that it is very slightly slower (10%) when predictions are always correct, but it is 3x faster on random values where prediction fail more often

The code emitted by the JIT is the same size.

@thinkbeforecoding thinkbeforecoding force-pushed the perf/branchlesscompare branch 3 times, most recently from df4f4bc to 98aba34 Compare May 24, 2022 00:15
@thinkbeforecoding
Copy link
Contributor Author

Some IL tests are failing due to the change.

There are some build errors on my machine (it seems to be new F# features like [x..] indexing... How can I build this locally ?

@vzarytovskii
Copy link
Member

Some IL tests are failing due to the change.

There are some build errors on my machine (it seems to be new F# features like [x..] indexing... How can I build this locally ?

Build scripts (from the devguide) should work just fine. You can use it with noVisualStudio switch.

Devguide also describes how to update IL baselines (if test uses baseline files).

@dsyme
Copy link
Contributor

dsyme commented May 24, 2022

I'm curious if we know that this is the fastest implementation. Is there any information on this on the web e.g. for C++ or assembly code?

@dsyme
Copy link
Contributor

dsyme commented May 24, 2022

I'd like to see a systematic correctness test matrix for comparisons on all the basic types affected by this, including

  • MaxValue
  • MaxValue - 1
  • MinValue
  • MinValue + 1
  • 0
  • 1
  • -1

Also don't forget the char and bool types

@thinkbeforecoding
Copy link
Contributor Author

There could be some improvements in the jit.
For instance, the cmp is done twice.
Since setgt setlt don't modify flags, both could be done with a single cmp.
And the double zero extension could be done after the sub as a sign extension.

@thinkbeforecoding
Copy link
Contributor Author

I think the fastest version would be in x64
cmp eax ecx
setg al
setl dl
sub dl, al
movsx eax, dl

There is a 4 instr version found on SO

sub %1, %0
jno 1f
cmc
rcr %0
https://stackoverflow.com/questions/10996418/efficient-integer-compare-function

But it involve a conditional jump that will be slower on branch prediction. Moreover I see no way to make the jit compile something like this.

@thinkbeforecoding
Copy link
Contributor Author

I was also looking at movcc (conditional move) instructions, but the source is a register or memory, it cannot be a constant. So it requires extra instructions to load constants and won't we shorter.
That could be interesting to implement min and max for instance.

@thinkbeforecoding
Copy link
Contributor Author

After a check, the requirement on IComparable.CompareTo is:

  • when x > y, result is > 0
  • when x < y, result is < 0
  • when x=y result is 0

for shorter types (byte, short, char) it is implemented as:

int x - int y

This is valid since there is no risk of overflow. But the result is not necessarily 0,1,-1

For longer types, it uses conditional jumps and returns always 0,1,-1

Do we have that constraint on the compare function ?

@thinkbeforecoding
Copy link
Contributor Author

Since compare fallbacks to ICompare.Compare for other types, there is no guarantee that the output is always 0,1,-1.

https://docs.microsoft.com/en-us/dotnet/api/system.collections.icomparer.compare?view=net-6.0#system-collections-icomparer-compare(system-object-system-object)

@thinkbeforecoding
Copy link
Contributor Author

This would not be a breaking change at the specification level, but it would be a breaking change at the implementation level...

code like:
if compare x y = 1 then ... would be broken.

Same thing for code like:
ls |> List.sum (fun (x,y) -> compare x y)

But such code is not following the specification of compare.

src/FSharp.Core/prim-types.fs Outdated Show resolved Hide resolved
src/FSharp.Core/prim-types.fs Outdated Show resolved Hide resolved
src/FSharp.Core/prim-types.fs Outdated Show resolved Hide resolved
@@ -0,0 +1,381 @@
namespace FSharp.Compiler.UnitTests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine in this PR, but for future, we suggest creating new tests in FSharp.Compiler.ComponentTests, it has nicer APIs and better support for IL baseline tests (instead of having an inline IL, it can be a separate file, which does not require recompiling the suite in case of change, and it is easier to re-generate using the env variable).

@thinkbeforecoding
Copy link
Contributor Author

There are a few tests that compare generated IL to expected IL that fail due to new code generation.

How do you usually regenerate expected IL ?

@vzarytovskii
Copy link
Member

vzarytovskii commented May 25, 2022

There are a few tests that compare generated IL to expected IL that fail due to new code generation.

How do you usually regenerate expected IL ?

So, if they are inline (i.e. the IL itself is the string in the test case itself, then, unfortunately, only one-by-one).
If they're verified as part of DirectoryAttribute-style testing, then, an ENV variable can be set, and then run tests locally:|

// on Windows
set TEST_UPDATE_BSL=1
// or, on macOS/Linux
export TEST_UPDATE_BSL=1
// and then
build.cmd -testCoreClr
// or
build.sh --testcoreclr

(https://github.com/dotnet/fsharp/blob/main/DEVGUIDE.md#updating-baselines-in-tests)

@thinkbeforecoding
Copy link
Contributor Author

There are a lot of culture dependent tests that fail on my french machine . Fixed it.
There was also a lot of over specific autogenerated tests for compare. They were checking for the exact response of compare... The check must be done on the sign, not the exact value. Fixed it.

@thinkbeforecoding
Copy link
Contributor Author

Some of the builds fail but it doesn't seem related to the change.. Any idea?

@vzarytovskii
Copy link
Member

Some of the builds fail but it doesn't seem related to the change.. Any idea?

The latest run failed because of the tests:
https://dev.azure.com/dnceng/public/_build/results?buildId=1791081&view=ms.vss-test-web.build-test-results-tab&runId=47885210&resultId=100293&paneView=debug

@thinkbeforecoding
Copy link
Contributor Author

These are baseline tests, but they don't seem to be updated with the instructions above...

@vzarytovskii
Copy link
Member

I've updated them + updated the guide, If I may ask, which shell were you using?

@vzarytovskii
Copy link
Member

Great, now it's even more failures...I will take a look.

@thinkbeforecoding
Copy link
Contributor Author

I was using powershell so I used
$env:TEST_UPDATE_BSL=1

@vzarytovskii
Copy link
Member

I was using powershell so I used
$env:TEST_UPDATE_BSL=1

Hm, interesting. I will probably add a separate switch to the build scripts which runs tests and updates baselines.

@vzarytovskii
Copy link
Member

@thinkbeforecoding I have updated 2 remaining baselines for net472 framework + updated devguide describing how it works.
To update them successfully, It requires some knowledge of how IL baselines work. I will add a separate command to build scripts for updating those (#13204)

The rest of the tests which are failing are checking the compare result.

@vzarytovskii vzarytovskii reopened this Jun 7, 2022
@thinkbeforecoding
Copy link
Contributor Author

When I used the command it said:
Commenter does not have sufficient privileges for PR 13187 in repo dotnet/fsharp

@thinkbeforecoding
Copy link
Contributor Author

There are still some cancelled builds... strange

@thinkbeforecoding
Copy link
Contributor Author

These canceled builds again.

Copy link
Member

@KevinRansom KevinRansom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good. Thanks

@vzarytovskii
Copy link
Member

These canceled builds again.

Yeah, some infra hiccups, I suppose.

@thinkbeforecoding
Copy link
Contributor Author

@dsyme , I think this just needs your review, but should be good.

@KevinRansom
Copy link
Member

@dsyme, this is good to go, do you want to review your change request.

@dsyme dsyme merged commit a65ace7 into dotnet:main Jun 14, 2022
@dsyme
Copy link
Contributor

dsyme commented Jun 14, 2022

@thinkbeforecoding Thank you so much for this improvement!

@forki
Copy link
Contributor

forki commented Jun 14, 2022

Wohooo! It landed. Awesome work!

@EgorBo
Copy link
Member

EgorBo commented Feb 18, 2024

NOTE: as of .NET 7.0 (or was it 8.0), JIT is expected to use branchless instructions even when F#/C# emits branches (recognizes cmov like idioms)

@thinkbeforecoding
Copy link
Contributor Author

Yes I've noted this in some tests in fasmi. I think it's in dotnet 8.0

@EgorBo
Copy link
Member

EgorBo commented Feb 18, 2024

Sure, I just wanted to note that it's better when some sub-optimal codegen is filed against JIT rather than silently (for JIT team) fixed 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perf improvement for primitive comparison
6 participants