Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace (val / 2) with (val * 0.5) in Jit #24584

Open
wants to merge 31 commits into
base: master
from

Conversation

@EgorBo
Copy link
Contributor

commented May 15, 2019

vmulss/vmulsd has better both latency and throughput than vdivss/vdivsd at least for the hardware I have.
So if a divisor is a constant power of two we can optimize it, e.g.:

a = b / 2;
becomes:
a = b * 0.5;

See https://godbolt.org/z/rz9h4E (clang, gcc, msvc, x86, AMD64, AArch64 - everywhere this optimization is applied. Btw, LLVM also helps Mono to optimize this case for C#)

I wrote a small benchmark:

[Benchmark]
[Arguments(MathF.PI)]
public float Div_old(float value)
{
    for (int i = 0; i < 10; i++)
        value = value / 2f;
    return value;
}

[Benchmark(Baseline = true)]
[Arguments(MathF.PI)]
public float Div_new(float value)
{
    for (int i = 0; i < 10; i++)
        value = value * 0.5f;
    return value;
}

and the results are (Haswell):

BenchmarkDotNet=v0.11.5, OS=macOS Mojave 10.14 (18A391) [Darwin 18.0.0]
Intel Core i7-4980HQ CPU 2.80GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview5-011568
  [Host]   : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
  ShortRun : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

|  Method |     value |      Mean |     Error |    StdDev | Ratio | RatioSD |
|-------- |---------- |----------:|----------:|----------:|------:|--------:|
| Div_old | 3.1415927 | 16.848 ns | 1.6733 ns | 0.0917 ns |  3.14 |    0.02 |
| Div_new | 3.1415927 |  5.359 ns | 0.5572 ns | 0.0305 ns |  1.00 |    0.00 |

UPD the difference is only 20-30% on Coffee Lake (i7 8700K). Didn't check ARM for it.

PS: Also I noticed that x / 1 is not optimized to just x see sharplab.io
(I found this in morph.cpp but it seems it's not used for floats)

/cc: @tannergooding

EgorBo added some commits May 15, 2019

src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved

EgorBo added some commits May 15, 2019

src/jit/utils.h Outdated Show resolved Hide resolved
src/jit/utils.h Outdated Show resolved Hide resolved
src/jit/utils.h Outdated Show resolved Hide resolved
@mikedn

This comment has been minimized.

Copy link
Contributor

commented May 19, 2019

FWIW I tried to do this or a similar FP optimization in the past but didn't bother with it because it wasn't very clear how useful it is, the framework doesn't have a lot of FP code. But now we have WPF in .NET Core - its PresentationCore.dll and PresentationFramework.dll contain a bit more than 100 hits. So it's quite useful, while many developers do this optimization manually there are still enough opportunities.

I haven't noticed any regressions. In theory this can block CSE because VN doesn't understand that x * 0.5 == x / 2.0. That can be fixed later if necessary, VN has other similar issues anyway.

EgorBo added some commits May 22, 2019

src/jit/utils.cpp Outdated Show resolved Hide resolved
@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

@mikedn @tannergooding sorry for the delayed response. I've refactored it to frexp (I need to double check if it's 100% portable) based on your HasInverse
the float / const pattern can be found in game engines too, already found several places.

@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

Also, I did some testing locally, e.g. https://gist.github.com/EgorBo/866a49334291c1ac3b108eb9341681ae (similar for double and for non-power-of-two constants)

src/jit/morph.cpp Outdated Show resolved Hide resolved
src/pal/src/cruntime/math.cpp Outdated Show resolved Hide resolved

EgorBo added some commits May 23, 2019

@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

@mikedn @tannergooding could you please take a look one more time? I think I handled all the cases. Also added a test.
isNormal implementation was copied from the managed double.IsNormal() impl.

Also: should I take care about Big Endian?

@jkotas jkotas added the area-CodeGen label May 23, 2019

@EgorBo EgorBo force-pushed the EgorBo:opt-div-pow2 branch from bbf7ee6 to 877c97a Jun 18, 2019

EgorBo added some commits Jun 19, 2019

@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented Jun 19, 2019

@tannergooding @mikedn @danmosemsft so does it look good? the CI failures seem unrelated

src/jit/utils.cpp Outdated Show resolved Hide resolved
src/jit/utils.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/utils.cpp Show resolved Hide resolved
tests/src/JIT/opt/InstructionCombining/DivToMul.cs Outdated Show resolved Hide resolved
@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented Jun 19, 2019

A small Roslyn-based script to find places where this optimization can be applied (found some in various math/graphics related C# repositories): https://gist.github.com/EgorBo/74b034fe1936c43fcd0b42934322557c
Also, there are places in CoreFX where * 0.5 is used (I guess authors knew it's better than / 2.0 😄) e.g.: https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Numerics/src/System/Numerics/Complex.cs#L418

@mikedn

This comment has been minimized.

Copy link
Contributor

commented Jun 19, 2019

A small Roslyn-based script to find places where this optimization can be applied (found some in various math/graphics related C# repositories):

Erm, why do you need such a contraption? Perhaps you're not aware how run JIT diffs?

@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented Jun 19, 2019

@mikedn I wanted to quickly find such places without even downloading repositories (via HttpClient) 🙂. Also I made a list of patterns that LLVM is able to optimize (InstCombine transforms) and looking for them in those repos.

@mikedn

This comment has been minimized.

Copy link
Contributor

commented Jun 19, 2019

I wanted to quickly find such places without even downloading repositories

That's not going to find the interesting case, where expressions like x / 2.0 appear as a result of other JIT optimizations (inlining, constant folding & propagation etc.). Cases where x / 2.0 literally appears in the code aren't very interesting, developers can easily change that to x * 0.5 if they wish so. Adding a zillion of such expression transformations to the JIT might not be the best thing to do, especially in the current JIT state (morph is really a nightmare).

Anyway, here's a x64 FX diff:

Summary:
(Lower is better)
Total bytes of diff: 0 (0.00% of base)
0 total files with size differences (0 improved, 0 regressed), 129 unchanged.
0 total methods with size differences (0 improved, 0 regressed), 146772 unchanged.
3 files had text diffs but not size diffs.
Microsoft.Diagnostics.Tracing.TraceEvent.dasm had 12 diffs
System.Private.CoreLib.dasm had 8 diffs
Microsoft.DotNet.Cli.Utils.dasm had 4 diffs
Completed analysis in 28.14s

As I mentioned in a previous post, it's a bit of a strange case because divss and mulss instructions have the same size so nothing shows up in terms of size differences. Text diff shows that there are only 6 diffs in the entire framework. Not surprising as there's not a lot of floating point code in corelib and corefx.

@EgorBo

This comment has been minimized.

Copy link
Contributor Author

commented Jun 19, 2019

@mikedn yeah, but as you mentioned earlier there are cases in dotnet/wpf repo:

ReachFramework\AlphaFlattener\BrushProxy.cs:
        (-B + root) / A / 2

ReachFramework\AlphaFlattener\BrushProxy.cs:
        (-B - root) / A / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (rRadSquared + rDot) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (xEnd - xStart) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (yEnd - yStart) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (xEnd + xStart) / 2

ReachFramework\Serialization\DrawingContextFlattener.cs:
        (yEnd + yStart) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        - (dstwidth - width) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        (dstwidth - width) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        - (dstheight - height) / 2

ReachFramework\Serialization\VisualSerializer.cs:
        (dstheight - height) / 2

PresentationCore\MS\internal\Ink\StrokeRenderer.cs:
        Math.PI * 3.0 / 2.0

PresentationCore\MS\internal\Shaping\CompositeTypefaceMetrics.cs:
        (-UnderlineOffsetDefaultInEm) / 2

PresentationCore\System\Windows\Media\CompositionTarget.cs:
        Double.MinValue / 2.0

PresentationCore\System\Windows\Media\CompositionTarget.cs:
        Double.MinValue / 2.0

PresentationCore\System\Windows\Media\GlyphRun.cs:
        (firstStopAdvance + currentAdvance) / 2.0

PresentationFramework\MS\Internal\documents\DocumentGrid.cs:
        (centerWidth - ExtentWidth) / 2.0

PresentationFramework\MS\Internal\documents\DocumentGrid.cs:
        (ViewportHeight - ExtentHeight) / 2.0

PresentationFramework\MS\Internal\documents\DocumentGrid.cs:
        (ExtentWidth - _lastRowChangeExtentWidth) / 2.0

PresentationFramework\MS\Internal\documents\TextBoxView.cs:
        (width - lineWidth) / 2

PresentationFramework\MS\Internal\Ink\PenCursorManager.cs:
        (width - originalWidth) / 2

PresentationFramework\MS\Internal\Ink\PenCursorManager.cs:
        (height - originalHeight) / 2

PresentationFramework\System\Windows\Controls\Grid.cs:
        (minPower + maxPower) / 2.0

PresentationFramework\System\Windows\Documents\CaretElement.cs:
        double.MaxValue/2

PresentationFramework\System\Windows\Documents\CaretElement.cs:
        double.MaxValue/2

PresentationCore\System\Windows\Media\Animation\KeySpline.cs:
        (_parameter + top) / 2

PresentationCore\System\Windows\Media\Animation\KeySpline.cs:
        (_parameter + bottom) / 2

PresentationCore\System\Windows\Media\Animation\KeySpline.cs:
        (bottom + top) / 2
@mikedn

This comment has been minimized.

Copy link
Contributor

commented Jun 19, 2019

Unfortunately running diffs on wpf assemblies is a bit more tricky at the moment so I haven't done it again. The ~100 hits estimation from my previous post on the matter likely still stands.

EgorBo added some commits Jun 20, 2019

@EgorBo EgorBo closed this Jun 28, 2019

@EgorBo EgorBo reopened this Jun 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.