Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This ports DoubleToNum and supporting code to be a managed implementation. #19999

Merged
merged 11 commits into from Sep 20, 2018

Conversation

@tannergooding
Copy link
Member

commented Sep 17, 2018

This ports the rest of the formatting code to be written in managed code. It is, currently, mostly a naive port of the native code and only has a few minor fix ups to the code.

I locally ran the Roslyn RealParser suite, as well as did a basic benchmark on both float and double, covering 267,386,880 values in the input range (including both denormal and normal inputs).

Benchmarking was done with Tiered Jitting disabled.

For the Roslyn Real Parser Tests

This was mostly to validate that I didn't mess up any code during the port. Their own parsing code takes up the majority in each, but overall has a 1.52% regression in elapsed time.

Native
image

Managed
image

For float.ToString() on 267,386,880 values between float.MinValue and float.MaxValue

Grisu3.DigitGen is the major player in both here. This showed a 9.89% regression in elapsed time. float.ToString() will default to 7 digits.

Native
image

Managed
image

For double.ToString() on 267,386,880 values between double.MinValue and double.MaxValue

Grisu3.DigitGen is again the major time taker here. This showed a 16.43% regression in elapsed time. double.ToString() will default to 15 digits.

Native
image

Managed
image

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

Looking at the perf gaps, some of it is just things that the C++ Compiler gives you for free, that we don't have enabled quite yet (like combined DivMod) or where the native code was using a more optimized algorithm (like memcpy and memset), or where they have intrinsics which we don't have yet (like BitScanReverse).

These should, however, be easy enough to fix up.

For example:

Native
image

Managed
image

_length = (upper == 0) ? 1 : 2;
}

public static uint BitScanReverse(uint mask)

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

Native code just calls the intrinsic instead.

private int _length;
private BlocksBuffer _blocks;

public BigInteger(uint value)

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

Native code can configure the default constructor to efficiently initialize this, rather than requiring us to specialize case value == 0

_length += (int)(blocksToShift);

// Zero the remaining low blocks
Buffer.ZeroMemory((byte*)(_blocks.GetPointer()), (blocksToShift * sizeof(uint)));

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

This was not being done properly in native code (size was blocksToShift, rather than blocksToShift * sizeof(uint)).

Without Grisu3 enabled as the first code-path, the Roslyn RealParser tests exposed this issue.


public void Multiply(ref DiyFp rhs)
{
ulong lf = _f;

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

This is just a 64x64=128 multiply, so using an intrinsic is possible.

decimalExponent = CachedPowerDecimalExponents[index];
}

private static bool DigitGen(ref DiyFp mp, int precision, char* digits, out int length, out int k)

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

This is the function that is causing the most perf diff between the managed and native implementations.

We could likely close the gap by caching some field/properties we lookup multiple times (into locals), and by using Math.DivRem (or better, a proper intrinsic), rather than independent divide and remainder operations.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

bcltype/bignum.cpp was ported as part of this, which will make it easier to test porting the roslyn RealParser.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

After a small cleanup to DigitGen, the perf regression (for doubles) is down to 11.18% (from 16.43%)
image

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Sep 17, 2018

Do you propose we commit this as is or work to close the gap further? I suggest the latter?

}

[StructLayout(LayoutKind.Sequential, Size = (MaxBlockCount * sizeof(uint)))]
private struct BlocksBuffer

This comment has been minimized.

Copy link
@jkotas

jkotas Sep 17, 2018

Member

fixed uint _buffer[MaxBlockCount] ?

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

Won't that cause a perf hit due to UnsafeValueTypeAttribute?

This comment has been minimized.

Copy link
@jkotas

jkotas Sep 17, 2018

Member

You can measure it.

You are using unsafe stackallocated buffers, so you do want to pay for the GS cookie checks. I assume that the C/C++ code paid for them too.

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

We should move to a C# 7.3 compiler version (we are still on 2.8.0-beta4 right now) so that we can index a fixed-buffer without pinning: https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-7-3#indexing-fixed-fields-does-not-require-pinning

This comment has been minimized.

Copy link
@tannergooding

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 17, 2018

Author Member

This is blocked more generally on: #19878 (comment).

@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 17, 2018

Do you propose we commit this as is or work to close the gap further

The regression is relatively small (~10%, depends on how you measure it). I think it is fine to take this into master since this moves us in the right direction, and unblocks other work. (And it is paying back the shortcut that was made during implementation of the faster number float formatting algorithms earlier.)

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

Fixing up Buffer::ZeroMemory (which is currently a naive while loop) brings it down to 64.579s (or a 5.84% regression).
-- This can be a separate PR.

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Sep 17, 2018

I think it is fine to take this into master since this moves us in the right direction

sounds good to me

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Sep 17, 2018

BitScanReverse and BitScanReverse64 are now dead in pal.h

@tannergooding tannergooding force-pushed the tannergooding:doubletonum-managed branch from 77ac424 to 23027f3 Sep 17, 2018

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

BitScanReverse and BitScanReverse64 are now dead in pal.h

Removed.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

The current numbers, with everything currently up in this PR (5.68% regression):
image

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

Everything except for moving BigInteger to use fixed-sized buffers has been addressed. Moving to fixed-sized buffers is just pending a BuildTools update to move us to C# 7.3 (so we don't have to scatter fixed statements around the code, or add "hacks" to work around it).

@danmosemsft

This comment has been minimized.

Copy link
Member

commented Sep 17, 2018

@tannergooding what do you plan to do about tests? You found at least one bug we weren't catching. Does it make sense to port the Roslyn ones into CoreFX?

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2018

what do you plan to do about tests? You found at least one bug we weren't catching. Does it make sense to port the Roslyn ones into CoreFX?

Until parsing is also fixed, we can't port the Roslyn test-bed directly. I'm working on pulling out some of the tests, however, so that we do have more coverage on important areas.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 19, 2018

@jkotas, are you fine with this being merged and with me logging a bug/following up with a second PR to move to fixed-sized buffers after the build-tools change (bringing in C# 7.3 support) goes in, or would you rather this just wait and be done all at once?

@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 19, 2018

I would add a unused fixed field to struct BlocksBuffer so that it gets marked properly as structure containing unchecked buffers.

Otherwise, I am fine with this being merged and logging bug/following up with second PR later.

@tannergooding tannergooding force-pushed the tannergooding:doubletonum-managed branch from b52696e to 9f259bc Sep 19, 2018

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 19, 2018

I would add a unused fixed field to struct BlocksBuffer so that it gets marked properly as structure containing unchecked buffers.

Fixed.


private static long ExtractFractionAndBiasedExponent(double value, out int exponent)
{
long fraction = GetMantissa(value);

This comment has been minimized.

Copy link
@jkotas

jkotas Sep 19, 2018

Member

You are likely losing some cycles by converting double to long twice in a row.

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 19, 2018

Author Member

Fixed.
-- I wish we had the ability to specify a function was const so that the JIT could just do the right/expected thing here.

@tannergooding tannergooding force-pushed the tannergooding:doubletonum-managed branch from 9f259bc to b5e3986 Sep 19, 2018

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 20, 2018

@jkotas, any other feedback you would like me to address? I think I've gotten it all at this point.

_e = e;
}

public ulong f

This comment has been minimized.

Copy link
@jkotas

jkotas Sep 20, 2018

Member

Nit: There is no value in these being properties. Just makes the JIT to work harder.

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 20, 2018

Author Member

I'll track this is part of the fixed buffer cleanup after the build-tools update.

This comment has been minimized.

Copy link
@tannergooding

tannergooding Sep 20, 2018

Author Member

Logged #20077

@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 20, 2018

Do the perf numbers still look good with all the feedback incorporated?

@jkotas

jkotas approved these changes Sep 20, 2018

Copy link
Member

left a comment

LGTM

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 20, 2018

Do the perf numbers still look good with all the feedback incorporated?

Yes. The current run shows the best numbers so far (only a 3.93% regression from the original 61.02s):

  • Noting that there is some noise here. I have also seen it take up to 65.83s, which is a 7.89% regression
  • I plan on looking at benchview after this pumps back to CoreFX as well, for another comparison point

image

@tannergooding tannergooding merged commit 97e8b44 into dotnet:master Sep 20, 2018

26 of 28 checks passed

Ubuntu arm Cross Checked Innerloop Build and Test Started.
Details
Ubuntu arm Cross Checked no_tiered_compilation_innerloop Build and Test Started.
Details
CROSS Check Build finished.
Details
CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Debug Innerloop Build Build finished.
Details
Linux-musl x64 Debug Build Build finished.
Details
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Details
Tizen armel Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Checked CoreFX Tests Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
WIP ready for review
Details
Windows_NT arm Cross Debug Innerloop Build Build finished.
Details
Windows_NT arm64 Cross Debug Innerloop Build Build finished.
Details
Windows_NT x64 Checked CoreFX Tests Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 Release CoreFX Tests Build finished.
Details
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x86 Release Innerloop Build and Test Build finished.
Details
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
license/cla All CLA requirements met.
Details
@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 20, 2018

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 20, 2018

CoreCLR/CoreFX convention is to merge using "Squash and Merge". https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/contributing.md#merging-pull-requests-for-contributors-with-write-access (next time...)

Can we set that to be the default then? We have multiple repos across dotnet and many of them do not agree on this behavior. As such, I normally just use whatever the default merge option is configured for (which is currently Rebase and Merge)

@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 20, 2018

Where do you set it as default?

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 20, 2018

It should be under Settings, it looks like they've changed it around since the last time I looked:
I believe https://help.github.com/articles/configuring-commit-squashing-for-pull-requests/ has more details

@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 20, 2018

Right, that let's you choose what is allowed, not what should be the default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.