Some cleanup of the System.Number class #20619

Merged
merged 10 commits into from Oct 29, 2018

Conversation

@tannergooding
Member

commented Oct 25, 2018

This is the first part of dotnet/corefx#33053, and does some initial cleanup of the System.Number class:

  • Resolving the general formatting errors that were showing up in Number.Formatting and Number.Parsing
  • Removing duplicated code from the Parse/TryParse calls (doing the same for Format/TryFormat is not as easy)
  • Some minor renames/relocations
  • Updating NumberBuffer to take custom-sized digit buffers
}

- private static void FormatCurrency(ref ValueStringBuilder sb, ref NumberBuffer number, int nMaxDigits, NumberFormatInfo info)
+ private static unsafe void FormatCurrency(ref ValueStringBuilder sb, bool sign, int scale, char* dig, int nMaxDigits, NumberFormatInfo info)

@tannergooding

tannergooding Oct 25, 2018

Author Member

The Format* functions previously took a ref NumberBuffer; they were changed to take the sign, scale, and digit pointer individually so they could be shared between the IntegerBuffer and NumberBuffer implementations.

}
}

internal static unsafe void IntegerBufferToString(ref ValueStringBuilder sb, ref IntegerBuffer integer, char format, int nMaxDigits, NumberFormatInfo info)

@tannergooding

tannergooding Oct 25, 2018

Author Member

The ref NumberBuffer number parameters for the integer case were changed to ref IntegerBuffer integer.

@@ -12,6 +12,11 @@ internal static partial class Number
{
private static unsafe void Dragon4(double value, int precision, ref NumberBuffer number)
{
const double Log10V2 = 0.30102999566398119521373889472449;

@tannergooding

tannergooding Oct 25, 2018

Author Member

These were moved out of the NumberBuffer.cs file because they are only used by Dragon4

@tannergooding

Member Author

commented Oct 25, 2018

CC. @danmosemsft, @eerhardt, @jkotas.

I explicitly called out the few cases where it wasn't just copy/paste. I made this PR separate since it is just the initial refactoring and will make the actual implementation PR easier to review.

// * 19 for int64
// * 20 for uint64
// * 39 for int128/uint128
internal const int MaxDigits = 50;

@stephentoub

stephentoub Oct 25, 2018

Member

Based on the comment, will a subsequent PR lower the value of MaxDigits to 39?

@tannergooding

tannergooding Oct 25, 2018

Author Member

This one can be fixed in a follow-up PR, but is not required for correctness.
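
The digit counts listed in the code comment can be checked directly. The following is only a minimal sketch, assuming System.Numerics.BigInteger is available; MaxDigitsSketch and its members are hypothetical names used for illustration and are not part of System.Number:

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class MaxDigitsSketch
{
    // Number of decimal digits needed to print the magnitude of a value.
    public static int DigitCount(BigInteger value)
        => BigInteger.Abs(value).ToString().Length;

    // Largest magnitudes of the types named in the comment.
    public static int Int64Digits => DigitCount(long.MaxValue);
    public static int UInt64Digits => DigitCount(ulong.MaxValue);
    public static int Int128Digits => DigitCount(BigInteger.Pow(2, 127) - 1);
    public static int UInt128Digits => DigitCount(BigInteger.Pow(2, 128) - 1);
}
```

Under these assumptions the counts come out as 19, 20, and 39, which is why a 50-digit buffer is larger than the integer paths strictly need.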

// * 113 for Single
// * 768 for Double
// * 11563 for Quad
internal const int MaxDigits = 50;

@stephentoub

stephentoub Oct 25, 2018

Member

Similarly, the comment makes this sound wrong. I assume it'll be fixed subsequently as part of all of this work?

@tannergooding

tannergooding Oct 25, 2018

Author Member

Yes, this will be fixed in a follow-up PR.

@jkotas

Member

commented Oct 25, 2018

+4,879 −3,775

The PR is adding a net 1000+ new lines. Where are they coming from? Are they 1000 duplicated lines? I was expecting much less duplication from this.

@tannergooding tannergooding force-pushed the tannergooding:parse-format branch from 76579d8 to 6a52e21 Oct 26, 2018

@tannergooding

Member Author

commented Oct 26, 2018

Split into 3 commits to make this easier to review:

  • The first just formats the document (to get rid of the noise from copying/pasting the code between files)
  • The second contains the refactoring (splitting NumberBuffer into IntegerBuffer and FloatingPointBuffer)
  • The third splits the code into the respective files
@@ -1460,9 +1457,471 @@ internal static unsafe char ParseFormatSpecifier(ReadOnlySpan<char> format, out
'\0';
}

- internal static unsafe void NumberToString(ref ValueStringBuilder sb, ref NumberBuffer number, char format, int nMaxDigits, NumberFormatInfo info)
+ internal static unsafe void IntegerBufferToString(ref ValueStringBuilder sb, ref IntegerBuffer integer, char format, int nMaxDigits, NumberFormatInfo info)

@tannergooding

tannergooding Oct 26, 2018

Author Member

@jkotas, the bulk of the "new lines" comes from duplicating BufferToString and BufferToStringFormat.

I'll think about this some more and see if we can reduce this duplication.

return false;
}

private static unsafe bool ParseNumber(ref char* str, char* strEnd, NumberStyles styles, ref FloatingPointBuffer floatingPoint, NumberFormatInfo info, bool parseDecimal)

@tannergooding

tannergooding Oct 26, 2018

Author Member

This is the other big "duplicated" code, but it will likely need to change a bit between the two implementations. I'll also give it some more thought, to see if we can share some of the code here.

@jkotas

Member

commented Oct 26, 2018

It may be interesting to look at storing pointer to the buffer in the structure - it is an extra indirection in a few places, but it should not really show up on the radar. Or just use the big buffer everywhere.

@tannergooding

Member Author

commented Oct 26, 2018

> It may be interesting to look at storing pointer to the buffer in the structure - it is an extra indirection in a few places, but it should not really show up on the radar. Or just use the big buffer everywhere.

Right. I was considering just using the big buffer everywhere for float/double. It shouldn't be expensive with the .locals init stripping we do and is what the MSVCRT does already.

}
}

internal static unsafe void IntegerBufferToString(ref ValueStringBuilder sb, ref IntegerBuffer integer, char format, int nMaxDigits, NumberFormatInfo info)

@tannergooding

tannergooding Oct 26, 2018

Author Member

As commented here, the majority of the code duplication comes from NumberBufferToString, NumberBufferToStringFormat, and ParseNumber.

After looking more closely at the methods, I don't believe there is much that can be done with code sharing. Each code path in these methods depends on either RoundIntegerBuffer or RoundFloatingPointBuffer, which end up dealing with the buffers in very different ways. The former thinks in terms of value and scale and uses "normal" rounding logic, while the latter needs to think in terms of mantissa and exponent and uses IEEE-compliant rounding logic (which will ultimately make the end formatting/parsing logic differ between these as well).

However, I think there is a bit of duplication we can remove in other parts of these code files. For example, we have quite a bit of duplication between several of the X and TryX code paths, namely due to the X code path needing to decide if it wants to throw an OverflowException or FormatException (which could be handled by returning an enum or a secondary bool).
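
The idea of sharing the X and TryX code paths through a status value can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in this PR: ParsingStatus, ParseSketch, and TryParseInt32 are hypothetical names, and the worker handles only ASCII digits with an optional leading '-' (no NumberStyles or culture handling).

```csharp
using System;

// Hypothetical status type: lets the throwing path pick the right exception.
public enum ParsingStatus { OK, Failed, Overflow }

public static class ParseSketch
{
    // Single shared worker used by both Parse and TryParse.
    public static ParsingStatus TryParseInt32(ReadOnlySpan<char> s, out int result)
    {
        result = 0;
        if (s.IsEmpty) return ParsingStatus.Failed;

        bool negative = s[0] == '-';
        int i = negative ? 1 : 0;
        if (i == s.Length) return ParsingStatus.Failed;

        long acc = 0;
        bool overflow = false;
        for (; i < s.Length; i++)
        {
            int d = s[i] - '0';
            if (d < 0 || d > 9) return ParsingStatus.Failed; // format errors win over overflow
            if (!overflow)
            {
                acc = acc * 10 + d;
                if (acc > (long)int.MaxValue + 1) overflow = true;
            }
        }

        if (overflow) return ParsingStatus.Overflow;
        if (negative) acc = -acc;
        if (acc > int.MaxValue) return ParsingStatus.Overflow;

        result = (int)acc;
        return ParsingStatus.OK;
    }

    // TryX: collapses the status to a bool.
    public static bool TryParse(ReadOnlySpan<char> s, out int result)
        => TryParseInt32(s, out result) == ParsingStatus.OK;

    // X: same worker, but maps the status to the exception type.
    public static int Parse(ReadOnlySpan<char> s)
    {
        ParsingStatus status = TryParseInt32(s, out int result);
        if (status == ParsingStatus.Overflow) throw new OverflowException();
        if (status != ParsingStatus.OK) throw new FormatException();
        return result;
    }
}
```

With this shape the digit loop exists once, and the only difference between the two entry points is how the status is reported.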

@tannergooding

tannergooding Oct 26, 2018

Author Member

Hmmm, actually, we might be able to do it by keeping a single buffer:

ref struct NumberBuffer
{
    public int precision;
    public int scale;
    public int exponent; // New field, only used by floating-point buffers
    public bool sign;
    public NumberBufferKind kind;
    public Span<char> digits; // Allocated at creation so it is sized as needed
}

The rounding only happens in a few places, at the end of parsing, and isn't in the hot path, so an additional check on kind shouldn't hurt. We only need scale for integer types, but we can make use of both scale and exponent for floating-point (the former can still be used to help with the fast-path check).
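
A rough sketch of how a caller might hand such a buffer its own appropriately sized digit storage is shown below. It assumes a trimmed-down version of the struct above; NumberBufferSketch, NumberBufferDemo, Int32BufferLength, and the field names are hypothetical and chosen for illustration only.

```csharp
using System;

// Hypothetical, trimmed-down buffer: the digit storage is supplied by the caller.
public ref struct NumberBufferSketch
{
    public int Scale;          // position of the decimal point relative to Digits
    public bool IsNegative;
    public int DigitsCount;
    public Span<char> Digits;  // sized by the caller for the type being handled

    public NumberBufferSketch(Span<char> digits)
    {
        Scale = 0;
        IsNegative = false;
        DigitsCount = 0;
        Digits = digits;
    }
}

public static class NumberBufferDemo
{
    // Assumed sizing: 10 digits for Int32 plus 1 for the terminating null.
    public const int Int32BufferLength = 10 + 1;

    // Writes the decimal digits of value into the caller-provided buffer.
    public static void Fill(ref NumberBufferSketch number, int value)
    {
        number.IsNegative = value < 0;
        uint magnitude = value < 0 ? (uint)(-(long)value) : (uint)value;

        int count = 1;
        for (uint t = magnitude; t >= 10; t /= 10) count++;

        for (int i = count - 1; i >= 0; i--)
        {
            number.Digits[i] = (char)('0' + magnitude % 10);
            magnitude /= 10;
        }
        number.Digits[count] = '\0';
        number.DigitsCount = count;
        number.Scale = count;
    }

    public static string Describe(int value)
    {
        // Stack storage sized for this type only; a floating-point caller
        // would allocate a larger span and use the same struct.
        Span<char> storage = stackalloc char[Int32BufferLength];
        var number = new NumberBufferSketch(storage);
        Fill(ref number, value);
        return (number.IsNegative ? "-" : "")
            + number.Digits.Slice(0, number.DigitsCount).ToString()
            + " scale=" + number.Scale;
    }
}
```

The design point is that the struct itself stays the same for every numeric type while the stack cost is decided at each call site.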

@tannergooding tannergooding force-pushed the tannergooding:parse-format branch from 6a52e21 to f84081a Oct 26, 2018

@tannergooding tannergooding changed the title from 'Splitting the Number.Parsing and Number.Formatting code into "Number" and "Integer"' to 'Some cleanup of the System.Number class' Oct 26, 2018

@tannergooding tannergooding force-pushed the tannergooding:parse-format branch from f84081a to 86f9ff3 Oct 26, 2018

@tannergooding

Member Author

commented Oct 26, 2018

@jkotas, @stephentoub. This has changed a bit.

We are back to having a single NumberBuffer type, but it now takes a custom-sized digit buffer to handle the differences between integer and floating-point values.

The NumberBuffer type will need to carry an additional exponent field that will be unused for Integer parsing/formatting, but I don't believe that will be a problem.

@stephentoub

Member

commented Oct 26, 2018

Could you do some perf testing to ensure that some of these refactorings don't impact things like int.TryParse?

// DriftFactor = 1 - Log10V2 - epsilon (a small number to account for drift of floating point multiplication)
private const double DriftFactor = 0.69;
// We need 1 additional byte, per length, for the terminating null
private const int DecimalNumberBufferLength = 50 + 1;

@tannergooding

tannergooding Oct 27, 2018

Author Member

I kept DecimalNumberBufferLength at the current value (50 + 1). However, I believe we can make it 30 (29 digits + 1 for rounding).

@pentp

pentp Oct 29, 2018

Collaborator

If the rounding digit is 5, then up to 20 additional digits are checked in NumberBufferToDecimal (so 50 in total). The number 20 looks arbitrary; theoretically, it would be correct to check the entire string...

@tannergooding

tannergooding Oct 29, 2018

Author Member

@pentp, maybe I am misremembering, but I thought decimal only supported 29 digits max. That is, it has 96-bits for the mantissa and it uses 8-bits for the exponent, which is strictly restricted to a value between 0 and 28, inclusive.

This should mean that you need to consider 29 digits, plus 1 for rounding. Anything more than that shouldn't be impactful.

@pentp

pentp Oct 29, 2018

Collaborator

It supports only 29 digits, but if the rounding digit is 5 and the number is even, then it checks the following digits to decide if it should round up or down.

@tannergooding

tannergooding Oct 30, 2018

Author Member

@pentp, so you still only need 30 digits, but you also need a bool that indicates whether or not there were any trailing non-zero digits in the rest of the input string (same as double and float), correct?

@pentp

pentp Oct 30, 2018

Collaborator

Yes, that's correct.
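
The approach agreed on in this thread (a fixed number of kept digits, one rounding digit, and a flag for any non-zero digits beyond that) can be sketched as below. This is an illustration assuming round-half-to-even and digit-only input; RoundingSketch and its methods are hypothetical names and this is not the NumberBufferToDecimal implementation.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class RoundingSketch
{
    // Copies at most buffer.Length digits and reports whether any digit
    // that did not fit was non-zero.
    public static bool Collect(ReadOnlySpan<char> input, Span<char> buffer, out int count)
    {
        count = 0;
        bool hasNonZeroTail = false;
        foreach (char c in input)
        {
            if (count < buffer.Length) buffer[count++] = c;
            else if (c != '0') hasNonZeroTail = true;
        }
        return hasNonZeroTail;
    }

    // digits holds up to keep + 1 digits, most significant first.
    public static bool ShouldRoundUp(ReadOnlySpan<char> digits, int keep, bool hasNonZeroTail)
    {
        if (digits.Length <= keep) return false;      // nothing to round away
        char roundingDigit = digits[keep];
        if (roundingDigit != '5') return roundingDigit > '5';
        if (hasNonZeroTail) return true;              // strictly above the midpoint
        // Exact tie: round so that the last kept digit ends up even.
        return keep > 0 && ((digits[keep - 1] - '0') & 1) != 0;
    }

    // Convenience wrapper: the buffer never grows past keep + 1 digits,
    // no matter how long the input is.
    public static bool RoundsUp(string input, int keep)
    {
        Span<char> buffer = stackalloc char[keep + 1];
        bool tail = Collect(input, buffer, out int count);
        return ShouldRoundUp(buffer.Slice(0, count), keep, tail);
    }
}
```

For decimal, keep would be 29, giving the 30-digit buffer discussed above; the flag replaces the need to store or re-scan the remaining digits.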

@tannergooding

Member Author

commented Oct 28, 2018

Here are the perf results from my workbox:

Numbers for the Integer tests generally look good: some are marginally faster or slower, but all looked to be within the tolerance ranges (running a few times and looking at benchview, some of the tests aren't very stable). The floating-point tests are similar, but most tend to look slightly slower for parsing (which all went through TryParse); the formatting all tended to be faster.

@stephentoub

Member

commented Oct 29, 2018

Thank you for checking.

@tannergooding

Member Author

commented Oct 29, 2018

@jkotas, @stephentoub. Any other feedback here?

@jkotas

jkotas approved these changes Oct 29, 2018

Member

left a comment

Thanks

@tannergooding tannergooding merged commit aef0bc2 into dotnet:master Oct 29, 2018

31 checks passed

CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
CentOS7.1 x64 Debug Innerloop Build Build finished.
Linux-musl x64 Debug Build Build finished.
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Tizen armel Cross Checked Innerloop Build and Test Build finished.
Ubuntu arm Cross Checked Innerloop Build and Test Build finished.
Ubuntu arm Cross Checked crossgen_comparison Build and Test Build finished.
Ubuntu arm Cross Checked no_tiered_compilation_innerloop Build and Test Build finished.
Ubuntu arm Cross Release crossgen_comparison Build and Test Build finished.
Ubuntu x64 Checked CoreFX Tests Build finished.
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Ubuntu x64 Formatting Build finished.
Ubuntu16.04 arm64 Cross Checked Innerloop Build and Test Build finished.
Ubuntu16.04 arm64 Cross Checked no_tiered_compilation_innerloop Build and Test Build finished.
WIP Ready for review
Windows_NT arm Cross Debug Innerloop Build Build finished.
Windows_NT arm64 Cross Debug Innerloop Build Build finished.
Windows_NT x64 Checked CoreFX Tests Build finished.
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Windows_NT x64 Formatting Build finished.
Windows_NT x64 Release CoreFX Tests Build finished.
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Windows_NT x86 Release Innerloop Build and Test Build finished.
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
license/cla All CLA requirements met.

dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corefx that referenced this pull request Oct 29, 2018

Some cleanup of the System.Number class (dotnet/coreclr#20619)
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corert that referenced this pull request Oct 29, 2018

tannergooding added a commit to dotnet/corefx that referenced this pull request Oct 29, 2018

Some cleanup of the System.Number class (dotnet/coreclr#20619) (#33113)
jkotas added a commit to dotnet/corert that referenced this pull request Oct 29, 2018

kouvel added a commit to kouvel/coreclr that referenced this pull request Nov 3, 2018

Some cleanup of the System.Number class (dotnet#20619)

A-And added a commit to A-And/coreclr that referenced this pull request Nov 20, 2018
