Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Some cleanup of the System.Number class #20619

Merged
merged 10 commits into from
Oct 29, 2018
Merged

Some cleanup of the System.Number class #20619

merged 10 commits into from
Oct 29, 2018

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Oct 25, 2018

This is the first part of https://github.com/dotnet/corefx/issues/33053, and does some initial cleanup of the System.Number class:

  • Resolve the general formatting errors that were showing up in Number.Formatting and Number.Parsing
  • Removing duplicated code from the Parse/TryParse calls (doing the same for Format/TryFormat is not as easy)
  • Some minor renames/relocations
  • Updating NumberBuffer to take custom-sized digit buffers

}

private static void FormatCurrency(ref ValueStringBuilder sb, ref NumberBuffer number, int nMaxDigits, NumberFormatInfo info)
private static unsafe void FormatCurrency(ref ValueStringBuilder sb, bool sign, int scale, char* dig, int nMaxDigits, NumberFormatInfo info)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Format* functions previously took a ref NumberBuffer; they were changed to take the sign, scale, and digit pointer individually so they could be shared between the IntegerBuffer and NumberBuffer implementations.

}
}

internal static unsafe void IntegerBufferToString(ref ValueStringBuilder sb, ref IntegerBuffer integer, char format, int nMaxDigits, NumberFormatInfo info)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ref NumberBuffer number parameters for the integer case where changed to ref IntegerBuffer integer

@@ -12,6 +12,11 @@ internal static partial class Number
{
private static unsafe void Dragon4(double value, int precision, ref NumberBuffer number)
{
const double Log10V2 = 0.30102999566398119521373889472449;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were moved out of the NumberBuffer.cs file because they are only used by Dragon4

@tannergooding
Copy link
Member Author

CC. @danmosemsft, @eerhardt, @jkotas.

I explicitly called out the few cases where it wasn't just copy/paste. I made this PR separate since it is just the initial refactoring and will make the actual implementation PR easier to review.

// * 19 for int64
// * 20 for uint64
// * 39 for int128/uint128
internal const int MaxDigits = 50;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment, will a subsequent PR lower the value of MaxDigits to 39?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can be fixed in a follow-up PR, but is not required for correctness.

// * 113 for Single
// * 768 for Double
// * 11563 for Quad
internal const int MaxDigits = 50;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, the comment makes this sound wrong. I assume it'll be fixed subsequently as part of all of this work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be fixed in a follow-up PR.

@jkotas
Copy link
Member

jkotas commented Oct 25, 2018

+4,879 −3,775

The PR is adding a net new 1000+ lines. Where is it coming from? Is it a duplicated 1000 lines? I was expecting much smaller duplication out from this.

@tannergooding
Copy link
Member Author

Split into 3 commits to make this easier to review:

  • The first just formats the document (to get rid of the noise from copying/pasting the code between files)
  • The second contains the refactoring (splitting NumberBuffer into IntegerBuffer and FloatingPointBuffer)
  • The third splits the code into the respective files

@@ -1460,9 +1457,471 @@ internal static unsafe char ParseFormatSpecifier(ReadOnlySpan<char> format, out
'\0';
}

internal static unsafe void NumberToString(ref ValueStringBuilder sb, ref NumberBuffer number, char format, int nMaxDigits, NumberFormatInfo info)
internal static unsafe void IntegerBufferToString(ref ValueStringBuilder sb, ref IntegerBuffer integer, char format, int nMaxDigits, NumberFormatInfo info)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas, the bulk of the "new lines" comes from duplicating BufferToString and BufferToStringFormat.

I'll think about this some more and see if we can reduce this duplication.

return false;
}

private static unsafe bool ParseNumber(ref char* str, char* strEnd, NumberStyles styles, ref FloatingPointBuffer floatingPoint, NumberFormatInfo info, bool parseDecimal)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the other big "duplicated" code, but it will likely need to change a bit between the two implementations. I'll also give it some more thought, to see if we can share some of the code here.

@jkotas
Copy link
Member

jkotas commented Oct 26, 2018

It may be interesting to look at storing pointer to the buffer in the structure - it is an extra indirection in a few places, but it should not really show up on the radar. Or just use the big buffer everywhere.

@tannergooding
Copy link
Member Author

It may be interesting to look at storing pointer to the buffer in the structure - it is an extra indirection in a few places, but it should not really show up on the radar. Or just use the big buffer everywhere.

Right. I was considering just using the big buffer everywhere for float/double. It shouldn't be expensive with the .locals init stripping we do and is what the MSVCRT does already.

}
}

internal static unsafe void IntegerBufferToString(ref ValueStringBuilder sb, ref IntegerBuffer integer, char format, int nMaxDigits, NumberFormatInfo info)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented here, the majority of the code duplication comes from NumberBufferToString, NumberBufferToStringFormat, and ParseNumber.

After looking more closely at the methods, I don't believe there is much that can be done with code sharing. These methods depend on RoundIntegerBuffer and RoundFloatingPointBuffer on each code path which will end up dealing with the buffers in very different ways. The former thinks in terms of value, scale and uses "normal" rounding logic; while the latter needs to think in terms of mantissa, exponent, and uses IEEE-compliant rounding logic (which will ultimately make the end formatting/parsing logic differ between these as well).

However, I think there is a bit of duplication we can remove in other parts of these code files. For example, we have quite a bit of duplication between several of the X and TryX code paths, namely due to the X code path needing to decide if it wants to throw an OverflowException or FormatException (which could be handled by returning an enum or a secondary bool).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, actually, we might be able to do it by keeping a single buffer:

ref struct NumberBuffer
{
    public int precision;
    public int scale;
    public int exponent; // New field, only used by floating-point buffers
    public bool sign;
    public NumberBufferKind kind;
    public Span<char> digits; // Allocated at creation so it is sized as needed
}

The rounding only happens in a few places, at the end of parsing and isn't in the hot-path, so an additional check on kind shouldn't hurt. We only need scale for integer types, but we can make use of both scale and exponent for floating-point (the former can still be used to help with the fast path check).

@tannergooding tannergooding changed the title Splitting the Number.Parsing and Number.Formatting code into "Number" and "Integer" Some cleanup of the System.Number class Oct 26, 2018
@tannergooding
Copy link
Member Author

@jkotas, @stephentoub. This has changed a bit.

We are back to having a single NumberBuffer type, but it now takes a custom-sized digit buffer to handle the differences between integer and floating-point values.

The NumberBuffer type will need to carry an additional exponent field that will be unused for Integer parsing/formatting, but I don't believe that will be a problem.

@stephentoub
Copy link
Member

Could you do some perf testing to ensure that some of these refactorings don't impact things like int.TryParse?

// DriftFactor = 1 - Log10V2 - epsilon (a small number account for drift of floating point multiplication)
private const double DriftFactor = 0.69;
// We need 1 additional byte, per length, for the terminating null
private const int DecimalNumberBufferLength = 50 + 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept DecimalNumberBufferLength at the current value (50 + 1). However, I believe we can make it 30 (29 digits + 1 for rounding).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the rounding digit is 5 then up to 20 additional digits are checked in NumberBufferToDecimal (so 50 in total). The number 20 looks arbitrary, theoretically it would be correct to check the entire string...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pentp, maybe I am misremembering, but I thought decimal only supported 29 digits max. That is, it has 96-bits for the mantissa and it uses 8-bits for the exponent, which is strictly restricted to a value between 0 and 28, inclusive.

This should mean that you need to consider 29 digits, plus 1 for rounding. Anything more than that shouldn't be impactful.

Copy link

@pentp pentp Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It supports only 29 digits, but if the rounding digit is 5 and the number is even, then it checks the following digits to decide if it should round up or down.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pentp, So you still only need 30 digits, but you also need a bool that indicates whether or not there was any trailing non-zero digits in the rest of the input string (same as double and float), correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

@tannergooding
Copy link
Member Author

tannergooding commented Oct 28, 2018

Here are the perf results from my workbox:

Numbers for the Integer tests generally look good some are marginally faster/slower, but all looked to be within the tolerance ranges (running a few times, and looking at benchview, some of the tests aren't super stable). The floating-point tests are similar, but most tend to look slightly on the slower side for the parsing (which all went through TryParse); the formatting all tended to be faster.

@stephentoub
Copy link
Member

Thank you for checking.

@tannergooding
Copy link
Member Author

@jkotas, @stephentoub. Any other feedback here?

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@tannergooding tannergooding merged commit aef0bc2 into dotnet:master Oct 29, 2018
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corefx that referenced this pull request Oct 29, 2018
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corert that referenced this pull request Oct 29, 2018
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
tannergooding pushed a commit to dotnet/corefx that referenced this pull request Oct 29, 2018
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Oct 29, 2018
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
kouvel pushed a commit to kouvel/coreclr that referenced this pull request Nov 3, 2018
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success
A-And pushed a commit to A-And/coreclr that referenced this pull request Nov 20, 2018
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Formatting Number.Formatting.cs and Number.Parsing.cs

* Removing some duplicated parsing code by having the Parse method call TryParse

* Moving two constants from NumberBuffer to Dragon4

* Rename FloatPrecision to SinglePrecision

* Updating the casing of the NumberBuffer fields

* Updating NumberBuffer to allow taking a custom-sized digit buffer.

* Updating the various NumberBufferLength constants to be the exact needed lengths

* Fixing DoubleNumberBufferLength and SingleNumberBufferLength to account for the rounding digit.

* Fixing TryParseNumber to use the correct maxDigCount

* Ensure the TryParseSingle out result is assigned on success


Commit migrated from dotnet/coreclr@aef0bc2
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants