Added Grisu3 algorithm support for double.ToString(). #14646

Merged
merged 6 commits into from Jan 29, 2018

@mazong1123
Collaborator

commented Oct 21, 2017

  • Implemented the Grisu3 algorithm.
  • When calling double.ToString(), try Grisu3 first; if it fails, fall back to Dragon4.

Fix #14478

@mazong1123

Collaborator Author

commented Oct 22, 2017

Overview:

Grisu3 is much faster than Dragon4: some numbers are formatted more than 8 times faster (for instance, the runtime for -1.79769313486232E+308 drops from 237.492 to 28.660). But when a number fails in Grisu3, we have to switch to Dragon4, which is obviously slower than running Dragon4 alone. For instance, compare the test results of perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 250, innerIterations: 2000000). Because 250 in 17 digits precision fails in Grisu3, it becomes much slower in the new implementation.
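The "try the fast path, then fall back" control flow described above can be sketched as follows. This is only an illustration in Python, not the code in this PR: `format_digits`, `fake_fast`, and `fake_slow` are hypothetical names, and the two generators are stand-ins rather than real Grisu3 or Dragon4 implementations.

```python
from typing import Callable, Optional, Tuple

Digits = Tuple[str, int]  # (decimal digits, decimal exponent)


def format_digits(
    value: float,
    count: int,
    fast: Callable[[float, int], Optional[Digits]],
    slow: Callable[[float, int], Digits],
) -> Digits:
    """Try the fast generator first; fall back to the exact one on failure."""
    result = fast(value, count)
    if result is None:
        # The time spent in `fast` is wasted here, which is why numbers that
        # fail the fast path end up slower than with the slow path alone.
        return slow(value, count)
    return result


# Stand-in generators for illustration only; they are not Grisu3 or Dragon4.
def fake_fast(value: float, count: int) -> Optional[Digits]:
    return None if value == 250 else ("1", 1)


def fake_slow(value: float, count: int) -> Digits:
    return ("25", 3)
```

With these stand-ins, `format_digits(250.0, 17, fake_fast, fake_slow)` pays for both calls, which mirrors the regression reported for 250.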

The benchmark results follow.

Average runtime comparison:

Test Name Metric Before After
perftest.DoubleToStringTest.Decimal_ToString Duration 762.895 734.575
perftest.DoubleToStringTest.DefaultToString(number: -1.79769313486232E+308, innerIterations: 100000) Duration 237.492 28.660
perftest.DoubleToStringTest.DefaultToString(number: -8.98846567431158E+307, innerIterations: 100000) Duration 227.782 29.921
perftest.DoubleToStringTest.DefaultToString(number: ∞, innerIterations: 10000000) Duration 732.380 690.268
perftest.DoubleToStringTest.DefaultToString(number: 1.79769313486232E+308, innerIterations: 100000) Duration 245.580 30.721
perftest.DoubleToStringTest.DefaultToString(number: 104234.343, innerIterations: 1000000) Duration 620.308 298.880
perftest.DoubleToStringTest.DefaultToString(number: 2.2250738585072E-308, innerIterations: 100000) Duration 226.605 38.423
perftest.DoubleToStringTest.DefaultToString(number: NaN, innerIterations: 10000000) Duration 681.121 688.780
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: -1.79769313486232E+308, innerIterations: 100000) Duration 252.797 26.215
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: ∞, innerIterations: 20000000) Duration 1130.165 1072.641
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 0, innerIterations: 4000000) Duration 519.740 519.731
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 1.79769313486232E+308, innerIterations: 100000) Duration 239.098 27.831
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 104234.343, innerIterations: 1000000) Duration 626.809 269.358
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: NaN, innerIterations: 20000000) Duration 1042.436 1129.455
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: -1.79769313486232E+308, innerIterations: 100000) Duration 241.084 30.417
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 0, innerIterations: 2000000) Duration 332.165 316.559
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 1.79769313486232E+308, innerIterations: 100000) Duration 217.656 28.694
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 250, innerIterations: 2000000) Duration 706.236 813.597
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 4.94065645841247E-324, innerIterations: 100000) Duration 222.350 40.334
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: -1.79769313486232E+308, innerIterations: 100000) Duration 324.054 132.538
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 0, innerIterations: 2000000) Duration 535.282 538.055
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 1.79769313486232E+308, innerIterations: 100000) Duration 324.008 132.549
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 250, innerIterations: 2000000) Duration 862.649 1052.656
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 4.94065645841247E-324, innerIterations: 100000) Duration 220.087 49.086
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: -1.79769313486232E+308, innerIterations: 100000) Duration 218.129 30.110
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: ∞, innerIterations: 20000000) Duration 1373.222 1383.536
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 0, innerIterations: 2000000) Duration 293.404 278.052
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 1.79769313486232E+308, innerIterations: 100000) Duration 235.778 28.887
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 250, innerIterations: 2000000) Duration 733.715 746.424
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 4.94065645841247E-324, innerIterations: 100000) Duration 213.085 39.974
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: NaN, innerIterations: 20000000) Duration 1356.486 1285.336
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: -1.79769313486232E+308, innerIterations: 100000) Duration 239.188 29.196
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 0, innerIterations: 2000000) Duration 333.325 324.171
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 1.79769313486232E+308, innerIterations: 100000) Duration 239.025 29.986
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 250, innerIterations: 2000000) Duration 776.125 881.487
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 4.94065645841247E-324, innerIterations: 100000) Duration 233.128 41.974
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -1.79769313486232E+308, innerIterations: 100000) Duration 443.718 45.578
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0, innerIterations: 2000000) Duration 306.752 283.209
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.79769313486232E+308, innerIterations: 100000) Duration 506.462 46.712
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 250, innerIterations: 2000000) Duration 821.277 856.164
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 4.94065645841247E-324, innerIterations: 100000) Duration 231.865 49.403

Detailed results for each performance test:

Before change (With only Dragon4):

Test Name Metric Iterations AVERAGE STDEV.S MIN MAX
perftest.DoubleToStringTest.Decimal_ToString Duration 14 762.895 86.520 682.385 882.800
perftest.DoubleToStringTest.DefaultToString(number: -1.79769313486232E+308, innerIterations: 100000) Duration 43 237.492 27.334 214.093 283.049
perftest.DoubleToStringTest.DefaultToString(number: -8.98846567431158E+307, innerIterations: 100000) Duration 44 227.782 25.729 197.547 261.339
perftest.DoubleToStringTest.DefaultToString(number: ∞, innerIterations: 10000000) Duration 14 732.380 98.418 622.394 882.606
perftest.DoubleToStringTest.DefaultToString(number: 1.79769313486232E+308, innerIterations: 100000) Duration 41 245.580 28.099 214.045 281.792
perftest.DoubleToStringTest.DefaultToString(number: 104234.343, innerIterations: 1000000) Duration 17 620.308 2.119 616.989 624.958
perftest.DoubleToStringTest.DefaultToString(number: 2.2250738585072E-308, innerIterations: 100000) Duration 45 226.605 26.536 203.087 270.209
perftest.DoubleToStringTest.DefaultToString(number: NaN, innerIterations: 10000000) Duration 15 681.121 76.504 595.638 783.120
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: -1.79769313486232E+308, innerIterations: 100000) Duration 40 252.797 31.236 215.753 299.873
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: ∞, innerIterations: 20000000) Duration 9 1130.165 98.503 985.048 1247.706
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 0, innerIterations: 4000000) Duration 20 519.740 58.589 455.855 596.709
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 1.79769313486232E+308, innerIterations: 100000) Duration 43 239.098 27.827 214.150 282.854
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 104234.343, innerIterations: 1000000) Duration 17 626.809 47.565 600.915 773.405
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: NaN, innerIterations: 20000000) Duration 10 1042.436 128.899 905.122 1227.495
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: -1.79769313486232E+308, innerIterations: 100000) Duration 42 241.084 28.454 216.058 289.477
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 0, innerIterations: 2000000) Duration 31 332.165 1.843 329.610 338.415
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 1.79769313486232E+308, innerIterations: 100000) Duration 46 217.656 1.671 214.852 221.147
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 250, innerIterations: 2000000) Duration 15 706.236 21.612 693.022 773.103
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 4.94065645841247E-324, innerIterations: 100000) Duration 45 222.350 18.677 209.732 263.866
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: -1.79769313486232E+308, innerIterations: 100000) Duration 31 324.054 1.322 322.386 327.643
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 0, innerIterations: 2000000) Duration 19 535.282 7.039 531.923 563.779
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 1.79769313486232E+308, innerIterations: 100000) Duration 31 324.008 2.466 322.167 335.947
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 250, innerIterations: 2000000) Duration 12 862.649 1.081 861.432 864.766
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 4.94065645841247E-324, innerIterations: 100000) Duration 46 220.087 1.112 218.273 223.328
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: -1.79769313486232E+308, innerIterations: 100000) Duration 46 218.129 1.078 215.268 221.593
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: ∞, innerIterations: 20000000) Duration 8 1373.222 5.147 1367.093 1380.559
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 0, innerIterations: 2000000) Duration 35 293.404 8.743 289.282 336.534
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 1.79769313486232E+308, innerIterations: 100000) Duration 43 235.778 24.380 214.380 271.813
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 250, innerIterations: 2000000) Duration 14 733.715 4.734 727.828 744.614
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 4.94065645841247E-324, innerIterations: 100000) Duration 47 213.085 4.213 210.354 240.162
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: NaN, innerIterations: 20000000) Duration 8 1356.486 4.812 1352.167 1366.296
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: -1.79769313486232E+308, innerIterations: 100000) Duration 42 239.188 1.206 237.702 244.364
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 0, innerIterations: 2000000) Duration 31 333.325 37.702 291.221 384.541
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 1.79769313486232E+308, innerIterations: 100000) Duration 42 239.025 1.668 237.056 246.427
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 250, innerIterations: 2000000) Duration 13 776.125 25.877 756.554 847.363
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 4.94065645841247E-324, innerIterations: 100000) Duration 43 233.128 1.049 231.429 236.670
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -1.79769313486232E+308, innerIterations: 100000) Duration 23 443.718 2.183 441.390 450.867
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0, innerIterations: 2000000) Duration 33 306.752 9.509 301.374 347.536
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.79769313486232E+308, innerIterations: 100000) Duration 20 506.462 55.926 441.193 575.617
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 250, innerIterations: 2000000) Duration 13 821.277 2.140 816.872 824.743
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 4.94065645841247E-324, innerIterations: 100000) Duration 44 231.865 19.151 220.482 276.559

After change (Grisu3 + Dragon4):

Test Name Metric Iterations AVERAGE STDEV.S MIN MAX
perftest.DoubleToStringTest.Decimal_ToString Duration 14 734.575 1.725 731.841 737.709
perftest.DoubleToStringTest.DefaultToString(number: -1.79769313486232E+308, innerIterations: 100000) Duration 349 28.660 2.491 26.927 35.211
perftest.DoubleToStringTest.DefaultToString(number: -8.98846567431158E+307, innerIterations: 100000) Duration 335 29.921 3.489 26.307 36.381
perftest.DoubleToStringTest.DefaultToString(number: ∞, innerIterations: 10000000) Duration 15 690.268 76.088 583.543 783.951
perftest.DoubleToStringTest.DefaultToString(number: 1.79769313486232E+308, innerIterations: 100000) Duration 326 30.721 3.561 26.674 37.145
perftest.DoubleToStringTest.DefaultToString(number: 104234.343, innerIterations: 1000000) Duration 34 298.880 32.274 275.273 360.823
perftest.DoubleToStringTest.DefaultToString(number: 2.2250738585072E-308, innerIterations: 100000) Duration 261 38.423 4.520 35.706 51.440
perftest.DoubleToStringTest.DefaultToString(number: NaN, innerIterations: 10000000) Duration 15 688.780 83.619 590.821 815.194
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: -1.79769313486232E+308, innerIterations: 100000) Duration 382 26.215 0.492 25.769 31.469
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: ∞, innerIterations: 20000000) Duration 10 1072.641 119.324 903.355 1224.387
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 0, innerIterations: 4000000) Duration 20 519.731 56.467 461.462 605.618
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 1.79769313486232E+308, innerIterations: 100000) Duration 360 27.831 2.824 25.603 34.895
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: 104234.343, innerIterations: 1000000) Duration 38 269.358 15.466 261.895 325.118
perftest.DoubleToStringTest.ToStringWithCultureInfo(cultureName: "zh", number: NaN, innerIterations: 20000000) Duration 9 1129.455 126.770 917.095 1257.702
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: -1.79769313486232E+308, innerIterations: 100000) Duration 329 30.417 3.810 26.543 37.772
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 0, innerIterations: 2000000) Duration 32 316.559 4.442 312.930 330.704
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 1.79769313486232E+308, innerIterations: 100000) Duration 349 28.694 3.437 26.043 48.303
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 250, innerIterations: 2000000) Duration 13 813.597 81.900 733.799 943.832
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 4.94065645841247E-324, innerIterations: 100000) Duration 248 40.334 3.457 38.120 49.285
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: -1.79769313486232E+308, innerIterations: 100000) Duration 76 132.538 0.801 131.197 134.821
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 0, innerIterations: 2000000) Duration 19 538.055 9.700 533.715 577.721
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 1.79769313486232E+308, innerIterations: 100000) Duration 76 132.549 0.862 131.257 135.052
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 250, innerIterations: 2000000) Duration 10 1052.656 106.536 987.218 1251.984
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 4.94065645841247E-324, innerIterations: 100000) Duration 204 49.086 0.546 48.530 52.471
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: -1.79769313486232E+308, innerIterations: 100000) Duration 332 30.110 3.206 27.742 37.711
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: ∞, innerIterations: 20000000) Duration 8 1383.536 142.176 1270.398 1633.423
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 0, innerIterations: 2000000) Duration 37 278.052 19.509 270.044 338.078
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 1.79769313486232E+308, innerIterations: 100000) Duration 347 28.887 2.463 27.277 36.683
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 250, innerIterations: 2000000) Duration 14 746.424 2.525 743.420 753.535
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 4.94065645841247E-324, innerIterations: 100000) Duration 251 39.974 0.662 39.270 43.060
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: NaN, innerIterations: 20000000) Duration 8 1285.336 2.808 1282.338 1291.024
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: -1.79769313486232E+308, innerIterations: 100000) Duration 343 29.196 0.609 28.425 33.425
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 0, innerIterations: 2000000) Duration 31 324.171 37.880 273.118 376.826
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 1.79769313486232E+308, innerIterations: 100000) Duration 334 29.986 2.936 28.026 38.707
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 250, innerIterations: 2000000) Duration 12 881.487 116.209 761.810 1086.569
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 4.94065645841247E-324, innerIterations: 100000) Duration 239 41.974 3.493 39.770 51.080
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -1.79769313486232E+308, innerIterations: 100000) Duration 220 45.578 0.543 44.847 48.269
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0, innerIterations: 2000000) Duration 36 283.209 1.340 281.582 287.001
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.79769313486232E+308, innerIterations: 100000) Duration 215 46.712 4.459 43.934 58.510
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 250, innerIterations: 2000000) Duration 12 856.164 26.094 841.487 937.493
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 4.94065645841247E-324, innerIterations: 100000) Duration 203 49.403 0.728 48.735 55.378

The results above were generated by this code: https://github.com/dotnet/corefx/blob/master/src/System.Runtime/tests/Performance/Perf.Double.cs

@mazong1123 changed the title from "[WIP] Added Grisu3 algorithm support for double.ToString()." to "Added Grisu3 algorithm support for double.ToString()." on Oct 22, 2017

@mazong1123

Collaborator Author

commented Oct 22, 2017

@tarekgh

Member

commented Oct 23, 2017

Because 250 in 17 digits precision fails in Grisu3, it becomes much slower in the new implementation.

Do we know how much slower it is when we hit such cases? Is there a way to detect such numbers early and call Dragon directly? I am trying to find a mitigation for the cases where we are slower.

@tarekgh

Member

commented Oct 23, 2017

@dotnet-bot test Windows_NT x64 corefx_baseline
@dotnet-bot test Ubuntu x64 corefx_baseline

@dotnet dotnet deleted a comment from dotnet-bot Oct 23, 2017

@dotnet dotnet deleted a comment from dotnet-bot Oct 23, 2017

@mazong1123

Collaborator Author

commented Oct 24, 2017

Do we know how much slower it is when we hit such cases?

The following tests use numbers that fail in Grisu3. The minimum regression is around 4% and the maximum is around 22%. The regression depends on the requested precision: the smaller the requested precision, the sooner we can exit Grisu3. (If we can exhaust the requested digit count, we succeed in Grisu3.)

Test Name Metric Before After
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 250, innerIterations: 2000000) Duration 706.236 813.597
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 250, innerIterations: 2000000) Duration 862.649 1052.656
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 250, innerIterations: 2000000) Duration 821.277 856.164

Is there a way to detect such numbers early and call Dragon directly?

AFAIK there is no existing algorithm for this (except the check in Grisu3 itself, which the implementation already uses). I may try to find a way to jump to Dragon4 earlier, but we may need to set a bottom line for this - what's the acceptable regression rate for the numbers that fail in Grisu3?

Note that even when we fall back to Dragon4, we still have a great performance improvement compared to the current (2.0.0) implementation.

@mazong1123

Collaborator Author

commented Oct 24, 2017

BTW, the CI failure seems to be caused by a JIT compile error, which should not have been introduced by my code.

@tarekgh

Member

commented Oct 24, 2017

but we may need to set a bottom line for this - what's the acceptable regression rate for the numbers that fail in Grisu3?

I care about the common scenarios (e.g. Double.ToString() and Double.ToString("R")). From your data, we'll have around 14% regression with "E" and around 4% with "R". What is the regression with "G"? I think the regression with "R" is acceptable. If the regression with "G" is small too (around 5%), that will be acceptable. Also, if the regression occurs only for a small set or range of numbers, that mitigates it too.

In general, the perf gain with Grisu3 looks encouraging and worth having.

BTW, the CI failure seems to be caused by a JIT compile error, which should not have been introduced by my code.

Right, I looked at the failures earlier and I don't see them being related to your changes.

@mazong1123

Collaborator Author

commented Oct 25, 2017

@tarekgh Here are the tests for the failing numbers. I tested two numbers, 1 and 250:

Test Name Before After Regression
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 1, innerIterations: 2000000) 492.063 602.687 22.48%
perftest.DoubleToStringTest.ToStringWithFormat(format: "E", number: 250, innerIterations: 2000000) 673.127 729.817 8.42%
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 1, innerIterations: 2000000) 740.304 843.066 13.8%
perftest.DoubleToStringTest.ToStringWithFormat(format: "F50", number: 250, innerIterations: 2000000) 843.800 976.053 15.67%
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 1, innerIterations: 2000000) 556.792 686.154 23.23%
perftest.DoubleToStringTest.ToStringWithFormat(format: "G", number: 250, innerIterations: 2000000) 719.671 828.729 15.15%
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 1, innerIterations: 2000000) 511.849 638.235 24.69%
perftest.DoubleToStringTest.ToStringWithFormat(format: "G17", number: 250, innerIterations: 2000000) 749.973 848.533 13.14%
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1, innerIterations: 2000000) 644.725 774.531 20.13%
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 250, innerIterations: 2000000) 817.666 845.490 3.4%

The regression varies from number to number; it basically depends on the digits of the number and the requested precision. We need to make a tradeoff: is it worthwhile to make 99.5% of numbers 90% quicker at the cost of the remaining 0.5% (up to 24% regression)?

Meanwhile, I'm trying to mitigate the regression, but I don't have a good approach yet. Caching the failed numbers and precisions in a limited memory space may be an easy way to go (e.g., a micro LRU cache). Alternatively, we could design a new algorithm that pre-checks the numbers, rather than relying on Grisu3's self-check while it produces digits. Or we could combine the two approaches.
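The arithmetic behind the Regression column and the tradeoff question can be sketched as below. The helper names are hypothetical, and the blended figure rests on two assumptions that are not established in this thread: that "90% quicker" means 10% of the original runtime, and that inputs are spread uniformly over the hit and miss groups.

```python
def regression_percent(before: float, after: float) -> float:
    """Relative slowdown of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


def blended_time_ratio(hit_rate: float, fast_ratio: float, miss_ratio: float) -> float:
    """Expected runtime relative to the old implementation (1.0 = unchanged)."""
    return hit_rate * fast_ratio + (1.0 - hit_rate) * miss_ratio


# Format "E", number 1, from the table above.
print(round(regression_percent(492.063, 602.687), 2))   # 22.48
# 99.5% of inputs at 10% of the old time, 0.5% at 124% of the old time.
print(round(blended_time_ratio(0.995, 0.10, 1.24), 4))  # 0.1057
```

Under those assumptions the average cost is close to a tenth of the old one, but the average says nothing about workloads that mostly format numbers from the failing group.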

@tarekgh

Member

commented Oct 25, 2017

is it worthwhile to make 99.5% of numbers 90% quicker at the cost of the remaining 0.5% (up to 24% regression)?

If we are sure that the failing numbers are in the range of 0.5%, then I think it is worth it.

Caching the failed numbers and precisions in a limited memory space may be an easy way to go (e.g., a micro LRU cache).

I think we don't need to create that complication. If we don't have a simple way to check whether a number will fail, then we shouldn't add any complexity and we may just accept the regression.

@tannergooding

Member

commented Oct 25, 2017

@mazong1123, @tarekgh: I still think it would be useful to know where these failing numbers fall.

Floating-point values that are normalized between -1.0 and +1.0 are fairly common in a number of different applications. If a large portion of the failing 0.5% (which is still 92.2 quadrillion values) falls in that range, then failures may actually be more common than the percentage would lead you to believe.
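The 92.2 quadrillion figure is consistent with taking 0.5% of all 64-bit patterns. A quick check, assuming the count covers every bit pattern (including NaNs and infinities):

```python
TOTAL_BIT_PATTERNS = 2 ** 64  # every possible 64-bit double encoding

# 0.5% expressed as 5/1000 to stay in exact integer arithmetic.
failing = TOTAL_BIT_PATTERNS * 5 // 1000

print(f"{failing:,}")  # 92,233,720,368,547,758
```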

@mazong1123

Collaborator Author

commented Oct 25, 2017

@tannergooding Sure, I'm going to collect the range of the failing 0.5% as the next step. The paper doesn't give this information, so I need to collect and prove it myself, which may take some time. But I agree it's worth doing.

@mazong1123

Collaborator Author

commented Oct 30, 2017

@tannergooding @tarekgh I've spent some time investigating what kinds of numbers fail in Grisu3 with fixed precision. Here is a brief summary:

Three factors affect the result:

  • The requested precision: count.
  • The integral part of the input double value: p1.
  • The fractional part of the input double value: p2.

The algorithm first produces digits from p1. If the count is exhausted during or after this step, good: we can use Grisu3 for this value. If there is count left to generate, we move on to p2. If we can exhaust the remaining count while producing digits from p2, we also succeed in Grisu3. Otherwise, we fail in Grisu3.

It's easy to construct numbers that fail in Grisu3: just make the precision large enough, and p1 and p2 small enough.

For example, say count is 17 (our round-trip precision) and the input value is 1 (so p2 is obviously 0). We cannot generate 17 digits from the integral part (p1), and since p2 is 0 and generates nothing, we fail in Grisu3.

However, 1.1 with 17 digits precision succeeds, because the fractional part (p2) is large enough to generate the remaining count completely.

So you can imagine that, at 17 digits precision, a large set of double numbers without fractional parts will fail in Grisu3 (1, 2, ... 100, 1000 fail; 1.1, 1.2, 2.1 succeed). Of course, if the requested precision is small enough, those numbers succeed in Grisu3. For instance, 1 succeeds in Grisu3 with 5 digits precision.

Also, I think the 0.5% miss rate probably applies to the free format (generating the shortest output), not to the fixed format, which is our case. I didn't run a comparison test, but think of it this way: I can request a very large count, and all the double values without a fractional part will fail, so the miss rate is not necessarily limited to 0.5%, although it seems the hit rate is still around 99.5%.
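The p1/p2 description above can be modeled with a toy function. This is a simplified sketch, not the DigitGen code in this PR: the name `fits_requested_count` is hypothetical, p1 and p2 are treated as a fixed-point integral part and a `frac_bits`-wide fractional part, the stop condition based on a growing one-unit error is an assumption borrowed from the general shape of Grisu-style counted digit generation, and the final rounding check is left out entirely. The sample values for p1 and p2 are made up.

```python
def fits_requested_count(p1: int, p2: int, frac_bits: int, count: int) -> bool:
    """Toy model: can `count` digits be produced from p1, then from p2?"""
    # Digits taken from the integral part, most significant first.
    remaining = count - min(count, len(str(p1)))
    if remaining == 0:
        return True  # count exhausted while consuming p1

    mask = (1 << frac_bits) - 1
    error = 1  # assumed uncertainty of one unit; grows 10x per digit
    while remaining > 0 and p2 > error:
        p2 *= 10
        error *= 10
        p2 &= mask  # drop the digit that moved into the integer position
        remaining -= 1
    return remaining == 0


no_fraction = 0
long_fraction = (1 << 60) // 10  # made-up fractional part with many digits

print(fits_requested_count(10**9, no_fraction, 60, 5))     # True
print(fits_requested_count(10**9, no_fraction, 60, 17))    # False
print(fits_requested_count(10**9, long_fraction, 60, 17))  # True
```

The three calls correspond to the cases described above: a small count is satisfied by p1 alone, a large count with p2 equal to 0 runs out of digits, and a large count with a substantial p2 can be filled.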

@mazong1123

Collaborator Author

commented Nov 1, 2017

I'm collecting the hit rate over all double numbers for 17 digits precision. I hope this gives us a rough idea. Later on I'd like to collect the hit rate for precisions from 1 digit to 30 digits.

@mazong1123

Collaborator Author

commented Nov 1, 2017

I've calculated up to 109500000000 numbers so far and the hit rate is 99.27%. I guess my machine does not have the power to go through all double numbers:

The complete Grisu3 hit rate for precision count 17 in 109390000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109400000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109410000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109420000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109430000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109440000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109450000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109460000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109470000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109480000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109490000000 numbers is 99.27
The complete Grisu3 hit rate for precision count 17 in 109500000000 numbers is 99.27
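A tally like the one in the log above could be computed with a loop of roughly this shape. How the numbers were actually enumerated is not stated in this thread, so everything here is an assumption: `hit_rate` and `bits_to_double` are hypothetical helpers, the predicate `succeeds` stands in for a real "Grisu3 succeeded" check, and non-finite values are skipped on the assumption that they never reach digit generation.

```python
import math
import struct
from typing import Callable, Iterable


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def hit_rate(bit_patterns: Iterable[int], succeeds: Callable[[float], bool]) -> float:
    """Percentage of finite doubles for which `succeeds` returns True."""
    tried = hits = 0
    for bits in bit_patterns:
        value = bits_to_double(bits)
        if not math.isfinite(value):
            continue  # assumed: NaN and infinities skip digit generation
        tried += 1
        hits += bool(succeeds(value))
    return 100.0 * hits / tried if tried else 0.0
```

Passing a range of bit patterns plus the real success check would give a percentage comparable to the 99.27 figure; any placeholder predicate only exercises the counting logic.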

Wrap up:

  • Grisu3 is applicable to 99.27%+ of double numbers even for fixed format. (I haven't finished the validation, but it should be close to the 99.5% claimed in the paper.)
  • The failing numbers are not specifically subnormals, NaNs, etc. Failure depends on the requested precision and on the number's integral and fractional parts.
  • The numbers that succeed in Grisu3 get a nearly 90% performance gain, whereas those that fail have up to 24% regression compared to Dragon4 alone.

In my opinion, it's worth having. Not only for the 90% performance gain for 99.5% of numbers, but also because it has been widely adopted in other frameworks/libraries (V8, Rust, Go).

Additionally, Python uses Errol3, which claims to improve on Grisu3, but there are some issues with that algorithm, so I still think Grisu3 is the better choice for now. Maybe next time we can upgrade to Errol3 - well, that's another topic.

@tarekgh @tannergooding thoughts?

@tarekgh

Member

commented Nov 3, 2017

@mazong1123 does Grisu3 always succeed with lower precisions? I mean, you mentioned before that it always fails at 17-digit precision for numbers that don't have fractions. Is that true for 15-digit precision?

What I am getting at is if we can have Grisu3 on by default when a lower precision is used, and use Dragon for higher precisions.

I am inclined to have Grisu3 enabled, but at the same time I want to reduce the regression in the failing cases.

@mazong1123

Collaborator Author

commented Nov 5, 2017

does Grisu3 always succeed with the lower precisions? I mean, you mentioned before it always fails with 17 digit precisions with the numbers doesn't have fractions. Is it true for 15-digits precisions?

Depends on the number, it will fail in 15-digits precisions either. For example, 10000000000000000000000.0 fail in both 17 and 15 digits precisions, but can be success in digits 1 ~ 3 precisions.

whether we can have Grisu3 on by default when a lower precision is used, and use Dragon4 for higher precisions.

Unfortunately, it is not linear. For example, 10000000000000000000000.0 fails at 5-digit precision, but 1000000000000000000000.0 (one fewer zero) and 100000000000000000000000.0 (one more zero) both succeed at 5 digits. Although a lower precision has a greater chance of success, we cannot compute the success rate from the input number's digit length alone.

Estimating whether a number will succeed in Grisu3 is part of Errol3's job. Although Errol3 always succeeds, it is sometimes slower than Grisu3. See the paper: https://cseweb.ucsd.edu/~mandrysc/pub/dtoa.pdf

One possible optimization, I think, is to extract the error analysis from Errol3 and apply it to Grisu3. Someone had the same idea: https://news.ycombinator.com/item?id=10922347

@tarekgh

Member

commented Nov 6, 2017

@mazong1123 thanks for the details, they are very helpful.

Is it possible to detect numbers that don't have a fractional part and use Dragon4 for them? Do you think this would reduce the chance of Grisu3 failing when it is used? As you can see, I am trying to reduce the possibility of regressions as much as possible.

@mazong1123

Collaborator Author

commented Nov 6, 2017

Is it possible to detect numbers that don't have a fractional part and use Dragon4 for them?

@tarekgh Yes, I'm trying to do that. The best approach is to extract the error analysis (which quickly determines whether a number will fail in Grisu3) from Errol3 and apply it to Grisu3. In theory, that should increase the success rate from 99.5% to 99.95%.

@mazong1123

Collaborator Author

commented Dec 12, 2017

I finally have time to work on this issue. I found it too complicated to extract the error analysis from Errol3 and apply it to Grisu3, so I added a shortcut in DigitGen: when p2 (the fractional part) is 0, predict whether p1 can produce the requested number of digits, and bail out if it cannot. I'll run the micro-benchmarks this week to get measurements.

    if (p2 == 0 && (count >= 12 || p1 < GetPowerOfTen(count - 1)))
    {
        return false;
    }
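
For readers who want to try the shortcut in isolation, here is a minimal standalone sketch of the same condition. It assumes p1 is the 32-bit integral part and p2 the 64-bit fractional part, as in the paper's DigitGen, and that count is at least 1. `ShouldSkipGrisu3` is a made-up name, and the `GetPowerOfTen` below is a stand-in for the helper in the PR, not its actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the GetPowerOfTen helper referenced above (assumption).
// Returns 10^n for 0 <= n <= 10.
static uint64_t GetPowerOfTen(int n)
{
    static const uint64_t kPowers[] = {
        1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL, 1000000ULL,
        10000000ULL, 100000000ULL, 1000000000ULL, 10000000000ULL };
    assert(n >= 0 && n <= 10);
    return kPowers[n];
}

// Returns true when the shortcut condition holds, i.e. when DigitGen would
// return false early and the caller would fall back to Dragon4.
//   p1    : integral part of the scaled value
//   p2    : fractional part of the scaled value
//   count : number of requested digits (assumed >= 1)
bool ShouldSkipGrisu3(uint32_t p1, uint64_t p2, int count)
{
    assert(count >= 1);
    // With no fractional part, give up if 12 or more digits are requested,
    // or if p1 has fewer decimal digits than requested (p1 < 10^(count-1)).
    // Short-circuiting keeps GetPowerOfTen's argument within 0..10.
    return p2 == 0 && (count >= 12 || p1 < GetPowerOfTen(count - 1));
}
```

With arbitrary illustrative inputs (not the real p1 of any particular double), `ShouldSkipGrisu3(250, 0, 3)` is false because 250 already has three digits, while `ShouldSkipGrisu3(250, 0, 4)` and `ShouldSkipGrisu3(250, 0, 17)` are both true.
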
static const int SIGNIFICAND_LENGTH = 64;

private:
UINT64 m_f;

@tannergooding

tannergooding Jan 4, 2018

Member

Just using f and e as the names is confusing without additional context indicating that they are the significand and exponent.

It also isn't clear if this supports a sign and whether or not the exponent is biased.

{
if (((FPDOUBLE*)&value)->exp != 0)
{
// For normalized value, according to https://en.wikipedia.org/wiki/Double-precision_floating-point_format

@tannergooding

tannergooding Jan 4, 2018

Member

This is missing the sign handling and the case where the implicit significand bit is 0.

Edit: Never mind, I see the case where the implicit significand bit is 0 is handled below.
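
To make the two cases in this thread concrete, here is a minimal sketch of decoding a binary64 value into a significand and an unbiased exponent. It is not the PR's code: the PR reads the fields through an `FPDOUBLE` cast, whereas this sketch uses `memcpy` and bit masks, and the names `DiyFp` and `DecodeDouble` and the field comments are assumptions for illustration. The sign bit is deliberately dropped, and infinities and NaNs (biased exponent 0x7FF) are assumed to be filtered out by the caller.

```cpp
#include <cstdint>
#include <cstring>

struct DiyFp
{
    uint64_t f; // significand, with the implicit bit made explicit for normal values
    int e;      // unbiased binary exponent, so that |value| == f * 2^e
};

// Decodes a finite double into f * 2^e. The sign is ignored.
DiyFp DecodeDouble(double value)
{
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    const uint64_t fraction = bits & ((1ULL << 52) - 1);          // low 52 bits
    const int biasedExp = static_cast<int>((bits >> 52) & 0x7FF); // 11 exponent bits

    DiyFp result;
    if (biasedExp != 0)
    {
        // Normal value: implicit leading 1, exponent bias 1023 plus 52 fraction bits.
        result.f = fraction | (1ULL << 52);
        result.e = biasedExp - 1075;
    }
    else
    {
        // Subnormal value (or zero): implicit bit is 0, exponent is fixed.
        result.f = fraction;
        result.e = -1074;
    }
    return result;
}
```

For example, `DecodeDouble(1.0)` yields f = 2^52 and e = -52, and `DecodeDouble(250.0)` yields f = 250 * 2^45 and e = -45.
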

#include <math.h>

// 1/lg(10)
const double Grisu3::D_1_LOG2_10 = 0.30102999566398114;

@tannergooding

tannergooding Jan 4, 2018

Member

This is incorrect. 1 / log2(10) is 0.301029995663981195..., which (when rounded) is exactly representable in binary64 as 0.30102999566398120
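
A quick way to check that the two literals really denote different binary64 values is to compare them directly, as in the small sketch below. The names `kOriginal`, `kSuggested`, `ConstantsDiffer` and `PrintConstants` are made up for the illustration. The spacing between adjacent doubles in [0.25, 0.5) is 2^-54 (about 5.55e-17), and the literals differ by 6e-17, so they cannot round to the same double.

```cpp
#include <cstdio>

// The constant as originally written in the PR, and the value suggested in review.
const double kOriginal  = 0.30102999566398114;
const double kSuggested = 0.30102999566398120;

// Returns true when the two literals parse to different binary64 values.
bool ConstantsDiffer()
{
    return kOriginal != kSuggested;
}

// Prints both constants with 17 significant digits, which is enough to
// distinguish any two different binary64 values.
void PrintConstants()
{
    std::printf("%.17g\n%.17g\n", kOriginal, kSuggested);
}
```
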

// This implementation is based on the paper: http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
// You must read this paper to fully understand the code.
//
// Note: Instead of generating shortest digits, we generate the digits according to the input count.

@tannergooding

tannergooding Jan 4, 2018

Member

You should probably list this as a deviation, rather than a note.

@tannergooding

Member

commented Jan 4, 2018

I did a quick walkthrough of the new code and the paper and this looks to be generally correct.

I still want to dig into this more in depth, but we can probably start the other process while I do that.

@mazong1123

Collaborator Author

commented Jan 4, 2018

@dotnet-bot test Ubuntu x64 Checked Build and Test (Jit - CoreFx)

@mazong1123

Collaborator Author

commented Jan 4, 2018

Seems like I'm not able to restart the test Ubuntu x64 Checked Build and Test (Jit - CoreFx). Hmm... I'll focus on the review comments first.

@tarekgh

Member

commented Jan 4, 2018

@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@tarekgh

Member

commented Jan 9, 2018

@tarekgh

Member

commented Jan 9, 2018

I have asked @joperezr to start following up with LCA so we can proceed with the changes. Meanwhile, @tannergooding can finish his code review so we'll be ready.

@joperezr

Member

commented Jan 12, 2018

I have talked to LCA on our side and we are good to go to take this as long as @tannergooding approves the implementation, which looks good to me.

@tannergooding

Member

commented Jan 12, 2018

It looks correct to me as well, having dug more in depth.

I'd like to see the requested comments/documentation added to the source code and the minor error with the 1 / log2(10) constant resolved first (#14646 (comment)).

@tarekgh

Member

commented Jan 12, 2018

@mazong1123 could you please address the remaining comments on the code review so we can proceed to merge it?

@mazong1123

Collaborator Author

commented Jan 13, 2018

Great! Will address the code review comments within the next week :)

@tarekgh

Member

commented Jan 24, 2018

@mazong1123 did you have a chance to finish this one?

@mazong1123

Collaborator Author

commented Jan 25, 2018

@tarekgh Will do this weekend. Sorry for the delay; I've been busy onboarding with the new team...

Updated according to the review comments.
Added more comments. Changed the value of D_1_LOG2_10
@mazong1123

Collaborator Author

commented Jan 27, 2018

@tarekgh @tannergooding I just updated the code according to the review comments. Please take a look. Thanks!

@tarekgh

Member

commented Jan 29, 2018

@tannergooding if you are ok with the latest changes, we can merge this one.

@tarekgh tarekgh merged commit 5bfd3c1 into dotnet:master Jan 29, 2018

15 checks passed

Alpine.3.6 x64 Debug Build Build finished.
CROSS Check Build finished.
CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
CentOS7.1 x64 Debug Innerloop Build Build finished.
OSX10.12 x64 Checked Innerloop Build and Test.* Build finished.
Tizen armel Cross Checked Innerloop Build and Test Build finished.
Ubuntu arm64 Cross Debug Innerloop Build Build finished.
Ubuntu x64 Checked Innerloop Build and Test.* Build finished.
Ubuntu x64 Formatting Build finished.
WIP ready for review
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Windows_NT x64 Formatting Build finished.
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Windows_NT x86 Release Innerloop Build and Test Build finished.
license/cla All CLA requirements met.
@tarekgh

Member

commented Jan 29, 2018

@joperezr please follow up if any license docs need to be updated.

Thanks @mazong1123 for getting this done.
