New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: double.ToString() 3x slower in some cases on Linux #10651

Closed
tarekgh opened this Issue Apr 2, 2017 · 20 comments

Comments

@tarekgh
Member

tarekgh commented Apr 2, 2017

We have enhanced the performance of Double.ToString when running on Linux. we used to be 7x slower on Linux than on Windows but now we are around 3x slower in some cases.this issue is tracking looking at Double.ToString performance to optimize more when running on Linux.

Note that, the current Double.ToString performance is slightly better than Windows in some cases (like when using Double.MaxValue /2) for instance.

#10390 is the original issue which has more details about what we have done so far and what more can be done.

@karelz

This comment has been minimized.

Member

karelz commented Apr 2, 2017

@tarekgh just to clarify: Are we tracking improvements on Linux (title says that) or on Windows (first paragraph says that). Or both?

@tarekgh

This comment has been minimized.

Member

tarekgh commented Apr 2, 2017

we are tracking to fix the perf on Linux as the title mention. thanks for the catch, I had typo in the first paragraph and I have fixed it.

@karelz karelz added the os-linux label Apr 2, 2017

@karelz karelz changed the title from Perf: double.ToString() slower in some cases on Linux to Perf: double.ToString() 3x slower in some cases on Linux Apr 2, 2017

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jun 30, 2017

Given an up-for-grabs label here, I'm wondering if anybody is working on this issue? If not, I can help on this.

@karelz

This comment has been minimized.

Member

karelz commented Jun 30, 2017

It is not assigned to anyone, so no one is working on it (see Triage rules). Assigning to you ...

@tarekgh

This comment has been minimized.

Member

tarekgh commented Jun 30, 2017

@mazong1123 thanks for looking at this. could you please share your thoughts before applying any changes?

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 1, 2017

@tarekgh I'm still reading the context and previous works you guys have done. Once the preparation finished, I'll list the overview of the issue from my perspective and then put my proposals here for discussion.

This should be done by 7/4 CST.

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 2, 2017

@tarekgh I've read the context of the double.ToString() performance issue. Following are my thoughts and plan. Please let me know if I missed anything or you have any input.

Overview

  • Originally, double.ToString() on Linux is 7x slower than Windows because of the inefficient implementation of pal _ecvt, which is trying to emulate the behavior of Windows _ecvt.

  • Re-write _ecvt by leveraging ecvt_r can get the double.ToString()'s performance back to Windows. However, ecvt_r breaks round tripping for edge cases (e.g. Double.MinValue) so we cannot use ecvt_r.

  • Re-write _ecvt using the code of CoreRT DoubleToNumber can improve the performance (but not reach the Windows performance). We have applied the changes to unblock the customers.

Plan

  • As mentioned by @jkotas in
    #10390 (comment) #10390 (comment) , the long term fix is to re-implement DoubleToNumber in an OS-independent manner, probably in full-managed code, so I'll try followings:
  1. Write a demo based on the paper: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf
  2. If the demo is efficient as expected, apply it to DoubleToNumber.
  3. If the performance improved as expected, convert the implementation to managed code.
@tarekgh

This comment has been minimized.

Member

tarekgh commented Jul 3, 2017

@mazong1123 thanks for looking at that.

Write a demo based on the paper: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf

2 things we need to confirm here if we are going to use this paper:

  • there is no legal issues to implement this paper in open source. @terrajobst could you please help with this question?
  • May be this paper implementation will require having a static data. we need to know how big this data would be as we are concerned about the size too.

thanks again for looking at this.

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 4, 2017

May be this paper implementation will require having a static data. we need to know how big this data would be as we are concerned about the size too.

Just to be clarified:

  • Are you concerning about the size of data stored in static variables of C/C++ code?
  • Do we have any specification/guideline with details of the size constrain?
@tarekgh

This comment has been minimized.

Member

tarekgh commented Jul 4, 2017

Are you concerning about the size of data stored in static variables of C/C++ code?

Any data that increase the size of the net core on the disk will be a concern. If the data is really small (e.g. small arrays or static data). But if the data is few KB's then we need to look at that and evaluate the benefits. We had some work to optimize the size of the net core on the disk.

CC @jkotas

@jkotas

This comment has been minimized.

Member

jkotas commented Jul 4, 2017

I would worry about speed and correctness first. Once you have prototype that is fast and correct, we will worry about the rest (disk footprint, license, etc.)

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 4, 2017

@jkotas I'll write a demo first so that we can examine the correctness and speed. If it does not work as expected, we can have a quick turnaround.

BTW, do you have any idea other than trying the paper?

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 8, 2017

@jkotas @tarekgh I've prototype the paper roughly. Before I go further, I'd like to share the information I've learnt so far. With the algorithm,

  1. double.MinValue can be exactly converted to a string -17976931348623157 with an exponent value 308, same as full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), but is more accurarte than using double.ToString(), whose result is -1.79769313486232E+308;

  2. The number 7.9228162514264338E+28 which was mentioned in #10390 (comment) can be converted to a string 7922816251426434 with an exponent value 28, which is less accurate comparing to full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), whose output is 7.9228162514264338E+28, but is more accurate comparing to using double.ToString(), whose output is 7.92281625142643E+28

So I believe generally, the algorithm is more accurate than current implementation, but I need to adopt the double.ToString("R") implementation to my prototype and run the test cases again.

@tarekgh BTW, in #10390 (comment) , when testing double.MinValue against evct_r implementation, were you using double.ToString() or double.ToString("R")? I'm asking because even full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 will produce the wrong number when using double.ToString() to convert double.MinValue.

@tarekgh

This comment has been minimized.

Member

tarekgh commented Jul 10, 2017

double.MinValue can be exactly converted to a string -17976931348623157 with an exponent value 308, same as full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), but is more accurarte than using double.ToString(), whose result is -1.79769313486232E+308;

Yes this is expected to have ToString("R") is more accurate than ToString(). this is why we have "R" guarantee the round tripping but not ToString().

The number 7.9228162514264338E+28 which was mentioned in #10390 (comment) can be converted to a string 7922816251426434 with an exponent value 28, which is less accurate comparing to full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), whose output is 7.9228162514264338E+28, but is more accurate comparing to using double.ToString(), whose output is 7.92281625142643E+28

We need to validate parsing back the value produced from "R" is getting exact same number we originally started with before converting to string.

@tarekgh BTW, in #10390 (comment) , when testing double.MinValue against evct_r implementation, were you using double.ToString() or double.ToString("R")? I'm asking because even full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 will produce the wrong number when using double.ToString() to convert double.MinValue.

I was referring to "R". we understand ToString() without "R" doesn't produce accurate numbers. it more rounded.

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 17, 2017

I fixed several bugs of my prototype. Now the accuracy is great. The prototype code can be found at https://github.com/mazong1123/doubletonumber

I'll start to refactor & optimize the code and test the performance on Linux.

@mazong1123

This comment has been minimized.

Collaborator

mazong1123 commented Jul 17, 2017

@tarekgh I just have one question regarding double.ToString("R"). I saw the logic is like:

  1. Try to convert the double to string in precision of 15.
  2. Convert the string back to double and compare to the original double. If they are the same, we return the converted string whose precision is 15.
  3. Otherwise, convert the double to string in precision of 17.

Is there any particular reason preventing us to convert the double to string in precision of 17 directly? I'm asking because:

  1. There's a bug https://stackoverflow.com/questions/24299692/why-is-a-round-trip-conversion-via-a-string-not-safe-for-a-double . Let's say we have following code:
using System;

namespace lab
{
    class Program
    {
        static void Main(string[] args)
        {
            double d1 = 0.84551240822557006;
            string s = d1.ToString("R");
            double d2 = double.Parse(s);
            Console.WriteLine(d1 == d2);
        }
    }
}

In current .NET Core (2.0.0-preview2-005905), the output is False. The reason is d1.ToString("R") wrongly chose the result in precision of 15, which is 0.84551240822557. If we choose result in precision of 17, which is 0.84551240822557006, we can convert it back to double accurately.

  1. Get rid of this logic will simply make the code faster IMO.
@tarekgh

This comment has been minimized.

Member

tarekgh commented Jul 17, 2017

@mazong1123 I don't know the history why decided to format with "R" as 15 digit precision and not 17 directly. my guess is was the old implementation with 15 digit precision was faster than 17 digits and almost all common cases can fit in 15 digit precision.

The doc https://msdn.microsoft.com/en-us/library/kfsatb94(v=vs.110).aspx is stating this design but doesn't tell the rationale behind it. also the doc sample is exactly the same case mentioned in your code sample.

By default, the return value only contains 15 digits of precision although a maximum of 17 digits is maintained internally. If the value of this instance has greater than 15 digits, ToString returns PositiveInfinitySymbol or NegativeInfinitySymbol instead of the expected number. If you require more precision, specify format with the "G17" format specification, which always returns 17 digits of precision, or "R", which returns 15 digits if the number can be represented with that precision or 17 digits if the number can only be represented with maximum precision.
Notes to Callers:
In some cases, Double values formatted with the "R" standard numeric format string do not successfully round-trip if compiled using the /platform:x64 or /platform:anycpu switches and run on 64-bit systems. To work around this problem, you can format Double values by using the "G17" standard numeric format string. The following example uses the "R" format string with a Double value that does not round-trip successfully, and also uses the "G17" format string to successfully round-trip the original value.

I think we can go with the breaking change to always format as 17 digits precision if it is really faster in the common cases.

@jkotas what you think about that? and do you know why we had this design in the first place?

@jkotas

This comment has been minimized.

Member

jkotas commented Jul 17, 2017

what you think about that?

I think we can give it a try. These kind of changes tend to come with compat surprises...

do you know why we had this design in the first place?

I do not know.

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 19, 2017

Re-implemented the ecvt function.
Instead of leveraging snprintf, re-implement the ecvt function according to the paper: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf

Note:
1. This commit won't fix any existing bug.
2. This is a raw implementation of the paper. The performance on Linux only gain 10%. We could tune the performance further.

Fix dotnet#10651

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 19, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 19, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 23, 2017

Modified code according to code review feedback.
This commit fixed most of the issue found in code review. However, some of the feedback may not be involved due to either little performance improvement or need a POC.

Fix dotnet#10651

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 23, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 23, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

Improved multiply 10 operation.
Use shift and add operation to replace actual multiply operation.

Fix dotnet#10651

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

Changed exp > 0 to exp != 0 to remove any confusion.
exp should fall in 1 ~ 2046 for normalized value. Denormalized value has exp = 0.

Fix dotnet#10651

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

Fixed x86 compile issue for _BitScanReverse64
x86 does not support _BitScanReverse64 so we have to add additional shift operations to handle it.

Fix dotnet#10651

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 24, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 25, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 26, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 26, 2017

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Jul 27, 2017

@tannergooding

This comment has been minimized.

Member

tannergooding commented Jul 29, 2017

I investigated the 'R' case and validated that the math for converting from string to double is correct.

I also left a comment (see #13106 (comment)) indicating that we are not meeting the IEEE round tripping requirements which states that double->string->double is only required to match when done with at least 17 digits.

@tarekgh

This comment has been minimized.

Member

tarekgh commented Jul 29, 2017

@tannergooding we have another issue tracking the parsing part #1316 as it looks we have other issues than just using 17 significant digits. Roslyn already introduced a new parsing implementation which conforms to IEEE which we can consider porting it to net core.

mazong1123 added a commit to mazong1123/coreclr that referenced this issue Aug 13, 2017

tarekgh added a commit that referenced this issue Sep 13, 2017

Re-implemented the ecvt function. (#12894)
* Re-implemented the ecvt function.

Instead of leveraging snprintf, re-implement the ecvt function according to the paper: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf

Note:
1. This commit won't fix any existing bug.
2. This is a raw implementation of the paper. The performance on Linux only gain 10%. We could tune the performance further.

Fix #10651

* Resolve a cross platform header file issue.

Fix #10651

* Fixed a minor bug. Improved the performance.

Fix #10651

* Modified code according to code review feedback.

This commit fixed most of the issue found in code review. However, some of the feedback may not be involved due to either little performance improvement or need a POC.

Fix #10651

* Try to fix constexpr compile error on Windows.

Fix #10651

* Fixed a potential overflow bug in BigNum::Compare.

Fix #10651

* Improved multiply 10 operation.

Use shift and add operation to replace actual multiply operation.

Fix #10651

* Remove old _ecvt function.

Fix #10651

* Documented the reason why we do not need m+ and m-.

Fix #10651

* Changed exp > 0 to exp != 0 to remove any confusion.

exp should fall in 1 ~ 2046 for normalized value. Denormalized value has exp = 0.

Fix #10651

* Disable the _ecvt tests.

Fix #10651

* Removed _ecvt tests.

Fix #10651

* Re-implemented LogBase2.

Fix #10651

* Use DWORD and DWORD64 for _BitScanReverse and _BitScanReverse64

Fix #10651

* Fixed x86 compile issue for _BitScanReverse64

x86 does not support _BitScanReverse64 so we have to add additional shift operations to handle it.

Fix #10651

* Implemented BitScanReverse64 and BitScanReverse in pal.h

Fix #10651

* Remove the confusion comment which is unrelated to BitScanReverse.

Fix #10651

* Introduced wmemset to enhance the perf for 0.0

Fix #10651

* Improved the performance of converting 0.0.

Fix #10651

* Renamed ecvt to DoubleToNumberWorker.

Fix #10651

* Updated code according to the code review feedback.

Fix #10651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment