Adding single-precision math functions. #5492

Merged
7 commits merged into dotnet:master from tannergooding:math on Oct 18, 2016

Conversation

@tannergooding (Member) commented Jun 4, 2016

Summary

This PR implements dotnet/corefx#1151 by providing scalar single-precision floating-point support for many of the trigonometric, logarithmic, and other common mathematical functions.

Worklist

  • Provide single-precision math functions in the PAL layer.
  • Provide single-precision math tests in the PAL layer.
  • Provide single-precision math functions in COMSingle for the FCALL hookups.
  • Provide single-precision math functions in mscorlib#System.MathF for the managed layer.
  • Provide the appropriate definitions in the ecalllist for the VM layer.
  • Provide the appropriate intrinsic implementations for specific single-precision math functions.
  • Provide a set of unit tests over the new single-precision math APIs.
  • Provide a set of performance tests over the new single-precision math APIs.

New APIs

The new APIs provide feature-parity with the existing double-precision math functions provided by the framework.

public static class BitConverter
{
    public static float Int32BitsToSingle(int value);
    public static int SingleToInt32Bits(float value);
}

public static partial class MathF
{
    public const float PI = 3.14159265f;
    public const float E = 2.71828183f;

    public static float Abs(float x);
    public static float Acos(float x);
    public static float Asin(float x);
    public static float Atan(float x);
    public static float Atan2(float y, float x);
    public static float Ceiling(float x);
    public static float Cos(float x);
    public static float Cosh(float x);
    public static float Exp(float x);
    public static float Floor(float x);
    public static float IEEERemainder(float x, float y);
    public static float Log(float x);
    public static float Log(float x, float y);
    public static float Log10(float x);
    public static float Max(float x, float y);
    public static float Min(float x, float y);
    public static float Pow(float x, float y);
    public static float Round(float x);
    public static float Round(float x, int digits);
    public static float Round(float x, int digits, System.MidpointRounding mode);
    public static float Round(float x, System.MidpointRounding mode);
    public static int Sign(float x);
    public static float Sin(float x);
    public static float Sinh(float x);
    public static float Sqrt(float x);
    public static float Tan(float x);
    public static float Tanh(float x);
    public static float Truncate(float x);
}
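For reference, a quick sketch of how the new surface is intended to be used (illustrative only; not part of the PR):

```csharp
using System;

class MathFUsageExample
{
    static void Main()
    {
        float angle = MathF.PI / 4.0f;
        float s = MathF.Sin(angle);
        float c = MathF.Cos(angle);
        float magnitude = MathF.Sqrt(s * s + c * c);   // ~1.0f

        // Round-trip a float through its raw IEEE 754 bit pattern,
        // mirroring the existing DoubleToInt64Bits/Int64BitsToDouble pair.
        int bits = BitConverter.SingleToInt32Bits(magnitude);
        float roundTripped = BitConverter.Int32BitsToSingle(bits);

        Console.WriteLine($"sin={s} cos={c} magnitude={magnitude} roundTripped={roundTripped}");
    }
}
```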

Perf Numbers

All performance tests are implemented as follows:

  • 100,000 iterations are executed.
  • The times of all iterations are aggregated to compute the Total Time.
  • The times of all iterations are averaged to compute the Average Time.
  • A single iteration executes some simple operation, using the function under test, 5000 times.

The execution time below is the Total Time for all 100,000 iterations, measured in seconds; a sketch of this harness shape follows.
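A minimal sketch of a harness with that shape (type and member names, and the operation under test, are hypothetical; the real tests live in the CoreCLR performance test tree):

```csharp
using System;
using System.Diagnostics;

static class MathFPerfSketch
{
    const int Iterations = 100000;       // number of timed iterations
    const int OpsPerIteration = 5000;    // simple operations per iteration

    static void Main()
    {
        double totalSeconds = 0.0;
        float sink = 0.0f;

        for (int i = 0; i < Iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            for (int j = 0; j < OpsPerIteration; j++)
            {
                sink += MathF.Sqrt(j);   // the function under test
            }
            sw.Stop();
            totalSeconds += sw.Elapsed.TotalSeconds;
        }

        Console.WriteLine($"Total Time:   {totalSeconds}s");
        Console.WriteLine($"Average Time: {totalSeconds / Iterations}s");
        Console.WriteLine(sink);         // keep the result live so the loop is not optimized away
    }
}
```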

Hardware: Desktop w/ 3.7GHz Quad-Core A10-7850K (AMD) and 16GB RAM

| Function | Improvement | Execution Time - Double | Execution Time - Single |
| -------- | ----------- | ----------------------- | ----------------------- |
| Abs | 0.199243555% | 0.63752649s | 0.63625626s |
| Acos | 12.30220910% | 11.5265412s | 10.1085220s |
| Asin | 18.66801808% | 11.9472425s | 9.71692911s |
| Atan | 21.10350002% | 10.9964683s | 8.67582861s |
| Atan2 | 20.51327307% | 24.3328097s | 19.3413540s |
| Ceiling | 12.91487191% | 1.87116459s | 1.62950608s |
| Cos | 5.026665542% | 7.19916547s | 6.83728750s |
| Cosh | 16.46166555% | 13.5416170s | 11.3124413s |
| Exp | 33.67586387% | 6.65578424s | 4.41439140s |
| Floor | 10.39208688% | 1.74655247s | 1.56504922s |
| Log | 19.81117664% | 6.42244806s | 5.15008553s |
| Log10 | 18.40605725% | 6.75118866s | 5.50856101s |
| Pow | 47.85595440% | 31.8820155s | 16.6245727s |
| Round | 0.976398142% | 4.22620632s | 4.18494172s |
| Sin | 15.49539339% | 5.98022268s | 5.05356365s |
| Sinh | 17.96609899% | 14.6242270s | 11.9968239s |
| Sqrt | 4.676516651% | 2.51281945s | 2.39530703s |
| Tan | 30.33470555% | 9.07290178s | 6.32066374s |
| Tanh | 0.108182099% | 8.12724112s | 8.11844890s |

I believe some extra perf can be squeezed out once the intrinsics (such as CORINFO_INTRINSIC_Sqrt) are properly implemented in the VM layer for single-precision values. Without that support, certain calls fall back to the double-precision implementation (extra precision, reduced performance).

@@ -446,6 +446,16 @@ unsafe public static double ToDouble (byte[] value, int startIndex)
[SecuritySafeCritical]
public static unsafe double Int64BitsToDouble(long value) {
    return *((double*)&value);
}

@tannergooding (Author, Member) commented Sep 28, 2016

This line is the 'removed' one. Please note that it has not actually been removed 😄

@tannergooding force-pushed the tannergooding:math branch from 12696e8 to 7409063 on Sep 28, 2016

@tannergooding (Member, Author) commented Sep 28, 2016

Is the arm_emulator_cross_release_ubuntu_prtest known to be flaky? http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_release_ubuntu_prtest/459 had JIT/Regression/CLR-x86-JIT/V1-M12-Beta2/b80045/b80045 fail, but the test code doesn't look at all related to the changes made here.

@tannergooding (Member, Author) commented Sep 28, 2016

Pretty much all of the intrinsic hookup work, as I understand it, is in the valuenum.cpp file. However, there does not appear to be much (if any) documentation on the ValueNumStore.

It would be really great if I could get an overview of the functionality here (or be pointed to the documentation). My main concern here is ensuring any changes don't break back-compat.

As I understand it, the compiler will end up determining that some operation is intrinsic, at which point it will attempt to break apart and evaluate the function.

When evaluating the function, if the arguments are constant, there is folding that occurs (provided precision loss is not a concern).

For functions which do not have constant arguments, they do a lookup to see if the value has already been computed (by checking if it exists in the Value Number store). If the value exists, that is returned; otherwise, a new chunk is created which executes the function defined next to the intrinsic in the external call list.

It appears as though the ValueNumStore is keyed off the function id (so VNF_Acos for example, is the function id for CORINFO_INTRINSIC_Acos) and a value number which identifies the expression. So, while two implementations can share an intrinsic (such as Math.Acos and MathF.Acos both sharing CORINFO_INTRINSIC_Acos), it is important that the function ids be unique (so we should have VNF_Acos and VNF_AcosF, for example).

Is this roughly accurate?
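To make the question concrete, here is a purely conceptual model of that lookup-or-create behavior (this is not the JIT's actual valuenum.cpp code; the enum values and types are simplified for illustration):

```csharp
using System;
using System.Collections.Generic;

enum VNFunc { VNF_Acos, VNF_AcosF /* , ... */ }

// Conceptual only: a value number is reused when the same
// (function id, argument value number) pair has been seen before;
// otherwise a fresh value number is handed out.
sealed class ValueNumStoreModel
{
    private readonly Dictionary<(VNFunc Func, int ArgVN), int> _map =
        new Dictionary<(VNFunc, int), int>();
    private int _nextVN = 1;

    public int VNForFunc(VNFunc func, int argVN)
    {
        if (_map.TryGetValue((func, argVN), out int existing))
            return existing;             // expression already numbered: reuse it

        int vn = _nextVN++;              // new expression: allocate a value number
        _map.Add((func, argVN), vn);
        return vn;
    }
}
```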

@tannergooding (Member, Author) commented Sep 28, 2016

@mellinoe, could you direct me towards the appropriate people to answer the above question (asking you since you were the last one assigned to the proposal)?

@tannergooding (Member, Author) commented Sep 28, 2016

@mellinoe (Contributor) commented Sep 28, 2016

I believe that the ARM Emulator runs have been flaky in the past, but I don't know the current status of them. @jkotas , could you help with some of the other questions above?

@jkotas (Member) commented Sep 28, 2016

cc @dotnet/jit-contrib for the valuenum.cpp questions
cc @janvorli for the runtime and PAL part

It may be a good idea to add the methods in one PR, and do the JIT optimizations in a follow-up PR.

@tannergooding (Member, Author) commented Sep 28, 2016

@jkotas, if that is possible, that would be fine with me. It would also allow me to add the perf tests in a separate PR, as they will be dependent on updating System.Runtime.Extensions with the new method contracts.

@CarolEidt (Member) commented Sep 28, 2016

We currently don't have documentation for value numbering. @briansull @JosephTremoulet @erozenfeld would be good candidates to look into the value numbering implications.

@tannergooding (Member, Author) commented Sep 28, 2016

test Linux ARM Emulator Cross Debug Build please
segfault in JIT/Regression/CLR-x86-JIT/V1-M11-Beta1/b36332/b36332: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_debug_ubuntu_prtest/476/

@JosephTremoulet (Contributor) commented Sep 29, 2016

> When evaluating the function, if the arguments are constant, there is folding that occurs (provided precision loss is not a concern).
>
> For functions which do not have constant arguments, they do a lookup to see if the value has already been computed (by checking if it exists in the Value Number store). If the value exists, that is returned; otherwise, a new chunk is created which executes the function defined next to the intrinsic in the external call list.
>
> It appears as though the ValueNumStore is keyed off the function id (so VNF_Acos for example, is the function id for CORINFO_INTRINSIC_Acos) and a value number which identifies the expression.

That all sounds right to me.

> while two implementations can share an intrinsic (such as Math.Acos and MathF.Acos both sharing CORINFO_INTRINSIC_Acos), it is important that the function ids be unique (so we should have VNF_Acos and VNF_AcosF, for example)

I'm not sure that follows; I'd expect the float and double arguments to have different value numbers because of their different types. Have you written tests / looked at IR for cases where you'd be worried about this sort of collision?

@tannergooding (Member, Author) commented Sep 29, 2016

> I'm not sure that follows; I'd expect the float and double arguments to have different value numbers because of their different types. Have you written tests / looked at IR for cases where you'd be worried about this sort of collision?

@JosephTremoulet, I had made this assumption based on the behavior of CORINFO_INTRINSIC_Round, which treats TYP_DOUBLE (VNF_RoundDouble), TYP_FLOAT (VNF_RoundFloat), and TYP_INT (VNF_RoundInt) differently.

If that is not the case, then it certainly makes some things simpler to implement 😄

@JosephTremoulet (Contributor) commented Sep 29, 2016

> CORINFO_INTRINSIC_Round, which treats TYP_DOUBLE (VNF_RoundDouble), TYP_FLOAT (VNF_RoundFloat), and TYP_INT (VNF_RoundInt) differently.

Ah. Yeah, if we're already doing that for a different intrinsic, I agree with your original assessment, it's best to follow suit 😦.

@JosephTremoulet (Contributor) commented Sep 30, 2016

@tannergooding, I just took a closer look, and I think it's ok to share VNF_ funcs for overloads. It looks like the argument to round is always a double, and that we're distinguishing its return type with the three different VNF_ enum values (which makes me wonder what source generates that intrinsic, since you can't overload on return type at source...).

With the functions you're talking about, the different overloads have different argument types (and return types that differ from the other overloads [but agree with the argument]). So I think the (func ID, argument) pair is unambiguous for these in a way that it would be ambiguous in the round case without distinguishing the IDs.

I also double-checked, and we do store each VN's type with it (or really encode it in it) -- so in the methods that need to parse this stuff (e.g. EvalMathFuncUnary), you can both extract the type from the argument valnum and you've explicitly been passed in the result type in the typ parameter. This last point makes me unsure why even the kind of ambiguity that round has would be problematic (since that typ parameter gets passed down into VNForFunc and factors into the hashing), but regardless, it looks to me like the code is expecting to support the type of overloading that you want here.


if (!_isnanf(snan))
{
    Fail("_isnanf() failed to identify %I64x as NaN!\n", lsnan);
}

@janvorli (Member) commented Sep 30, 2016

This is the wrong format; it should be %I32. There are three other occurrences of this issue below.

@tannergooding (Author, Member) commented Oct 18, 2016

Fixed.

@tannergooding force-pushed the tannergooding:math branch from 7409063 to 7756fa6 on Oct 18, 2016

@tannergooding (Member, Author) commented Oct 18, 2016

All tests passing. Is there any other feedback here?

Additionally, should the remaining three work items be completed as part of this PR, or should bugs be logged to track them being completed in separate PRs?

The three remaining work items are:

  • Provide the appropriate intrinsic implementations for specific single-precision math functions
  • Provide a set of unit tests over the new single-precision math APIs
  • Provide a set of performance tests over the new single-precision math APIs

@janvorli merged commit 6057b18 into dotnet:master on Oct 18, 2016

14 checks passed:

  • CentOS7.1 x64 Debug Build and Test: Build finished.
  • FreeBSD x64 Checked Build: Build finished.
  • Linux ARM Emulator Cross Debug Build: Build finished.
  • Linux ARM Emulator Cross Release Build: Build finished.
  • OSX x64 Checked Build and Test: Build finished.
  • Ubuntu x64 Checked Build and Test: Build finished.
  • Ubuntu x64 Formatting: Build finished.
  • Windows_NT arm Cross Debug Build: Build finished.
  • Windows_NT arm Cross Release Build: Build finished.
  • Windows_NT x64 Debug Build and Test: Build finished.
  • Windows_NT x64 Formatting: Build finished.
  • Windows_NT x64 Release Priority 1 Build and Test: Build finished.
  • Windows_NT x86 legacy_backend Checked Build and Test: Build finished.
  • Windows_NT x86 ryujit Checked Build and Test: Build finished.
@janvorli (Member) commented Oct 18, 2016

@tannergooding thank you for all this work!

@tannergooding (Member, Author) commented Oct 18, 2016

@janvorli, thanks for the merge!

I have logged the following bugs to track the remaining three work items (and have self-assigned them for the time being):
#7689
#7690
#7691

@mburbea commented Oct 18, 2016

Isn't it somewhat problematic to compile with /fp:fast rather than /fp:strict? Does this mean that, depending on your hardware, you could get different results?

As far as I understand, RyuJIT uses xmm registers to avoid the chance of unpredictable results. I wouldn't mind if we added the Java route of a StrictMath class, if the perf difference is really worth the change.

@tannergooding deleted the tannergooding:math branch on Oct 18, 2016

@mikedn (Contributor) commented Oct 18, 2016

@mburbea What exactly do you expect /fp:strict to achieve here? On x64 it shouldn't matter; on x86 you would end up using double-precision versions of functions such as sin and cos, which seems exactly the opposite of this change's intent.

@janvorli (Member) commented Oct 19, 2016

@tannergooding, we have found that the change breaks NGEN. The issue is that the Abs, Min, Max, and Sign functions for float exist in the Math class too, and their native implementation in the runtime is the same, so the linker ends up folding those and violating the invariant that each method has to have a unique entry point.

So, did you know that these methods already exist for float in Math, and did you add them just to make MathF "complete"?

It seems we can fix the problem in two ways. One is to remove these from MathF; the other is to keep them, but implement them in managed code as calls to their Math counterparts.

@KrzysztofCwalina, @jkotas do you have any opinion on those two options?

@tannergooding (Member, Author) commented Oct 19, 2016

I added them to make it complete (and they were part of the proposed API change as such).

I believe we should keep the APIs and have them call their legacy counterparts. I think a user who is working with System.MathF would prefer to have all of their API calls come from the same location, if possible (it gets confusing having to mix System.MathF and System.Math depending on whether you are calling a new or old API).
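A minimal sketch of that option (simplified; the actual change is in #7721 and may differ):

```csharp
public static partial class MathF
{
    // Forward to the existing Math overloads that already take float, so the
    // runtime keeps a single native entry point per method and NGEN no longer
    // folds two managed methods onto the same native implementation.
    public static float Abs(float x) => Math.Abs(x);
    public static float Max(float x, float y) => Math.Max(x, y);
    public static float Min(float x, float y) => Math.Min(x, y);
    public static int Sign(float x) => Math.Sign(x);
}
```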

@tannergooding (Member, Author) commented Oct 19, 2016

I implemented such a fix here: #7721

Let me know if you opt to go the other route.

sergign60 added a commit to sergign60/coreclr that referenced this pull request Nov 14, 2016

Adding single-precision math functions. (dotnet#5492)
* Adding single-precision math functions to floatsingle
* Adding single-precision math functions to the PAL layer.
* Adding single-precision math tests to the PAL layer.
* Adding single-precision math functions to mscorlib.
* Adding single-precision math function support to the vm.
* Updating floatsingle.cpp to define a _isnanf macro for Windows ARM.

@karelz modified the milestone: 2.0.0 on Aug 28, 2017

@ghost commented Jan 30, 2018

Good, but why not rewrite the Math class methods to have overloaded versions with float parameters, instead of creating a different class? Until now I didn't even know that the MathF class existed!

@tannergooding (Member, Author) commented Jan 30, 2018

@MohammadHamdyGhanem, because it would be a breaking change for recompiled code.

Math.Sqrt(4) today resolves to Math.Sqrt(double) (because an implicit conversion to double exists, and the only overload is for double). However, if you add a new Math.Sqrt(float) overload, overload resolution comes into play. int can be implicitly converted to either float or double, but float is preferred, so the recompiled code would call Math.Sqrt(float), which can cause an observable difference in results for certain inputs.
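For illustration, a small standalone example of that resolution change (the Sqrt(float) overload here is hypothetical; it does not exist on System.Math):

```csharp
using System;

static class OverloadResolutionExample
{
    // Mirrors today's situation: the only overload takes double,
    // so Sqrt(4) binds here via the implicit int -> double conversion.
    static double Sqrt(double x) { Console.WriteLine("double overload"); return Math.Sqrt(x); }

    // Hypothetical float overload. Once it exists, int -> float is the better
    // conversion, so recompiled callers of Sqrt(4) silently bind here instead.
    static float Sqrt(float x) { Console.WriteLine("float overload"); return MathF.Sqrt(x); }

    static void Main()
    {
        var result = Sqrt(4);   // binds to Sqrt(float) because both overloads are present
        Console.WriteLine(result);
    }
}
```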

@ghost commented Jan 30, 2018

So, what about Math.SqrtF instead of MathF.Sqrt?

@tannergooding (Member, Author) commented Jan 30, 2018

See response in the other thread on CoreFX.
