This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Adding single-precision math functions. #5492

Merged 7 commits on Oct 18, 2016


@tannergooding
Member

tannergooding commented Jun 4, 2016

Summary

This PR implements dotnet/corefx#1151, by providing scalar single-precision floating-point support for many of the trigonometric, logarithmic, and other common mathematical functions.

Worklist

  • Provide single-precision math functions in the PAL layer.
  • Provide single-precision math tests in the PAL layer
  • Provide single-precision math functions in COMSingle for the FCALL hookups
  • Provide single-precision math functions in mscorlib#System.MathF for the managed layer
  • Provide the appropriate definitions in the ecalllist for the VM layer
  • Provide the appropriate intrinsic implementations for specific single-precision math functions
  • Provide a set of unit tests over the new single-precision math APIs
  • Provide a set of performance tests over the new single-precision math APIs

New APIs

The new APIs provide feature-parity with the existing double-precision math functions provided by the framework.

public static class BitConverter
{
    public static float Int32BitsToSingle(int value);
    public static int SingleToInt32Bits(float value);
}
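
For readers less familiar with what the two BitConverter additions do, here is a minimal C++ sketch of the same idea: reinterpreting the 32 bits of a float as an integer and back, without any numeric conversion. The function names are hypothetical analogues chosen for illustration, not the PR's implementation, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical C++ analogue of BitConverter.Int32BitsToSingle.
// std::memcpy copies the four bytes as-is; no value conversion takes place.
float int32_bits_to_single(std::int32_t value) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "assumes a 32-bit float");
    float result;
    std::memcpy(&result, &value, sizeof(result));
    return result;
}

// Hypothetical C++ analogue of BitConverter.SingleToInt32Bits.
std::int32_t single_to_int32_bits(float value) {
    std::int32_t result;
    std::memcpy(&result, &value, sizeof(result));
    return result;
}

// Usage example: a non-NaN value survives a round trip through its bit pattern.
void check_round_trip(float value) {
    assert(int32_bits_to_single(single_to_int32_bits(value)) == value);
}
```

Using std::memcpy rather than a pointer cast sidesteps strict-aliasing concerns in C++; the managed code in the PR is free to use an unsafe pointer cast instead.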

public static partial class MathF
{
    public const float PI = 3.14159265f;
    public const float E = 2.71828183f;

    public static float Abs(float x);
    public static float Acos(float x);
    public static float Asin(float x);
    public static float Atan(float x);
    public static float Atan2(float y, float x);
    public static float Ceiling(float x);
    public static float Cos(float x);
    public static float Cosh(float x);
    public static float Exp(float x);
    public static float Floor(float x);
    public static float IEEERemainder(float x, float y);
    public static float Log(float x);
    public static float Log(float x, float y);
    public static float Log10(float x);
    public static float Max(float x, float y);
    public static float Min(float x, float y);
    public static float Pow(float x, float y);
    public static float Round(float x);
    public static float Round(float x, int digits);
    public static float Round(float x, int digits, System.MidpointRounding mode);
    public static float Round(float x, System.MidpointRounding mode);
    public static int Sign(float x);
    public static float Sin(float x);
    public static float Sinh(float x);
    public static float Sqrt(float x);
    public static float Tan(float x);
    public static float Tanh(float x);
    public static float Truncate(float x);
}

Perf Numbers

All performance tests are implemented as follows:

  • 100,000 iterations are executed
  • The times of all iterations are summed to compute the Total Time
  • The times of all iterations are averaged to compute the Average Time
  • A single iteration executes some simple operation, using the function under test, 5000 times
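
The methodology in the list above can be sketched roughly as follows. This is an illustrative C++ harness, not the test code from the PR: the names `BenchmarkResult` and `run_benchmark` are hypothetical, and the choice of `std::chrono::steady_clock` and of the "simple operation" (accumulating the function's result) are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical result type for this sketch.
struct BenchmarkResult {
    double total_seconds;    // sum of all iteration times
    double average_seconds;  // total divided by the iteration count
    float sink;              // accumulated output, so the calls are not optimized away
};

// Runs `iterations` timed iterations; each one calls `fn` `inner_calls` times.
template <typename Fn>
BenchmarkResult run_benchmark(std::size_t iterations, std::size_t inner_calls, Fn fn) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    float sink = 0.0f;
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        float value = 0.0f;
        for (std::size_t j = 0; j < inner_calls; ++j) {
            sink += fn(value);  // the "simple operation" using the function under test
            value += 0.001f;
        }
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return {total, total / static_cast<double>(iterations), sink};
}
```

With the figures from the list, a run would look like `run_benchmark(100000, 5000, [](float x) { return std::sin(x); })`.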

The execution time below is the Total Time for all 100,000 iterations, measured in seconds.

Hardware: Desktop w/ 3.7GHz Quad-Core A10-7850K (AMD) and 16GB RAM

Function   Improvement     Execution Time (Double)   Execution Time (Single)
Abs        0.199243555%    0.63752649s               0.63625626s
Acos       12.30220910%    11.5265412s               10.1085220s
Asin       18.66801808%    11.9472425s               9.71692911s
Atan       21.10350002%    10.9964683s               8.67582861s
Atan2      20.51327307%    24.3328097s               19.3413540s
Ceiling    12.91487191%    1.87116459s               1.62950608s
Cos        5.026665542%    7.19916547s               6.83728750s
Cosh       16.46166555%    13.5416170s               11.3124413s
Exp        33.67586387%    6.65578424s               4.41439140s
Floor      10.39208688%    1.74655247s               1.56504922s
Log        19.81117664%    6.42244806s               5.15008553s
Log10      18.40605725%    6.75118866s               5.50856101s
Pow        47.85595440%    31.8820155s               16.6245727s
Round      0.976398142%    4.22620632s               4.18494172s
Sin        15.49539339%    5.98022268s               5.05356365s
Sinh       17.96609899%    14.6242270s               11.9968239s
Sqrt       4.676516651%    2.51281945s               2.39530703s
Tan        30.33470555%    9.07290178s               6.32066374s
Tanh       0.108182099%    8.12724112s               8.11844890s
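
The percentage figures in the table are consistent with the time saved expressed relative to the double-precision time. That formula is inferred from the numbers rather than stated in the PR, and the function name below is hypothetical.

```cpp
#include <cassert>

// Percentage of the double-precision time saved by the single-precision version.
// Inferred from the table's figures; not taken from the PR's test code.
double improvement_percent(double double_seconds, double single_seconds) {
    assert(double_seconds > 0.0);
    return (double_seconds - single_seconds) / double_seconds * 100.0;
}
```

For the Pow row, `improvement_percent(31.8820155, 16.6245727)` comes out at roughly 47.86.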

I believe some extra perf will be squeezed out when the intrinsics (such as CORINFO_INTRINSIC_Sqrt) are properly implemented in the VM layer for single-precision values. Without such functionality, it falls back to the double-precision functionality (extra precision, reduced performance) for certain calls.

@@ -446,6 +446,16 @@ unsafe public static double ToDouble (byte[] value, int startIndex)
    [SecuritySafeCritical]
    public static unsafe double Int64BitsToDouble(long value) {
        return *((double*)&value);
    }
Member Author

This line is the 'removed' one. Please note that it has not actually been removed 😄

@tannergooding
Member Author

Is the arm_emulator_cross_release_ubuntu_prtest known to be flaky? http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_release_ubuntu_prtest/459 had JIT/Regression/CLR-x86-JIT/V1-M12-Beta2/b80045/b80045 fail, but the test code doesn't look at all related to the changes made here.

@tannergooding
Member Author

Pretty much all of the intrinsic hookup work, as I understand it, is in the valuenum.cpp file. However, there does not appear to be much (if any) documentation on the ValueNumStore.

It would be really great if I could get an overview of the functionality here (or be pointed to the documentation). My main concern is ensuring that any changes don't break back-compat.

As I understand it, the compiler will end up determining that some operation is intrinsic, at which point it will attempt to break apart and evaluate the function.

When evaluating the function, if the arguments are constant, there is folding that occurs (provided precision loss is not a concern).

For functions which do not have constant arguments, they do a lookup to see if the value has already been computed (by checking if it exists in the Value Number store). If the value exists, that is returned; otherwise, a new chunk is created which executes the function defined next to the intrinsic in the external call list.

It appears as though the ValueNumStore is keyed off the function id (so VNF_Acos for example, is the function id for CORINFO_INTRINSIC_Acos) and a value number which identifies the expression. So, while two implementations can share an intrinsic (such as Math.Acos and MathF.Acos both sharing CORINFO_INTRINSIC_Acos), it is important that the function ids be unique (so we should have VNF_Acos and VNF_AcosF, for example).

Is this roughly accurate?

@tannergooding
Member Author

@mellinoe, could you direct me towards the appropriate people to answer the above question (asking you since you were the last one assigned to the proposal)?

@tannergooding
Member Author

test Linux ARM Emulator Cross Debug Build please
Build timed out: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_debug_ubuntu_prtest/472

@mellinoe

I believe that the ARM Emulator runs have been flaky in the past, but I don't know their current status. @jkotas, could you help with some of the other questions above?

@jkotas
Member

jkotas commented Sep 28, 2016

cc @dotnet/jit-contrib for valuenum.cpp questions
cc @janvorli for runtime and PAL part

It may be a good idea to add the methods in one PR, and do the JIT optimizations in a follow-up PR.

@tannergooding
Member Author

@jkotas, if that is possible that would be fine with me. It would also allow me to add the Perf tests in a separate PR, as they will be dependent on updating System.Runtime.Extensions with the new method contracts.

@CarolEidt

We currently don't have documentation for value numbering. @briansull @JosephTremoulet @erozenfeld would be good candidates to look into the value numbering implications.

@tannergooding
Member Author

test Linux ARM Emulator Cross Debug Build please
segfault in JIT/Regression/CLR-x86-JIT/V1-M11-Beta1/b36332/b36332: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_debug_ubuntu_prtest/476/

@JosephTremoulet

> When evaluating the function, if the arguments are constant, there is folding that occurs (provided precision loss is not a concern).
>
> For functions which do not have constant arguments, they do a lookup to see if the value has already been computed (by checking if it exists in the Value Number store). If the value exists, that is returned; otherwise, a new chunk is created which executes the function defined next to the intrinsic in the external call list.
>
> It appears as though the ValueNumStore is keyed off the function id (so VNF_Acos for example, is the function id for CORINFO_INTRINSIC_Acos) and a value number which identifies the expression.

That all sounds right to me.

> while two implementations can share an intrinsic (such as Math.Acos and MathF.Acos both sharing CORINFO_INTRINSIC_Acos), it is important that the function ids be unique (so we should have VNF_Acos and VNF_AcosF, for example)

I'm not sure that follows; I'd expect the float and double arguments to have different value numbers because of their different types. Have you written tests / looked at IR for cases where you'd be worried about this sort of collision?

@tannergooding
Member Author

> I'm not sure that follows; I'd expect the float and double arguments to have different value numbers because of their different types. Have you written tests / looked at IR for cases where you'd be worried about this sort of collision?

@JosephTremoulet, I had made this assumption based on the behavior of CORINFO_INTRINSIC_Round, which treats TYP_DOUBLE (VNF_RoundDouble), TYP_FLOAT (VNF_RoundFloat), and TYP_INT (VNF_RoundInt) differently.

If that is not the case, then it certainly makes some things simpler to implement 😄

@JosephTremoulet

> CORINFO_INTRINSIC_Round, which treats TYP_DOUBLE (VNF_RoundDouble), TYP_FLOAT (VNF_RoundFloat), and TYP_INT (VNF_RoundInt) differently.

Ah. Yeah, if we're already doing that for a different intrinsic, I agree with your original assessment, it's best to follow suit 😦.

@JosephTremoulet

@tannergooding, I just took a closer look, and I think it's ok to share VNF_ funcs for overloads. It looks like the argument to round is always a double, and that we're distinguishing its return type with the three different VNF_ enum values (which makes me wonder what source generates that intrinsic, since you can't overload on return type at source...).

With the functions you're talking about, the different overloads have different argument types (and return types that differ from the other overloads [but agree with the argument]). So I think the (func ID, argument) pair is unambiguous for these in a way that it would be ambiguous in the round case without distinguishing the IDs.

I also double-checked, and we do store each VN's type with it (or really encode it in it) -- so in the methods that need to parse this stuff (e.g. EvalMathFuncUnary), you can both extract the type from the argument valnum and you've explicitly been passed in the result type in the typ parameter. This last point makes me unsure why even the kind of ambiguity that round has would be problematic (since that typ parameter gets passed down into VNForFunc and factors into the hashing), but regardless, it looks to me like the code is expecting to support the type of overloading that you want here.
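
The argument in this comment can be illustrated with a toy model. The sketch below is not the JIT's ValueNumStore: every name in it (`ToyValueNumStore`, `new_value`, `vn_for_func`, the `Func` and `Type` enums) is hypothetical, and the real store encodes types and hashes keys differently. It only shows why a key made of (function ID, result type, argument value number) keeps a float overload and a double overload apart even when they share one function ID.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical stand-ins for the VNF_ function IDs and TYP_ types.
enum class Func { Acos, Sqrt };
enum class Type { Float, Double };

using ValueNum = int;

class ToyValueNumStore {
public:
    // Returns a fresh value number for an opaque (non-constant) value of the given type.
    ValueNum new_value(Type type) {
        const ValueNum vn = next_++;
        types_[vn] = type;
        return vn;
    }

    // Looks up (function, result type, argument VN); allocates a new VN on a miss.
    ValueNum vn_for_func(Type result_type, Func func, ValueNum arg) {
        const auto key = std::make_tuple(func, result_type, arg);
        const auto found = funcs_.find(key);
        if (found != funcs_.end()) {
            return found->second;  // same expression seen before: reuse its VN
        }
        const ValueNum vn = new_value(result_type);
        funcs_[key] = vn;
        return vn;
    }

    // Each value number carries its type, so callers can recover it later.
    Type type_of(ValueNum vn) const { return types_.at(vn); }

private:
    ValueNum next_ = 1;
    std::map<ValueNum, Type> types_;
    std::map<std::tuple<Func, Type, ValueNum>, ValueNum> funcs_;
};
```

In this model a float argument and a double argument always have different value numbers, so `Func::Acos` applied to each yields distinct results without needing a separate function ID per precision.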


if (!_isnanf(snan))
{
    Fail("_isnanf() failed to identify %I64x as NaN!\n", lsnan);
Member

This is the wrong format; it should be %I32. There are three other occurrences of this issue below.

Member Author

Fixed.
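
The issue raised in this review thread is a width mismatch between the format specifier and the argument: a 64-bit hex specifier applied to the 32-bit pattern of a float. The sketch below shows a width-matched alternative using the standard `PRIx32` macro from `<cinttypes>`. That choice is an assumption made for portability in this example; the PAL test itself uses the `%I32`-style prefix mentioned by the reviewer, and the helper name `float_bits_to_hex` is hypothetical.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Formats the raw bits of a float as 8 hex digits using a 32-bit specifier.
std::string float_bits_to_hex(float value) {
    std::uint32_t bits;
    static_assert(sizeof(bits) == sizeof(value), "assumes a 32-bit float");
    std::memcpy(&bits, &value, sizeof(bits));
    char buffer[9];  // 8 hex digits plus the terminating NUL
    const int written = std::snprintf(buffer, sizeof(buffer), "%08" PRIx32, bits);
    assert(written == 8);
    (void)written;  // avoids an unused-variable warning when asserts are disabled
    return std::string(buffer);
}
```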

@tannergooding
Member Author

All tests passing. Is there any other feedback here?

Additionally, should the remaining three work items be completed as part of this PR, or should bugs be logged to track them being completed in separate PRs?

The three remaining work items are:

  • Provide the appropriate intrinsic implementations for specific single-precision math functions
  • Provide a set of unit tests over the new single-precision math APIs
  • Provide a set of performance tests over the new single-precision math APIs

@janvorli merged commit 6057b18 into dotnet:master on Oct 18, 2016
@janvorli
Member

@tannergooding thank you for all this work!

@tannergooding
Member Author

@janvorli. Thanks for the merge!

I have logged the following bugs to track the remaining three work items (and have self-assigned them for the time being):
https://github.com/dotnet/coreclr/issues/7689
https://github.com/dotnet/coreclr/issues/7690
https://github.com/dotnet/coreclr/issues/7691

@mburbea

mburbea commented Oct 18, 2016

Isn't it somewhat problematic to compile with /fp:fast rather than /fp:strict? Does this mean that depending on your hardware you could get different results?

As far as I understand, RyuJIT uses xmm registers to avoid the chance of unpredictable results. I wouldn't mind if we went the Java route and added a StrictMath class, if the perf difference is really worth the change.

@tannergooding deleted the math branch October 18, 2016 14:10
@mikedn

mikedn commented Oct 18, 2016

@mburbea What exactly do you expect /fp:strict to achieve here? On x64 it shouldn't matter; on x86 you'll end up using the double-precision versions of functions such as sin and cos, which seems exactly the opposite of this change's intent.

@janvorli
Member

@tannergooding we have found that the change breaks NGEN. The issue is that the Abs, Min, Max and Sign functions for float exist in the Math class too, and their native implementation in the runtime is the same, so the linker ends up folding them and violates the invariant that each method has to have a unique entry point.

So did you know that these methods already exist for float in Math, and add them just to make MathF "complete"?

It seems we can fix the problem in two ways. One is to remove these from MathF; the other is to keep them, but implement them in managed code as calls to their Math counterparts.

@KrzysztofCwalina, @jkotas do you have any opinion on those two options?

@tannergooding
Member Author

I added them to make it complete (and they were part of the proposed API change as such).

I believe we should keep the APIs and have them call their legacy counterpart. I think a user who is working with System.MathF would prefer to have all of their API calls coming from the same location, if possible (it gets confusing having to mix System.MathF and System.Math depending on whether you are calling a new or old API).

@tannergooding
Member Author

I implemented such a fix here: #7721

Let me know if you opt to go the other route.

sergign60 pushed a commit to sergign60/coreclr that referenced this pull request Nov 14, 2016
* Adding single-precision math functions to floatsingle
* Adding single-precision math functions to the PAL layer.
* Adding single-precision math tests to the PAL layer.
* Adding single-precision math functions to mscorlib.
* Adding single-precision math function support to the vm.
* Updating floatsingle.cpp to define a _isnanf macro for Windows ARM.
@karelz modified the milestone: 2.0.0 on Aug 28, 2017
@ghost

ghost commented Jan 30, 2018

Good, but why not rewrite the Math class methods to have overloaded versions with float parameters instead of creating a different class? Until now I didn't even know where that MathF class was!

@tannergooding
Member Author

@MohammadHamdyGhanem, because it would be a breaking change for recompiled code.

Math.Sqrt(4) today resolves to Math.Sqrt(double) (because an implicit conversion to double exists, and the only overload is for double). However, if you add a new Math.Sqrt(float) overload, overload resolution comes into play. int can be implicitly converted to either float or double, but float is preferred, so the recompiled code would call Math.Sqrt(float), which can cause an observable difference in results for certain inputs.
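
The "observable difference in results" can be seen by computing the same square root in both precisions. The sketch below is C++ and does not reproduce C# overload resolution (in C++ the equivalent call with an int argument would behave differently); it only shows that the single-precision and double-precision results disagree for some inputs. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Returns true when computing sqrt in single precision gives a different
// value than computing it in double precision for the same integer input.
bool sqrt_differs_between_precisions(int input) {
    assert(input >= 0);
    const float single_result = std::sqrt(static_cast<float>(input));
    const double double_result = std::sqrt(static_cast<double>(input));
    return static_cast<double>(single_result) != double_result;
}
```

An input of 4 gives exactly 2 in both precisions, so nothing changes there; an input of 2 gives results that differ after roughly the seventh significant digit, which is the kind of silent change a recompile would introduce.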

@ghost

ghost commented Jan 30, 2018

So, what about Math.SqrtF instead of MathF.Sqrt?

@tannergooding
Member Author

See response in the other thread on CoreFX.
