Stop treating sine/cosine/round as target intrinsics on x86. #7023

pgavlin · 2016-09-01T13:05:47Z

These intrinsics were supported by JIT32 using the corresponding
x87 instructions. There are a few downsides to doing the same in
RyuJIT, however, because it uses SSE for all other floating-point
math:

The mix of precisions used by the combination of SSE + x87
instructions matches neither the JIT32 nor the RyuJIT/x64
behavior.
Using the x87 instructions is more expensive, as the result of
the instruction may need to be moved via memory to an SSE
register.
The RA would need to be udpated in order to better understand
that each x87 instruction may require a spill temp in order to
retrieve the result.

Issue #7097 tracks investigating a custom calling convention for
the appropriate helper calls if necessasry.

pgavlin · 2016-09-01T13:06:44Z

@dotnet/jit-contrib @CarolEidt @BruceForstall PTAL

CarolEidt · 2016-09-01T15:28:44Z

src/jit/lsra.cpp

@@ -2641,6 +2644,26 @@ regMaskTP LinearScan::getKillSetForNode(GenTree* tree)
            }
            break;

+#ifdef _TARGET_X86_
+        case GT_INTRINSIC:
+            if (compiler->compFloatingPointUsed)


Minor question - why did you add this check; does it really add value?

This is the same check performed above at line 2603: https://github.com/dotnet/coreclr/pull/7023/files#diff-93fb9832a03a9783e2a415b581fcb45cL2603. I don't think it's necessary, so I'm happy to remove it.

It's your call; I don't feel strongly about it.

CarolEidt · 2016-09-01T15:36:31Z

mikedn · 2016-09-01T16:19:27Z

Wouldn't be better to just do what x64 is doing (helper calls)? What we'll get now is neither the old JIT32 bit behavior (due to the use of SSE in the new JIT32) nor the x64 behavior (due to the use of x87 intrinsics in the new JIT32). Now that both x86 and x64 are using SSE it would be better to use the same transcendental math helpers to avoid precision differences between platforms.

pgavlin · 2016-09-01T16:47:42Z

@mikedn: that's a fair point. Depending on the calling convention for the helpers, we may even get better codegen out of them. @dotnet/jit-contrib thoughts?

CarolEidt · 2016-09-01T17:07:09Z

I think that this approach is the right one to get us unblocked on these.
I think (though I'm not really sure) that enabling the helpers requires some configuration changes in the VM. That said, we need to decide whether we are looking for more compatibility with the 32-bit desktop version of .NET or the 64-bit (desktop or coreclr) version. I think the latter provides better long-term compatibility, so IMO we should consider moving in that direction.
Also, I don't think we have an SSE-based calling convention for 32-bit, even for helper calls. That's something we'd want to add (probably for other reasons as well). Pat - perhaps you could open an issue to track this?

tannergooding · 2016-09-01T18:02:52Z

There were some decent perf gains when I modified the x64 calls to use the CRT implementation of 'cos', 'sin', and 'tan' (#4847), this also resulted in no loss of precision for the existing tests. This was because all of the existing tests were only checking to within a few decimal places (9 IIRC). I ended up adding a number of new tests that improve the precision checks in the tests and coverage overall.

These same perf gains would likely be seen if brought over to the x86 platform and should also not result in any unreasonable loss in precision (and in some cases, may result in better precision due to limitations of the fsin, fcos, and ftan instructions).

sivarv · 2016-09-01T19:27:18Z

Depending on the calling convention for the helpers, we may even get better codegen out of them.

If I understand correctly, we need to bring in new CRT helpers that use SSE calling convention to x86 and implement VM side helpers that in turn calls into CRT helpers. This needs to be done so that VM supports both Legacy Jit32 and RyuJIt 32.

If we think that using SSE-based helpers is the long-term direction and using x87 instructions for now is to unblock us, there is an easier option to unblock ourselves functionally: not to treat sin/cos as intrinsics on RyuJIT x86 (by making changes to importer.cpp::IsTargetIntrinsic()). Obviously this will have poor code quality compared to emitting x87 instructions.

Another question is do we think that SSE-based sin/cos helpers a must for shipping RyuJIT x86 1.0 in coreclr? If yes, we can use the above mentioned quick option to functionally unblock us.

sivarv · 2016-09-01T20:08:37Z

src/jit/codegenxarch.cpp

+
+    // Load the source onto the x87 stack.
+    genConsumeReg(source);
+    genFld(source);


Here is a clean and easy way to achieve what genFld() does plus it also allows us to optimize for the case when the operand is a memory-op:

Compiler class declares two members to hold local numbers, these are initialized to BAD_VAR_NUM
floatTemp and doubleTemp

TreeNodeInfoInitIntrinsic() - in case of cos/sin/round intrinsics:
if the operand is a memory-op, mark it as contained, otherwise create a temp of float/double depending on target type of GT_INTRINSIC node.

e.g. if (floatTemp == BAD_VAR_NUM) { floatTemp = comp->lvaGrabTemp(...) // initialize lv type, lvRefCnt, lvDoNotEnregister // set lvaSortAgain to true }

genIntrinsic() - in case of sin/cos/round, if the operand is not a contained memor-op, use floatTemp/doubleTemp to store the value of the operand to memory and to copy to x87 stack from memory. If it is a contained memory op, we can use it directly.

If it is a contained memory op, we can use it directly.

How? I tried to go down this road but could not find an appropriate emit instruction that to generate an fld with a memory operand that was not on the stack.

Even if there is a straightforward way to do this, I'm not sure it's worth the effort unless we stick with this solution long-term.

pgavlin · 2016-09-07T17:19:47Z

@dotnet/jit-contrib the test failures are due to instances of IR like the following:

                                                 /--*  t237   double arg0
N107 ( 21,  8) [000224] --C-G-------      t224 = *  call      double System.Math.Exp

N109 (  3,  2) [000060] ------------       t60 =    lclVar    double V03 loc3         u:3 (last use) REG NA $401

                                                 /--*  t60    double 
N111 ( 39,  6) [000061] ------------       t61 = *  intrinsic double sin REG NA $482

                                                 /--*  t224   double 
                                                 +--*  t61    double 
N113 ( 65, 18) [000062] --C-G-------       t62 = *  *         double REG NA $402

In this case, the result of the call and the result of the intrinsic are live concurrently, and each needs a spill temp for the transition from the x87 stack. In this case, the max number of concurrent TYP_DOUBLE spills will be incorrect, since the spills for FP-typed calls and intrinsics are not treated in a first class fashion by the RA.

I am pretty certain that we cannot end up in a similar situation with calls. If the results of two calls are concurrently live on x86, the execution of the later call will have killed the register in which the result of the earlier call lives (if any). The RA will then spill the earlier result as part of its normal operation. Later, the RA will record one additional spill that will be used to move the result of the later call off of the FP stack, and the number of concurrent spills will be correct.

As a result, I'm of the opinion that solving the actual problem here is not worth it, and I intend to switch this PR to simply stop treating sine/cosine/round as intrinsics on RyuJIT-x86.

These intrinsics were supported by JIT32 using the corresponding x87 instructions. There are a few downsides to doing the same in RyuJIT, however, because it uses SSE for all other floating-point math: - The mix of precisions used by the combination of SSE + x87 instructions matches neither the JIT32 nor the RyuJIT/x64 behavior. - Using the x87 instructions is more expensive, as the result of the instruction may need to be moved via memory to an SSE register. - The RA would need to be udpated in order to better understand that each x87 instruction may require a spill temp in order to retrieve the result. Issue #7097 tracks investigating a custom calling convention for the appropriate helper calls if necessasry.

pgavlin · 2016-09-07T18:56:59Z

The OSX failure appears to be an infrastructure issue: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/checked_osx_prtest/5564/consoleFull#-6271423686ee26338-6221-4f7c-8c8f-1d1d5cb4704a

cc @mmitche

mmitche · 2016-09-07T19:00:20Z

Taking a look, thanks.

pgavlin · 2016-09-07T19:15:12Z

@dotnet-bot test OSX x64 Checked Build and Test

pgavlin · 2016-09-07T21:07:21Z

@dotnet/jit-contrib Ping. This change is now green.

sivarv · 2016-09-07T21:11:52Z

src/jit/importer.cpp

@@ -17895,7 +17895,7 @@ void Compiler::impMarkInlineCandidate(GenTreePtr             callNode,

 bool Compiler::IsTargetIntrinsic(CorInfoIntrinsics intrinsicId)
 {
-#if defined(_TARGET_AMD64_)
+#if defined(_TARGET_AMD64_) || (defined(_TARGET_X86_) && !defined(LEGACY_BACKEND))


Since this is not a long term solution, please add a TODO comment.

CarolEidt · 2016-09-07T21:12:41Z

LGTM

sivarv · 2016-09-07T21:12:44Z

Stop treating sine/cosine/round as target intrinsics on x86. Commit migrated from dotnet/coreclr@ddc3d67

pgavlin added area-CodeGen arch-x86 labels Sep 1, 2016

dnfclas added the cla-already-signed label Sep 1, 2016

CarolEidt reviewed Sep 1, 2016
View reviewed changes

sivarv reviewed Sep 1, 2016
View reviewed changes

pgavlin force-pushed the X87MathIntrinsics branch from a7e7d30 to 72e2eb5 Compare September 1, 2016 20:43

pgavlin force-pushed the X87MathIntrinsics branch from 72e2eb5 to 32f6613 Compare September 7, 2016 18:04

pgavlin changed the title ~~Implement the x87 math intrinsics for x86.~~ Stop treating sine/cosine/round as target intrinsics on x86. Sep 7, 2016

sivarv reviewed Sep 7, 2016
View reviewed changes

Link #7097 to the relevant code in ::IsTargetIntrinsic.

13e41a6

pgavlin merged commit ddc3d67 into dotnet:master Sep 7, 2016

pgavlin deleted the X87MathIntrinsics branch September 7, 2016 21:56

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/coreclr#7023 from pgavlin/X87MathIntrinsics

c6098c2

Stop treating sine/cosine/round as target intrinsics on x86. Commit migrated from dotnet/coreclr@ddc3d67

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop treating sine/cosine/round as target intrinsics on x86. #7023

Stop treating sine/cosine/round as target intrinsics on x86. #7023

pgavlin commented Sep 1, 2016 •

edited

pgavlin commented Sep 1, 2016

CarolEidt Sep 1, 2016

pgavlin Sep 1, 2016

CarolEidt Sep 1, 2016

CarolEidt commented Sep 1, 2016

mikedn commented Sep 1, 2016

pgavlin commented Sep 1, 2016

CarolEidt commented Sep 1, 2016

tannergooding commented Sep 1, 2016

sivarv commented Sep 1, 2016 •

edited

sivarv Sep 1, 2016

pgavlin Sep 1, 2016

pgavlin commented Sep 7, 2016

pgavlin commented Sep 7, 2016

mmitche commented Sep 7, 2016

pgavlin commented Sep 7, 2016

pgavlin commented Sep 7, 2016

sivarv Sep 7, 2016

CarolEidt commented Sep 7, 2016

sivarv commented Sep 7, 2016

Stop treating sine/cosine/round as target intrinsics on x86. #7023

Stop treating sine/cosine/round as target intrinsics on x86. #7023

Conversation

pgavlin commented Sep 1, 2016 • edited

pgavlin commented Sep 1, 2016

CarolEidt Sep 1, 2016

Choose a reason for hiding this comment

pgavlin Sep 1, 2016

Choose a reason for hiding this comment

CarolEidt Sep 1, 2016

Choose a reason for hiding this comment

CarolEidt commented Sep 1, 2016

mikedn commented Sep 1, 2016

pgavlin commented Sep 1, 2016

CarolEidt commented Sep 1, 2016

tannergooding commented Sep 1, 2016

sivarv commented Sep 1, 2016 • edited

sivarv Sep 1, 2016

Choose a reason for hiding this comment

pgavlin Sep 1, 2016

Choose a reason for hiding this comment

pgavlin commented Sep 7, 2016

pgavlin commented Sep 7, 2016

mmitche commented Sep 7, 2016

pgavlin commented Sep 7, 2016

pgavlin commented Sep 7, 2016

sivarv Sep 7, 2016

Choose a reason for hiding this comment

CarolEidt commented Sep 7, 2016

sivarv commented Sep 7, 2016

pgavlin commented Sep 1, 2016 •

edited

sivarv commented Sep 1, 2016 •

edited