Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

AndyAyersMS
Copy link
Member

Repro case for #18582 and #23309.

@AndyAyersMS
Copy link
Member Author

@sandreenko PTAL
cc @dotnet/jit-contrib

Will propose merging this to 2.2 along with the fix, to address #23309.

Copy link

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

{
int z = s_x;
Consume(
q(), q(), q(), q(), q(), q(), q(), q(), q(), q(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these q() threated as late arguments? Is it because they are calls?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first 839 calls to q() are spilled to temps by the importer. But that last call to q() is not spilled to a temp, and this causes all the previous args to become late args.

[005082] ------------              *  STMT      void  (IL   ???...  ???)
[004231] --C-G-------              \--*  CALL      void   P.Consume
[000010] ------------ arg0            +--*  LCL_VAR   int    V02 tmp1         
[000015] ------------ arg1            +--*  LCL_VAR   int    V03 tmp2         
[000020] ------------ arg2            +--*  LCL_VAR   int    V04 tmp3         
...
[004200] ------------ arg838          +--*  LCL_VAR   int    V840 tmp839      
[004196] --C-G------- arg839          +--*  CALL      int    P.q
[004202] ------------                 |  /--*  CNS_INT   int    1
[004203] ------------ arg840          +--*  ADD       int   
[004201] ------------                 |  \--*  LCL_VAR   int    V00 loc0         
[004205] ------------                 |  /--*  CNS_INT   int    1
...
Upping fgPtrArgCntMax from 835 to 846
argSlots=850, preallocatedArgCount=850, nextSlotNum=850, outgoingArgSpaceSize=6800

Sorting the arguments:
Argument with 'side effect'...
               [004196] --CXG+------              *  CALL      int    P.q

lvaGrabTemp returning 841 (V841 tmp840) called for argument with side effect.

  Evaluate to a temp:
               [004196] --CXG+------              /--*  CALL      int    P.q
               [005086] -ACXG-----L-              *  ASG       int   
               [005085] D------N----              \--*  LCL_VAR   int    V841 tmp840      

...
Shuffled argument table:    rdx r8 r9 rcx 
fgArgTabEntry[arg 839 5087.LCL_VAR, numSlots=1, slotNum=839, align=1, lateArgInx=0, tmpNum=V841, isTmp, processed]
fgArgTabEntry[arg 841 4206.ADD, numSlots=1, slotNum=841, align=1, processed]
fgArgTabEntry[arg 842 4209.ADD, numSlots=1, slotNum=842, align=1, processed]
fgArgTabEntry[arg 843 4212.ADD, numSlots=1, slotNum=843, align=1, processed]
fgArgTabEntry[arg 844 4215.ADD, numSlots=1, slotNum=844, align=1, processed]
fgArgTabEntry[arg 845 4218.ADD, numSlots=1, slotNum=845, align=1, processed]
fgArgTabEntry[arg 846 4221.ADD, numSlots=1, slotNum=846, align=1, processed]
fgArgTabEntry[arg 847 4224.ADD, numSlots=1, slotNum=847, align=1, processed]
fgArgTabEntry[arg 848 4227.ADD, numSlots=1, slotNum=848, align=1, processed]
fgArgTabEntry[arg 849 4230.ADD, numSlots=1, slotNum=849, align=1, processed]
fgArgTabEntry[arg 840 4203.ADD, numSlots=1, slotNum=840, align=1, processed]
fgArgTabEntry[arg 1 15.LCL_VAR, rdx, regs=1, align=1, lateArgInx=1, processed]
fgArgTabEntry[arg 2 20.LCL_VAR, r8, regs=1, align=1, lateArgInx=2, processed]
fgArgTabEntry[arg 3 25.LCL_VAR, r9, regs=1, align=1, lateArgInx=3, processed]
fgArgTabEntry[arg 4 30.LCL_VAR, numSlots=1, slotNum=4, align=1, lateArgInx=4, needPlace, processed]
fgArgTabEntry[arg 5 35.LCL_VAR, numSlots=1, slotNum=5, align=1, lateArgInx=5, needPlace, processed]
fgArgTabEntry[arg 6 40.LCL_VAR, numSlots=1, slotNum=6, align=1, lateArgInx=6, needPlace, processed]
...
fgArgTabEntry[arg 837 4195.LCL_VAR, numSlots=1, slotNum=837, align=1, lateArgInx=837, needPlace, processed]
fgArgTabEntry[arg 838 4200.LCL_VAR, numSlots=1, slotNum=838, align=1, lateArgInx=838, needPlace, processed]
fgArgTabEntry[arg 0 10.LCL_VAR, rcx, regs=1, align=1, lateArgInx=839, processed]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that last call to q() is not spilled to a temp

Why is not it spilled? It is what I was not able to reach when tried to create a repro test for the original PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we process a call we spill the stack unless the call returns void or is one of a few special methods. So we spill before the last q() but not before Consume(...).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thank you.

@mikedn
Copy link

mikedn commented Mar 19, 2019

I checked what loop hoisting does with this example. It does indeed go into deep recursion but there's no stack overflow because the frame size is lower (~200 bytes in a checked build). Anyway I happen to have a PR with a loop hoisiting fix and that also takes care of recursion.

@AndyAyersMS AndyAyersMS merged commit 69ec646 into dotnet:master Mar 20, 2019
@AndyAyersMS AndyAyersMS deleted the ReproFor18582 branch March 20, 2019 16:17
sandreenko pushed a commit to sandreenko/coreclr that referenced this pull request Mar 20, 2019
@sandreenko
Copy link

This test failed on OSX with 134 exit code. Log should be available here, but I do not trust azure links.

If the link doesn't work you can see the results from scheduled stress runs and then click "Test for PollingCounter (#23257)" -> "Test Pri1 OSX x64 checked Job" -> Result will be available ...

Probably OSX has lower stack size and hits overflow on loop hoisting that @mikedn described.
@AndyAyersMS could you please take a look?

@AndyAyersMS
Copy link
Member Author

Hmm, the stack size is specified by the test. Let me dig in.

@sandreenko
Copy link

Hmm, the stack size is specified by the test. Let me dig in.

I mean OSX default stack size limit is 512Kb, when on Ubuntu it is 2Mb, on Windows 10 it is 1Mb if I remember correctly.

@AndyAyersMS
Copy link
Member Author

        Thread t = new Thread(Test, 512 * 1024);

should be asking for a 512KB stack segment, regardless of OS defaults.

@sandreenko
Copy link

I see, I have missed that. Thanks.

@mikedn
Copy link

mikedn commented Mar 22, 2019

Can't be loop hoisting because the actual test doesn't have a loop. I actually modified the test code to check what happens if loop hoisting hits such a tree.

@AndyAyersMS
Copy link
Member Author

Issue on OSX is that Test gets selected for cloning stress, and gtCloneExpr does the same sort of recursive traversal for GT_LIST.

Will mark this test as jit optimization sensitive.

AndyAyersMS added a commit to AndyAyersMS/coreclr that referenced this pull request Mar 25, 2019
Jitstress on OSX will turn on cloning stress in `Test` and cause stack overflows
cloning the large `GT_LIST` subtree of the call.

See issue dotnet#23346.
AndyAyersMS added a commit that referenced this pull request Mar 25, 2019
Jitstress on OSX will turn on cloning stress in `Test` and cause stack overflows
cloning the large `GT_LIST` subtree of the call.

See issue #23346.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Jitstress on OSX will turn on cloning stress in `Test` and cause stack overflows
cloning the large `GT_LIST` subtree of the call.

See issue dotnet/coreclr#23346.

Commit migrated from dotnet/coreclr@ca2eec2
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants