Reader: implement tail calls #191
@richardlford It would be nice to at least get the first layer of this in place and drop back to normal calls in most cases. This should unblock a lot of methods in our testing.
Started looking into this. The first step is to just pass on tail call opportunities without the 'tail' prefix, since the jit is not obligated to make these into tail calls. Internally this means trying to handle cases that are isTailCall() && !isUnmarkedTailCall() as normal calls. This covers (as far as I can tell) all of the cases from C#, which does not seem to use the tail prefix. This unblocks a lot of methods, but it appears we generate incorrect code for some of them. Every test fails with a runtime error, and at the point of failure there are no managed frames on the stack. So it seems like we are corrupting memory or something similar. I've isolated System.Buffer.Memmove as the first method that causes issues, so will dig into that next....
Memmove does a lot of case analysis internally. It also gets called repeatedly. I verified in the debugger that the first 20 calls or so seem to work as expected, but there are many more. A lot of these calls are from code that manipulates file paths during CLR initialization, so the failure mode may be sensitive to the length of the path to CoreCLR.exe and similar things. It seems plausible that for some particular length, things go bad and memmove ends up corrupting memory somewhere. I'll set the debugger to capture the different length values that appear, and perhaps by focusing on "unusual" values we can pin down which particular calls are worth looking at more closely. I will also try verifying heap integrity after each call to see if that can spot corruption.
There are 739 calls to Memmove and 73 distinct length values for copies. I've verified that the heap is still in good shape before each call and at the end when the process is failing. So if the problem is memory corruption, the corruption is not happening on the GC heap anywhere. The memmove code special cases short (16 bytes or less) moves and very long ones (512 bytes or more). The short cases we see are 2, 10, and 14 bytes, so I'll look at those and verify they work ok. The long cases fall back to pinvokes and those seem to be ok, but I'll double check that again too.
Because of the way memmove is structured, you can force it (via debugger break-in altering RIP) to always follow the pinvoke path. That works. So I made this conditional and have been binary searching to see which call to memmove causes problems. For my repro case I have it narrowed down to the range 701, 712 -- that is, if the first 700 calls just run the normal code, everything is ok; if the first 712 use the normal code, the app fails. The windbg breakpoint command I'm using is something like this:
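The exact command wasn't captured in the thread; a hypothetical reconstruction matching the description below (the symbol and target address are placeholders):

```
bp coreclr!Memmove "r @$t0 = @$t0 + 1; .if (@$t0 > 0n700) { r @rip = <addr-of-pinvoke-path> }; gc"
```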
Here the $t0 pseudo-register counts the number of hits; once it gets past some value, the debugger modifies RIP to force the code down the pinvoke path. I do this at the first conditional branch in memmove, where it is checking for overlap of src and dst.
In my repro, call #708 goes bad. This is copying 0xc2 (194) bytes to an address that ends in 4. So the code first stores 4 bytes to get the destination aligned, then does 11 rounds of storing two qwords (176 bytes in all, 180 including the initial 4 byte write). This leaves 14 bytes, which it stores as 8, 4, and 2. That last write of 2, however, actually writes 4 bytes. So this corrupts the memory just beyond the buffer. The LLVM IR clearly shows this:
  %370 = sext i16 %369 to i16   <-- not needed
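As a sanity check on the byte accounting above, a quick script:

```python
length = 0xc2            # 194 bytes to copy
align = 4                # initial 4-byte store to align the destination
rounds = 11 * 16         # 11 rounds of two qword (2 x 8 byte) stores
tail = 8 + 4 + 2         # remaining 14 bytes, stored as 8, 4, and 2
assert align + rounds + tail == length

# the final "2-byte" store actually wrote 4 bytes, so the copy
# ran 2 bytes past the end of the destination buffer
overrun = (align + rounds + 8 + 4 + 4) - length
assert overrun == 2
```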
The bug was that storePrimitiveType was using the stack type, not the CorInfoType. Have a fix.
So we need this->getType(CorInfoType, NULL) instead of Value->getType()? Probably this is my bad.
Basically, yeah. When values are pushed on the stack they widen to i32 or i64, so you can't use the stack type to figure out how many bytes to store. |
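A minimal illustration of the failure mode, using Python's struct module to stand in for the typed stores (the values here are made up):

```python
import struct

val16 = -16657                    # 0xBEEF viewed as a signed 16-bit value
widened = val16 & 0xFFFFFFFF      # sign-extended to i32 on the stack: 0xFFFFBEEF

good = bytearray(4)               # a 2-byte slot followed by 2 sentinel bytes
struct.pack_into('<H', good, 0, widened & 0xFFFF)  # store the CorInfoType width
assert good[2:4] == b'\x00\x00'   # neighboring bytes untouched

bad = bytearray(4)
struct.pack_into('<I', bad, 0, widened)            # store the stack-type width
assert bad[2:4] == b'\xff\xff'    # 2 bytes beyond the slot are clobbered
```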
Commit e3efa38 is in -- we now should only fail when there are explicit tail calls. Down the road we can implement the optimization to opportunistically handle the unmarked tail calls more efficiently.
Note LLVM has tail call annotations on call instructions, so once we get more of the legality checking implemented in the reader we can use this to try and convince downstream phases to actually do the transformation. There are both "may" and "must" forms. |
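For reference, the two annotation forms on a call instruction look like this (a sketch; @callee is a placeholder):

```llvm
; "may" form: an optimization hint the backend is free to ignore
%r1 = tail call i32 @callee(i32 %x)
ret i32 %r1

; "must" form: codegen is required to emit an actual tail call,
; and compilation fails if it cannot
%r2 = musttail call i32 @callee(i32 %x)
ret i32 %r2
```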
We now hit these in lots of CoreCLR tests. Most of them are in JIT\Methodical\tailcall.
Looks like the existing LLVM tail annotations aren't enough for implementing the IL tail. prefix.
@erozenfeld it might be worth looking into what ghc does here. |
It turns out that tail call optimization is guaranteed for their (GHC's) calling convention as well. So the so-called sibling call optimization may be good enough for us. Sibling call optimization is a restricted form of tail call optimization; per the LLVM docs it is currently performed on x86/x86-64 when the caller and callee satisfy certain constraints (matching calling conventions and compatible return types, among others).
@russellhadley please reassign as appropriate |
@AndyAyersMS Assigning since this overlaps with the tailcall work you're doing now. |
Have returned to this and have it mostly coded up -- debugging now. We classify calls as tail/notail in the reader, let LLVM do the sibling call opt where it can. I can see it kicking in, which is cool. Must still be missing a legality check though. |
Found a bug in LLVM -- in the code that readjusts the stack in the "epilog".
Worked around this but there are still more issues to sort through.... |
Issues fixed, things looking good. Will have to push the LLVM change out first though. |
LLVM fix is out for review |
With that fix and my changes, I can pass the local windows tests. Still need to see how this interacts with EH and GC, and try it on Linux... |
LLVM fix is now in the main LLVM tree. Now it needs to show up in our MS branch... |
Closes dotnet#191. Add the tail modifier to calls that satisfy correctness checks for tail calls (either explicit or implicit). This enables LLVM to perform "sibling" call optimizations during lowering.

Do a very simplistic tracking of address-taken locals and args to screen out implicit tail call candidates, and likewise for localloc. Both of these need to be detected during the reader's first pass, so add suitable asserts. Also start tracking unsafe locals, since they (and localloc) will require stack security protection checking (aka /GS) and will inhibit tail calls. We still don't actually do those checks (see dotnet#353).

Add logic to the ABI classifier so we can detect when a call lowering would introduce on-stack references, and avoid tail calling in those cases too. This can also be made smarter (e.g. we might be able to copy-prop and use the values passed by the caller).

Have the jit options determine whether the jit is optimizing, and use the option setting in the code rather than checking the flags. Remove the existing NYI for explicit tail calls.

Verified by hand that the excluded tail call tests all now pass and seem to exercise most of the tail call cases the tests intended, so unexcluded them. Most "normal" bring-up tests create around 150 tail call sites, so the new codegen gets pretty well tested. Also verified that all of this works with EH and GC enabled.