Fix shuffling thunk for Unix AMD64 #16904
|
FYI: @stephentoub |
|
I am having a hard time convincing myself that this works for all corner cases. It probably does, but it is hard to see. Also, iterating over the signature twice is not particularly efficient. Would it be better and simpler to build the pShuffleEntryArray in the regular way, and then apply a topological sort to it once we are done? It does not have to be particularly smart because the array is going to be mostly sorted already. E.g. I think it can look like this: |
|
The algorithm I've implemented is very simple in what it does. Let me try to explain in a few words. When it wants to move a register or stack slot, it checks whether the target register or stack slot was already moved away. If yes, it just performs the move, since the slot is guaranteed to be free. If not, it postpones the move until the target slot is freed. That's all. I don't see any corner cases that this could handle in a wrong way. The only part that may look more complex is the fact that this can naturally create a chain of postponed moves, hence the loop that adds the postponed moves once it finds that the move it has added enables a postponed move to happen. Regarding the double iteration over the arguments, that's a bit unfortunate, but do you think that this function is a performance bottleneck? |
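The postponing scheme described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `Move`, `OrderMoves`, and the integer slot numbering are hypothetical names chosen for illustration, and the sketch assumes that every source slot and every destination slot appears at most once, that no move has the same source and destination, and that the moves contain no cycle.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical model: a slot is a register or stack slot identified by an int.
struct Move { int src; int dst; };

// Orders the moves so that no slot is overwritten before its value is moved out.
// Assumes distinct sources, distinct destinations, src != dst, and no cycles;
// moves that are part of a cycle would stay postponed and never be emitted.
std::vector<Move> OrderMoves(const std::vector<Move>& moves)
{
    // First pass: every source slot still holds a value that must be preserved.
    std::unordered_set<int> live;
    for (const Move& m : moves)
        live.insert(m.src);

    // Postponed moves, keyed by the slot they are waiting on (their destination).
    std::unordered_map<int, Move> postponed;
    std::vector<Move> ordered;

    // Second pass: emit a move when its target is free, otherwise postpone it.
    for (const Move& m : moves)
    {
        if (live.count(m.dst) != 0)
        {
            postponed[m.dst] = m;
            continue;
        }

        // Emitting a move frees its source slot, which may unblock a postponed
        // move, which may in turn unblock another one, and so on.
        Move current = m;
        for (;;)
        {
            ordered.push_back(current);
            live.erase(current.src);

            auto it = postponed.find(current.src);
            if (it == postponed.end())
                break;
            current = it->second;
            postponed.erase(it);
        }
    }
    return ordered;
}
```

For the input moves 0→1, 1→2, 2→3, the first two are postponed because their targets are still live, and emitting 2→3 then releases the chain, giving the order 2→3, 1→2, 0→1. The first pass over the moves corresponds to the extra iteration over the arguments that the comment mentions.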
|
I understand that it is simple in principle, but it is not easy to read and understand because it is interleaved with the other code and ifdefs, the special cases for the return buffer, ... . I think it would be a lot easier to understand if all the code related to the topological sort is together and fits onto a single screen. I do not think that this function is a performance bottleneck. I just mentioned it as a side-benefit. |
The shuffling thunk was generated incorrectly for some edge cases when a struct was passed in a register or a pair of registers in the destination, but on the stack in the source. This change implements a new algorithm that ensures that argument slots are never overwritten before their current value is moved out. It also adds an extensive regression test that checks various interesting combinations of arguments that were causing issues before.
|
@jkotas ok, I've changed it to the way you've suggested and amended the commit. I could have cleaned up my version to the point where all the interleaving with ifdefs is gone and all relevant pieces of code fit on a single screen, but my approach needs to iterate over the arguments even when that is basically useless, which is most cases, and that made me change my mind. |
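The alternative discussed in this thread, building the array of moves first and sorting it afterwards, could look roughly like the sketch below. It is an assumption-laden illustration rather than the code that was committed: `ShuffleMove` and `SortShuffleMoves` are hypothetical names, slots are modeled as plain integers, and the sort is deliberately naive because the input is expected to be mostly sorted already.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one shuffle entry: copy the value in src to dst.
struct ShuffleMove { int src; int dst; };

// Reorders `moves` so that no move overwrites a slot that a later move still
// reads. Returns false and leaves `moves` untouched if the moves form a cycle.
bool SortShuffleMoves(std::vector<ShuffleMove>& moves)
{
    std::vector<ShuffleMove> remaining = moves;
    std::vector<ShuffleMove> sorted;

    while (!remaining.empty())
    {
        // Pick the first remaining move whose destination is not the source
        // of any other remaining move; emitting it cannot clobber anything.
        std::size_t pick = remaining.size();
        for (std::size_t i = 0; i < remaining.size() && pick == remaining.size(); i++)
        {
            bool clobbers = false;
            for (std::size_t j = 0; j < remaining.size(); j++)
            {
                if (j != i && remaining[j].src == remaining[i].dst)
                {
                    clobbers = true;
                    break;
                }
            }
            if (!clobbers)
                pick = i;
        }

        if (pick == remaining.size())
            return false; // every remaining move is blocked: there is a cycle

        sorted.push_back(remaining[pick]);
        remaining.erase(remaining.begin() + static_cast<std::ptrdiff_t>(pick));
    }

    moves = sorted;
    return true;
}
```

Because the first unblocked move is always picked, an input that is already in a safe order comes out unchanged, which matches the observation that the array is mostly sorted to begin with. The sort also keeps all of the ordering logic in one place, separate from the code that walks the signature.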
ArgLocDesc sArgDst;

#if defined(UNIX_AMD64_ABI) && defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
int argSlots = NUM_FLOAT_ARGUMENT_REGISTERS + NUM_ARGUMENT_REGISTERS + sArgPlacerSrc.SizeOfArgStack() / sizeof(size_t);
Nit: Can this be inside the big ifdef below?
Unfortunately it cannot. sArgPlacerSrc.SizeOfArgStack() cannot be called after the iteration has finished; it asserts if I try to call it at that point, which is why I had to move it here.
The shuffling thunk was generated incorrectly for some edge cases when
a struct was passed in a register or a pair of registers in the
destination, but on the stack in the source.
That resulted in corruption of delegate arguments in those cases.
This change implements a new algorithm that ensures that argument slots
are never overwritten before their current value is moved out.
It also adds an extensive regression test that checks various
interesting combinations of arguments that were causing issues before.
Close #16833