Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix shuffling thunk for Unix AMD64 #16904

Merged
merged 1 commit into from Mar 13, 2018

Conversation

@janvorli
Copy link
Member

commented Mar 13, 2018

The shufflign thunk was generated incorrectly for some edge cases when
a struct was passed in a register or a pair of registers in the
destination, but on stack in the source.
That resulted in corruption of delegate argument in those cases.

This change implements a new algorithm that ensures that argument slots
are never overwritten before their current value is moved out.

It also adds an extensive regression test that checks various
interesting combinations of arguments that were causing issues before.

Close #16833

@janvorli janvorli added the area-VM label Mar 13, 2018
@janvorli janvorli added this to the 2.1.0 milestone Mar 13, 2018
@janvorli janvorli self-assigned this Mar 13, 2018
@janvorli janvorli requested a review from jkotas Mar 13, 2018
@janvorli

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2018

@janvorli janvorli force-pushed the janvorli:fix-shuffling-thunk branch from d002620 to 0ec5e3a Mar 13, 2018
@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 13, 2018

I am having hard time convincing myself that this works for all corner cases. It probably does, but it is hard to see. Also, iterating the signature twice is not particularly efficient.

Would it be better and simpler to build the pShuffleEntryArray in regular way, and then apply topological sort on it once we are done?

It does not have to be particularly smart because of the array is going to be mostly sorted already. E.g. I think it can look like this:

    //
    // Topological sort of shuffle entries to ensure that the slots are not overwritten before they are moved.
    //
    int argSlots = NUM_FLOAT_ARGUMENT_REGISTERS + NUM_ARGUMENT_REGISTERS + sArgPlacerSrc.SizeOfArgStack() / sizeof(size_t);
    NewArrayHolder<bool> filledSlots = new bool[argSlots];
    bool reordered;
    do
    {
        reordered = false;

        for (int i = 0; i < argSlots; i++)
        {
            filledSlots[i] = false;
        }

        for (int i = 0; i < pShuffleEntryArray->GetCount(); i++)
        {
            entry = (*pShuffleEntryArray)[i];

            // If the slot that we are moving the argument to was filled in already, we need to move this entry in front 
            // of the entry that filled it in.
            if (filledSlots[GetNormalizedArgumentSlotIndex(entry.srcofs)])
            {
                for (int j = i; (*pShuffleEntryArray)[j].dstofs != entry.srcofs; j--)
                    (*pShuffleEntryArray)[j] = (*pShuffleEntryArray)[j - 1];
                (*pShuffleEntryArray)[j] = entry;
                reordered = true;
            }

            filledSlots[GetNormalizedArgumentSlotIndex(entry.dstofs)] = true;
        }
    }
    while (reordered);
@janvorli

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2018

The algorithm I've implemented is very simple in what it does. Let me try to explain in a few words. When it wants to move a register or stack slot, it checks whether the target register or stack slot was already moved away. If yes, it just performs the move, since the slot is guaranteed to be free. If not, it postpones the move until the target slot is freed. That's all. I don't see any corner cases that this could handle in a wrong way.

The only part that may look more complex is the fact that this can naturally create a chain of postponed moves, hence the loop that adds the postponed moves once it finds that the move it has added enables a postponed move to happen.

Regarding the double iteration over the arguments, that's a bit unfortunate, but do you think that this function is a performance bottleneck?

@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 13, 2018

I understand that it is simple in principle, but it is not easy to read and understand because of it is interleaved with the other code and ifdefs, the special cases for return buffer, ... . I think it would be a lot easier to understand if all the code related to the topological sort is together and fits onto a single screen.

I do not think that this function is performance bottleneck. I just mentioned it as a side-benefit.

The shufflign thunk was generated incorrectly for some edge cases when
a struct was passed in a register or a pair of registers in the
destination, but on stack in the source.
This change implements a new algorithm that ensures that argument slots
are never overwritten before their current value is moved out.

It also adds an extensive regression test that checks various
interesting combinations of arguments that were causing issues before.
@janvorli janvorli force-pushed the janvorli:fix-shuffling-thunk branch from 0ec5e3a to bf33bec Mar 13, 2018
@janvorli

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2018

@jkotas ok, I've changed it to the way you've suggested and amended the commit. While I could cleanup the stuff I had to the point where all the interleaving of ifdefs is gone and all relevant pieces of code are single screen size, the fact that my way needs to iterate over the arguments even in cases when it is basically useless, which will happen in most of the cases made me change my mind.

@@ -352,6 +378,10 @@ VOID GenerateShuffleArray(MethodDesc* pInvoke, MethodDesc *pTargetMeth, SArray<S
ArgLocDesc sArgSrc;
ArgLocDesc sArgDst;

#if defined(UNIX_AMD64_ABI) && defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
int argSlots = NUM_FLOAT_ARGUMENT_REGISTERS + NUM_ARGUMENT_REGISTERS + sArgPlacerSrc.SizeOfArgStack() / sizeof(size_t);

This comment has been minimized.

Copy link
@jkotas

jkotas Mar 13, 2018

Member

Nit: Can this be inside the big ifdef below?

This comment has been minimized.

Copy link
@janvorli

janvorli Mar 13, 2018

Author Member

Unfortunately it cannot. The sArgPlacerSrc.SizeOfArgStack() cannot be called after the iteration has finished. It asserts due to that if I try to call it at that place, that's why I had to move it here.

This comment has been minimized.

Copy link
@jkotas
@jkotas
jkotas approved these changes Mar 13, 2018
Copy link
Member

left a comment

LGTM otherwise

@janvorli janvorli merged commit 41c0295 into dotnet:master Mar 13, 2018
15 of 18 checks passed
15 of 18 checks passed
Windows_NT arm Cross Checked Innerloop Build and Test Started.
Details
Windows_NT arm64 Cross Checked Innerloop Build and Test Started.
Details
Windows_NT armlb Cross Checked Innerloop Build and Test Started.
Details
Alpine.3.6 x64 Debug Build Build finished.
Details
CROSS Check Build finished.
Details
CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Debug Innerloop Build Build finished.
Details
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Details
Tizen armel Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm64 Cross Debug Innerloop Build Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
WIP ready for review
Details
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x86 Release Innerloop Build and Test Build finished.
Details
license/cla All CLA requirements met.
Details
@janvorli janvorli deleted the janvorli:fix-shuffling-thunk branch Mar 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.