Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupted struct passed to delegate constructed via reflection #16833

Closed
Ninds opened this issue Mar 8, 2018 · 20 comments

Comments

@Ninds
Copy link

commented Mar 8, 2018

When a CancellationToken is checked for cancellation .NET CORE 2.0 & 2.1 crashes on Debian and Redhat.

@Ninds

This comment has been minimized.

Copy link
Author

commented Mar 8, 2018

@Drawaes You can probably add a few more, more useful comments

@stephentoub

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

@Ninds, please share a repro; we have a ton of code, both product and test, that checks CancellationTokens for cancellation and they do not crash.

@Ninds

This comment has been minimized.

Copy link
Author

commented Mar 8, 2018

@stephentoub Sent my email as suggested by @Drawaes.

@stephentoub

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

@Ninds, thank you. I've created an isolated repro. I don't believe this issue is specific to CancellationToken, but rather there's some kind of corruption happening due to the reflection.

using System;
using System.Threading;

class Program
{
    public static void Main()
    {
        var func = (Func<object, object, object, object, object, CancellationToken, bool, object>)Delegate.CreateDelegate(
            typeof(Func<object, object, object, object, object, CancellationToken, bool, object>),
            typeof(Program).GetMethod(nameof(MyMethod)));
        func(new object(), new object(), new object(), new object(), new object(), new CancellationTokenSource().Token, true);
    }

    public static object MyMethod(
        object obj1,
        object obj2,
        object obj3,
        object obj4,
        object obj5,
        CancellationToken cancellationToken,
        bool boolean)
    {
        Console.WriteLine(cancellationToken.IsCancellationRequested);
        return "hello";
    }
}
@stephentoub

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

cc: @janvorli

@stephentoub

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

And without CancellationToken:

using System;

class Program
{
    public static void Main()
    {
        var func = (Func<object, object, object, object, object, WrapperStruct, bool, object>)Delegate.CreateDelegate(
            typeof(Func<object, object, object, object, object, WrapperStruct, bool, object>),
            typeof(Program).GetMethod(nameof(MyMethod)));
        func(new object(), new object(), new object(), new object(), new object(), new WrapperStruct { Class = new WrappedClass() }, true);
    }

    public static object MyMethod(
        object obj1,
        object obj2,
        object obj3,
        object obj4,
        object obj5,
        WrapperStruct s,
        bool boolean)
    {
        Console.WriteLine(s.Class != null && s.Class.Value);
        return "hello";
    }
}

public class WrappedClass { public bool Value; }

public struct WrapperStruct { public WrappedClass Class; }
@stephentoub stephentoub changed the title Crash on Linux with CancellationToken Corrupted struct passed via reflection Mar 8, 2018
@stephentoub stephentoub changed the title Corrupted struct passed via reflection Corrupted struct passed to delegate constructed via reflection Mar 8, 2018
@stephentoub stephentoub added the bug label Mar 8, 2018
@stephentoub stephentoub added this to the 2.1.0 milestone Mar 8, 2018
@Drawaes

This comment has been minimized.

Copy link

commented Mar 8, 2018

Extra note as far as I can tell this only occurs on Linux and not windows.

@svick

This comment has been minimized.

Copy link
Contributor

commented Mar 8, 2018

It seems the corruption comes from putting the value of the following parameter into the previous parameter. You can see that in the following variant of the code:

using System;

class Program
{
    static void Main()
    {
        var func = (Action<object, object, object, object, object, WrapperStruct, int>)Delegate.CreateDelegate(
            typeof(Action<object, object, object, object, object, WrapperStruct, int>),
            typeof(Program).GetMethod(nameof(MyMethod)));
        func(null, null, null, null, null, new WrapperStruct(), 42);
    }

    public static void MyMethod(
        object obj1,
        object obj2,
        object obj3,
        object obj4,
        object obj5,
        WrapperStruct s,
        int i)
    {
        Console.WriteLine(s.Value); // prints 42 on Linux, 0 on Windows
    }
}

public struct WrapperStruct { public int Value; }
@janvorli

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

I've looked into it and found the culprit. There is a bug in the generated shuffle thunk code. This shuffle thunk is called when the delegate is invoked and it shifts arguments in registers and on stack by one before jumping to the target function. In our case, the MyMethod has parameters in RDI, RSI, RDX, RCX, R8, R9 and one more on stack. The thunk in this case overwrites the first stack argument before it moves it to R9:

    0x7fff7c0d3040: mov    rax, rsp
    0x7fff7c0d3043: mov    r11, rdi
    0x7fff7c0d3046: mov    rdi, rsi
    0x7fff7c0d3049: mov    rsi, rdx
    0x7fff7c0d304c: mov    rdx, rcx
    0x7fff7c0d304f: mov    rcx, r8
    0x7fff7c0d3052: mov    r8, r9
    0x7fff7c0d3055: mov    r10, qword ptr [rax + 0x10]
    0x7fff7c0d3059: mov    qword ptr [rax + 0x8], r10 ; this overwrites the cancellationToken at [rax+0x8]
    0x7fff7c0d305d: mov    r9, qword ptr [rax + 0x8] ; and now we load the corrupted value instead of the original one
    0x7fff7c0d3061: mov    r10, qword ptr [r11 + 0x20]
    0x7fff7c0d3065: add    r11, 0x20
    0x7fff7c0d3069: jmp    r10
@janvorli

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

I am going to work on a fix.

@Ninds

This comment has been minimized.

Copy link
Author

commented Mar 8, 2018

...so why do we only see this on Linux and not on Windows?

@Drawaes

This comment has been minimized.

Copy link

commented Mar 8, 2018

Because the calling convention is different on Linux and windows ... But I am Sure others can give a more in depth response ;)

@janvorli

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

Because the shuffle thunk generation is different on Unix x64 due to the difference in calling convention. The Unix convention allows passing structs in upto two registers, but it can never split a struct to be half in registers and half on stack. On Windows, you can only pass struct in single register.
This makes the shuffling more complex on Unix x64. Unlike on Windows, the arguments don't fill available registers and then follow up on stack in the same order as they are in the argument list.
One example to make it clearer.

void fn1(int a, int b, int c, int d, S s, int e)
a -> RDI, b -> RSI, c -> RDX, d -> RCX, s -> R8 and R9, e -> stack[0]
void fn2(int a, int b, int c, int d, int e, S s, int f)
a -> RDI, b -> RSI, c -> RDX, d -> RCX, e -> R8, s -> stack[0] and stack[1], f -> R9

As you can see that the "s" in the fn2 needed to be passed on stack, but the next argument could go to a register again.

@Drawaes

This comment has been minimized.

Copy link

commented Mar 20, 2018

/cc @janvorli Any chance of this being patched in 2.0x ? Its an issue that was found in that and is causing problems that we would rather resolve without having to wait/go through 2.1 updates and testing?

@janvorli

This comment has been minimized.

Copy link
Member

commented Mar 20, 2018

@Petermarcu what do you think about porting the fix?

@Petermarcu

This comment has been minimized.

Copy link
Member

commented Mar 22, 2018

I think we can take it in the next available train. Ask @leecow which one makes sense.

@leecow

This comment has been minimized.

Copy link
Member

commented Mar 22, 2018

May Update will be the next opportunity.

@Ninds

This comment has been minimized.

Copy link
Author

commented Mar 27, 2018

How is it possible to know when the fix makes it into the daily snapshot of the SDK ?
I have just tried 2.1.300-preview3-008414 and it doesn't seem to be in there.

@janvorli

This comment has been minimized.

Copy link
Member

commented Mar 27, 2018

@Ninds I've just checked the commit hash of the libcoreclr.so in the 2.1.300-preview3-008414. It was build from about 19 days old state of coreclr while my fix came in 14 days ago.

strings libcoreclr.so |  grep "@(#)"

@(#)Version 4.6.26310.01 @BuiltBy: root-bfda0172ad36 @Branch: HEAD @SrcCode: https://github.com/dotnet/coreclr/tree/c3aef92c137658cebb111da6c856ab653316fb0a

If you open the link reported by that command and then click on the "commits" link, you'll see which commits are part of the specific libcoreclr.so. The most recent commit is on the top of the list.

@Ninds

This comment has been minimized.

Copy link
Author

commented Apr 10, 2018

I have just tested this fix against 2.1.300-preview3-008443 and the bug doesn't seem to be there anymore.

Thanks !
N

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.