Segmentation fault when running .NET Core 3.0 app on OpenVZ #27955

Open
Daniel15 opened this issue Nov 29, 2019 · 4 comments

@Daniel15 Daniel15 commented Nov 29, 2019

I have a .NET Core app that's a fairly basic gRPC service (unfortunately not open source yet, so I can't link to the source at the moment). On one particular server, it's throwing a segmentation fault as soon as I run it:

ASPNETCORE_ENVIRONMENT=Production ASPNETCORE_URLS=http://*:54561 ./TestApp
info: Microsoft.Hosting.Lifetime[0]
      Now listening on: http://[::]:54561
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /opt/exampleapp-worker
Segmentation fault

In fact, I'm seeing this for a very basic ASP.NET website too (just one custom middleware using app.Run).

Backtrace from lldb + SOS:

* thread #7, name = 'TestApp', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x00007ffff7019b3d libcoreclr.so`ThreadpoolMgr::GetRecycledMemory(ThreadpoolMgr::MemType) [inlined] ThreadpoolMgr::RecycledListInfo::Remove() at win32threadpool.h:655
    frame #1: 0x00007ffff7019aa5 libcoreclr.so`ThreadpoolMgr::GetRecycledMemory(memType=<unavailable>) at win32threadpool.cpp:1674
    frame #2: 0x00007ffff715e464 libcoreclr.so`UnManagedPerAppDomainTPCount::QueueUnmanagedWorkRequest(unsigned int (*)(void*), void*) [inlined] ThreadpoolMgr::MakeWorkRequest(unsigned int (*)(void*), void*) at win32threadpool.h:367
    frame #3: 0x00007ffff715e45a libcoreclr.so`UnManagedPerAppDomainTPCount::QueueUnmanagedWorkRequest(this=0x00007ffff75dd640, function=(libcoreclr.so`ThreadpoolMgr::AsyncTimerCallbackCompletion(void*) at win32threadpool.cpp:4765), context=0x0000000000738290)(void*), void*) at threadpoolrequest.cpp:356
    frame #4: 0x00007ffff701d100 libcoreclr.so`ThreadpoolMgr::FireTimers() at win32threadpool.cpp:855
    frame #5: 0x00007ffff701d088 libcoreclr.so`ThreadpoolMgr::FireTimers() at win32threadpool.cpp:4710
    frame #6: 0x00007ffff701cd41 libcoreclr.so`ThreadpoolMgr::TimerThreadFire() at win32threadpool.cpp:4598
    frame #7: 0x00007ffff701cc15 libcoreclr.so`ThreadpoolMgr::TimerThreadStart(p=<unavailable>) at win32threadpool.cpp:4569
    frame #8: 0x00007ffff732c86d libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000000737530) at thread.cpp:1807
    frame #9: 0x00007ffff7fb1fa3 libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:486
    frame #10: 0x00007ffff7bbc4cf libc.so.6`__GI___clone at clone.S:95

The server runs Debian stable (buster). Installed .NET packages:

$ apt list --installed | grep dotnet

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

dotnet-host/buster,now 3.0.1-1 amd64 [installed,automatic]
dotnet-hostfxr-3.0/buster,now 3.0.1-1 amd64 [installed,automatic]
dotnet-runtime-3.0/buster,now 3.0.1-1 amd64 [installed]
dotnet-runtime-deps-3.0/buster,now 3.0.1-1 amd64 [installed,automatic]

@Daniel15 Daniel15 commented Nov 29, 2019

Here's strace output for another test app: https://d.ls/dotnet/bugs/coreclr-27955-strace.txt

@Daniel15 Daniel15 commented Nov 29, 2019

@303248153 @janvorli Do you think this could be related to #26912? I'm seeing this on an OpenVZ VPS so it sounds like it might be related, but I don't know enough about C++ debugging to confirm that. Sometimes it just locks up (like what you reported), and other times I get this segmentation fault in ThreadpoolMgr.

Do you know of any workaround I can use in the meantime? @303248153, do you have a patched libcoreclr.so I could try so I don't have to build the patched version myself?
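
One way to check whether this machine is affected, without debugging the runtime, is to call sched_getcpu() directly and see whether it fails. The sketch below assumes the crash is tied to sched_getcpu() misbehaving under this kernel, which the workaround later in this thread suggests but which is not confirmed here. The helper name describe_sched_getcpu is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of coreclr): calls sched_getcpu() once and
 * writes a one-line summary into buf. Returns the raw return value so the
 * caller can tell success (>= 0) from failure (-1). */
int describe_sched_getcpu(char *buf, size_t len) {
    errno = 0;
    int cpu = sched_getcpu();
    int err = errno; /* save errno before any other library call */

    if (cpu < 0) {
        snprintf(buf, len, "sched_getcpu() failed: returned %d, errno=%d (%s)",
                 cpu, err, strerror(err));
    } else {
        snprintf(buf, len, "sched_getcpu() returned CPU %d", cpu);
    }
    return cpu;
}
```

Calling this from a small main() and printing buf on the affected server would show whether the call fails there. A negative return value would be consistent with the assumption above; a valid CPU number would point elsewhere.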

@Daniel15 Daniel15 changed the title from "Segmentation fault when running .NET Core 3.0 app" to "Segmentation fault when running .NET Core 3.0 app on OpenVZ" on Nov 29, 2019

@303248153 303248153 commented Nov 29, 2019

Looks like it's the same problem, but I've already deleted my workspace, so I don't have a patched libcoreclr.so on hand.
The original issue is under the 3.1 milestone, so I think they will fix this in the next release.
For a quick test you can patch the assembly code at 0x00007ffff7019b3d: replace the call to PAL_HasGetCurrentProcessorNumber with xor rax, rax; nop; nop; ... (48 31 C0 90 90 ...).

@Daniel15 Daniel15 commented Nov 29, 2019

Today I learnt that it's pretty easy to override library methods using LD_PRELOAD.

As a workaround, I defined a sched_getcpu function that always returns 0 (which is fine given my VPS only has one vCPU):

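// coreclr-27955-workaround.c
// Overrides glibc's sched_getcpu() when loaded via LD_PRELOAD.
// Always reports CPU 0, which is only safe on a single-vCPU machine.
int sched_getcpu(void) {
    return 0;
}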

Compiled it:

gcc -shared -fPIC coreclr-27955-workaround.c -o libcoreclr-27955-workaround.so
sudo cp libcoreclr-27955-workaround.so /usr/local/lib

Then ran my app with the LD_PRELOAD environment variable set:

LD_PRELOAD=/usr/local/lib/libcoreclr-27955-workaround.so ASPNETCORE_ENVIRONMENT=Production ASPNETCORE_URLS=http://*:54561 ./Foo

It worked!
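
For a host with more than one vCPU, always returning 0 is less appealing. A possible variant, sketched below, asks the kernel for the current CPU through the raw getcpu system call and falls back to 0 only when that call fails. This assumes the underlying problem is a failing sched_getcpu() (not confirmed in this thread), and the raw system call is likely slower than glibc's usual implementation. It has not been tested against coreclr.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Replacement for glibc's sched_getcpu(), intended for use via LD_PRELOAD.
 * Asks the kernel for the current CPU and reports CPU 0 if the call fails,
 * so callers never see a negative value. */
int sched_getcpu(void) {
    unsigned int cpu = 0;

    /* getcpu(cpu, node, tcache): only the CPU number is needed here. */
    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0) {
        return 0; /* assumption: falling back to CPU 0 is acceptable */
    }
    return (int)cpu;
}
```

It can be built and preloaded the same way as the workaround above (gcc -shared -fPIC, then LD_PRELOAD pointing at the resulting .so).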
