Segmentation fault when running .NET Core 3.0 app on OpenVZ #13833

Closed
Daniel15 opened this issue Nov 29, 2019 · 5 comments

@Daniel15

I have a .NET Core app that's a fairly basic gRPC service (unfortunately not open source yet, so I can't link to the source at the moment). On one particular server, it's throwing a segmentation fault as soon as I run it:

ASPNETCORE_ENVIRONMENT=Production ASPNETCORE_URLS=http://*:54561 ./TestApp
info: Microsoft.Hosting.Lifetime[0]
      Now listening on: http://[::]:54561
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /opt/exampleapp-worker
Segmentation fault

In fact, I'm seeing this for a very basic ASP.NET website too (just one custom middleware using app.Run).

Backtrace from lldb + SOS:

* thread #7, name = 'TestApp', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x00007ffff7019b3d libcoreclr.so`ThreadpoolMgr::GetRecycledMemory(ThreadpoolMgr::MemType) [inlined] ThreadpoolMgr::RecycledListInfo::Remove() at win32threadpool.h:655
    frame #1: 0x00007ffff7019aa5 libcoreclr.so`ThreadpoolMgr::GetRecycledMemory(memType=<unavailable>) at win32threadpool.cpp:1674
    frame #2: 0x00007ffff715e464 libcoreclr.so`UnManagedPerAppDomainTPCount::QueueUnmanagedWorkRequest(unsigned int (*)(void*), void*) [inlined] ThreadpoolMgr::MakeWorkRequest(unsigned int (*)(void*), void*) at win32threadpool.h:367
    frame #3: 0x00007ffff715e45a libcoreclr.so`UnManagedPerAppDomainTPCount::QueueUnmanagedWorkRequest(this=0x00007ffff75dd640, function=(libcoreclr.so`ThreadpoolMgr::AsyncTimerCallbackCompletion(void*) at win32threadpool.cpp:4765), context=0x0000000000738290)(void*), void*) at threadpoolrequest.cpp:356
    frame #4: 0x00007ffff701d100 libcoreclr.so`ThreadpoolMgr::FireTimers() at win32threadpool.cpp:855
    frame #5: 0x00007ffff701d088 libcoreclr.so`ThreadpoolMgr::FireTimers() at win32threadpool.cpp:4710
    frame #6: 0x00007ffff701cd41 libcoreclr.so`ThreadpoolMgr::TimerThreadFire() at win32threadpool.cpp:4598
    frame #7: 0x00007ffff701cc15 libcoreclr.so`ThreadpoolMgr::TimerThreadStart(p=<unavailable>) at win32threadpool.cpp:4569
    frame #8: 0x00007ffff732c86d libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000000000737530) at thread.cpp:1807
    frame #9: 0x00007ffff7fb1fa3 libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:486
    frame #10: 0x00007ffff7bbc4cf libc.so.6`__GI___clone at clone.S:95

Debian stable (buster)

$ apt list --installed | grep dotnet

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

dotnet-host/buster,now 3.0.1-1 amd64 [installed,automatic]
dotnet-hostfxr-3.0/buster,now 3.0.1-1 amd64 [installed,automatic]
dotnet-runtime-3.0/buster,now 3.0.1-1 amd64 [installed]
dotnet-runtime-deps-3.0/buster,now 3.0.1-1 amd64 [installed,automatic]
@Daniel15
Author

Here's strace output for another test app: https://d.ls/dotnet/bugs/coreclr-27955-strace.txt

@Daniel15
Author

@303248153 @janvorli Do you think this could be related to dotnet/coreclr#26912? I'm seeing this on an OpenVZ VPS so it sounds like it might be related, but I don't know enough about C++ debugging to confirm that. Sometimes it just locks up (like what you reported), and other times I get this segmentation fault in ThreadpoolMgr.

Do you know of any workaround I can use in the meantime? @303248153, do you have a patched libcoreclr.so I could try so I don't have to build the patched version myself?

@Daniel15 Daniel15 changed the title from "Segmentation fault when running .NET Core 3.0 app" to "Segmentation fault when running .NET Core 3.0 app on OpenVZ" Nov 29, 2019
@303248153
Contributor

Looks like it's the same problem, but I've already deleted my workspace, so I don't have a patched libcoreclr.so on hand.
The original issue is under the 3.1 milestone, so I think they will fix this in the next release.
For a quick test, you can patch the assembly code at 0x00007ffff7019b3d: replace the call to PAL_HasGetCurrentProcessorNumber with xor rax, rax; nop; nop; ... (48 31 C0 90 90 ...).

@Daniel15
Author

Today I learnt that it's pretty easy to override library methods using LD_PRELOAD.

As a workaround, I defined a sched_getcpu function that always returns 0 (which is fine given that my VPS only has one vCPU):

// coreclr-27955-workaround.c
int sched_getcpu(void) {
    return 0;
}

Compiled it:

gcc -shared -fPIC coreclr-27955-workaround.c -o libcoreclr-27955-workaround.so
sudo cp libcoreclr-27955-workaround.so /usr/local/lib

Then ran my app with the LD_PRELOAD environment variable set:

LD_PRELOAD=/usr/local/lib/libcoreclr-27955-workaround.so ASPNETCORE_ENVIRONMENT=Production ASPNETCORE_URLS=http://*:54561 ./TestApp

It worked!

@jkotas
Member

jkotas commented Jan 23, 2020

Duplicate of #13475. This issue will be fixed in the .NET Core 3.1.2 update.

@jkotas jkotas closed this as completed Jan 23, 2020
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 11, 2020