ASPNet service crashing on linux-arm64 when targeted to net9.0 #110442
Comments
Tagging subscribers to this area: @dotnet/gc
Thanks for reporting the issue. Does it repro frequently? Please share a dump (multiple if available) via email if possible.
Yes, it fails periodically (but I have no clue what the exact trigger is). We have I will (try to) send you the dumps by email.
I have tried the trivial Might the behavior somehow depend (even unintentionally) on the platform the application is built on? (We build/publish on Windows machines and then deploy the result to Linux servers):
Thanks for sharing the dump. It appears that Thread::GetAllocContext is returning an invalid context. In .NET 9, this part of the code was touched in #103055 and #103607. @jkoritzinsky, do you think any of those changes would cause a race on arm64? @pavel-faltynek, it would be helpful if you could share a few more dumps. I assume all of them AV with the same stack? Any other specific details and/or a repro would also be helpful.
I know we had to do some fixes for during process shutdown (on Windows) in #103877. My first guess would be that this is happening because the alloc context for a thread was destroyed, but the Maybe there's a corresponding shutdown issue for Linux? I can't think of anything else without looking at the dump myself.
In this particular case it doesn't look like it's occurring during shutdown. I have shared the dump with you offline.
The crash dump shows that the runtime was not notified about the thread shutting down. It is most likely a managed/native interop problem (e.g. a bug in interop corrupting the unmanaged heap). I can see from the crash dump that your service uses a number of NuGet packages, so the interop bug might be in one of them. It may be useful to run it on a checked build of CoreCLR to see whether that gives us any extra insights. Could you please give it a try?
@jkotas, it doesn't seem I can access this. I have tried to randomly accept "account and project creation" in
@mangod9, generating more dumps might be the easier part. As far as I can remember, all of the ones I loaded in WinDbg had the same call stack. I have added:
Regarding the nuget note: there is one which differs between Additionally one code update was performed for EDIT: Forgot the
I have shared the checked build at https://github.com/jkotas/scratch/
Thank you, @jkotas. I have added dump 8, executed against the checked CLR.
It doesn't seem to be a recurring thing. I found only a single instance (even though the service crashed many times).
So, just to clarify, you hit the assert only once, but the service was still crashing with
Same bug on the Docker runtime image for x86_64 with tag :9.0. On 9.0-alpine it works perfectly.
Right. A single-shot assert, and unfortunately no documented relationship to the dump(s). So I have no clue whether dump 8 is in any way connected to the assert or not.
Do you happen to have a standalone repro?
heisenbug
Description
An ASP.NET service crashes with SIGSEGV when compiled for linux-arm64 and targeted to net9.0.
It does not crash on Windows at all, nor on Windows or Linux when targeted to net8.0.
Reproduction Steps
Unfortunately, I have no repro steps (other than just running the service and sending a few HTTP requests to it).
On Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-125-generic aarch64) it looks like it just crashes randomly.
Installed dotnet: 9.0.101.
Expected behavior
Don't crash even on linux-arm64, please 😁
Actual behavior
There is a crash report available which, when preprocessed by apport-unpack /var/crash/_usr_lib_dotnet_dotnet.1000.crash ~/crash, can provide a dump. As there is possibly sensitive data in the dump, I can share it for inspection only via some "more secure" channels, if needed. I'm far from being an expert here, but I'm able to open it in WinDbg and observe the following:
Under the "Stack" section, there is:
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response