[linux] Segfault while invoking Roslyn's csc.dll with long command line #98936
AFAIR Linux doesn't support such line lengths and truncates them. I can't find a reliable answer on Google, though, since various results say the limit is 1024, 2048, 4096 or 128k chars.
The line limits on Linux are higher than on macOS (which has 1M):
$ getconf ARG_MAX
2097152
And it appears that the OS-imposed line length limit is not a factor here, since named pipes appear to be used to pass arguments to the sub-process. Command line length may play a role in either
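To check how close a given invocation gets to the kernel limits, a rough estimate can be computed as in the minimal sketch below. It assumes a 64-bit system with 4 KiB pages; `argv_footprint` and `check_command_line` are hypothetical helpers, not part of any .NET tooling. `os.sysconf("SC_ARG_MAX")` returns the same value as `getconf ARG_MAX`, and on Linux that value is derived from the stack rlimit (a quarter of it, so 8 MiB gives 2097152). Linux also caps each individual argument string at MAX_ARG_STRLEN (32 pages, i.e. 128 KiB), which is likely where the "128k" figure comes from.

```python
import os

# Assumption: 4 KiB pages, so MAX_ARG_STRLEN (32 pages) is 128 KiB.
MAX_ARG_STRLEN = 32 * 4096


def argv_footprint(args, env=None):
    """Rough number of bytes execve() needs for argv and envp:
    every string plus its NUL terminator, plus one pointer per entry
    (and the two terminating NULL pointers). Assumes 64-bit pointers."""
    env = os.environ if env is None else env
    strings = list(args) + [f"{k}={v}" for k, v in env.items()]
    ptr_size = 8
    return sum(len(s.encode()) + 1 for s in strings) + ptr_size * (len(strings) + 2)


def check_command_line(args, env=None):
    """Compare an argument list against ARG_MAX and the per-string cap."""
    arg_max = os.sysconf("SC_ARG_MAX")
    total = argv_footprint(args, env)
    longest = max((len(a.encode()) + 1 for a in args), default=0)
    return {
        "arg_max": arg_max,
        "total": total,
        "fits_total": total <= arg_max,
        "fits_single": longest <= MAX_ARG_STRLEN,
    }
```

By this estimate, a 1.8 MB command line with 11k parameters stays under a 2097152-byte ARG_MAX only if the environment is small, so it sits close to the limit rather than comfortably below it.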
Just tested with the new dotnet,
Hello @grendello, are you able to share the dump and/or a repro for this issue?
Hello @mangod9, I didn't save any of the original core dump files, alas, but I've just tested with dotnet. I will close the issue and reopen it only if I see the crash again. Thanks!
@mangod9 well, it started happening again with the update to
In the hope of getting better traces, I built
Good news is that
The official builds use LLVM instead of gcc, so this might still be an issue. If you want to try building something closer to the official build, you can use the Docker container
@akoeplinger I used gcc because the runtime failed to build with clang 18 in
Rebuilt the current tip of
OK, interesting. If you do repro it again, please capture dumps (especially of the assert you hit on the debug build). Thanks.
Here's the debug build assertion stack trace. Commit:
And, for completeness, the stack traces for all the threads:
Adding @davidwrighton since
I'm looking into this.
@mangod9, @grendello Well, the assertion you found with our latest bits is definitely a bug, but it's not related to the other issue you've been looking at. I'm in the process of writing a reliable test for the assertion failure, and it'll probably be fixed in main later this week, but that won't fix the segfault problem. This assertion failure could in principle result in a deadlock, not a crash.
Yeah, agreed, this isn't related to the original issue. @grendello, if you aren't able to repro the main bug, we can close this after David makes the fix.
The assertion issue is in the process of being fixed with PR #104003.
Once your fix lands, I will test the debug build again to see if I can repro the original issue in this configuration. I guess the assertion was getting in the way of it.
Alas, the issue appears to be back with the tip of For both commits, the same compiler (clang 16.0.6) was used, with the same build command line (
Interestingly, the exact same build once managed to catch the segfault and abort instead of crashing. Here's the trace:
The second trace isn't very helpful, but at least the bottom frames are the same in both cases. Might it be a thread synchronization issue in the GC code?
A workaround has presented itself, and a clue (I think) at the same time. Disabling background GC makes the build work. I disabled it only for
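For anyone trying the same workaround on the extracted compiler invocation, here is a minimal sketch that disables background GC for a single process only. It assumes the documented runtime knob `DOTNET_gcConcurrent=0` (the runtimeconfig equivalent is `System.GC.Concurrent: false`); `env_without_background_gc` and `run_csc` are hypothetical helpers, and the `csc.dll` path and arguments would come from the binlog.

```python
import os
import subprocess


def env_without_background_gc(base=None):
    """Return a copy of the environment with background (concurrent) GC
    disabled via DOTNET_gcConcurrent=0. The input mapping is not modified."""
    env = dict(os.environ if base is None else base)
    env["DOTNET_gcConcurrent"] = "0"
    return env


def run_csc(csc_dll, args, dotnet="dotnet"):
    """Hypothetical wrapper: run csc.dll under `dotnet` with background GC
    turned off for this process only, leaving the rest of the build as-is."""
    return subprocess.run(
        [dotnet, csc_dll, *args],
        env=env_without_background_gc(),
        check=False,
    )
```

Scoping the setting to the one process keeps the rest of the build on its default GC mode, which makes it easier to tell whether the crash really depends on background GC.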
The workaround doesn't remove the issue completely, but it makes the crash much less likely. I've just hit the first crash in around 20 builds. The stack trace for this crash is not complete, alas:
An interesting observation is that when I keep getting the segfaults, flushing the filesystem caches and dropping all the kernel page and slab object caches helps:
# sync -f
# echo 3 > /proc/sys/vm/drop_caches
It makes me wonder whether there's some uninitialized or improperly cleared memory somewhere in the native GC code. When dotnet keeps crashing, it's possible that subsequent runs reuse shared libraries already mapped in memory and kept around in the caches. If this happens and the runtime doesn't initialize (or clear after use) some variable(s), invalid values may cause it to go astray.
Hey @grendello, does this continue to repro on recent main?
@mangod9 in the past ~10 days I haven't seen any crashes; things look hopeful at the moment. We're currently using
Is this OK to close now?
I think so, but it would be good to know what the problem was :) Any idea what fixed it?
The crash happens occasionally (every 4-5 builds) when building .NET Android's Mono.Android project, which passes a very long command line to the Roslyn C# compiler (1.8 MB long, over 11k parameters). It started happening with the bump to the latest dotnet 9 preview, dotnet/installer/main@0a73f814e1 (9.0.100-preview.2.24122.3).
Environment info:
OS: Debian/testing (trixie)
Arch: amd64
CPU: AMD Ryzen 9 5950X 16-Core Processor
Kernel: 6.7.6-x64v3-xanmod1
The crash doesn't happen on every build, but it appears that the more parallel the build, the more often the process crashes. It also appears that putting the long command line in a response file mitigates the issue; at least I wasn't able to reproduce the crash that way in 20 attempts. When the crash happens, the runtime just exits silently, without catching the signal or printing any information. Our CI Linux bot uses Ubuntu 20.04.6 LTS and, so far, doesn't appear to experience this crash (or we were lucky, as the bump only landed last Friday).
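The response-file mitigation replaces the long argument list with a single `@file` argument, which csc supports. Below is a minimal sketch of generating such a file from an argument list. The quoting rules are an assumption (wrap in double quotes when an argument contains whitespace or starts with `#`, since `#` lines are treated as comments), and arguments that would need real escaping are rejected rather than guessed at. `to_rsp_line`, `write_response_file` and `short_command_line` are hypothetical helpers.

```python
import os
import tempfile


def to_rsp_line(arg):
    """Format one argument as a line of a csc response file.
    Assumed rules: quote if it has whitespace, is empty or starts with '#'."""
    if '"' in arg or "\n" in arg:
        raise ValueError(f"argument needs escaping not handled here: {arg!r}")
    needs_quotes = arg == "" or arg.startswith("#") or any(c.isspace() for c in arg)
    if not needs_quotes:
        return arg
    if arg.endswith("\\"):
        # A trailing backslash before the closing quote could be read as an escape.
        raise ValueError(f"trailing backslash in quoted argument: {arg!r}")
    return f'"{arg}"'


def write_response_file(args, directory=None):
    """Write one argument per line to a new temporary .rsp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".rsp", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for arg in args:
            f.write(to_rsp_line(arg) + "\n")
    return path


def short_command_line(csc_dll, args, dotnet="dotnet"):
    """The process receives three arguments; the rest are read from the file."""
    return [dotnet, csc_dll, "@" + write_response_file(args)]
```

The caller is responsible for deleting the generated file once the compiler has finished.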
Unfortunately, to reproduce this locally one would have to build Xamarin.Android on their machine and extract the csc invocation from the binlog. This is necessary because the build uses a lot of artifacts produced by the .NET Android build. I will be happy to provide assistance in that regard, if necessary :)
I extracted the compiler invocation into a shell script and captured a handful of core dumps; the best trace I managed to get is below: