Fix EPERM when killing sudo-launched createdump process in CollectCrashDumpWithCreateDump#125916
Fix EPERM when killing sudo-launched createdump process in CollectCrashDumpWithCreateDump#125916
Conversation
…CreateDump Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/a2dbb77d-4221-4987-8406-2d2f97adfeb8
|
@copilot I don't want a PR to just remove the workaorund - it will just expose the problem again. I want you to remove the workaround locally, loop to try to get a repro. If you get a repro, investigate. Any fix, put in this PR. |
…stead of Process.Kill Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/70211a86-2150-4fc8-9869-b914448a0643
The root cause is that The fix (commit a1b48fe):
|
|
@copilot look at feedback |
…condition Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/d515aa14-bc39-4049-80be-7464fad96cf5
|
In the Task.WaitAll(copyError, copyOutput);
Console.WriteLine("chown stdout:");
Console.WriteLine(copyOutput.Result);
Console.WriteLine("chown stderr:");
Console.WriteLine(copyError.Result);The |
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/0070e5df-2424-4d00-b35e-824e9c9cdfcd
…e) for sudoKill Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/841650d4-6146-47f2-b7fb-763a292fd331
|
Fixed in 84153ab — changed |
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/dee1a687-e44d-4551-bf84-6cdb1cebec8d
…rectly Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/147ec28d-750a-4c2e-947e-a22335663c8a
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/3b687777-2d3d-4874-95ea-d9662f946346
Tests on macOS were failing with
AggregateException: Not all processes in process tree could be terminated (Operation not permitted)when a timed-outcreatedumpprocess could not be killed.Description
Root cause:
createdumpis launched viasudo(createdump.StartInfo.FileName = "sudo"), so thesudoprocess and its childcreatedumpbinary run as root. On macOS/Linux, a non-root process cannot send SIGKILL to root-owned processes. This is exactly whatProcess.Kill(entireProcessTree: true)was hitting in the timeout path, causing anAggregateExceptionto propagate and fail the test.A previous workaround silently retried the failing
Killcall up to 5 times, masking the error. This PR removes that workaround and fixes the underlying cause.Fix: Added a
KillWithSudohelper that kills the timed-outcreatedumpprocess usingsudo -n kill -9 <pid>. Since passwordlesssudoaccess is already required to launchcreatedumpin the first place, this approach works correctly.The helper includes the following design decisions:
sudo -nis used so thatsudofails fast (with a non-zero exit code) instead of hanging waiting for a password prompt if passwordlesssudois not configured.sudo killprocess are not redirected — output flows directly to the console. Since this method is only called on a failure path, verbose output is acceptable and desirable.sudoKill:WaitForExit()is called without a timeout. Ifsudo killitself hangs, we are already in a cascade of failures with no meaningful recovery path, so no attempt is made to killsudoKill.KillWithSudoare allowed to propagate so that failures are visible rather than silently swallowed.Before (workaround):
After (fix):
Where
KillWithSudokills the given process usingProcess.Start("sudo", $"-n kill -9 {pid}")?.WaitForExit(), with output flowing directly to the console and exception propagation.Changes
src/tests/Common/CoreCLRTestLibrary/CoreclrTestWrapperLib.cs]: Remove the retry workaround; addKillWithSudohelper method using theProcess.Start(fileName, arguments)convenience overload,sudo -n, no stdout/stderr redirection (output goes directly to the console — already on a failure path where verbose output is fine), unconditionalWaitForExit()(no timeout — ifsudo killhangs we are already in a cascade of failures with no recovery path), exception propagation (no try/catch blocks); call it in the timeout path instead ofKill(entireProcessTree: true); remove all unnecessaryUseShellExecute = falseassignments throughout the file (the default in .NET Core)Note
This PR was generated with AI assistance (GitHub Copilot).
Original prompt
This section details on the original issue you should resolve
<issue_title>Test failure: AggregateException: Not all processes in process tree could be terminated.</issue_title>
<issue_description>Failed in: runtime-coreclr jitstress 20231009.2
Failed tests:
Error message:
Stack trace:
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.