dkms: use kill -0 $pid instead of /proc/$pid in process exit wait loop#403
Conversation
Some Ubuntu installers with custom install commands run DKMS from inside a separate PID namespace without mounting a separate /proc for the new namespace. As a result, this query for /proc/$pid will usually be waiting for an unrelated process in the parent namespace to exit, which sometimes doesn't exist at all or is a process that never exits, causing DKMS to hang forever in these scenarios. Changing this to kill -0 will ensure that DKMS always checks for the PID in the correct namespace regardless of which /proc is mounted. We are also investigating solutions that can be implemented on our end for this issue, but since this issue causes undefined behavior on *any* system with a separate PID namespace and shared /proc, we feel this patch is appropriate for DKMS to ensure robustness in other implementations similar to ours. Link to Ubuntu bug report: https://bugs.launchpad.net/curtin/+bug/2037682
evelikov
left a comment
There was a problem hiding this comment.
@MitchellAugustin thanks for the patch. Couple of questions:
- Is this the only fix needed for dkms to work in PID namespaces?
- Reading through
man 7 signalandman 1 killI see no mention of0. Is it guaranteed to work, can we reasonably say it won't backfire?
dkms.in
Outdated
| } | ||
| trap on_exit EXIT | ||
| while [[ -d /proc/$pid ]]; do | ||
| while kill -0 $pid > /dev/null 2>&1; do |
There was a problem hiding this comment.
Nit: please use kill --signal 0 ...
There was a problem hiding this comment.
I will test that syntax when I am back at work on Monday and update the PR if it works as expected. Thanks!
If you take a look at
(I see that In my tests, I observed that the result of |
|
I tested the requested syntax change in my environment, and it looks like it does work with --signal, however we must just be careful to specify /bin/kill --signal 0 $pid since some shells have a built-in "kill" command that does not have the same --signal option. This is included in my latest commit. |
evelikov
left a comment
There was a problem hiding this comment.
Double-checked and it's working as expected - util-linux's kill happily accepts the syntax.
All the shell built-ins (busybox/ash, zsh, bash) behave differently from one another wrt signal handling 🤷
|
Awesome, thanks @evelikov! |
Some programs that are installed by default in Ubuntu rely on checking /proc/$pid to determine if certain processes are still alive. Curtin's current default behavior for "curtin in-target" is to run children inside a separate PID namespace without also isolating /proc, meaning checks for /proc/$pid usually wait on the wrong process entirely. One such example is in any DKMS package, as DKMS checks /proc/$pid. (I have submitted a similar patch to them to address this as well: dkms-project/dkms#403). Adding --mount-proc to unshare --pid args resolves that issue since /proc is already mounted in util.py. However, it introduces a new issue with postinstall scripts that rely on ischroot to determine whether to restart systemd, since ischroot behavior is undefined in pid namespaces. Bind-mounting /usr/bin/ischroot to /usr/bin/true resolves this issue and is always correct since the symlink is added only within ChrootableTargets.
Some Ubuntu installers with custom install commands run DKMS from inside a separate PID namespace without mounting a separate /proc for the new namespace. As a result, this query for /proc/$pid will usually be waiting for an unrelated process in the parent namespace to exit, which sometimes doesn't exist at all or is a process that never exits, causing DKMS to hang forever in these scenarios.
Changing this to kill -0 will ensure that DKMS always checks for the PID in the correct namespace regardless of which /proc is mounted. We are also investigating solutions that can be implemented on our end for this issue, but since this issue causes undefined behavior on any system with a separate PID namespace and shared /proc, we feel this patch is appropriate for DKMS to ensure robustness in other implementations similar to ours.
Link to Ubuntu bug report: https://bugs.launchpad.net/curtin/+bug/2037682