Skip to content

dkms: use kill -0 $pid instead of /proc/$pid in process exit wait loop#403

Merged
evelikov merged 2 commits intodkms-project:masterfrom
MitchellAugustin:namespace_fix
Feb 20, 2024
Merged

dkms: use kill -0 $pid instead of /proc/$pid in process exit wait loop#403
evelikov merged 2 commits intodkms-project:masterfrom
MitchellAugustin:namespace_fix

Conversation

@MitchellAugustin
Copy link
Contributor

Some Ubuntu installers with custom install commands run DKMS from inside a separate PID namespace without mounting a separate /proc for the new namespace. As a result, this query for /proc/$pid will usually be waiting for an unrelated process in the parent namespace to exit, which sometimes doesn't exist at all or is a process that never exits, causing DKMS to hang forever in these scenarios.

Changing this to kill -0 will ensure that DKMS always checks for the PID in the correct namespace regardless of which /proc is mounted. We are also investigating solutions that can be implemented on our end for this issue, but since this issue causes undefined behavior on any system with a separate PID namespace and shared /proc, we feel this patch is appropriate for DKMS to ensure robustness in other implementations similar to ours.

Link to Ubuntu bug report: https://bugs.launchpad.net/curtin/+bug/2037682

Some Ubuntu installers with custom install commands run DKMS from inside a separate PID namespace
without mounting a separate /proc for the new namespace. As a result, this query for
/proc/$pid will usually be waiting for an unrelated process in the parent namespace
to exit, which sometimes doesn't exist at all or is a process that never exits, causing
DKMS to hang forever in these scenarios.

Changing this to kill -0 will ensure that DKMS always checks for the PID in the correct
namespace regardless of which /proc is mounted. We are also investigating solutions
that can be implemented on our end for this issue, but since this issue causes
undefined behavior on *any* system with a separate PID namespace and shared /proc,
we feel this patch is appropriate for DKMS to ensure robustness in other implementations
similar to ours.

Link to Ubuntu bug report: https://bugs.launchpad.net/curtin/+bug/2037682
Copy link
Collaborator

@evelikov evelikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MitchellAugustin thanks for the patch. Couple of questions:

  • Is this the only fix needed for dkms to work in PID namespaces?
  • Reading through man 7 signal and man 1 kill I see no mention of 0. Is it guaranteed to work, can we reasonably say it won't backfire?

dkms.in Outdated
}
trap on_exit EXIT
while [[ -d /proc/$pid ]]; do
while kill -0 $pid > /dev/null 2>&1; do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please use kill --signal 0 ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test that syntax when I am back at work on Monday and update the PR if it works as expected. Thanks!

@MitchellAugustin
Copy link
Contributor Author

* Reading through `man 7 signal` and `man 1 kill` I see no mention of `0`. Is it guaranteed to work, can we reasonably say it won't backfire?

If you take a look at man 2 kill, you'll see:

If sig is 0, then no signal is sent, but existence and permission checks are still performed; this can be used to check for the existence of a process ID or process group ID that the caller is permitted to signal.

(I see that man 1 kill references signal 0 too, but only in that it says it is "particularly useful").

In my tests, I observed that the result of kill -0 reflected the correct status of the specified process within the child PID namespace (as well as in the default namespace on my host system), so I don't know of any way that it could backfire.

@MitchellAugustin
Copy link
Contributor Author

I tested the requested syntax change in my environment, and it looks like it does work with --signal, however we must just be careful to specify /bin/kill --signal 0 $pid since some shells have a built-in "kill" command that does not have the same --signal option. This is included in my latest commit.

Copy link
Collaborator

@evelikov evelikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-checked and it's working as expected - util-linux's kill happily accepts the syntax.

All the shell built-ins (busybox/ash, zsh, bash) behave differently from one another wrt signal handling 🤷

@MitchellAugustin
Copy link
Contributor Author

Awesome, thanks @evelikov!

@evelikov evelikov merged commit 4b4c62f into dkms-project:master Feb 20, 2024
powersj pushed a commit to canonical/curtin that referenced this pull request Mar 2, 2024
Some programs that are installed by default in Ubuntu rely on checking
/proc/$pid to determine if certain processes are still alive. Curtin's
current default behavior for "curtin in-target" is to run children
inside a separate PID namespace without also isolating /proc,
meaning checks for /proc/$pid usually wait on the wrong process
entirely. One such example is in any DKMS package, as DKMS checks
/proc/$pid. (I have submitted a similar patch to them to address
this as well: dkms-project/dkms#403).

Adding --mount-proc to unshare --pid args resolves that issue since
/proc is already mounted in util.py. However, it introduces a new
issue with postinstall scripts that rely on ischroot to determine
whether to restart systemd, since ischroot behavior is undefined
in pid namespaces. Bind-mounting /usr/bin/ischroot to /usr/bin/true
resolves this issue and is always correct since the symlink is
added only within ChrootableTargets.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants