9wm: Ignore SIGCHLD instead of handling it #16
It's been a long time since I used anything but Linux, but I'm almost positive that SIG_IGN will not clean up zombies on other types of Unix.
Are you actually seeing zombie processes lying around with the current code?
Aha, I finally found a document that explains why 9wm has this code: http://www.unixguide.net/unix/faq/3.13.shtml
I'm still curious what motivated you to write this patch, are you seeing problems with zombies not getting cleaned by 9wm?
Yes, and I run Linux. From what I looked at, waitpid stops the thread until a process changes state (including exit), and SIGCHLD is sent when a child exits. Doing waitpid after SIGCHLD should do nothing, and since it has WNOHANG it returns immediately. There may be something I'm not understanding, but this solves my problem. I can try an unpatched and patched version on some sort of BSD to see if this makes any difference.
Here's some text from
It appears that the POSIX standard has changed, so I would expect a modern BSD to also respond to your patch the way you expect. A better test would be something old like BSD4.3 or, I don't know, Irix or HP-UX, where I'm 99.92% sure you'll see a bunch of zombies if you use your patch. Probably more productive would be to look into why Linux is no longer reaping defunct processes handled with

    #include <unistd.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <stdio.h>

    #define SECOND 1
    #define MINUTE (60 * SECOND)

    /* Reap every child that has exited; one SIGCHLD may cover several. */
    void sigchld(int signum) {
        pid_t kid;

        while (0 < (kid = waitpid(-1, NULL, WNOHANG))) {
            /* printf is not async-signal-safe; tolerable in a throwaway test. */
            printf("Child %d has exited\n", (int)kid);
        }
    }

    int main(int argc, char *argv[]) {
        signal(SIGCHLD, sigchld);

        /* Start 20 short-lived children. */
        for (int i = 0; i < 20; i += 1) {
            if (!fork()) {
                execlp("sleep", "sleep", "2s", NULL);
                _exit(127); /* exec failed: don't fall through and keep forking */
            }
        }

        /* sleep() returns early with the time remaining when a signal arrives. */
        for (int remain = 15 * SECOND; remain; remain = sleep(remain)) {
            printf("sleep(%d)\n", remain);
        }
        return 0;
    }

On my system, when I run this and watch the process table, the
Ah, I guess I misunderstood the man page. I have tried this code on my machine, and the zombie processes get cleared up.
Hello
Are you sure 9wm is the parent process of these zombies?
Yes, I am quite sure
I'm working on a larger test program to check something. While I'm doing this, could you try the following:
If this clears out your zombies, then there might be some race condition where the child (to 9wm) exits, sends
Actually, setting It is, however, looking increasingly likely that we're seeing a race condition here.
This doesn't do anything for me.
I'm not sure that is the case, as for instance xterm's child (bash) gets killed, while xterm itself leaves a zombie.
For the record, I am unable to reproduce this situation with zombies. I believe that you're seeing it, I just can't figure out how to make it happen on my machine. It sounds like the situation is thus:
At this point I'm at a complete loss as to what to do next. The options seem to be:
I'm loath to implement the first option, since it's an uninformed kludge that I would have added only because I'm too dumb to understand what the actual problem is. Also, I don't even know what to test for with the

The second option is also pretty bad, because 9wm becomes a real pain in the rear for people with systems similar to yours, when previously it worked just fine. They will hopefully find this pull request, merge in the code, and, if we're really lucky, leave a comment that they did so, with some information pointing us at what might be causing the difference.

The third option feels like giving up, but I feel like as the maintainer of code this old, of mostly historical interest, and with an active global userbase of maybe a dozen people, it's the most responsible thing to do.

So unless @Tookmund chimes in with a different opinion (or, better, an idea about what in the world is happening), I'm going with option three. I'm going to leave the merge request open because the work @circl-lastname did in this pull request is the correct first option path out for anyone else experiencing this problem.

If you, the reader, have any insight about why this is happening, please leave a comment!
I just reproduced this! Yay!
Running under
@circl-lastname Could you try this patch against

    diff --git a/9wm.c b/9wm.c
    index 940de04..08e0b00 100644
    --- a/9wm.c
    +++ b/9wm.c
    @@ -176,7 +176,12 @@ main(int argc, char *argv[])
     	signal(SIGINT, SIG_IGN);
     	if (signal(SIGHUP, sighandler) == SIG_IGN)
     		signal(SIGHUP, SIG_IGN);
    -	signal(SIGCHLD, sigchld);
    +	{
    +		struct sigaction act = {0};
    +
    +		act.sa_handler = sigchld;
    +		sigaction(SIGCHLD, &act, NULL);
    +	}
     
     	exit_9wm = XInternAtom(dpy, "9WM_EXIT", False);
     	restart_9wm = XInternAtom(dpy, "9WM_RESTART", False);
I will be trying your patch today
Well, sorry, I was busy, but your patch does work.
No worries! I'll push the fix and close this.
Thanks for your patience in helping debug this!
No problem! Also, while this isn't a huge issue, my e-mail is circl.lastname@gmail.com, rather than circl-lastname@gmail.com, but I'm nitpicking.
This prevents defunct processes from being created.
waitpid within a SIGCHLD handler doesn't do anything, as by the time you get the signal, the process has already exited. 9wm doesn't need to do anything with the process afterwards, so it's safe to just ignore.