Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When aw-qt crashes, aw-watcher-afk does not shut down #19

Closed
johan-bjareholt opened this issue Jun 19, 2017 · 11 comments · Fixed by ActivityWatch/aw-client#21
Closed

When aw-qt crashes, aw-watcher-afk does not shut down #19

johan-bjareholt opened this issue Jun 19, 2017 · 11 comments · Fixed by ActivityWatch/aw-client#21

Comments

@johan-bjareholt
Copy link
Member

Really odd behavior.
Reproducible by starting aw-qt and then doing "kill -9" on the aw-qt process.

@ErikBjare
Copy link
Member

ErikBjare commented Jun 20, 2017

Yeah, I tried using a process group for this but I'm not that familiar with that part of POSIX.

Do you think you could look into that @johan-bjareholt?

@ErikBjare
Copy link
Member

ErikBjare commented Jun 20, 2017

It seems that kill -9 doesn't kill the whole process group. You'd have to do kill -9 -$PGID. (See here: https://stackoverflow.com/a/15139734/965332)

A solution to prevent double-running processes would be to run a process check upon start and see if any aw-* processes (that aw-qt wants to manage) are already running.

Also, from Wikipedia:

The distribution of signals to process groups forms the basis of job control employed by shell programs. The tty device driver incorporates a notion of a foreground process group, to which it sends signals generated by keyboard interrupts, notably SIGINT ("interrupt", Control+C), SIGTSTP ("terminal stop", Control+Z), and SIGQUIT ("quit", Control+).

SIGKILL is notably missing from that list.

@johan-bjareholt
Copy link
Member Author

A solution to prevent double-running processes would be to run a process check upon start and see if any aw-* processes (that aw-qt wants to manage) are already running.

Here's a proposal for this

ActivityWatch/aw-client#21

@ErikBjare
Copy link
Member

Closing this since it only happens in the case of kill -9, which we can't do anything about.

@johan-bjareholt
Copy link
Member Author

johan-bjareholt commented Jun 26, 2017

No, it does not only happen for kill -9, i only used kill when i tried to reproduce. Another way to reproduce on my machine is to simply press "open log folder" so it crashes normally, but in that case all modules continue running (aw-server, aw-watcher-afk and aw-watcher-window).

@ErikBjare
Copy link
Member

Oh, fair enough.

I think this depends on how Python crashes, if Python crashes with an internal error (possible that PyQt triggers this) instead of a normal Exception-caused crash then I'm not sure we can do much about it. If it crashes "normally" then I think the atexit should work (which I added here).

@nikanar
Copy link

nikanar commented Jul 18, 2017

Once aw-qt dies, the other aw-* become orphan processes, and their PPID becomes 1 (for the init process, rather than having the value of their parent's PID (which in our case is also their PGID). Since there is no formal tie between a child process and its parent), it is normal, if undesirable, that children processes stay alive after the parent crashes. Children that would rather die than stay as orphans typically poll their parents for aliveness, for instance checking whether their own PPID ever becomes 1 (and then shutting down).

This can be done in this way in Python.
# inspired from http://www.programcreek.com/python/example/4464/os.getppid
while True: # watcher loop, run() or heatbeat_loop()
    if os.getppid() == 1:
        sys.exit()

This kills the watchers when aw-qt is kill -9-ed.

Then I've been looking for the proper way to terminate the watchers, and while aw-qt.Module.stop looked nice, I couldn't access it, so I elected to just break out of the main loops. If there's a better thing to do, please let me know as I didn't find one.

PR will come in as soon as I figure a proper git way out of my local mess (I've been adding --user to all Makefiles here, and --upgrade --force-reinstall to aw-watcher-window or it wouldn't use newer code (maybe due to the make clean issue ?)).

Some other solution using prctl

Another solution involves prctl : prctl(PR_SET_PDEATHSIG, SIGTERM);, but that seems easier in C than in Python.

python-prctl is a thing, if someone wants to go that way.

And this track was yet another option but led nowhere

Otherwise, per section 10.6.4 in this document, supposedly all orphaned process groups receive a SIGHUP, and could take that hint to die graciously (but it doesn't seem to apply here, maybe because reasons ; this question is super relevant but gets complicated). That would take to install a SIGHUP handler. That would have simply entailed adding signal.signal(signal.SIGHUP, signal.SIGTERM) inside heartbeat_loop in aw-watcher-window/main.py.


Note that even if all I did works fine, aw-server still stays alive when aw-qt crashes, which I think is still an issue. (But that seemed like an entirely different beast, to be tamed another day.)

@johan-bjareholt
Copy link
Member Author

johan-bjareholt commented Jul 19, 2017

@nikanar Awesome! Much better solution than my suggestion.

Out of the 3 solutions you found the one you selected was clearly most clean.

Will also work nicely even if the watchers are not run with aw-qt.

Will check the PRs and merge :)

Thanks for the help!

@ErikBjare
Copy link
Member

Amazing work, learned a lot from reading those links.

But, as I mentioned in comments on the PRs, there might be an issue with the taken approach when run using the PyInstaller binaries. I'm confident we can find a way around that however.

@ErikBjare
Copy link
Member

ErikBjare commented Jul 20, 2017

Continued discussions should probably go here.

So, @nikanar did a nice fix with the PPID detection but I don't think that'll work with PyInstaller due to it using a bootloader process that in turn starts the application process.

Details here: https://pythonhosted.org/PyInstaller/advanced-topics.html#the-bootstrap-process-in-detail

The process tree would look something like this when run using the PyInstaller (iirc):

  • aw-qt (bootloader)
    • aw-qt
      • aw-server (bootloader)
        • aw-server
      • aw-watcher-afk (bootloader)
        • aw-watcher-afk
      • ...

As specified in the PyInstaller docs, the bootloader will exit when its child process exits.

There has been talk about moving the child processes into Python-managed processes or threads (this would get rid of all the secondary bootloader and is likely to save some memory), but this has not been investigated.

@nikanar
Copy link

nikanar commented Jul 20, 2017

If that's right, then when running under PyInstaller, the children should look for the PPID of their parent process (the PyInterpreter) to see if that is 1 because aw-qt died. But then, I also have no idea if PPIDs work the same under Windows either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants