Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Euthanize unresponsive ipdevpoll workers #2129

Merged
merged 4 commits into from Mar 9, 2020

Conversation

lunkwill42
Copy link
Member

Since the cause of #2105 still has not been found, I'm taking the time to produce a potential workaround (which may still be useful if the cause of #2105 is found):

This PR implements a Ping command to the ipdevpoll worker AMP protocol, and uses this to regularly ping all running workers. Any worker who is not known to be done, and does not respond to a ping request will be euthanized using a SIGTERM signal. The existing ipdevpoll scheduler will automatically reschedule the jobs that are lost/frozen as a consequence.

Please see comments in individual commits for various rationales.

Because:
- We need the functionality of Deferred.addTimeout(), introduced in
  Twisted 16.5.
- Since the requirements specified Twisted versions up to 17 are ok for
  both Python versions, the most common installation methods for NAV
  will already have Twisted 17 installed, regardless of whether it is
  running on Python 3 or 2.
- The next NAV feature release will drop support for Python 2 anyway.
Because:
- We want to be able to verify that the worker processes are actually
  responsive.
Because:
- We've thus far been unable to pinpoint the problem that sometimes
  causes worker processes to freeze under NAV 5.0.
- Killing an unresponsive worker process will cause all the jobs in the
  worker to fail and the master will reschedule them automatically.
- Until the actual issue be identified, an option to monitor and kill
  unresponsive workers should suffice as a workaround.
Because:
- It's nice to be able to change the values when debugging.

These aren't necessarily options that normal users will ever need to
touch, so I'm not documenting these options just yet.
@lunkwill42 lunkwill42 added this to the 5.0.5 milestone Mar 6, 2020
@lunkwill42 lunkwill42 requested a review from hmpf March 6, 2020 14:46
@lunkwill42 lunkwill42 self-assigned this Mar 6, 2020
Copy link
Contributor

@hmpf hmpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be pertinent to this but will the ipdevpoll workers be properly killed by SIGTERM? I read recently that a common problem in python on Docker is that the python programs are commonly set up to be explicitly killed by SIGINT (KeyboardInterrupt) and that such programs should either also explicitly listen for SIGTERM, or docker should send SIGINT instead of SIGTERM. With explicit I mean catch in try:/except:, run a cleanup procedure and then sys.exit(). Not that we're running only in docker, and not that I can see that we need to explicitly catch anything in order to clean anything up, just something to keep in mind.

@lunkwill42
Copy link
Member Author

Might not be pertinent to this but will the ipdevpoll workers be properly killed by SIGTERM?

They should be, unless they are properly f***ed and won't even process signals.

signal.signal(signal.SIGTERM, self.sigterm_handler)
signal.signal(signal.SIGINT, self.sigterm_handler)

def sigterm_handler(self, signum, _frame):
"""Cleanly shuts down logging system and the reactor."""
self._logger.warning("%s received: Shutting down", signame(signum))
self._shutdown_start_time = time.time()
reactor.callFromThread(reactor.stop)

@lunkwill42 lunkwill42 merged commit 4d5bf5d into Uninett:5.0.x Mar 9, 2020
@lunkwill42 lunkwill42 deleted the debug/ipdevpoll-worker-ping branch March 9, 2020 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants