Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: webagg: Handle ioloop shutdown correctly #6334

Merged
merged 1 commit into from May 10, 2016

Conversation

perimosocordiae
Copy link
Contributor

This change correctly triggers the ioloop.stop() method, allowing the user to call plt.show(), then resume control with a SIGINT, then call plt.show() again.

Without this change, any attempt to call plt.show() after killing the webagg server the first time would result in an error because the ioloop instance hadn't been properly stopped.

The motivation for using a signal handler comes from this SO answer, which is the only solution I could find that worked as intended.

@perimosocordiae
Copy link
Contributor Author

The AppVeyor error was caused by one of the builds failing to download Miniconda correctly.

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Apr 26, 2016
@tacaswell
Copy link
Member

wouldn't it be simpler to just set cls.started = False when shutting the loop down?

If we are going to install a signal handler we need to capture the current handler, call it as part of our handler and then clean up by restoring the previous handler.

@perimosocordiae
Copy link
Contributor Author

I'm not sure I understand your question. cls.started is used to prevent the WebAggApplication from starting the tornado IOLoop when it's already running. The problem with the existing code (that this PR fixes) is that the KeyboardInterrupt we catch doesn't actually stop the IOLoop, but does set cls.started = False. That means that the next time someone tries to start it, tornado throws an exception because as far as it knows, the IOLoop is still running.

@tacaswell
Copy link
Member

Can you just add ioloop.stop() into the except block?

I am somewhat surprised that tornado does not stop it's self on a keyboard interupt

@perimosocordiae
Copy link
Contributor Author

That's what I tried first, but it didn't work. As far as I can tell, this is what happens in that scenario:

  1. ioloop.start() -> blocks, transferring control to tornado
  2. KeyboardInterrupt -> takes control back from the ioloop without letting it clean up properly
  3. ioloop.stop() -> does some cleanup, but not all

Specifically, the problem is that the only correct way to stop the IOLoop is by having it run a callback that triggers its stop() method. Otherwise, the cleanup code I linked above never runs.

@perimosocordiae
Copy link
Contributor Author

Correction, that linked code is in a finally block, so it does run, but it runs before the call to stop().

@perimosocordiae
Copy link
Contributor Author

I suppose we could hack around the problem like this:

    try:
        ioloop.start()
    except KeyboardInterrupt:
        # set the relevant flag, bypassing ioloop.stop()
        ioloop._running = False
    print("Server is stopped")
    sys.stdout.flush()
    cls.started = False

I can confirm that this works, ugly as it is.

The key is that for the IOLoop to be restartable, it needs to exit in a state where both self._running and self._stopped are False. The finally block only sets self._stopped = False, and the stop() method sets self._running = False and self._stopped = True. Oddly enough, the start() method exits right away if self._stopped is True, but also sets it to False, so this works, too:

    try:
        ioloop.start()
    except KeyboardInterrupt:
        ioloop.stop()  # clears the _running flag, sets the _stopped flag
        ioloop.start()  # clears the _stopped flag and returns
    print("Server is stopped")
    sys.stdout.flush()
    cls.started = False

This seems like least intrusive change that gets the job done, and apparently it's documented behavior. I'll update the PR to use this now.

@perimosocordiae
Copy link
Contributor Author

Travis failure is unrelated.

@@ -22,6 +22,7 @@
import random
import sys
import socket
import signal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, removed.

@tacaswell
Copy link
Member

That makes sense, but I worry that it is going to be very dependent on the internal bits of tornado. I suspect that server people do not understand why you would want to start/stop/restart the server in the same process....

Is it worth trying to get the attention of someone from the tornado project on this?

attn @mdboom

@perimosocordiae
Copy link
Contributor Author

I agree that it's treading pretty close to unexpected usage. The docs for IOLoop.start imply that you should only call stop() from a callback, but docs for IOLoop.stop mention that:

If the event loop is not currently running, the next call to start() will return immediately.

Maybe @bdarnell could clear things up?

@bdarnell
Copy link

After an exception has escaped from IOLoop.start(), the loop is left in an undefined state and should not be used again. To keep the IOLoop restartable, you have to catch the exception before it breaks through IOLoop.start(). The signal handler in the SO answer linked at the top of this thread is the correct approach.

@tacaswell
Copy link
Member

Fair enough. @perimosocordiae Can you do the signal handling with a context manager? Something like

@contextmanager
def signal_int_grabber():
    old_sig = signal.signal(signal.SIGINT, 
                              lambda sig, frame: ioloop.add_callback_from_signal(on_shutdown))
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, old_sig)

with sig_int_grabber():
     ioloop.start()

@perimosocordiae
Copy link
Contributor Author

Sure, pushed and squashed. The datetime import was unused, so I removed it since I was messing with imports anyway.

@tacaswell
Copy link
Member

❤️ pep8

======================================================================
FAIL: matplotlib.tests.test_coding_standards.test_pep8_conformance_installed_files
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/matplotlib/matplotlib/venv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/tests/test_coding_standards.py", line 247, in test_pep8_conformance_installed_files
    expected_bad_files=expected_bad_files)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/tests/test_coding_standards.py", line 143, in assert_pep8_conformance
    assert_equal(result.total_errors, 0, msg)
AssertionError: 1 != 0 : Found code syntax errors (and warnings):
/home/travis/build/matplotlib/matplotlib/lib/matplotlib/backends/backend_webagg.py:346:80: E501 line too long (80 > 79 characters)
----------------------------------------------------------------------

This other wise looks good to me 👍 .

This change correctly cleans up after the ioloop is
interrupted, allowing the user to call plt.show(),
then resume control with a SIGINT, then call plt.show() again.
@perimosocordiae
Copy link
Contributor Author

Curses, off by one! Fixed and squashed.

@tacaswell
Copy link
Member

@jenshnielsen Can you review and merge this? Given that you found those issues from the nbagg refactor I take it use use webagg?

@mdboom
Copy link
Member

mdboom commented May 9, 2016

This looks good to me. The misunderstanding about SIGINT not actually stopping the server was mine originally, and in fairness I only cared about the whole process stopping and never really considered starting up another server. This looks good to me, but it wouldn't hurt to get @jenshnielsen's opinion as well.

@tacaswell tacaswell merged commit 1d81f97 into matplotlib:master May 10, 2016
@tacaswell
Copy link
Member

@perimosocordiae Thanks!

@perimosocordiae perimosocordiae deleted the restartable-webagg branch May 10, 2016 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants