Provide better UX for Clients when idle-timeout is reached #4617

@TomAugspurger

Currently, users who've connected to a scheduler that then times out don't get much information.

In [1]: from distributed import Client

In [2]: client = Client("172.25.136.49:8786")  # started with `dask-scheduler --idle-timeout=10s`

In [3]: client
Out[3]: <Client: 'tcp://172.25.136.49:8786' processes=0 threads=0, memory=0 B>
# scheduler times out here <-----------------------------------
In [4]: distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
asyncio.exceptions.CancelledError
In [4]:

For reference, the traceback when using dask-gateway is a bit scarier:

distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
asyncio.exceptions.CancelledError
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1391, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1124)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events
    handler_func(fileobj, events)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 696, in _handle_events
    self._handle_read()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1478, in _handle_read
    self._do_ssl_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1409, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 611, in close
    self._signal_closed()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 641, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1391, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1124)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events
    handler_func(fileobj, events)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 696, in _handle_events
    self._handle_read()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1478, in _handle_read
    self._do_ssl_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1409, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 611, in close
    self._signal_closed()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 641, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1391, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1124)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events
    handler_func(fileobj, events)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 696, in _handle_events
    self._handle_read()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1478, in _handle_read
    self._do_ssl_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1409, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 611, in close
    self._signal_closed()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 641, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1391, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1124)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events
    handler_func(fileobj, events)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 696, in _handle_events
    self._handle_read()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1478, in _handle_read
    self._do_ssl_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1400, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 611, in close
    self._signal_closed()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 641, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1391, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1124)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events
    handler_func(fileobj, events)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 696, in _handle_events
    self._handle_read()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1478, in _handle_read
    self._do_ssl_handshake()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 1400, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 611, in close
    self._signal_closed()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/iostream.py", line 641, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError

I wonder how we can make this UX a bit nicer. It's a bit difficult since the scheduler is the one going away, but the client is where the user is looking. Tossing some ideas out:

  1. The scheduler notifies the client that it's timing out, and the client's repr is updated to include the fact that the scheduler has timed out.
  2. The scheduler notifies the client that it's timing out, and whatever mechanism is printing that _GatheringFuture exception is used to immediately print that the scheduler has timed out.
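
To make the two ideas concrete, here is a rough sketch using toy classes rather than the real `distributed` scheduler and client. Everything in it (`ToyScheduler`, `ToyClient`, `handle_scheduler_closing`, `close_reason`) is hypothetical and only illustrates the shape of the notification; it is not how `distributed` is implemented today.

```python
import asyncio


class ToyClient:
    """Hypothetical stand-in for a client; not distributed.Client."""

    def __init__(self, address):
        self.address = address
        self.status = "running"
        self.close_reason = None

    def handle_scheduler_closing(self, reason):
        # Idea 2: tell the user right away why the scheduler went away.
        self.status = "closed"
        self.close_reason = reason
        print(f"Scheduler at {self.address} shut down: {reason}")

    def __repr__(self):
        # Idea 1: surface the shutdown reason in the repr.
        if self.close_reason is not None:
            return (
                f"<ToyClient: {self.address!r} status={self.status} "
                f"reason={self.close_reason!r}>"
            )
        return f"<ToyClient: {self.address!r} status={self.status}>"


class ToyScheduler:
    """Hypothetical stand-in for a scheduler with an idle timeout."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds
        self.clients = []

    async def run(self):
        # Pretend nothing happens until the idle timeout elapses,
        # then notify every connected client before going away.
        await asyncio.sleep(self.idle_timeout)
        reason = f"idle-timeout of {self.idle_timeout}s reached"
        for client in self.clients:
            client.handle_scheduler_closing(reason)


if __name__ == "__main__":
    client = ToyClient("tcp://172.25.136.49:8786")
    scheduler = ToyScheduler(idle_timeout=0.01)
    scheduler.clients.append(client)
    asyncio.run(scheduler.run())
    print(repr(client))
```

The key point is that the scheduler sends the reason before it disappears, so the client does not have to infer it from a failed reconnect.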

cc @jacobtomlinson if you have thoughts on this.
