
distributed.comm.tcp - WARNING - Closing dangling stream in <TCP local=tcp://127.0.0.1:52230 remote=tcp://127.0.0.1:42125> #2573

@muammar

I am getting this problem when using the distributed scheduler to compute 16,000,000 dot products. The dashboard becomes unresponsive for several minutes, and the computations do not start immediately. When they do start, I do not see all the resources of my workstation being used. I wonder why that would be.

I also see warnings and errors like the following:

distributed.comm.tcp - WARNING - Closing dangling stream in <TCP local=tcp://127.0.0.1:52318 remote=tcp://127.0.0.1:42125>


tornado.application - ERROR - Exception in callback <bound method BokehTornado._keep_alive of <bokeh.server.tornado.BokehTornado object at 0x7f96fc9055c0>>
Traceback (most recent call last):
  File "/home/muammar/.local/lib/python3.7/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/home/muammar/.local/lib/python3.7/site-packages/bokeh/server/tornado.py", line 542, in _keep_alive
    c.send_ping()
  File "/home/muammar/.local/lib/python3.7/site-packages/bokeh/server/connection.py", line 80, in send_ping
    self._socket.ping(codecs.encode(str(self._ping_count), "utf-8"))
  File "/home/muammar/.local/lib/python3.7/site-packages/tornado/websocket.py", line 447, in ping
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError

I suspect, though I might be wrong, that these socket errors occur because the dashboard is waiting while the 16,000,000 new delayed computations are populated into the array that I pass to the scheduler, so for a couple of minutes it has no data to show. Could that be the cause?
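To put that suspicion in numbers: the dask.delayed best-practices page notes that every delayed task carries an overhead of a few hundred microseconds. The sketch below is only a back-of-envelope estimate. It assumes a flat 300 µs per task (an illustrative figure, not a measurement from this workstation) and compares one task per dot product against a hypothetical batch size of 10,000.

```python
# Back-of-envelope estimate of scheduling overhead.
# ASSUMPTION: a flat per-task overhead of 300 microseconds, chosen to match
# the "few hundred microseconds" described in the dask.delayed best practices.

N_TASKS = 16_000_000          # one dask.delayed call per dot product
OVERHEAD_PER_TASK_S = 300e-6  # assumed overhead per task, in seconds
BATCH_SIZE = 10_000           # hypothetical number of dot products per task


def overhead_seconds(n_tasks: int, per_task_s: float) -> float:
    """Total overhead if every task pays the same fixed cost."""
    return n_tasks * per_task_s


total = overhead_seconds(N_TASKS, OVERHEAD_PER_TASK_S)
print(f"{N_TASKS:,} tasks -> ~{total:,.0f} s (~{total / 60:.0f} min) of overhead alone")

# Grouping the same work into batches shrinks the task count, and therefore
# the overhead, by the batch size.
n_batched = N_TASKS // BATCH_SIZE
batched_total = overhead_seconds(n_batched, OVERHEAD_PER_TASK_S)
print(f"{n_batched:,} batched tasks -> ~{batched_total:.1f} s of overhead")
```

The estimate ignores the memory each task occupies in the graph, which grows with the task count in the same way.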

When this happens, I am unable to stop the affected script with Ctrl-C, and my computer may even freeze (this happened just once). I posted more information here.

Any idea on what is going on and how to solve it?

Edit 1: My workstation has 64 GB of RAM, and all of it is consumed. I will go through the dask.delayed best practices, do the computations in batches, and see whether that gets rid of the warnings above.
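Here is a minimal sketch of the batching idea, written in plain Python so it runs without a cluster. The names `dot`, `batched`, and `dot_batch`, the batch size, and the toy data are all hypothetical. `dot` stands in for whatever dot product the original script computes. Each batch is the unit of work that would become one task.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]


def dot(a: Vector, b: Vector) -> float:
    """Plain-Python dot product (hypothetical stand-in for the real one)."""
    return sum(x * y for x, y in zip(a, b))


def batched(pairs: Iterable[Pair], size: int) -> Iterator[List[Pair]]:
    """Yield consecutive lists of at most `size` pairs."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dot_batch(chunk: List[Pair]) -> List[float]:
    """Compute every dot product in one batch; one task's worth of work."""
    return [dot(a, b) for a, b in chunk]


# Hypothetical toy data: 10 pairs of 3-element vectors.
pairs = [([i, i + 1, i + 2], [1.0, 0.0, 1.0]) for i in range(10)]

# Serial stand-in for the scheduler: one call per batch instead of per pair.
results = [r for chunk in batched(pairs, size=4) for r in dot_batch(chunk)]
print(results)
```

With dask, one would wrap the batch function rather than each dot product, for example `tasks = [dask.delayed(dot_batch)(chunk) for chunk in batched(pairs, 10_000)]` followed by `dask.compute(*tasks)`. The scheduler then sees thousands of tasks instead of millions. The best batch size depends on the workload and would need tuning.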

Edit 2: After doing the computations in batches, I can get them done, although the matrix I build does not fit in memory, which is fine.
