I am getting this problem when using the distributed scheduler to compute 16,000,000 dot products. It makes the dashboard unresponsive for several minutes, and the computations do not start immediately. When they do start, I do not see all of my workstation's resources being used, and I wonder why that is.
I also see errors like the following:
distributed.comm.tcp - WARNING - Closing dangling stream in <TCP local=tcp://127.0.0.1:52318 remote=tcp://127.0.0.1:42125>
tornado.application - ERROR - Exception in callback <bound method BokehTornado._keep_alive of
<bokeh.server.tornado.BokehTornado object at 0x7f96fc9055c0>>
Traceback (most recent call last):
File "/home/muammar/.local/lib/python3.7/site-packages/tornado/ioloop.py", line 907, in _run
return self.callback()
File "/home/muammar/.local/lib/python3.7/site-packages/bokeh/server/tornado.py", line 542, in _keep_alive
c.send_ping()
File "/home/muammar/.local/lib/python3.7/site-packages/bokeh/server/connection.py", line 80, in send_ping
self._socket.ping(codecs.encode(str(self._ping_count), "utf-8"))
File "/home/muammar/.local/lib/python3.7/site-packages/tornado/websocket.py", line 447, in ping
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
I suspect, but I might be wrong, that those socket errors occur because the dashboard is waiting for the 16,000,000 new delayed computations to be populated into the array that I pass to the scheduler, so for a couple of minutes it has no data to show. Could that be the cause?
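For scale, here is a minimal sketch of the kind of pattern I am using (names and sizes are illustrative stand-ins, not my actual code): one `dask.delayed` object per dot product, which means the scheduler has to ingest millions of tiny tasks before anything runs.

```python
import numpy as np
import dask

# Stand-in data: in the real case there are on the order of 4000
# vectors, giving 4000 * 4000 = 16,000,000 pairwise dot products.
vecs = [np.random.rand(8) for _ in range(20)]

# One delayed task per dot product -- this is the pattern that
# floods the scheduler with millions of tiny tasks.
tasks = [dask.delayed(np.dot)(a, b) for a in vecs for b in vecs]
results = dask.compute(*tasks)
print(len(results))  # 400 here; 16,000,000 in the real run
```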
When this happens I am unable to Ctrl-C the script that runs into the problem, and my computer may even freeze (that has happened once). I posted more information here.
Any idea what is going on and how to solve it?
Edit 1: My workstation has 64 GB of RAM, and it is consumed completely. I will go through the dask.delayed best practices and do this in batches to see if that gets rid of the warnings above.
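By batching I mean something along these lines (a hedged sketch with illustrative helper names and toy sizes; the real batches would be much larger): each delayed task computes a whole block of dot products, so the scheduler sees far fewer, coarser tasks.

```python
import numpy as np
import dask

vecs = [np.random.rand(8) for _ in range(20)]
batch_size = 5

@dask.delayed
def dot_batch(left_batch, all_vecs):
    # One task computes a whole block of dot products instead of one.
    return [np.dot(a, b) for a in left_batch for b in all_vecs]

batches = [vecs[i:i + batch_size] for i in range(0, len(vecs), batch_size)]
tasks = [dot_batch(b, vecs) for b in batches]  # 4 tasks instead of 400
results = [r for chunk in dask.compute(*tasks) for r in chunk]
print(len(results))  # still 400 dot products overall
```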
Edit 2: After doing the computations in batches, I can get them done. The matrix I build does not fit in memory, but that is fine.