Scheduler and Client throw error on connect #3412
I missed it before, but before throwing an error the scheduler logs:
I have been thinking about how I can modify my setup to narrow down what is wrong, and either solve this or provide more information here.
Would you be able to supply some more information about how you are starting your cluster up? Ideally it would be good to get an MRE.
I'm having the same problem. Is there anything to do?
File a new issue that contains a simple reproducer along with details about the environment it was run in.
Also running into the same issue, although using the latest Docker images and using …
Hi @jakirkham. For my case at least, I really don't know what the cause is or how I can reproduce this issue, as it happens randomly. I'm just using dask-distributed to spin up a Dask cluster on Kubernetes. Nothing fancy around it; it's really hard to debug or reproduce intentionally.
Maintainers (like myself) generally have lots of asks from many directions, which means we have a limited amount of time to spend per issue. As a result we (maintainers) really depend on users (like yourselves) to articulate clearly what problem you are running into, with a reproducer. If you can't reproduce it, we won't be able to reproduce it. If we can't reproduce it, we won't be able to help you debug it (let alone come up with a fix, or a test to confirm it doesn't get broken again).

I get it, this is probably not what you want to hear, and I have been on the other side of these problems (struggling to find my own reproducers and spending significant amounts of time constructing them). Unfortunately this is just the reality of things, and this division of work tends to result in better outcomes (fixed issues for end users).

Separately, the issue raised by this OP is nearly 2 years old and had been largely dormant until a couple of people (yourself included) saw something that looks like it. This tells me there is a very good chance that what you are seeing is entirely unrelated. The fact that Kubernetes and Cloudprovider are coming into the mix suggests it could even be a downstream issue, or at least a downstream change that unmasked an upstream issue.

In any event, to avoid confusion and to aid maintainers in helping you, I would suggest filing new issues with as much information as you can provide. Hopefully this all makes sense, and I greatly appreciate your help here 🙂
I am trying to connect to a cluster and I haven't had a chance to change any configuration yet, so I think everything is set to its defaults. To start the scheduler I use the CLI interface without any arguments:
dask-scheduler
and to create the client I use a Jupyter notebook that just creates the client. I'm pretty confident that the error I am seeing has nothing to do with the URL to the scheduler that I provided, because my client manages to crash the scheduler 😆.
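The original notebook snippet did not survive the copy, so here is a minimal sketch of the kind of client-creation code being described. It stands up an in-process `LocalCluster` as a self-contained stand-in for the remote `dask-scheduler` container, whose actual address is not known from this thread:

```python
from distributed import Client, LocalCluster

# In the reported setup this would instead be something like
# Client("tcp://<scheduler-host>:8786"), pointing at the container running
# dask-scheduler (8786 is the default scheduler port). The hostname is an
# assumption; an in-process cluster keeps this sketch runnable on its own.
cluster = LocalCluster(n_workers=1, processes=False)
client = Client(cluster)

# Sanity check that the client can actually talk to the scheduler.
result = client.submit(lambda x: x + 1, 41).result()

client.close()
cluster.close()
```

If the connect itself hangs or fails, this is roughly the point at which the client-side traceback below would appear.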
In both of my containers I pin the version of dask to 2.9.2, and they are built at the same time, so I am also pretty sure this issue has little to do with the client and scheduler having different versions.
One thing worth mentioning is that I have both the client and the scheduler running in two Docker containers, and those containers run fine when using
docker-compose
The problem starts when I run these containers inside Amazon's ECS. At this point all I can think of is that Tornado needs to have a timeout setting configured?
A blank assertion error is not really helping to pinpoint the problem.
Looking at the trace, it looks like no timeout has been provided as a default.
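One knob that is relevant to connect failures like this is the `distributed.comm.timeouts.connect` configuration value, which controls how long a client keeps retrying before giving up on the scheduler handshake. A sketch of raising it via `dask.config` (the `60s` value is an arbitrary illustration, not a recommendation):

```python
import dask

# Raise the comm connect timeout from its default. The "60s" value is an
# assumption chosen for illustration; it could also be set via the
# DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT environment variable or a
# dask config YAML file inside the container.
dask.config.set({"distributed.comm.timeouts.connect": "60s"})

print(dask.config.get("distributed.comm.timeouts.connect"))
```

Whether a longer timeout actually helps here depends on whether the connection is merely slow (e.g. ECS networking latency) or outright broken.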
Client error:
Scheduler error: