Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker service update with --detach=false hangs on services with 0 tasks #627

Closed
wsong opened this issue Oct 19, 2017 · 9 comments
Closed

Comments

@wsong
Copy link
Contributor

wsong commented Oct 19, 2017

Description

If you have a global service that cannot be scheduled anywhere (e.g. its placement constraints are incorrect), its task list will be empty. If you try to update that service with --detach=false, the CLI will hang while "waiting for new tasks."

Steps to reproduce the issue:

  1. docker service create --name testservice --mode global --constraint node.platform.os==notarealos busybox sleep 24h
  2. docker service ps testservice (note that the task list is empty)
  3. docker service update --detach=false --label-add foo=bar testservice

Describe the results you received:
The CLI hangs forever with overall progress: waiting for new tasks

Describe the results you expected:
The update would go through

The client I tested this on was 17.09.0-ce, but this is likely an issue on any CLI with the --detach=false option.

@dnephin
Copy link
Contributor

dnephin commented Oct 19, 2017

What happens when you run with --detach=true ? Does it update as you expect?

Since the update doesn't fix the constraint problem then the service is never fixed, right? So I think I would expect the CLI to just hang and wait for it to be fixed. We could maybe add a timeout.

@wsong
Copy link
Contributor Author

wsong commented Oct 19, 2017

With --detach=true, the update goes through immediately and everything works fine.

If the expectation is that docker service update only works on schedulable services, then this is working as expected, but that's quite unintuitive. You can imagine a case where a service has two invalid placement constraints (e.g. node.platform.os==exampleos and node.hostname=examplehost); its quite reasonable that a user might remove the first constraint, then want to check the service, then realize that they need to remove the second constraint, then issue a second service update call.

Additionally, this means that it's not really possible to update services that are supposed to run on nodes that are not yet part of the cluster (e.g. you deploy a Windows service, but you haven't added any Windows nodes yet) unless you remember to specify --detach=true.

@dnephin
Copy link
Contributor

dnephin commented Oct 19, 2017

Are you sure the update isn't applied? The entire purpose of --detach=false is to wait for the service tasks to stabalize, but the update should be applied first, before it starts waiting.

What happens if you inspect the service while the cli is stuck in "overall progress: waiting for new tasks" ? I would expect that the update has been applied, it's just the CLI is waiting for the tasks to be in the running state. I think this is the correct behaviour.

If you want to update services that you know will not be in a running state then use --detach=true.

@wsong
Copy link
Contributor Author

wsong commented Oct 19, 2017

Oh sorry, yes, the update does get applied. It's just that the CLI hangs and for a user it's not clear why.

@dnephin
Copy link
Contributor

dnephin commented Oct 19, 2017

That seems like the right behaviour. How do you think this can be improved?

Could we add more details to the message "overall progress: waiting for new tasks" to make it clear?

Maybe we could add a timeout and print a more helpful message when the timeout is hit?

@wsong
Copy link
Contributor Author

wsong commented Oct 19, 2017

Well, if a service has zero tasks before the update is applied, should we really wait for the tasks to get updated? Maybe in that case the service update should just return right away, even if --detach=false is specified.

I don't know what the original intention of that flag was, but the way I interpreted it was "block until the update has gone through on all existing tasks." What is the use case for blocking a service update forever on a service with zero tasks?

@dnephin
Copy link
Contributor

dnephin commented Oct 19, 2017

if a service has zero tasks before the update is applied, should we really wait for the tasks to get updated?

Yes. For example when going from 0 to 1+ replicas I would expect there to be no tasks before the update, but I would still want to wait on the new tasks to be running.

on all existing tasks

It's not just existing tasks, because the update could change the number of tasks.

What is the use case for blocking a service update forever on a service with zero tasks?

Expecting a service to have 0 tasks doesn't really seem like a common use case. There's already a way of handling this rare case, which is to use --detach=true. Since there are no tasks, there is nothing to wait on. The update is already "done" when service update --detach=true returns.

@wsong
Copy link
Contributor Author

wsong commented Oct 19, 2017

If this is working as intended, then that's fine; my point is just that this is somewhat surprising for global services. Replicated services always have at least one Pending task, even if they are not schedulable. If global services are not schedulable, however, then their task list is empty. This means that a docker service update --detach=false call might hang forever or it might work, depending on the current state of your node list. That's fairly unintuitive for users (or at least, it was for me).

@dnephin
Copy link
Contributor

dnephin commented Nov 2, 2017

Opened #665 to add a timeout. Closing this issue as I believe it's working as intended, and #663 is an issue for adding a timeout.

@dnephin dnephin closed this as completed Nov 2, 2017
djmaze pushed a commit to containrrr/shepherd that referenced this issue Jul 6, 2023
#104)

* fix: docker service update with `--detach=false` hangs on services with 0 tasks

read more here: docker/cli#627

* refactor: use $(...) notation instead of legacy backticks `...`

* refactor: `tasks_num` variable renamed to `num_tasks`

* refactor: use `-f` for filtering service by name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants