Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rolling update causes "out of sequence" errors for the client. #1379

Open
tonistiigi opened this issue Aug 16, 2016 · 13 comments
Open

Rolling update causes "out of sequence" errors for the client. #1379

tonistiigi opened this issue Aug 16, 2016 · 13 comments

Comments

@tonistiigi
Copy link
Member

Rolling update internally changes properties for the service, causing the version to increment. This means that user can randomly get "out of sequence" errors. The solution should be to have a separate version for the spec part only so that it only covers all the properties of the service that a user can change.

This issue is causing TestSwarmPublishAdd to fail in Docker CI.

cc @aaronlehmann

@aaronlehmann
Copy link
Collaborator

ping @aluzzardi

@stevvooe
Copy link
Contributor

This is a feature. Clients that use zk or etcd have to deal with this, as well.

This allows the user to ensure that decisions are made on up to date data.

We can probably add retries if we don't like this behavior.

@tonistiigi
Copy link
Member Author

@stevvooe "up to date data" is an arbitrary concept. Some users may want to make a decision based on the number of tasks running, the network used by service changing etc. I don't see why we should pick some random ones (UpdateStatus in this case) and provide the user a way to have a dependency on that field, while these fields are no way even visible to our client during update.

The versioning is very useful for avoiding collisions between user requests and making sure that the update matches user's intention. I'm just suggesting that we would limit that only to the fields that user can actually control.

@stevvooe
Copy link
Contributor

@tonistiigi Anything that is observable by the user is a possible change in the data dependency, meaning that the user may have made a decision on it.

Breaking this guarantee will have untold effects on the system. If we want there to be different versioning semantics, we need to have separate objects. That is the data model. When we break the data model, we break the abstraction and make the system harder to reason about.

If we were building a consistent key value store, this wouldn't even be a question. It is wrong to apply updates to an object that are made with out of date data and usually APIs require something to get around those style of guarantees.

If we bypass this, we may as well get rid of raft, as it is a waste.

@aaronlehmann
Copy link
Collaborator

@stevvooe: I really don't understand your perspective. The user can only change the spec. Why would changes to the spec be informed by real-time progress of a rolling update? How would that even be possible with the current CLI design, since there's no check/set primitive? (docker service update doesn't give you a way to only update if the rolling update is in a certain state, and I don't see us adding that).

You're saying it's okay for updates to separate objects to have different versioning semantics, but why do you draw the line there instead of drawing it between the spec and the observed quantities? The ServiceUpdate control API only takes a service spec, not a whole service. It feels strange to me that the versioning for the broader service object comes into play for this.

@stevvooe
Copy link
Contributor

We basically already have a check/set primitive with versioning.

The user modified fields are in the Spec and the result of that is in the surrounding object, which is versioned as a unit. This is the data model. We don't version sub-components of an object. Introducing this to services will just create an inconsistency in the API that we'll never return from.

And let's not think about the "current design". Let's actually think about the future and avoid introducing more and more micro-complexities. Just because we're not doing something today, doesn't mean we should sacrifice the properties of our data model. This solution just introduces more technical debt that others have to code around in the future.

If we want to version a separate component, let's make a separate component. I don't see what is so hard about that.

@sylvainmouquet
Copy link

The error occures today with docker version 17.07.0-ce

$ docker version

Client:
 Version:      17.07.0-ce
 API version:  1.31
 Go version:   go1.8.3
 Git commit:   8784753
 Built:        Tue Aug 29 17:42:53 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.07.0-ce
 API version:  1.31 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   8784753
 Built:        Tue Aug 29 17:41:43 2017
 OS/Arch:      linux/amd64
 Experimental: false

@vce-xx
Copy link

vce-xx commented Apr 1, 2018

@stevvooe What do you mean ? Shouldn't this be fixed ?

This is a feature.

@bbotte
Copy link

bbotte commented Mar 11, 2020

I use swarm, docker Server Version: 19.03.6 , when update service
docker service update --with-registry-auth service_name --image images_http_address
tips: Error response from daemon: rpc error: code = Unknown desc = update out of sequence

will repair?

@matodrobec
Copy link

Hello,

I use swarm and Docker version 19.03.8. When I update stack

docker stack deploy --with-registry-auth -c ./web.yml api-dev

I am getting:

 Updating service api-dev_webserver (id: zz5veiexnvsgleroomr2snecy)
 Updating service api-dev_data (id: v4hoc42ajz5jv17rsf7uwlcgj)
 failed to update service api-dev_data: Error response from daemon: rpc error: code = Unknown desc = update out of sequence

@mhemrg
Copy link

mhemrg commented May 8, 2020

I'm using Docker Swarm's API to update services and sometimes I'm facing this issue. Is it safe to retry the operation with a new index key when I get an update out of sequence error?

@trajano
Copy link

trajano commented May 11, 2020

@trajano
Copy link

trajano commented May 11, 2020

I think this sort of synchronization issues would've been solved by traditional database systems or even Kafka. I wonder if a traditional database or Kafka for scale can be used for the state backend. That would at least defer the responsibility out of Swarm Kit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants