Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IsTaskDirty: Ignore PullOptions for running tasks #2351

Merged
merged 1 commit into from
Aug 17, 2017

Conversation

jlhawn
Copy link
Contributor

@jlhawn jlhawn commented Aug 14, 2017

This patch causes the orchestrator to ignore the PullOptions field of a ContainerSpec when determining whether a task is considered to be 'dirty'. It is only ignored if and only if the current state of the task is either READY, STARTING, or RUNNING.

Related to #971

@stevvooe
Copy link
Contributor

Only do this at the point in the orchestrator where tasks are being compared. Doing this globally will break other update functionality, such as the dispatcher pushes and agent task manager updates (probably okay if that one is ignored).

@jlhawn
Copy link
Contributor Author

jlhawn commented Aug 14, 2017

@stevvooe I'm not familiar enough with this repo to know where the point in the orchestrator where tasks are being compared is. A quick search of the codebase for calls to equality.TasksEqualStable yields these 3 locations:

https://github.com/docker/swarmkit/blob/b5ca4a25b6351d86a89ae6fba57fa85e0bc0eb1d/agent/task.go#L189

https://github.com/docker/swarmkit/blob/b5ca4a25b6351d86a89ae6fba57fa85e0bc0eb1d/manager/dispatcher/assignments.go#L224

https://github.com/docker/swarmkit/blob/b5ca4a25b6351d86a89ae6fba57fa85e0bc0eb1d/manager/dispatcher/dispatcher.go#L782

Are any of these 3 the location you were thinking of?

@stevvooe
Copy link
Contributor

There are two ways:

  1. https://github.com/docker/swarmkit/blob/master/manager/controlapi/service.go#L728 would allow checking whether the update is actually a change and ignore it, right at the AP.

  2. https://github.com/docker/swarmkit/blob/master/manager/orchestrator/task.go#L62 would influence whether the orchestrator believes the task to be "dirty" and issue updates if that is the case. Check usages there to see if it makes sense.

Number 1 is kind of a hack. Number 2 seems to have some caveats but seems doable.

Let me know if that is enough to go on.

@codecov
Copy link

codecov bot commented Aug 14, 2017

Codecov Report

Merging #2351 into master will increase coverage by 0.07%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2351      +/-   ##
==========================================
+ Coverage   59.96%   60.04%   +0.07%     
==========================================
  Files         128      128              
  Lines       26183    26183              
==========================================
+ Hits        15701    15721      +20     
+ Misses       9092     9066      -26     
- Partials     1390     1396       +6

@jlhawn jlhawn changed the title Task Equality: Ignore container image pull options Orchestrator: IsTaskDirty: Ignore PullOptions Aug 14, 2017
@jlhawn
Copy link
Contributor Author

jlhawn commented Aug 14, 2017

@stevvooe I've updated this patch to only apply to the orchestrator.IsTaskDirty function (which is used by the updater).

@stevvooe
Copy link
Contributor

stevvooe commented Aug 14, 2017

LGTM

Make sure to test this one thoroughly.

@aaronlehmann
Copy link
Collaborator

I don't think this is a good idea. With this change, if you deployed a service with incorrect credentials, updating those credentials would not necessarily correct the problem.

It would happen to work with a restart policy of "always" or "on failure", because the service update would trigger a reconciliation, and the orchestrator would start new tasks with the updated spec. But if the restart policy is "never" or the service is running up against the restart limit, the orchestrator will check if anything has changed in the service spec to merit forcing a restart, and this hack will tell it that nothing has changed.

There are workarounds for this, such as making this hack apply only to tasks with DesiredState <= api.TaskStateRunning (that way, a change to credentials won't replace a running task, but will cause a dead task to be replaced). But I find this hack on top of a hack quite unpleasant. My preference would be to keep things simple and say that if you change anything in ContainerSpec, that will always kick off a rolling update. Ideally, we should avoid rotating credentials frequently, and/or set up rolling update parameters so that the rolling updates aren't disruptive.

But if this must be done, I suppose the extra hack I mentioned would work.

If you do move forward with this, I'd recommend writing a test that runs with various restart policies, starting with a 1-replica service and an up-to-date task with desired state "shutdown", updating the service to change PullOptions, and confirming that this change causes a new task to be created.

@andrewhsu
Copy link
Member

Is it possible to have a test case for this in this repo? If not for the test case described by @aaronlehmann at least something that can verify an expected change in behaviour with this patch that is desirable?

@jlhawn
Copy link
Contributor Author

jlhawn commented Aug 15, 2017

@andrewhsu I can throw together a unit test for the IsTaskDirty function.

@stevvooe
Copy link
Contributor

@aaronlehmann Thanks for providing input here!

This was definitely a concern I brought up. Looking at the various call sites for IsTaskDirty, it looks like there might be a spot where this can have the desired effect without creating the described impact. Is there a specific spot you had in mind?

@andrewhsu Yes, these situations should be testable in this repo.

@andrewhsu I can throw together a unit test for the IsTaskDirty function.

@jlhawn Based on @aaronlehmann's feedback, I am not quite sure that is enough. There are cases where we want the dirty task to propagate and cases where we don't. It seems like we just need to do this comparison only at certain callsites.

@jlhawn
Copy link
Contributor Author

jlhawn commented Aug 15, 2017

There are workarounds for this, such as making this hack apply only to tasks with DesiredState <= api.TaskStateRunning (that way, a change to credentials won't replace a running task, but will cause a dead task to be replaced).

Why would we want to look at the desired state of the task? Wouldn't we only want to ignore the PullOptions field if-and-only-if the current state of the task is api.TaskStateRunning?

@stevvooe
Copy link
Contributor

Why would we want to look at the desired state of the task? Wouldn't we only want to ignore the PullOptions field if-and-only-if the current state of the task is api.TaskStateRunning?

He is not suggesting that you look at the desired state. He is saying that the logic should be outside of IsTaskDirty because we don't want these changes to always be ignored for the dirty test.

@aaronlehmann
Copy link
Collaborator

aaronlehmann commented Aug 15, 2017 via email

@@ -67,7 +67,28 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
return false
}

return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
// Make shallow copies for the comparison.
specA, specB := s.Spec.Task, t.Spec
Copy link
Contributor

@anshulpundir anshulpundir Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would suggest renaming these to specService, specTask or something along those lines to make it more suggestive and readable. Same for other variable names below.

@jlhawn jlhawn changed the title Orchestrator: IsTaskDirty: Ignore PullOptions Updater.isTaskDirty: Ignore PullOptions Aug 16, 2017
@jlhawn jlhawn changed the title Updater.isTaskDirty: Ignore PullOptions IsTaskDirty: Ignore PullOptions for running tasks Aug 16, 2017
// handle updates.
// See https://github.com/docker/swarmkit/issues/971
currentState := t.Status.State
ignorePullOpts := api.TaskStateReady <= currentState && currentState <= api.TaskStateRunning
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add && t.DesiredState <= api.TaskStateRunning

This will make sure you're not ignoring pull options for a task that the orchestrator has decided shouldn't run anymore (and therefore, might need to be replaced if pull options are updated), for example if the node where it was running failed. t.Status.State is self-reported by the node, so when a node fails, this field can be stale for some time.

Generally the orchestrator only looks at DesiredState, except when detecting a task failure and setting DesiredState to Shutdown. This gives us a single source of truth and avoids making decisions based on out-of-date information. But I see what you're trying to do here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I discussed this logic with @aluzzardi and we settled on at least only ignoring pull options if we know for certain it wont affect the task at all, i.e., the worker already reported that it has the image it was supposed to use (so it's current status is either ready, starting, or running). Even if it is desired to be shutdown, at least we know that it didn't fail because it couldn't pull the image.

There's also logic later on in the updater which will create new tasks anyway if it's desired state is greater than running, even if it was not found to be "dirty": https://github.com/docker/swarmkit/blob/6716ddf5808932a56be3aa1af8510c306ed7145f/manager/orchestrator/update/updater.go#L323-L331

// handle updates.
// See https://github.com/docker/swarmkit/issues/971
currentState := t.Status.State
ignorePullOpts := api.TaskStateReady <= currentState && currentState <= api.TaskStateRunning && t.DesiredState <= api.TaskStateRunning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you change this to currentState >= api.TaskStateReady? I think it's easier to read

Copy link
Contributor Author

@jlhawn jlhawn Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. What I was originally going for here was to make it read more like Python where you can do things like:

api.TaskStateReady <= current_state <= api.TaskStateRunning

which just expands into:

api.TaskStateReady <= current_state and current_state <= api.TaskStateRunning

So when you look at it it makes it more clear that current state is between ready and running.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured that out a few moments later (the range expression), but at first my brain was having a hard time computing api.TaskStateReady <= currentState

@@ -67,7 +67,31 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
return false
}

return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
// Make shallow copy of the service for the comparison.
service := *s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be safer to deep copy (e.g. s.Copy()) before tweaking the object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, I didn't know that existed. I'll check it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We codegen deep copiers (.Copy()) to all the proto types

@aluzzardi
Copy link
Member

LGTM, couple of nits

This patch causes the orchestrator to ignore the PullOptions field of a
ContainerSpec when determining whether a task is considered to be 'dirty'.
It is only ignored if and only if the current state of the task is either
READY, STARTING, or RUNNING.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@aluzzardi aluzzardi merged commit 163a8c2 into moby:master Aug 17, 2017
@@ -67,7 +67,29 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
return false
}

return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
// Make a deep copy of the service and task spec for the comparison.
serviceTaskSpec := *s.Spec.Task.Copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, @jlhawn, shouldn't we not dereference the pointer here now that we're doing a deep copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this offline but posting here to follow-up: The call to DeepEqual() later on is comparing it to a non-pointer api.TaskSpec. I could have either dereferenced it here or there, but what's important is that it is still calling DeepEqual() with the same types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants