IsTaskDirty: Ignore PullOptions for running tasks #2351
Conversation
Only do this at the point in the orchestrator where tasks are being compared. Doing this globally will break other update functionality, such as the dispatcher pushes and agent task manager updates (probably okay if that one is ignored).
@stevvooe I'm not familiar enough with this repo to know where the point in the orchestrator where tasks are being compared is. A quick search of the codebase for calls to https://github.com/docker/swarmkit/blob/b5ca4a25b6351d86a89ae6fba57fa85e0bc0eb1d/agent/task.go#L189 turned up 3 call sites. Are any of these 3 the location you were thinking of?
There are two ways:
Number 1 is kind of a hack. Number 2 seems to have some caveats but seems doable. Let me know if that is enough to go on.
Codecov Report
@@ Coverage Diff @@
## master #2351 +/- ##
==========================================
+ Coverage 59.96% 60.04% +0.07%
==========================================
Files 128 128
Lines 26183 26183
==========================================
+ Hits 15701 15721 +20
+ Misses 9092 9066 -26
- Partials 1390 1396 +6
@stevvooe I've updated this patch to only apply to the `IsTaskDirty` check.
LGTM. Make sure to test this one thoroughly.
I don't think this is a good idea. With this change, if you deployed a service with incorrect credentials, updating those credentials would not necessarily correct the problem. It would happen to work with a restart policy of "always" or "on failure", because the service update would trigger a reconciliation, and the orchestrator would start new tasks with the updated spec. But if the restart policy is "never", or the service is running up against the restart limit, the orchestrator will check whether anything has changed in the service spec to merit forcing a restart, and this hack will tell it that nothing has changed.

There are workarounds for this, such as making this hack apply only to tasks with `DesiredState <= api.TaskStateRunning` (that way, a change to credentials won't replace a running task, but will cause a dead task to be replaced).

But if this must be done, I suppose the extra hack I mentioned would work. If you do move forward with this, I'd recommend writing a test that runs with various restart policies, starting with a 1-replica service and an up-to-date task with desired state "shutdown", updating the service to change `PullOptions`, and verifying whether the task gets replaced.
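For concreteness, here is a minimal sketch of the guard @aaronlehmann describes, assuming it lives next to the orchestrator's dirty check (the helper name `shouldIgnorePullOptions` is hypothetical, not from the patch):

```go
package orchestrator

import "github.com/docker/swarmkit/api"

// shouldIgnorePullOptions sketches the suggested guard: only ignore a
// PullOptions change for tasks the orchestrator still wants running. A task
// whose desired state has moved past RUNNING (e.g. SHUTDOWN after a failure)
// may need to be replaced, so an updated PullOptions must count as a real
// change for it.
func shouldIgnorePullOptions(t *api.Task) bool {
	return t.DesiredState <= api.TaskStateRunning
}
```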
Is it possible to have a test case for this in this repo? If not the test case described by @aaronlehmann, then at least something that can verify an expected, desirable change in behaviour with this patch?
@andrewhsu I can throw together a unit test for the `IsTaskDirty` function.
@aaronlehmann Thanks for providing input here! This was definitely a concern I brought up. I'm looking at the various call sites for `IsTaskDirty` now.

@andrewhsu Yes, these situations should be testable in this repo.
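A rough sketch of what such a unit test could look like, assuming the guarded comparison discussed above (the scenario and field values are illustrative, not the test that actually landed):

```go
package orchestrator

import (
	"testing"

	"github.com/docker/swarmkit/api"
)

// TestIsTaskDirtyIgnoresPullOptions sketches the scenarios discussed above:
// a PullOptions-only change should not dirty a task that already has the
// image, but should dirty one the orchestrator has decided to shut down.
func TestIsTaskDirtyIgnoresPullOptions(t *testing.T) {
	service := &api.Service{
		Spec: api.ServiceSpec{
			Task: api.TaskSpec{
				Runtime: &api.TaskSpec_Container{
					Container: &api.ContainerSpec{
						Image: "example:latest",
						PullOptions: &api.ContainerSpec_PullOptions{
							RegistryAuth: "new-credentials",
						},
					},
				},
			},
		},
	}

	task := &api.Task{
		Spec:         *service.Spec.Task.Copy(),
		DesiredState: api.TaskStateRunning,
		Status:       api.TaskStatus{State: api.TaskStateRunning},
	}
	// The existing task was created with the old pull options.
	task.Spec.GetContainer().PullOptions = &api.ContainerSpec_PullOptions{
		RegistryAuth: "old-credentials",
	}

	// A running task that already has the image should not be dirtied by
	// a PullOptions-only change.
	if IsTaskDirty(service, task) {
		t.Error("expected running task to stay clean on a PullOptions-only change")
	}

	// A task the orchestrator has decided to shut down may need to be
	// replaced, so the same change should dirty it.
	task.DesiredState = api.TaskStateShutdown
	if !IsTaskDirty(service, task) {
		t.Error("expected shutdown-desired task to be dirty on a PullOptions-only change")
	}
}
```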
@jlhawn Based on @aaronlehmann's feedback, I am not quite sure that is enough. There are cases where we want the dirty task to propagate and cases where we don't. It seems like we just need to do this comparison only at certain call sites.
Why would we want to look at the desired state of the task? Wouldn't we only want to ignore the `PullOptions` field if and only if the current state of the task is `api.TaskStateRunning`?
He is not suggesting that you look at the desired state. He is saying that the logic should be outside of `IsTaskDirty`.
> There are workarounds for this, such as making this hack apply only to tasks with `DesiredState <= api.TaskStateRunning` (that way, a change to credentials won't replace a running task, but will cause a dead task to be replaced).

> Why would we want to look at the desired state of the task? Wouldn't we only want to ignore the `PullOptions` field if and only if the current state of the task is `api.TaskStateRunning`?

When a task fails, its desired state gets changed to shutdown. Desired state is more general and also covers cases like a failed node that hasn't reported status recently.
manager/orchestrator/task.go
Outdated
@@ -67,7 +67,28 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
 		return false
 	}
 
-	return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
+	// Make shallow copies for the comparison.
+	specA, specB := s.Spec.Task, t.Spec
nit: I would suggest renaming these to `specService`, `specTask` or something along those lines to make it more suggestive and readable. Same for other variable names below.
manager/orchestrator/task.go
Outdated
+	// handle updates.
+	// See https://github.com/docker/swarmkit/issues/971
+	currentState := t.Status.State
+	ignorePullOpts := api.TaskStateReady <= currentState && currentState <= api.TaskStateRunning
I'd add `&& t.DesiredState <= api.TaskStateRunning`. This will make sure you're not ignoring pull options for a task that the orchestrator has decided shouldn't run anymore (and therefore might need to be replaced if pull options are updated), for example if the node where it was running failed. `t.Status.State` is self-reported by the node, so when a node fails, this field can be stale for some time.

Generally the orchestrator only looks at `DesiredState`, except when detecting a task failure and setting `DesiredState` to `Shutdown`. This gives us a single source of truth and avoids making decisions based on out-of-date information. But I see what you're trying to do here.
Sounds good. I discussed this logic with @aluzzardi and we settled on only ignoring pull options when we know for certain it won't affect the task at all, i.e., the worker already reported that it has the image it was supposed to use (so its current status is either ready, starting, or running). Even if it is desired to be shutdown, at least we know that it didn't fail because it couldn't pull the image.
There's also logic later on in the updater which will create new tasks anyway if the task's desired state is greater than running, even if it was not found to be "dirty": https://github.com/docker/swarmkit/blob/6716ddf5808932a56be3aa1af8510c306ed7145f/manager/orchestrator/update/updater.go#L323-L331
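A loose paraphrase of the updater check jlhawn is pointing at (see the linked `updater.go` for the real logic; the helper name here is hypothetical):

```go
package update

import "github.com/docker/swarmkit/api"

// needsReplacement loosely paraphrases the linked updater logic: during an
// update, a slot's task is replaced either because its spec no longer
// matches the service, or because its desired state has already moved past
// RUNNING -- meaning the task is dead regardless of how the spec comparison
// came out.
func needsReplacement(dirty bool, t *api.Task) bool {
	return dirty || t.DesiredState > api.TaskStateRunning
}
```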
manager/orchestrator/task.go
Outdated
+	// handle updates.
+	// See https://github.com/docker/swarmkit/issues/971
+	currentState := t.Status.State
+	ignorePullOpts := api.TaskStateReady <= currentState && currentState <= api.TaskStateRunning && t.DesiredState <= api.TaskStateRunning
nit: Can you change this to `currentState >= api.TaskStateReady`? I think it's easier to read.
Okay. What I was originally going for here was to make it read more like Python, where you can do things like:

`api.TaskStateReady <= current_state <= api.TaskStateRunning`

which just expands into:

`api.TaskStateReady <= current_state and current_state <= api.TaskStateRunning`

So when you look at it, it makes it more clear that the current state is between ready and running.
Yeah, I figured that out a few moments later (the range expression), but at first my brain was having a hard time computing `api.TaskStateReady <= currentState`.
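For readers unfamiliar with the state machine: these range comparisons work because `TaskState` is an ordered enum. A sketch of the idea, with a hypothetical helper name (the exact enum values live in `api/types.proto`):

```go
package orchestrator

import "github.com/docker/swarmkit/api"

// taskHasPulledImage sketches what the READY..RUNNING range check in the
// diff expresses. It relies on TaskState being ordered by lifecycle stage,
// roughly:
//
//	NEW < PENDING < ASSIGNED < ACCEPTED < PREPARING <
//	READY < STARTING < RUNNING < COMPLETE < SHUTDOWN < FAILED < REJECTED
//
// A state at or past READY means the worker has already prepared (i.e.
// pulled) the image; at or before RUNNING means the task has not yet
// terminated.
func taskHasPulledImage(t *api.Task) bool {
	s := t.Status.State
	return api.TaskStateReady <= s && s <= api.TaskStateRunning
}
```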
manager/orchestrator/task.go
Outdated
@@ -67,7 +67,31 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
 		return false
 	}
 
-	return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
+	// Make shallow copy of the service for the comparison.
+	service := *s
Might be safer to deep copy (e.g. `s.Copy()`) before tweaking the object.
ooh, I didn't know that existed. I'll check it out.
We codegen deep copiers (`.Copy()`) onto all the proto types.
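To illustrate why the deep copy matters here: `TaskSpec` holds pointers, so a plain struct copy still aliases the nested `ContainerSpec`. A sketch (the helper name is hypothetical, and it assumes a container-runtime task):

```go
package orchestrator

import "github.com/docker/swarmkit/api"

// copyPitfall is illustrative only: a shallow struct copy of a TaskSpec
// still shares the nested ContainerSpec pointer, so tweaking PullOptions
// through it would corrupt the original. The codegen'd Copy() deep copier
// avoids that.
func copyPitfall(s *api.Service) {
	shallow := s.Spec.Task                   // copies the struct, shares pointers
	shallow.GetContainer().PullOptions = nil // also mutates s.Spec.Task!

	deep := s.Spec.Task.Copy()            // deep copy of the whole spec tree
	deep.GetContainer().PullOptions = nil // safe: the original is untouched
}
```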
LGTM, couple of nits.
This patch causes the orchestrator to ignore the `PullOptions` field of a `ContainerSpec` when determining whether a task is considered to be 'dirty'. The field is ignored if and only if the current state of the task is either READY, STARTING, or RUNNING.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@@ -67,7 +67,29 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
 		return false
 	}
 
-	return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
+	// Make a deep copy of the service and task spec for the comparison.
+	serviceTaskSpec := *s.Spec.Task.Copy()
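Assembling the fragments above, the comparison this PR converges on looks roughly like the sketch below. This is not the verbatim merged code: the spec-version short-circuit that precedes it in the real function is elided, and the trailing endpoint comparison is an assumption about what the truncated `|| ...` in the diff covered.

```go
package orchestrator

import (
	"reflect"

	"github.com/docker/swarmkit/api"
)

// IsTaskDirty, roughly as this PR converges on it (sketch, not verbatim).
func IsTaskDirty(s *api.Service, t *api.Task) bool {
	// Deep-copy both specs so the PullOptions tweak below cannot leak
	// into the real service or task objects.
	serviceTaskSpec := *s.Spec.Task.Copy()
	taskSpec := *t.Spec.Copy()

	// Ignore PullOptions only when they provably cannot matter: the worker
	// already has the image (current state in READY..RUNNING) and the
	// orchestrator still wants the task running (desired state <= RUNNING).
	// See https://github.com/docker/swarmkit/issues/971
	currentState := t.Status.State
	ignorePullOpts := t.DesiredState <= api.TaskStateRunning &&
		currentState >= api.TaskStateReady &&
		currentState <= api.TaskStateRunning

	if ignorePullOpts && serviceTaskSpec.GetContainer() != nil && taskSpec.GetContainer() != nil {
		// Neutralize the field on the copies before comparing.
		serviceTaskSpec.GetContainer().PullOptions = taskSpec.GetContainer().PullOptions
	}

	return !reflect.DeepEqual(serviceTaskSpec, taskSpec) ||
		(t.Endpoint != nil && !reflect.DeepEqual(s.Spec.Endpoint, t.Endpoint.Spec))
}
```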
Actually, @jlhawn, shouldn't we not dereference the pointer here now that we're doing a deep copy?
We discussed this offline but posting here to follow up: the call to `DeepEqual()` later on is comparing it to a non-pointer `api.TaskSpec`. I could have either dereferenced it here or there, but what's important is that it is still calling `DeepEqual()` with the same types.
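The type-matching point generalizes: `reflect.DeepEqual` reports false whenever its arguments have different dynamic types, even when one is a pointer to an equal value. A self-contained illustration with a stand-in struct (not the swarmkit type):

```go
package main

import (
	"fmt"
	"reflect"
)

// TaskSpec is a stand-in struct for illustration only.
type TaskSpec struct{ Image string }

func main() {
	a := TaskSpec{Image: "example:latest"}
	b := &TaskSpec{Image: "example:latest"}

	fmt.Println(reflect.DeepEqual(a, b))  // false: TaskSpec vs *TaskSpec
	fmt.Println(reflect.DeepEqual(a, *b)) // true: same type, equal values
}
```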
Related to #971