IsTaskDirty: Ignore PullOptions for running tasks #2351
Conversation
Only do this at the point in the orchestrator where tasks are being compared. Doing this globally will break other update functionality, such as the dispatcher pushes and agent task manager updates (probably okay if that one is ignored).
@stevvooe I'm not familiar enough with this repo to know where the point in the orchestrator where tasks are being compared is. A quick search of the codebase for calls to https://github.com/docker/swarmkit/blob/b5ca4a25b6351d86a89ae6fba57fa85e0bc0eb1d/agent/task.go#L189 turned up 3 call sites. Are any of these 3 the location you were thinking of?
There are two ways:
Number 1 is kind of a hack. Number 2 seems to have some caveats but seems doable. Let me know if that is enough to go on.
Codecov Report
@@ Coverage Diff @@
## master #2351 +/- ##
==========================================
+ Coverage 59.96% 60.04% +0.07%
==========================================
Files 128 128
Lines 26183 26183
==========================================
+ Hits 15701 15721 +20
+ Misses 9092 9066 -26
- Partials 1390 1396 +6
@stevvooe I've updated this patch to only apply to the `IsTaskDirty` check.
LGTM. Make sure to test this one thoroughly.
I don't think this is a good idea. With this change, if you deployed a service with incorrect credentials, updating those credentials would not necessarily correct the problem. It would happen to work with a restart policy of "always" or "on failure", because the service update would trigger a reconciliation, and the orchestrator would start new tasks with the updated spec. But if the restart policy is "never", or the service is running up against the restart limit, the orchestrator will check whether anything has changed in the service spec to merit forcing a restart, and this hack will tell it that nothing has changed.

There are workarounds for this, such as making this hack apply only to tasks with `DesiredState <= api.TaskStateRunning` (that way, a change to credentials won't replace a running task, but will cause a dead task to be replaced).

But if this must be done, I suppose the extra hack I mentioned would work. If you do move forward with this, I'd recommend writing a test that runs with various restart policies, starting with a 1-replica service and an up-to-date task with desired state "shutdown", updating the service to change `PullOptions`, and verifying whether the task gets replaced.
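For concreteness, here is a minimal sketch of the guard @aaronlehmann describes, assuming it lives next to the orchestrator's dirty check (the helper name `shouldIgnorePullOptions` is hypothetical, not from the patch):

```go
package orchestrator

import "github.com/docker/swarmkit/api"

// shouldIgnorePullOptions sketches the suggested guard: only ignore a
// PullOptions change for tasks the orchestrator still wants running. A task
// whose desired state has moved past RUNNING (e.g. SHUTDOWN after a failure)
// may need to be replaced, so an updated PullOptions must count as a real
// change for it.
func shouldIgnorePullOptions(t *api.Task) bool {
	return t.DesiredState <= api.TaskStateRunning
}
```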
Is it possible to have a test case for this in this repo? If not the test case described by @aaronlehmann, then at least something that can verify an expected, desirable change in behaviour with this patch?
@andrewhsu I can throw together a unit test for the `IsTaskDirty` function.
@aaronlehmann Thanks for providing input here! This was definitely a concern I brought up. I'm looking at the various call sites for `IsTaskDirty` now.

@andrewhsu Yes, these situations should be testable in this repo.
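A rough sketch of what such a unit test could look like, assuming the guarded comparison discussed above (the scenario and field values are illustrative, not the test that actually landed):

```go
package orchestrator

import (
	"testing"

	"github.com/docker/swarmkit/api"
)

// TestIsTaskDirtyIgnoresPullOptions sketches the scenarios discussed above:
// a PullOptions-only change should not dirty a task that already has the
// image, but should dirty one the orchestrator has decided to shut down.
func TestIsTaskDirtyIgnoresPullOptions(t *testing.T) {
	service := &api.Service{
		Spec: api.ServiceSpec{
			Task: api.TaskSpec{
				Runtime: &api.TaskSpec_Container{
					Container: &api.ContainerSpec{
						Image: "example:latest",
						PullOptions: &api.ContainerSpec_PullOptions{
							RegistryAuth: "new-credentials",
						},
					},
				},
			},
		},
	}

	task := &api.Task{
		Spec:         *service.Spec.Task.Copy(),
		DesiredState: api.TaskStateRunning,
		Status:       api.TaskStatus{State: api.TaskStateRunning},
	}
	// The existing task was created with the old pull options.
	task.Spec.GetContainer().PullOptions = &api.ContainerSpec_PullOptions{
		RegistryAuth: "old-credentials",
	}

	// A running task that already has the image should not be dirtied by
	// a PullOptions-only change.
	if IsTaskDirty(service, task) {
		t.Error("expected running task to stay clean on a PullOptions-only change")
	}

	// A task the orchestrator has decided to shut down may need to be
	// replaced, so the same change should dirty it.
	task.DesiredState = api.TaskStateShutdown
	if !IsTaskDirty(service, task) {
		t.Error("expected shutdown-desired task to be dirty on a PullOptions-only change")
	}
}
```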
@jlhawn Based on @aaronlehmann's feedback, I am not quite sure that is enough. There are cases where we want the dirty task to propagate and cases where we don't. It seems like we just need to do this comparison only at certain call sites.
Why would we want to look at the desired state of the task? Wouldn't we only want to ignore the `PullOptions` field if and only if the current state of the task is `api.TaskStateRunning`?
He is not suggesting that you look at the desired state. He is saying that the logic should be outside of `IsTaskDirty`.
> There are workarounds for this, such as making this hack apply only to tasks with `DesiredState <= api.TaskStateRunning` (that way, a change to credentials won't replace a running task, but will cause a dead task to be replaced).

> Why would we want to look at the desired state of the task? Wouldn't we only want to ignore the `PullOptions` field if and only if the current state of the task is `api.TaskStateRunning`?

When a task fails, its desired state gets changed to shutdown. Desired state is more general and also covers cases like a failed node that hasn't reported status recently.
manager/orchestrator/task.go
Outdated
@@ -67,7 +67,28 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
 		return false
 	}
 
-	return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
+	// Make shallow copies for the comparison.
+	specA, specB := s.Spec.Task, t.Spec
nit: I would suggest renaming these to `specService`, `specTask` or something along those lines to make it more suggestive and readable. Same for other variable names below.
manager/orchestrator/task.go
Outdated
+	// handle updates.
+	// See https://github.com/docker/swarmkit/issues/971
+	currentState := t.Status.State
+	ignorePullOpts := api.TaskStateReady <= currentState && currentState <= api.TaskStateRunning
I'd add `&& t.DesiredState <= api.TaskStateRunning`. This will make sure you're not ignoring pull options for a task that the orchestrator has decided shouldn't run anymore (and therefore might need to be replaced if pull options are updated), for example if the node where it was running failed. `t.Status.State` is self-reported by the node, so when a node fails, this field can be stale for some time.

Generally the orchestrator only looks at `DesiredState`, except when detecting a task failure and setting `DesiredState` to `Shutdown`. This gives us a single source of truth and avoids making decisions based on out-of-date information. But I see what you're trying to do here.
Sounds good. I discussed this logic with @aluzzardi and we settled on only ignoring pull options when we know for certain it won't affect the task at all, i.e., the worker already reported that it has the image it was supposed to use (so its current status is either ready, starting, or running). Even if it is desired to be shutdown, at least we know that it didn't fail because it couldn't pull the image.
There's also logic later on in the updater which will create new tasks anyway if the task's desired state is greater than running, even if it was not found to be "dirty": https://github.com/docker/swarmkit/blob/6716ddf5808932a56be3aa1af8510c306ed7145f/manager/orchestrator/update/updater.go#L323-L331
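A loose paraphrase of the updater check jlhawn is pointing at (see the linked `updater.go` for the real logic; the helper name here is hypothetical):

```go
package update

import "github.com/docker/swarmkit/api"

// needsReplacement loosely paraphrases the linked updater logic: during an
// update, a slot's task is replaced either because its spec no longer
// matches the service, or because its desired state has already moved past
// RUNNING -- meaning the task is dead regardless of how the spec comparison
// came out.
func needsReplacement(dirty bool, t *api.Task) bool {
	return dirty || t.DesiredState > api.TaskStateRunning
}
```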
manager/orchestrator/task.go
Outdated
+	// handle updates.
+	// See https://github.com/docker/swarmkit/issues/971
+	currentState := t.Status.State
+	ignorePullOpts := api.TaskStateReady <= currentState && currentState <= api.TaskStateRunning && t.DesiredState <= api.TaskStateRunning
nit: Can you change this to `currentState >= api.TaskStateReady`? I think it's easier to read.
Okay. What I was originally going for here was to make it read more like Python, where you can do things like:

`api.TaskStateReady <= current_state <= api.TaskStateRunning`

which just expands into:

`api.TaskStateReady <= current_state and current_state <= api.TaskStateRunning`

So when you look at it, it makes it more clear that the current state is between ready and running.
Yeah, I figured that out a few moments later (the range expression), but at first my brain was having a hard time computing `api.TaskStateReady <= currentState`.
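For readers unfamiliar with the state machine: these range comparisons work because `TaskState` is an ordered enum. A sketch of the idea, with a hypothetical helper name (the exact enum values live in `api/types.proto`):

```go
package orchestrator

import "github.com/docker/swarmkit/api"

// taskHasPulledImage sketches what the READY..RUNNING range check in the
// diff expresses. It relies on TaskState being ordered by lifecycle stage,
// roughly:
//
//	NEW < PENDING < ASSIGNED < ACCEPTED < PREPARING <
//	READY < STARTING < RUNNING < COMPLETE < SHUTDOWN < FAILED < REJECTED
//
// A state at or past READY means the worker has already prepared (i.e.
// pulled) the image; at or before RUNNING means the task has not yet
// terminated.
func taskHasPulledImage(t *api.Task) bool {
	s := t.Status.State
	return api.TaskStateReady <= s && s <= api.TaskStateRunning
}
```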
manager/orchestrator/task.go
Outdated
@@ -67,7 +67,31 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
 		return false
 	}
 
-	return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
+	// Make shallow copy of the service for the comparison.
+	service := *s
Might be safer to deep copy (e.g. `s.Copy()`) before tweaking the object.
ooh, I didn't know that existed. I'll check it out.
We codegen deep copiers (`.Copy()`) onto all the proto types.
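To illustrate why the deep copy matters here: `TaskSpec` holds pointers, so a plain struct copy still aliases the nested `ContainerSpec`. A sketch (the helper name is hypothetical, and it assumes a container-runtime task):

```go
package orchestrator

import "github.com/docker/swarmkit/api"

// copyPitfall is illustrative only: a shallow struct copy of a TaskSpec
// still shares the nested ContainerSpec pointer, so tweaking PullOptions
// through it would corrupt the original. The codegen'd Copy() deep copier
// avoids that.
func copyPitfall(s *api.Service) {
	shallow := s.Spec.Task                   // copies the struct, shares pointers
	shallow.GetContainer().PullOptions = nil // also mutates s.Spec.Task!

	deep := s.Spec.Task.Copy()            // deep copy of the whole spec tree
	deep.GetContainer().PullOptions = nil // safe: the original is untouched
}
```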
LGTM, couple of nits.
This patch causes the orchestrator to ignore the `PullOptions` field of a `ContainerSpec` when determining whether a task is considered to be 'dirty'. The field is ignored if and only if the current state of the task is either READY, STARTING, or RUNNING.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@@ -67,7 +67,29 @@ func IsTaskDirty(s *api.Service, t *api.Task) bool {
 		return false
 	}
 
-	return !reflect.DeepEqual(s.Spec.Task, t.Spec) ||
+	// Make a deep copy of the service and task spec for the comparison.
+	serviceTaskSpec := *s.Spec.Task.Copy()
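Assembling the fragments above, the comparison this PR converges on looks roughly like the sketch below. This is not the verbatim merged code: the spec-version short-circuit that precedes it in the real function is elided, and the trailing endpoint comparison is an assumption about what the truncated `|| ...` in the diff covered.

```go
package orchestrator

import (
	"reflect"

	"github.com/docker/swarmkit/api"
)

// IsTaskDirty, roughly as this PR converges on it (sketch, not verbatim).
func IsTaskDirty(s *api.Service, t *api.Task) bool {
	// Deep-copy both specs so the PullOptions tweak below cannot leak
	// into the real service or task objects.
	serviceTaskSpec := *s.Spec.Task.Copy()
	taskSpec := *t.Spec.Copy()

	// Ignore PullOptions only when they provably cannot matter: the worker
	// already has the image (current state in READY..RUNNING) and the
	// orchestrator still wants the task running (desired state <= RUNNING).
	// See https://github.com/docker/swarmkit/issues/971
	currentState := t.Status.State
	ignorePullOpts := t.DesiredState <= api.TaskStateRunning &&
		currentState >= api.TaskStateReady &&
		currentState <= api.TaskStateRunning

	if ignorePullOpts && serviceTaskSpec.GetContainer() != nil && taskSpec.GetContainer() != nil {
		// Neutralize the field on the copies before comparing.
		serviceTaskSpec.GetContainer().PullOptions = taskSpec.GetContainer().PullOptions
	}

	return !reflect.DeepEqual(serviceTaskSpec, taskSpec) ||
		(t.Endpoint != nil && !reflect.DeepEqual(s.Spec.Endpoint, t.Endpoint.Spec))
}
```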
Actually, @jlhawn, shouldn't we not dereference the pointer here now that we're doing a deep copy?
We discussed this offline but posting here to follow up: the call to `DeepEqual()` later on is comparing it to a non-pointer `api.TaskSpec`. I could have either dereferenced it here or there, but what's important is that it is still calling `DeepEqual()` with the same types.
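The type-matching point generalizes: `reflect.DeepEqual` reports false whenever its arguments have different dynamic types, even when one is a pointer to an equal value. A self-contained illustration with a stand-in struct (not the swarmkit type):

```go
package main

import (
	"fmt"
	"reflect"
)

// TaskSpec is a stand-in struct for illustration only.
type TaskSpec struct{ Image string }

func main() {
	a := TaskSpec{Image: "example:latest"}
	b := &TaskSpec{Image: "example:latest"}

	fmt.Println(reflect.DeepEqual(a, b))  // false: TaskSpec vs *TaskSpec
	fmt.Println(reflect.DeepEqual(a, *b)) // true: same type, equal values
}
```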
Related to #971