New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Carry "Don't delete tasks until they're actually removed by the agent" #2461
Conversation
return nil | ||
} | ||
|
||
checkDo(ctx, t, task, ctlr, &api.TaskStatus{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add one more case here that ensures a COMPLETED
(or any terminal state) does not proceed to shutdown. It should only proceed to shutdown if the container is not already terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevvooe do you mean like this?
// TestDesiredStateRemoveOnlyNonterminal checks that the agent will only stop
// a container on REMOVE if it's not already in a terminal state. If the
// container is already in a terminal state, (like COMPLETE) the agent should
// take no action
func TestDesiredStateRemoveOnlyNonterminal(t *testing.T) {
// go through all terminal states, just for completeness' sake
for _, state := range []api.TaskState{
api.TaskStateCompleted,
api.TaskStateShutdown,
api.TaskStateFailed,
api.TaskStateRejected,
api.TaskStateRemove,
// no TaskStateOrphaned becaused that's not a state the task can be in
// on the agent
} {
// capture state variable here to run in parallel
state := state
t.Run(state.String(), func(t *testing.T) {
// go parallel to go faster
t.Parallel()
var (
// create a new task, actual state `state`, desired state
// shutdown
task = newTestTask(t, state, api.TaskStateShutdown)
ctx, ctlr, finish = buildTestEnv(t, task)
)
// make the shutdown function a noop
ctlr.ShutdownFn = func(_ context.Context) error {
return nil
}
// Note we check for error ErrTaskNoop, which will be raised
// because nothing happens
checkDo(ctx, t, task, ctlr, &api.TaskStatus{
State: state,
}, ErrTaskNoop)
defer func() {
finish()
// we should never have called shutdown
assert.Equal(t, 0, ctlr.calls["Shutdown"])
}()
})
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -44,8 +46,23 @@ func DeleteServiceTasks(ctx context.Context, s *store.MemoryStore, service *api. | |||
err = s.Batch(func(batch *store.Batch) error { | |||
for _, t := range tasks { | |||
err := batch.Update(func(tx store.Tx) error { | |||
if err := store.DeleteTask(tx, t.ID); err != nil { | |||
log.G(ctx).WithError(err).Errorf("failed to delete task") | |||
// time travel is not allowed. if the current desired state is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bug should be filed to have a function that enforces this "math".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Albeit, "desired state" doesn't have the same constraints as the state field, so this is less important.
"DesiredState" is typically directly observable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear what you mean by this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the lack of clarity. I was just pointing out that we should probably always do this check when we change the state. I am not sure it should be done in this PR, as it may introduce a regression.
api/types.proto
Outdated
// TaskStateOrphaned is used to free up resources associated with service | ||
// tasks on unresponsive nodes without having to delete those tasks. This | ||
// state is directly assigned to the task by the orchestrator. | ||
ORPHANED = 832 [(gogoproto.enumvalue_customname)="TaskStateOrphaned"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something went wrong with indentation here.
07682a0
to
be6a62e
Compare
Codecov Report
@@ Coverage Diff @@
## master #2461 +/- ##
==========================================
- Coverage 63.74% 60.53% -3.22%
==========================================
Files 64 128 +64
Lines 11793 26433 +14640
==========================================
+ Hits 7517 16000 +8483
- Misses 3662 9031 +5369
- Partials 614 1402 +788 |
be6a62e
to
f8f2a85
Compare
LGTM after adding the extra test. |
I'm kicking off a rebuild not because I'm trying to ram this through but because I want to see if the same error happens twice. |
This should be ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest squashing the commits into a single commit:
Set task desired state to REMOVE for service removal and scale down …
Add unit tests for Task state REMOVE behavior …
Fix some nits …
Set task desired state to REMOVE for service removal and scale down. Tasks in REMOVE state can be deleted after they are reported SHUTDOWN. Avoids a race conditions where network resources are freed before the running task has shut down fully. Add unit tests for Task state REMOVE behavior. Started by Nishant Totla <nishanttotla@gmail>, finished by Drew Erny <drew.erny@docker.com>. Signed-off-by: Nishant Totla <nishanttotla@gmail.com> Signed-off-by: Drew Erny <drew.erny@docker.com>
f8f2a85
to
b9e2c2f
Compare
@anshulpundir rebased and squashed. Reworded commit message. |
}) | ||
|
||
if len(tasks) > 0 { | ||
for _, t := range tasks { | ||
if len(orphanedTasks)+len(removeTasks) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this if
statement is superfluous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but if I leave it in, when I come back to this code in several months and I'm like, "what doofus made this junk, who's responsible for this?" it will be a very humbling experience.
BTW my suggestion was to squash the last three commits. Its prob OK to have the protobuf change as a separate commit. But I guess this is fine too @dperny |
I think it's fine to have the commits squashed together. Putting the proto changes with the code changes helps put them into context. And we don't often go through the history anyway. |
Set task desired state to REMOVE for service removal and scale down. Tasks in REMOVE state can be deleted after they are reported SHUTDOWN. Avoids a race conditions where network resources are freed before the running task has shut down fully. Add unit tests for Task state REMOVE behavior. Started by Nishant Totla <nishanttotla@gmail>, finished by Drew Erny <drew.erny@docker.com>. Signed-off-by: Nishant Totla <nishanttotla@gmail.com> Signed-off-by: Drew Erny <drew.erny@docker.com> Backports moby#2461 to 17.06. Cherry pick did not apply cleanly (cherry picked from commit b9e2c2f) Signed-off-by: Drew Erny <drew.erny@docker.com>
Finishes up the work on #2446, because @nishanttotla is unavailable. Added one commit addressing a few nits. See #2446 for original discussion. I'd have just pushed to his branch but I don't have permissions.
/cc @aaronlehmann, @stevvooe, and @anshulpundir, who were involved in the discussion on that PR.