Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carry "Don't delete tasks until they're actually removed by the agent" #2461

Merged
merged 1 commit into from Nov 29, 2017

Conversation

dperny
Copy link
Collaborator

@dperny dperny commented Nov 28, 2017

Finishes up the work on #2446, because @nishanttotla is unavailable. Added one commit addressing a few nits. See #2446 for original discussion. I'd have just pushed to his branch but I don't have permissions.

/cc @aaronlehmann, @stevvooe, and @anshulpundir, who were involved in the discussion on that PR.

return nil
}

checkDo(ctx, t, task, ctlr, &api.TaskStatus{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one more case here that ensures a COMPLETED (or any terminal state) does not proceed to shutdown. It should only proceed to shutdown if the container is not already terminal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevvooe do you mean like this?

// TestDesiredStateRemoveOnlyNonterminal checks that the agent will only stop
// a container on REMOVE if it's not already in a terminal state. If the
// container is already in a terminal state, (like COMPLETE) the agent should
// take no action
func TestDesiredStateRemoveOnlyNonterminal(t *testing.T) {
	// go through all terminal states, just for completeness' sake
	for _, state := range []api.TaskState{
		api.TaskStateCompleted,
		api.TaskStateShutdown,
		api.TaskStateFailed,
		api.TaskStateRejected,
		api.TaskStateRemove,
		// no TaskStateOrphaned becaused that's not a state the task can be in
		// on the agent
	} {
		// capture state variable here to run in parallel
		state := state
		t.Run(state.String(), func(t *testing.T) {
			// go parallel to go faster
			t.Parallel()
			var (
				// create a new task, actual state `state`, desired state
				// shutdown
				task              = newTestTask(t, state, api.TaskStateShutdown)
				ctx, ctlr, finish = buildTestEnv(t, task)
			)
			// make the shutdown function a noop
			ctlr.ShutdownFn = func(_ context.Context) error {
				return nil
			}

			// Note we check for error ErrTaskNoop, which will be raised
			// because nothing happens
			checkDo(ctx, t, task, ctlr, &api.TaskStatus{
				State: state,
			}, ErrTaskNoop)
			defer func() {
				finish()
				// we should never have called shutdown
				assert.Equal(t, 0, ctlr.calls["Shutdown"])
			}()
		})
	}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -44,8 +46,23 @@ func DeleteServiceTasks(ctx context.Context, s *store.MemoryStore, service *api.
err = s.Batch(func(batch *store.Batch) error {
for _, t := range tasks {
err := batch.Update(func(tx store.Tx) error {
if err := store.DeleteTask(tx, t.ID); err != nil {
log.G(ctx).WithError(err).Errorf("failed to delete task")
// time travel is not allowed. if the current desired state is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bug should be filed to have a function that enforces this "math".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Albeit, "desired state" doesn't have the same constraints as the state field, so this is less important.

"DesiredState" is typically directly observable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear what you mean by this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lack of clarity. I was just pointing out that we should probably always do this check when we change the state. I am not sure it should be done in this PR, as it may introduce a regression.

api/types.proto Outdated
// TaskStateOrphaned is used to free up resources associated with service
// tasks on unresponsive nodes without having to delete those tasks. This
// state is directly assigned to the task by the orchestrator.
ORPHANED = 832 [(gogoproto.enumvalue_customname)="TaskStateOrphaned"];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something went wrong with indentation here.

@dperny dperny force-pushed the task-management-improvements branch from 07682a0 to be6a62e Compare November 28, 2017 18:28
@codecov
Copy link

codecov bot commented Nov 28, 2017

Codecov Report

Merging #2461 into master will decrease coverage by 3.21%.
The diff coverage is 57.14%.

@@            Coverage Diff             @@
##           master    #2461      +/-   ##
==========================================
- Coverage   63.74%   60.53%   -3.22%     
==========================================
  Files          64      128      +64     
  Lines       11793    26433   +14640     
==========================================
+ Hits         7517    16000    +8483     
- Misses       3662     9031    +5369     
- Partials      614     1402     +788

@dperny dperny force-pushed the task-management-improvements branch from be6a62e to f8f2a85 Compare November 28, 2017 18:50
@stevvooe
Copy link
Contributor

LGTM after adding the extra test.

@dperny
Copy link
Collaborator Author

dperny commented Nov 28, 2017

I'm kicking off a rebuild not because I'm trying to ram this through but because I want to see if the same error happens twice.

@dperny
Copy link
Collaborator Author

dperny commented Nov 28, 2017

This should be ready to merge.

Copy link
Contributor

@anshulpundir anshulpundir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest squashing the commits into a single commit:

Set task desired state to REMOVE for service removal and scale down …
Add unit tests for Task state REMOVE behavior …
Fix some nits …

Set task desired state to REMOVE for service removal and scale down. Tasks in
REMOVE state can be deleted after they are reported SHUTDOWN. Avoids a race
conditions where network resources are freed before the running task has
shut down fully.

Add unit tests for Task state REMOVE behavior.

Started by Nishant Totla <nishanttotla@gmail>, finished by Drew Erny
<drew.erny@docker.com>.

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
Signed-off-by: Drew Erny <drew.erny@docker.com>
@dperny dperny force-pushed the task-management-improvements branch from f8f2a85 to b9e2c2f Compare November 28, 2017 23:00
@dperny
Copy link
Collaborator Author

dperny commented Nov 28, 2017

@anshulpundir rebased and squashed. Reworded commit message.

})

if len(tasks) > 0 {
for _, t := range tasks {
if len(orphanedTasks)+len(removeTasks) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this if statement is superfluous.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but if I leave it in, when I come back to this code in several months and I'm like, "what doofus made this junk, who's responsible for this?" it will be a very humbling experience.

@anshulpundir
Copy link
Contributor

anshulpundir commented Nov 29, 2017

BTW my suggestion was to squash the last three commits. Its prob OK to have the protobuf change as a separate commit. But I guess this is fine too @dperny

@dperny
Copy link
Collaborator Author

dperny commented Nov 29, 2017

I think it's fine to have the commits squashed together. Putting the proto changes with the code changes helps put them into context. And we don't often go through the history anyway.

@anshulpundir anshulpundir merged commit e7946de into moby:master Nov 29, 2017
dperny pushed a commit to dperny/swarmkit-1 that referenced this pull request Nov 29, 2017
Set task desired state to REMOVE for service removal and scale down. Tasks in
REMOVE state can be deleted after they are reported SHUTDOWN. Avoids a race
conditions where network resources are freed before the running task has
shut down fully.

Add unit tests for Task state REMOVE behavior.

Started by Nishant Totla <nishanttotla@gmail>, finished by Drew Erny
<drew.erny@docker.com>.

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
Signed-off-by: Drew Erny <drew.erny@docker.com>

Backports moby#2461 to 17.06. Cherry pick did not apply cleanly

(cherry picked from commit b9e2c2f)
Signed-off-by: Drew Erny <drew.erny@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants