New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: Use TaskStatus Err field for non-terminal errors #2287

Merged
merged 1 commit into from Oct 9, 2017

Conversation

Projects
None yet
6 participants
@aaronlehmann
Collaborator

aaronlehmann commented Jun 27, 2017

There are some cases when a task can't advance from a particular state because preconditions are not met. For example, if no nodes meet its constraints, it will not advance to "assigned".

Currently, we put a note about this in the Message field of TaskStatus, but this is not surfaced to the user. It wouldn't make sense to expose Message prominently because it usually contains uninteresting notes about how the task reached that state. Its presence does not indicate that something is wrong.

Expand the scope of Err to also cover non-terminal errors that are blocking the task from progressing, and use it in those cases.

Ideally we would also use this to surface IP allocation failures that block a task in "new", but the current design of the allocator would loop if we updated a task on failure. This might be something for a followup.

cc @aluzzardi @stevvooe

api: Use TaskStatus Err field for non-terminal errors
There are some cases when a task can't advance from a particular state
because preconditions are not met. For example, if no nodes meet its
constraints, it will not advance to "assigned".

Currently, we put a note about this in the Message field of TaskStatus,
but this is not surfaced to the user. It wouldn't make sense to expose
Message prominently because it usually contains uninteresting notes
about how the task reached that state. Its presence does not indicate
that something is wrong.

Expand the scope of Err to also cover non-terminal errors that are
blocking the task from progressing, and use it in those cases.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Jun 27, 2017

Codecov Report

Merging #2287 into master will increase coverage by 0.03%.
The diff coverage is 92.85%.

@@            Coverage Diff            @@
##           master   #2287      +/-   ##
=========================================
+ Coverage   61.06%   61.1%   +0.03%     
=========================================
  Files         128     128              
  Lines       20532   20538       +6     
=========================================
+ Hits        12538   12549      +11     
+ Misses       6611    6602       -9     
- Partials     1383    1387       +4

codecov bot commented Jun 27, 2017

Codecov Report

Merging #2287 into master will increase coverage by 0.03%.
The diff coverage is 92.85%.

@@            Coverage Diff            @@
##           master   #2287      +/-   ##
=========================================
+ Coverage   61.06%   61.1%   +0.03%     
=========================================
  Files         128     128              
  Lines       20532   20538       +6     
=========================================
+ Hits        12538   12549      +11     
+ Misses       6611    6602       -9     
- Partials     1383    1387       +4
@stevvooe

This comment has been minimized.

Show comment
Hide comment
@stevvooe

stevvooe Jun 27, 2017

Contributor

@aaronlehmann Why don't consumers just print the message field correctly?

Contributor

stevvooe commented Jun 27, 2017

@aaronlehmann Why don't consumers just print the message field correctly?

@aaronlehmann

This comment has been minimized.

Show comment
Hide comment
@aaronlehmann

aaronlehmann Jun 27, 2017

Collaborator

It is unclear when the Message field contains actionable information.

Collaborator

aaronlehmann commented Jun 27, 2017

It is unclear when the Message field contains actionable information.

@stevvooe

This comment has been minimized.

Show comment
Hide comment
@stevvooe

stevvooe Jun 27, 2017

Contributor

Err is meant to only be actionable when the task is in the error state. I understand that this intends to change that, but the usage of Err is quite clear in that it is valid when in an error state, which is strictly defined as the error states.

The use case that is being described here is exactly what the message field is designed for:

// Message reports a message for the task status. This should provide a
// human readable message that can point to how the task actually arrived
// at a current state.

Do we need something to indicate that a task is stalled?

Contributor

stevvooe commented Jun 27, 2017

Err is meant to only be actionable when the task is in the error state. I understand that this intends to change that, but the usage of Err is quite clear in that it is valid when in an error state, which is strictly defined as the error states.

The use case that is being described here is exactly what the message field is designed for:

// Message reports a message for the task status. This should provide a
// human readable message that can point to how the task actually arrived
// at a current state.

Do we need something to indicate that a task is stalled?

@aluzzardi

This comment has been minimized.

Show comment
Hide comment
@aluzzardi

aluzzardi Jun 27, 2017

Contributor

In this case, tasks are stalled indefinitely - for instance not all constraints were met or we reached network exhaustion.

The operator needs to have this information surfaced somehow since state convergence cannot go forward until action is taken.

Currently, Message contains a bunch of trivial information (e.g. "the task is starting") which doesn't make it suitable for carrying important messages.

Long story short, this needs to be blinking red in some ops' dashboard.

Contributor

aluzzardi commented Jun 27, 2017

In this case, tasks are stalled indefinitely - for instance not all constraints were met or we reached network exhaustion.

The operator needs to have this information surfaced somehow since state convergence cannot go forward until action is taken.

Currently, Message contains a bunch of trivial information (e.g. "the task is starting") which doesn't make it suitable for carrying important messages.

Long story short, this needs to be blinking red in some ops' dashboard.

@stevvooe

This comment has been minimized.

Show comment
Hide comment
@stevvooe

stevvooe Jun 27, 2017

Contributor

@aluzzardi I get what you are trying to do, but I am asking if there is a better way to do it. Overloading the Err field will have the same problems. As of now, its contents are undefined when not in an error state. Expanding the scope of that field is ambiguous.

If a task is stalled, do we have an indicator of that state? For example, a boolean field, indicating that the task is stalled and to check the message field would make sense here.

Contributor

stevvooe commented Jun 27, 2017

@aluzzardi I get what you are trying to do, but I am asking if there is a better way to do it. Overloading the Err field will have the same problems. As of now, its contents are undefined when not in an error state. Expanding the scope of that field is ambiguous.

If a task is stalled, do we have an indicator of that state? For example, a boolean field, indicating that the task is stalled and to check the message field would make sense here.

@wsong

This comment has been minimized.

Show comment
Hide comment
@wsong

wsong Sep 1, 2017

What is the status of this PR? Are we planning on moving this forward? It would be very helpful for users to be able to see when their tasks are in a state where the user must take action for them to be scheduled (e.g. network address exhaustion).

wsong commented Sep 1, 2017

What is the status of this PR? Are we planning on moving this forward? It would be very helpful for users to be able to see when their tasks are in a state where the user must take action for them to be scheduled (e.g. network address exhaustion).

@stevvooe

This comment has been minimized.

Show comment
Hide comment
@stevvooe

stevvooe Sep 1, 2017

Contributor

LGTM

Looking at this again, I think we can go forward with it. My concerns are little abstract, but I think that this will work.

@aluzzardi How should we move this forward? It looks like it will merge cleanly.

Contributor

stevvooe commented Sep 1, 2017

LGTM

Looking at this again, I think we can go forward with it. My concerns are little abstract, but I think that this will work.

@aluzzardi How should we move this forward? It looks like it will merge cleanly.

@nishanttotla

LGTM

@@ -159,7 +159,8 @@ loop:
// restarting the task on another node
// (if applicable).
t.Status.State = api.TaskStateRejected
t.Status.Message = "assigned node no longer meets constraints"
t.Status.Message = "task rejected by constraint enforcer"
t.Status.Err = "assigned node no longer meets constraints"

This comment has been minimized.

@anshulpundir

anshulpundir Sep 29, 2017

Contributor

I feel the Message/Err should be swapped here. The error on the task should be related to the task, so "task rejected by constraint enforcer" seems more relevant.

Also, does it make sense to be more specific around what constraints were failed ?

@anshulpundir

anshulpundir Sep 29, 2017

Contributor

I feel the Message/Err should be swapped here. The error on the task should be related to the task, so "task rejected by constraint enforcer" seems more relevant.

Also, does it make sense to be more specific around what constraints were failed ?

This comment has been minimized.

@stevvooe

stevvooe Sep 29, 2017

Contributor

The problem is that we don't display Err on the command line interface. Message is used to provided better reporting to the user.

@stevvooe

stevvooe Sep 29, 2017

Contributor

The problem is that we don't display Err on the command line interface. Message is used to provided better reporting to the user.

@anshulpundir

This comment has been minimized.

Show comment
Hide comment
@anshulpundir

anshulpundir Sep 29, 2017

Contributor

Looks good, one question/comment, which can probably be addressed later if we need to merge it right away @nishanttotla

Contributor

anshulpundir commented Sep 29, 2017

Looks good, one question/comment, which can probably be addressed later if we need to merge it right away @nishanttotla

@nishanttotla nishanttotla merged commit 28f91d8 into docker:master Oct 9, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/project 61.1% (target 0%)
Details
dco-signed All commits are signed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment