Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent terminology and more ovbious deploy failure on task page #966

Conversation

@Calvinp
Copy link
Contributor

@Calvinp Calvinp commented Mar 21, 2016

This includes several UI improvements around task and deploy failures:

On the task page:

  • Healthcheck failed notification now states that the task was killed, rather than task failed
  • If the healthcheck failed because a connection was refused, the healthcheck failed notification will state that it was probably a problem with the app, not the cluster
  • The healthcheck notification will not display if the deploy failed and killed the task after one healthcheck ran and failed (but not enough to kill the task)
  • If this task caused the deploy to fail, it will have a danger notification saying this (and linking to other causes)
  • If this task's deploy failed, but this task was not a cause of that failure, then it will have a warn notification. This notification will state if the task was killed because the deploy failed, and it will link to failure causes.

On the deploy page:

  • Rather than say (n) tasks failed, now the message states how many causes of deploy failure there were. Those two numbers don't always match and caused some confusion.
  • The links to the tasks that failed will now have the taskIds bolded to make it more clear that they are links to the task pages.
Calvinp added 5 commits Mar 18, 2016
…failed; Healthcheck failure message will mention user app if connection refused; Healthcheck failure message will say 'task killed' instead of 'task failed'
… task that failed the deploy
@@ -18,12 +18,24 @@ class taskHealthcheckNotificationSubview extends View
@listenTo @model, 'sync', @render
@listenTo @pendingDeploys, 'sync', @render

deployFailureKilledTask: =>

This comment has been minimized.

@ssalinas

ssalinas Mar 22, 2016
Member

Doesn't look like this method is used. You can probably eliminate the section below where it is calculated in renderData and just use this

This comment has been minimized.

@Calvinp

Calvinp Mar 23, 2016
Author Contributor

I was getting errors using the function at first, but it seems to be working now so I'll use the function.

synced: @model.synced
config: config
tooManyRetries: @model.get('healthcheckResults').length > maxRetries and maxRetries != 0
numberFailed: @model.get('healthcheckResults').length
secondsElapsed: healthTimeoutSeconds
doNotDisplayHealthcheckNotification: deployFailureKilledTask

healthcheckFailureReasonMessage: () ->

This comment has been minimized.

@ssalinas

ssalinas Mar 22, 2016
Member

A few things here. First, the method name makes me think we are calculating reasons for any healthcheck failure, when in fact we are only looking for connection refused.

Second thing, the connection refused message isn't very clear. The general confusion is that users see 'connection refused to (slave host):(port)' and assume it's a problem with the slave and don't think to check their own app. We need to clarify that the connection refused was to the app (i.e. the app did not start or wasn't listening on the port). Maybe something more like:

'Healthcheck failed due to a refused connection. It is possible your app did not start properly or was not listening on the anticipated port'

edit: Probably good to point then to the logs or something in that, and if we have it available even say the port we expected the app to be listening on

This comment has been minimized.

@Calvinp

Calvinp Mar 23, 2016
Author Contributor

I named that healthcheckFailureReasonMessage because I figured if we wanted to add something else in there later we could do so in that function rather than have the logic spread around. For now, yes, it only looks for connection refused.

I will modify the message to make it more clear.

@@ -0,0 +1,44 @@
{{#ifEqual deploy.deployResult.deployState "FAILED"}}

This comment has been minimized.

@ssalinas

ssalinas Mar 22, 2016
Member

if our task 404s you are not fetching the deploy info and are instead setting it to '', this if statement will throw 'has no attribute deployResult ...' type errors. Probably want to check that deploy is defined first

This comment has been minimized.

@Calvinp

Calvinp Mar 23, 2016
Author Contributor

Fixed

@Calvinp Calvinp added the hs_staging label Mar 23, 2016
@Calvinp Calvinp added the hs_qa label Mar 24, 2016
@Calvinp
Copy link
Contributor Author

@Calvinp Calvinp commented Mar 24, 2016

A task's failed healthchecks caused its deploy to fail. There weren't enough failed healthchecks to fail the task, but there were enough to fail the deploy. This then caused the task to be killed.

It is probably appropriate to show the healthcheck notification in this edge case. It is not being shown right now because this PR doesn't show the notification when the failing deploy caused the task to fail. Maybe if this task is one of the deploy failure causes, show the healthcheck notification anyway. That might cause other edge case failures (task triggers a deploy failure for some reason other than failing healthchecks -> deploy fails -> task killed), but I'm not sure how to detect and account for all possibilities.

@Calvinp Calvinp added the hs_stable label Mar 25, 2016
@ssalinas ssalinas modified the milestone: 0.4.12 Apr 1, 2016
@ssalinas
Copy link
Member

@ssalinas ssalinas commented Apr 5, 2016

This has been looking good, messaging is much more clear, thanks @Calvinp

@ssalinas ssalinas merged commit 4ec150d into master Apr 5, 2016
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/travis-ci/push The Travis CI build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ssalinas ssalinas deleted the consistent-terminology-more-ovbious-deploy-failure-on-task-page branch Apr 5, 2016
@ssalinas ssalinas removed hs_qa labels Apr 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.