Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task alerts #721

Merged
merged 18 commits into from Oct 15, 2015
Merged

Task alerts #721

merged 18 commits into from Oct 15, 2015

Conversation

@kwm4385
Copy link
Contributor

kwm4385 commented Oct 6, 2015

Show an alert on the task page if it is a scheduled task that has been running for longer than twice the average for tasks in the request. /cc @tpetr

screen shot 2015-10-06 at 3 59 44 pm

kwm4385 added 7 commits Oct 6, 2015
Sp
@kwm4385
Copy link
Contributor Author

kwm4385 commented Oct 8, 2015

Alert if a task is killed due to slave decommissioning.

screen shot 2015-10-08 at 11 53 59 am

getAlerts: (requestTaskHistory) =>
task = @models.task
alerts = []
# Is this a scheduled task that has been running much longer than previous ones?

This comment has been minimized.

@tpetr

tpetr Oct 14, 2015 Member

instead of calculating this yourself, you should use averageRuntimeMillis in the SingularityDeployStatistics object, which can be grabbed from the deploy history endpoint (.../api/history/request/REQUEST_ID/deploy/DEPLOY_ID). it would also be wise to surface warnIfScheduledJobIsRunningPastNextRunPct from https://github.com/HubSpot/Singularity/blob/master/SingularityService/src/main/java/com/hubspot/singularity/config/SingularityConfiguration.java#L172 into the window.config object so that we don't have to hardcode the threshold value

title: 'Warning:',
message: 'This scheduled task has been running longer than twice the average for the request and may be stuck.',
level: 'warning'
# Was this task killed by a discomissioning slave?

This comment has been minimized.

@tpetr

tpetr Oct 14, 2015 Member

typo

kwm4385 added 3 commits Oct 14, 2015
@kwm4385
Copy link
Contributor Author

kwm4385 commented Oct 14, 2015

Updated

@@ -63,6 +64,8 @@ public String getFinishedTaskLogPath() {
private boolean hideNewDeployButton = false;
private boolean hideNewRequestButton = false;

private Optional<Integer> warnIfScheduledJobIsRunningPastNextRunPct;

This comment has been minimized.

@tpetr

tpetr Oct 14, 2015 Member

i don't think this is necessary -- using the value inside SingularityConfiguration is probably good enough

if current > (avg * threshold)
alerts.push
title: 'Warning:',
message: 'This scheduled task has been running longer than twice the average for the request and may be stuck.',

This comment has been minimized.

@tpetr

tpetr Oct 14, 2015 Member

can you fix the message to mention the percentage instead of "twice" ?

@@ -53,6 +53,7 @@ exports.config =
runningTaskLogPath: process.env.SINGULARITY_RUNNING_TASK_LOG_PATH ? "stdout"
finishedTaskLogPath: process.env.SINGULARITY_FINISHED_TASK_LOG_PATH ? "stdout"
commonHostnameSuffixToOmit: process.env.SINGULARITY_COMMON_HOSTNAME_SUFFIX_TO_OMIT ? ""
warnIfScheduledJobIsRunningPastNextRunPct: process.env.WARN_IF_SCHEDULED_JOB_IS_RUNNING_PAST_NEXT_RUN_PCT ? 200

This comment has been minimized.

@tpetr

tpetr Oct 14, 2015 Member

prefix this with SINGULARITY_ to match all the others

tpetr and others added 4 commits Oct 14, 2015
Plz
deployInfo = new DeployDetails
deployId: deployId
requestId: requestId
deployInfo.fetch().success =>

This comment has been minimized.

@tpetr

tpetr Oct 14, 2015 Member

technically we could only make this call only if it's needed (i.e. if the task is scheduled + currently running)

tpetr added a commit that referenced this pull request Oct 15, 2015
Task alerts
@tpetr tpetr merged commit 6d6a23b into master Oct 15, 2015
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@tpetr tpetr removed hs_qa labels Oct 15, 2015
@tpetr tpetr deleted the task_alerts branch Oct 15, 2015
@tpetr tpetr added this to the 0.4.6 milestone Oct 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.