Skip to content

separate timeouts for health check and task running#1560

Merged
ssalinas merged 5 commits intomasterfrom
healthcheck-timeout
Jun 28, 2017
Merged

separate timeouts for health check and task running#1560
ssalinas merged 5 commits intomasterfrom
healthcheck-timeout

Conversation

@matush-v
Copy link
Contributor

@matush-v matush-v commented Jun 6, 2017

Looks like the max time for a task to get to TASK_RUNNING and pass health checks was set by killHealthcheckAfterDefaultSeconds. Since we were previously using System.currentTimeMillis() - task.getTaskId().getStartedAt() we used the same clock for both parts.

This separates the checks for the max time it can take to get a task running and then the max time it can take to pass health checks

@matush-v matush-v requested a review from ssalinas June 6, 2017 19:50
private boolean isOverdue(SingularityTask task) {
final long taskDuration = System.currentTimeMillis() - task.getTaskId().getStartedAt();
private boolean isHealthcheckOverdue(SingularityTask task) {
final long healthcheckDuration = taskManager.getLastHealthcheck(task.getTaskId()).get().getDurationMillis().or(0L);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure of the best route here since we don't want to use the time the task was launched for health checks. If we reach this logic, that means there is a health check so we can for sure have that, but the duration millis can be absent. If the duration never gets updated, this could lead to the case of a health check running forever

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to use the time since the task entered the running state here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which time are you referring to? task.getTaskid().getStartedAt() looked like launch time. i couldn't find anything that shows the time is updated once it starts running. I also don't see any other times except for system time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in task history updates, there will be an update with status of TASK_RUNNING that has an associated timestamp


private Optional<Integer> healthcheckMaxTotalTimeoutSeconds = Optional.absent();

private long killHealthcheckAfterDefaultSeconds = 600;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

killHealthcheck isn't quite accurate here. We are killing the task if it is not considered healthy by this hard timeout. Maybe more like killTaskIfNotHealthyAfterSeconds

private boolean isOverdue(SingularityTask task) {
final long taskDuration = System.currentTimeMillis() - task.getTaskId().getStartedAt();
private boolean isHealthcheckOverdue(SingularityTask task) {
final long healthcheckDuration = taskManager.getLastHealthcheck(task.getTaskId()).get().getDurationMillis().or(0L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to use the time since the task entered the running state here

@ssalinas ssalinas modified the milestone: 0.16.0 Jun 8, 2017
@ssalinas
Copy link
Member

should be gtg to staging once the comments above are addressed 👍

}

return System.currentTimeMillis();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had it fall back to the current time if it doesn't find the state info. could run over in this case, but we should be notified by one of those error messages

@ssalinas ssalinas merged commit 03109d2 into master Jun 28, 2017
@ssalinas ssalinas deleted the healthcheck-timeout branch June 28, 2017 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants