separate timeouts for health check and task running #1560

Merged
merged 5 commits into from Jun 28, 2017

@darcatron
Contributor

darcatron commented Jun 6, 2017

It looks like the max time for a task to both reach TASK_RUNNING and pass its health checks was set by a single value, killHealthcheckAfterDefaultSeconds. Because we were previously measuring System.currentTimeMillis() - task.getTaskId().getStartedAt(), the same clock covered both phases.

This separates the two checks: the max time a task can take to reach the running state, and then the max time it can take to pass health checks.
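As a rough illustration of the split described above, the two deadlines can be checked against different start times. This is only a sketch: the class, method, and parameter names below are hypothetical and do not come from the Singularity codebase, and the timestamps are passed in as plain values rather than read from a task object.

```java
import java.util.OptionalLong;

/** Illustrative only: none of these names come from the Singularity codebase. */
final class SeparateTimeoutsSketch {

  /** True when the task has not reached TASK_RUNNING within the startup budget. */
  static boolean isStartupOverdue(long launchedAtMillis, OptionalLong runningAtMillis,
                                  long nowMillis, long startupTimeoutSeconds) {
    if (runningAtMillis.isPresent()) {
      return false; // already running, so the startup clock no longer applies
    }
    return nowMillis - launchedAtMillis > startupTimeoutSeconds * 1000L;
  }

  /** True when a running task has not become healthy within the health check budget. */
  static boolean isHealthcheckOverdue(OptionalLong runningAtMillis, long nowMillis,
                                      long healthcheckTimeoutSeconds) {
    if (!runningAtMillis.isPresent()) {
      return false; // the health check clock has not started yet
    }
    return nowMillis - runningAtMillis.getAsLong() > healthcheckTimeoutSeconds * 1000L;
  }
}
```

With this shape, a task that takes a long time to reach the running state does not use up any of its health check budget, because the second clock only starts at the running timestamp.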

@darcatron darcatron requested a review from ssalinas Jun 6, 2017

...in/java/com/hubspot/singularity/scheduler/SingularityNewTaskChecker.java
private boolean isOverdue(SingularityTask task) {
final long taskDuration = System.currentTimeMillis() - task.getTaskId().getStartedAt();
private boolean isHealthcheckOverdue(SingularityTask task) {
final long healthcheckDuration = taskManager.getLastHealthcheck(task.getTaskId()).get().getDurationMillis().or(0L);

@darcatron

darcatron Jun 6, 2017

Contributor

I wasn't sure of the best route here, since we don't want to use the task's launch time for health checks. If we reach this logic, a health check exists, so we can rely on having that, but the duration millis can be absent. If the duration never gets updated, this could lead to a health check running forever.

@ssalinas

ssalinas Jun 8, 2017

Member

we should be able to use the time since the task entered the running state here

@darcatron

darcatron Jun 13, 2017

Contributor

Which time are you referring to? task.getTaskId().getStartedAt() looked like the launch time. I couldn't find anything showing that the time is updated once the task starts running, and I don't see any other times except system time.

@ssalinas

ssalinas Jun 13, 2017

Member

in task history updates, there will be an update with status of TASK_RUNNING that has an associated timestamp
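A minimal sketch of that lookup, assuming the history is available as a list of updates in chronological order. The `HistoryUpdate` class, the `State` enum, and `runningSince` below are stand-ins written for this example; they are not the Singularity types or methods.

```java
import java.util.List;
import java.util.OptionalLong;

/** Illustrative stand-ins; these are not the Singularity classes. */
final class RunningTimestampSketch {

  enum State { TASK_LAUNCHED, TASK_STAGING, TASK_RUNNING }

  /** Hypothetical history entry: a state plus the time it was recorded. */
  static final class HistoryUpdate {
    final State state;
    final long timestampMillis;

    HistoryUpdate(State state, long timestampMillis) {
      this.state = state;
      this.timestampMillis = timestampMillis;
    }
  }

  /** Returns the timestamp of the first TASK_RUNNING update in list order, if any. */
  static OptionalLong runningSince(List<HistoryUpdate> updates) {
    for (HistoryUpdate update : updates) {
      if (update.state == State.TASK_RUNNING) {
        return OptionalLong.of(update.timestampMillis);
      }
    }
    return OptionalLong.empty(); // the task has not reported TASK_RUNNING yet
  }
}
```

Returning an empty value when no TASK_RUNNING update exists leaves it to the caller to decide what to do in that case.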

...c/main/java/com/hubspot/singularity/config/SingularityConfiguration.java
@@ -162,6 +162,8 @@
private Optional<Integer> healthcheckMaxTotalTimeoutSeconds = Optional.absent();
private long killHealthcheckAfterDefaultSeconds = 600;

@ssalinas

ssalinas Jun 8, 2017

Member

killHealthcheck isn't quite accurate here. We are killing the task if it is not considered healthy by this hard timeout. Maybe more like killTaskIfNotHealthyAfterSeconds

@ssalinas ssalinas modified the milestone: 0.16.0 Jun 8, 2017

ssalinas Jun 12, 2017

Member

should be gtg to staging once the comments above are addressed 👍

}
return System.currentTimeMillis();
}

@darcatron

darcatron Jun 13, 2017

Contributor

I had it fall back to the current time if it doesn't find the state info. The health check could run over in this case, but we should be notified by one of those error messages.
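To make the trade-off of that fallback concrete, here is a small sketch with hypothetical names (`FallbackSketch`, `runningAtOrNow`, and the parameters are all made up for illustration). When the running timestamp is missing, the elapsed time is computed against "now", so it is always zero and the overdue check cannot fire.

```java
import java.util.OptionalLong;

/** Illustrative only: hypothetical names, not the merged Singularity code. */
final class FallbackSketch {

  /** Uses the recorded running timestamp, or the current time when it is missing. */
  static long runningAtOrNow(OptionalLong runningAtMillis, long nowMillis) {
    return runningAtMillis.orElse(nowMillis);
  }

  /** True when more than timeoutSeconds have passed since the start time chosen above. */
  static boolean isHealthcheckOverdue(OptionalLong runningAtMillis, long nowMillis,
                                      long timeoutSeconds) {
    long elapsedMillis = nowMillis - runningAtOrNow(runningAtMillis, nowMillis);
    return elapsedMillis > timeoutSeconds * 1000L;
  }
}
```

In this sketch a task with no recorded running state is never reported as overdue, which is why some separate signal, such as a logged error, is needed to surface that case.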

@ssalinas ssalinas merged commit 03109d2 into master Jun 28, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
continuous-integration/travis-ci/push The Travis CI build passed

@ssalinas ssalinas deleted the healthcheck-timeout branch Jun 28, 2017
