separate timeouts for health check and task running #1560
private boolean isOverdue(SingularityTask task) {
  final long taskDuration = System.currentTimeMillis() - task.getTaskId().getStartedAt();
private boolean isHealthcheckOverdue(SingularityTask task) {
  final long healthcheckDuration = taskManager.getLastHealthcheck(task.getTaskId()).get().getDurationMillis().or(0L);
I wasn't sure of the best route here, since we don't want to measure health checks from the time the task was launched. If we reach this logic, a health check exists, so we can rely on having one, but its duration millis can be absent. If the duration never gets updated, a health check could end up running forever.
we should be able to use the time since the task entered the running state here
Which time are you referring to? task.getTaskId().getStartedAt() looked like launch time. I couldn't find anything that shows the time being updated once the task starts running, and I don't see any other times except for system time.
in task history updates, there will be an update with status of TASK_RUNNING that has an associated timestamp
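A rough sketch of that lookup, assuming the history updates can be read as a list of (state, timestamp) pairs in chronological order. `HistoryUpdate`, `RunningClock`, and `runningSince` are hypothetical names for illustration, not Singularity's API, and the sketch uses `java.util.OptionalLong` rather than the Guava `Optional` that appears in the diff:

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical stand-in for a task history update: a state name plus the time it was recorded.
final class HistoryUpdate {
  final String state;
  final long timestampMillis;

  HistoryUpdate(String state, long timestampMillis) {
    this.state = state;
    this.timestampMillis = timestampMillis;
  }
}

final class RunningClock {
  private RunningClock() {}

  // Timestamp of the first TASK_RUNNING update in list order,
  // or empty if the task has not reported running yet.
  static OptionalLong runningSince(List<HistoryUpdate> updates) {
    for (HistoryUpdate update : updates) {
      if ("TASK_RUNNING".equals(update.state)) {
        return OptionalLong.of(update.timestampMillis);
      }
    }
    return OptionalLong.empty();
  }
}
```

Returning an empty value instead of a default leaves it to the caller to decide what a missing TASK_RUNNING update should mean.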
private Optional<Integer> healthcheckMaxTotalTimeoutSeconds = Optional.absent();

private long killHealthcheckAfterDefaultSeconds = 600;
killHealthcheck isn't quite accurate here. We are killing the task if it is not considered healthy by this hard timeout. Maybe more like killTaskIfNotHealthyAfterSeconds
Should be good to go to staging once the comments above are addressed 👍
  }
  return System.currentTimeMillis();
}
I had it fall back to the current time if it doesn't find the state info. The task could run over in this case, but we should be notified by one of those error messages.
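To make the trade-off concrete: with this fallback, a missing TASK_RUNNING timestamp makes the elapsed time zero, so the timeout cannot fire on that pass. A minimal sketch with hypothetical names (`RunningFallback`, `elapsedSinceRunning`, an explicit `nowMillis` parameter), not the code in this PR:

```java
import java.util.OptionalLong;

final class RunningFallback {
  private RunningFallback() {}

  // Elapsed time since the task reported running.
  // Falls back to "now" when the timestamp is missing, which yields 0.
  static long elapsedSinceRunning(OptionalLong runningSinceMillis, long nowMillis) {
    long start = runningSinceMillis.orElse(nowMillis);
    return nowMillis - start;
  }

  // True only when a known running timestamp is older than the timeout.
  static boolean isOverdue(OptionalLong runningSinceMillis, long nowMillis, long timeoutMillis) {
    return elapsedSinceRunning(runningSinceMillis, nowMillis) > timeoutMillis;
  }
}
```

Passing the current time in as a parameter is only there to keep the sketch deterministic; the PR reads `System.currentTimeMillis()` directly.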
Looks like the max time for a task to get to TASK_RUNNING and pass health checks was set by killHealthcheckAfterDefaultSeconds. Since we were previously using System.currentTimeMillis() - task.getTaskId().getStartedAt(), we used the same clock for both parts. This separates the checks for the max time it can take to get a task running and the max time it can then take to pass health checks.
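The split can be pictured as two independent clocks. This is a sketch under stated assumptions rather than the code in this PR: `TaskTimeouts`, `isStartupOverdue`, and both constructor parameters are hypothetical names, and the timestamps are passed in explicitly so the example stays self-contained:

```java
import java.util.OptionalLong;
import java.util.concurrent.TimeUnit;

final class TaskTimeouts {
  private final long startupTimeoutMillis;
  private final long healthcheckTimeoutMillis;

  // Both limits are hypothetical configuration values, given in seconds.
  TaskTimeouts(long startupTimeoutSeconds, long healthcheckTimeoutSeconds) {
    this.startupTimeoutMillis = TimeUnit.SECONDS.toMillis(startupTimeoutSeconds);
    this.healthcheckTimeoutMillis = TimeUnit.SECONDS.toMillis(healthcheckTimeoutSeconds);
  }

  // Phase 1: launched but not yet running. Measured from launch time.
  boolean isStartupOverdue(long launchedAtMillis, OptionalLong runningSinceMillis, long nowMillis) {
    return !runningSinceMillis.isPresent()
        && nowMillis - launchedAtMillis > startupTimeoutMillis;
  }

  // Phase 2: running but not yet healthy. Measured from the TASK_RUNNING timestamp.
  boolean isHealthcheckOverdue(OptionalLong runningSinceMillis, long nowMillis) {
    return runningSinceMillis.isPresent()
        && nowMillis - runningSinceMillis.getAsLong() > healthcheckTimeoutMillis;
  }
}
```

With separate clocks, a task that is slow to reach TASK_RUNNING no longer eats into the time it is allowed for passing health checks.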