update mesos-actor; cleanup orphaned failed task launches #4109
MetricEmitter.emitCounterMetric(LoggingMarkers.INVOKER_MESOS_CMD_TIMEOUT(MesosTask.KILL_CMD))
case Failure(t) => transid.failed(this, start, s"task destroy failed ${t.getMessage}", ErrorLevel)
}
.map(_ => {})(ec)
ec is implicitly available, so we can drop passing it explicitly.
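To illustrate the suggestion, here is a minimal sketch of the two call styles. `ImplicitEcSketch` and `destroyTask` are hypothetical stand-ins, not the code in this PR; the only assumption carried over is that an implicit `ExecutionContext` is in scope at the call site.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ImplicitEcSketch {
  // Hypothetical stand-in for the real destroy call; returns a completed-later Future.
  def destroyTask(id: String)(implicit ec: ExecutionContext): Future[String] =
    Future(s"killed $id")

  // Passing the execution context explicitly, as in the reviewed line.
  def explicitStyle(id: String)(implicit ec: ExecutionContext): Future[Unit] =
    destroyTask(id).map(_ => ())(ec)

  // Letting the compiler resolve the same implicit; behavior is identical.
  def implicitStyle(id: String)(implicit ec: ExecutionContext): Future[Unit] =
    destroyTask(id).map(_ => ())
}
```

Both methods run `map` on the same execution context, so dropping the explicit argument only removes noise.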
destroy(mesosClientActor, mesosConfig, taskId)
case Failure(t) =>
//kill the task whose launch timed out
destroy(mesosClientActor, mesosConfig, taskId)
Note that the future returned by destroy inside the andThen call is discarded, so this is more like ask-and-forget. Is that the intention, or would it be better to have destroy completed by the time the future returned by create completes?
In this case destroy is just Mesos-specific cleanup, so it should happen independently of the create future; i.e. a create() timeout should return immediately, but still trigger the cleanup in destroy().
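A minimal sketch of the fire-and-forget behavior discussed above, assuming standard `scala.concurrent.Future` semantics. `TaskLauncherSketch`, its `create` and `destroy` methods, and the `cleanupDone` promise are hypothetical stand-ins for illustration only, not the PR's implementation.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Failure

final class TaskLauncherSketch {
  // Completed when the hypothetical cleanup finishes, so it can be observed separately.
  val cleanupDone: Promise[String] = Promise[String]()

  // Hypothetical stand-in for destroy(): slow, backend-specific cleanup.
  def destroy(taskId: String)(implicit ec: ExecutionContext): Future[Unit] = Future {
    Thread.sleep(200)
    cleanupDone.trySuccess(taskId)
    ()
  }

  // Hypothetical stand-in for create(): the launch always times out in this sketch.
  def create(taskId: String)(implicit ec: ExecutionContext): Future[String] =
    Future
      .failed[String](new TimeoutException(s"launch of $taskId timed out"))
      .andThen {
        // andThen only runs this side effect; the Future returned by destroy
        // is discarded, so the result of create does not wait for the cleanup.
        case Failure(_) => destroy(taskId)
      }
}
```

The caller sees the original timeout failure as soon as the side effect has been started, while the cleanup completes on its own. If the cleanup had to finish first, `recoverWith` or `transformWith` could chain on the `destroy` future instead, at the cost of delaying the failure.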
LGTM
update mesos-actor and cleanup orphaned failed task launches * review feedback
Description
When a task launch times out, clean up the tasks that were sent to Mesos (or not yet sent).
Also updates mesos-actor, which includes a fix for accepting resources that are not assigned to the framework's role.