update mesos-actor; cleanup orphaned failed task launches #4109

Merged: 2 commits into apache:master on Nov 13, 2018

Conversation

tysonnorris
Contributor

Description

When a task launch times out, clean up the tasks that were sent to Mesos (or not yet sent).
Also update mesos-actor, which includes a fix for accepting resources that are not assigned to the framework's role.

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

MetricEmitter.emitCounterMetric(LoggingMarkers.INVOKER_MESOS_CMD_TIMEOUT(MesosTask.KILL_CMD))
case Failure(t) => transid.failed(this, start, s"task destroy failed ${t.getMessage}", ErrorLevel)
}
.map(_ => {})(ec)
Member

ec is implicitly available so we can drop passing it explicitly.
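To illustrate the point about the implicit context, here is a minimal sketch, not the code from this PR. `ImplicitEcSketch` and `killTask` are hypothetical stand-ins for the real Mesos client call; the only assumption is that an implicit `ExecutionContext` is in scope, as the comment above says.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ImplicitEcSketch {
  // Hypothetical stand-in for the real kill request; not the mesos-actor API.
  def killTask(taskId: String)(implicit ec: ExecutionContext): Future[String] =
    Future(s"killed $taskId")

  // With an implicit ExecutionContext in scope, map resolves it automatically,
  // so `.map(_ => {})(ec)` can be written without the explicit `(ec)`.
  def destroy(taskId: String)(implicit ec: ExecutionContext): Future[Unit] =
    killTask(taskId).map(_ => ())
}
```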

destroy(mesosClientActor, mesosConfig, taskId)
case Failure(t) =>
//kill the task whose launch timed out
destroy(mesosClientActor, mesosConfig, taskId)
Member

Note that the future returned by destroy inside the andThen call is discarded, so it's more like ask-and-forget here. Is that the intention, or would it be better to have destroy completed by the time create's resulting future completes?

Contributor Author

In this case destroy is just Mesos-specific cleanup, so it should happen independently of the create future; i.e. a create() timeout should return immediately but still trigger the destroy() cleanup.
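The two behaviours discussed in this thread can be contrasted in a short sketch. This is an illustration under assumptions, not the PR's implementation: `CleanupOnFailureSketch`, `createDetached`, `createAwaitingCleanup`, `launch`, and `destroy` are hypothetical names, and `transformWith` assumes Scala 2.12 or later.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Failure

object CleanupOnFailureSketch {
  // Fire-and-forget: andThen runs the callback for its side effect only.
  // The Future returned by destroy() is dropped, so the result completes
  // as soon as `launch` does while cleanup continues in the background.
  def createDetached[T](launch: Future[T], destroy: () => Future[Unit])(
    implicit ec: ExecutionContext): Future[T] =
    launch.andThen {
      case Failure(_) => destroy()
    }

  // Alternative raised in review: wait for cleanup to finish (successfully
  // or not) before surfacing the original launch failure.
  def createAwaitingCleanup[T](launch: Future[T], destroy: () => Future[Unit])(
    implicit ec: ExecutionContext): Future[T] =
    launch.recoverWith {
      case t => destroy().transformWith(_ => Future.failed[T](t))
    }
}
```

With the first variant a caller sees the timeout without waiting on Mesos, which matches the intent stated above; the trade-off is that a failure inside the cleanup is not visible to the caller and has to be logged or counted by the cleanup itself.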

Member

@chetanmeh chetanmeh left a comment

LGTM

@chetanmeh chetanmeh merged commit 775b757 into apache:master Nov 13, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
update mesos-actor and cleanup orphaned failed task launches

* review feedback