Gracefully handle errors during task launching #230

Merged
merged 2 commits into from Jul 11, 2018

3 participants
@alenkacz
Collaborator

commented Jul 11, 2018

See the JIRA description for full details.

JIRA issues: DCOS-39102

@alenkacz alenkacz requested review from kensipe and meichstedt Jul 11, 2018

log.info("addTaskToLaunchQueue")
import dcos.metronome.utils.glue.MarathonImplicits._
launchQueue.add(jobRun.toRunSpec, count = 1)
if (existsInLaunchQueue()) {

@meichstedt

meichstedt Jul 11, 2018

Collaborator

IIUC, the issue was that the initial store timed out (b/c ZK was slow?), the actor was restarted and tried to add to the launchQueue again, and that failed because the initial store had actually persisted the node.

Could you add a comment explaining why this could be the case? Not this particular scenario, but something like: if the actor was restarted due to an exception, we have to check whether the jobRun was already added to the launchQueue.
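
To illustrate the kind of guard and comment being asked for, here is a minimal, self-contained sketch. It is not the code from this PR: `InMemoryLaunchQueue`, `contains`, and the `runId` string key are hypothetical stand-ins for the real launch queue and `jobRun`, chosen only so the example runs on its own.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the real launch queue (not Marathon's API).
// It rejects duplicates, mimicking the failure seen on the second add.
final class InMemoryLaunchQueue {
  private val queued = mutable.Set.empty[String]

  def contains(runId: String): Boolean = queued.contains(runId)

  def add(runId: String): Unit = {
    // mutable.Set#add returns false when the element was already present.
    if (!queued.add(runId))
      throw new IllegalStateException(s"run $runId is already queued")
  }

  def size: Int = queued.size
}

object IdempotentLaunch {
  // Returns true if the run was newly queued, false if it was already there.
  def addTaskToLaunchQueue(queue: InMemoryLaunchQueue, runId: String): Boolean =
    if (queue.contains(runId)) {
      // If the actor was restarted due to an exception (for example a store
      // call that timed out but still persisted), the run may have been
      // queued before the restart, so adding it again would fail.
      false
    } else {
      queue.add(runId)
      true
    }
}
```

Calling `addTaskToLaunchQueue` twice with the same id queues the run once and returns `false` the second time instead of throwing, which is the behaviour a restarted actor would need.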

@kensipe

Member

commented Jul 11, 2018

This solution should be backported to releases/0.3 and forward-ported to master (0.5).

@kensipe kensipe merged commit 878b018 into releases/0.4 Jul 11, 2018

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good

kensipe added a commit that referenced this pull request Jul 11, 2018

Gracefully handle errors during task launching (#230)
* Gracefully handle errors during task launching

* Add explaining comments

kensipe added a commit that referenced this pull request Jul 11, 2018

Gracefully handle errors during task launching (#230)
* Gracefully handle errors during task launching

* Add explaining comments

alenkacz added a commit that referenced this pull request Jul 12, 2018

kensipe added a commit that referenced this pull request Jul 16, 2018

Gracefully handle errors during task launching (#230)
* Gracefully handle errors during task launching

* Add explaining comments

kensipe added a commit that referenced this pull request Jul 16, 2018

Forward ports from releases 0.4 (#236)
* Allow Docker Params for Job Runs (#225)

* starting params priv ability on docker
Pass the docker params to Marathon when launching a task

* added examples and documentation

* Edit the docs based on Ivan's comments

* Gracefully handle errors during task launching (#230)

* Gracefully handle errors during task launching

* Add explaining comments

* doing what the log error message says... giving up (#234)

@kensipe kensipe deleted the av/launchFailure branch Aug 14, 2018
