Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the dynamic spawn strategy is broken for remote execution #8646

Closed
buchgr opened this issue Jun 17, 2019 · 1 comment

Comments

Projects
None yet
2 participants
@buchgr
Copy link
Contributor

commented Jun 17, 2019

We have received several bug reports about the dynamic spawn strategy not working properly with remote execution / caching. I took some time to look through the code to better understand the issue. I believe to now understand the heart of the issue.

The dynamic spawn strategy in a single line:

SpawnResult result =
ExecutorService.invokeAny(localExecutionCallable, remoteExecutionCallable)`

This line takes the result of the first callable to finish and calls Future.cancel(mayInterruptIfRunning=true) on the slower one. The bug is that the Callable's have side effects in that both write their outputs to the file system and incidentally also to the same files.

This has two issues:

  1. The effective output files of a single action can end up being a mix of both local and remote execution. This would mostly go unnoticed (esp. if the actions are reproducible) except when it doesn't then it would lead to hard to debug correctness bugs.
  2. The output files Bazel sees can be corrupted. This can be the case if both the remote strategy and local execution strategy write to the same file concurrently.

I believe 2.) is what users are most likely to see.

@buchgr

This comment has been minimized.

Copy link
Contributor Author

commented Jun 17, 2019

The fix is to call SpawnExecutionContext.lockOutputfiles() before starting downloading of outputs.

buchgr added a commit to buchgr/bazel that referenced this issue Jun 17, 2019

remote: make the dynamic spawn scheduler work. Fixes bazelbuild#8646
This change fixes the correctness issue of dynamic spawn scheduler when
being used with remote execution. See bazelbuild#8646 for more details.

buchgr added a commit to buchgr/bazel that referenced this issue Jun 17, 2019

remote: make the dynamic spawn scheduler work. Fixes bazelbuild#8646
This change fixes the correctness issue of dynamic spawn scheduler when
being used with remote execution. See bazelbuild#8646 for more details.

There's a performance issue remaining: bazelbuild#8467

buchgr added a commit to buchgr/bazel that referenced this issue Jun 18, 2019

remote: make the dynamic spawn scheduler work. Fixes bazelbuild#8646
This change fixes the correctness issue of dynamic spawn scheduler when
being used with remote execution. See bazelbuild#8646 for more details.

There's a performance issue remaining: bazelbuild#8467

@bazel-io bazel-io closed this in d75b6cf Jun 19, 2019

siberex added a commit to siberex/bazel that referenced this issue Jul 4, 2019

remote: make the dynamic spawn scheduler work. Fixes bazelbuild#8646
This change fixes the correctness issue of dynamic spawn scheduler when being used with remote execution. See bazelbuild#8646 for more details.

There's a performance issue remaining: bazelbuild#8647

Closes bazelbuild#8648.

PiperOrigin-RevId: 253998300

irengrig added a commit to irengrig/bazel that referenced this issue Jul 15, 2019

remote: make the dynamic spawn scheduler work. Fixes bazelbuild#8646
This change fixes the correctness issue of dynamic spawn scheduler when being used with remote execution. See bazelbuild#8646 for more details.

There's a performance issue remaining: bazelbuild#8647

Closes bazelbuild#8648.

PiperOrigin-RevId: 253998300
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.