-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fault-isolate): prefer independent tasks and don't fail-fast #276
Conversation
nb i haven't properly tested that i actually understand how fail-fast works in github, and the continue-on-error flag is not clearly documented so I don't know if it's actually required to get the behaviour i want: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#handling-failures |
Test 1: Happy Path✅ Normal releases still work fine
Test 2: Break arm64 macosI intentionally sabotaged the build on aarch64-apple-darwin. ✅ Only arm64 macos failed
Test 3: Break x64 linuxChanged test 2 to break the usually-fastest build: x64 linux. ✅ Same results as before, but this time linux finished (failed) first, so we can be confident everything worked properly. |
@@ -927,12 +973,18 @@ impl<'pkg_graph> DistGraphBuilder<'pkg_graph> { | |||
if package_config.cargo_dist_version.is_some() { | |||
warn!("package.metadata.dist.cargo-dist-version is set, but this is only accepted in workspace.metadata (value is being ignored): {}", package.manifest_path); | |||
} | |||
if package_config.cargo_dist_version.is_some() { | |||
if package_config.rust_toolchain_version.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
This config specifies whether we should try to merge otherwise-parallelizable tasks onto the same machine, sacrificing latency and fault-isolation for more the sake of minor effeciency gains. For example, if you build for x64 macos and arm64 macos, by default we will generate ci which builds those independently on separate logical machines. With this enabled we will build both of those platforms together on the same machine, making it take twice as long as any other build and making it impossible for only one of them to succeed. The default is alse. Before 0.1.0 it was always rue and couldn't be changed, making releases annoyingly slow (and technically less fault-isolated). This config was added to allow you to restore the old behaviour, if you really want.
When building a release you might discover that an obscure platform's build is broken. When this happens you have two options: give up on the release entirely (ail-fast = true), or keep trying to build all the other platforms anyway (ail-fast = false). cargo-dist was designed around the 'keep trying' approach, as we create a draft Release and upload results to it over time, undrafting the release only if all tasks succeeded. The idea is that even if a platform fails to build, you can decide that's acceptable and manually undraft the release with some missing platforms. (Note that the dist-manifest.json is produced before anything else, and so it will assume that all tasks succeeded when listing out supported platforms/artifacts. This may make you sad if you do this kind of undrafting and also trust the dist-manifest to be correct.) Prior to 0.1.0 we didn't set the correct flags in our CI scripts to do this, but now we do. This flag was introduced to allow you to restore the old behaviour if you prefer.
feat(fault-isolate): add merge-tasks config and default it to false
This config specifies whether we should try to merge otherwise-parallelizable tasks onto the same machine, sacrificing latency and fault-isolation for more the sake of minor effeciency gains.
For example, if you build for x64 macos and arm64 macos, by default we will generate ci which builds those independently on separate logical machines. With this enabled we will build both of those platforms together on the same machine, making it take twice as long as any other build and making it impossible for only one of them to succeed.
The default is false. Before 0.1.0 it was always true and couldn't be changed, making releases annoyingly slow (and technically less fault-isolated). This config was added to allow you to restore the old behaviour, if you really want.
feat(fault-isolate): add fail-fast config and default it false
When building a release you might discover that an obscure platform's build is broken. When this happens you have two options: give up on the release entirely (fail-fast = true), or keep trying to build all the other platforms anyway (fail-fast = false).
cargo-dist was designed around the 'keep trying' approach, as we create a draft Release
and upload results to it over time, undrafting the release only if all tasks succeeded.
The idea is that even if a platform fails to build, you can decide that's acceptable
and manually undraft the release with some missing platforms.
(Note that the dist-manifest.json is produced before anything else, and so it will assume
that all tasks succeeded when listing out supported platforms/artifacts. This may make
you sad if you do this kind of undrafting and also trust the dist-manifest to be correct.)
Prior to 0.1.0 we didn't set the correct flags in our CI scripts to do this, but now we do.
This flag was introduced to allow you to restore the old behaviour if you prefer.