Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fault-isolate): prefer independent tasks and don't fail-fast #276

Merged
merged 2 commits into from
Jul 19, 2023

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Jul 17, 2023

feat(fault-isolate): add merge-tasks config and default it to false

This config specifies whether we should try to merge otherwise-parallelizable tasks onto the same machine, sacrificing latency and fault-isolation for more the sake of minor effeciency gains.

For example, if you build for x64 macos and arm64 macos, by default we will generate ci which builds those independently on separate logical machines. With this enabled we will build both of those platforms together on the same machine, making it take twice as long as any other build and making it impossible for only one of them to succeed.

The default is false. Before 0.1.0 it was always true and couldn't be changed, making releases annoyingly slow (and technically less fault-isolated). This config was added to allow you to restore the old behaviour, if you really want.


feat(fault-isolate): add fail-fast config and default it false

When building a release you might discover that an obscure platform's build is broken. When this happens you have two options: give up on the release entirely (fail-fast = true), or keep trying to build all the other platforms anyway (fail-fast = false).

cargo-dist was designed around the 'keep trying' approach, as we create a draft Release
and upload results to it over time, undrafting the release only if all tasks succeeded.
The idea is that even if a platform fails to build, you can decide that's acceptable
and manually undraft the release with some missing platforms.

(Note that the dist-manifest.json is produced before anything else, and so it will assume
that all tasks succeeded when listing out supported platforms/artifacts. This may make
you sad if you do this kind of undrafting and also trust the dist-manifest to be correct.)

Prior to 0.1.0 we didn't set the correct flags in our CI scripts to do this, but now we do.
This flag was introduced to allow you to restore the old behaviour if you prefer.

@Gankra
Copy link
Contributor Author

Gankra commented Jul 17, 2023

nb i haven't properly tested that i actually understand how fail-fast works in github, and the continue-on-error flag is not clearly documented so I don't know if it's actually required to get the behaviour i want: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#handling-failures

@Gankra
Copy link
Contributor Author

Gankra commented Jul 18, 2023

Test 1: Happy Path

✅ Normal releases still work fine
✅ Mac builds are concurrent/independent now


Test 2: Break arm64 macos

I intentionally sabotaged the build on aarch64-apple-darwin.

✅ Only arm64 macos failed
✅ All other tasks in the matrix succeeded
✅ The release was left as a draft with all other assets succesfully uploaded.
✅ The Release workflow was marked as an overall failure
✅ Workflows that were watching for Release success skipped themselves as status != success.
✅ I edited a mention of the arm64 macos build from the github release body, and undrafted it without any issues

⚠️ The arm64 macos build took the longest, so the result is inconclusive (it couldn't cancel anything else in the matrix).


Test 3: Break x64 linux

Changed test 2 to break the usually-fastest build: x64 linux.

✅ Same results as before, but this time linux finished (failed) first, so we can be confident everything worked properly.

@@ -927,12 +973,18 @@ impl<'pkg_graph> DistGraphBuilder<'pkg_graph> {
if package_config.cargo_dist_version.is_some() {
warn!("package.metadata.dist.cargo-dist-version is set, but this is only accepted in workspace.metadata (value is being ignored): {}", package.manifest_path);
}
if package_config.cargo_dist_version.is_some() {
if package_config.rust_toolchain_version.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

This config specifies whether we should try to merge otherwise-parallelizable tasks onto the same machine, sacrificing latency and fault-isolation for more the sake of minor effeciency gains.

For example, if you build for x64 macos and arm64 macos, by default we will generate ci which builds those independently on separate logical machines. With this enabled we will build both of those platforms together on the same machine, making it take twice as long as any other build and making it impossible for only one of them to succeed.

The default is alse. Before 0.1.0 it was always 	rue and couldn't be changed, making releases annoyingly slow (and technically less fault-isolated). This config was added to allow you to restore the old behaviour, if you really want.
When building a release you might discover that an obscure platform's build is broken. When this happens you have two options: give up on the release entirely (ail-fast = true), or keep trying to build all the other platforms anyway (ail-fast = false).

cargo-dist was designed around the 'keep trying' approach, as we create a draft Release
and upload results to it over time, undrafting the release only if all tasks succeeded.
The idea is that even if a platform fails to build, you can decide that's acceptable
and manually undraft the release with some missing platforms.

(Note that the dist-manifest.json is produced before anything else, and so it will assume
that all tasks succeeded when listing out supported platforms/artifacts. This may make
you sad if you do this kind of undrafting and also trust the dist-manifest to be correct.)

Prior to 0.1.0 we didn't set the correct flags in our CI scripts to do this, but now we do.
This flag was introduced to allow you to restore the old behaviour if you prefer.
@Gankra Gankra merged commit 290112d into main Jul 19, 2023
@Gankra Gankra deleted the mac-demultiplex branch July 19, 2023 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants