-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISLE: guard against stale generated source in default build config. #3534
ISLE: guard against stale generated source in default build config. #3534
Conversation
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me! One thing I might recommend though is to use this logic even if rebuild-isle
is specified, since that helps cut down on spurious rebuilds since we don't overwrite source files with their same contents. I'm imagining something like:
fn main() {
maybe_rebuild_isle();
}
fn maybe_rebuild_isle() {
let manifest = isle_manifest();
if !manifest.up_to_date() {
rebuild_isle(&manifest);
}
}
#[cfg(feature = "rebuild-isle")]
fn rebuild_isle(manifest: &Manifest) {
// do the actual build
}
#[cfg(not(feature = "rebuild-isle"))]
fn rebuild_isle(_manifest: &Manifest) {
panic!("ISLE needs a rebuild but `rebuild-isle` isn't specified");
}
Otherwise one of the downsides today is that if you rebuild ISLE it tricks Cargo into thinking that the files changed so everything is always rebuilt, causing somewhat unnecessary rebuilds.
Another thing I think we might need to address (which I suspect is why Windows is failing CI) is that line endings can change which can tamper with a source file's checksum. I think that's fine though we can do a replace("\r\n", "\n")
before we calculate the checksum of the file and that should do the trick.
a1b1e39
to
632761f
Compare
@alexcrichton That's an excellent idea -- I've refactored now so that it always computes the manifest, and then either rebuilds or errors out when a recompilation is required, depending on the |
Currently, the `build.rs` script that generates Rust source from the ISLE DSL will only do this generation if the `rebuild-isle` Cargo feature is specified. By default, it is not. This is based on the principle that we (the build script) do not modify the source tree as managed by git; git-managed files are strictly a human-managed and human-edited resource. By adding the opt-in Cargo feature, a developer is requesting the build script to perform an explicit action. (In my understanding at least, this principle comes from the general philosophy of hermetic builds: the output should be a pure function of the input, and part of this is that the input is read-only. If we modify the source tree, then all bets are off.) Unfortunately, requiring the opt-in feature also creates a footgun that is easy to hit: if a developer modifies the ISLE DSL source, but forgets to specify the Cargo feature, then the compiler will silently be built successfully with stale source, and will silently exclude any changes that were made. The generated source is checked into git for a good reason: we want DSL compiler to not affect build times for the overwhelmingly common case that Cranelift is used as a dependency but the backends are not being actively developed. (This overhead comes mainly from building `islec` itself.) So, what to do? This PR implements a middle ground first described in [this conversation](bytecodealliance#3506 (comment)), in which we: - Generate a hash (SHA-512) of the ISLE DSL source and produce a "manifest" of ISLE inputs alongside the generated source; and - Always read the ISLE DSL source, and see if the manifest is still valid, on builds that do not have the opt-in "rebuild" feature. This allows us to know whether the ISLE compiler output would have been the same (modulo changes to the DSL compiler itself, which are out-of-scope here), without actually building the ISLE compiler and running it. If the compiler-backend developer modifies an ISLE source file and then tries to build `cranelift-codegen` without adding the `rebuild-isle` Cargo feature, they get the following output: ```text Error: the ISLE source files that resulted in the generated Rust source * src/isa/x64/lower/isle/generated_code.rs have changed but the generated source was not rebuilt! These ISLE source files are: * src/clif.isle * src/prelude.isle * src/isa/x64/inst.isle * src/isa/x64/lower.isle Please add `--features rebuild-isle` to your `cargo build` command if you wish to rebuild the generated source, then include these changes in any git commits you make that include the changes to the ISLE. For example: $ cargo build -p cranelift-codegen --features rebuild-isle (This build script cannot do this for you by default because we cannot modify checked-into-git source without your explicit opt-in.) ``` which will tell them exactly what they need to do to fix the problem! Note that I am leaving the "Rebuild ISLE" CI job alone for now, because otherwise, we are trusting whomever submits a PR to generate the correct generated source. In other words, the manifest is a communication from the checked-in tree to the developer, but we still need to verify that the checked-in generated Rust source and the manifest are correct with respect to the checked-in ISLE source.
632761f
to
a2b9664
Compare
Just an interesting data point: some of the latest CI failures were because this PR needed a rebase, so the manifest hashes were out-of-date with respect to latest Also I had to disable the ISLE build logic via a special Cargo feature in the "Meta deterministic check"; as explained in the commit message, this runs the build script outside of the source tree, which worked previously because the default build options did not look for ISLE source, but now they do. The CI job is checking other output (unrelated to ISLE) though, so this seems fine. |
…inistic check". - The Windows line-ending canonicalization was incomplete: we need to canonicalize the manifest text itself too! - The "meta deterministic check" runs the cranelift-codegen build script N times outside of the source tree, examining what it produces to ensure the output is always the same (is detministic). This works fine when everything comes from the internal DSL, but when reading ISLE, this breaks because we no longer have the ISLE source paths. The initial ISLE integration did not hit this because without the `rebuild-isle` feature, it simply did nothing in the build script; now, with the manifest check, we hit the issue. The fix for now is just to turn off all ISLE-specific behavior in the build script by setting a special-purpose Cargo feature in the specific CI job. Eventually IMHO we should remove or find a better way to do this check.
d071191
to
6a4716f
Compare
writeln!() to a string on Windows to generate the expected manifest needs newline-character canonicalization, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too stoked about the stuff to deal with the deterministic check, but it's pretty minor and we can live with it. I don't think it's ever actually caught anything, so I'd be in favor of removing it myself or, as you mentioned, figuring out a better way to test it.
Cool, yeah, we can maybe think more about this at the same time that we think about whether we want to check in the meta-DSL-generated source too (which IIRC we've discussed as a way to reduce compile time). I'll leave it as-is for this PR and create a followup issue to track it. |
…ependencies where needed. Some use-cases are very sensitive to the number of crates pulled in, both for audit-related reasons and for build-time reasons. Our manifest-based ISLE-up-to-date approach in bytecodealliance#3534 was meant to help with this while still avoiding the footgun of a completely opt-in source rebuild: by including generated source in the checked-in tree, and just checking that source is up to date, we allow for fast builds without the whole ISLE compiler meta-step, but still catch attempted use of stale generated source (and allow the developer to opt-in to regenerating the in-tree source). Unfortunately this still requires the hash algorithm itself (`sha2`) which turns out to pull in a number of other small crates. In cases where we know the source won't be stale -- for example, depending on the `main` branch in git, or a published crate version of `cranelift-codegen` -- the checks are actually not needed at all. This PR thus introduces a feature `check-isle` that controls whether `build.rs` does the checks. It is on by default, so developer safety remains: if someone checks out the source, modifies some ISLE, and then does a `cargo build`, they will get an error that helps them with the proper steps to regenerate the source. But, dependencies that know what they are doing can turn it off with `default-features = false`. I've verified that we have the expected small dependency tree now: ``` $ cargo tree --depth 1 -p cranelift-codegen --no-default-features --features "std unwind all-arch" cranelift-codegen v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen) ├── cranelift-bforest v0.79.0 (/home/cfallin/work/wasmtime/cranelift/bforest) ├── cranelift-codegen-shared v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/shared) ├── cranelift-entity v0.79.0 (/home/cfallin/work/wasmtime/cranelift/entity) ├── gimli v0.26.1 ├── log v0.4.14 ├── regalloc v0.0.33 ├── smallvec v1.6.1 └── target-lexicon v0.12.0 [build-dependencies] ├── cranelift-codegen-meta v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/meta) └── sha2 v0.9.8 [dev-dependencies] └── criterion v0.3.5 ``` Fixes bytecodealliance#3609.
…ependencies where needed. Some use-cases are very sensitive to the number of crates pulled in, both for audit-related reasons and for build-time reasons. Our manifest-based ISLE-up-to-date approach in bytecodealliance#3534 was meant to help with this while still avoiding the footgun of a completely opt-in source rebuild: by including generated source in the checked-in tree, and just checking that source is up to date, we allow for fast builds without the whole ISLE compiler meta-step, but still catch attempted use of stale generated source (and allow the developer to opt-in to regenerating the in-tree source). Unfortunately this still requires the hash algorithm itself (`sha2`) which turns out to pull in a number of other small crates. In cases where we know the source won't be stale -- for example, depending on the `main` branch in git, or a published crate version of `cranelift-codegen` -- the checks are actually not needed at all. This PR thus introduces a feature `check-isle` that controls whether `build.rs` does the checks. It is on by default, so developer safety remains: if someone checks out the source, modifies some ISLE, and then does a `cargo build`, they will get an error that helps them with the proper steps to regenerate the source. But, dependencies that know what they are doing can turn it off with `default-features = false`. I've verified that we have the expected small dependency tree now: ``` $ cargo tree --depth 1 -p cranelift-codegen --no-default-features --features "std unwind all-arch" cranelift-codegen v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen) ├── cranelift-bforest v0.79.0 (/home/cfallin/work/wasmtime/cranelift/bforest) ├── cranelift-codegen-shared v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/shared) ├── cranelift-entity v0.79.0 (/home/cfallin/work/wasmtime/cranelift/entity) ├── gimli v0.26.1 ├── log v0.4.14 ├── regalloc v0.0.33 ├── smallvec v1.6.1 └── target-lexicon v0.12.0 [build-dependencies] └── cranelift-codegen-meta v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/meta) [dev-dependencies] └── criterion v0.3.5 ``` Fixes bytecodealliance#3609.
Currently, the
build.rs
script that generates Rust source from theISLE DSL will only do this generation if the
rebuild-isle
Cargofeature is specified. By default, it is not. This is based on the
principle that we (the build script) do not modify the source tree as
managed by git; git-managed files are strictly a human-managed and
human-edited resource. By adding the opt-in Cargo feature, a developer
is requesting the build script to perform an explicit action. (In my
understanding at least, this principle comes from the general philosophy
of hermetic builds: the output should be a pure function of the input,
and part of this is that the input is read-only. If we modify the source
tree, then all bets are off.)
Unfortunately, requiring the opt-in feature also creates a footgun that
is easy to hit: if a developer modifies the ISLE DSL source, but forgets
to specify the Cargo feature, then the compiler will silently be built
successfully with stale source, and will silently exclude any changes
that were made.
The generated source is checked into git for a good reason: we want DSL
compiler to not affect build times for the overwhelmingly common case
that Cranelift is used as a dependency but the backends are not being
actively developed. (This overhead comes mainly from building
islec
itself.)
So, what to do? This PR implements a middle ground first described in
this conversation, in which we:
"manifest" of ISLE inputs alongside the generated source; and
valid, on builds that do not have the opt-in "rebuild" feature.
This allows us to know whether the ISLE compiler output would have been
the same (modulo changes to the DSL compiler itself, which are
out-of-scope here), without actually building the ISLE compiler and
running it.
If the compiler-backend developer modifies an ISLE source file and then
tries to build
cranelift-codegen
without adding therebuild-isle
Cargo feature, they get the following output:
which will tell them exactly what they need to do to fix the problem!
Note that I am leaving the "Rebuild ISLE" CI job alone for now, because
otherwise, we are trusting whomever submits a PR to generate the correct
generated source. In other words, the manifest is a communication from
the checked-in tree to the developer, but we still need to verify that
the checked-in generated Rust source and the manifest are correct with
respect to the checked-in ISLE source.