Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: sha2 dependency introduces many new indirect dependencies #3609

Closed
bjorn3 opened this issue Dec 16, 2021 · 4 comments · Fixed by #3619
Closed

Cranelift: sha2 dependency introduces many new indirect dependencies #3609

bjorn3 opened this issue Dec 16, 2021 · 4 comments · Fixed by #3619

Comments

@bjorn3
Copy link
Contributor

bjorn3 commented Dec 16, 2021

   Compiling version_check v0.9.3
   Compiling typenum v1.14.0
   Compiling opaque-debug v0.3.0
   Compiling cpufeatures v0.2.1
   Compiling generic-array v0.14.4
   Compiling block-buffer v0.9.0
   Compiling digest v0.9.0
   Compiling sha2 v0.9.8

This regresses compilation time by a couple of seconds and requires me to add them to the list of permitted dependencies in rustc at https://github.com/rust-lang/rust/blob/f8402169aaa12e7bbb9630796a8caec90a3055ca/src/tools/tidy/src/deps.rs#L236

@sunfishcode
Copy link
Member

sha2 and its dependencies already appear in the PERMITTED_DEPENDENCIES list. Is it valuable for cranelift to avoid dependencies that rustc itself already depends on?

@bjorn3
Copy link
Contributor Author

bjorn3 commented Dec 17, 2021

I'm trying to keep the dependency list as small as reasonably possible.

@cfallin
Copy link
Member

cfallin commented Dec 17, 2021

That's definitely a good first-order principle and I share that instinct too -- but in this case I think it's worthwhile to see the context: computing a hash in the build script and checking against the manifest lets us avoid building the whole ISLE compiler, so it's replacing an even larger dependency tree.

The best option w.r.t. dependency-set size is to build without any up-to-date checks at all, but that's a huge developer-friction issue if it is the default when hacking away in a local checkout. (Forget rebuild-isle, spend time debugging mysterious issue, discover source was never updated, etc.)

One option that may be reasonable is to put the manifest check under an on-by-default feature (say check-isle-up-to-date) so that we have safety / least-surprising behavior by default, but dependents that care strongly about minimal dependencies can specify default-features = false and then turn back on the features they need. The crate as uploaded to crates.io or on main in the git repo should always have up-to-date generated source, so skipping the checks (and thus sha512-hashing) is safe. Thoughts?

@bjorn3
Copy link
Contributor Author

bjorn3 commented Dec 17, 2021

One option that may be reasonable is to put the manifest check under an on-by-default feature (say check-isle-up-to-date)

That sounds great!

cfallin added a commit to cfallin/wasmtime that referenced this issue Dec 17, 2021
…ependencies where needed.

Some use-cases are very sensitive to the number of crates pulled in,
both for audit-related reasons and for build-time reasons.

Our manifest-based ISLE-up-to-date approach in bytecodealliance#3534 was meant to help
with this while still avoiding the footgun of a completely opt-in source
rebuild: by including generated source in the checked-in tree, and just
checking that source is up to date, we allow for fast builds without the
whole ISLE compiler meta-step, but still catch attempted use of stale
generated source (and allow the developer to opt-in to regenerating the
in-tree source).

Unfortunately this still requires the hash algorithm itself (`sha2`)
which turns out to pull in a number of other small crates. In cases
where we know the source won't be stale -- for example, depending on the
`main` branch in git, or a published crate version of
`cranelift-codegen` -- the checks are actually not needed at all.

This PR thus introduces a feature `check-isle` that controls whether
`build.rs` does the checks. It is on by default, so developer safety
remains: if someone checks out the source, modifies some ISLE, and then
does a `cargo build`, they will get an error that helps them with the
proper steps to regenerate the source. But, dependencies that know what
they are doing can turn it off with `default-features = false`.

I've verified that we have the expected small dependency tree now:

```
$ cargo tree --depth 1 -p cranelift-codegen --no-default-features --features "std unwind all-arch"
cranelift-codegen v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen)
├── cranelift-bforest v0.79.0 (/home/cfallin/work/wasmtime/cranelift/bforest)
├── cranelift-codegen-shared v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/shared)
├── cranelift-entity v0.79.0 (/home/cfallin/work/wasmtime/cranelift/entity)
├── gimli v0.26.1
├── log v0.4.14
├── regalloc v0.0.33
├── smallvec v1.6.1
└── target-lexicon v0.12.0
[build-dependencies]
├── cranelift-codegen-meta v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/meta)
└── sha2 v0.9.8
[dev-dependencies]
└── criterion v0.3.5
```

Fixes bytecodealliance#3609.
cfallin added a commit to cfallin/wasmtime that referenced this issue Dec 17, 2021
…ependencies where needed.

Some use-cases are very sensitive to the number of crates pulled in,
both for audit-related reasons and for build-time reasons.

Our manifest-based ISLE-up-to-date approach in bytecodealliance#3534 was meant to help
with this while still avoiding the footgun of a completely opt-in source
rebuild: by including generated source in the checked-in tree, and just
checking that source is up to date, we allow for fast builds without the
whole ISLE compiler meta-step, but still catch attempted use of stale
generated source (and allow the developer to opt-in to regenerating the
in-tree source).

Unfortunately this still requires the hash algorithm itself (`sha2`)
which turns out to pull in a number of other small crates. In cases
where we know the source won't be stale -- for example, depending on the
`main` branch in git, or a published crate version of
`cranelift-codegen` -- the checks are actually not needed at all.

This PR thus introduces a feature `check-isle` that controls whether
`build.rs` does the checks. It is on by default, so developer safety
remains: if someone checks out the source, modifies some ISLE, and then
does a `cargo build`, they will get an error that helps them with the
proper steps to regenerate the source. But, dependencies that know what
they are doing can turn it off with `default-features = false`.

I've verified that we have the expected small dependency tree now:

```
$ cargo tree --depth 1 -p cranelift-codegen --no-default-features --features "std unwind all-arch"
cranelift-codegen v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen)
├── cranelift-bforest v0.79.0 (/home/cfallin/work/wasmtime/cranelift/bforest)
├── cranelift-codegen-shared v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/shared)
├── cranelift-entity v0.79.0 (/home/cfallin/work/wasmtime/cranelift/entity)
├── gimli v0.26.1
├── log v0.4.14
├── regalloc v0.0.33
├── smallvec v1.6.1
└── target-lexicon v0.12.0
[build-dependencies]
└── cranelift-codegen-meta v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/meta)
[dev-dependencies]
└── criterion v0.3.5
```

Fixes bytecodealliance#3609.
cfallin added a commit to cfallin/wasmtime that referenced this issue Dec 17, 2021
Fixes bytecodealliance#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.

One option is to remove the hash check under certain circumstances, as
implemented in bytecodealliance#3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).

This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
cfallin added a commit to cfallin/wasmtime that referenced this issue Dec 17, 2021
Fixes bytecodealliance#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.

One option is to remove the hash check under certain circumstances, as
implemented in bytecodealliance#3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).

This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
cfallin added a commit to cfallin/wasmtime that referenced this issue Dec 17, 2021
Fixes bytecodealliance#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.

One option is to remove the hash check under certain circumstances, as
implemented in bytecodealliance#3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).

This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants