Improve robustness of cache loading/storing #974

alexcrichton · 2020-02-24T21:48:10Z

Today wasmtime incorrectly loads compiled compiled modules from the
global cache when toggling settings such as optimizations. For example
if you execute wasmtime foo.wasm that will cache globally an
unoptimized version of the wasm module. If you then execute wasmtime -O foo.wasm it would then reload the unoptimized version from cache, not
realizing the compilation settings were different, and use that instead.
This can lead to very surprising behavior naturally!

This commit updates how the cache is managed in an attempt to make it
much more robust against these sorts of issues. This takes a leaf out of
rustc's playbook and models the cache with a function that looks like:

fn load<T: Hash>(
    &self,
    data: T,
    compute: fn(T) -> CacheEntry,
) -> CacheEntry;

The goal here is that it guarantees that all the data necessary to
compute the result of the cache entry is hashable and stored into the
hash key entry. This was previously open-coded and manually managed
where items were hashed explicitly, but this construction guarantees
that everything reasonable compute could use to compile the module is
stored in data, which is itself hashable.

This refactoring then resulted in a few workarounds and a few fixes,
including the original issue:

The Module type was split into Module and ModuleLocal where only
the latter is hashed. The previous hash function for a Module left
out items like the start_func and didn't hash items like the imports
of the module. Omitting the start_func was fine since compilation
didn't actually use it, but omitting imports seemed uncomfortable
because while compilation didn't use the import values it did use the
number of imports, which seems like it should then be put into the
cache key. The ModuleLocal type now derives Hash to guarantee that
all of its contents affect the hash key.
The ModuleTranslationState from cranelift-wasm doesn't implement
Hash which means that we have a manual wrapper to work around that.
This will be fixed with an upstream implementation, since this state
affects the generated wasm code. Currently this is just a map of
signatures, which is present in Module anyway, so we should be good
for the time being.
Hashing dyn TargetIsa was also added, where previously it was not
fully hashed. Previously only the target name was used as part of the
cache key, but crucially the flags of compilation were omitted (for
example the optimization flags). Unfortunately the trait object itself
is not hashable so we still have to manually write a wrapper to hash
it, but we likely want to add upstream some utilities to hash isa
objects into cranelift itself. For now though we can continue to add
hashed fields as necessary.

Overall the goal here was to use the compiler to expose what we're not
hashing, and then make sure we organize data and write the right code to
ensure everything is hashed, and nothing more.

Today wasmtime incorrectly loads compiled compiled modules from the global cache when toggling settings such as optimizations. For example if you execute `wasmtime foo.wasm` that will cache globally an unoptimized version of the wasm module. If you then execute `wasmtime -O foo.wasm` it would then reload the unoptimized version from cache, not realizing the compilation settings were different, and use that instead. This can lead to very surprising behavior naturally! This commit updates how the cache is managed in an attempt to make it much more robust against these sorts of issues. This takes a leaf out of rustc's playbook and models the cache with a function that looks like: fn load<T: Hash>( &self, data: T, compute: fn(T) -> CacheEntry, ) -> CacheEntry; The goal here is that it guarantees that all the `data` necessary to `compute` the result of the cache entry is hashable and stored into the hash key entry. This was previously open-coded and manually managed where items were hashed explicitly, but this construction guarantees that everything reasonable `compute` could use to compile the module is stored in `data`, which is itself hashable. This refactoring then resulted in a few workarounds and a few fixes, including the original issue: * The `Module` type was split into `Module` and `ModuleLocal` where only the latter is hashed. The previous hash function for a `Module` left out items like the `start_func` and didn't hash items like the imports of the module. Omitting the `start_func` was fine since compilation didn't actually use it, but omitting imports seemed uncomfortable because while compilation didn't use the import values it did use the *number* of imports, which seems like it should then be put into the cache key. The `ModuleLocal` type now derives `Hash` to guarantee that all of its contents affect the hash key. * The `ModuleTranslationState` from `cranelift-wasm` doesn't implement `Hash` which means that we have a manual wrapper to work around that. This will be fixed with an upstream implementation, since this state affects the generated wasm code. Currently this is just a map of signatures, which is present in `Module` anyway, so we should be good for the time being. * Hashing `dyn TargetIsa` was also added, where previously it was not fully hashed. Previously only the target name was used as part of the cache key, but crucially the flags of compilation were omitted (for example the optimization flags). Unfortunately the trait object itself is not hashable so we still have to manually write a wrapper to hash it, but we likely want to add upstream some utilities to hash isa objects into cranelift itself. For now though we can continue to add hashed fields as necessary. Overall the goal here was to use the compiler to expose what we're not hashing, and then make sure we organize data and write the right code to ensure everything is hashed, and nothing more.

alexcrichton · 2020-02-24T21:48:39Z

Note that most of the diff here is by splitting Module into Module and ModuleLocal, but I'd be interested to see if others have other ideas about how to handle this and/or bikeshedding!

crates/environ/src/module.rs

Co-Authored-By: Peter Huene <peterhuene@protonmail.com>

peterhuene

👍 I think this approach makes sense and I don't have an alternative to propose. Curious what others think, though.

sunfishcode · 2020-02-24T22:38:30Z

This is a good catch, and a good refactoring! Splitting ModuleLocal out of Module is nicer than naming the individual fields to be cached.

Compilation looks at the number of imports in order to translate from module index spaces into definition index spaces, but the output shouldn't ever depend on the absolute number of imports. In other words, adding imports to a module that aren't used anywhere shouldn't change the compilation output, even though it shifts all the definitions up in the index space.

One reason we didn't consider the target flags in the cache was that we were thinking about the JIT-for-the-host-target case, but you're right, it's best to include these flags, as the user may customize the flags, or they may share a cache directory across hosts.

alexcrichton · 2020-02-24T22:57:26Z

@sunfishcode yeah I wasn't able to think of a module which would break because we didn't account for imports changing, but even just the number of imports affects quite a lot of decisions made during translation, so I was basically not confident enough in the caching that I think we should restructure the way this is where it has explicit Hash implementations for everything.

I think you're probably still right in that so much about a module changes when everything is renumbered, but is there a use case for not hashing anything about the imports? I'm not sure when you'd get cache hits between the two, especially when the raw bytes of a function body are hashed.

tschneidereit · 2020-02-25T12:56:03Z

@alexcrichton does this contain a regression test to ensure that we don't regress the caching behavior again? The changes to the cache test file don't seem to add any coverage, but maybe I'm mistaken. In any case, having a regression test here seems valuable :)

This reverts commit 2b53fee.

alexcrichton · 2020-02-25T23:13:42Z

This doesn't currently include a test, no, I'll have to see if I can figure out a way to add a test for that.

alexcrichton · 2020-02-26T17:17:32Z

Ok I've added a test for this behavior as well now.

@sunfishcode did you want to take a look again before merging?

sunfishcode · 2020-02-26T22:17:32Z

Looks good. I agree that including the import counts in the hash is unlikely to significantly reduce cache efficiency in practice.

peterhuene reviewed Feb 24, 2020

View reviewed changes

crates/environ/src/module.rs Outdated Show resolved Hide resolved

alexcrichton and others added 2 commits February 24, 2020 16:09

Update crates/environ/src/module.rs

502c36a

Co-Authored-By: Peter Huene <peterhuene@protonmail.com>

Fix lightbeam

29da88d

peterhuene approved these changes Feb 24, 2020

View reviewed changes

Fix compilation of tests

08c1220

Update the expected structure of the cache

2b53fee

alexcrichton added 2 commits February 25, 2020 09:02

Revert "Update the expected structure of the cache"

c544f20

This reverts commit 2b53fee.

Separate the cache dir a bit

f84efc2

Add a test the cache is busted with opt levels

56e3897

rustfmt

5e2264b

alexcrichton merged commit c8ab1e2 into bytecodealliance:master Feb 26, 2020

alexcrichton deleted the cache-opt-level branch February 26, 2020 22:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve robustness of cache loading/storing #974

Improve robustness of cache loading/storing #974

alexcrichton commented Feb 24, 2020

alexcrichton commented Feb 24, 2020

peterhuene left a comment

sunfishcode commented Feb 24, 2020

alexcrichton commented Feb 24, 2020

tschneidereit commented Feb 25, 2020

alexcrichton commented Feb 25, 2020

alexcrichton commented Feb 26, 2020

sunfishcode commented Feb 26, 2020

Improve robustness of cache loading/storing #974

Improve robustness of cache loading/storing #974

Conversation

alexcrichton commented Feb 24, 2020

alexcrichton commented Feb 24, 2020

peterhuene left a comment

Choose a reason for hiding this comment

sunfishcode commented Feb 24, 2020

alexcrichton commented Feb 24, 2020

tschneidereit commented Feb 25, 2020

alexcrichton commented Feb 25, 2020

alexcrichton commented Feb 26, 2020

sunfishcode commented Feb 26, 2020