
cranelift-wasm: Add a bounds-checking optimization for dynamic memories and guard pages #6031

Merged

Conversation


@fitzgen fitzgen commented Mar 16, 2023

This is a new special case for when we know that there are enough guard pages to cover the memory access's offset and access size.

The precise should-we-trap condition is

index + offset + access_size > bound

However, if we instead check only the partial condition

index > bound

then the most out of bounds that the access can be, while that partial check still succeeds, is `offset + access_size`.

However, when we have a guard region that is at least as large as `offset + access_size`, we can rely on the virtual memory subsystem handling these out-of-bounds errors at runtime. Therefore, the partial `index > bound` check is sufficient for this heap configuration.
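The applicability condition can be sanity-checked with a small Rust sketch (a hypothetical helper, not Cranelift's actual code): the partial check is safe exactly when the guard region covers the worst-case overshoot.

```rust
/// Hypothetical sketch: the partial `index > bound` check is sufficient
/// whenever the guard region covers the worst-case overshoot of
/// `offset + access_size` bytes past the bound.
fn partial_check_suffices(offset: u64, access_size: u64, guard_size: u64) -> bool {
    // Use checked addition so `offset + access_size` cannot wrap around.
    match offset.checked_add(access_size) {
        Some(overshoot) => overshoot <= guard_size,
        None => false,
    }
}
```

For example, a 4-byte load at static offset 16 is covered by a 64 KiB guard region, while a load whose offset alone exceeds the guard region is not.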

Additionally, this has the advantage that a series of Wasm loads that use the same dynamic index operand but different static offset immediates -- which is a common code pattern when accessing multiple fields in the same struct that is in linear memory -- will all emit the same `index > bound` check, which we can GVN.
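As a toy illustration (a hypothetical Rust model, not the generated code), several accesses off the same dynamic index can share a single comparison:

```rust
/// Hypothetical model of the shared check: three loads with the same dynamic
/// `index` but different constant offsets (all covered by the guard region)
/// need only one `index > bound` comparison between them.
fn load_struct_fields(heap: &[u8], bound: usize, index: usize) -> Option<(u8, u8, u8)> {
    // Single bounds check, shared (GVN'd) by all three loads below.
    if index > bound {
        return None;
    }
    // Offsets 0, 4, and 8 are assumed to fit within the guard region.
    Some((heap[index], heap[index + 4], heap[index + 8]))
}
```

Here `heap` is assumed to extend past `bound` by at least the largest `offset + access_size`, modeling the guard region.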

Only partially, though: the bounds-check comparison is GVN'd, but we still branch on values that we should know are always true if we get this far in the code. These are actual `br_if`s in the non-Spectre code, and `select_spectre_guard`s that we should know will always select the same way when Spectre mitigations are enabled. See the second commit for examples.

Improving the non-Spectre case is pretty straightforward: walk the dominator tree and remember which values we've already branched on at this point, and therefore we can simplify any further conditional branches on those same values into direct jumps.
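A minimal sketch of that idea (hypothetical, not an actual Cranelift pass), using plain integers in place of `ir::Value`:

```rust
use std::collections::HashSet;

/// Tracks condition values already branched on along the current
/// dominator-tree path.
struct KnownConditions {
    seen: HashSet<u32>, // stand-in for `ir::Value`
}

impl KnownConditions {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Record a `br_if` on `cond`. Returns true if a dominating branch
    /// already tested this value, meaning this branch can be simplified
    /// into a direct jump.
    fn visit_branch(&mut self, cond: u32) -> bool {
        // `insert` returns false when the value was already present.
        !self.seen.insert(cond)
    }
}
```

A real pass would also need to discard recorded conditions when leaving a dominator subtree, and to track which way each branch went; that bookkeeping is omitted here.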

Improving the Spectre case requires something that is morally the same, but has a couple snags:

  • We don't have actual br_ifs to determine whether the bounds checking condition succeeded or not. We need to instead reason about dominating select_spectre_guard; {load, store} instruction pairs.

  • We have to be SUPER careful about reasoning "through" select_spectre_guards. Our general rule is never to do that, since it could break the speculative execution sandboxing that the instruction is designed for.
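For concreteness, here is a rough Rust model of the guarded-address pattern (names hypothetical; real codegen lowers the selection to a branchless conditional move, which is exactly why reasoning "through" it is dangerous):

```rust
/// Rough model of `select_spectre_guard`: on an out-of-bounds index the
/// computed address is replaced with 0 (a trapping address), so even
/// misspeculated loads cannot read attacker-chosen memory. The `if` below
/// stands in for a conditional move; real codegen emits no branch here.
fn spectre_guarded_addr(base: u64, index: u64, bound: u64) -> u64 {
    let addr = base.wrapping_add(index);
    if index > bound { 0 } else { addr }
}
```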

This pull request leaves implementing these new optimization passes for follow ups.


fitzgen commented Mar 16, 2023

FWIW: I looked over the wasm filetest changes pretty closely, but did not go over the ISA-specific versions of those same tests in much detail.


fitzgen commented Mar 16, 2023

Going to gather sightglass numbers now...


fitzgen commented Mar 16, 2023

Looks like this gives not just execution speed ups, but also compilation speed ups (presumably due to processing less IR):

$ sightglass benchmark -e main.so -e opt.so --engine-flags="--static-memory-maximum-size=0 --dynamic-memory-guard-size=65536" -m insts-retired -- benchmarks/default.suite 
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 354918771.70 ± 209612.26 (confidence = 99%)

  opt.so is 1.08x to 1.08x faster than main.so!

  [4757395008 4757613072.90 4757854325] main.so
  [4402584769 4402694301.20 4403032801] opt.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 2246146.90 ± 322.54 (confidence = 99%)

  opt.so is 1.07x to 1.07x faster than main.so!

  [35406698 35406850.20 35407505] main.so
  [33160532 33160703.30 33161229] opt.so

execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 20037307.30 ± 5.37 (confidence = 99%)

  opt.so is 1.05x to 1.05x faster than main.so!

  [402201103 402201106.00 402201111] main.so
  [382163791 382163798.70 382163803] opt.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 80907.10 ± 30178.64 (confidence = 99%)

  opt.so is 1.01x to 1.01x faster than main.so!

  [9350309 9389659.00 9444444] main.so
  [9276889 9308751.90 9338252] opt.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 1534345.20 ± 211755.93 (confidence = 99%)

  opt.so is 1.01x to 1.01x faster than main.so!

  [218580786 218834739.10 219048529] main.so
  [216969499 217300393.90 217501790] opt.so

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 16, 2023
@alexcrichton alexcrichton left a comment


Seems reasonable to me, but I think it'd be good to get one more approval on this PR as well.

Also, FWIW, I find the tests sort of unhelpful to review since there are so many (100+ files). I think you added some tests here, but I only ran into them by happenstance when scrolling through. I realize the full matrix is useful in the limit, but perhaps in a future PR the test suite could be trimmed down to something more easily reviewable?

Comment on lines +138 to +180
// 2.a. Emit a Spectre-safe `index > bound` check: the bounds comparison
// feeds a `select_spectre_guard` instead of an explicit branch.
HeapStyle::Dynamic { bound_gv }
if offset_and_size <= heap.offset_guard_size && spectre_mitigations_enabled =>
{
let bound = builder.ins().global_value(env.pointer_type(), bound_gv);
Reachable(compute_addr(
&mut builder.cursor(),
heap,
env.pointer_type(),
index,
offset,
Some(SpectreOobComparison {
cc: IntCC::UnsignedGreaterThan,
lhs: index,
rhs: bound,
}),
))
}

// 2.b. Emit explicit `index > bound` check.
HeapStyle::Dynamic { bound_gv } if offset_and_size <= heap.offset_guard_size => {
let bound = builder.ins().global_value(env.pointer_type(), bound_gv);
let oob = builder.ins().icmp(IntCC::UnsignedGreaterThan, index, bound);
builder.ins().trapnz(oob, ir::TrapCode::HeapOutOfBounds);
Reachable(compute_addr(
&mut builder.cursor(),
heap,
env.pointer_type(),
index,
offset,
None,
))
}
Member


I know at the top of this match it says that some duplication is necessary, but I personally found it a bit confusing to have the spectre/non-spectre cases split apart. Would it be possible, for dynamic heaps, to fuse these two arms together with a helper?

Member Author


Yeah I was thinking of doing this in a follow up. That cool?

cranelift/wasm/src/code_translator/bounds_checks.rs (review thread on an outdated diff; resolved)
@alexcrichton
Member

Hm, ok, well I also see now that a test failed: something that should trap didn't. That means I missed something in review, so I'm much less confident in my review now.

@cfallin cfallin left a comment


The actual bounds-check logic looks correct to me; modulo perhaps the overflow issue that Alex points to.

The failing test has this comment:

    // Technically this is out of bounds...
    assert!(i32_load.call(&mut store, page).is_err());
    // ... but implementation-wise it should still be mapped memory from before.
    // Note though that prior writes should all appear as zeros and we can't see
    // data from the prior instance.

which makes me think that the configuration in that test is a lie (we indicate that a guard region is present when it actually is not), or else the pooling allocator isn't respecting the requested offset-guard size for some reason...

@alexcrichton

Ah yeah, after digging in I believe that's correct. The configuration requires dynamic memories to have a 64k guard page region, but the pooling allocator in this configuration specifically isn't decommitting memory past the end to avoid syscalls, which means that there are actually no guard pages at all upon reinstantiation. That's ok with today's implementation of bounds checks, which doesn't try to exploit the fact that the arithmetic can be simplified (e.g. this PR).

This is a tough problem, though, because avoiding decommits to save on syscalls/contention was one of the assumptions behind dynamic memory and the pooling allocator. It's making me realize that if we want to take advantage of guard pages on dynamic memories (which I suspect we do, since it should drastically simplify the analysis required to handle multiple bounds checks), then that decommit-skipping optimization isn't possible.

The offending lines, I think, are here; those need to be executed unconditionally if the guard size for memory is more than 0 bytes, which I think is basically always the case.
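In other words, the soundness invariant being described can be sketched like this (a hypothetical helper, not Wasmtime's actual allocator code):

```rust
/// Hypothetical invariant: the pooling allocator may skip decommitting pages
/// past the accessible bound on reuse only when the configured guard size is
/// zero. Otherwise the compiler may have elided bounds checks that rely on
/// those pages being unmapped, and stale committed pages would break that
/// assumption.
fn may_skip_decommit_past_bound(guard_size: u64) -> bool {
    guard_size == 0
}
```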


fitzgen commented Mar 16, 2023

    The offending lines, I think, are here; those need to be executed unconditionally if the guard size for memory is more than 0 bytes, which I think is basically always the case.

Thanks! I was digging into the test but hadn't got that far before I was interrupted. Always nice when someone debugs the issue for you :)

@fitzgen fitzgen force-pushed the dynamic-memories-with-guard-regions branch from 41647a3 to b0e2e63 Compare March 17, 2023 02:11
tests/all/pooling_allocator.rs (review thread on an outdated diff; resolved)
@fitzgen fitzgen force-pushed the dynamic-memories-with-guard-regions branch from b0e2e63 to 66b4973 Compare March 17, 2023 18:02
@fitzgen fitzgen enabled auto-merge March 17, 2023 18:02