
Conversation

kripken (Member) commented Aug 26, 2025:

The main changes here are:

  1. Rather than precomputing sometimes while ignoring the effects of tees etc.
    and sometimes not, do it in a single manner: considering the effects
    carefully and deciding which children to keep.
  2. This lets us remove the dual cache from #7857 ([GC] Precompute: Fix ref.eq
    comparisons of structs with nested effects), as there is now a single mode.

But really, this is a rewrite of that core logic from scratch in a cleaner and
less hackish way, while fixing issues both with the dual cache and with the
earlier single cache that the dual cache had fixed:

  1. We must have a single cached object for each expression. Having a dual
    cache opened us up to bugs, because it turns out we might actually cache an
    object in the propagate phase and use it in the main phase, and each used a
    different cache. Now both phases do the same thing, so there is no risk.
  2. We must compute effects when there are effects, because effects are state
    in the PrecomputingExpressionRunner, a source of bugs with all previous
    caches. I realized that the solution here is simple: note when there are
    effects, and if so, just compute them. This is fine, because the quadratic
    case happens in global objects, which have no effects anyhow (and even
    inside functions such effects are rare). And, after computing the effects,
    we use the single cached heap location, keeping identity stable (a key fix
    here; see the sketch after this list).
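
To make the single-cache rule concrete, here is a minimal sketch in C++; all names here (Expression, GCData, heapValues, hasEffects, executeForEffects, computeData) are illustrative stand-ins, not Binaryen's actual API:

```cpp
#include <unordered_map>

// Illustrative stand-in types, not the real Binaryen classes.
struct Expression {};
struct GCData {};

std::unordered_map<Expression*, GCData*> heapValues; // the one cache

bool hasEffects(Expression*);        // provided by effect analysis
void executeForEffects(Expression*); // updates runner state
GCData* computeData(Expression*);    // full evaluation

GCData* getHeapValue(Expression* curr) {
  auto iter = heapValues.find(curr);
  if (iter != heapValues.end()) {
    if (hasEffects(curr)) {
      // Effects are state in the runner, so re-execute for them. This
      // is cheap in practice: the quadratic case is in global objects,
      // which have no effects.
      executeForEffects(curr);
    }
    // Return the one cached heap object, keeping identity stable.
    return iter->second;
  }
  return heapValues[curr] = computeData(curr);
}
```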

Then, the main visitExpression is straightforward: compute in the most general
manner, NOT trying to replace the entire expression (which would require it to
have no side effects) but allowing effects, and looking at the children
afterwards to see which are actually needed. This is necessary to avoid a
regression in this PR, but it actually ends up as a progression, since we can
handle more cases, like (ref.eq (tee) (get)). Before, the tee would stop us,
and the propagate phase doesn't handle this if the value isn't written to a
local; now we can just compute the result and keep the tee around, as in the
sketch below.
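
For instance, a hypothetical wat sketch (the type $A and local $x are assumed, and the exact output shape may differ):

```wat
;; Before: the tee has a side effect (writing $x), but both children
;; evaluate to the same reference, so the ref.eq itself is computable.
(ref.eq
  (local.tee $x (struct.new $A))
  (local.get $x))

;; After: keep the effectful child, replace the parent with its value.
(block (result i32)
  (drop (local.tee $x (struct.new $A)))
  (i32.const 1))
```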

Also, I figured out how to avoid the monotonically increasing code size
problem that e.g. GUFA has: you see expression A and figure out that it
evaluates to constant C, but A has effects you must keep, so you emit (A, C).
That lets the constant get optimized, but if you run twice you can add C
twice, unless you carefully look at the parent, which is annoying. Here that
is avoided: while we may add such a constant, regressing size, we still make
progress because we remove the main expression itself. We may keep some
children, but never the parent, so the increase is bounded, as the sketch
below illustrates.
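
An illustrative wat sketch of the bound (local $x assumed; exact output shape may differ):

```wat
;; A = (i32.eqz (local.tee $x (i32.const 0))) evaluates to the constant
;; C = (i32.const 1), but the tee's write to $x must be kept.
(i32.eqz (local.tee $x (i32.const 0)))

;; Emitting (A, C) whole would let a later run wrap A again. Instead the
;; parent i32.eqz is consumed: only the effectful child survives, so a
;; second run finds no computable parent and adds nothing further.
(block (result i32)
  (drop (local.tee $x (i32.const 0)))
  (i32.const 1))
```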

This improves not just GC code:

Emscripten size diff:

```diff
diff --git a/test/code_size/test_codesize_hello_O2.json b/test/code_size/test_codesize_hello_O2.json
index 2ddadfd5c7..352c0831ec 100644
--- a/test/code_size/test_codesize_hello_O2.json
+++ b/test/code_size/test_codesize_hello_O2.json
@@ -1,17 +1,17 @@
 {
   "a.out.js": 4369,
   "a.out.js.gz": 2146,
-  "a.out.nodebug.wasm": 1979,
-  "a.out.nodebug.wasm.gz": 1157,
-  "total": 6348,
-  "total_gz": 3303,
+  "a.out.nodebug.wasm": 1927,
+  "a.out.nodebug.wasm.gz": 1138,
+  "total": 6296,
+  "total_gz": 3284,
   "sent": [
     "fd_write"
   ],
   "imports": [
     "wasi_snapshot_preview1.fd_write"
   ],
   "exports": [
     "__indirect_function_table",
     "__wasm_call_ctors",
     "_emscripten_stack_alloc",
diff --git a/test/code_size/test_codesize_hello_O3.json b/test/code_size/test_codesize_hello_O3.json
index 1e497bc59f..892da4c3d5 100644
--- a/test/code_size/test_codesize_hello_O3.json
+++ b/test/code_size/test_codesize_hello_O3.json
@@ -1,17 +1,17 @@
 {
   "a.out.js": 4311,
   "a.out.js.gz": 2104,
-  "a.out.nodebug.wasm": 1733,
-  "a.out.nodebug.wasm.gz": 980,
-  "total": 6044,
-  "total_gz": 3084,
+  "a.out.nodebug.wasm": 1681,
+  "a.out.nodebug.wasm.gz": 960,
+  "total": 5992,
+  "total_gz": 3064,
   "sent": [
     "a (fd_write)"
   ],
   "imports": [
     "a (fd_write)"
   ],
   "exports": [
     "b (memory)",
     "c (__wasm_call_ctors)",
     "d (main)"
diff --git a/test/code_size/test_codesize_hello_Os.json b/test/code_size/test_codesize_hello_Os.json
index 128a92afd1..0a660aeb20 100644
--- a/test/code_size/test_codesize_hello_Os.json
+++ b/test/code_size/test_codesize_hello_Os.json
@@ -1,17 +1,17 @@
 {
   "a.out.js": 4311,
   "a.out.js.gz": 2104,
-  "a.out.nodebug.wasm": 1723,
-  "a.out.nodebug.wasm.gz": 985,
-  "total": 6034,
-  "total_gz": 3089,
+  "a.out.nodebug.wasm": 1671,
+  "a.out.nodebug.wasm.gz": 964,
+  "total": 5982,
+  "total_gz": 3068,
   "sent": [
     "a (fd_write)"
   ],
   "imports": [
     "a (fd_write)"
   ],
   "exports": [
     "b (memory)",
     "c (__wasm_call_ctors)",
     "d (main)"
diff --git a/test/code_size/test_codesize_hello_Oz.json b/test/code_size/test_codesize_hello_Oz.json
index d38e027f7a..593e603ff0 100644
--- a/test/code_size/test_codesize_hello_Oz.json
+++ b/test/code_size/test_codesize_hello_Oz.json
@@ -1,17 +1,17 @@
 {
   "a.out.js": 3930,
   "a.out.js.gz": 1905,
-  "a.out.nodebug.wasm": 1257,
-  "a.out.nodebug.wasm.gz": 763,
-  "total": 5187,
-  "total_gz": 2668,
+  "a.out.nodebug.wasm": 1205,
+  "a.out.nodebug.wasm.gz": 740,
+  "total": 5135,
+  "total_gz": 2645,
   "sent": [
     "a (fd_write)"
   ],
   "imports": [
     "a (fd_write)"
   ],
   "exports": [
     "b (memory)",
     "c (__wasm_call_ctors)",
     "d (main)"
```

There are also some minor theoretical regressions, as a few tests show, but
those are patterns that other passes handle better (like (return (return ..))),
so they only appear when running this pass by itself; production code using
the full pipeline should only get better.

This is also a slight improvement in compile times.

kripken requested a review from tlively, August 26, 2025 22:51.
On the changed lines:

```diff
   return flow;
 }
-heapValuesMap[curr] = flow.getSingleValue().getGCData();
+heapValues.map[curr] = flow.getSingleValue().getGCData();
```
tlively (Member) commented:
Can use `insert` with a placeholder null data in place of the original `heapValues.map.find` to avoid the second lookup here.

kripken (Member, Author) replied:
Hmm, I see what you mean, but this code feels complex enough to me that I'd rather not add that? That is, I feel the current code is easier to read, even if slightly less efficient.

tlively (Member) replied:
I don't think it would be more complicated! It even gives you a nice `inserted` bool to check rather than comparing against `heapValues.map.end()`.
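
A minimal sketch of the two patterns under discussion, with stand-in types (Expression, GCData, computeData are placeholders, not the actual Binaryen code):

```cpp
#include <unordered_map>

struct Expression {};
struct GCData {};
static GCData* computeData(Expression*) { return new GCData(); }

using HeapValuesMap = std::unordered_map<Expression*, GCData*>;

// Find-then-assign: a miss costs two hash lookups (find, then []).
GCData* getFindThenAssign(HeapValuesMap& map, Expression* expr) {
  auto iter = map.find(expr);
  if (iter != map.end()) {
    return iter->second;
  }
  return map[expr] = computeData(expr); // second lookup
}

// Insert-with-placeholder: one lookup, and `inserted` says whether the
// placeholder still needs filling. (Assumes computeData does not touch
// the map, which could invalidate the iterator on a rehash.)
GCData* getWithInsert(HeapValuesMap& map, Expression* expr) {
  auto [iter, inserted] = map.insert({expr, nullptr});
  if (inserted) {
    iter->second = computeData(expr);
  }
  return iter->second;
}
```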

kripken (Member, Author) replied:
Difference of opinion I guess 😄 To me, starting with "insert into the map, just a placeholder we'll fill in later" feels odd, when mentally the situation at the start is "ok, let's see if this is already in the map".

tlively (Member) commented:
Can I interest you in porting these tests to lit while we're touching them?

kripken (Member, Author) replied:
I can do it as a followup to this PR.

kripken and others added 5 commits August 26, 2025 17:42
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Co-authored-by: Thomas Lively <tlively123@gmail.com>
tlively (Member) left a review comment:

LGTM, although I still think it would be nice to use `insert()`. We try to use it where possible elsewhere.

kripken (Member, Author) commented Aug 27, 2025:

Fuzzed for 100K iterations overnight, looks good. Landing.

Let's talk offline about the `insert` issue - I'm not strongly opposed, but I would like to understand better why you favor it.

kripken merged commit 9de4aca into WebAssembly:main on Aug 27, 2025. 17 checks passed.
kripken deleted the precompute.drop branch on August 27, 2025, 14:11.