Add compression-oriented function reordering pass#8696

Open
brendandahl wants to merge 1 commit into
WebAssembly:mainfrom
brendandahl:reorder

@brendandahl
Collaborator

Implement the --reorder-functions-by-similarity optimization pass in wasm-opt.

Gzip and Brotli compression algorithms rely on finding repetitive byte patterns inside a sliding window (e.g., 32KB for Gzip). If structurally similar functions are placed far apart in the Wasm binary, the compressor cannot detect matches across them. While the existing --reorder-functions pass sorts functions strictly by call frequency to shrink LEB128 indexes, it scatters mutually compressible functions and ultimately increases gzipped delivery sizes.

This new pass traverses defined function bodies in post-order and extracts a similarity sorting key based on signature type IDs, local variable types, and structural opcode sequences. By sorting defined functions lexicographically by this key, structurally similar functions are physically grouped together in the output binary, providing adjacent compressible bytes.
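
As a rough illustration of the sort-by-key idea described above, here is a minimal standalone sketch. It is not the code in this PR: `FuncKey`, `makeKey`, `orderBySimilarity`, and the 512-entry cap are hypothetical names and values chosen for the example, and plain integers stand in for Binaryen's type IDs and opcodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a defined function: a name plus a flat key made of
// a signature id, local type ids, and an opcode sequence, in that order.
struct FuncKey {
  std::string name;
  std::vector<uint32_t> key;
};

// Build a key by concatenating the three components and capping its length.
// The default cap of 512 entries is an assumption for this sketch.
FuncKey makeKey(std::string name,
                uint32_t signatureId,
                const std::vector<uint32_t>& localTypes,
                const std::vector<uint32_t>& opcodes,
                std::size_t maxLen = 512) {
  assert(maxLen > 0);
  FuncKey result{std::move(name), {signatureId}};
  result.key.insert(result.key.end(), localTypes.begin(), localTypes.end());
  result.key.insert(result.key.end(), opcodes.begin(), opcodes.end());
  if (result.key.size() > maxLen) {
    result.key.resize(maxLen);
  }
  return result;
}

// Sort lexicographically by key; ties fall back to the name so the output
// order is deterministic.
std::vector<std::string> orderBySimilarity(std::vector<FuncKey> funcs) {
  std::sort(funcs.begin(), funcs.end(), [](const FuncKey& a, const FuncKey& b) {
    if (a.key != b.key) {
      return a.key < b.key;
    }
    return a.name < b.name;
  });
  std::vector<std::string> order;
  for (const auto& f : funcs) {
    order.push_back(f.name);
  }
  return order;
}
```

With three toy functions where `a` and `c` share a signature, locals, and most opcodes, the sort places `a` and `c` next to each other and `b` after them, which is the grouping effect the description is after.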

@brendandahl brendandahl requested a review from a team as a code owner May 13, 2026 00:26
@brendandahl brendandahl requested review from tlively and removed request for a team May 13, 2026 00:26
@brendandahl
Collaborator Author

Below is a comparison of the uncompressed and gzip-compressed binary sizes for both configurations. There are still some tweaks I think we can make. I've been able to get 2% on some files, but it wasn't doing as well on others (still need to figure out why).

| Benchmark File | Uncompressed Baseline (bytes) | Uncompressed Similarity (bytes) | Uncompressed Change | Gzip Baseline (bytes) | Gzip Similarity (bytes) | Gzip Change (Savings) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| dart-flute-complex.opt.wasm | 1,081,549 | 1,083,288 | +0.16% | 392,180 | 386,221 | -1.52% |
| dart-flute-complex.unopt.wasm | 1,284,344 | 1,286,148 | +0.14% | 458,367 | 452,629 | -1.25% |
| dart-pop.unopt.wasm | 398,114 | 398,114 | 0.00% | 148,474 | 146,737 | -1.17% |
| dart-pop.opt.wasm | 350,546 | 350,546 | 0.00% | 133,329 | 131,929 | -1.05% |
| v8_poppler.wasm | 2,067,741 | 2,076,431 | +0.42% | 987,474 | 982,825 | -0.47% |
| v8_sqlite.c.wasm | 931,440 | 936,924 | +0.59% | 378,918 | 375,992 | -0.77% |
| v8_box2d.wasm | 86,598 | 86,598 | 0.00% | 39,983 | 39,978 | -0.01% |
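
For anyone reproducing the numbers, the change columns appear to be plain relative differences against the baseline. A small sketch of that arithmetic follows; `percentChange` is a hypothetical helper name, not something from this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative size change in percent. Negative means the reordered file is
// smaller than the baseline.
double percentChange(uint64_t baseline, uint64_t reordered) {
  assert(baseline > 0);
  return (static_cast<double>(reordered) - static_cast<double>(baseline)) *
         100.0 / static_cast<double>(baseline);
}
```

For example, `percentChange(392180, 386221)` is about -1.52, matching the gzip column for dart-flute-complex.opt.wasm.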

Member

@tlively tlively left a comment

Mostly comments on algorithmic improvements. Let me know if you'd rather land as-is to get the measured benefit without investing more time in algorithmic improvements and I can review with that in mind.

Comment on lines +48 to +49
// Capture important immediate type/operator information
// TODO: There's probably more data that would be useful to capture.
Member

You could probably extract and reuse the HashStringifyWalker from Outlining.cpp. It turns expression trees into strings by shallowly hashing each expression, including all of its immediates. You would just want it to use a normal PostWalker (but probably modified to also call addUniqueSymbol at control flow boundaries, e.g. end and else) instead of the custom StringifyWalker it currently uses. Nothing a little extra templating can't solve!

Collaborator Author

This looks like it will be a bigger change (and potentially much slower). I'd like to save this for a v2 experiment.

Member

Another option is to just look at the bytes - that would be most precise (actually use the encoding of the enums), and not hard to do, but slower. Anyhow, yes, larger changes/investigations can be left for later, this looks like a great start!

ThreadPool::get()->work(doWorkers);

// 3. Sort defined functions by the similarity heuristic
std::sort(keys.begin(), keys.end());
Member

Sorting only works when the similarities are at the beginning of the strings, right? It seems like looking for matching substrings would be more robust. You could check out what Outlining.cpp does with a suffix tree to find common substrings, for example.

Collaborator Author

Yeah, the idea here was that prologues are usually very common, and doing full substring matching is very slow. As mentioned above, this seems like something to explore in v2.
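
To make the prefix limitation discussed in this thread concrete, here is a small standalone sketch. It uses toy integer keys rather than anything from the PR, and `Key`, `commonPrefix`, and `sortedPosition` are hypothetical names for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Key = std::vector<uint32_t>;

// Length of the shared leading run of two keys.
std::size_t commonPrefix(const Key& a, const Key& b) {
  std::size_t n = std::min(a.size(), b.size());
  std::size_t i = 0;
  while (i < n && a[i] == b[i]) {
    ++i;
  }
  return i;
}

// Position of `target` after a plain lexicographic sort of `keys`.
std::size_t sortedPosition(std::vector<Key> keys, const Key& target) {
  std::sort(keys.begin(), keys.end());
  auto it = std::find(keys.begin(), keys.end(), target);
  assert(it != keys.end());
  return static_cast<std::size_t>(it - keys.begin());
}
```

Given the keys `{1,2,3,9}`, `{1,2,3,8}`, `{0,7,7,7}`, and `{5,7,7,7}`, the two keys that share a prefix end up adjacent after sorting, while the two that share only the `7,7,7` tail land at opposite ends. That is the case a substring-based approach would catch and a plain sort does not.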

@kripken
Member

kripken commented May 13, 2026

I assume the background here is #4322? Some prior work is there.

@brendandahl
Collaborator Author

No, though I did find that after starting this. A while ago I was playing with compressed wat vs. wasm with brotli/gzip and added a note to try reordering for gzip. I haven't tried out the idea from cromulate. I was also going to ask if you still have your similarity-ordering branch somewhere?

@kripken
Member

kripken commented May 13, 2026

Hmm, unfortunately I seem to have deleted it when I moved my branches to my fork, but it isn't there either... Should have been at

https://github.com/kripken/binaryen/tree/similarity-ordering

GitHub had a way to restore deleted branches back in the day, but maybe just for recent ones... anyhow, the code there was probably not great 😄

@kripken
Member

kripken commented May 13, 2026

iirc, the approach was to write the binary bytes and compare them (so not at the IR level). Not sure if that is better (certainly slower).

Implement the --reorder-functions-by-similarity optimization pass
in wasm-opt.

Gzip and Brotli compression algorithms rely on finding repetitive byte
patterns inside a sliding window (e.g., 32KB for Gzip). If structurally
similar functions are placed far apart in the Wasm binary, the
compressor cannot detect matches across them. While the existing
--reorder-functions pass sorts functions strictly by call frequency to
shrink LEB128 indexes, it scatters mutually compressible functions and
ultimately increases gzipped delivery sizes.

This new pass traverses defined function bodies in post-order and
extracts a similarity sorting key based on signature type IDs, local
variable types, and structural opcode sequences. By sorting defined
functions lexicographically by this key, structurally similar
functions are physically grouped together in the output binary,
providing adjacent compressible bytes.

Empirical benchmarks on real-world Flutter and Poppler Wasm examples
show a significant improvement, saving up to 2.13% and 0.98%,
respectively, in compressed delivery size compared to the baseline
(no reordering).

@brendandahl
Collaborator Author

brendandahl commented May 13, 2026

Added Brotli to the comparison. It helps there too, even with the larger 4 MiB sliding window.

| File | Gzip Baseline (bytes) | Gzip Similarity (bytes) | Gzip Diff | Brotli Baseline (bytes) | Brotli Similarity (bytes) | Brotli Diff |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| dart-flute-complex.opt.wasm | 392,180 | 386,221 | -1.52% | 353,061 | 349,273 | -1.07% |
| dart-flute-complex.unopt.wasm | 458,367 | 452,629 | -1.25% | 409,995 | 406,029 | -0.97% |
| dart-pop.unopt.wasm | 148,474 | 146,737 | -1.17% | 135,640 | 134,493 | -0.85% |
| dart-pop.opt.wasm | 133,329 | 131,929 | -1.05% | 122,487 | 121,162 | -1.08% |
| v8_poppler.wasm | 987,474 | 982,825 | -0.47% | 924,941 | 921,716 | -0.35% |
| v8_sqlite.c.wasm | 378,918 | 375,992 | -0.77% | 329,704 | 327,791 | -0.58% |
| v8_box2d.wasm | 39,983 | 39,978 | -0.01% | 37,332 | 37,332 | 0.00% |

@MaxGraey
Contributor

Could you add zstd to the comparison, please?

    : public PostWalker<OpcodeSequenceBuilder,
                        UnifiedExpressionVisitor<OpcodeSequenceBuilder>> {
  std::vector<uint32_t> sequence;
  const size_t max_len = 512;
Member

Suggested change
- const size_t max_len = 512;
+ const size_t MaxLen = 512;

  sequence.push_back(localSet->type.getID());
} else if (auto* const_ = curr->dynCast<Const>()) {
  sequence.push_back(const_->type.getID());
}
Member

For this PR, you can get all enums using wasm-delegations-fields. It would be shorter than the current code. That + the type would make sense I think?


void run(Module* module) override {
  // If the number of defined functions is small, similarity-based reordering
  // does not help and can regress size due to increasing LEB size.
Member

Wait, doesn't this matter more for large modules? Where there are enough for LEBs to matter?
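
For context on when LEB sizes start to matter, here is a quick sketch of the unsigned LEB128 length rule (7 payload bits per byte). `ulebSize` is a hypothetical helper written for this comment, not a Binaryen function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes needed to encode `value` as unsigned LEB128:
// each byte carries 7 payload bits, and at least one byte is always emitted.
std::size_t ulebSize(uint32_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}
```

Function indexes below 128 fit in one byte, indexes up to 16,383 take two, and anything from 16,384 up takes at least three. A module with fewer than 128 functions therefore cannot change its index sizes by reordering, which is consistent with the question above.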

Collaborator Author

I did a deeper dive into this. This heuristic was a quick hack to avoid regressing the small v8_box2d.wasm, where reordering made things worse. I assumed this was due to LEBs or the original ordering being better, but what was actually happening was that the gzip command was storing the filename in the output file!

I got rid of this code locally and now use `gzip -9 -k -n`; with that, v8_box2d.wasm also improves by 0.33%.
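
For anyone checking their own measurements for the same problem, a rough sketch of detecting a stored filename is below. It relies on the gzip header layout from RFC 1952 (magic bytes `0x1f 0x8b`, then the compression method, then a flags byte whose `0x08` bit is FNAME). `gzipStoresFileName` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `bytes` starts with a gzip member header whose FLG byte has
// the FNAME bit set, meaning the original file name is stored in the stream.
bool gzipStoresFileName(const std::vector<uint8_t>& bytes) {
  constexpr uint8_t FNAME = 0x08;
  constexpr std::size_t FixedHeaderSize = 10;
  if (bytes.size() < FixedHeaderSize) {
    return false;
  }
  if (bytes[0] != 0x1f || bytes[1] != 0x8b) {
    return false;
  }
  return (bytes[3] & FNAME) != 0;
}
```

Passing the first ten bytes of a `.gz` file is enough. A file produced with `-n` should report `false`, so byte counts compared across differently named inputs are not skewed by the name length.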

Member

Oh, funny about the filename! 😄

Nice, yeah, I'd hope this works even on small things.
