Add compression-oriented function reordering pass #8696
|
Below is a comparison of the uncompressed and gzip-compressed binary sizes for both configurations. There are still some tweaks I think we can make. I've been able to get 2% on some files, but it wasn't doing as well on others (still need to figure out why).
|
tlively
left a comment
Mostly comments on algorithmic improvements. Let me know if you'd rather land as-is to get the measured benefit without investing more time in algorithmic improvements and I can review with that in mind.
| // Capture important immediate type/operator information | ||
| // TODO: There's probably more data that would be useful to capture. |
You could probably extract and reuse the HashStringifyWalker from Outlining.cpp. It turns expression trees into strings by shallowly hashing each expression, including all of its immediates. You would just want it to use a normal PostWalker (but probably modified to also call addUniqueSymbol at control flow boundaries, e.g. end and else) instead of the custom StringifyWalker it currently uses. Nothing a little extra templating can't solve!
This looks like it will be a bigger change (and potentially much slower). I'd like to save this for a v2 experiment.
Another option is to just look at the bytes - that would be most precise (actually use the encoding of the enums), and not hard to do, but slower. Anyhow, yes, larger changes/investigations can be left for later, this looks like a great start!
| ThreadPool::get()->work(doWorkers); | ||
|
|
||
| // 3. Sort defined functions by the similarity heuristic | ||
| std::sort(keys.begin(), keys.end()); |
Sorting only works when the similarities are at the beginning of the strings, right? It seems like looking for matching substrings would be more robust. You could check out what Outlining.cpp does with a suffix tree to find common substrings, for example.
Yeah, the idea here was prologues are usually very common and doing full substring matching is very slow. As mentioned above, seems like something to explore in v2.
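The prefix limitation discussed in this thread can be illustrated with a small C++ sketch. The `Sequence` alias and both helper names are hypothetical and are not taken from the PR; the sequences stand in for per-function opcode keys. Lexicographic sorting places sequences with a shared prefix next to each other, while sequences that only share a suffix can end up far apart.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-function opcode key.
using Sequence = std::vector<uint32_t>;

// Number of leading elements that two sequences have in common.
size_t commonPrefixLength(const Sequence& a, const Sequence& b) {
  size_t limit = std::min(a.size(), b.size());
  size_t i = 0;
  while (i < limit && a[i] == b[i]) {
    ++i;
  }
  return i;
}

// std::vector compares lexicographically, so a plain std::sort groups
// sequences by shared prefix only.
std::vector<Sequence> sortLexicographically(std::vector<Sequence> seqs) {
  std::sort(seqs.begin(), seqs.end());
  return seqs;
}
```

For example, `{1, 2, 3, 9}` and `{1, 2, 3, 8}` become neighbors after sorting, whereas `{7, 5, 5, 5}` and `{0, 5, 5, 5}` share three trailing elements but have a common prefix of length zero, so the sort does nothing to bring them together.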
|
I assume the background here is #4322? Some prior work is there. |
|
No, though I did find that after starting this. A while ago I was playing with compressed wat vs wasm with brotli/gzip and added a note to try reordering for gzip. I haven't tried out the idea from cromulate. I was also going to ask if you still have your |
|
Hmm, unfortunately I seem to have deleted it when I moved my branches to my fork, but it isn't there either... It should have been at https://github.com/kripken/binaryen/tree/similarity-ordering. GitHub had a way to restore deleted branches back in the day, but maybe just for recent ones... Anyhow, the code there was probably not great 😄 |
|
iirc, the approach was to write the binary bytes and compare them (so not at the IR level). Not sure if that is better (certainly slower). |
Implement the --reorder-functions-by-similarity optimization pass in wasm-opt.

Gzip and Brotli compression algorithms rely on finding repetitive byte patterns inside a sliding window (e.g., 32 KiB for Gzip). If structurally similar functions are placed far apart in the Wasm binary, the compressor cannot detect matches across them. While the existing --reorder-functions pass sorts functions strictly by call frequency to shrink LEB128 indexes, it scatters mutually compressible functions and ultimately increases gzipped delivery sizes.

This new pass traverses defined function bodies in post-order and extracts a similarity sorting key based on signature type IDs, local variable types, and structural opcode sequences. By sorting defined functions lexicographically by this key, structurally similar functions are physically grouped together in the output binary, providing adjacent compressible bytes.

Empirical benchmarks on real-world Flutter and Poppler Wasm examples show a significant improvement, saving up to 2.13% and 0.98% respectively in compressed delivery size compared to the baseline (no reordering).
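As a rough illustration of the key-and-sort idea described above, here is a minimal C++ sketch. The `SimilarityKey` struct, its field names, and `orderBySimilarity` are assumptions made for this example; they are not the types or functions used in the PR, and the function name is added to the key only as an illustrative tie-breaker.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical per-function key; not Binaryen's actual data structure.
struct SimilarityKey {
  uint32_t signatureId;             // ID of the function's signature type
  std::vector<uint32_t> localTypes; // type IDs of the locals, in order
  std::vector<uint32_t> opcodes;    // post-order opcode/immediate sequence
  std::string name;                 // tie-breaker so the order is deterministic

  bool operator<(const SimilarityKey& other) const {
    // std::tie compares field by field; vectors compare lexicographically.
    return std::tie(signatureId, localTypes, opcodes, name) <
           std::tie(other.signatureId, other.localTypes, other.opcodes,
                    other.name);
  }
};

// Returns the function names in the order they would be emitted.
std::vector<std::string> orderBySimilarity(std::vector<SimilarityKey> keys) {
  std::sort(keys.begin(), keys.end());
  std::vector<std::string> order;
  order.reserve(keys.size());
  for (const auto& key : keys) {
    order.push_back(key.name);
  }
  return order;
}
```

With this ordering, two functions that share a signature, local types, and the start of their opcode sequence land next to each other, which is the property the compressor's sliding window benefits from.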
|
Added Brotli to the comparison. It helps there too, even with the bigger sliding window of 4 MiB.
|
|
Could you add zstd to the comparison, please? |
| : public PostWalker<OpcodeSequenceBuilder, | ||
| UnifiedExpressionVisitor<OpcodeSequenceBuilder>> { | ||
| std::vector<uint32_t> sequence; | ||
| const size_t max_len = 512; |
Suggested change:
- const size_t max_len = 512;
+ const size_t MaxLen = 512;
| sequence.push_back(localSet->type.getID()); | ||
| } else if (auto* const_ = curr->dynCast<Const>()) { | ||
| sequence.push_back(const_->type.getID()); | ||
| } |
For this PR, you can get all enums using wasm-delegations-fields. It would be shorter than the current code. That + the type would make sense I think?
|
|
||
| void run(Module* module) override { | ||
| // If the number of defined functions is small, similarity-based reordering | ||
| // does not help and can regress size due to increasing LEB size. |
Wait, doesn't this matter more for large modules? Where there are enough for LEBs to matter?
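For context on when LEB sizes can change at all, the following C++ sketch computes the encoded size of an unsigned LEB128 value. The helper name `ulebSize` is hypothetical. Each encoded byte carries 7 payload bits, so function indexes below 128 take one byte and indexes below 16384 take two, which suggests reordering can only change index sizes once a module has at least 128 functions in its index space.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes needed to encode `value` as unsigned LEB128.
// Every byte holds 7 bits of payload plus a continuation bit.
size_t ulebSize(uint32_t value) {
  size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}
```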
I did a deeper dive into this. This heuristic was a quick hack to avoid regressing the small v8_box2d.wasm, where reordering made it worse. I assumed this was due to LEBs, or that the original ordering was better, but what was actually happening was that the gzip command was adding the filename into the file!
I got rid of this code locally and now use gzip -9 -k -n, and v8_box2d.wasm also improves by 0.33%.
Oh, funny about the filename! 😄
Nice, yeah, I'd hope this works even on small things.
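One way to guard against the filename pitfall described in this thread is to check the gzip header of each measured file. The sketch below is a hypothetical helper, not part of the PR; it relies on the gzip format's fixed 10-byte member header, in which the fourth byte is the flags field and bit 3 (FNAME) indicates that the original file name is stored in the stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `data` begins with a gzip member header whose FNAME flag
// is set, i.e. the original file name is embedded in the compressed output.
bool gzipStoresFileName(const std::vector<uint8_t>& data) {
  constexpr size_t kFixedHeaderSize = 10; // ID1 ID2 CM FLG MTIME(4) XFL OS
  constexpr uint8_t kId1 = 0x1f;
  constexpr uint8_t kId2 = 0x8b;
  constexpr uint8_t kFlagName = 0x08;     // FLG bit 3 (FNAME)
  if (data.size() < kFixedHeaderSize) {
    return false;
  }
  if (data[0] != kId1 || data[1] != kId2) {
    return false;
  }
  return (data[3] & kFlagName) != 0;
}
```

If this returns true for a benchmark file, the compressed size includes the bytes of the file name, so files with names of different lengths are not directly comparable. Compressing with `gzip -n` avoids storing the name.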