-
Notifications
You must be signed in to change notification settings - Fork 815
[wasm-split] Do multi-split at once #7956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This does multi-splitting of modules at once, rather than splitting them one by one by doing 2-way split n times. Previously when we did multi-splitting we split the 1st module as the "secondary" module assuming all other functions belonging to 2nd-nth modules as "primary" module. And then we repeat the same task for the 2nd module, assuming 3rd-nth module functions belong to the "primary" module. This unnecessarily repeated some tasks that could have been done once. This reduces the running time on a reproducer provided by @biggs0125 before (to fix WebAssembly#7725) from 236s to 88s, reducing it by around 63%. Some side-products of this PR are: - Now we only create a single table to host placeholders (or `ref.null`s in case of `--no-placeholders`) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables. - The names of trampoline functions have been changed in the tests, but semantically they are the same. (e.g. in `test/lit/wasm-split/multi-split.wast`) The reason for the change is, previously we split modules one by one, by the time we split the first module, it assumed functions belonging to other secondary modules were primary functions, but they later changed to trampolines as well. Now they are all named as trampolines, arguably enhacing readability. --- Some detailed analysis run using the reproducer of WebAssembly#7725, a case where we split a module into 301 (1 primary + 300 secondary) modules: - Before this PR: Time: 236.8s Task breakdown: ``` Task Total Time (ms) Percentage --------------------------------------------------------------------------- shareImportableItems 62661.1860 28.24% classifyFunctions 42366.7451 19.09% removeUnusedSecondaryElements 33083.6602 14.91% indirectReferencesToSecondaryFunctions 27852.3143 12.55% indirectCallsToSecondaryFunctions 25091.4263 11.31% moveSecondaryFunctions 14159.9166 6.38% writeModule_secondary 9331.1667 4.20% setupTablePatching 3099.9597 1.40% initExportedPrimaryFuncs 1657.0465 0.75% writeModule_primary 901.6800 0.41% exportImportCalledPrimaryFunctions 892.0132 0.40% thunkExportedSecondaryFunctions 826.8599 0.37% initSecondary 0.2241 0.00% --------------------------------------------------------------------------- Overall Total 221924.1985 100.00% ``` - After this PR: Time : 88.40207334437098 Task breakdown: ``` Task Total Time (ms) Percentage --------------------------------------------------------------------------- shareImportableItems 40176.7000 50.38% removeUnusedSecondaryElements 28635.2000 35.91% moveSecondaryFunctions 5998.9600 7.52% writeModule_secondary 2611.0099 3.27% writeModule_primary 935.7750 1.17% exportImportCalledPrimaryFunctions 646.9860 0.81% indirectReferencesToSecondaryFunctions 318.2980 0.40% classifyFunctions 238.5780 0.30% indirectCallsToSecondaryFunctions 139.1730 0.17% setupTablePatching 44.1466 0.06% thunkExportedSecondaryFunctions 3.9405 0.00% initExportedPrimaryFuncs 0.6870 0.00% --------------------------------------------------------------------------- Overall Total 79749.4539 100.00% ``` We can see time taken in `classifyFunctions`, `indirectReferencesToSecondaryFunctions`, and `indirectCallsToSecondaryFunctions` has reduced basically to nothing. This is because now we can all functions only once in those functions, where we used to scan the functions n times or similar. Now `shareImportableItems` and `moveSecondaryFunctions` take up around 85% of the execution time. The reason `shareImportableItems` takes so long is the reproducer has 90k globals. ``` Analysis of shareImportableItems: Sub-Task Total Time (ms) Percentage --------------------------------------------------------------------------- globals 41166.4904 98.35% tables 535.5134 1.28% tags 10.3355 0.02% memories 7.5937 0.02% exports 1.5482 0.00% --------------------------------------------------------------------------- Total 41857.1000 100.00% ``` ('exports' meaning processing existing exports) We can probably improve this by selectively importing module items, as already noted by the existing TODO. `moveSecondaryFunctions` basically just runs RemoveUnusedModuleElements on each module. We can also consider parallelizing `moveSecondaryFunctions` by modules but not sure how much improvements it can bring given that the pass is already parallized in function granularity. But if we export only used items in `shareImportableItems`, running this pass may become unnecessary after all.
2089525
to
9bfbbe6
Compare
"Hide whitespace" will make the diff easier to view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! This is a really nice improvement.
src/ir/module-splitting.h
Outdated
// exists. May or may not include imported functions, which are always kept in | ||
// the primary module regardless. | ||
std::set<Name> secondaryFuncs; | ||
// Module names to the functions that should be in the modules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just secondary modules? It would good to clarify in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have this variable anymore after 0c853c5.
src/ir/module-splitting.h
Outdated
|
||
struct Results { | ||
std::unique_ptr<Module> secondary; | ||
std::map<Name, std::unique_ptr<Module>> secondaryPtrMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be reasonable to use a std::vector
instead of a std::map
here and in the Config
to avoid having to worry about generating names before splitting? It seems that would dramatically reduce the number of necessary map lookups necessary because it would lend itself to using indices as the keys used to identify secondary modules and look up their associated data in vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0c853c5. Now we don't pass names from wasm-split.cpp
to ModuleSplitter
at all. We needed a reverse map of function name to index though.
src/tools/wasm-split/wasm-split.cpp
Outdated
currModule = Names::getValidName( | ||
name, [&](Name n) { return moduleNames.find(n) == moduleNames.end(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the names are provided by the user, it might be better to fail loudly if a module name is repeated rather than silently changing one of the names to avoid the collision. (Or keep the current behavior of appending the new functions to the existing module, but that's also probably more dangerous than failing loudly.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 7e7fa66
src/tools/wasm-split/wasm-split.cpp
Outdated
currModule = Names::getValidName( | ||
name, [&](Name n) { return moduleNames.find(n) == moduleNames.end(); }); | ||
moduleNames.insert(currModule); | ||
currFuncs = &config.moduleToFuncs[currModule]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid the second lookup here by using the iterator returned by moduleNames.insert
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have this map anymore after 0c853c5.
collector.walkFunction(func); | ||
}); | ||
|
||
CalledPrimaryToModules calledPrimaryToModules; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to do a walkModuleCode
or similar here to pick up function references in globals, too. That can be investigated separately, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Secondary modules don't have (defined) globals, because we don't split them, no?
Name exportName = Names::getValidExportName(primary, baseName); | ||
primary.addExport(new Export(exportName, kind, primaryItem.name)); | ||
secondaryItem.base = exportName; | ||
exports[std::make_pair(kind, primaryItem.name)] = exportName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a fix to avoid unnecessary duplicate exports? If so, it might be nice to separate that out to reduce test churn in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make any more test churn. This does not have any effect on the current main branch because, we only do a single two-way split at a time, there's no duplicate export. For example, everything you export when you split the first module becomes "existing exports" you need to process in the beginning of this function by the time you split second module here:
binaryen/src/ir/module-splitting.cpp
Lines 833 to 843 in 2010baa
// Map internal names to (one of) their corresponding export names. Don't | |
// consider functions because they have already been imported and exported as | |
// necessary. | |
std::unordered_map<std::pair<ExternalKind, Name>, Name> exports; | |
for (auto& ex : primary.exports) { | |
if (ex->kind != ExternalKind::Function) { | |
if (auto* name = ex->getInternalName()) { | |
exports[std::make_pair(ex->kind, *name)] = ex->name; | |
} | |
} | |
} |
;; MOD1: (import "" "table" (table $timport$0 3 funcref)) | ||
|
||
;; MOD1: (import "" "std::operator<<\\28std::__2::basic_ostream<char\\2c\\20std::__2::char_traits<char>>&\\2c\\20wasm::Module&\\29" (func $std::operator<<\28std::__2::basic_ostream<char\2c\20std::__2::char_traits<char>>&\2c\20wasm::Module&\29 (result f32))) | ||
;; MOD1: (import "" "trampoline_std::operator<<\\28std::__2::basic_ostream<char\\2c\\20std::__2::char_traits<char>>&\\2c\\20wasm::Module&\\29" (func $trampoline_std::operator<<\28std::__2::basic_ostream<char\2c\20std::__2::char_traits<char>>&\2c\20wasm::Module&\29 (result f32))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split the unrelated improvements like adding "trampoline" prefixes to names out into preliminary PRs to minimize test churn? It would also be nice if we can avoid changing the order of trampoline functions in the output (but I can imagine that might be more effort than it's worth).
- Remove old comments + fix comments - Rename a variable - Take primary module's symbolmap and placeholdermap writing out of the for loop
Co-authored-by: Thomas Lively <tlively123@gmail.com>
// Avoid visitRefFunc on element segment data | ||
void walkElementSegment(ElementSegment* segment) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary because CallIndirector
does not have visitRefFunc
. (I guess this class used to have visitRefFunc
and this was added then)
This does multi-splitting of modules at once, rather than splitting them one by one by doing 2-way split n times. Previously when we did multi-splitting we split the 1st module as the "secondary" module assuming all other functions belonging to 2nd-nth modules as "primary" module. And then we repeat the same task for the 2nd module, assuming 3rd-nth module functions belong to the "primary" module. This unnecessarily repeated some tasks that could have been done once.
This reduces the running time on a reproducer provided by @biggs0125 before (to fix #7725) from 236s to 88s on my machine, reducing it by around 63%.
Some side-products of this PR are:
ref.null
s in case of--no-placeholders
) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables.test/lit/wasm-split/multi-split.wast
) The reason for the change is, previously we split modules one by one, by the time we split the first module, it assumed functions belonging to other secondary modules were primary functions, but they later changed to trampolines as well. Now they are all named as trampolines, arguably enhancing readability.Some detailed analysis run using the reproducer of #7725, a case where we split a module into 301 (1 primary + 300 secondary) modules:
Time: 236.8s
Task breakdown:
Time : 88.4s
Task breakdown:
We can see time taken in
classifyFunctions
,indirectReferencesToSecondaryFunctions
, andindirectCallsToSecondaryFunctions
has reduced basically to nothing. This is because now we can scan all functions only once in those functions, where we used to scan the functions n times or similar.Now
shareImportableItems
andmoveSecondaryFunctions
take up around 85% of the execution time. The reasonshareImportableItems
takes so long is the reproducer has 90k globals.('exports' meaning processing existing exports)
We can probably improve this by selectively importing module items, as already noted by the existing TODO.
moveSecondaryFunctions
basically just runs RemoveUnusedModuleElements on each module. But if we export only used items inshareImportableItems
, running this pass may become unnecessary after all.