[wasm-split] Do multi-split at once #7956

aheejin · 2025-10-08T01:06:22Z

This PR performs multi-splitting in a single pass, rather than splitting modules one by one by running a 2-way split n times. Previously, multi-splitting first split off the 1st module as the "secondary" module, treating all functions belonging to the 2nd through nth modules as part of the "primary" module. It then repeated the same process for the 2nd module, treating the functions of the 3rd through nth modules as "primary", and so on. This repeated work that could have been done once.

This reduces the running time on a reproducer provided by @biggs0125 before (to fix #7725) from 236s to 88s on my machine, reducing it by around 63%.

Some side-products of this PR are:

- Now we only create a single table to host placeholders (or ref.nulls in case of --no-placeholders) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables.
- The names of trampoline functions have changed in the tests (e.g. in test/lit/wasm-split/multi-split.wast), but semantically they are the same. Previously we split modules one by one, so when we split the first module, functions belonging to other secondary modules were treated as primary functions, even though they later became trampolines as well. Now they are all named as trampolines, arguably enhancing readability.

Some detailed analysis run using the reproducer of #7725, a case where we split a module into 301 (1 primary + 300 secondary) modules:

Before this PR:

Time: 236.8s

Task breakdown:

Task                                          Total Time (ms)      Percentage
-----------------------------------------------------------------------------
shareImportableItems                               62661.1860          28.24%
classifyFunctions                                  42366.7451          19.09%
removeUnusedSecondaryElements                      33083.6602          14.91%
indirectReferencesToSecondaryFunctions             27852.3143          12.55%
indirectCallsToSecondaryFunctions                  25091.4263          11.31%
moveSecondaryFunctions                             14159.9166           6.38%
writeModule_secondary                               9331.1667           4.20%
setupTablePatching                                  3099.9597           1.40%
initExportedPrimaryFuncs                            1657.0465           0.75%
writeModule_primary                                  901.6800           0.41%
exportImportCalledPrimaryFunctions                   892.0132           0.40%
thunkExportedSecondaryFunctions                      826.8599           0.37%
-----------------------------------------------------------------------------
Overall Total                                     221924.1985         100.00%

After this PR:

Time: 88.4s

Task breakdown:

Task                                          Total Time (ms)      Percentage
-----------------------------------------------------------------------------
shareImportableItems                               40176.7000          50.38%
removeUnusedSecondaryElements                      28635.2000          35.91%
moveSecondaryFunctions                              5998.9600           7.52%
writeModule_secondary                               2611.0099           3.27%
writeModule_primary                                  935.7750           1.17%
exportImportCalledPrimaryFunctions                   646.9860           0.81%
indirectReferencesToSecondaryFunctions               318.2980           0.40%
classifyFunctions                                    238.5780           0.30%
indirectCallsToSecondaryFunctions                    139.1730           0.17%
setupTablePatching                                    44.1466           0.06%
thunkExportedSecondaryFunctions                        3.9405           0.00%
initExportedPrimaryFuncs                               0.6870           0.00%
-----------------------------------------------------------------------------
Overall Total                                      79749.4539         100.00%

We can see that the time taken in classifyFunctions, indirectReferencesToSecondaryFunctions, and indirectCallsToSecondaryFunctions has dropped to almost nothing. This is because those functions now scan all functions only once, where they used to scan them n times or similar.
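To illustrate why a single scan is enough, here is a minimal sketch on a toy model, not the code in this PR. It assumes that a function-to-module map is already available, and the names `Func` and `collectCrossModuleCallees` are hypothetical. Each function is visited exactly once, and every callee that lives in a different module than its caller is recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model: a function is a name plus the names of the
// functions it calls directly. Module index 0 stands for the primary module.
struct Func {
  std::string name;
  std::vector<std::string> callees;
};

// Visits each function exactly once and returns the names of all functions
// that are called from a module other than their own.
std::set<std::string> collectCrossModuleCallees(
  const std::vector<Func>& funcs,
  const std::unordered_map<std::string, std::size_t>& funcToModule) {
  std::set<std::string> crossModule;
  for (const auto& func : funcs) {
    auto callerIt = funcToModule.find(func.name);
    // Assumption of this sketch: every function has been classified already.
    assert(callerIt != funcToModule.end() && "unclassified function");
    for (const auto& callee : func.callees) {
      auto calleeIt = funcToModule.find(callee);
      if (calleeIt != funcToModule.end() &&
          calleeIt->second != callerIt->second) {
        crossModule.insert(callee);
      }
    }
  }
  return crossModule;
}
```

With n secondary modules, repeating such a scan once per split costs roughly n passes over all functions, while the version above costs one pass regardless of n.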

Now shareImportableItems and removeUnusedSecondaryElements take up around 85% of the execution time (50.38% + 35.91% in the table above). The reason shareImportableItems takes so long is that the reproducer has 90k globals.

Analysis of shareImportableItems:
Sub-Task                                      Total Time (ms)      Percentage
-----------------------------------------------------------------------------
globals                                            41166.4904          98.35%
tables                                               535.5134           1.28%
tags                                                  10.3355           0.02%
memories                                               7.5937           0.02%
exports                                                1.5482           0.00%
-----------------------------------------------------------------------------
Total                                              41857.1000         100.00%

('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as already noted by the existing TODO.

removeUnusedSecondaryElements basically just runs RemoveUnusedModuleElements on each module. We could also consider parallelizing it by modules, but it is not clear how much improvement that would bring, given that the pass is already parallelized at function granularity. But if we export only used items in shareImportableItems, running this pass may become unnecessary after all.

@biggs0125

aheejin · 2025-10-08T01:10:06Z

"Hide whitespace" will make the diff easier to view.

tlively

Great work! This is a really nice improvement.

tlively · 2025-10-08T01:17:57Z

src/ir/module-splitting.h

-  // exists. May or may not include imported functions, which are always kept in
-  // the primary module regardless.
-  std::set<Name> secondaryFuncs;
+  // Module names to the functions that should be in the modules.


Is this just secondary modules? It would be good to clarify in the comment.

We don't have this variable anymore after 0c853c5.

tlively · 2025-10-08T01:22:35Z

src/ir/module-splitting.h


 struct Results {
-  std::unique_ptr<Module> secondary;
+  std::map<Name, std::unique_ptr<Module>> secondaryPtrMap;


Would it be reasonable to use a std::vector instead of a std::map here and in the Config to avoid having to worry about generating names before splitting? It seems that would dramatically reduce the number of necessary map lookups because it would lend itself to using indices as the keys used to identify secondary modules and look up their associated data in vectors.

Done in 0c853c5. Now we don't pass names from wasm-split.cpp to ModuleSplitter at all. We needed a reverse map of function name to index though.
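As a rough illustration of the vector-plus-reverse-map shape described above (a sketch only, not the code from 0c853c5): the type alias `SecondaryFuncs` and the helper `buildFuncToSecondaryIndex` are hypothetical names, and the sketch assumes each function is assigned to at most one secondary module.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical config shape: secondaryFuncs[i] lists the functions that
// should end up in the i-th secondary module. No module names are involved.
using SecondaryFuncs = std::vector<std::vector<std::string>>;

// Builds the reverse map from function name to secondary module index.
std::unordered_map<std::string, std::size_t>
buildFuncToSecondaryIndex(const SecondaryFuncs& secondaryFuncs) {
  std::unordered_map<std::string, std::size_t> funcToIndex;
  for (std::size_t i = 0; i < secondaryFuncs.size(); ++i) {
    for (const auto& func : secondaryFuncs[i]) {
      bool inserted = funcToIndex.emplace(func, i).second;
      // Assumption of this sketch: a function belongs to only one module.
      assert(inserted && "function assigned to more than one module");
      (void)inserted;
    }
  }
  return funcToIndex;
}
```

Per-module data can then live in plain vectors indexed by the same `i`, so the only hash lookup left is the one from function name to index.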

tlively · 2025-10-08T01:26:22Z

src/tools/wasm-split/wasm-split.cpp

+      currModule = Names::getValidName(
+        name, [&](Name n) { return moduleNames.find(n) == moduleNames.end(); });


Since the names are provided by the user, it might be better to fail loudly if a module name is repeated rather than silently changing one of the names to avoid the collision. (Or keep the current behavior of appending the new functions to the existing module, but that's also probably more dangerous than failing loudly.)

Done: 7e7fa66
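For readers following the thread, a minimal sketch of the fail-loudly behavior requested above. This is not the code from 7e7fa66: `checkUniqueModuleNames` is a hypothetical helper, and `std::runtime_error` stands in for whatever error mechanism the tool actually uses.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Throws as soon as a module name is seen a second time, instead of silently
// renaming the module or merging its functions into the existing one.
void checkUniqueModuleNames(const std::vector<std::string>& names) {
  std::set<std::string> seen;
  for (const auto& name : names) {
    // insert() reports through `.second` whether the name was new.
    if (!seen.insert(name).second) {
      throw std::runtime_error("duplicate module name: " + name);
    }
  }
}
```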

tlively · 2025-10-08T01:28:16Z

src/tools/wasm-split/wasm-split.cpp

+      currModule = Names::getValidName(
+        name, [&](Name n) { return moduleNames.find(n) == moduleNames.end(); });
+      moduleNames.insert(currModule);
+      currFuncs = &config.moduleToFuncs[currModule];


We can avoid the second lookup here by using the iterator returned by moduleNames.insert.

We don't have this map anymore after 0c853c5.
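Although the map is gone, the general pattern behind the suggestion can be sketched as follows. This is an illustration under assumptions, not code from the PR: it uses `std::map::try_emplace` (C++17) rather than the `std::set` insert from the diff, and `ModuleToFuncs` and `addModule` are hypothetical names. The iterator returned by the insertion is reused, so no second lookup by key is needed.

```cpp
#include <map>
#include <string>
#include <vector>

using ModuleToFuncs = std::map<std::string, std::vector<std::string>>;

// Inserts an empty function list for `name` and returns a pointer to it, or
// nullptr if the module already existed. The iterator from try_emplace is
// reused instead of looking the key up again with operator[].
std::vector<std::string>* addModule(ModuleToFuncs& moduleToFuncs,
                                    const std::string& name) {
  auto [it, inserted] = moduleToFuncs.try_emplace(name);
  if (!inserted) {
    return nullptr;
  }
  return &it->second;
}
```

Pointers to values stored in a `std::map` stay valid across later insertions, which is what makes returning `&it->second` safe here.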

tlively · 2025-10-08T02:31:45Z

src/ir/module-splitting.cpp

+        collector.walkFunction(func);
+      });
+
+    CalledPrimaryToModules calledPrimaryToModules;


I wonder if we need to do a walkModuleCode or similar here to pick up function references in globals, too. That can be investigated separately, though.

Secondary modules don't have (defined) globals, because we don't split them, no?

tlively · 2025-10-08T02:36:49Z

src/ir/module-splitting.cpp

      Name exportName = Names::getValidExportName(primary, baseName);
      primary.addExport(new Export(exportName, kind, primaryItem.name));
      secondaryItem.base = exportName;
+      exports[std::make_pair(kind, primaryItem.name)] = exportName;


Is this a fix to avoid unnecessary duplicate exports? If so, it might be nice to separate that out to reduce test churn in this PR.

It doesn't add any test churn. On the current main branch this has no effect: because we only do a single two-way split at a time, there are no duplicate exports. For example, everything you export when you split the first module becomes an "existing export" by the time you split the second module, and those are processed at the beginning of this function, here:

binaryen/src/ir/module-splitting.cpp

Lines 833 to 843 in 2010baa

  // Map internal names to (one of) their corresponding export names. Don't
  // consider functions because they have already been imported and exported as
  // necessary.
  std::unordered_map<std::pair<ExternalKind, Name>, Name> exports;
  for (auto& ex : primary.exports) {
    if (ex->kind != ExternalKind::Function) {
      if (auto* name = ex->getInternalName()) {
        exports[std::make_pair(ex->kind, *name)] = ex->name;
      }
    }
  }
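The role of the `exports` map when several secondary modules share the same primary item can be sketched like this. It is a simplified model, not the Binaryen implementation: `Kind`, `Export`, `Primary`, and `getOrAddExport` are hypothetical, the export naming scheme is invented for the example, and `std::map` is used in place of `std::unordered_map` to avoid needing a hash for the pair key.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Global, Table, Memory, Tag };

struct Export {
  std::string exportName;
  Kind kind;
  std::string internalName;
};

struct Primary {
  std::vector<Export> exports;
};

using ExportCache = std::map<std::pair<Kind, std::string>, std::string>;

// Returns the export name for (kind, internalName), adding a new export to
// the primary module only the first time the item is requested.
std::string getOrAddExport(Primary& primary,
                           ExportCache& cache,
                           Kind kind,
                           const std::string& internalName) {
  auto key = std::make_pair(kind, internalName);
  auto it = cache.find(key);
  if (it != cache.end()) {
    return it->second; // already exported for an earlier secondary module
  }
  // Hypothetical naming scheme: unique because it encodes the export count.
  std::string exportName = "shared$" + std::to_string(primary.exports.size());
  primary.exports.push_back({exportName, kind, internalName});
  cache.emplace(key, exportName);
  return exportName;
}
```

When all secondary modules are processed in one pass, recording each new export in the cache is what keeps the second and later modules from exporting the same item again.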

tlively · 2025-10-08T02:42:26Z

test/lit/wasm-split/multi-split-escape-names.wast

+ ;; MOD1:      (import "" "table" (table $timport$0 3 funcref))

- ;; MOD1:      (import "" "std::operator<<\\28std::__2::basic_ostream<char\\2c\\20std::__2::char_traits<char>>&\\2c\\20wasm::Module&\\29" (func $std::operator<<\28std::__2::basic_ostream<char\2c\20std::__2::char_traits<char>>&\2c\20wasm::Module&\29 (result f32)))
+ ;; MOD1:      (import "" "trampoline_std::operator<<\\28std::__2::basic_ostream<char\\2c\\20std::__2::char_traits<char>>&\\2c\\20wasm::Module&\\29" (func $trampoline_std::operator<<\28std::__2::basic_ostream<char\2c\20std::__2::char_traits<char>>&\2c\20wasm::Module&\29 (result f32)))


Can we split the unrelated improvements like adding "trampoline" prefixes to names out into preliminary PRs to minimize test churn? It would also be nice if we can avoid changing the order of trampoline functions in the output (but I can imagine that might be more effort than it's worth).


aheejin · 2025-10-09T22:17:12Z

src/ir/module-splitting.cpp

-    // Avoid visitRefFunc on element segment data
-    void walkElementSegment(ElementSegment* segment) {}


This is unnecessary because CallIndirector does not have visitRefFunc. (I guess this class used to have visitRefFunc and this was added then)


aheejin requested a review from tlively October 8, 2025 01:06

aheejin force-pushed the wasm_split_multi_once branch from 2089525 to 9bfbbe6 Compare October 8, 2025 01:08

tlively reviewed Oct 8, 2025


aheejin and others added 3 commits October 8, 2025 08:44

Misc fixes

7de862e

- Remove old comments + fix comments
- Rename a variable
- Take primary module's symbolmap and placeholdermap writing out of the for loop

Comment fix

4d2475e

Apply suggestions from code review

566249c

Co-authored-by: Thomas Lively <tlively123@gmail.com>

aheejin commented Oct 9, 2025


aheejin added 6 commits October 9, 2025 22:24

Fixing errors after apply suggestions

d071d5c

Make CallCollector calls one-liner

aece184

Use vector instead of map to store secondary modules

0c853c5

Error out when there are duplicate module names

7e7fa66

Example test fix after "Use vector instead of map to store secondary modules"

d275cc4

Revert an accidental change

46c907f
