Skip to content

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Oct 8, 2025

This does multi-splitting of modules at once, rather than splitting them one by one by doing 2-way split n times. Previously when we did multi-splitting we split the 1st module as the "secondary" module assuming all other functions belonging to 2nd-nth modules as "primary" module. And then we repeat the same task for the 2nd module, assuming 3rd-nth module functions belong to the "primary" module. This unnecessarily repeated some tasks that could have been done once.

This reduces the running time on a reproducer provided by @biggs0125 before (to fix #7725) from 236s to 88s on my machine, reducing it by around 63%.

Some side-products of this PR are:

  • Now we only create a single table to host placeholders (or ref.nulls in case of --no-placeholders) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables.
  • The names of trampoline functions have been changed in the tests, but semantically they are the same. (e.g. in test/lit/wasm-split/multi-split.wast) The reason for the change is, previously we split modules one by one, by the time we split the first module, it assumed functions belonging to other secondary modules were primary functions, but they later changed to trampolines as well. Now they are all named as trampolines, arguably enhancing readability.

Some detailed analysis run using the reproducer of #7725, a case where we split a module into 301 (1 primary + 300 secondary) modules:

  • Before this PR:

Time: 236.8s

Task breakdown:

Task                                          Total Time (ms)      Percentage
-----------------------------------------------------------------------------
shareImportableItems                               62661.1860          28.24%
classifyFunctions                                  42366.7451          19.09%
removeUnusedSecondaryElements                      33083.6602          14.91%
indirectReferencesToSecondaryFunctions             27852.3143          12.55%
indirectCallsToSecondaryFunctions                  25091.4263          11.31%
moveSecondaryFunctions                             14159.9166           6.38%
writeModule_secondary                               9331.1667           4.20%
setupTablePatching                                  3099.9597           1.40%
initExportedPrimaryFuncs                            1657.0465           0.75%
writeModule_primary                                  901.6800           0.41%
exportImportCalledPrimaryFunctions                   892.0132           0.40%
thunkExportedSecondaryFunctions                      826.8599           0.37%
-----------------------------------------------------------------------------
Overall Total                                     221924.1985         100.00%
  • After this PR:

Time : 88.4s

Task breakdown:

Task                                          Total Time (ms)      Percentage
-----------------------------------------------------------------------------
shareImportableItems                               40176.7000          50.38%
removeUnusedSecondaryElements                      28635.2000          35.91%
moveSecondaryFunctions                              5998.9600           7.52%
writeModule_secondary                               2611.0099           3.27%
writeModule_primary                                  935.7750           1.17%
exportImportCalledPrimaryFunctions                   646.9860           0.81%
indirectReferencesToSecondaryFunctions               318.2980           0.40%
classifyFunctions                                    238.5780           0.30%
indirectCallsToSecondaryFunctions                    139.1730           0.17%
setupTablePatching                                    44.1466           0.06%
thunkExportedSecondaryFunctions                        3.9405           0.00%
initExportedPrimaryFuncs                               0.6870           0.00%
-----------------------------------------------------------------------------
Overall Total                                      79749.4539         100.00%

We can see time taken in classifyFunctions, indirectReferencesToSecondaryFunctions, and indirectCallsToSecondaryFunctions has reduced basically to nothing. This is because now we can scan all functions only once in those functions, where we used to scan the functions n times or similar.

Now shareImportableItems and moveSecondaryFunctions take up around 85% of the execution time. The reason shareImportableItems takes so long is the reproducer has 90k globals.

Analysis of shareImportableItems:
Sub-Task                                      Total Time (ms)      Percentage
-----------------------------------------------------------------------------
globals                                            41166.4904          98.35%
tables                                               535.5134           1.28%
tags                                                  10.3355           0.02%
memories                                               7.5937           0.02%
exports                                                1.5482           0.00%
-----------------------------------------------------------------------------
Total                                              41857.1000         100.00%

('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as already noted by the existing TODO.

moveSecondaryFunctions basically just runs RemoveUnusedModuleElements on each module. But if we export only used items in shareImportableItems, running this pass may become unnecessary after all.

This does multi-splitting of modules at once, rather than splitting them
one by one by doing 2-way split n times. Previously when we did
multi-splitting we split the 1st module as the "secondary" module
assuming all other functions belonging to 2nd-nth modules as "primary"
module. And then we repeat the same task for the 2nd module, assuming
3rd-nth module functions belong to the "primary" module. This
unnecessarily repeated some tasks that could have been done once.

This reduces the running time on a reproducer provided by @biggs0125
before (to fix WebAssembly#7725) from 236s to 88s, reducing it by around 63%.

Some side-products of this PR are:
- Now we only create a single table to host placeholders (or `ref.null`s
  in case of `--no-placeholders`) even when reference-types is enabled.
  Previously we created a table per secondary module, resulting in n
  tables.
- The names of trampoline functions have been changed in the tests, but
  semantically they are the same. (e.g. in
  `test/lit/wasm-split/multi-split.wast`) The reason for the change is,
  previously we split modules one by one, by the time we split the first
  module, it assumed functions belonging to other secondary modules were
  primary functions, but they later changed to trampolines as well. Now
  they are all named as trampolines, arguably enhacing readability.

---

Some detailed analysis run using the reproducer of WebAssembly#7725, a case where
we split a module into 301 (1 primary + 300 secondary) modules:

- Before this PR:

Time: 236.8s

Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               62661.1860          28.24%
classifyFunctions                                  42366.7451          19.09%
removeUnusedSecondaryElements                      33083.6602          14.91%
indirectReferencesToSecondaryFunctions             27852.3143          12.55%
indirectCallsToSecondaryFunctions                  25091.4263          11.31%
moveSecondaryFunctions                             14159.9166           6.38%
writeModule_secondary                               9331.1667           4.20%
setupTablePatching                                  3099.9597           1.40%
initExportedPrimaryFuncs                            1657.0465           0.75%
writeModule_primary                                  901.6800           0.41%
exportImportCalledPrimaryFunctions                   892.0132           0.40%
thunkExportedSecondaryFunctions                      826.8599           0.37%
initSecondary                                          0.2241           0.00%
---------------------------------------------------------------------------
Overall Total                                     221924.1985         100.00%
```

- After this PR:

Time : 88.40207334437098

Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               40176.7000          50.38%
removeUnusedSecondaryElements                      28635.2000          35.91%
moveSecondaryFunctions                              5998.9600           7.52%
writeModule_secondary                               2611.0099           3.27%
writeModule_primary                                  935.7750           1.17%
exportImportCalledPrimaryFunctions                   646.9860           0.81%
indirectReferencesToSecondaryFunctions               318.2980           0.40%
classifyFunctions                                    238.5780           0.30%
indirectCallsToSecondaryFunctions                    139.1730           0.17%
setupTablePatching                                    44.1466           0.06%
thunkExportedSecondaryFunctions                        3.9405           0.00%
initExportedPrimaryFuncs                               0.6870           0.00%
---------------------------------------------------------------------------
Overall Total                                      79749.4539         100.00%
```

We can see time taken in `classifyFunctions`,
`indirectReferencesToSecondaryFunctions`, and
`indirectCallsToSecondaryFunctions` has reduced basically to nothing.
This is because now we can all functions only once in those functions,
where we used to scan the functions n times or similar.

Now `shareImportableItems` and `moveSecondaryFunctions` take up around
85% of the execution time. The reason `shareImportableItems` takes so
long is the reproducer has 90k globals.
```
Analysis of shareImportableItems:
Sub-Task                                      Total Time (ms)      Percentage
---------------------------------------------------------------------------
globals                                            41166.4904          98.35%
tables                                               535.5134           1.28%
tags                                                  10.3355           0.02%
memories                                               7.5937           0.02%
exports                                                1.5482           0.00%
---------------------------------------------------------------------------
Total                                              41857.1000         100.00%
```
('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as
already noted by the existing TODO.

`moveSecondaryFunctions` basically just runs RemoveUnusedModuleElements
on each module. We can also consider parallelizing
`moveSecondaryFunctions` by modules but not sure how much improvements
it can bring given that the pass is already parallized in function
granularity. But if we export only used items in `shareImportableItems`,
running this pass may become unnecessary after all.
@aheejin aheejin requested a review from tlively October 8, 2025 01:06
@aheejin aheejin force-pushed the wasm_split_multi_once branch from 2089525 to 9bfbbe6 Compare October 8, 2025 01:08
@aheejin
Copy link
Member Author

aheejin commented Oct 8, 2025

"Hide whitespace" will make the diff easier to view.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! This is a really nice improvement.

// exists. May or may not include imported functions, which are always kept in
// the primary module regardless.
std::set<Name> secondaryFuncs;
// Module names to the functions that should be in the modules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just secondary modules? It would good to clarify in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this variable anymore after 0c853c5.


struct Results {
std::unique_ptr<Module> secondary;
std::map<Name, std::unique_ptr<Module>> secondaryPtrMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to use a std::vector instead of a std::map here and in the Config to avoid having to worry about generating names before splitting? It seems that would dramatically reduce the number of necessary map lookups necessary because it would lend itself to using indices as the keys used to identify secondary modules and look up their associated data in vectors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0c853c5. Now we don't pass names from wasm-split.cpp to ModuleSplitter at all. We needed a reverse map of function name to index though.

Comment on lines 411 to 412
currModule = Names::getValidName(
name, [&](Name n) { return moduleNames.find(n) == moduleNames.end(); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the names are provided by the user, it might be better to fail loudly if a module name is repeated rather than silently changing one of the names to avoid the collision. (Or keep the current behavior of appending the new functions to the existing module, but that's also probably more dangerous than failing loudly.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 7e7fa66

currModule = Names::getValidName(
name, [&](Name n) { return moduleNames.find(n) == moduleNames.end(); });
moduleNames.insert(currModule);
currFuncs = &config.moduleToFuncs[currModule];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid the second lookup here by using the iterator returned by moduleNames.insert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this map anymore after 0c853c5.

collector.walkFunction(func);
});

CalledPrimaryToModules calledPrimaryToModules;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to do a walkModuleCode or similar here to pick up function references in globals, too. That can be investigated separately, though.

Copy link
Member Author

@aheejin aheejin Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secondary modules don't have (defined) globals, because we don't split them, no?

Name exportName = Names::getValidExportName(primary, baseName);
primary.addExport(new Export(exportName, kind, primaryItem.name));
secondaryItem.base = exportName;
exports[std::make_pair(kind, primaryItem.name)] = exportName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a fix to avoid unnecessary duplicate exports? If so, it might be nice to separate that out to reduce test churn in this PR.

Copy link
Member Author

@aheejin aheejin Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make any more test churn. This does not have any effect on the current main branch because, we only do a single two-way split at a time, there's no duplicate export. For example, everything you export when you split the first module becomes "existing exports" you need to process in the beginning of this function by the time you split second module here:

// Map internal names to (one of) their corresponding export names. Don't
// consider functions because they have already been imported and exported as
// necessary.
std::unordered_map<std::pair<ExternalKind, Name>, Name> exports;
for (auto& ex : primary.exports) {
if (ex->kind != ExternalKind::Function) {
if (auto* name = ex->getInternalName()) {
exports[std::make_pair(ex->kind, *name)] = ex->name;
}
}
}

;; MOD1: (import "" "table" (table $timport$0 3 funcref))

;; MOD1: (import "" "std::operator<<\\28std::__2::basic_ostream<char\\2c\\20std::__2::char_traits<char>>&\\2c\\20wasm::Module&\\29" (func $std::operator<<\28std::__2::basic_ostream<char\2c\20std::__2::char_traits<char>>&\2c\20wasm::Module&\29 (result f32)))
;; MOD1: (import "" "trampoline_std::operator<<\\28std::__2::basic_ostream<char\\2c\\20std::__2::char_traits<char>>&\\2c\\20wasm::Module&\\29" (func $trampoline_std::operator<<\28std::__2::basic_ostream<char\2c\20std::__2::char_traits<char>>&\2c\20wasm::Module&\29 (result f32)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split the unrelated improvements like adding "trampoline" prefixes to names out into preliminary PRs to minimize test churn? It would also be nice if we can avoid changing the order of trampoline functions in the output (but I can imagine that might be more effort than it's worth).

aheejin and others added 3 commits October 8, 2025 08:44
- Remove old comments + fix comments
- Rename a variable
- Take primary module's symbolmap and placeholdermap writing out of the
  for loop
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Comment on lines -655 to -656
// Avoid visitRefFunc on element segment data
void walkElementSegment(ElementSegment* segment) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary because CallIndirector does not have visitRefFunc. (I guess this class used to have visitRefFunc and this was added then)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[wasm-split] Not splitting correctly after wasm-opt
2 participants