
Propose a new LLVM CPU configuration: Lime1 #235

Merged
merged 3 commits on Nov 19, 2024

Conversation

sunfishcode
Member

Overview

I propose a new target CPU configuration for LLVM, and any toolchain or engine that wishes to share it, named "trail1".

Trail1 would enable mutable-globals, bulk-memory-opt*, multivalue, sign-ext, nontrapping-fptoint, extended-const, and call-indirect-overlong**. The specific set is open to debate here. The idea is that once we stabilize it, Trail1 would be fixed: we would not add or remove features from that point on. Trail1 could be followed by Trail2 and so on in the future if we wish to add or remove features later. And we can always define new non-Trail CPUs as well, if there is a need for different sets.

* bulk-memory-opt would be a new feature that includes just `memory.copy` and `memory.fill`, and not the rest of bulk-memory.

** call-indirect-overlong would be a new feature that introduces just the new `call_indirect` encoding from reference-types, and not actual reference types.
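To make the call-indirect-overlong point concrete: the MVP encodes the `call_indirect` table-index immediate as the literal byte `0x00`, while the reference-types encoding reads it as a u32 LEB128 value, which makes redundant ("overlong") encodings of the same index valid. A minimal decoder sketch (illustrative only, not LLVM or engine code):

```python
def decode_uleb128(data: bytes) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, bytes consumed)."""
    result = 0
    shift = 0
    for consumed, byte in enumerate(data, start=1):
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear: last byte of the encoding
            return result, consumed
        shift += 7
    raise ValueError("truncated LEB128 value")

# A one-byte and a two-byte ("overlong") encoding of table index 0.
# An MVP-only engine accepts just the first; call-indirect-overlong
# means also accepting the second, as the reference-types encoding does.
decode_uleb128(b"\x00")      # value 0, 1 byte
decode_uleb128(b"\x80\x00")  # value 0, 2 bytes
```

One reason toolchains care: a fixed-width (padded) LEB can be patched in place at link time without shifting the surrounding code.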

Further background

The LLVM Wasm target defaults to a "generic" CPU, which adds features over time as major engines support them. That's good for some users, but other users would benefit from more stable CPU options. Currently the only stable CPU in LLVM is "mvp"; however, that's pretty old at this point. We've always planned to add more CPUs, but haven't had convenient occasions. The Wasm spec gives us "Wasm 1.0", "Wasm 2.0", and so on; however, while those names may sound like curated user-facing versions, in practice they're more like snapshots of the spec at moments when the spec process reaches certain points.

There have been discussions at the CG level about defining official subsets of the language, for engines that don't want to implement threads or other things. If the CG ends up defining specific subsets, we can certainly add new stable CPUs based on them. However, I expect that will be some time in the future yet, and it seems beneficial to define something LLVM can use sooner as well.

So because there are no convenient external milestones, let's invent one! The name "Trail" is arbitrarily chosen, and open to debate here. We mainly just need to pick something that won't be misleading. Just as the Wasm spec itself has no concept of "wasm32", it won't have any concept of "trail" either. Trail will just be an informal tooling and engine convention.

@sunfishcode
Member Author

I've now posted a draft PR to LLVM showing specifically what the "trail1" CPU would look like in llvm/llvm-project#112035.

@alexcrichton
Collaborator

In the absence of an official subslice of the spec do you think it would be reasonable to define these CPU configurations as "this defined version of the spec" plus extensions? (or the other way around, a defined version of the spec minus some features). That I think would make it a bit easier to have a canonical reference for what this is supposed to be.

In the long-term I'd also love to see the wasm test suite itself curated along these lines too. For example if an engine claims to implement "trail1" there's no actual compliance suite for it to run because the spec test suite is too expansive and it's not clear which tests can be run and which shouldn't. That's more of a concern for if/when this picks up energy to take it upstream though.

@sunfishcode
Member Author

sunfishcode commented Oct 11, 2024

Trail1 as defined here is Wasm 1.0 plus mutable-globals, bulk-memory-opt*, multivalue, sign-ext, nontrapping-fptoint, extended-const, and call-indirect-overlong**. I've now added that to the document.

Alternatively, it's Wasm 2.0 minus reference types (overlong call_indirect encodings notwithstanding), table instructions, memory.init, data.drop, multiple tables, and vector instructions. Wasm 2.0 is still an evolving draft at this time, but when new Wasm standards are published, we'll probably want to update to this form of definition.

For what it's worth, all of Trail1's features except extended-const are currently merged into the current draft and published on the spec website.

From talking with folks, it seems one of the main things holding back official subslices at the spec level is a lack of real-world experience motivating the design. This Trail1 concept could be a step toward building that real-world experience. If Trail1 turns out to be useful, perhaps it'll be a good motivator and first candidate for subslicing.

@dschuff
Member

dschuff commented Oct 11, 2024

Generally-speaking I like the idea of defining useful subsets of features. I agree that the obvious spec-version-based slices are more artifacts of when things got finalized than anything else, and having sets that are stable (even if it's just in a particular context like LLVM) could be useful. I think that how useful it would be beyond this context remains to be seen. The subset you have for trail1 seems obvious for the set of languages that would use LLVM, but another language might really need those plus reftypes or exceptions (or multi-value!), and it's less obvious that those groupings are ideal sets for an implementation. But maybe they are!
Just from the pure toolchain perspective, Emscripten has so far mostly focused on helping users handle the variation in browser support, but in a world where all of the relevant features are supported by all the browsers (perhaps not too far in the future now), grouping them in more logical ways could be useful, and help both us and users avoid the worst of the combinatorial explosions. (In the Emscripten context, I'd also add BigInt to the bottom tier.)

The idea of subsetting the spec proposals is interesting, and it makes a lot of sense. You could imagine a "threading" addon including more of bulk memory (passive segments, `memory.init`, `data.drop`) and threads (I call that an "addon" since there could be good reasons why you would want to disable it even if it's supported everywhere). An "exceptions" addon already adds exnref, reference types, and multi-value (also tables? AFAIK nobody really needs multi-table). Although, whether exceptions is something that you'd just always have on or make opt-in might depend on how much you care about the code-size bloat. So there's still room for disagreement based on more specific priorities.
I guess the downside of slicing the proposals would be bigger if this starts to be something implementations use. The feature matrix is already pretty big.
But yeah generally speaking, I also like the idea of trying this out as a way to get experience and potentially expand it more broadly.

@workingjubilee

workingjubilee commented Oct 11, 2024

Hm, multi-value is part of the proposed trail1 featureset:

enable mutable-globals, bulk-memory-opt*, multivalue, sign-ext, nontrapping-fptoint, extended-const, and call-indirect-overlong**.

Relevant to the discussion, a feature matrix has already been collected of what is currently supported by actively-maintained non-"major" implementations of wasm that were surveyed: https://docs.google.com/spreadsheets/d/1VEGeOP9coOBKt-N5FQ9G3ACPPavhf66m4A_3u6iVldc/edit?gid=0#gid=0

Of the trail1 features, the only ones that aren't widely implemented are extended-const and call-indirect-overlong. Those are expected to be trivial to support, because one is just using a LEB128 parser that you already have in a new place, and the other is just constifying a few integer operations.
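As a concrete illustration of the "constifying" point: extended-const only adds a few arithmetic instructions to constant expressions, so an evaluator that already handles the const case needs just a handful of extra branches. A toy i32-only evaluator sketch (the instruction names follow the spec; the tuple representation is made up for illustration, and real constant expressions also allow `global.get` and i64 variants):

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32

def eval_i32_const_expr(instrs):
    """Evaluate an i32 constant expression given as (opcode, operand...) tuples.

    MVP constant expressions allow only the const case here; the
    extended-const proposal additionally permits add, sub, and mul.
    """
    stack = []
    for instr in instrs:
        op = instr[0]
        if op == "i32.const":
            stack.append(instr[1] & MASK32)
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif op == "i32.sub":
            b, a = stack.pop(), stack.pop()
            stack.append((a - b) & MASK32)
        elif op == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        else:
            raise ValueError(f"not a constant instruction: {op}")
    return stack.pop()

# e.g. a data-segment offset computed as base + index * stride:
eval_i32_const_expr([
    ("i32.const", 1024),
    ("i32.const", 3),
    ("i32.const", 16),
    ("i32.mul",),
    ("i32.add",),
])  # 1072
```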

@dschuff
Member

dschuff commented Oct 11, 2024

Hm, multi-value is part of the proposed trail1 featureset:

Oops, I just missed that, thanks. Currently LLVM doesn't really make much use of multi-value at all (at least not for block types), but more broadly the feature does make sense as being in the "lowest" tier.

Relevant to the discussion, a feature matrix has already been collected of what is currently supported by actively-maintained non-"major" implementations of wasm that were surveyed

Thanks for that link!
(A small digression here, but I wonder if there's a useful way to make that information even more broadly available. I don't think our existing feature page on webassembly.org would scale that far, but it seems like it would be useful to have that info surfaced somewhere. Maybe even a Wikipedia page or something.)

@kripken
Member

kripken commented Oct 14, 2024

Thinking of names for CPU configurations, perhaps we could do something like a tech tree:

Math
  1. Non-trapping float to int
    2. SIMD
      3. Relaxed SIMD

Linear memory
  1. Bulk memory operations

References
  1. Reference types
    2. Exceptions
      3. GC

Then someone compiling Rust or C would want the Math and Linear memory trees (but not References) and set them at certain levels (e.g. Math level 2, Linear memory 1; someone doing C++ with exceptions would add References 2). Someone doing Kotlin or Java would want Math and Reference types but not Linear memory.

@sunfishcode
Member Author

@kripken Yes, I think something like that makes sense. Trail1 here could be seen as a first step in that direction. Trail1 (or TrailN) could be interpreted as the "Linear memory" branch of that tree.

That said, I'd suggest some minor tweaks:

  • I think nontrapping-fptoint could become part of the "base language" that everyone just uses, rather than being forever optional, and thus often not used in practice.

  • I don't want to split features too much, but Bulk Memory feels like a case where it's justified. `memory.copy`/`memory.fill` are things that could well be treated as the base of the "Linear memory" branch, so that linear-memory-using toolchains can just use them instead of forever feature-testing/versioning for them. On the other hand, mutable and/or multiple tables are things that I'd imagine we'd need to see more use cases for before fitting them into a tech tree like this.

@kripken
Member

kripken commented Oct 14, 2024

@sunfishcode

Perhaps Trail1 => LinearMemory1, then? I wouldn't be opposed to having nontrapping-fptoint there (as 99% of linear memory languages will want it).

My main concern is that "Trail1"/"Path1"/"Milestone1"/etc. sound very generic, and it isn't obvious who should be using them. "LinearMemory", "References", etc. are more self-explanatory.

I do agree with the point that "LLVM is really all about Linear Memory languages anyhow", so it seems almost redundant to use LinearMemory inside LLVM, but perhaps these terms would be useful in a broader context? E.g. maybe other tools like Binaryen could support these configurations.

But with all that said, these are bikeshedding details, and I don't feel strongly here or want to slow things down. Just some thoughts.

@kg

kg commented Oct 15, 2024

At risk of bikeshedding: could you perhaps name them based on what they're useful for? E.g. "mainstream2024" for features that are mainstream in all the major browsers as of, say, January 2024. Then developers can look at that and evaluate whether they want to support users on year-old browsers, and that can lead them to pick a previous target or use the baseline Wasm 1.0 target instead. That would also give a natural way to do more nuanced versioning: instead of, say, "trail1.5", it would just be "mainstream2024-10" or such.

@sunfishcode
Member Author

I'm not necessarily opposed, but such a "mainstream2024" concept would be different from what I'm proposing here. I notice I didn't explain this in my initial post, but my focus for this Trail1 proposal isn't features that are mainstream in browsers, but features that non-browser engines that would otherwise stick with "mvp" could reasonably implement. So Trail1 as I'm currently proposing it excludes reference-types and simd128, even though those features are mainstream in all browsers today.

@workingjubilee

Heh, mvp++ then?

One point of information that is worth making explicit: Some people use wasm32 in "embedded, as in MCU, not 'in a bigger program'" scenarios. They are running wasm on a computer with a tiny ISA, whether physical or virtual. These people want to use an existing toolchain with an -mcpu=something that picks a set of features that they can either count on already being implemented in a wasm32 interpreter, or that they can easily add.

They will never, ever care about what browsers currently implement, because for them, WebAssembly is less about the Web and more about the Assembly.

My understanding is that part of the drive here is to define a slightly-more-powerful minimal target than the absolute baseline we get now of -mcpu=mvp, so that code generators don't feel stuck on trying to optimize for the most-barebones-of-barebones wasm32 implementations?

And of course it also provides a target for "this is a fleshed-out wasm32 executor that is still easy to impl but won't suck to use".

@tlively
Member

tlively commented Oct 16, 2024

I do think we should find a name that's more descriptive than "trail." I actually like the mvp++ idea. I think it succinctly captures the intent @sunfishcode describes.

I'm a little concerned about complexity due to splitting up features. Would we add new -mfeature flags to selectively enable and disable the subfeatures? I would prefer to err on the side of less granularity and encourage engines to implement the full proposals. (For bulk table operations in particular, it would also be fine if we enabled them to avoid dividing features, even if engines don't actually implement them, since they're not emitted under normal circumstances.)

@workingjubilee

(For bulk table operations in particular, it would also be fine if we enabled them to avoid dividing features, even if engines don't actually implement them, since they're not emitted under normal circumstances.)

Unfortunately, "not emitted under normal circumstances" is not a hard promise that they won't be emitted under any circumstances, and "can be emitted, but only for difficult-to-imagine reasons that are technically allowed by the contract between codegen and engine" means it would "only" happen when someone working on the code generator feels Too Clever By Half.

And, well... programmers.

@sunfishcode
Member Author

I do think we should find a name that's more descriptive than "trail." I actually like the mvp++ idea. I think it succinctly captures the intent @sunfishcode describes.

It does capture the sense, but I'm not fond of reinforcing "mvp" as the base language, with other features being seen as extras. What would you think about the name "lime" (short for "LInear MEmory", due to its focus on linear-memory languages)? The first version would be "lime1".

I'm a little concerned about complexity due to splitting up features. Would we add new -mfeature flags to selectively enable and disable the subfeatures? I would prefer to err on the side of less granularity and encourage engines to implement the full proposals. (For bulk table operations in particular, it would also be fine if we enabled them to avoid dividing features, even if engines don't actually implement them, since they're not emitted under normal circumstances.)

To add to what @workingjubilee said, engine implementors want a fixed list of features that they can target, so they can say "we can run any Trail1 program", so we want to be very explicit about which features are in the set and which are not.

I also don't like splitting up features, and initially resisted it. But I believe the specific splits here really are quite motivated.

  • `call_indirect` should just support overlong encodings. They're useful to toolchains that don't otherwise use reference types, and they're trivial to implement in engines that don't have reference types.
  • `memory.copy` and `memory.fill` are very widely used and don't require engines to track additional memory at runtime (e.g. passive segments) or support mutable tables.
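The "fixed list engines can target" idea amounts to a named table of feature sets. A hypothetical sketch of how a tool might model it (the feature names are the ones proposed in this thread; the `CPU_FEATURES` structure and function names are illustrative, not LLVM's actual representation):

```python
# Hypothetical mapping from CPU configuration name to its frozen feature set.
CPU_FEATURES = {
    "mvp": frozenset(),
    "trail1": frozenset({
        "mutable-globals",
        "multivalue",
        "sign-ext",
        "nontrapping-fptoint",
        "extended-const",
        "bulk-memory-opt",          # memory.copy / memory.fill only
        "call-indirect-overlong",   # overlong call_indirect encodings only
    }),
}

def engine_supports_cpu(cpu: str, engine_features: frozenset) -> bool:
    """An engine can claim "we run any <cpu> program" iff it implements
    every feature in that CPU's fixed set."""
    return CPU_FEATURES[cpu] <= engine_features

# An engine implementing everything except extended-const can't claim trail1:
engine_supports_cpu("trail1", CPU_FEATURES["trail1"] - {"extended-const"})  # False
```

The key property is that the set is frozen: an engine tested against it once stays compatible, unlike the moving "generic" target.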

@kripken
Member

kripken commented Oct 16, 2024

@sunfishcode

"lime" (short for "LInear MEmory", due to its focus on linear-memory languages)? The first version would be "lime1".

I like that short name. Thinking of the non-LLVM ecosystem, we can keep other names to 4 letters as well:

  • math: 1: nontrapping-float-to-int, 2: simd, 3: relaxed-simd
  • refs: 1: reference-types, 2: exceptions, 3: GC

lime1 as proposed in this PR would include math1 (maybe a future lime2 would include math2, etc.). refs might also include math as well. In theory some modules would want just math and nothing else, but even if not, I think it's nice to have the concept of a "math tree" that we move up along, as we move up along linear memory and/or references.

@dschuff
Member

dschuff commented Oct 16, 2024

Thinking ahead:

  • I could imagine someone wanting to implement wide-arithmetic but not SIMD. In terms of tree height, it's more like 1.5.
  • I think funcrefs is refs3 and GC is refs4, unless we want to combine them like every(?) engine has done so far.
  • stack-switching requires reftypes, but does it require GC? Maybe refs5 is fine?
  • memory64 is clearly lime2 (or is lime2 multi-memory and memory64 is lime3?). Threads is in a logical/complexity sense maybe lime4, even though it was standardized before memory64
  • Maybe not relevant to LLVM, but if we wanted, Bigint could be JS1, String builtins JS2 and JSPI JS3? (although... type reflection, assuming it happens, would be ~JS1.5)
  • Where does tail call go? In terms of what languages are likely to want it, maybe on the ref tree, but it doesn't fit cleanly in that taxonomy.

Seems like the overall fit is ok, but not perfect; it reduces the number of combinations, although maybe not by a huge amount unless we also make cross-branch dependencies like lime1 including math1. I'm not sure we can get around at least some conflation of complexity/logical ordering vs when a proposal was standardized.

@programmerjake

programmerjake commented Oct 16, 2024

  • memory64 is clearly lime2 (or is lime2 multi-memory and memory64 is lime3?).

I would put memory64 on a separate axis by itself as the target arch: wasm64-lime1-unknown or wasm64-unknown-unknown with -mcpu=lime1

@sunfishcode
Member Author

@programmerjake What I'm proposing here is what LLVM calls a "target CPU", which is usually not part of the target triple.

@dschuff memory64 is orthogonal to "lime" because memory64 is part of the architecture ("wasm64").

@kripken @dschuff Those are interesting things to think about, though I also do want to avoid getting too far into the details of which additional sets we might define here because, as you say, there are a lot of choices that we'd need to make, and I don't think we know enough to make all those choices at this time.

@programmerjake

@programmerjake What I'm proposing here is what LLVM calls a "target CPU", which is usually not part of the target triple.

yeah, that's why I wrote

or wasm64-unknown-unknown with -mcpu=lime1

@sunfishcode sunfishcode changed the title Propose a new LLVM CPU configuration: Trail1 Propose a new LLVM CPU configuration: Lime1 Oct 22, 2024
@sunfishcode
Member Author

I've now renamed "Trail" to "Lime", which is short for "LInear MEmory".

Looking through the comments again, I realized I didn't answer this question:

Would we add new -mfeature flags to selectively enable and disable the subfeatures?

Yes, my prototype LLVM patch does split out call-indirect-overlong and bulk-memory-opt as separate -mfeature flags.

@sunfishcode
Member Author

I believe I've addressed all the objections raised here. I'll give this another week, and if there are no further objections, I propose to merge this.

@sunfishcode
Member Author

With no further objections, let's merge this! My next step here will be to update the LLVM PR (llvm/llvm-project#112035), and then mark it as ready for review.

If anyone has any concerns or questions, please file new issues or PRs!

@sunfishcode sunfishcode merged commit 5bf631d into WebAssembly:main Nov 19, 2024
@sunfishcode sunfishcode deleted the sunfishcode/trail1 branch November 19, 2024 00:22
alexrp added a commit to alexrp/zig that referenced this pull request Nov 29, 2024
…embly.

See: WebAssembly/tool-conventions#235

This is not *quite* using the same features as the spec'd lime1 model because
LLVM 19 doesn't have the level of feature granularity that we need for that.
This will be fixed once we upgrade to LLVM 20.

Part of ziglang#21818.