-
Notifications
You must be signed in to change notification settings - Fork 24
PROPOSAL: "frame" as a better alternative to pick/let #41
Comments
Agreed this sounds like a good idea, it matches nicely with the way WebAssembly already uses the stack and locals. When you say it's different to
This is unlike other WebAssembly block functions, which don't tear down their stack implicitly. I think it would be more natural for this to require a
The |
I generally do not see the advantage of this over Compare this hypothetical example:
|
Is it? In many ways it fits nicely with what engines already have to do with block parameters in the multi-value proposal. The difference is that now those values can also be accessed by
See some of the discussion here, in particular @rossberg's comment:
|
If wasm had "pick" from the start in MVP v1.0 (and left fn parameters on stack rather than mapping them into locals), then I agree my "frame" proposal wouldn't be as useful as "pick". I don't think getlocal/setlocal will be deprecated any time soon though... For better or worse, wasm today is a "hybrid" stack machine that is [stack]+[local vars]+[params in locals]. It manages this complexity by adhering strictly to stack as top-only access, ie: the wasm stack is only push/pop and doesn't try to do random access. IMO, "pick" will blur the conceptual distinction between stack vs local vars, ie: both will have random access. Also, what the "pick" example above doesn't show is after blocks/funcs returning multi-values, extra "drop" instructions may be needed to remove left over stack values. (There is likely to be at least one left over value, as otherwise pick wouldn't be needed, to duplicate values into correct stack order). In addition to "pick", a "swap" will also be needed to restructure the stack sequence to comply with the end of block stack signature.... once the extra swaps/drops are all added, is there really any code density benefit with providing "pick"? IMO, we need to be very clear about this before we throw out the conceptual "purity" of wasm's stack (of not providing random access to the stack). Looking at @rossberg 's proposal for "let", it is similar as what I propose for frame, but goes an additional step of remapping outer locals into local #[orig local idx + N], where N is the locals declared by an inner let. For CPU's with plentiful registers, the wasm "stack" is really a CPU register "window" scheme, ie: a way of directing which CPU register the next wasm arithmetic instruction will operate on. (Local vars deeper inside the stack are then candidates for a "simplistic" JIT to spill into RAM). What both "frame" and "let" are doing (which "pick" doesn't do) is indicating upfront to a simplistic JIT compiler to pin down & not to spill certain stack slots for the lifetime of the block. However, "let" doesn't convey as much information on what local vars are safe to spill to RAM (as any outer local var can still be accessed inside), whereas "frame" is very explicit in telling the JIT upfront which local vars to keep inside CPU registers. (Effectively the outer local var "push" needed by "frame" is like the "register" keyword in C, it's an advisory hint to the JIT to keep the var in a CPU register... if these hints are not useful to real JIT's, as opposed to a simplistic JIT, then Rossberg's proposal is superior to my proposal - as it has better encoding density). Also, my own envisaged use case for "frame" is that it will process/re-sequence a set of multi-values, which then will be handed over to the outer block to complete processing on... I am guessing in most cases, the processing done inside "frame" is likely to be small and simple and will not need to access many outer vars, but it would be interesting to see data on this. |
Hi, I can't resist thinking more about the "pick" vs "let" duality, and whether there is a case to have both pick & let. I remain a proponent of providing "let" and not providing "pick" (I've already more or less conceded above that @rossberg 's "let" is superior to my "frame"). However, here is a suggestion: provide "let" and "dup", ie: pick #0 - but not a generalised pick. dup is a top-of-stack only operation, so it doesn't change the existing wasm model - that is, the wasm stack is limited to non-random top of stack ops, and local vars are to be the mechanism for random access. There is possibly even a case for providing a "swap #1" - ie: limited to top of stack, so only swapping top two slots of the stack. Limiting dup and swap to only the top of stack means we don't need to encode an immediate argument for these instructions, so they will only be 1 byte rather than 2+ bytes. This is a win for code density - as well as the keeping the stack conceptually pure, by not providing random access to deeper slots in the stack. I suspect the 80/20 law applies to this case - ie: 80% of the use cases of a hypothetical generalised pick/swap are handled by a more simple dup/swap, without immediate arguments. The other more complex 20% are then better addressed by "let" (making generalised pick/swap unnecessary). |
I think this is an interesting idea, but it would be much more compelling with data. |
Java VM and CIL data might be useful here. Java VM appears to have a number of variants of dup. These insert the top value of the stack into slots 2-3 levels down. It also has swap, but no pick (anyone with more detail JVM knowledge correct me if I am wrong). Hence, profiling instruction counts of the "dup" variants could give a frequency histogram of stack manipulation by slot dept (0 vs 1 vs 2 vs 3) but this would be a "reverse" of the pick data we want (so not perfect) CIL has dup, but does not seem to have swap or pick (again, anyone with detailed CIL knowledge correct me if wrong). So CIL won't be helpful in giving data on need for pick beyond swap, although presumably the incidence of multiple pops followed by dup might be very approximate hint on this. If there is interest in the JVM instruction count data, then I could do some profiling of some java classes to get the dup_XX instruction counts. |
Sounds potentially useful, but I'm note quite sure how to translate that info into wasm binary savings. I guess if we knew how common (local $new i32)
...
(local.tee $new)
(local.get $new) The new local is practically free (just requires incrementing the local count for that value type, so requires an extra byte only at 27n boundary). The |
Instead of providing "pick" or "let" instructions, I propose providing a "frame" block instruction (effectively providing frame-pointer like access to the wasm stack, similar to what happens for the "call" instruction), eg:
instead of:
i32.const 100 ;; push some values onto stack
i32.const 200
i32.const 300
pick 2 ;; result is 100
you would have:
i32.const 100 ;; push some values onto stack like an inline function call
i32.const 200
i32.const 300
frame param(i32 i32 i32) result(i32)
;; above 3 stack values are now locals 0,1,2 (equivalent to a call into an inlined function)
getlocal 0 ;; result is 100
end
;; stack frame is torn down, and all associated param values (100,200,300) are dropped from stack
;; (equivalent to return from an inline function)
Advantages over "pick" and "let":
Disadvantages:
(however, no additional instruction count, if "frame"/"end" is replacing a "block"/"end" instruction pair that needs to be there anyway)
(also, if used after a multi-value return, "pick" does not remove the returned values and hence needs multiple "drop" instructions after processing of return values is completed - whereas "frame"/"end" will tear down the stack values supplied as input into "frame", without any additional instructions)
The text was updated successfully, but these errors were encountered: