New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack machine semantics #323

Merged
merged 51 commits into from Sep 9, 2016

Conversation

Projects
None yet
6 participants
@rossberg
Member

rossberg commented Aug 24, 2016

This implements the stack machine semantics for Wasm:

  • Extends syntax to allow raw instruction sequences. Prior expression forms remain as syntax macros. See README for syntax summary.
  • Changes AST to consist of instruction sequences. Eliminate the kernel/AST distinction and corresponding desugaring.
  • Reimplements type checking in terms of that. Skips dead code (for now).
  • Reimplements evaluation as a small-step rewriting semantics over instruction sequences plus administrative forms.
  • Adjust & extend tests (though we should add more).
  • Various clean-ups on the way.

I apologise for the monolithic PR. Breaking the move over to a stack machine semantics into meaningful smaller steps is kind of impossible.

The main downside of this change is that evaluation got much slower, because it is now implemented by term rewriting, literally modelling a small-step reduction semantics as on paper. The upside is that this makes it suitable for explicitly modelling threads later on.

Show outdated Hide outdated ml-proto/spec/check.ml
"start function must be nullary";
require (start_type.out = None) x.at
"start function must not return anything";
require (func c x = FuncType ([], [])) x.at "start function must be nullary"

This comment has been minimized.

@sunfishcode

sunfishcode Aug 29, 2016

Member

This change causes the interpreter to say "start function must be nullary" when given a nullary function that has a return value, which is confusing.

@sunfishcode

sunfishcode Aug 29, 2016

Member

This change causes the interpreter to say "start function must be nullary" when given a nullary function that has a return value, which is confusing.

This comment has been minimized.

@rossberg

rossberg Aug 30, 2016

Member

Fixed.

@rossberg

rossberg Aug 30, 2016

Member

Fixed.

@rossberg

This comment has been minimized.

Show comment
Hide comment
@rossberg

rossberg Sep 2, 2016

Member

The raw stack machine syntax I have implemented in this patch does not require parens around individual instructions. I notice that many (most?) people discussing stack machine examples keep bracketing them anyway. I also got explicit comments from people saying they actually prefer the bracketed form (which is legal in the format as sugar, but not required).

And I agree that not bracketing instructions is at odds with the rest of the .wast format, especially since it extends to operators with immediates:

i32.const 0 i32.const 1 i32.add

is currently legal. Removing this would make the format more coherent and less baroque. You would have to write:

(i32.const 0) (i32.const 1) (i32.add)

Would anybody be opposed to require proper parenthesising for instructions like for all other elements of the syntax?

Member

rossberg commented Sep 2, 2016

The raw stack machine syntax I have implemented in this patch does not require parens around individual instructions. I notice that many (most?) people discussing stack machine examples keep bracketing them anyway. I also got explicit comments from people saying they actually prefer the bracketed form (which is legal in the format as sugar, but not required).

And I agree that not bracketing instructions is at odds with the rest of the .wast format, especially since it extends to operators with immediates:

i32.const 0 i32.const 1 i32.add

is currently legal. Removing this would make the format more coherent and less baroque. You would have to write:

(i32.const 0) (i32.const 1) (i32.add)

Would anybody be opposed to require proper parenthesising for instructions like for all other elements of the syntax?

@drom

This comment has been minimized.

Show comment
Hide comment
@drom

drom Sep 2, 2016

@rossberg-chromium with stack machine, parenthesis don't carry any function anymore. Require putting them around every instruction sounds like unnecessary decoration. New text stack format better off without parenthesis at all. Mixing two styles in the same file format confuses people.

drom commented Sep 2, 2016

@rossberg-chromium with stack machine, parenthesis don't carry any function anymore. Require putting them around every instruction sounds like unnecessary decoration. New text stack format better off without parenthesis at all. Mixing two styles in the same file format confuses people.

@rossberg

This comment has been minimized.

Show comment
Hide comment
@rossberg

rossberg Sep 2, 2016

Member

@drom, mixing two styles in one format is exactly what is happening currently, what people have complained about, and which I'm suggesting to get rid of.

Member

rossberg commented Sep 2, 2016

@drom, mixing two styles in one format is exactly what is happening currently, what people have complained about, and which I'm suggesting to get rid of.

@AndrewScheidecker

This comment has been minimized.

Show comment
Hide comment
@AndrewScheidecker

AndrewScheidecker Sep 2, 2016

Contributor

The goal for the text format in the MVP seems to be a "linear opcode" format to mirror the binary format. This PR makes fundamental changes to the expression syntax, but still includes some form of it. Is this change an incremental step toward removing expressions, or will expressions remain in WAST?

Contributor

AndrewScheidecker commented Sep 2, 2016

The goal for the text format in the MVP seems to be a "linear opcode" format to mirror the binary format. This PR makes fundamental changes to the expression syntax, but still includes some form of it. Is this change an incremental step toward removing expressions, or will expressions remain in WAST?

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Sep 2, 2016

@AndrewScheidecker Mozilla appear to have committed to continuing to explore a structured text format, see WebAssembly/design#704 (comment)

There is a difference between not standardizing the text format for the MVP which I can understand, and ignoring the use case of a structure text format which I hope will not pass.

This PR has a format for testing purposes, so does it really need to be representative?

ghost commented Sep 2, 2016

@AndrewScheidecker Mozilla appear to have committed to continuing to explore a structured text format, see WebAssembly/design#704 (comment)

There is a difference between not standardizing the text format for the MVP which I can understand, and ignoring the use case of a structure text format which I hope will not pass.

This PR has a format for testing purposes, so does it really need to be representative?

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Sep 2, 2016

Member

@AndrewScheidecker: the decision as mentioned in your link is to ship the wasm MVP with a linear list. The spec repo's s-expression language will be a superset of that, but not part of the wasm spec, and not shown in browsers. However, after the MVP experimentation might lead to further developments and possible spec additions.

Member

kripken commented Sep 2, 2016

@AndrewScheidecker: the decision as mentioned in your link is to ship the wasm MVP with a linear list. The spec repo's s-expression language will be a superset of that, but not part of the wasm spec, and not shown in browsers. However, after the MVP experimentation might lead to further developments and possible spec additions.

@AndrewScheidecker

This comment has been minimized.

Show comment
Hide comment
@AndrewScheidecker

AndrewScheidecker Sep 2, 2016

Contributor

It makes sense for the ml-proto text format to include non-standard functionality for the test suite. But some subset of a future state of the ml-proto text format will be the MVP standard text format, right? Will that subset include expression trees in addition to linear operator sequences, or will expression trees be a non-standard part of the ml-proto syntax?

Contributor

AndrewScheidecker commented Sep 2, 2016

It makes sense for the ml-proto text format to include non-standard functionality for the test suite. But some subset of a future state of the ml-proto text format will be the MVP standard text format, right? Will that subset include expression trees in addition to linear operator sequences, or will expression trees be a non-standard part of the ml-proto syntax?

@eholk

This comment has been minimized.

Show comment
Hide comment
@eholk

eholk Sep 2, 2016

I'm in favor of requiring bracketing. In the example @rossberg-chromium gave the bracketed version seems more readable to me (although some more newlines might make the non-bracketed version look better). The parenthesis make it clear which things are meant to be taken as a unit, which I think will be especially helpful for instructions that take a number of optional arguments.

eholk commented Sep 2, 2016

I'm in favor of requiring bracketing. In the example @rossberg-chromium gave the bracketed version seems more readable to me (although some more newlines might make the non-bracketed version look better). The parenthesis make it clear which things are meant to be taken as a unit, which I think will be especially helpful for instructions that take a number of optional arguments.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Sep 2, 2016

Member

@AndrewScheidecker: The MVP will only standardize the linear part. In other words, expression trees will be a non-standard thing used in the spec repo.

Member

kripken commented Sep 2, 2016

@AndrewScheidecker: The MVP will only standardize the linear part. In other words, expression trees will be a non-standard thing used in the spec repo.

@AndrewScheidecker

This comment has been minimized.

Show comment
Hide comment
@AndrewScheidecker

AndrewScheidecker Sep 2, 2016

Contributor

Thank you, that's what I was trying to figure out.

A big part of the value of ml-proto to other implementations right now is its test suite. If everybody implements the non-standard expression tree syntax to access the test suite, it will be de facto standard. So IMO ml-proto should either remove expression trees (doesn't need to happen all at once), or the standard should include expression trees.

Contributor

AndrewScheidecker commented Sep 2, 2016

Thank you, that's what I was trying to figure out.

A big part of the value of ml-proto to other implementations right now is its test suite. If everybody implements the non-standard expression tree syntax to access the test suite, it will be de facto standard. So IMO ml-proto should either remove expression trees (doesn't need to happen all at once), or the standard should include expression trees.

@sunfishcode

This comment has been minimized.

Show comment
Hide comment
@sunfishcode

sunfishcode Sep 2, 2016

Member

It seems with the recent change, code like this:

    i32.const 3
    (i32.add (i32.const 4))

is now accepted. I understand why this makes the text format simpler, but it's also surprising. I wouldn't be comfortable with a language that visually looks like it has one structure, but actually has a very different structure, spreading, which is plausible here.

Member

sunfishcode commented Sep 2, 2016

It seems with the recent change, code like this:

    i32.const 3
    (i32.add (i32.const 4))

is now accepted. I understand why this makes the text format simpler, but it's also surprising. I wouldn't be comfortable with a language that visually looks like it has one structure, but actually has a very different structure, spreading, which is plausible here.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Sep 2, 2016

@kripken I did not see a decision to 'standardize the linear part' in the MVP rather to not have a standard in this area for the MVP. I would not object to also having a linear presentation of the code.

The core issue is surely keeping the design consistent with the use case of a structured presentation.

I don't think it matters if a linear code or some macros are used for testing, even if this becomes a de-facto standard format, this is a small matter.

ghost commented Sep 2, 2016

@kripken I did not see a decision to 'standardize the linear part' in the MVP rather to not have a standard in this area for the MVP. I would not object to also having a linear presentation of the code.

The core issue is surely keeping the design consistent with the use case of a structured presentation.

I don't think it matters if a linear code or some macros are used for testing, even if this becomes a de-facto standard format, this is a small matter.

@AndrewScheidecker

This comment has been minimized.

Show comment
Hide comment
@AndrewScheidecker

AndrewScheidecker Sep 8, 2016

Contributor

Since it sounds like the expression tree syntax will stay in ml-proto, it would be nice if there was a stronger separation between it and the stack machine syntax. For example:

  • An operator expression is always parenthesized, and expects a fixed number of subexpressions of a type defined by its context (as ml-proto has been to this point).

  • A sequence expression is a semi-colon separated sequence of subexpressions:

    ((nop); (i32.add (i32.const 3) (i32.const 4)))

    It's a little bit more flexible than the existing implicit sequences, since it allows the result of the sequence to be pushed by any subexpression within it, not just the final subexpression:

    ((nop); (i32.add (i32.const 3) (i32.const 4)); (nop))

  • A stack expression is a non-parenthesized opcode along with only its immediate operands. It may only occur as part of a sequence expression, and must only pop operands that are pushed within the same sequence. @sunfishcode's example above would be:

    (i32.const 3; i32.const 4; i32.add)

  • A sequence expression can embed a sequence anywhere as a subexpression:

    (i32.eqz (i32.const 3; i32.const 4; i32.add))

  • Function bodies, loop bodies, block bodies, and if clauses are implicit sequences:

    (loop $break $loop; (i32.eqz (get_local $num)); br_if $break; ...; br $loop)

I think a syntax like that would be a nice incremental step that exposes the additional flexibility of the stack machine, but without crossing wires in our intuition about the expression syntax: implicit stack operands don't cross parentheses.

Contributor

AndrewScheidecker commented Sep 8, 2016

Since it sounds like the expression tree syntax will stay in ml-proto, it would be nice if there was a stronger separation between it and the stack machine syntax. For example:

  • An operator expression is always parenthesized, and expects a fixed number of subexpressions of a type defined by its context (as ml-proto has been to this point).

  • A sequence expression is a semi-colon separated sequence of subexpressions:

    ((nop); (i32.add (i32.const 3) (i32.const 4)))

    It's a little bit more flexible than the existing implicit sequences, since it allows the result of the sequence to be pushed by any subexpression within it, not just the final subexpression:

    ((nop); (i32.add (i32.const 3) (i32.const 4)); (nop))

  • A stack expression is a non-parenthesized opcode along with only its immediate operands. It may only occur as part of a sequence expression, and must only pop operands that are pushed within the same sequence. @sunfishcode's example above would be:

    (i32.const 3; i32.const 4; i32.add)

  • A sequence expression can embed a sequence anywhere as a subexpression:

    (i32.eqz (i32.const 3; i32.const 4; i32.add))

  • Function bodies, loop bodies, block bodies, and if clauses are implicit sequences:

    (loop $break $loop; (i32.eqz (get_local $num)); br_if $break; ...; br $loop)

I think a syntax like that would be a nice incremental step that exposes the additional flexibility of the stack machine, but without crossing wires in our intuition about the expression syntax: implicit stack operands don't cross parentheses.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Sep 9, 2016

@AndrewScheidecker A semi-colon is not generally used like that in an s-exp, does not fit the representation. For a stronger separation a stack operator could be added to the s-exp format (not an opcode). For example:

(i32.eqz (stack (i32.const 3) (i32.const 4) (i32.add)))

It raises the problem of what to do if there are more or less values than expected by the consumer of the stack operator - probably just a syntax error.

When pick is added it will help to be able to name the stack values, and that is not pretty in the linear stack code. I'd just leave it to @rossberg-chromium for now.

ghost commented Sep 9, 2016

@AndrewScheidecker A semi-colon is not generally used like that in an s-exp, does not fit the representation. For a stronger separation a stack operator could be added to the s-exp format (not an opcode). For example:

(i32.eqz (stack (i32.const 3) (i32.const 4) (i32.add)))

It raises the problem of what to do if there are more or less values than expected by the consumer of the stack operator - probably just a syntax error.

When pick is added it will help to be able to name the stack values, and that is not pretty in the linear stack code. I'd just leave it to @rossberg-chromium for now.

@rossberg rossberg merged commit a7a5fef into binary-0xc Sep 9, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@AndrewScheidecker

This comment has been minimized.

Show comment
Hide comment
@AndrewScheidecker

AndrewScheidecker Sep 9, 2016

Contributor

A semi-colon is not generally used like that in an s-exp, does not fit the representation.

I agree, but the format already departs from a standard S-expression syntax. Previously the departure was pretty small: e.g. align=<x>, but as of the merge of this PR the syntax now allows unparenthesized "instructions":

i32.const 3 i32.const 4 i32.add

For a stronger separation a stack operator could be added to the s-exp format (not an opcode). For example:
(i32.eqz (stack (i32.const 3) (i32.const 4) (i32.add)))

That would still not distinguish between an operator expression and an "linear operator". (i32.add) could be a malformed expression or a linear operator that consumes its operands from the results of preceding operators, and you have to look at the context to figure out how to interpret it. I don't care if it's done exactly as I showed (with parentheses), but I do think there should be a more clear distinction than whether it's missing operands.

Contributor

AndrewScheidecker commented Sep 9, 2016

A semi-colon is not generally used like that in an s-exp, does not fit the representation.

I agree, but the format already departs from a standard S-expression syntax. Previously the departure was pretty small: e.g. align=<x>, but as of the merge of this PR the syntax now allows unparenthesized "instructions":

i32.const 3 i32.const 4 i32.add

For a stronger separation a stack operator could be added to the s-exp format (not an opcode). For example:
(i32.eqz (stack (i32.const 3) (i32.const 4) (i32.add)))

That would still not distinguish between an operator expression and an "linear operator". (i32.add) could be a malformed expression or a linear operator that consumes its operands from the results of preceding operators, and you have to look at the context to figure out how to interpret it. I don't care if it's done exactly as I showed (with parentheses), but I do think there should be a more clear distinction than whether it's missing operands.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Sep 9, 2016

@AndrewScheidecker That would still not distinguish between an operator expression and an "linear operator".

Could add another operator expression to flip back, or have operators with only immediate arguments be interpreted as popping all arguments, but what's the point, it's never going to be a nice format to use except for testing.

ghost commented Sep 9, 2016

@AndrewScheidecker That would still not distinguish between an operator expression and an "linear operator".

Could add another operator expression to flip back, or have operators with only immediate arguments be interpreted as popping all arguments, but what's the point, it's never going to be a nice format to use except for testing.

@rossberg rossberg deleted the stack branch May 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment