Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting rid of implode #470

Closed
tvdstorm opened this issue Dec 30, 2013 · 5 comments
Closed

Getting rid of implode #470

tvdstorm opened this issue Dec 30, 2013 · 5 comments

Comments

@tvdstorm
Copy link
Member

Here's an idea to get rid of implode, with only minor penalty (I think).

Implode transforms parse trees (PTs) to abstract syntax trees (ASTs). To do this, it requires an ADT describing the AST structure. Implode is often complained about and for good reason. The following problems can be observed:

  • Most of the information in that ADT, is also in the grammar.
  • Implode often (mostly) implements a one-to-one mapping, which feels wasteful and superfluous
  • The ADT should use type and constructor names corresponding to the non-terminals and production labels in the grammar, which is brittle.
  • Module sophistry is required to prevent non-terminal types from clashing with ADT types.

To solve these problems, I suggest the following idea. Get rid of implode and the separate ADT. The grammar defines a single type, just like now, but values of the type have three literal representations:

  1. Concrete syntax in backquotes.
  2. Tree notation with appls and prods
  3. Prefix notation like in add(lhs, rhs) (let's call it "Grammar AST" or GAST notation)

The 3rd item is new: it means you can construct "abstract" concrete syntax trees. As a result, the grammar is enough to allow you to program with ASTs. Concrete and GAST representations can be arbitrarily mixed and nested.

This design has one big consequence: whenever a parse tree is serialized, and a GAST sub-tree is encountered, we need to explode the tree. IOW we might have to invent layout and/or separators. Currently there is no restriction on either Layout or separators, so it's not always possible to do that. Consequently, either you'll get a runtime error ("can't invent layout/sep"), or we impose the restriction statically when using a GAST where no layout/sep can be inferred. Note: this is not about inferring intended layout/seps, just syntactically valid layout/seps. If you need to tweak layout, either you don't use GASTs, or you have to write a pretty printer like all AST people do.

Some observations:

  • Concrete trees are equal to concrete trees if they are really equal, i.e. including layout. (Like now)
  • GASTs are never equal to any concrete tree because they don't have layout.
  • Matching works across all three kinds.
  • If there is no production label, there is no GAST representation, but you have to use concrete syntax or appls.
  • There are now values of the same type, which have different in-memory representations.
  • For pattern matching this proposal is already implemented, more or less (?).

Cons:

  • GASTs can't be more flexible than concrete syntax trees. The mapping is really one-to-one.
  • We lose a little wysiwyg: you type add(const(1), const(2)) and you get appl(prod(labeled("add", sort("Exp")), [...]), ...)

Possible issues:

  • We need grammar information for explode (i.e. productions): where to get them?
  • Explode could occur at non-pattern GAST expression evaluation time (i.e. create appl-prod nodes immediately).
  • How to deal with optionals? You could use list notation (like implode currently assumes), but this would break matching I think. Another option is using Maybe, but this suggests Maybe shouldn't be in util anymore.

Feedback welcome!

P.S. Isn't it ironic that one might get rid of implode by introducing explode?

@tvdstorm
Copy link
Member Author

Just realized: GASTs are really the dual of concrete syntax patterns/expressions:

Concrete syntax -> statically insert appl-trees through parsing
GASTs -> statically insert appl-trees through explode.

So I think there's a story here.

@PaulKlint
Copy link
Member

I like the gist of your proposal but we have to work out the details. Complete duality between concrete and abstract views on the same tree is appealing. We also have to work out the efficiency implications: ASTs were supposed to be more efficient. As long as we can do all the required transformations at compile time, we are ok I guess.

We could even generalize a bit further and allow concrete patterns to match abstract trees, provided that the types of the subtrees are "compatible".

@tvdstorm
Copy link
Member Author

We could even generalize a bit further and allow concrete patterns to match abstract trees, provided that the types of the subtrees are "compatible".

I don't like "compatible" in scare-quotes. In my proposal GASTs will match old-skool concrete trees, if that's what you mean. Like so:

add(`1`, `2`) := `1 + 2`

=> True

@jurgenvinju
Copy link
Member

I think it would be a good idea to do this and a challenging one at the
same time.

  • It's an extension of the abstract pattern matching to include its mirror
    image in construction. We could generate a sentence of length 1 for every
    layout non-terminal as a default, if such a sentence exists, or otherwise
    default to the shortest possible sentence for the layout non-terminal. Same
    same for separators in lists. We may also apply the heuristics from
    pandora's pretty printer instead which recognizes typical programming
    language constructs such as comma separated lists, brackets and binary
    operators to infer when a newline is better than a space.
  • abstract pattern matching on concrete trees is also still in its infancy
    in terms of documentation, parametrization and error messaging...
  • Interesting challenges are in the dealings with lists and string literals
    i.e. in call("hello",[a,b]) with syntax Exp = call: Id "(" {Exp ","}* ")", we need to know that [a,b] is an abstract notation for a concrete
    list and this is only provided by the context so has to be inferred by the
    type checker. Also, "hello" should preferable be parsed as Id, which
    again adds more inference capabilities to the type checker and more chance
    of funny error messages...

On Mon, Dec 30, 2013 at 11:28 PM, Tijs van der Storm <
notifications@github.com> wrote:

Here's an idea to get rid of implode, with only minor penalty (I think).

Implode transforms parse trees (PTs) to abstract syntax trees (ASTs). To
do this, it requires an ADT describing the AST structure. Implode is often
complained about and for good reason. The following problems can be
observed:

  • Most of the information in that ADT, is also in the grammar.
  • Implode often (mostly) implements a one-to-one mapping, which feels
    wasteful and superfluous
  • The ADT should use type and constructor names corresponding to the
    non-terminals and production labels in the grammar, which is brittle.
  • Module sophistry is required to prevent non-terminal types from
    clashing with ADT types.

To solve these problems, I suggest the following idea. Get rid of implode
and the separate ADT. The grammar defines a single type, just like now,
but values of the type have three literal representations:

  1. Concrete syntax in backquotes.
  2. Tree notation with appls and prods
  3. Prefix notation like in add(lhs, rhs) (let's call it "Grammar AST"
    or GAST notation)

The 3rd item is new: it means you can construct "abstract" concrete syntax
trees. As a result, the grammar is enough to allow you to program with
ASTs. Concrete and GAST representations can be arbitrarily mixed and
nested.

This design has one big consequence: whenever a parse tree is serialized,
and a GAST sub-tree is encountered, we need to explode the tree. IOW we
might have to invent layout and/or separators. Currently there is no
restriction on either Layout or separators, so it's not always possible to
do that. Consequently, either you'll get a runtime error ("can't invent
layout/sep"), or we impose the restriction statically when using a GAST
where no layout/sep can be inferred. Note: this is not about inferring
intended layout/seps, just syntactically valid layout/seps. If you
need to tweak layout, either you don't use GASTs, or you have to write a
pretty printer like all AST people do.

Some observations:

  • Concrete trees are equal to concrete trees if they are _really_equal, i.e. including layout. (Like now)
  • GASTs are never equal to any concrete tree because they don't have
    layout.
  • Matching works across all three kinds.
  • If there is no production label, there is no GAST representation,
    but you have to use concrete syntax or appls.
  • There are now values of the same type, which have different
    in-memory representations.
  • For pattern matching this proposal is already implemented, more or
    less (?).

Cons:

  • GASTs can't be more flexible than concrete syntax trees. The mapping
    is really one-to-one.
  • We lose a little wysiwyg: "ASTs" that print out as source code.

Possible issues:

  • We need grammar information for explode (i.e. productions): where to
    get them?
  • Explode could occur at unparse-time, or at non-pattern GAST
    expression evaluation time (i.e. create appl-prod nodes immediately).
  • How to deal with optionals? You could use list notation (like
    implode currently assumes), but this would break matching I think. Another
    option is using Maybe, but this suggests Maybe shouldn't be in utilanymore.

Feedback welcome!

P.S. Isn't it ironic that one might get rid of implode by introducing
explode?


Reply to this email directly or view it on GitHubhttps://github.com//issues/470
.

Jurgen Vinju

@swatbot
Copy link

swatbot commented Jan 3, 2014

BTW, the full grammar for every non-terminal is always available using
#nonterminal, so that point is tackled.

On Fri, Jan 3, 2014 at 11:36 AM, Jurgen J. Vinju
notifications@github.comwrote:

I think it would be a good idea to do this and a challenging one at the
same time.

  • It's an extension of the abstract pattern matching to include its mirror
    image in construction. We could generate a sentence of length 1 for every
    layout non-terminal as a default, if such a sentence exists, or otherwise
    default to the shortest possible sentence for the layout non-terminal.
    Same
    same for separators in lists. We may also apply the heuristics from
    pandora's pretty printer instead which recognizes typical programming
    language constructs such as comma separated lists, brackets and binary
    operators to infer when a newline is better than a space.
  • abstract pattern matching on concrete trees is also still in its infancy
    in terms of documentation, parametrization and error messaging...
  • Interesting challenges are in the dealings with lists and string
    literals
    i.e. in call("hello",[a,b]) with syntax Exp = call: Id "(" {Exp ","}* ")", we need to know that [a,b] is an abstract notation for a concrete
    list and this is only provided by the context so has to be inferred by the
    type checker. Also, "hello" should preferable be parsed as Id, which
    again adds more inference capabilities to the type checker and more chance
    of funny error messages...

On Mon, Dec 30, 2013 at 11:28 PM, Tijs van der Storm <
notifications@github.com> wrote:

Here's an idea to get rid of implode, with only minor penalty (I think).

Implode transforms parse trees (PTs) to abstract syntax trees (ASTs). To
do this, it requires an ADT describing the AST structure. Implode is
often
complained about and for good reason. The following problems can be
observed:

  • Most of the information in that ADT, is also in the grammar.
  • Implode often (mostly) implements a one-to-one mapping, which feels
    wasteful and superfluous
  • The ADT should use type and constructor names corresponding to the
    non-terminals and production labels in the grammar, which is brittle.
  • Module sophistry is required to prevent non-terminal types from
    clashing with ADT types.

To solve these problems, I suggest the following idea. Get rid of
implode
and the separate ADT. The grammar defines a single type, just like
now,
but values of the type have three literal representations:

  1. Concrete syntax in backquotes.
  2. Tree notation with appls and prods
  3. Prefix notation like in add(lhs, rhs) (let's call it "Grammar AST"
    or GAST notation)

The 3rd item is new: it means you can construct "abstract" concrete
syntax
trees. As a result, the grammar is enough to allow you to program with
ASTs. Concrete and GAST representations can be arbitrarily mixed and
nested.

This design has one big consequence: whenever a parse tree is
serialized,
and a GAST sub-tree is encountered, we need to explode the tree. IOW we
might have to invent layout and/or separators. Currently there is no
restriction on either Layout or separators, so it's not always possible
to
do that. Consequently, either you'll get a runtime error ("can't invent
layout/sep"), or we impose the restriction statically when using a GAST
where no layout/sep can be inferred. Note: this is not about inferring
intended layout/seps, just syntactically valid layout/seps. If you
need to tweak layout, either you don't use GASTs, or you have to write a
pretty printer like all AST people do.

Some observations:

  • Concrete trees are equal to concrete trees if they are _really_equal,
    i.e. including layout. (Like now)
  • GASTs are never equal to any concrete tree because they don't have
    layout.
  • Matching works across all three kinds.
  • If there is no production label, there is no GAST representation,
    but you have to use concrete syntax or appls.
  • There are now values of the same type, which have different
    in-memory representations.
  • For pattern matching this proposal is already implemented, more or
    less (?).

Cons:

  • GASTs can't be more flexible than concrete syntax trees. The mapping
    is really one-to-one.
  • We lose a little wysiwyg: "ASTs" that print out as source code.

Possible issues:

  • We need grammar information for explode (i.e. productions): where to
    get them?
  • Explode could occur at unparse-time, or at non-pattern GAST
    expression evaluation time (i.e. create appl-prod nodes immediately).
  • How to deal with optionals? You could use list notation (like
    implode currently assumes), but this would break matching I think.
    Another
    option is using Maybe, but this suggests Maybe shouldn't be in
    utilanymore.

Feedback welcome!

P.S. Isn't it ironic that one might get rid of implode by introducing
explode?


Reply to this email directly or view it on GitHub<
https://github.com/cwi-swat/rascal/issues/470>
.

Jurgen Vinju

  • Centrum Wiskunde & Informatica - SWAT
  • INRIA Lille - ATEAMS
  • Universiteit van Amsterdam

www: http://jurgen.vinju.org, http://www.rascal-mpl.org,
http://twitter.com/jurgenvinju
skype: jurgen.vinju

Reply to this email directly or view it on GitHubhttps://github.com//issues/470#issuecomment-31515370
.

Jurgen Vinju

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants