Getting rid of implode #470

tvdstorm · 2013-12-30T22:28:45Z

Here's an idea to get rid of implode, with only minor penalty (I think).

Implode transforms parse trees (PTs) to abstract syntax trees (ASTs). To do this, it requires an ADT describing the AST structure. Implode is often complained about and for good reason. The following problems can be observed:

Most of the information in that ADT, is also in the grammar.
Implode often (mostly) implements a one-to-one mapping, which feels wasteful and superfluous
The ADT should use type and constructor names corresponding to the non-terminals and production labels in the grammar, which is brittle.
Module sophistry is required to prevent non-terminal types from clashing with ADT types.

To solve these problems, I suggest the following idea. Get rid of implode and the separate ADT. The grammar defines a single type, just like now, but values of the type have three literal representations:

Concrete syntax in backquotes.
Tree notation with appls and prods
Prefix notation like in add(lhs, rhs) (let's call it "Grammar AST" or GAST notation)

The 3rd item is new: it means you can construct "abstract" concrete syntax trees. As a result, the grammar is enough to allow you to program with ASTs. Concrete and GAST representations can be arbitrarily mixed and nested.

This design has one big consequence: whenever a parse tree is serialized, and a GAST sub-tree is encountered, we need to explode the tree. IOW we might have to invent layout and/or separators. Currently there is no restriction on either Layout or separators, so it's not always possible to do that. Consequently, either you'll get a runtime error ("can't invent layout/sep"), or we impose the restriction statically when using a GAST where no layout/sep can be inferred. Note: this is not about inferring intended layout/seps, just syntactically valid layout/seps. If you need to tweak layout, either you don't use GASTs, or you have to write a pretty printer like all AST people do.

Some observations:

Concrete trees are equal to concrete trees if they are really equal, i.e. including layout. (Like now)
GASTs are never equal to any concrete tree because they don't have layout.
Matching works across all three kinds.
If there is no production label, there is no GAST representation, but you have to use concrete syntax or appls.
There are now values of the same type, which have different in-memory representations.
For pattern matching this proposal is already implemented, more or less (?).

Cons:

GASTs can't be more flexible than concrete syntax trees. The mapping is really one-to-one.
We lose a little wysiwyg: you type add(const(1), const(2)) and you get appl(prod(labeled("add", sort("Exp")), [...]), ...)

Possible issues:

We need grammar information for explode (i.e. productions): where to get them?
Explode could occur at non-pattern GAST expression evaluation time (i.e. create appl-prod nodes immediately).
How to deal with optionals? You could use list notation (like implode currently assumes), but this would break matching I think. Another option is using Maybe, but this suggests Maybe shouldn't be in util anymore.

Feedback welcome!

P.S. Isn't it ironic that one might get rid of implode by introducing explode?

The text was updated successfully, but these errors were encountered:

tvdstorm · 2013-12-30T22:46:19Z

Just realized: GASTs are really the dual of concrete syntax patterns/expressions:

Concrete syntax -> statically insert appl-trees through parsing
GASTs -> statically insert appl-trees through explode.

So I think there's a story here.

PaulKlint · 2013-12-31T00:15:34Z

I like the gist of your proposal but we have to work out the details. Complete duality between concrete and abstract views on the same tree is appealing. We also have to work out the efficiency implications: ASTs were supposed to be more efficient. As long as we can do all the required transformations at compile time, we are ok I guess.

We could even generalize a bit further and allow concrete patterns to match abstract trees, provided that the types of the subtrees are "compatible".

tvdstorm · 2013-12-31T10:45:48Z

We could even generalize a bit further and allow concrete patterns to match abstract trees, provided that the types of the subtrees are "compatible".

I don't like "compatible" in scare-quotes. In my proposal GASTs will match old-skool concrete trees, if that's what you mean. Like so:

add(`1`, `2`) := `1 + 2`

=> True

jurgenvinju · 2014-01-03T10:36:32Z

I think it would be a good idea to do this and a challenging one at the
same time.

It's an extension of the abstract pattern matching to include its mirror
image in construction. We could generate a sentence of length 1 for every
layout non-terminal as a default, if such a sentence exists, or otherwise
default to the shortest possible sentence for the layout non-terminal. Same
same for separators in lists. We may also apply the heuristics from
pandora's pretty printer instead which recognizes typical programming
language constructs such as comma separated lists, brackets and binary
operators to infer when a newline is better than a space.
abstract pattern matching on concrete trees is also still in its infancy
in terms of documentation, parametrization and error messaging...
Interesting challenges are in the dealings with lists and string literals
i.e. in call("hello",[a,b]) with syntax Exp = call: Id "(" {Exp ","}* ")", we need to know that [a,b] is an abstract notation for a concrete
list and this is only provided by the context so has to be inferred by the
type checker. Also, "hello" should preferable be parsed as Id, which
again adds more inference capabilities to the type checker and more chance
of funny error messages...

On Mon, Dec 30, 2013 at 11:28 PM, Tijs van der Storm <
notifications@github.com> wrote:

Here's an idea to get rid of implode, with only minor penalty (I think).

Implode transforms parse trees (PTs) to abstract syntax trees (ASTs). To
do this, it requires an ADT describing the AST structure. Implode is often
complained about and for good reason. The following problems can be
observed:

Most of the information in that ADT, is also in the grammar.

Implode often (mostly) implements a one-to-one mapping, which feels
wasteful and superfluous

The ADT should use type and constructor names corresponding to the
non-terminals and production labels in the grammar, which is brittle.

Module sophistry is required to prevent non-terminal types from
clashing with ADT types.

To solve these problems, I suggest the following idea. Get rid of implode
and the separate ADT. The grammar defines a single type, just like now,
but values of the type have three literal representations:

Concrete syntax in backquotes.

Tree notation with appls and prods

Prefix notation like in add(lhs, rhs) (let's call it "Grammar AST"
or GAST notation)

The 3rd item is new: it means you can construct "abstract" concrete syntax
trees. As a result, the grammar is enough to allow you to program with
ASTs. Concrete and GAST representations can be arbitrarily mixed and
nested.

This design has one big consequence: whenever a parse tree is serialized,
and a GAST sub-tree is encountered, we need to explode the tree. IOW we
might have to invent layout and/or separators. Currently there is no
restriction on either Layout or separators, so it's not always possible to
do that. Consequently, either you'll get a runtime error ("can't invent
layout/sep"), or we impose the restriction statically when using a GAST
where no layout/sep can be inferred. Note: this is not about inferring
intended layout/seps, just syntactically valid layout/seps. If you
need to tweak layout, either you don't use GASTs, or you have to write a
pretty printer like all AST people do.

Some observations:

Concrete trees are equal to concrete trees if they are _really_equal, i.e. including layout. (Like now)

GASTs are never equal to any concrete tree because they don't have
layout.

Matching works across all three kinds.

If there is no production label, there is no GAST representation,
but you have to use concrete syntax or appls.

There are now values of the same type, which have different
in-memory representations.

For pattern matching this proposal is already implemented, more or
less (?).

Cons:

GASTs can't be more flexible than concrete syntax trees. The mapping
is really one-to-one.

We lose a little wysiwyg: "ASTs" that print out as source code.

Possible issues:

We need grammar information for explode (i.e. productions): where to
get them?

Explode could occur at unparse-time, or at non-pattern GAST
expression evaluation time (i.e. create appl-prod nodes immediately).

How to deal with optionals? You could use list notation (like
implode currently assumes), but this would break matching I think. Another
option is using Maybe, but this suggests Maybe shouldn't be in utilanymore.

Feedback welcome!

P.S. Isn't it ironic that one might get rid of implode by introducing
explode?

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/470
.

Jurgen Vinju

Centrum Wiskunde & Informatica - SWAT
INRIA Lille - ATEAMS
Universiteit van Amsterdam

www: http://jurgen.vinju.org, http://www.rascal-mpl.org,
http://twitter.com/jurgenvinju
skype: jurgen.vinju

swatbot · 2014-01-03T10:41:07Z

BTW, the full grammar for every non-terminal is always available using
#nonterminal, so that point is tackled.

On Fri, Jan 3, 2014 at 11:36 AM, Jurgen J. Vinju
notifications@github.comwrote:

I think it would be a good idea to do this and a challenging one at the
same time.

It's an extension of the abstract pattern matching to include its mirror
image in construction. We could generate a sentence of length 1 for every
layout non-terminal as a default, if such a sentence exists, or otherwise
default to the shortest possible sentence for the layout non-terminal.
Same
same for separators in lists. We may also apply the heuristics from
pandora's pretty printer instead which recognizes typical programming
language constructs such as comma separated lists, brackets and binary
operators to infer when a newline is better than a space.

abstract pattern matching on concrete trees is also still in its infancy
in terms of documentation, parametrization and error messaging...

Interesting challenges are in the dealings with lists and string
literals
i.e. in call("hello",[a,b]) with syntax Exp = call: Id "(" {Exp ","}* ")", we need to know that [a,b] is an abstract notation for a concrete
list and this is only provided by the context so has to be inferred by the
type checker. Also, "hello" should preferable be parsed as Id, which
again adds more inference capabilities to the type checker and more chance
of funny error messages...

On Mon, Dec 30, 2013 at 11:28 PM, Tijs van der Storm <
notifications@github.com> wrote:

Here's an idea to get rid of implode, with only minor penalty (I think).

Implode transforms parse trees (PTs) to abstract syntax trees (ASTs). To
do this, it requires an ADT describing the AST structure. Implode is
often
complained about and for good reason. The following problems can be
observed:

Most of the information in that ADT, is also in the grammar.

Implode often (mostly) implements a one-to-one mapping, which feels
wasteful and superfluous

The ADT should use type and constructor names corresponding to the
non-terminals and production labels in the grammar, which is brittle.

Module sophistry is required to prevent non-terminal types from
clashing with ADT types.

To solve these problems, I suggest the following idea. Get rid of
implode
and the separate ADT. The grammar defines a single type, just like
now,
but values of the type have three literal representations:

Concrete syntax in backquotes.

Tree notation with appls and prods

Prefix notation like in add(lhs, rhs) (let's call it "Grammar AST"
or GAST notation)

The 3rd item is new: it means you can construct "abstract" concrete
syntax
trees. As a result, the grammar is enough to allow you to program with
ASTs. Concrete and GAST representations can be arbitrarily mixed and
nested.

This design has one big consequence: whenever a parse tree is
serialized,
and a GAST sub-tree is encountered, we need to explode the tree. IOW we
might have to invent layout and/or separators. Currently there is no
restriction on either Layout or separators, so it's not always possible
to
do that. Consequently, either you'll get a runtime error ("can't invent
layout/sep"), or we impose the restriction statically when using a GAST
where no layout/sep can be inferred. Note: this is not about inferring
intended layout/seps, just syntactically valid layout/seps. If you
need to tweak layout, either you don't use GASTs, or you have to write a
pretty printer like all AST people do.

Some observations:

Concrete trees are equal to concrete trees if they are _really_equal,
i.e. including layout. (Like now)

GASTs are never equal to any concrete tree because they don't have
layout.

Matching works across all three kinds.

If there is no production label, there is no GAST representation,
but you have to use concrete syntax or appls.

There are now values of the same type, which have different
in-memory representations.

For pattern matching this proposal is already implemented, more or
less (?).

Cons:

GASTs can't be more flexible than concrete syntax trees. The mapping
is really one-to-one.

We lose a little wysiwyg: "ASTs" that print out as source code.

Possible issues:

We need grammar information for explode (i.e. productions): where to
get them?

Explode could occur at unparse-time, or at non-pattern GAST
expression evaluation time (i.e. create appl-prod nodes immediately).

How to deal with optionals? You could use list notation (like
implode currently assumes), but this would break matching I think.
Another
option is using Maybe, but this suggests Maybe shouldn't be in
utilanymore.

Feedback welcome!

P.S. Isn't it ironic that one might get rid of implode by introducing
explode?

—
Reply to this email directly or view it on GitHub<
https://github.com/cwi-swat/rascal/issues/470>
.

Jurgen Vinju

Centrum Wiskunde & Informatica - SWAT

INRIA Lille - ATEAMS

Universiteit van Amsterdam

www: http://jurgen.vinju.org, http://www.rascal-mpl.org,
http://twitter.com/jurgenvinju
skype: jurgen.vinju

—

Reply to this email directly or view it on GitHubhttps://github.com//issues/470#issuecomment-31515370
.

Jurgen Vinju

Centrum Wiskunde & Informatica - SWAT
INRIA Lille - ATEAMS
Universiteit van Amsterdam

www: http://jurgen.vinju.org, http://www.rascal-mpl.org,
http://twitter.com/jurgenvinju
skype: jurgen.vinju

PaulKlint added future-enhancement and removed enhancement labels Mar 15, 2016

PaulKlint closed this as completed Mar 15, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting rid of implode #470

Getting rid of implode #470

tvdstorm commented Dec 30, 2013

tvdstorm commented Dec 30, 2013

PaulKlint commented Dec 31, 2013

tvdstorm commented Dec 31, 2013

jurgenvinju commented Jan 3, 2014

swatbot commented Jan 3, 2014

Getting rid of implode #470

Getting rid of implode #470

Comments

tvdstorm commented Dec 30, 2013

tvdstorm commented Dec 30, 2013

PaulKlint commented Dec 31, 2013

tvdstorm commented Dec 31, 2013

jurgenvinju commented Jan 3, 2014

swatbot commented Jan 3, 2014