Consider making location explicit. #691

ioquatix · 2020-05-17T03:01:14Z

I've been using this gem quite a bit recently.

I'm also looking at parsing C/C++ using clang into the same format, with the goal of providing a unified index for symbol lookup (as much as is possible).

In practice, it seems to me that an AST is (almost) always tied to some kind of source "location". I know there are cases where an AST is completely detached from actual code, but that is not, in my experience, the most common use case.

Have you considered whether it would make sense to expose a generic Location abstraction in the AST gem, that would cover multiple use cases?

In more concrete terms, making an interface for:

class AST::Node
  attr :location
end

# Introduce:

class AST::Location
  # Generic handling of source code, e.g. getting textual value, etc.
end

When I come to implementing a parser for C++ using the AST gem, I'm either going to have to roll my own location class with the same interface as the parser gem, or, preferably, follow some standard set out by the AST gem. The benefit would make it easier for converting an AST back into source code.

Essentially, what I'm proposing, is whether we can move a very light amount of location interface abstractions into the AST gem, such that parsing other languages could expose a shared interface for accessing location details.

The text was updated successfully, but these errors were encountered:

ioquatix · 2020-05-17T03:06:40Z

In particular node.location.expression.source seems like something that could easily be useful across multiple languages, at least at the interface level.

ioquatix · 2020-05-17T04:04:56Z

Sorry, one other point that might make sense, is if this gem is the canonical source for parsing in Ruby, what about making parser-ruby and parser-clang etc, and using this base gem to share data structures?

iliabylich · 2020-05-18T15:12:38Z

Sorry for a late response.

it seems to me that an AST is (almost) always tied to some kind of source "location"

For me AST node represents the intent of some piece of code. If you want to do some static analysis you can also attach locations to nodes, but it's not necessary. Well, they are required to build backtraces, and so to build let's say an AST interpreter (like old MRI) you need them.

Have you considered whether it would make sense to expose a generic Location abstraction in the AST gem

AST gem handles something like s-expressions that are not aware of any programming language and source files. Parser gem implements a subclass that adds location field - https://github.com/whitequark/parser/blob/master/lib/parser/ast/node.rb#L18.

Also, many nodes are more complicated than just plain literals (numbers, strings etc). For example, take a look at the def node:

$ bin/ruby-parse -Le 'def m(a); 42; end'
s(:def, :m,
  s(:args,
    s(:arg, :a)),
  s(:int, 42))
def m(a); 42; end
~~~ keyword
    ~ name    ~~~ end
~~~~~~~~~~~~~~~~~ expression

There's a location of the def keyword, of the method name and of the end keyword. Currently AST node has one location that encapsulates multiple ranges.

Do you propose having one top-level location with one range? Should we encapsulate other locations inside it? Like def_node.location.def or def_node.location.end? Sounds like a breaking change to me.

and using this base gem to share data structures?

I don't like this idea because it introduces a hard dependency between 2 libraries that don't really know anything about each other (they have the same dependency with data structures, but that's the only thing they have in common). How about making an adapter that converts AST nodes to a single unified format?

ioquatix · 2020-05-18T15:29:47Z

I'm not sure this is a good idea but my original thoughts were that the vast majority of AST nodes will be attached to a source location and some kind of source representation. Not always, but most of the time.

Consumers of AST nodes are often interested in manipulating the AST nodes and pulling out the original source. Or at least that's my experience.

To avoid reinventing the wheel over and over again, putting these abstractions into the AST gem might make sense. I'm not suggesting putting the specific kinds of locations - just a generic base class which provides some guidance on what a location interface should provide.

Yes, adapters are possible and that's what I've done in the short term. However, it would be nice to take ffi-clang, and parse that into a compatible AST format, with essentially the same interface for location/source code manipulations.

iliabylich · 2020-05-18T15:36:28Z

AST in vacuum doesn't know anything about source code. It can be just an AST. I understand your idea, but

it sounds too generic to cover all cases (like the one with the def node above)
and it requires breaking changes

I agree that it works for literals and other simple cases, but if you could give more details on how it might look for complex scenarios (like def node, again) I'd be happy to take a look. Maybe I don't fully understand the picture.

iliabylich · 2020-05-18T15:42:01Z

Also, it's quite easy to customize the way how AST is constructed - https://github.com/whitequark/parser/blob/master/lib/parser/builders/default.rb#L1574. You can pass your own builder to the parser, and the build can use your own subclass of the AST node.

I'd suggest either extending a Builder class or making a visitor-like converter to any desired format.

ioquatix · 2020-05-18T15:44:21Z

(like def node, again)

The generic AST location would only know the full extent of the source that generated the AST, and a subclass would still be required e.g. Parser::Source::Map::Send etc for extracting specific bits.

iliabylich · 2020-05-22T16:48:40Z

After a few days I still believe that it's incorrect to add locations to generic AST nodes. AST is supposed to be abstract. You can draw it on a piece of paper or encode as a chain of nested arrays. It's just a tree data structure and I'm not sure if it's valid to add programming language -specific features to nodes.

@whitequark Do you have any thoughts?

Also /cc @marcandre @mbj @bbatsov @koic @palkan.

I clearly see the value of adding it to the ast gem, but it conceptually feels wrong to me. I'm sure that creating a custom subclass of AST::Node for your own needs (or a rewriter that converts ASTs to the unified format) seems to be a better way to achieve the same level of flexibility without mixing concepts of different libraries/languages.

whitequark · 2020-05-22T16:55:56Z

@iliabylich I agree with your assessment. If it turns out that we indeed "reinvent the wheel over and over again" then we can, being informed by that experience, change our mind and add the requested abstraction to the ast gem. But it seems preliminary to me at this point

marcandre · 2020-05-22T17:28:11Z

There's definitely a case to say that Source::Range, Source::Map, Source::TreeRewriter among others can be thought of completely independently from the Ruby language. Just the TreeRewriter is a non-trivial chunk of code and could be useful for any RuboCop-like for other languages, for example.

Many arguments expressed here focus on the original proposal (moving those to the AST gem), but maybe a better idea is an intermediate layer, like a source gem that would handle the basic ideas of Range, Map, Comment, Rewriter in a generic way?

marcandre · 2020-05-22T17:47:11Z

As a further example, the ast gem as it stands right now would be perfect if one wanted to refactor RuboCop's NodePattern using an AST instead of recursive descent, for example, and there would be no need for the source gem. But I'd love a full fledged ERB parser that would output an AST with all the Source information.

It's not clear if the right abstraction is a source gem, or a generic-parser gem, but it's pretty clear to me that the AST gem is great as is, but that there is a valuable intermediate layer waiting to be extracted.

ioquatix · 2020-05-23T00:31:17Z

Agreed on all points. My end goal is to have the same tooling for parsing C++ using ffi-clang into the same AST as parser but I need some shared abstractions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider making location explicit. #691

Consider making location explicit. #691

ioquatix commented May 17, 2020 •

edited

ioquatix commented May 17, 2020

ioquatix commented May 17, 2020

iliabylich commented May 18, 2020

ioquatix commented May 18, 2020

iliabylich commented May 18, 2020

iliabylich commented May 18, 2020

ioquatix commented May 18, 2020

iliabylich commented May 22, 2020 •

edited

whitequark commented May 22, 2020

marcandre commented May 22, 2020

marcandre commented May 22, 2020 •

edited

ioquatix commented May 23, 2020

Consider making location explicit. #691

Consider making location explicit. #691

Comments

ioquatix commented May 17, 2020 • edited

ioquatix commented May 17, 2020

ioquatix commented May 17, 2020

iliabylich commented May 18, 2020

ioquatix commented May 18, 2020

iliabylich commented May 18, 2020

iliabylich commented May 18, 2020

ioquatix commented May 18, 2020

iliabylich commented May 22, 2020 • edited

whitequark commented May 22, 2020

marcandre commented May 22, 2020

marcandre commented May 22, 2020 • edited

ioquatix commented May 23, 2020

ioquatix commented May 17, 2020 •

edited

iliabylich commented May 22, 2020 •

edited

marcandre commented May 22, 2020 •

edited