Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making location explicit. #691

Open
ioquatix opened this issue May 17, 2020 · 12 comments
Open

Consider making location explicit. #691

ioquatix opened this issue May 17, 2020 · 12 comments

Comments

@ioquatix
Copy link

ioquatix commented May 17, 2020

I've been using this gem quite a bit recently.

I'm also looking at parsing C/C++ using clang into the same format, with the goal of providing a unified index for symbol lookup (as much as is possible).

In practice, it seems to me that an AST is (almost) always tied to some kind of source "location". I know there are cases where an AST is completely detached from actual code, but that is not, in my experience, the most common use case.

Have you considered whether it would make sense to expose a generic Location abstraction in the AST gem, that would cover multiple use cases?

In more concrete terms, making an interface for:

class AST::Node
  attr :location
end

# Introduce:

class AST::Location
  # Generic handling of source code, e.g. getting textual value, etc.
end

When I come to implementing a parser for C++ using the AST gem, I'm either going to have to roll my own location class with the same interface as the parser gem, or, preferably, follow some standard set out by the AST gem. The benefit would make it easier for converting an AST back into source code.

Essentially, what I'm proposing, is whether we can move a very light amount of location interface abstractions into the AST gem, such that parsing other languages could expose a shared interface for accessing location details.

@ioquatix
Copy link
Author

In particular node.location.expression.source seems like something that could easily be useful across multiple languages, at least at the interface level.

@ioquatix
Copy link
Author

Sorry, one other point that might make sense, is if this gem is the canonical source for parsing in Ruby, what about making parser-ruby and parser-clang etc, and using this base gem to share data structures?

@iliabylich
Copy link
Collaborator

Sorry for a late response.

it seems to me that an AST is (almost) always tied to some kind of source "location"

For me AST node represents the intent of some piece of code. If you want to do some static analysis you can also attach locations to nodes, but it's not necessary. Well, they are required to build backtraces, and so to build let's say an AST interpreter (like old MRI) you need them.

Have you considered whether it would make sense to expose a generic Location abstraction in the AST gem

AST gem handles something like s-expressions that are not aware of any programming language and source files. Parser gem implements a subclass that adds location field - https://github.com/whitequark/parser/blob/master/lib/parser/ast/node.rb#L18.

Also, many nodes are more complicated than just plain literals (numbers, strings etc). For example, take a look at the def node:

$ bin/ruby-parse -Le 'def m(a); 42; end'
s(:def, :m,
  s(:args,
    s(:arg, :a)),
  s(:int, 42))
def m(a); 42; end
~~~ keyword
    ~ name    ~~~ end
~~~~~~~~~~~~~~~~~ expression

There's a location of the def keyword, of the method name and of the end keyword. Currently AST node has one location that encapsulates multiple ranges.

Do you propose having one top-level location with one range? Should we encapsulate other locations inside it? Like def_node.location.def or def_node.location.end? Sounds like a breaking change to me.

and using this base gem to share data structures?

I don't like this idea because it introduces a hard dependency between 2 libraries that don't really know anything about each other (they have the same dependency with data structures, but that's the only thing they have in common). How about making an adapter that converts AST nodes to a single unified format?

@ioquatix
Copy link
Author

I'm not sure this is a good idea but my original thoughts were that the vast majority of AST nodes will be attached to a source location and some kind of source representation. Not always, but most of the time.

Consumers of AST nodes are often interested in manipulating the AST nodes and pulling out the original source. Or at least that's my experience.

To avoid reinventing the wheel over and over again, putting these abstractions into the AST gem might make sense. I'm not suggesting putting the specific kinds of locations - just a generic base class which provides some guidance on what a location interface should provide.

Yes, adapters are possible and that's what I've done in the short term. However, it would be nice to take ffi-clang, and parse that into a compatible AST format, with essentially the same interface for location/source code manipulations.

@iliabylich
Copy link
Collaborator

AST in vacuum doesn't know anything about source code. It can be just an AST. I understand your idea, but

  1. it sounds too generic to cover all cases (like the one with the def node above)
  2. and it requires breaking changes

I agree that it works for literals and other simple cases, but if you could give more details on how it might look for complex scenarios (like def node, again) I'd be happy to take a look. Maybe I don't fully understand the picture.

@iliabylich
Copy link
Collaborator

Also, it's quite easy to customize the way how AST is constructed - https://github.com/whitequark/parser/blob/master/lib/parser/builders/default.rb#L1574. You can pass your own builder to the parser, and the build can use your own subclass of the AST node.

I'd suggest either extending a Builder class or making a visitor-like converter to any desired format.

@ioquatix
Copy link
Author

(like def node, again)

The generic AST location would only know the full extent of the source that generated the AST, and a subclass would still be required e.g. Parser::Source::Map::Send etc for extracting specific bits.

@iliabylich
Copy link
Collaborator

iliabylich commented May 22, 2020

After a few days I still believe that it's incorrect to add locations to generic AST nodes. AST is supposed to be abstract. You can draw it on a piece of paper or encode as a chain of nested arrays. It's just a tree data structure and I'm not sure if it's valid to add programming language -specific features to nodes.

@whitequark Do you have any thoughts?

Also /cc @marcandre @mbj @bbatsov @koic @palkan.

I clearly see the value of adding it to the ast gem, but it conceptually feels wrong to me. I'm sure that creating a custom subclass of AST::Node for your own needs (or a rewriter that converts ASTs to the unified format) seems to be a better way to achieve the same level of flexibility without mixing concepts of different libraries/languages.

@whitequark
Copy link
Owner

@iliabylich I agree with your assessment. If it turns out that we indeed "reinvent the wheel over and over again" then we can, being informed by that experience, change our mind and add the requested abstraction to the ast gem. But it seems preliminary to me at this point

@marcandre
Copy link
Contributor

There's definitely a case to say that Source::Range, Source::Map, Source::TreeRewriter among others can be thought of completely independently from the Ruby language. Just the TreeRewriter is a non-trivial chunk of code and could be useful for any RuboCop-like for other languages, for example.

Many arguments expressed here focus on the original proposal (moving those to the AST gem), but maybe a better idea is an intermediate layer, like a source gem that would handle the basic ideas of Range, Map, Comment, Rewriter in a generic way?

@marcandre
Copy link
Contributor

marcandre commented May 22, 2020

As a further example, the ast gem as it stands right now would be perfect if one wanted to refactor RuboCop's NodePattern using an AST instead of recursive descent, for example, and there would be no need for the source gem. But I'd love a full fledged ERB parser that would output an AST with all the Source information.

It's not clear if the right abstraction is a source gem, or a generic-parser gem, but it's pretty clear to me that the AST gem is great as is, but that there is a valuable intermediate layer waiting to be extracted.

@ioquatix
Copy link
Author

Agreed on all points. My end goal is to have the same tooling for parsing C++ using ffi-clang into the same AST as parser but I need some shared abstractions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants