Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow converting a ParseTree to an AST #2428

Open
NathanJAdams opened this issue Nov 30, 2018 · 21 comments
Open

Allow converting a ParseTree to an AST #2428

NathanJAdams opened this issue Nov 30, 2018 · 21 comments

Comments

@NathanJAdams
Copy link

What would be awesome and IMHO an obvious next step with ANTLR would be to give people the ability to convert from a generated ParseTree to an AST.

There are a number of ways this could be done eg:

  1. ANTLR could have a generic Node, Leaf, Branch classes the ParseTree information is mapped onto.
  2. ANTLR could create a "best guess" at what classes are needed.
  3. The user could create a bunch of minimal data classes and the converter maps the ParseTree information into the user classes.
  4. The user could inject options to control how the conversion process happens, for example, they could set flags to keep/remove information, set operators on how to combine different ParseTrees into a single AST node.
@mike-lischke
Copy link
Member

This makes no sense, really. A parse tree is a superset of an AST. You have all the info, which is contained in an AST already in your parse tree. Only the access to the token info is a tad bit more complex (you have to get it from the context's start/stop nodes).

@NathanJAdams
Copy link
Author

Try writing a compiler with just a parse tree and it'll make more sense

@KvanTTT
Copy link
Member

KvanTTT commented Jan 16, 2019

You can manually convert the parse tree to AST with Visitor. BTW, Roslyn Compiler Platform uses full fidelity parse trees: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview#syntax-trees.

@NathanJAdams
Copy link
Author

Roslyn isn't a compiler though, it's a set of tools for IDEs and code analysis. A parse tree is the right structure for that kind of job, and an AST is the right structure for a compiler.

Sure of course it's possible to do it manually, but if it's a common use case it would make sense to abstract it out and allow the user to avoid reinventing the wheel.

@sharwell
Copy link
Member

Roslyn isn't a compiler though

It definitely is.

@mykolav
Copy link

mykolav commented Jan 3, 2020

@mike-lischke

This makes no sense, really.

Actually, this feature-request to simplify AST construction makes a ton of sense.
Antlr v3 used to have built-in mechanisms supporting AST construction. Take a look at the Tree construction page from the docs of Antlr v3 for additional details.
Antlr v4 dropped the AST construction support mechanisms as, according to Terence Parr, supporting building compilers is explicitly a non-goal for Antlr v4

Because most ANTLR users don't build compilers, I decided to focus on the other applications for ANTLR v4: parsing and extracting information and then translations. For compilers, we need to convert everything into operations and operands--that means ASTs are easier.

Clearly, the above implies ASTs absolutely make sense for compiler construction and similar tasks,
Antlr v4 just doesn't aim at supporting such tasks.

A parse tree is a superset of an AST.

Sorry, but this is plain wrong.
Sure, there can be individual cases where a parse tree is a superset of the corresponding AST. But in general, an AST at most forms an intersecting set with the corresponding ParseTree.
Eli Bendersky laid out the differences between ASTs and Parse trees in a very nice and concise way in his blogpost Abstract vs. Concrete Syntax Trees .

You have all the info, which is contained in an AST already in your parse tree.

Just like the source code the parse tree was built from contains all the same info and maybe more.
It seems a parse tree doesn't help to meaningfully reduce effort to build AST compared to just rolling out a hand-made recursive descent parser — I'll try to give an example illustrating this statement in another comment.
Basically, Antlr v4 is likely not the right tool for jobs that rely on having AST. This is fair enough, given ease of AST construction was never a goal of v4. It just seems the documentation could state this more prominently.

@mykolav
Copy link

mykolav commented Jan 3, 2020

@KvanTTT

TLDR: A Roslyn syntax tree is an AST it is not a parse tree.

You can manually convert the parse tree to AST with Visitor.

You sure can. Here is an arbitrary example of such a conversion: AstBuilder.java
(Please notice, I'm not affiliated with the project in any way)
Given AstBuilder is about 2000 lines of code. One has to wonder if using Antlr is worth it in this case.
Is a hand-rolled recursive descent parser going to take that much more code and effort to implement?
Keep in mind that in case of a hand-made parser:

  • The application can forego a redundant step of building an Antlr ParseTree to save some CPU cycles
  • RAM consumptions is decreased as there is no need to allocate memory for the ParseTree
  • The build can be simplified by removing the parser generation step
  • There is one layer of abstraction fewer to reason about when trying to make sense of the code
  • Plus all the usual pro-s of having a hand-rolled parser

In other words, without supporting direct AST generation, it isn't obvious what benefits Antlr brings to a task of parsing a programming language.

BTW, Roslyn Compiler Platform uses full fidelity parse trees: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview#syntax-trees.

Sorry, but calling a Roslyn syntax tree a parse tree is factually wrong.
At least, given the commonly accepted definition of a parse tree. Please take a look at Eli Bendersky's post Abstract vs. Concrete Syntax Trees for a clear and concise explanation of the AST vs ParseTree differences.

You're correct that Roslyn produces full fidelity trees. But they are full fidelity in the sense that its possible to do a round-trip from a Roslyn syntax tree to the original source code exactly as the code was spelled by the programmer.
Please take a look at the two screenshots below. One is a Roslyn syntax tree and the other is Antlr's ParseTree for the same C# code.
I believe, from the screenshots its obvious the Roslyn syntax tree is an AST. While Antlr's tree is a ParseTree. I guess, we can say a ParseTree is a full fidelity tree with respect to the grammar — i.e., a ParseTree contains a node for each non-terminal of the grammar.

class Program
{
    public static int Main(string[] args)
    {
        return 0;
    }
}

RoslynSyntaxTree

Antlr4ParseTree

@sharwell
Copy link
Member

sharwell commented Jan 3, 2020

The real answer to this is #2428 (comment). ANTLR 3 tried to make ASTs using the grammar, but they just ended up as partial-fidelity parse trees that took longer to create than full-fidelity ones. The whole feature was dropped from ANTLR 4 which made it simpler, faster, and more powerful. The tools already exist if a different particular representation is desired in an application.

@mykolav
Copy link

mykolav commented Jan 3, 2020

If anyone lands here in search of a way of converting an Antlr's parse tree into an AST,
I believe this StackOverflow answer does a stellar job explaining it.
And here's a real-world example, AstBuilder.java

@KvanTTT
Copy link
Member

KvanTTT commented Jan 3, 2020

ANTLR tree depends on grammar. Moreover, it's possible to skip a step of parse tree building and generate AST during the parsing process. In this case listeners with disabled option buildParseTrees = false should be used.

@lwhite1
Copy link

lwhite1 commented Aug 20, 2020

First, thank you everyone for your work in developing and maintaining this fantastic library. I can completely understand the desire to remove features that weren't being widely used and require a great deal of work to maintain.

However, the suggestion in this thread that converting a parse tree to an AST tree is easy to do by hand is less easy to understand. If someone had the time and skill, this would be an awesome feature. Consider:

Parse trees are really easy to build by hand and are so regular that tools like ANTLR can automate the process for us. That’s the good news. The bad news is that parse trees are extremely inconvenient to walk and transform. Parse trees are full of noise because of all the interior rule nodes. They are also very sensitive to changes in the grammar.

Parr, Terence. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers) (pp. 91-92). Pragmatic Bookshelf. Kindle Edition.

When I read in that book about the AST support in ANTLR 3, I was really impressed by the convenience it promised, and especially the potential for reducing sensitivity to grammar changes as the language is developed.

So, thank you again for your work, and thank you for leaving the issue open. I hope someone can get to it someday. Cheers.

@alessiostalla
Copy link

I agree that ANTLR should focus on one thing, parsing. Let's leave the AST to other projects, that may experiment with different approaches.

Shameless plug, you may be interested in the "*LaSu" methodology that we use, and the open-source support libraries that we've written for it, that provide the building blocks for an AST and some glue code to ease the mapping from an ANTLR parse tree: https://github.com/Strumenta/StarLasu

@BubblyJuly

This comment was marked as outdated.

@Korporal
Copy link

This is an interesting post about this

https://tomassetti.me/migrating-from-antlr2-to-antlr4/

@BubblyJuly

This comment was marked as outdated.

@NilsBaumgartner1994
Copy link

Would like to just print out the modified tree so i can get the source code. I think generating an AST or having a minimal example on how to "print" the resulting code would be nice.

@BubblyJuly

This comment was marked as outdated.

@NilsBaumgartner1994
Copy link

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

Translated:

"This is a holiday auto-reply email from QQ mailbox.

Hello, I am on vacation recently and cannot reply to your email personally. I will reply to you as soon as possible after the vacation."

*Incomming auto-reply mail again ..."

@neon12345
Copy link

I also need antlr to create an AST rather than a parse tree, so I have created two repositories working towards this. With some additional information, it should be possible to compress and modify the tree into an AST automatically. As stated, it is more efficient to create an AST instead of a parse tree if needed, but there are use cases where performance is not as important and a smaller AST is more manageable (e.g. AST to AST translation). The project is in an early stage: part1 part2

@BubblyJuly

This comment was marked as outdated.

@neon12345
Copy link

I made some progress in creating the tool to transform the antlr parse tree into an AST. As a result, I can now create a self-contained JavaScript version of the AST and also query/transform/print/visualize it with CSS selectors from JavaScript.

Demo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests