Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making Parse Tree Serializable #233

Open
JitCompiler opened this issue Apr 24, 2013 · 22 comments
Open

Making Parse Tree Serializable #233

JitCompiler opened this issue Apr 24, 2013 · 22 comments

Comments

@JitCompiler
Copy link

Is it possible to save up the parse tree generated by ANTLR4? For example, by making ParserRuleContext serializable.

I'd like to use ANTLR4 to parse the source files in my project. My project is quite large and contains hundreds of source files. Usually, I need to walk through the parse trees of several source files to get my desire info. It takes a while to parse even just 1 source file. So, it would be impractical to parse all the source files again every time I start my tool in order to get 1 piece of information (e.g. callers of a function). It would be nice if I can just parse once and save the parser output into the hard disk.

Another scenario is say I need to parse a cpp source file together and its related header files. e.g. to retrieve the function declarations from the header files for each function definition in the cpp file. But each header file is also used by another cpp file so it means its parse tree would be needed again when we examine the parse tree of a second cpp file.

@ghost ghost assigned sharwell Apr 24, 2013
@sharwell
Copy link
Member

sharwell commented Jun 2, 2013

I'm going to defer this issue for now, since I'm not confident how I want to handle it and other issues are consuming our time leading up to the 4.1 release.

@HSorensen
Copy link
Contributor

Just a datapoint.
On my version of antlr4 (see pull request 306) I toyed a bit with the java.io.ObjectOutputStream and java.io.ObjectInputStream to write the ParseTree to a file and by adding implements Serializable to the following classes:

org.antlr.v4.runtime.misc
  IntegerStack.java
  Pair.java

org.antlr.v4.runtime.tree
  TerminalNodeImpl.java

org.antlr.v4.runtime
  ANTLRInputStream.java
  CommonTokenFactory.java
  IncludeStrategyImpl.java
  Lexer.java
  RuleContext.java

The following works on my initial test case:

  ParseTree tree = myparser.start();            
  OutputStream outputStream = new FileOutputStream("afile.data");
  ObjectOutputStream objectOutputStream = new ObjectOutputStream(outputStream);
  objectOutputStream.writeObject(tree);
  objectOutputStream.close();

and reading the parse tree with

  ParseTree tree2;
  InputStream inputStream = new FileInputStream("afile.data");
  ObjectInputStream objectInputStream = new ObjectInputStream(inputStream);
  tree2=(ParseTree) objectInputStream.readObject();
  objectInputStream.close();

Thanks
Henrik

@sharwell sharwell removed their assignment Feb 20, 2015
@HSorensen
Copy link
Contributor

I have a solution that works for me. See commit in my fork

@JitCompiler
Copy link
Author

Are there any plans to incorporate all these into the release build? A simple but useful feature that has been requested for 3 years

@JPMoresmau
Copy link

I would like that too. Is there any philosophical objections from the maintainers? Is it just a matter of submitting a PR with some tests?

@HSorensen
Copy link
Contributor

HSorensen commented May 9, 2017 via email

@KvanTTT
Copy link
Member

KvanTTT commented May 9, 2017

I suggest to use JSON or a simple indented tree format (toStringTree method returns similar tree).

@HSorensen
Copy link
Contributor

HSorensen commented May 9, 2017 via email

@KvanTTT
Copy link
Member

KvanTTT commented May 13, 2017

JSON is a standard format and there are plenty libraries that support it.

@HSorensen
Copy link
Contributor

Yeah, that I understand. But can any of these libraries be used in the same manner as the generated tree walkers ?

@JitCompiler
Copy link
Author

JitCompiler commented Jun 20, 2017

the current implementation is very Java centric, but works well. Sam had
some ideas for a different implementation. I am currently looking at
storing the parse tree in a graph database. Looks really promising.

4 years after making the first request I, at last, realize what the problem was. Seems like the developers were looking for some "generic" but inevitably complex solutions at the expense of an extremely simple solution for 1 specific platform - to declare a few classes implementing the Serializable interface in the Java platform.

This is the same mistake that developers have been making all over the world. Ideas from having database independent, OS independent, language independent to xxx-independent. At the end, we are sacrificing the powerful facilities offered by 1 specific platform in order to be independent from that platform.

@JPMoresmau
Copy link

Well I've written my own little system (for my company, so not open source) to serialize the Parse Tree into JSON, and then add a JSON tree walker that you can register call backs on a specific rule name. So it's not a transparent system (the API for callbacks is different than Antlr listeners or visitors) but it serves my purpose well.

@sharwell
Copy link
Member

sharwell commented Jun 23, 2017

@JitCompiler Actually it's a bit different. Since users can modify tree nodes arbitrarily with new fields, it's not safe to claim that this type is serializable. Any solution would be at best application-specific. My hope was the overall performance of ANTLR 4 meant the need for serialization was sufficiently sparse that in the end the few people who would ever need it end up implementing it independently anyway. I believe that this is a reasonable reflection of the current reality.

@HSorensen
Copy link
Contributor

HSorensen commented Jun 23, 2017 via email

@rkumar-cognam
Copy link

Since users can modify tree nodes arbitrarily with new fields, it's not safe to claim that this type is serializable.

@sharwell With all due respect, does this mean trees are mutable now in ANTLR4 ? or can we just update the fields or insert new ones into the nodes ? Sorry if it was a stupid question.

@sharwell
Copy link
Member

sharwell commented Jul 11, 2017

@rkumar-cognam The locals, parameters, and returns features all add fields to parse tree nodes which are not guaranteed to be serializable.

The maintenance overhead is negligible, at least for the java target part I cannot speak for the targets.

Currently only the generated code and serialized ATN can be produced with one version of ANTLR and then attempted to be used with a different version of the runtime. Serialization means all the trees become subject to this scenario. Avoiding marking these trees as serializable moves the burden of maintaining this compatibility, if needed, to the application which needs it. Over time, when it comes to serialization this is a highly-desirable maintenance characteristic.

@ftomassetti
Copy link

Personally I had this need several times, the why I solved it is by using a library to complement ANTLR. Basically I translate the ANTLR's parse-tree into an AST written in this library, where I can do all sort of manipulations and transformations and solve serialize the AST to JSON or XML.

In case someone is interested, the library is called Kolasu. While the documentation is not great we have used it in several open-source and commercial projects, so it works quite decently.

@parrt
Copy link
Member

parrt commented Jul 4, 2022

Hi @ftomassetti I just started a branch: #3773 to gen json. Is json enough for you to make a json->parse tree converter for your needs?

@ftomassetti
Copy link

Yes, thank you @parrt !

@parrt
Copy link
Member

parrt commented Jul 5, 2022

Cool. This is expanding into a full serialization of an entire parse (I'm playing with antlr server how is a grammar playground). Any requirements you have would be welcome sir!

@bigstar119
Copy link

Hi,i have the same problem. my input is 889Byte and after parse,the ParseTree is 63KB, expansion factor nearly 6x. i tried java's native serialization/deserialization. but it's slow then reparse the input use antlr again.

Anyone has the quick way to serialization/deserialization?

@HSorensen
Copy link
Contributor

@bigstar119 I had good results using XZ compression of the parse trees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants