Json trees #3773

parrt · 2022-07-04T19:50:05Z

See #3772

Signed-off-by: Terence Parr <parrt@antlr.org>

runtime/Java/src/org/antlr/v4/runtime/tree/Trees.java


HSorensen · 2022-07-05T14:40:43Z

Are there any tree walkers or visitors that can utilize the JSON parse trees?

KvanTTT · 2022-07-05T14:48:25Z

I think that's up to the runtime.

parrt · 2022-07-05T17:47:05Z

> Are there any tree walkers or visitors that can utilize the JSON parse trees?

Any target language that knows how to read JSON should be able to pull these in and walk the trees recursively. I will have to build one in JavaScript, as I'm trying to build a server/client webpage that communicates using this format.
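As a rough sketch of the recursive walk described above: the JSON shape here is an assumption made for illustration (rule nodes are objects with hypothetical `rule` and `children` keys, leaves are plain strings of token text). The format this PR actually emits is the one shown in #3772 and may differ.

```python
import json

# Hypothetical JSON shape, for illustration only; not the format produced by this PR.
SAMPLE = (
    '{"rule": "expr", "children": ['
    '{"rule": "atom", "children": ["1"]}, "+", '
    '{"rule": "atom", "children": ["2"]}]}'
)

def walk(node, depth=0, out=None):
    """Depth-first walk that records (depth, label) for every node."""
    if out is None:
        out = []
    if isinstance(node, str):            # leaf: token text
        out.append((depth, node))
    else:                                # interior: rule node
        out.append((depth, node["rule"]))
        for child in node["children"]:
            walk(child, depth + 1, out)
    return out

def leaf_text(node):
    """Concatenate token text in left-to-right order."""
    if isinstance(node, str):
        return node
    return "".join(leaf_text(child) for child in node["children"])

tree = json.loads(SAMPLE)
for depth, label in walk(tree):
    print("  " * depth + label)          # indented dump of the tree
```

Nothing here depends on an ANTLR runtime; any language with a JSON reader could do the same thing with its native maps and lists.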


HSorensen · 2022-07-05T18:11:16Z

> Any Target language that knows how to read json,

As you of course already know, using either the tree walker or visitor patterns is just so much more efficient.

parrt · 2022-07-05T18:13:18Z

@HSorensen yep, what I meant was that somebody will have to deserialize the JSON into a proper parse tree, and then the usual visitor and listener patterns will work great. This is only for sending stuff across a wire. If it's in memory, this is all unnecessary.
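A minimal sketch of that idea, under stated assumptions: the wire format below (objects with `rule` and `children`, strings for tokens) is hypothetical, and `TokenNode`, `RuleNode`, `Visitor`, and `Evaluator` are illustration-only classes, not ANTLR runtime types. It rebuilds node objects from JSON and then dispatches on rule name, visitor style.

```python
import json
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class TokenNode:
    text: str

@dataclass
class RuleNode:
    rule: str
    children: List[Union["RuleNode", TokenNode]] = field(default_factory=list)

def from_json(obj):
    """Rebuild node objects from the (hypothetical) JSON shape."""
    if isinstance(obj, str):
        return TokenNode(obj)
    return RuleNode(obj["rule"], [from_json(c) for c in obj["children"]])

class Visitor:
    """Dispatches to visit_<rule> if defined, else visits the children."""
    def visit(self, node):
        if isinstance(node, TokenNode):
            return self.visit_token(node)
        method = getattr(self, "visit_" + node.rule, self.visit_children)
        return method(node)

    def visit_children(self, node):
        result = None
        for child in node.children:
            result = self.visit(child)
        return result

    def visit_token(self, node):
        return node.text

class Evaluator(Visitor):
    def visit_add(self, node):
        left, _op, right = node.children
        return self.visit(left) + self.visit(right)

    def visit_int(self, node):
        return int(node.children[0].text)

wire = ('{"rule": "add", "children": ['
        '{"rule": "int", "children": ["1"]}, "+", '
        '{"rule": "int", "children": ["2"]}]}')
tree = from_json(json.loads(wire))
result = Evaluator().visit(tree)
print(result)
```

A real deserializer for a given target would instead construct that target's generated context classes, which is the part nobody in this thread has built yet.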


parrt · 2022-07-05T19:31:17Z

@KvanTTT looking better, right?


KvanTTT · 2022-07-05T20:01:46Z

Yes, a separate class looks better.

parrt · 2022-07-05T20:03:10Z

Added sample output and Python parsing of the JSON here: #3772

JamesRTaylor · 2022-11-16T02:32:55Z

This is really good stuff. Any work on the deserialization side? Do you think that's a bigger task?

parrt · 2022-12-10T18:46:45Z

> Any work on the deserialization side? Do you think that's a bigger task?

Hi. I haven't done any work on deserialization. Sorry.

parrt · 2022-12-10T18:47:47Z

I have a much better serialization implementation in antlr4-lab: https://github.com/antlr/antlr4-lab/blob/master/src/org/antlr/v4/server/JsonSerializer.java

I hope to eventually fold this back into ANTLR.

kaby76 · 2022-12-10T21:05:46Z

BTW, I've spent probably two or three years going through different implementations for the parse tree representation and serialization. After working on tree rewriting problems, I've come to the conclusion that the Antlr tree/tokenstream/charstream/interval implementation is definitely not the best representation for tree rewriting, especially if there are hundreds of edits to do: keeping it all consistent is very time-consuming and very tedious. I've settled on a tree decorated with text and attribute nodes for tokens and for skipped and off-channel text. Plus, it is more easily adapted to XPath and XSLT engines.
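To make the idea concrete, here is a small sketch of one way such a tree could look. This is a guess at the general shape, not the actual implementation described above: `Element`, `Text`, and `Attr` are hypothetical names, with `Text` holding default-channel token text and `Attr` holding skipped or off-channel text such as whitespace.

```python
from dataclasses import dataclass, field

@dataclass
class Text:
    value: str              # default-channel token text

@dataclass
class Attr:
    name: str               # e.g. "ws" or "comment" (hypothetical labels)
    value: str              # skipped or off-channel text

@dataclass
class Element:
    name: str               # rule name
    children: list = field(default_factory=list)

def to_source(node):
    """Rebuild the source text purely from the tree, in document order."""
    if isinstance(node, (Text, Attr)):
        return node.value
    return "".join(to_source(child) for child in node.children)

def delete_all(node, name):
    """Remove every Element called `name`; there are no token indexes to repair."""
    node.children = [
        c for c in node.children
        if not (isinstance(c, Element) and c.name == name)
    ]
    for child in node.children:
        if isinstance(child, Element):
            delete_all(child, name)

tree = Element("prog", [
    Element("assign", [Text("a"), Attr("ws", " "), Text("="), Attr("ws", " "),
                       Text("1"), Text(";"), Attr("ws", "\n")]),
    Element("pragma", [Text("#x"), Attr("ws", "\n")]),
    Element("assign", [Text("b"), Text("="), Text("2"), Text(";")]),
])
before = to_source(tree)
delete_all(tree, "pragma")
after = to_source(tree)
```

Because the text lives in the tree itself, an edit is just a list operation on `children`; there is no separate token stream or interval bookkeeping to keep in sync.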

parrt · 2022-12-11T01:33:13Z

I stopped doing tree rewriting for transformation purposes, and now use either token stream rewriting or simply create an internal model and then generate code from there.
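For readers unfamiliar with token stream rewriting, the sketch below imitates the general idea in a few lines: edits are recorded against token indexes and only applied when the text is rendered, so the original tokens are never mutated. The `Rewriter` class and its method names are made up for this illustration and are a simplification, not the API of ANTLR's `TokenStreamRewriter`.

```python
class Rewriter:
    """Toy rewriter: records edits by token index, applies them on render."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.replacements = {}      # token index -> new text ("" deletes)
        self.inserts_before = {}    # token index -> text to emit first

    def replace(self, index, text):
        self.replacements[index] = text

    def delete(self, index):
        self.replacements[index] = ""

    def insert_before(self, index, text):
        self.inserts_before[index] = self.inserts_before.get(index, "") + text

    def get_text(self):
        parts = []
        for i, tok in enumerate(self.tokens):
            parts.append(self.inserts_before.get(i, ""))
            parts.append(self.replacements.get(i, tok))
        return "".join(parts)

# Whitespace is kept as tokens here so the output can be rendered exactly.
tokens = ["int", " ", "x", " ", "=", " ", "1", ";"]
rw = Rewriter(tokens)
rw.replace(0, "long")        # change the type
rw.insert_before(7, "L")     # add a suffix before the ';'
rewritten = rw.get_text()
original = "".join(rw.tokens)
```

The parse tree is only used to find which token indexes to edit; the tree itself is never restructured.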


JamesRTaylor · 2022-12-13T19:31:13Z

My use case was fast deserialization of a parse tree with the encoded data as compact as possible. This is using Go. The end goal was to make deserialization significantly faster than re-parsing the original string. I was able to reduce the encoded data to about 80% of the size of the original string and reduce the decode time to about 35% of the parse time. In the end, the gain over re-parsing (kudos to the parser!) wasn't large enough to justify the extra code and the limitations imposed on grammar writing (see below). It did show some promise, though.

The approach I took was to:

1. Serialize the parse tree (inspired by the serialization code) as a combination of token and rule indexes (not exactly a rule index, as I needed to handle grammar # tags too).
2. Generate code to enable deserialization by introspecting the visitor. The input was the original string (since in our use case this was always going to be persisted and available) and the output was the result of walking the visitor:
   - Re-tokenize the original string
   - Deserialize the parse tree (calling generated code using rule index)
   - Re-walk the visitor to produce the domain objects

The one limitation I had was that grammar variables were problematic in that I had no good way to re-establish their state. I could have serialized and deserialized them, but that would have bloated the encoded data pretty significantly (though of course that depends on how your grammar was written). I chose to just not use grammar variables in my tests.
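A toy sketch of the general idea, in Python rather than Go and not the implementation described above: the tree shape is stored as a flat list of small integers, token text is not stored at all, and decoding re-tokenizes the original string to fill the leaves back in. The `RULES` table, the regex tokenizer, and the tuple-based tree are all assumptions made for the example.

```python
import re

RULES = ["assign", "expr"]      # hypothetical rule table; position = rule index

def tokenize(text):
    """Stand-in lexer: identifiers, integers, or any single non-space char."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)

def encode(node, out=None):
    """Preorder encoding: 0 for a token leaf, else rule index + 1 and child count."""
    if out is None:
        out = []
    if isinstance(node, str):
        out.append(0)
    else:
        rule, children = node
        out.append(RULES.index(rule) + 1)
        out.append(len(children))
        for child in children:
            encode(child, out)
    return out

def decode(codes, tokens):
    """Rebuild the tree from the codes, pulling leaf text from the token list."""
    codes = iter(codes)
    tokens = iter(tokens)

    def node():
        code = next(codes)
        if code == 0:
            return next(tokens)
        count = next(codes)
        return (RULES[code - 1], [node() for _ in range(count)])

    return node()

source = "x = 1 + 2"
tree = ("assign", ["x", "=", ("expr", ["1", "+", "2"])])
codes = encode(tree)
rebuilt = decode(codes, tokenize(source))
```

The codes only make sense alongside the original string and the same lexer, which matches the constraint above that the input string is always persisted. State held in grammar variables is not captured by an encoding like this.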

kaby76 · 2022-12-13T21:21:18Z

Thanks for the info.

The problem I'm working on, at the moment, is the scrape and conversion of the grammar for Python3, in Pegen syntax, to Antlr4. The parse of the Python3 grammar in Pegen syntax takes ~2.6s on a speedy machine. It takes so long because the rules in a Pegen grammar do not have a rule terminator (e.g., the ';' at the end of a rule in Antlr4 grammars). Serialization of the parse tree, parser, and lexer tables takes ~0.03s, deserialization ~0.05s. The parse tree itself was changed to not use tokenstream/charstream/indices, but instead to decorate the parse tree with text and attribute nodes for default-channel and off-channel tokens and character strings. This representation allows for much faster tree node edits. In fact, the parse takes more time than deserialization, serialization, and deleting and inserting the hundreds of nodes involved in converting the grammar to Antlr4 syntax combined.

initial json tree impl

8525360

Signed-off-by: Terence Parr <parrt@antlr.org>

parrt changed the base branch from master to dev July 4, 2022 19:50

parrt added trees-contexts type:feature target:java labels Jul 4, 2022

parrt added this to the 4.10.2 milestone Jul 4, 2022

parrt mentioned this pull request Jul 4, 2022

Making Parse Tree Serializable #233

Open

parrt added 2 commits July 4, 2022 12:59

improve literals

d13e826

Signed-off-by: Terence Parr <parrt@antlr.org>

test optional start rule

82d8490

Signed-off-by: Terence Parr <parrt@antlr.org>

KvanTTT reviewed Jul 4, 2022

runtime/Java/src/org/antlr/v4/runtime/tree/Trees.java (outdated, resolved)

parrt added 4 commits July 4, 2022 13:51

use token index only

41a46a5

Signed-off-by: Terence Parr <parrt@antlr.org>

dump it all now

19b46df

Signed-off-by: Terence Parr <parrt@antlr.org>

Move tests to descriptors. only works for java now.

ce5da19

Signed-off-by: Terence Parr <parrt@antlr.org>

skip non-java JSON tests for now

02f08fc

Signed-off-by: Terence Parr <parrt@antlr.org>

parrt added 2 commits July 5, 2022 10:51

update all target test templates for ToJSON(s) template.

083c873

Signed-off-by: Terence Parr <parrt@antlr.org>

json all on one line; add start/stop not text for tokens

f9d2432

Signed-off-by: Terence Parr <parrt@antlr.org>

parrt added 3 commits July 5, 2022 11:42

move to separate class

232f5e9

Signed-off-by: Terence Parr <parrt@antlr.org>

add comment

6a1d7f9

Signed-off-by: Terence Parr <parrt@antlr.org>

add more complex grammar

f121ba4

Signed-off-by: Terence Parr <parrt@antlr.org>

more comment

7e7057b

Signed-off-by: Terence Parr <parrt@antlr.org>

uwol mentioned this pull request Aug 14, 2023

How can i store the generated AST in JSON format? uwol/proleap-cobol-parser#94

Open
