Json trees #3773
Signed-off-by: Terence Parr <parrt@antlr.org>
Are there any tree walkers or visitors that can utilize the JSON parse trees?
I think that's up to the runtime.
Any target language that knows how to read JSON should be able to pull these in and walk the trees recursively. I will have to build one in JavaScript, as I'm trying to build a server/client webpage that communicates using this format.
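As a rough illustration of "pull these in and walk the trees recursively", here is a minimal Python sketch. The JSON shape is an assumption made for the example (a rule node is an object with hypothetical `rule` and `children` keys, and a token leaf is a plain string); it is not necessarily the format this PR emits.

```python
import json

# Hypothetical JSON shape (an assumption, not the format produced by this PR):
# a rule node is {"rule": name, "children": [...]}; a token leaf is a plain string.
SAMPLE = (
    '{"rule": "expr", "children": ['
    '{"rule": "atom", "children": ["1"]}, "+", '
    '{"rule": "atom", "children": ["2"]}]}'
)

def leaves(node):
    """Yield the text of every token leaf, left to right."""
    if isinstance(node, str):
        yield node
        return
    for child in node.get("children", []):
        yield from leaves(child)

def rule_names(node, depth=0):
    """Yield (depth, rule name) for every rule node, in preorder."""
    if isinstance(node, str):
        return
    yield depth, node["rule"]
    for child in node.get("children", []):
        yield from rule_names(child, depth + 1)

tree = json.loads(SAMPLE)
print(" ".join(leaves(tree)))          # 1 + 2
for depth, name in rule_names(tree):
    print("  " * depth + name)         # expr, then two indented atom lines
```

The same two functions translate almost line for line to JavaScript or any other target with a JSON reader, since they only rely on nested lists and maps.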
@HSorensen yep, what I meant was that somebody will have to deserialize the JSON into a proper parse tree, and then the usual visitor and listener patterns will work great. This is only for sending stuff across a wire. If it's in memory, this is all unnecessary.
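A sketch of that deserialize-then-visit idea in Python, under stated assumptions: the JSON uses hypothetical `rule` and `children` keys with plain strings as token leaves, and `RuleNode`, `TerminalNode`, `Visitor`, and `Evaluator` are illustrative stand-ins, not classes from any ANTLR runtime.

```python
import json
from dataclasses import dataclass, field

@dataclass
class TerminalNode:
    text: str

@dataclass
class RuleNode:
    rule: str
    children: list = field(default_factory=list)

def from_json(obj):
    """Rebuild node objects from the (assumed) JSON shape."""
    if isinstance(obj, str):
        return TerminalNode(obj)
    return RuleNode(obj["rule"], [from_json(c) for c in obj.get("children", [])])

class Visitor:
    """Dispatch to visit_<rule> if defined, else visit the children."""
    def visit(self, node):
        if isinstance(node, TerminalNode):
            return self._visit_terminal(node)
        method = getattr(self, "visit_" + node.rule, self._visit_children)
        return method(node)

    def _visit_children(self, node):
        result = None
        for child in node.children:
            result = self.visit(child)
        return result

    def _visit_terminal(self, node):
        return node.text

class Evaluator(Visitor):
    def visit_add(self, node):
        left, _op, right = node.children
        return self.visit(left) + self.visit(right)

    def visit_atom(self, node):
        return int(node.children[0].text)

wire = ('{"rule": "add", "children": [{"rule": "atom", "children": ["1"]}, '
        '"+", {"rule": "atom", "children": ["2"]}]}')
tree = from_json(json.loads(wire))
print(Evaluator().visit(tree))  # 3
```

Once the nodes are rebuilt, nothing in the visitor knows or cares that the tree arrived as JSON, which is the point of deserializing first.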
@KvanTTT looking better, right?
Yes, a separate class looks better.
Added sample output and Python parsing of the JSON here: #3772
This is really good stuff. Any work on the deserialization side? Do you think that's a bigger task?
Hi. I haven't done any work on deserialization. Sorry.
I have a much better serialization implementation in antlr4-lab (https://github.com/antlr/antlr4-lab/blob/master/src/org/antlr/v4/server/JsonSerializer.java). I hope to eventually fold this back into Antlr.
BTW, I've spent probably two or three years going through different implementations for the parse tree representation and serialization. After working on tree rewriting problems, I've come to the conclusion that the Antlr tree/tokenstream/charstream/interval implementation is definitely not the best representation for tree rewriting, especially if there are hundreds of edits to do: keeping it all consistent is very time-consuming and very tedious. I've settled on a tree decorated with text and attribute nodes for tokens and for skipped and off-channel text. It is also more easily adapted to XPath and XSLT engines.
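To make the idea of a decorated tree more concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The layout is an assumption for illustration only, not the representation described above: each rule is an element, each token is a hypothetical `<t>` element whose text is the token text, and the skipped whitespace that preceded the token is kept in a hypothetical `before` attribute.

```python
import xml.etree.ElementTree as ET

def rule(name, *children):
    """Create a rule element holding the given child elements."""
    el = ET.Element(name)
    el.extend(children)
    return el

def token(kind, text, before=""):
    """Create a token element; `before` holds preceding off-channel text."""
    el = ET.Element("t", {"kind": kind, "before": before})
    el.text = text
    return el

def source_text(el):
    """Reassemble the source from tokens plus their off-channel text."""
    return "".join(t.get("before") + t.text for t in el.iter("t"))

tree = rule(
    "assign",
    token("ID", "x"),
    token("EQ", "=", before=" "),
    rule(
        "expr",
        token("INT", "1", before=" "),
        token("PLUS", "+", before=" "),
        token("INT", "2", before=" "),
    ),
    token("SEMI", ";"),
)

before = source_text(tree)                       # "x = 1 + 2;"

# Edit every INT token under an expr, selected with an XPath-style query.
for t in tree.findall(".//expr/t[@kind='INT']"):
    t.text = str(int(t.text) * 10)

after = source_text(tree)                        # "x = 10 + 20;"
print(before, "->", after)
```

Because the text lives in the tree itself, an edit touches one node and there are no token indexes or intervals to keep in sync afterwards.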
I stopped doing tree rewriting for transformation purposes, and now either use token stream rewriting or simply create an internal model and then generate code from there.
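For readers unfamiliar with token stream rewriting, the following Python sketch shows the general idea: record edits against token indexes and apply them only when the text is rendered, leaving the original tokens untouched. `TokenEdits` is a hypothetical class written for this example; it is not the API of ANTLR's `TokenStreamRewriter`.

```python
class TokenEdits:
    """Record edits by token index; the original tokens are never mutated."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.replacements = {}    # token index -> replacement text
        self.insert_before = {}   # token index -> text emitted before the token

    def replace(self, index, text):
        self.replacements[index] = text

    def delete(self, index):
        self.replace(index, "")

    def insert(self, index, text):
        self.insert_before[index] = self.insert_before.get(index, "") + text

    def text(self):
        """Render the token texts with all recorded edits applied."""
        out = []
        for i, tok in enumerate(self.tokens):
            out.append(self.insert_before.get(i, ""))
            out.append(self.replacements.get(i, tok))
        return "".join(out)

tokens = ["int", " ", "x", " ", "=", " ", "1", ";"]
edits = TokenEdits(tokens)
edits.replace(0, "long")   # change the declared type
edits.insert(7, "L")       # add a suffix just before the ";"
print(edits.text())        # long x = 1L;
```

Since edits are keyed by the original indexes, they can be recorded in any order without one edit shifting the positions of another.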
My use case was fast deserialization of a parse tree from encoded data that is as compact as possible, using Go. The end goal was to make deserialization significantly faster than re-parsing the original string. I was able to reduce the encoded data to about 80% of the size of the original string and the decode time to about 35% of the parse time. In the end, it wasn't significantly faster than re-parsing (kudos to the parser!), at least not enough to justify the extra code and the limitations imposed on grammar writing (see below). It did show some promise, though. The approach I took was to:
The one limitation was that grammar variables were problematic: I had no good way to re-establish their state. I could have serialized and deserialized them, but that would have bloated the encoded data pretty significantly (though of course that depends on how your grammar is written). I chose to just not use grammar variables in my tests.
Thanks for the info. The problem I'm working on, at the moment, is the scrape and conversion of the grammar for Python3, in Pegen syntax, to Antlr4. The parse of the Python3 grammar in Pegen syntax takes |
See #3772