Improvement Ideas #1
Comments
I haven't done any further work on this project. Do you have any suggestions for improvements you're interested in? Are you using this project somewhere? I know I had some ideas for it, but I believe most things I was thinking of were pretty radical changes that would require redoing the byte compiler - essentially a new project entirely. It's not very fresh in my memory and I haven't dug into relevant notes in a while. |
Hello Chris! But lately I've been looking at PEG parsers because of their ease of use, creation, and evaluation (although we can make a big mess with them too, as with untyped scripting languages, which parser generators prevent by doing several sanity checks), and found this one, https://github.com/yhirose/cpp-peglib, and its online playground, https://yhirose.github.io/cpp-peglib. It has good performance, and having the playground makes it even easier to develop and check a grammar. Other parser generators with online playgrounds:
And I was looking before at Your project is small and simple, has good performance and low resource usage, produces an AST/parse tree, and can also evaluate at runtime. I think that having an online playground like the I'm trying to extend it a bit to handle the grammar shown below and managed to add the easy part (expanding escape sequences), but the case-insensitive literals are taking a bit of time to figure out (see the change to the VM below). I would like to know what your new ideas are, and I'm willing to cooperate and help implement them. Another intriguing The
VM changes to add
|
Also, your parser/VM seems to handle left recursion, judging by one test, https://github.com/ChrisHixon/chpeg/blob/master/test/grammars/left_recursion.peg, but I haven't tried it with a real grammar so far. |
I did one test that made me seriously consider your project with a |
That all looks very interesting. I will check this out after I refresh myself on the project a bit. I'm a little lost trying to figure out how all the pieces fit together at the moment. The example in this project does not compile...
I didn't really look far into this. Any ideas - since you've been using the project? Do you have a fork somewhere with your changes/experiments? |
This project does not support left recursion. It's meant to be simple and fast, leaving it up to the user to avoid left recursion. I believe it is designed to fail and return an error on left recursion, rather than crash. |
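One common way a PEG VM can fail cleanly on left recursion, rather than overflow its stack, is to refuse to re-enter a rule at an input offset where that same rule is already active. The sketch below illustrates that idea only; the `Frame` type and `would_left_recurse` function are hypothetical names, and nothing in this thread confirms that chpeg implements its check this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-stack frame: which rule was entered, and at what input offset. */
typedef struct {
    int rule;
    size_t offset;
} Frame;

/*
 * Returns 1 if calling `rule` at `offset` would re-enter a rule that is
 * already active at the same offset (no input consumed in between),
 * which is the signature of left recursion. Returns 0 otherwise.
 *
 * Entry offsets never decrease from the bottom of the stack to the top,
 * so the scan can stop at the first frame with a different offset.
 */
int would_left_recurse(const Frame *stack, size_t depth, int rule, size_t offset)
{
    assert(stack != NULL || depth == 0);
    for (size_t i = depth; i-- > 0; ) {
        if (stack[i].offset != offset)
            break;              /* input was consumed since this frame */
        if (stack[i].rule == rule)
            return 1;           /* same rule, same offset: left recursion */
    }
    return 0;
}
```

A VM using this approach would call the check just before pushing a new frame and report a parse error when it returns 1.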
I just cloned this repository, changed to the examples folder, ran make, and got an executable:
|
I also merged your |
Thanks, I'll check it out. I'm getting a similar error using clang. I'm thinking this is a bug that needs to be fixed in this code; perhaps some (older?) toolchains are ok with it. What versions of cc and ld are you using? I'm on Arch Linux (rolling):
|
OK, I just found the problem you mentioned: it's the declaration of
|
Ah, yeah, that does it, thanks. |
I checked out the SQL PEG grammar along with the test SQL file. It's nice to see this code working with a large grammar for an established complex language. I have not tested much besides the grammars you see in the repo. Did you have to modify the grammar to get it to work with chpeg? I notice a few One idea is to add a separate lex stage that generates tokens, and then the PEG stage deals with tokens rather than characters. White space could be removed at the lex stage. Of course this is no longer really PEG, though perhaps it could be done in a way that is PEG compatible. Another idea is to cache (or add a way to load/save) the byte code so the grammar doesn't have to be compiled every time a program instance is run. I'll have to give the case insensitive issue some thought and check out your code changes for that. Have you encountered other PEG implementations that have tacked on case insensitive matching? If so, how did they handle the PEG syntax extension? Is |
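For the case-insensitive literal question, the matching step itself is small; the harder part is the grammar syntax and bytecode plumbing discussed above. A minimal sketch of the comparison, assuming ASCII-only case folding and a hypothetical helper name `match_literal_nocase` (not part of chpeg), might look like this:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of chpeg.
 * Returns 1 if the input starts with `literal` when ASCII case is
 * ignored, 0 otherwise. Lengths are explicit so the input does not
 * need to be NUL-terminated.
 */
int match_literal_nocase(const char *input, size_t input_len,
                         const char *literal, size_t lit_len)
{
    assert(input != NULL && literal != NULL);
    if (lit_len > input_len)
        return 0;               /* not enough input left */
    for (size_t i = 0; i < lit_len; i++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        if (tolower((unsigned char)input[i]) != tolower((unsigned char)literal[i]))
            return 0;
    }
    return 1;
}
```

With this, a literal `select` would match `SELECT`, `Select`, and so on. Folding the literal to lower case once at grammar compile time would save one `tolower` call per byte at parse time.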
Yes The one that has interesting ideas is |
The question about the grammar I'm trying to look for other parser generators that are in real use in open-source projects and write a translator that reads the grammar and converts it to another format, like I did for bison/byacc/lemon and for
|
Here is an example of a C99 parser that is not working properly on On
With chpeg:
|
I opened new issues for a couple things discussed here that need to be fixed (or investigated as potential bugs). Feel free to open more as necessary, and we can talk about misc. improvements, changes, etc. on this issue. Also, we can open more issues for distinct improvement ideas, like say, case insensitivity. |
Here is a first crude attempt to create a playground for |
In the future we'll probably need to add a prefix to all public symbols to avoid name clashes with other code/libraries because some |
Moving this here to the improvements issue. I think that these could have clearer names or explanations:
STOP is really more like "CAPTURE": it forces that node to capture (contain what is under it) and not be eliminated by unwrap. Unwrap is probably unclear also, but it seems similar to what https://yhirose.github.io/cpp-peglib/ does as 'optimize' AST. I'm not sure of the correct or most often used terminology for this concept. Unwrap eliminates intermediate nodes that only contain one child node. For example, A->B->C->D is reduced to D. |
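The unwrap and STOP behaviour described above can be sketched on a toy tree. The `Node` struct and `unwrap` function below are hypothetical and only illustrate the concept; chpeg's real node type and implementation are not shown in this thread and will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST node for illustration only. */
typedef struct Node {
    const char *name;
    int stop;             /* nonzero: always keep this node (the STOP flag) */
    struct Node *child;   /* first child, or NULL for a leaf */
    struct Node *next;    /* next sibling, or NULL */
} Node;

/*
 * Returns the node that replaces `n` after unwrapping. A node with
 * exactly one child is replaced by that child unless it is flagged
 * stop. Skipped nodes are not freed in this sketch.
 */
Node *unwrap(Node *n)
{
    assert(n != NULL);

    /* Skip down through single-child chains: A->B->C->D becomes D. */
    while (!n->stop && n->child != NULL && n->child->next == NULL)
        n = n->child;

    /* Unwrap each remaining child and relink the sibling list. */
    Node **link = &n->child;
    while (*link != NULL) {
        Node *next = (*link)->next;
        Node *repl = unwrap(*link);
        repl->next = next;
        *link = repl;
        link = &repl->next;
    }
    return n;
}
```

With no flags set, the chain A->B->C->D collapses to D. Setting `stop` on B keeps B as the root and collapses only what is under it, giving B->D.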
I have not done much work with AST creation, and at first your way was simpler to digest (only 3 options, although in For example, lpeg/lpegrex (https://github.com/edubart/lpegrex) has so many options that I got lost, in For myself, I think that the best way to find out how or whether something is useful is to dogfood it, like we did here with the In summary, a good working collection of non-trivial grammars is, in my opinion, the best showcase for a parser generator. And there is a large pool of grammars written for https://tree-sitter.github.io/tree-sitter/ with a |
I'm aiming for a sort of minimal approach to everything with chpeg. I forget what those [ILUSE] options are in top_down, but I probably eliminated some useless/redundant stuff. For chpeg I came up with what seemed to be the bare minimum necessary to generate the AST you are interested in. Once you start using the AST output for some purpose you start to figure out the AST you want to make your life easier or make things clearer/simpler - what you want to capture at what level for your use case. It seems like the extensions get out of hand with a lot of the parsers. I'd like to come up with a simple way to do the kinds of things you're talking about. |
Another improved try to generate cbytecode from
Output:
|
I created a new issue related to eliminating the Ruby bootstrap and going with separate bytecode .h files. Your code will be a good start. |
Would be nice to add hexadecimal escape sequences and in future unicode escape sequences:
|
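A minimal sketch of what hexadecimal escape expansion could look like, alongside a few of the usual single-character escapes. The function name `unescape`, the exact set of supported escapes, and the error convention are assumptions made for illustration, not chpeg's actual escape handling.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: expands \n, \r, \t, \\ and \xHH in the
 * NUL-terminated string `src` into `dst`, which must have room for at
 * least strlen(src) bytes. Returns the number of bytes written, or
 * (size_t)-1 on a malformed or unknown escape.
 */
size_t unescape(const char *src, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    while (*src != '\0') {
        if (*src != '\\') {
            dst[n++] = (unsigned char)*src++;
            continue;
        }
        src++;  /* skip the backslash */
        switch (*src) {
        case 'n':  dst[n++] = '\n'; src++; break;
        case 'r':  dst[n++] = '\r'; src++; break;
        case 't':  dst[n++] = '\t'; src++; break;
        case '\\': dst[n++] = '\\'; src++; break;
        case 'x': {
            /* Exactly two hex digits must follow; stop early at NUL. */
            int hi = hex_value((unsigned char)src[1]);
            int lo = hi < 0 ? -1 : hex_value((unsigned char)src[2]);
            if (lo < 0)
                return (size_t)-1;
            dst[n++] = (unsigned char)(hi * 16 + lo);
            src += 3;
            break;
        }
        default:
            return (size_t)-1;  /* unknown escape or trailing backslash */
        }
    }
    return n;
}
```

Unicode escapes would need an extra step to encode the code point as UTF-8, which is why they are left out of this sketch.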
Yes I thought of that, but why make a special case for that when user can use |
Only to avoid confusing the user! |
Unfortunately the negated char class extension breaks PEG compatibility, meaning valid PEG can fail or be interpreted differently. Note that cpp-peglib flags |
This tool is great for simplifying grammars: https://www.bottlecaps.de/rr/ui. When possible, I always add an option to output equivalent EBNF understood by https://www.bottlecaps.de/rr/ui to help develop and debug grammars; see https://github.com/mingodad/lalr-parser-test, https://github.com/mingodad/CocoR-CPP, https://github.com/mingodad/CocoR-CSharp, https://github.com/mingodad/CocoR-Java and https://github.com/mingodad/peg/blob/cf6630fb15ca3ff430d36b28e72a244bed2b9f4e/src/tree.c#L344.
And the simplified one:
|
With my alternative grammar shown here #1 (comment) this grammar seems to parse properly (and indeed
AST:
|
Nice... If you want to give that a shot with chpeg (the utility), you could add an action to output EBNF (and/or actions for other formats); see util/src, util/src/actions. This would be a matter of grabbing the parse from -p/-P and then walking the AST. |
Yes, it looks like you removed the |
As I said before it's almost done here https://github.com/mingodad/chpeg/blob/696339f5802834a49e56371a255e3aeae487b7af/src/bytecode.c#L503 with an extra parameter to decide when to use |
Right, but that's not the approach I'd take or suggest, the AST from a parse is much easier to deal with than reversing the bytecode, which I think could be problematic, especially if changes are made later to bytecode/parser. It might map pretty well to the AST now but that could change later if it's ever reorganized or optimizations added. It maps pretty well now because it's a pretty basic/naive translation to bytecode. |
I agree with you, but honestly I'm having trouble working with the AST: first because it's not easy to get my desired AST shape, and second because the same problem can happen when the way the AST is constructed changes and any existing working code needs to be reviewed (AST simplification 0/1/2/?). But I would suggest doing it, and then you'll be dogfooding |
Right now |
For example here is an
|
And here is a
|
Could you as an exercise build the equivalent of Again another way to dogfood |
The compiler itself is where I'm using the AST the most. Since there are no callbacks/actions, it seems easiest to wrap the AST with your own tree where you add whatever your app requires - this is what the compiler does. The compiler is stuck on |
This is the answer from |
I'll consider doing that and/or other examples. It's certainly a lacking area of chpeg as a project. |
See this issue yhirose/cpp-peglib#226 for a request/wish for a better user experience when developing/debugging grammars on the playground. The error message from chpeg is:
|
As a start, I created a calculator example here: A few things of note: This example incorporates a C bytecode file, generated in the Makefile. Without doing this (say, generating bytecode at runtime), it's going to be difficult to hook up the definition IDs in user code in order to process the AST nodes. I added a function chpeg_token to make it easier to process tokens:
calc takes input via stdin or file. The result printed by calc should match the same input sent to the unix bc command. |
Here is another interesting resource for |
I've just got this parser https://github.com/edubart/lpegrex/blob/main/parsers/c11.lua converted to
PEG/LEG (https://github.com/mingodad/peg) has this:
My implementation (see attached) has this:
Lpegrex (https://github.com/edubart/lpegrex) has this:
Can we have something like this in |
I am wondering: would it be possible to make a VM-based PEG parser incremental? I.e., given a previous parse tree and a diff, is it possible to artificially construct a valid VM state at a point shortly before the diff, as a function of the tree, and resume the VM in that state? |
See this project: https://github.com/zyedidia/gpeg and the papers listed under publications: https://github.com/zyedidia/gpeg#publications I have not explored incremental parsing myself. |
I'm looking into these things and thinking about how I might do symbol tables in chpeg. |
That would be a useful feature, and it would allow parsing more grammars at runtime (for instance C). |
While you are at it, it would be nice to also have |
Where is the link to the playground in the README? |
I just added an online playground for my CocoR Typescript/Javascript port here https://mingodad.github.io/CocoR-Typescript/playground |
I just added |
Are you still around?
Have you made any improvements to this project?
I just found it and I'm impressed by its simplicity and performance!
Thank you for your great work!