Packrat algorithm #21
Using the same example shown here #4 (comment) I'm getting the same behavior as
|
Indeed. I tested the same thing after reading your test results for |
That's why I said that I have mixed feelings about |
Yeah, I have (and have had) mixed feelings about packrat. It can certainly hide bad grammars, but I still think it's useful to have available. Perhaps "optimal" was not the best word choice. I haven't really analyzed it, but I'm thinking that if packrat makes something worse, the grammar might already be in fairly good shape. The ISUCC/IFAIL ratio looks good and it seems pretty fast for the input size: 5742685 inst / 451294 bytes = 12.72 instructions per input byte. Maybe a good stat to add. |
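The instructions-per-byte figure above is a plain division. A minimal sketch of how such a stat could be computed, assuming the instruction count and input size are already available as integers (the function name is made up for illustration and is not part of chpeg):

```python
def instructions_per_byte(instructions: int, input_bytes: int) -> float:
    """Average number of VM instructions executed per byte of input."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return instructions / input_bytes


# Figures quoted in the comment above.
ratio = instructions_per_byte(5742685, 451294)
print(f"{ratio:.2f} instructions per input byte")
```

A lower value suggests less repeated work per input byte, so it could be compared across grammar tweaks on the same input.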
Definitely a good one ! |
I added windowing to the packrat branch. It does reduce memory use; however, I have not yet found an example that doesn't trade window size for speed.
Note the farthest backtrack stat. Speed doesn't improve much until the window covers that amount. |
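To illustrate why speed may not improve until the window covers the farthest backtrack, here is a rough sketch of a sliding-window memo table. It is not the chpeg implementation: the class, its fields, and the dictionary layout are all assumptions made for this example.

```python
class SlidingWindowMemo:
    """Toy packrat memo table that remembers only the last `window` offsets.

    Hypothetical sketch; names and layout are illustrative, not chpeg's.
    """

    def __init__(self, window: int):
        self.window = window
        self.furthest = 0            # furthest input offset reached so far
        self.farthest_backtrack = 0  # largest look-back distance requested
        self.table = {}              # offset -> {rule name: cached result}

    def _advance(self, offset: int) -> None:
        # Slide the window forward and drop entries that fell out of it.
        if offset > self.furthest:
            self.furthest = offset
            low = self.furthest - self.window + 1
            for old in [o for o in self.table if o < low]:
                del self.table[old]

    def lookup(self, rule: str, offset: int):
        # Record how far behind the furthest offset this lookup reaches.
        self.farthest_backtrack = max(self.farthest_backtrack,
                                      self.furthest - offset)
        self._advance(offset)
        return self.table.get(offset, {}).get(rule)

    def store(self, rule: str, offset: int, result) -> None:
        self._advance(offset)
        # Only offsets inside the current window are kept.
        if offset > self.furthest - self.window:
            self.table.setdefault(offset, {})[rule] = result


memo = SlidingWindowMemo(window=4)
memo.store("Expr", 0, ("ok", 3))
memo.store("Term", 10, ("ok", 2))   # window now covers offsets 7..10
print(memo.lookup("Expr", 0))       # offset 0 was evicted, so this misses
print(memo.farthest_backtrack)
```

In this sketch, any lookup that reaches further back than `window` is a guaranteed miss, so the parser redoes that work regardless of how much was cached earlier.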
After you've mentioned the
Then I've tried to test on Some rules have a multiplier/cascade effect, and it would be nice to have a way to detect/show them. Another possibility that I was thinking of is to have a way to specify the rule where we want the |
I'm trying several tweaks in the hope of finding the big multiplier:
After some tweaks:
|
I added the option {P} to enable a node to be cached in packrat. If no such rules exist, all nodes will be cached as before. |
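The selection rule described above (cache only the marked rules, or everything if none are marked) can be sketched as follows. This is an illustration only: the function name and the dictionary of option letters are assumptions, and the "S" option in the usage lines is just a placeholder for some other option.

```python
def cacheable_rules(rule_options: dict[str, set[str]]) -> set[str]:
    """Return the names of rules whose results should be cached.

    `rule_options` maps a rule name to its set of option letters.
    If at least one rule carries "P", only those rules are cached;
    otherwise every rule is cached, as before.
    """
    marked = {name for name, opts in rule_options.items() if "P" in opts}
    return marked if marked else set(rule_options)


# One rule marked with P: only that rule is cached.
print(cacheable_rules({"Expr": {"P"}, "Term": set()}))
# No rule marked: all rules are cached.
print(cacheable_rules({"Expr": set(), "Term": {"S"}}))
```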
Just add P to the options of a rule, like:
|
Do your tweaks improve:
I think if you can get the backtracking to fit into the sliding packrat window (which always sits at the furthest offset reached), then performance and memory use should both be good. Of course it's also possible that with such grammar changes, it'll be fine without packrat. |
I'm also pondering more memory-efficient packrat storage. The current implementation uses a table much like the "Packrat Parsing with Elastic Sliding Window" paper you linked at https://www.jstage.jst.go.jp/article/ipsjjip/23/4/23_505/_pdf , but instead of recording length, it links to PNode-wrapped parse sub-trees. There is a compile-time node limit to prevent it from caching huge sub-trees; perhaps I should expose this as a runtime option. I'm not sure whether it's worth the trouble of making the lookup table more memory efficient, at the cost of more overhead. It'd have to be something like a linked list or b-tree, rb-tree, etc., for each window position. |
I guess I could also try the elastic sliding window - still trying to wrap my head around that one. The current implementation is the basic sliding window mentioned in the paper. |
Here is the tweaked c99-mouse grammar: |
From the other discussion: I tried raising the packrat node limit using
Perhaps the default CHPEG_PACKRAT_NODE_LIMIT should be changed. Also this could be made a runtime settable option. This is the total count of nodes allowed (recursive, entire sub-tree) in a single packrat cache entry. Either simplification (-s1, -s2) helps performance because it helps reduce the node count and node copy overhead (when packrat cache is used the result has to be copied to the parse tree). |
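As a rough illustration of a per-entry node limit that counts the entire sub-tree, the sketch below walks a tree and stops as soon as the limit is exceeded. The `Node` class and `within_node_limit` function are invented for this example and do not mirror chpeg's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-tree node, for illustration only."""
    name: str
    children: list["Node"] = field(default_factory=list)


def within_node_limit(root: Node, limit: int) -> bool:
    """Return True if the whole sub-tree under `root` has at most `limit` nodes.

    The walk stops as soon as the limit is exceeded, so large sub-trees
    are rejected without being fully traversed.
    """
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        if count > limit:
            return False
        stack.extend(node.children)
    return True


tree = Node("Sum", [Node("Num"), Node("Product", [Node("Num")])])  # 4 nodes
print(within_node_limit(tree, 4))
print(within_node_limit(tree, 3))
```

Under this model, simplifying the parse tree lowers the node count per sub-tree, which makes more results eligible for caching and cheaper to copy back out.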
Here are the Kotlin grammars I'm using; also see this comment yhirose/cpp-peglib#213 (comment) to see if you can reproduce it. |
Results: (this is with -DCHPEG_PACKRAT_NODE_LIMIT=64)
|
It seems that somehow |
I made
|
If you haven't been following commits, I also added another binary, chpeg-profile, that enables profiling but, unlike chpeg-trace, not VM_TRACE/VM_PRINT_TREE or -g. It's almost as fast as the plain |
I'd definitely be interested in this tutorial. I haven't really attempted such things myself, and I wouldn't know where to begin automating any part of it. |
For example, simple metrics like word count can be applied to the grammar text. Ideally, an identifier would appear only once in a grammar (except named punctuation):
Then we can see that the identifier
|
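A word-count style metric over a grammar could look something like the sketch below. It assumes a PEG written with `<-` and one rule per line, and the toy grammar is made up for the example; it is not the Kotlin grammar discussed here.

```python
import re
from collections import Counter

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifier_usage(grammar: str) -> Counter:
    """Count references to each identifier on the right-hand side of rules.

    Assumes one rule per line in the form `Name <- expression`.
    """
    counts = Counter()
    for line in grammar.splitlines():
        if "<-" not in line:
            continue
        _, rhs = line.split("<-", 1)
        # Blank out quoted literals and character classes so their
        # contents are not mistaken for identifiers.
        rhs = re.sub(r'"[^"]*"|\'[^\']*\'|\[[^\]]*\]', " ", rhs)
        counts.update(IDENT.findall(rhs))
    return counts


# Hypothetical toy grammar, for illustration only.
GRAMMAR = """
Sum     <- Product (PLUS Product)*
Product <- Value (STAR Value)*
Value   <- [0-9]+ / LPAR Sum RPAR
PLUS    <- "+"
STAR    <- "*"
LPAR    <- "("
RPAR    <- ")"
"""

counts = identifier_usage(GRAMMAR)
for name, n in counts.most_common():
    print(f"{n:3d}  {name}")
```

Identifiers with unusually high counts are candidates for a closer look, since a heavily referenced rule may be where a multiplier effect starts.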
I made a mistake when counting the usage of |
For example removing the
Now removing the
Now replacing all
|
Continuing to look at this Kotlin grammar, I can see that it was done blindly. Originally it had a parser and a lexer, so in that context things like this make sense:
Because in some contexts the
|
I've implemented the Packrat algorithm here: https://github.com/ChrisHixon/chpeg/tree/packrat
TODO: windowing to reduce memory usage.