
🚧 Update parser to support look-ahead of two (or more) tokens #9

Draft
wants to merge 16 commits into master
Conversation

atifaziz
Owner

This PR addresses the feature described in issue #7. It updates the parser to use an adaptive strategy depending on whether the parser is asked to look ahead one, two, or more tokens. This is done by abstracting over the storage used for stack operations to push and pop tokens, then using optimal storage for grammars that require a look-ahead of only one or two tokens. This avoids both a single general-purpose storage class, such as an array, and the penalty of a heap allocation in the most common cases. As it stands now, for a single peek, the parser uses just a single field holding an optional token (or optuple). For peeking up to two tokens, it uses a triplet of a count and two tokens. Beyond that, it moves to using a Stack&lt;&gt;.
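The adaptive storage idea described above can be sketched roughly as follows. This is a minimal illustration under assumptions: the names (`ITokenStore`, `OneTokenStore`, `TwoTokenStore`, `StackTokenStore`) are invented for this sketch and are not Gratt's actual API, which the discussion refers to as `ITokenStream` and the `*TokenStackOps` implementations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: storage for pushed-back tokens starts minimal and
// upgrades itself as the grammar demands deeper look-ahead.
interface ITokenStore<T>
{
    // Push returns the store to use afterwards, allowing an upgrade to a
    // roomier implementation when the current one runs out of capacity.
    ITokenStore<T> Push(T token);
    bool TryPop(out T token);
}

// Look-ahead of one: a single field, no heap allocation per token.
sealed class OneTokenStore<T> : ITokenStore<T>
{
    T? _token;
    bool _hasToken;

    public ITokenStore<T> Push(T token)
    {
        if (!_hasToken) { _token = token; _hasToken = true; return this; }
        // A second token was pushed: upgrade to the two-token store.
        var upgraded = new TwoTokenStore<T>();
        upgraded.Push(_token!);
        return upgraded.Push(token);
    }

    public bool TryPop(out T token)
    {
        if (_hasToken) { token = _token!; _hasToken = false; return true; }
        token = default!; return false;
    }
}

// Look-ahead of two: a count and two fields, still allocation-free.
sealed class TwoTokenStore<T> : ITokenStore<T>
{
    T? _first, _second;
    int _count;

    public ITokenStore<T> Push(T token)
    {
        switch (_count)
        {
            case 0: _first = token; _count = 1; return this;
            case 1: _second = token; _count = 2; return this;
            default:
                // Three or more: fall back to a heap-allocated Stack<T>.
                var stack = new Stack<T>();
                stack.Push(_first!);
                stack.Push(_second!);
                stack.Push(token);
                return new StackTokenStore<T>(stack);
        }
    }

    public bool TryPop(out T token)
    {
        // LIFO: the most recently pushed token comes out first.
        if (_count == 2) { token = _second!; _count = 1; return true; }
        if (_count == 1) { token = _first!; _count = 0; return true; }
        token = default!; return false;
    }
}

// The rare, general case: a regular heap-allocated stack.
sealed class StackTokenStore<T> : ITokenStore<T>
{
    readonly Stack<T> _stack;
    public StackTokenStore(Stack<T> stack) => _stack = stack;
    public ITokenStore<T> Push(T token) { _stack.Push(token); return this; }
    public bool TryPop(out T token) => _stack.TryPop(out token!);
}
```

The common cases pay only a virtual dispatch per stack operation; the heap allocation is deferred until a grammar actually peeks three or more tokens ahead.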

@atifaziz
Owner Author

atifaziz commented Jul 12, 2022

@springcomp Below are my replies to your review/comments:

@atifaziz here is a quick PR on top of your multi-peek branch.

First of all, thanks for the quick turnaround on the feedback.

My question is: what is the purpose of the default case in the Peek() method? It seems to me that support for more than two look-ahead tokens is guaranteed to throw an exception.

I added a simple test to illustrate this.

You found an incomplete implementation; this PR is still a draft. I think the overall direction is there, but it needs some polish and tests. It would also help to get your feedback on whether it helps your case.

I wanted to optimise for the common case of look-aheads of one or two tokens. For the less common cases, I plan to implement a strategy based on a regular, heap-allocated stack.

I like the ITokenStream abstraction that has been introduced.

Glad you like it!

As a general question, why limit the implementation to only two tokens? I guess there should not be a grammar that needs more than two tokens of look-ahead.

It's not limited; I am just not done. Again, you discovered my PR while it was still work in progress.

Also, it seems that maybe the design you chose is extensible to more than two tokens of lookahead in the future, maybe by introducing a future ThreeTokenStackOps, etc.

I think I will only optimise for peeking up to two tokens. For more complex and rare grammars, I'll probably just go with a regular stack, but that's a decision that can change with time. The point is that the design is adaptive.

But what led you to this design, with two different StoreStackOps implementations? Is this for performance reasons? Is this an attempt to avoid upfront allocations for multiple tokens when they might never be needed?

Yes, exactly! You pay for the rare cases, but not for the common ones. The only new cost for the common cases is a virtual dispatch for the stack operations.

@codecov-commenter

codecov-commenter commented Jul 12, 2022

Codecov Report

Merging #9 (a3cae65) into master (65b8da4) will increase coverage by 4.20%.
The diff coverage is 92.42%.

@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
+ Coverage   88.76%   92.96%   +4.20%     
==========================================
  Files           1        1              
  Lines          89      199     +110     
==========================================
+ Hits           79      185     +106     
- Misses         10       14       +4     
Flag Coverage Δ
unittests 92.96% <92.42%> (+4.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/Parser.cs 92.96% <92.42%> (+4.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65b8da4...a3cae65. Read the comment docs.

@springcomp

It would also help to get your feedback on whether it helps your case.

It totally does! In fact, I revisited my parser to make sure it did not require more than two tokens this time, and I managed to do it. I suspect we will never need more than two, but you answered my question: Gratt will support anything.

You answered all my questions with the PR summary and follow-up text. Using an adaptive strategy is a great design. I'm still getting used to C#'s pattern matching feature, which has evolved a lot across different versions of the language. That, and the heavily generic nature of the library, make it sometimes hard to comprehend exactly what's going on.

But I was able to debug step by step and see for myself the switch from a OneTokenStackOps to a TwoTokenStackOps.

I'm looking forward to consuming the future NuGet package once released.

On another note, after extensive work on redesigning the original JMESPath parser using Gratt and a hand-crafted automaton-based regex lexer, I could find no improvement in performance 😞. I also started a System.Text.Json-based JMESPath expression evaluator and, again, could not find any performance improvement in early tests. I wonder if I'm missing something obvious. 🤔

Anyway, I learned so much by doing this that it will not have been for nothing.

@atifaziz
Owner Author

My question is: what is the purpose of the default case in the Peek() method? It seems to me that support for more than two look-ahead tokens is guaranteed to throw an exception.

I added a simple test to illustrate this.

@springcomp See 25abe36 for the resolution, where I added a stack-backed strategy for peeking three or more tokens ahead.

@atifaziz
Owner Author

atifaziz commented Jul 12, 2022

the heavily generic nature of the library make it sometimes hard to exactly comprehend what's going on.

I can sympathize with that! I like to solve problems once, so that naturally leads to a more general design and generic code. In the case of Gratt, the parsing is mostly algorithmic, so it can be made independent of the types, and since there are many types involved in the data model (precedence, token, token kind, result, etc.), it does make definitions long to read. What I've found is that aliases can help:

using Parser = Gratt.Parser<ParseContext, TokenKind, Token<TokenKind>, Precedence, bool>;
using PrefixParselet = System.Func<Token<TokenKind>, Gratt.Parser<ParseContext, TokenKind, Token<TokenKind>, Precedence, bool>, bool>;
using InfixParselet = System.Func<Token<TokenKind>, bool, Gratt.Parser<ParseContext, TokenKind, Token<TokenKind>, Precedence, bool>, bool>;

On another note, after extensive work on redesigning the original JMESPath parser using Gratt and a hand-crafted automaton-based regex lexer, I could find no improvement in performance 😞. I also started a System.Text.Json-based JMESPath expression evaluator and, again, could not find any performance improvement in early tests. I wonder if I'm missing something obvious. 🤔

Sorry to hear about that. Unfortunately, I do not have the cycles right now to help there. I hope it's something simple and you find it soon. I can, however, commit to this PR and release a new version of Gratt in the coming weeks, as I find small bits of time.

@atifaziz atifaziz linked an issue Jul 12, 2022 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Support look-ahead of two tokens
3 participants