Bug fixes, improvements, optimization & refactoring before parser generation #288

petee-d · 2022-12-05T09:37:44Z

This PR contains a whole bunch of changes I did to the core code in order to implement code generation, but doesn't contain code generation itself.

The first 5 commits are bug fixes or improvements that slightly improve the behavior of participle (especially when handling errors). I came across these issues when implementing the code generation and considered it better to change participle's behavior than to implement the weird behavior to the generated code.
The next 3 commits (starting with d03a0d2) are optimizations that either will also help the generated code, or in the case of string accumulation, was just an easy win to make just the reflective parser faster.
- Thrift benchmark had 1903 allocations before these, 1818 after these. The CPU improvement was much smaller (<2%), but IMHO the allocations make it worthwhile.
The next 3 commits (starting with bf35800) do some small refactoring that may make generating code easier or are just generally nice.
The last 2 commits (starting with af8a3d3) are changes that are not useful on its own but will be useful for the code generation. The number of strct usages will be useful for inlining of generated code (no separate function is created for structs only used in one place) and the UnexpectedTokenError.Expect thing allows creating the error without having access to the parser nodes.

The next PR will be code generation itself (#213).

Both strconv.ParseInt and strconv.ParseFloat already report the input. The word invalid isn't always true - it could also just be out of range. The type is contained within the Parse* function being called. So as the extra context added nothing useful, I decided to remove it altogether, making things much easier for the generated parser.

If an optional group within a capture matches no input (either because of an unexpected token or EOF), obtaining partial AST can cause `setField` to be called with no tokens, causing a panic. Fixed by not doing anything.

It didn't properly escape the token, the error would be malformed if the token contained single quotes. It also used a custom error instead of `UnexpectedTokenError` for no good reason.

The last error in the added test explains the difference. Without this change, it would have been `1:15: unexpected token "(" (expected ";")`, which would be very confusing as '(' was actually allowed after the class name. After this change, it's `1:20: unexpected token ")" (expected <ident>)`, pointing out the real problem - that an identifier was expected after the comma.

…ction Optimize finding case insensitive tokens to be done at parser construction Doing it before each parsing was unnecessary. One could worry the results would be different if `ParseFromLexer` was called with a different lexer than used to build the grammar, but then case-insentive tokens would be the least of our problems - literal & reference would check agaist the wrong TokenType.

Not very significant for the reflective parser, more significant for the generated one.

The generated parser would be the second place that would construct a temporary disjunction in the implementation, I think it might as well contain the disjunction directly.

They could never be created and were private, so this certainly can't break anything.

It was n probably because I copied it from negation and assumed n was for node. Naming it l is more consistent with the convention for other nodes.

petee-d · 2022-12-05T09:43:13Z

parser_test.go

@@ -1874,11 +1874,11 @@ func TestParseNumbers(t *testing.T) {
 	}
 	parser := participle.MustBuild[grammar]()
 	_, err := parser.ParseString("", `300 0 x`)
-	require.EqualError(t, err, `grammar.Int: invalid integer "300": strconv.ParseInt: parsing "300": value out of range`)


@alecthomas I included the previous commit basically just to make it visible what the error message looked like before. IMHO none of the context being added was actually useful and just duplicated the stuff deeper in the error.

The main reason why I removed it had to do with generated parser performance - my parser avoids allocations for errors as much as possible as most errors are later suppressed, so it just collects some error information while parsing and then only allocates the final error once parsing is done. This made that approach tricky as I would need to depend on fmt just because of these int/uint/float errors and allocate extra stuff that might later be thrown away. Hope it makes sense.

Yep, makes sense 👍

alecthomas

Overall LGTM 👍

alecthomas · 2022-12-05T10:03:25Z

parser_test.go

@@ -1874,11 +1874,11 @@ func TestParseNumbers(t *testing.T) {
 	}
 	parser := participle.MustBuild[grammar]()
 	_, err := parser.ParseString("", `300 0 x`)
-	require.EqualError(t, err, `grammar.Int: invalid integer "300": strconv.ParseInt: parsing "300": value out of range`)


Yep, makes sense 👍

error.go

alecthomas · 2022-12-05T10:15:16Z

Hmm I realised I could have just rebase merged this. I'm so used to most multi-commit PRs being full of garbage commit messages that I just squash by default now. Will keep that in mind in future.

Peter Dolak and others added 13 commits December 5, 2022 10:23

Fix bad float conformation error message + add coverage

5fa25c2

Fix possible panic when optional groups are used within a capture

ad1557e

If an optional group within a capture matches no input (either because of an unexpected token or EOF), obtaining partial AST can cause `setField` to be called with no tokens, causing a panic. Fixed by not doing anything.

Improve mismatch error message for negation & negative lookahead group

5e400b6

It didn't properly escape the token, the error would be malformed if the token contained single quotes. It also used a custom error instead of `UnexpectedTokenError` for no good reason.

Optimize string field accumulation to do fewer allocations

d03a0d2

Avoid an allocation in newParseContext

bb4e748

Not very significant for the reflective parser, more significant for the generated one.

Include explicit disjunction in the union node

bf35800

The generated parser would be the second place that would construct a temporary disjunction in the implementation, I think it might as well contain the disjunction directly.

Remove unused optional and repetition grammar nodes

9aefebf

They could never be created and were private, so this certainly can't break anything.

Change the receiver name of lookaheadGroup from n to l

8854a6d

It was n probably because I copied it from negation and assumed n was for node. Naming it l is more consistent with the convention for other nodes.

Keep track of the number of usages of a strct node

af8a3d3

Allow specifying UnexpectedTokenError expected sequence as a string

a61174d

petee-d commented Dec 5, 2022

View reviewed changes

alecthomas approved these changes Dec 5, 2022

View reviewed changes

alecthomas merged commit 1f05da7 into alecthomas:master Dec 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fixes, improvements, optimization & refactoring before parser generation #288

Bug fixes, improvements, optimization & refactoring before parser generation #288

petee-d commented Dec 5, 2022 •

edited

petee-d Dec 5, 2022

alecthomas Dec 5, 2022

alecthomas left a comment

alecthomas Dec 5, 2022

alecthomas commented Dec 5, 2022 •

edited

Bug fixes, improvements, optimization & refactoring before parser generation #288

Bug fixes, improvements, optimization & refactoring before parser generation #288

Conversation

petee-d commented Dec 5, 2022 • edited

petee-d Dec 5, 2022

Choose a reason for hiding this comment

alecthomas Dec 5, 2022

Choose a reason for hiding this comment

alecthomas left a comment

Choose a reason for hiding this comment

alecthomas Dec 5, 2022

Choose a reason for hiding this comment

alecthomas commented Dec 5, 2022 • edited

petee-d commented Dec 5, 2022 •

edited

alecthomas commented Dec 5, 2022 •

edited