Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes, improvements, optimization & refactoring before parser generation #288

Merged

Conversation

petee-d
Copy link
Contributor

@petee-d petee-d commented Dec 5, 2022

This PR contains a whole bunch of changes I did to the core code in order to implement code generation, but doesn't contain code generation itself.

  • The first 5 commits are bug fixes or improvements that slightly improve the behavior of participle (especially when handling errors). I came across these issues when implementing the code generation and considered it better to change participle's behavior than to implement the weird behavior to the generated code.
  • The next 3 commits (starting with d03a0d2) are optimizations that either will also help the generated code, or in the case of string accumulation, was just an easy win to make just the reflective parser faster.
    • Thrift benchmark had 1903 allocations before these, 1818 after these. The CPU improvement was much smaller (<2%), but IMHO the allocations make it worthwhile.
  • The next 3 commits (starting with bf35800) do some small refactoring that may make generating code easier or are just generally nice.
  • The last 2 commits (starting with af8a3d3) are changes that are not useful on its own but will be useful for the code generation. The number of strct usages will be useful for inlining of generated code (no separate function is created for structs only used in one place) and the UnexpectedTokenError.Expect thing allows creating the error without having access to the parser nodes.

The next PR will be code generation itself (#213).

Peter Dolak and others added 13 commits December 5, 2022 10:23
Both strconv.ParseInt and strconv.ParseFloat already report the input.
The word invalid isn't always true - it could also just be out of range.
The type is contained within the Parse* function being called. So as the
extra context added nothing useful, I decided to remove it altogether,
making things much easier for the generated parser.
If an optional group within a capture matches no input (either because
of an unexpected token or EOF), obtaining partial AST can cause
`setField` to be called with no tokens, causing a panic. Fixed by not
doing anything.
It didn't properly escape the token, the error would be malformed if the
token contained single quotes. It also used a custom error instead of
`UnexpectedTokenError` for no good reason.
The last error in the added test explains the difference. Without this
change, it would have been `1:15: unexpected token "(" (expected ";")`,
which would be very confusing as '(' was actually allowed after the
class name.

After this change, it's `1:20: unexpected token ")" (expected <ident>)`,
pointing out the real problem - that an identifier was expected after
the comma.
…ction

Optimize finding case insensitive tokens to be done at parser construction

Doing it before each parsing was unnecessary. One could worry the
results would be different if `ParseFromLexer` was called with a
different lexer than used to build the grammar, but then case-insentive
tokens would be the least of our problems - literal & reference would
check agaist the wrong TokenType.
Not very significant for the reflective parser, more significant for the
generated one.
The generated parser would be the second place that would construct a
temporary disjunction in the implementation, I think it might as well
contain the disjunction directly.
They could never be created and were private, so this certainly can't
break anything.
It was n probably because I copied it from negation and assumed n was
for node. Naming it l is more consistent with the convention for other
nodes.
@@ -1874,11 +1874,11 @@ func TestParseNumbers(t *testing.T) {
}
parser := participle.MustBuild[grammar]()
_, err := parser.ParseString("", `300 0 x`)
require.EqualError(t, err, `grammar.Int: invalid integer "300": strconv.ParseInt: parsing "300": value out of range`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alecthomas I included the previous commit basically just to make it visible what the error message looked like before. IMHO none of the context being added was actually useful and just duplicated the stuff deeper in the error.

The main reason why I removed it had to do with generated parser performance - my parser avoids allocations for errors as much as possible as most errors are later suppressed, so it just collects some error information while parsing and then only allocates the final error once parsing is done. This made that approach tricky as I would need to depend on fmt just because of these int/uint/float errors and allocate extra stuff that might later be thrown away. Hope it makes sense.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense 👍

Copy link
Owner

@alecthomas alecthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM 👍

@@ -1874,11 +1874,11 @@ func TestParseNumbers(t *testing.T) {
}
parser := participle.MustBuild[grammar]()
_, err := parser.ParseString("", `300 0 x`)
require.EqualError(t, err, `grammar.Int: invalid integer "300": strconv.ParseInt: parsing "300": value out of range`)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense 👍

error.go Show resolved Hide resolved
@alecthomas alecthomas merged commit 1f05da7 into alecthomas:master Dec 5, 2022
@alecthomas
Copy link
Owner

alecthomas commented Dec 5, 2022

Hmm I realised I could have just rebase merged this. I'm so used to most multi-commit PRs being full of garbage commit messages that I just squash by default now. Will keep that in mind in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants