Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TokenMatcher parse errors are not exceptions #153

Open
6 tasks
ehuelsmann opened this issue Aug 17, 2023 · 0 comments
Open
6 tasks

TokenMatcher parse errors are not exceptions #153

ehuelsmann opened this issue Aug 17, 2023 · 0 comments

Comments

@ehuelsmann
Copy link
Contributor

The problem

Before 9902c7d, Ruby, Java and JavaScript would return the same error multiple times in a single parse invocation. This wasn't in line with the Go implementation: In Go, if there was an error with, e..g., a tag line, that error would be reported exactly once.

The cause

The token matcher returns a boolean value indicating whether the token is of the requested type (match_*). If the token matcher determines a token is of a specific type, it needs to return true. At the same time, while the type matches, there may be an issue parsing the token content. This happens where the line is a tagline with invalid tags or language selection tokens with non-existing languages. In order for the parser to be able to report the error after the parse, the token matcher adds any errors to the parser context before returning.

There lies the problem: a token may be given to the token matcher multiple times before it is finally accepted by the parser and added to the AST (notably due to lookahead functionality). Each time the problematic token is parsed, the error is added to the parser context by the token matcher, regardless of whether the token will end up in the AST (meaning that it's unclear if the error actually matters).

The proposed solution

In PR #152, the Perl code changes the interface contract of the token matcher: it will return 2 values. First the acceptance boolean and second a parser error if it occurred. Subsequently, the lookahead code is designed to look only at the acceptance boolean and the AST building functions match_token_at_* receive the error, adding it to the parser's error list when the token is accepted into the AST.

The design is better this way because exceptions should be used to indicate exceptional (unexpected) situations and return values should be used for "regular business". Parse errors - although being errors - are really in the "regular business" of a parser, meaning that the error should be returned from the token matcher to the parser which can deal with it as part of its regular processing.

This issue can be closed when

  • The community decides this is the way to go forward
  • The mainstream languages have been coded to follow the new pattern
    • Java
    • JavaScript/TypeScript
    • Ruby
    • (Others?)

This item was created after discussion during the community zoom meeting on 17 Aug 2023.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant