Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Previous Token Information on Errors. #613

Closed
christianvoigt opened this issue Dec 7, 2017 · 19 comments
Closed

Previous Token Information on Errors. #613

christianvoigt opened this issue Dec 7, 2017 · 19 comments

Comments

@christianvoigt
Copy link
Contributor

Hi, for my VS Code linter I need the exact token locations for every parser error. If the error is an EarlyExitException and the token is of type chevrotain.EOF, the location properties of the token are all set to null. As a workaround I can check for this case in my linter and replace the location info in the error with the endLine and endColumn of the last token. But it would be nice if Chevrotain could do this for me.

@christianvoigt
Copy link
Contributor Author

christianvoigt commented Dec 7, 2017

For anyone having the same issue, the workaround requires to create a new token, as the token location properties are read-only. Use:

chevrotain.createTokenInstance(chevrotain.EOF, "", startOffset, endOffset, startLine, endLine, startColumn, endColumn);

@bd82
Copy link
Member

bd82 commented Dec 7, 2017

the EOF does not really have a location.
But what can be done is add another property to the EarlyExitError.
The previous (real) Token

https://github.com/SAP/chevrotain/blob/master/src/parse/parser_public.ts#L3083-L3085

      // old code
      throw this.SAVE_ERROR(new exceptions.EarlyExitException(msg, this.LA(1))

      // new code, LA(0) will return the previously successfully parsed Token.
      throw this.SAVE_ERROR(new exceptions.EarlyExitException(msg, this.LA(1), this.LA(0))

Of course the class of the EarlyExitError will also have to modified to add the new property.
https://github.com/SAP/chevrotain/blob/master/src/parse/errors_public.ts#L62-L82

@bd82
Copy link
Member

bd82 commented Dec 7, 2017

The one edge case is what if the error is for a completely empty text input.
In that case there would be no previously successfully parsed token...

So that edge case would have to be tested.

@bd82
Copy link
Member

bd82 commented Dec 7, 2017

This seems fairly trivial to implement, So I'll happily accept pull requests for this 😄
Or implement this myself when I get around to it (after 1.0.0 release and new web-page creation)

@christianvoigt
Copy link
Contributor Author

That is a good solution. I will add the changes today and send you a pull request.

The edge case will be no problem for the linter, as VS Code will put the error marker at the beginning of the file if the location is missing.

@bd82
Copy link
Member

bd82 commented Dec 8, 2017

The edge case will be no problem for the linter

We still need to check the edge case does not cause some weird runtime exception for Chevrotain.
I guess in the edge case both token properties should be EOF, so anyone using a Chevrotain parser
could easily identify it and handle it in their code.

@christianvoigt
Copy link
Contributor Author

ok, so I can simply check if LA(0) is null, right? If that is the case, I will use LA(1) twice. Or should I additionally check with the tokenMatcher if LA(1) is EOF?

@bd82
Copy link
Member

bd82 commented Dec 8, 2017

The cleanest solution would be up upgrade LA to handle this.

    protected LA(howMuch: number): IToken {
        // Maybe add another check here that: 
        // currIdx + HowMuch > 0
        if (this.tokVectorLength <= this.currIdx + howMuch) {
            return END_OF_FILE
        } else {
            return this.tokVector[this.currIdx + howMuch]
        }
    }

The problem is that it might actually cause a slight performance regression as LA(x) is used a-lot.
There are performance benchmarks here to inspect that
https://github.com/SAP/chevrotain/tree/master/benchmark_web

But they are not working 100% as i've made a lot of breaking changes in the upcoming 1.0.0 version.
(The benchmark compares dev version vs latest version, but that means both must have identical APIs)
So just start with the clean solution and I will hack it later if needed for performance reasons.

@christianvoigt
Copy link
Contributor Author

christianvoigt commented Dec 8, 2017

Sorry, I am not sure I understand your last comment. Our edge case is that this.tokVector is empty because the file was empty, right? In this case this.tokVectorLength is 0 and if I ask for LA(0) this.currIdx + howMuch will also be 0, so LA will return END_OF_FILE. This is already the behavior you wanted, isn't it? It seems to me I can simply use LA(0) for saving the previously parsed token without any further checks.

@bd82
Copy link
Member

bd82 commented Dec 8, 2017

In this case this.tokVectorLength is 0 and if I ask for LA(0) this.currIdx + howMuch will also be 0

this.currIdx starts at -1
Because normally you look at the next token so you call LA(1).
Which means on the first token you would need to access

this.tokVector[this.currIdx + howMuch] // -1 + 1 === 0

and the If condition would be:

this.tokVectorLength <= this.currIdx + howMuch // 0 <= -1 + 0

I know its confusing 😿

@christianvoigt
Copy link
Contributor Author

Ah ok, now I get it. Thank you for the explanation.

@christianvoigt
Copy link
Contributor Author

How about this:

        if (this.tokVectorLength === 0 || this.tokVectorLength <= this.currIdx + howMuch) {
            return END_OF_FILE
        } else {
            return this.tokVector[this.currIdx + howMuch]
        }

@christianvoigt
Copy link
Contributor Author

Mhm, while this would work in the edge case it might still cause problems if one calls LA(0) and you probably want a general solution. So I will simply do what you proposed. Sorry for thinking out loud.

@christianvoigt
Copy link
Contributor Author

ok, last question: What should happen if tokVectorLength > 0and this.currIdx + howMuch < 0? It would be misleading to return EOF in this case.

@bd82
Copy link
Member

bd82 commented Dec 8, 2017

I think it is still correct that it is EOF, just that the EOF is also the Start Of File, for an "empty" file.
Also consider that if LA(x) is EOF LA(x+1) is also EOF.
So it is kinda of a "null"/"undefined" for tokens.

I don't see a good alternative, creating a SOF virtual seems redundant, and returning null/undefined
seems even less informative.

@christianvoigt
Copy link
Contributor Author

Today I finally managed to update Chevrotain in my parser to 2.0.1 and remove the workaround I posted above. I made some tests and realized that the same issue can occur in MismatchedTokenException and NoViableAltException as well. I previously thought that error.token can only be an instance of EOF if the error is an EarlyExitException, but this is not the case. MismatchedTokenException and NoViableAltException can also have EOF as their token and will not contain any location information in these cases. I have not been able to produce such a case with a NotAllInputParsedException but maybe the same issue can occur there.

So I guess it would make sense to add error.previousToken for all types of exceptions, right?

@bd82 bd82 reopened this Feb 19, 2018
@bd82
Copy link
Member

bd82 commented Feb 23, 2018

Sorry for the delay, I knew I missed replying somewhere 😄

NotAllInputParsedException can never have EOF as its token as the condition to throw it requires that
the next token is not EOF.

Anyhow it makes sense to expend whatever solution you implemented to those other Exceptions types as well...

@bd82 bd82 closed this as completed May 10, 2018
@christianvoigt
Copy link
Contributor Author

Hi, have you added the previousToken information for MismatchedTokenException and NoViableAltException?

Or did you expect a pull request from me and closed the issue because I did not send you one (to be honest, I forgot this issue)?

@bd82
Copy link
Member

bd82 commented May 10, 2018

Actually I was pruning the backlog and thought this issue was closed as I did not re-read the whole conversation. I will re-open it until once of us will get around to finishing it. 😄

@bd82 bd82 reopened this May 10, 2018
christianvoigt added a commit to christianvoigt/chevrotain that referenced this issue Jun 6, 2018
christianvoigt added a commit to christianvoigt/chevrotain that referenced this issue Jun 6, 2018
@bd82 bd82 closed this as completed in c4e721f Jun 6, 2018
@bd82 bd82 changed the title Missing token location in EarlyExitException if token is EOF Previous Token Information on Errors. Jun 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants