Keeping error messages relevant while parsing with GenericToken and IgnoreEOL = true #177

mcclown · 2020-05-23T06:09:07Z

Hi, great project, thanks for creating it.

I'm implementing a parser for a language where statements are delimited by EOL.

I've set Lexer(IgnoreEOL = false), added a GenericToken.SugarToken for "\r\n" to my enum, and handled EOL appropriately in my Parser class.

I'm hitting an issue when there are errors in the ParseResult. The errors treat my input code as if it had no line breaks: all code is reported on line 0, as though every line had been concatenated into one. Is there a better way to handle this, so that the error messages are still possible to follow?

Here are some snippets from my code, to show what I've done.

    public class Parser
    {

        [Production("sequence: statement*")]
        public void Sequence(List<TurnipRoot> statements)
        { 
        }

        [Production("statement: declaration (EOL)+ [d]")]
        [Production("statement: rule (EOL)+ [d]")]
        [Production("statement: override (EOL)+ [d]")]
        public void Statement(TurnipRoot statement)
        {

        }

...

    [Lexer(IgnoreEOL = false)]
    public enum Token 
    {
        [Lexeme(GenericToken.SugarToken, "\r\n")]
        EOL,
...

b3b00 · 2020-05-25T08:37:02Z

That is quite normal: if you choose to manage EOL yourself, CSLY's lexer has no way to know that it is hitting a new line, so it cannot increment the line counter. Indeed, it is "thinking" that all the source code is one line.
Maybe one way to manage it would be to tag EOL tokens as ... end of lines. Then the lexer would be able to increment the line counter and, maybe in some more hacky way, the column counter.

For your language this would be something like :

    [Lexer(IgnoreEOL = false)]
    public enum Token 
    {
        [Lexeme(GenericToken.SugarToken, "\r\n", isEndOfLine:true)]
        EOL,

What do you think about it?

This is a major change to the lexer (both Generic and Regex should be modified).

b3b00 · 2020-05-25T09:06:47Z

I've started looking at it, but I don't have much time right now, so you'll probably have to wait a little.

b3b00 · 2020-05-25T09:22:47Z

In fact, there is already an IsLineEnding parameter on the Lexeme attribute, but it does not manage the line counter correctly.

b3b00 · 2020-05-25T10:03:26Z

You can start testing a fix on the branch feature/#177-error-messages-and-EOL-tokens.
It manages line numbers, but column numbers still need some additional work.

Your lexer should look like:

    [Lexer(IgnoreEOL = false)]
    public enum Token 
    {
        [Lexeme(GenericToken.SugarToken, "\r\n", IsLineEnding = true)]
        EOL,

mcclown · 2020-05-25T10:34:22Z

Thanks mate! That really helps with working through errors in my parser. I'll test it out.

b3b00 · 2020-05-25T12:24:27Z

@mcclown, I've just pushed a better fix for line and column computation when EOLs are not ignored. You can check it on the branch feature/#177-error-messages-and-EOL-tokens.

I will wait for your approval before closing the issue, as I don't have a "real world" parser to check it completely. By the way, would you mind sharing your parser? I am always interested in the ways CSLY is used.

mcclown · 2020-05-25T15:13:28Z

That's working as expected, thank you for the quick turnaround. I'll send you an email with some details of my parser.

mcclown · 2020-05-25T15:14:13Z

Whoops, didn't mean to close that until the fix had been merged. Sorry!

mcclown · 2020-05-25T15:17:59Z

Just noticed one thing: the line numbers are 0-indexed, i.e. an error on line 7 is reported as having happened on line 6.

b3b00 · 2020-05-26T06:21:25Z

Yes indeed, the generic lexer is 0-based (both lines and columns). I don't think that's an issue, and a CSLY client can manage the shift easily if needed.
Now merging.
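
For reference, here is a minimal sketch of what that shift could look like on the client side. PositionFormatter and ToDisplay are hypothetical names, not part of CSLY; the sketch assumes you have already read the 0-based line and column from the reported error.

```csharp
using System;

// Hypothetical client-side helper, not part of CSLY.
public static class PositionFormatter
{
    // Converts the 0-based line and column reported by the lexer
    // into the 1-based numbering most editors display.
    public static string ToDisplay(int zeroBasedLine, int zeroBasedColumn)
    {
        return $"line {zeroBasedLine + 1}, column {zeroBasedColumn + 1}";
    }
}
```

With this, an error reported at line 6, column 0 would be shown to the user as "line 7, column 1".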

…om:b3b00/csly into feature/#177-error-messages-and-EOL-tokens * 'feature/#177-error-messages-and-EOL-tokens' of github.com:b3b00/csly: bugfix #175 : column counter

b3b00 · 2020-05-26T07:33:19Z

AppVeyor is failing on this branch for some mysterious reason. I will look into it, but for now the only way to get the fix is to use a clone of CSLY. Sorry for the inconvenience.

b3b00 · 2020-05-26T10:04:21Z

@mcclown a new NuGet package is available as 2.6.0.

mcclown changed the title ~~Keeping error messages relevant white parsing with GenericToken and IgnoreEOL = true~~ Keeping error messages relevant while parsing with GenericToken and IgnoreEOL = true May 25, 2020

mcclown closed this as completed May 25, 2020

mcclown reopened this May 25, 2020

b3b00 closed this as completed May 26, 2020

b3b00 added a commit that referenced this issue May 26, 2020

bugfix #177 column counter

fa98ed6

b3b00 added a commit that referenced this issue May 26, 2020

merge #177

4346e50

b3b00 added a commit that referenced this issue Jun 5, 2020

fix unit test for issue #177 : make it pass on linux and windows

7b38c0a
