Tokenizer: Fix line continuation after punctuation #12

ChristianStadelmann · 2021-01-19T17:09:39Z

Prior to this change, any line ending with [punctuation + '...'], for
example `||...`, would cause the tokenizer to fail.

Fixes #9

ChristianStadelmann · 2021-01-19T17:10:22Z

I know that this is a somewhat ugly hack because it goes back 3 positions. Feel free to reject or improve if you think this is not good.

HaHeho · 2021-01-19T17:52:05Z

EDIT: I deleted my suggestion since it disregarded the AND operation.

Also, my version (and I assume yours as well) does not throw a `no spaces before operator '...'` warning after successfully parsing it. Wouldn't that be supposed to happen? If so, it is a different issue and should be addressed in a separate PR, I assume.

ChristianStadelmann · 2021-01-19T18:05:59Z

In your solution, wouldn't a `||...` match the `% a binary operator, followed by a unary operator:` case and thus be parsed as `||..` and `.`?

Also, even in a simple case like `symbol = '.'`, it would be recognized as `% a binary operator, followed by a unary operator` and thus add both an empty token and the `.`. I think the `any` in this case is conceptually wrong.

HaHeho · 2021-01-19T18:11:05Z

> In your solution, wouldn't a `||...` match the `% a binary operator, followed by a unary operator:` case and thus be parsed as `||..` and `.`?
>
> Also, even in a simple case like `symbol = '.'`, it would be recognized as `% a binary operator, followed by a unary operator` and thus add both an empty token and the `.`. I think the `any` in this case is conceptually wrong.

I noticed that my solution did break things, yes.

> Also, my version (and I assume yours as well) does not throw a `no spaces before operator '...'` warning after successfully parsing it. Wouldn't that be supposed to happen? If so, it is a different issue and should be addressed in a separate PR, I assume.

Your version does throw it correctly. Sorry for the noise. :)

So your version works entirely as intended then, I suppose? The only question would be if the implementation could be improved.

ChristianStadelmann · 2021-01-19T18:22:52Z

Yep, my implementation could definitely be improved. I won't do that today any more, though.

HaHeho · 2021-01-19T18:51:32Z

I think it is good enough actually, since it allows the following statements to handle it right afterwards.

Just added some documentation and replaced the `endsWith()` call to preserve compatibility with older Matlab versions.

                symbol = skip(punctuation);
                % ends with '...':
                % The '...' has to be unskipped and handled here in order
                % to not cause an error for line endings such as `+...`
                % or `&&...`.
                if length(symbol) > 3 && strcmp(symbol(end-2:end), '...')
                    pos = pos - 3;
                    symbol = symbol(1:end-3);
                end
                % one operator:

ChristianStadelmann · 2021-01-19T22:52:20Z

Actually, this (both your approach and mine) will still break if you write bad but perfectly valid Matlab code like `&&.....` (note the superfluous dots at the end). I've just created another approach where the tokenizer does not try to parse two operators at once (and jumps between parsing from left to right and from right to left): #13.
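
To make the failure concrete, here is a small sketch of the suffix check applied to that input. It assumes the whole run `&&.....` arrives as one symbol, which may not match exactly what the tokenizer's `skip` returns; `leftover` is an illustrative name.

```matlab
symbol = '&&.....';   % operator, then '...', then two stray dots
leftover = symbol;
if length(leftover) > 3 && strcmp(leftover(end-2:end), '...')
    leftover = leftover(1:end-3);   % strips only the LAST three dots
end
disp(leftover)        % what would be left for the operator handling
```

Under that assumption the check removes the last three dots instead of the first three, so `&&..` is left over, which is not an operator.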

ChristianStadelmann · 2021-01-19T22:59:20Z

I've updated this with your suggestions.

HaHeho · 2021-01-20T00:12:12Z

> I've updated this with your suggestions.

Is my preview of the commit wrong after the force-push or did you make a mistake? :)

Prior to this change, any line ending with [punctuation + '...'], for example `||...`, would cause the tokenizer to fail.

ChristianStadelmann · 2021-01-20T08:42:39Z

> Is my preview of the commit wrong after the force-push or did you make a mistake? :)

I did. Forgot to commit+amend before pushing :-P

Now you should see your suggestion in the diff ;-)

ChristianStadelmann mentioned this pull request Jan 19, 2021

&&... makes check throw an error #9

Closed

ChristianStadelmann mentioned this pull request Jan 19, 2021

Tokenizer: Parse operators one at a time #13

Open

ChristianStadelmann force-pushed the patch-1 branch from 18cf851 to e71780e · January 19, 2021 22:58

Tokenizer: Fix line continuation after punctuation

1b08e08

Prior to this change, any line ending with [punctuation + '...'], for example `||...`, would cause the tokenizer to fail.

ChristianStadelmann force-pushed the patch-1 branch from e71780e to 1b08e08 · January 20, 2021 08:41
