Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tokenizer: Fix line continuation after punctuation #12

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ChristianStadelmann
Copy link

@ChristianStadelmann ChristianStadelmann commented Jan 19, 2021

Prior to this change, any line ending with [punctuation + '...'], for
example ||..., would cause the tokenizer to fail.

Fixes #9

@ChristianStadelmann
Copy link
Author

I know that this is a somewhat ugly hack because it goes back 3 positions. Feel free to reject or improve if you think this is not good.

@HaHeho
Copy link

HaHeho commented Jan 19, 2021

EDIT: I deleted my suggestion since it disregarded the AND operation.

Also, my version (and I assume yours as well), does not throw a no spaces before operator '...' warning after successfully parsing it. Wouldn't that be supposed to happen? If so, it is a different issue and should be addressed in a separate PR, I assume.

@ChristianStadelmann
Copy link
Author

In your solution, wouldn't a ||... match the % a binary operator, followed by a unary operator: case and thus parse it as ||.. and .?

Also, even in a simple case like symbol = '.', it would be recognized as % a binary operator, followed by a unary operator and thus adding both an empty token and the .. I think the any in this case is conceptually wrong.

@HaHeho
Copy link

HaHeho commented Jan 19, 2021

In your solution, wouldn't a ||... match the % a binary operator, followed by a unary operator: case and thus parse it as ||.. and .?

Also, even in a simple case like symbol = '.', it would be recognized as % a binary operator, followed by a unary operator and thus adding both an empty token and the .. I think the any in this case is conceptually wrong.

I noticed that my solution did break things, yes.

Also, my version (and I assume yours as well), does not throw a no spaces before operator '...' warning after successfully parsing it. Wouldn't that be supposed to happen? If so, it is a different issue and should be addressed in a separate PR, I assume.

Your version does throw it correctly. Sorry for the for the noise. :)

So your version works entirely as intended then, I suppose? The only question would be if the implementation could be improved.

@ChristianStadelmann
Copy link
Author

Yep, my implementation could definitely be improved. I won't do that today any more, though.

@HaHeho
Copy link

HaHeho commented Jan 19, 2021

I think it is good enough actually, since it allows the following statements to handle it right afterwards.

Just added some documentation and replaced the endsWith() to preserve compatibility with older Matlab versions.

                symbol = skip(punctuation);
                % ends with '...':
                % The '...' has to be unskipped and handled here in order
                % to not cause and error for line endings such as `+...` 
                % or `&&...`.
                if length(symbol) > 3 && strcmp(symbol(end-2:end), '...')
                    pos = pos - 3;
                    symbol = symbol(1:end-3);
                end
                % one operator:

@ChristianStadelmann
Copy link
Author

Actually, this (both your and mine approach) will still break if you write bad but perfectly valid Matlab code like &&..... (note the superfluous dots at the end). I've just created another approach where the tokenizer does not try to parse two operators at once (and jumps between parsing from left to right and from right to left): #13.

@ChristianStadelmann
Copy link
Author

I've updated this with your suggestions.

@HaHeho
Copy link

HaHeho commented Jan 20, 2021

I've updated this with your suggestions.

Is my preview of the commit wrong after the force-push or did you make a mistake? :)

Prior to this change, any line ending with [punctuation + '...'], for
example `||...`, would cause the tokenizer to fail.
@ChristianStadelmann
Copy link
Author

Is my preview of the commit wrong after the force-push or did you make a mistake? :)

I did. Forgot to commit+amend before pushing :-P

Now you should see your suggestion in the diff ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

&&... makes check throw an error
2 participants