nonsensical code in Whitespace->__TOKENIZER__on_char #10
Comments
The tokenizer doesn't stringify, so it's presumably a bug. Adam
|
Cool, I'll fix that then. |
Quick note: The specific code that's nonsensical here is: https://metacpan.org/source/ADAMK/PPI-1.215/lib/PPI/Token/Whitespace.pm#L407 That's still not fixed and does need a fix. |
That looks like it might have been a very early attempt to do something at
|
Alright, I worked out what's going on there. The code tries to figure out whether a non-ASCII character is whitespace or a word character so it can proceed accordingly, and otherwise throws an error explaining that it doesn't know what to do with the character. The original version of the code ALWAYS recognized it as a word character, which caused weird crashes in Token::Word. Now that it inspects the actual character, the error caused by utf8 that was transformed into unhandleable mojibake shows up as a complaint about an unexpected character instead. This still needs further addressing by way of better input handling, as mentioned in #26, but for now this is an improvement. |
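To make the described behaviour concrete, here is a minimal sketch of that three-way decision. It is not PPI's actual code: `classify_non_ascii` and its return values are made up for illustration, and the `/u` modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper, not PPI's actual code: classify one character,
# given as a code point, roughly the way the fixed branch is described.
sub classify_non_ascii {
    my ($ord) = @_;
    my $char = chr $ord;    # look at the character itself

    # /u forces Unicode rules, so code points 128..255 are classified too
    return 'word'       if $char =~ /\w/u;
    return 'whitespace' if $char =~ /\s/u;

    # Neither word nor whitespace: report it instead of guessing
    die sprintf "Unexpected character U+%04X\n", $ord;
}
```

With this sketch, `classify_non_ascii(0x3042)` (HIRAGANA LETTER A) returns `'word'`, `classify_non_ascii(0x3000)` (IDEOGRAPHIC SPACE) returns `'whitespace'`, and `classify_non_ascii(0x2603)` (SNOWMAN) dies with an "Unexpected character" message.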
Agreed
|
There is a bit in there that seems to try to determine whether a character outside of the ASCII range is a word character or whitespace. However, instead of actually looking at the current character, it looks at the stringified tokenizer, which is just a Perl address. I'm unclear on whether the tokenizer is supposed to stringify, or whether this is just a piece where the meaning of $t changed without the code adapting. Is there anything you'd like done here beyond the obvious change to chr($char)?
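For anyone wondering why matching against the stringified tokenizer can never do anything useful, here is a small standalone sketch. `My::Tokenizer` is a hypothetical stand-in class, not the real PPI tokenizer; the only assumption is that the object does not overload stringification.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tokenizer: any blessed reference
# without overloaded stringification behaves the same way.
my $t = bless {}, 'My::Tokenizer';

# Stringifies to the class name plus an address,
# e.g. "My::Tokenizer=HASH(0x55d0c8a1e2b0)" (the address varies per run)
my $as_string = "$t";

# The class name always contains word characters and never whitespace,
# so the outcome is the same no matter what the current character is.
my $looks_like_word  = $as_string =~ /\w/ ? 1 : 0;
my $looks_like_space = $as_string =~ /\s/ ? 1 : 0;

print "stringified: $as_string\n";
print "word: $looks_like_word, whitespace: $looks_like_space\n";
```

The `\w` check succeeds on every run and the `\s` check never does, which is consistent with every non-ASCII character being treated as a word character.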