
nonsensical code in Whitespace->__TOKENIZER__on_char #10

Closed
wchristian opened this issue Dec 18, 2013 · 6 comments
@wchristian
Member

There is a bit there that seems to try to determine whether a character outside the ASCII range is a word or whitespace character; however, instead of actually looking at the current character, it looks at the stringified tokenizer, which is just a Perl memory address. I'm unclear on whether the tokenizer is supposed to stringify, or whether this was just a piece where the meaning of $t changed without the code adapting. Anything but the obvious change to chr($char) you'd like done here?

@adamkennedy
Collaborator

The tokenizer doesn't stringify; it's presumably a bug.

Adam

@wchristian
Member Author

Cool, I'll fix that then.

@ghost ghost assigned wchristian Dec 19, 2013
@wchristian
Member Author

Quick note: The specific code that's nonsensical here is:

https://metacpan.org/source/ADAMK/PPI-1.215/lib/PPI/Token/Whitespace.pm#L407

That's still not fixed and does need a fix.

@adamkennedy
Collaborator

That looks like it might have been a very early attempt to do something at least remotely useful with Unicode or accented Latin-1 characters. Probably the latter...

@wchristian
Member Author

Alright, I worked out what's going on there.

The code tried to figure out whether a non-ASCII char is whitespace or a word char, so it could proceed accordingly; otherwise it throws an error explaining that it doesn't know what to do with the char. With the original version of the code it ALWAYS recognized the char as a word char, which caused weird crashes in Token::Word.

After fixing it to inspect the character itself, input that utf8 had transformed into unhandleable mojibake now produces an "unexpected character" error instead.

This still needs further addressing by way of better input handling, as mentioned in #26, but for now this is an improvement.
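To illustrate the failure mode: a blessed reference interpolated into a regex match stringifies to something like `PPI::Tokenizer=HASH(0x55ab...)`, which always contains word characters, so a `/\w/` test against `$t` succeeds for every input. The sketch below demonstrates this with a hypothetical `Fake::Tokenizer` standing in for the real tokenizer object; it is not the verbatim Whitespace.pm source, just a minimal reconstruction of the buggy check versus the `chr($char)` fix.

```perl
use strict;
use warnings;
use feature 'unicode_strings';    # so \w and \s use Unicode rules on chr() results

# Hypothetical stand-in for PPI::Tokenizer (assumption, for illustration only)
package Fake::Tokenizer;
sub new { bless {}, shift }

package main;

my $t    = Fake::Tokenizer->new;    # in __TOKENIZER__on_char, $t is the tokenizer object
my $char = ord("\x{E9}");           # current character as an ordinal: "é", outside ASCII

# Buggy check: matches against the stringified object,
# e.g. "Fake::Tokenizer=HASH(0x55ab...)", which always contains
# \w characters, so every non-ASCII char was classified as a word char.
my $buggy_is_word = ( "$t" =~ /\w/ ) ? 1 : 0;

# Fixed check: inspect the actual character instead.
my $is_word  = ( chr($char) =~ /\w/ ) ? 1 : 0;
my $is_space = ( chr($char) =~ /\s/ ) ? 1 : 0;

print "buggy=$buggy_is_word word=$is_word space=$is_space\n";
```

With the fix, "é" is correctly classified as a word character and whitespace such as a tab would be classified as whitespace, while truly unrecognized characters fall through to the "unexpected character" error path.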

@wchristian wchristian added this to the 1.222 milestone Nov 13, 2014
@adamkennedy
Collaborator

Agreed
