"Replacement list is longer than search list" is warned even if search list is in range of ASCII #14777
From nezumi@cpan.org
Created by nezumi@cpan.org

As of 5.21.6, "Replacement list is longer than search list" warning seems
------------ >8 ------------ >8 ------------ >8 ------------
Replacement list is longer than search list at - line 3.
If the 3rd line of code above does not contain "-", or if search list in it
tr/09/\x{6F0}\x{6F9}/;
Perl Info
From @tonycoz
On Fri Jun 26 17:33:20 2015, nezumi@cpan.org wrote:

The code added by 6a8b6cf, which enabled
For the example supplied, rcount ends up as 55 (10+9+8...+2+1) which is
On the optimization side, the swash definition produced for the example is:
0030\t\t01f0
which could probably be simplified.

Tony
The RT System itself - Status changed from 'new' to 'open'
From bkb@cpan.org
On Tue, 30 Jun 2015 23:24:00 -0700, tonyc wrote:

The code added in that commit is not actually buggy, it just reveals the bug which was there. The above commit adds a warning if there is a mismatch in UTF-8-flagged strings, but the original bug which existed before that commit was that rcount was calculated but then discarded without being used anywhere, except where it was uselessly setting rlen = rcount just before exiting that routine. Adding the above goto statement and sending it to the "warnins" (sic) section revealed that the rlen value was meaningless. The actual bug here is in the calculation of rcount.

Incidentally this bug occurred to me in the following example case, which might be a useful test whether this has been fixed. These lines:

$input =~ tr/\x{3000}\x{FF01}-\x{FF5E}/ -~/;

and

$input =~ tr/ -~/\x{3000}\x{FF01}-\x{FF5E}/;

both trip the bug.

Adding a line like this into the source code of op.c after the increments of rcount and tcount:

Perl_warn ("BKB test: %d %d\n", tcount, rcount);

gives values like this:

BKB test: 1 1

Going further into this, the test *t == ILLEGAL_UTF8_BYTE or the equivalent *r == test is only applied to the side which contains UTF-8 and not to the other side. Putting UTF-8 on both sides of the tr/// solves the problem. Here is a simple test case:

use warnings;

Only the upper tr/// trips the bug, and the counts are done correctly for the lower case. So this bug is related to conversions with UTF-8 encodings on one or the other side only.

At this point I'm out of my depth for the best way to fix this, but it seems to me that the range operator (the "-") is not being correctly converted to ILLEGAL_UTF8_BYTE when one of from_utf or to_utf is not set.
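For anyone wanting to check the two tr/// lines quoted above on a particular perl, a minimal sketch follows. It is not part of the perl test suite: the names @cases and run_case and the sample input characters are made up for illustration. Each tr/// is compiled through a string eval because the warning is raised at compile time, and a local __WARN__ handler collects whatever is emitted. Both lists hold 95 characters, so going by this thread a warning would only be expected on an affected perl.

```perl
use strict;
use warnings;

# Hypothetical harness: each entry is [label, Perl source holding one of
# the tr/// lines quoted in the comment above]. The inputs "\x{FF21}"
# (fullwidth A) and "A" are assumptions chosen for illustration.
my @cases = (
    [ 'fullwidth to ASCII',
      q{my $input = "\x{FF21}"; $input =~ tr/\x{3000}\x{FF01}-\x{FF5E}/ -~/; $input} ],
    [ 'ASCII to fullwidth',
      q{my $input = "A"; $input =~ tr/ -~/\x{3000}\x{FF01}-\x{FF5E}/; $input} ],
);

# Compile and run one snippet, returning its result and any warnings.
# The tr/// warning is issued while the eval'd code is being compiled,
# so the handler must be installed before the eval.
sub run_case {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = eval "use warnings; $code";
    die $@ if $@;
    return ($result, \@warnings);
}

for my $case (@cases) {
    my ($label, $code) = @$case;
    my ($result, $warnings) = run_case($code);
    printf "%s: result U+%04X, %d warning(s)\n",
        $label, ord($result), scalar @$warnings;
    print "  $_" for @$warnings;
}
```

The script only reports what it sees; it makes no claim about which perl versions warn.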
From @khwilliamson

I plan on fixing this in 5.27. In the meantime, I've added the first example as a TODO test in our suite, via commit a0c4698
This has finally been fixed by
Migrated from rt.perl.org#125493 (status was 'open')
Searchable as RT125493$