no "Malformed UTF-8 character" warning on single-quoted strings under "use utf8" #14973
From florian.schlichting@fu-berlin.de
This is a bug report for perl from florian.schlichting@fu-berlin.de,
As discovered in the "Malformed UTF-8 character" thread at
% blead -C0 -le 'print qq(print "\xB0C";)' | blead -Mutf8 -CS -l
% blead -C0 -le 'print qq(print \x27\xB0C\x27;)' | blead -Mutf8 -CS -l
This should be fixed so that the warning is issued for single quoted strings as
Flags:
Site configuration information for perl 5.20.2:
Configured by Debian Project at Sun May 3 16:16:25 UTC 2015.
Summary of my perl5 (revision 5 version 20 subversion 2) configuration:
Locally applied patches:
@INC for perl 5.20.2:
Environment for perl 5.20.2:
From @khwilliamson
I have taken this ticket, as I'm about to start work on related things.
On 10/09/2015 07:17 AM, (via RT) wrote:
The RT System itself - Status changed from 'new' to 'open'
From @khwilliamson
I intend to fix this, unless the consensus is not to. It involves extra work in the parser: doing a UTF-8 validity check, when appropriate, on single-quoted strings.
From @cpansprout
On Tue Aug 02 19:58:51 2016, khw wrote:
If you mean in tokeq or scan_str, I think that’s the wrong place to do it. It sounds as though eval "'...'" will be subject to such extra checks as well, but it is perfectly reasonable to assume that perl strings are already well-formed. Ideally, under ‘use utf8’, the validation would be done when the input is read from a stream, though I can’t say offhand what is the best way to go about that.
--
Father Chrysostomos
From @cpansprout
On Tue Aug 02 20:05:11 2016, sprout wrote:
Probably in Perl_lex_next_chunk or something it calls.
--
Father Chrysostomos
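To illustrate the idea of validating input as it is read from a stream, here is a minimal standalone sketch in C. It is not Perl's code: `utf8_state`, `utf8_state_init`, `utf8_feed` and `utf8_finished` are hypothetical names invented for this example, and the check follows the strict RFC 3629 definition of UTF-8, which may differ from what Perl's own `is_utf8_string` accepts. The point it shows is that a chunk-by-chunk check has to carry state across chunk boundaries, because a multi-byte character can be split between two reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator state; none of these names exist in the Perl API. */
typedef struct {
    unsigned remaining;    /* continuation bytes still expected */
    unsigned char lo, hi;  /* allowed range for the next continuation byte */
} utf8_state;

static void utf8_state_init(utf8_state *st)
{
    st->remaining = 0;
    st->lo = 0x80;
    st->hi = 0xBF;
}

/* Feed one chunk of input. Returns 1 if no malformation has been seen in
 * this chunk, 0 at the first malformed byte. State is kept in *st so that
 * a character split across two chunks is still checked correctly. */
static int utf8_feed(utf8_state *st, const unsigned char *s, size_t len)
{
    assert(st != NULL && (s != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if (st->remaining > 0) {
            if (c < st->lo || c > st->hi)
                return 0;                   /* bad continuation byte */
            st->remaining--;
            st->lo = 0x80;                  /* later bytes use the full range */
            st->hi = 0xBF;
        } else if (c < 0x80) {
            /* ASCII: nothing to track */
        } else if (c >= 0xC2 && c <= 0xDF) {
            st->remaining = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            st->remaining = 2;
            if (c == 0xE0) st->lo = 0xA0;   /* reject overlong forms */
            if (c == 0xED) st->hi = 0x9F;   /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            st->remaining = 3;
            if (c == 0xF0) st->lo = 0x90;   /* reject overlong forms */
            if (c == 0xF4) st->hi = 0x8F;   /* stay within U+10FFFF */
        } else {
            return 0;  /* 0x80..0xC1 and 0xF5..0xFF cannot start a character */
        }
    }
    return 1;
}

/* At end of input: returns 1 only if no character was left incomplete. */
static int utf8_finished(const utf8_state *st)
{
    return st->remaining == 0;
}
```

With this sketch, the bytes from the report (0xB0 followed by `C`) are rejected at the first byte, while 0xC2 0xB0 `C` is accepted even when 0xC2 arrives in one chunk and the rest in the next. Note that in a C string literal the bytes must be written as `"\xB0" "C"`, since `"\xB0C"` would be parsed as a single hex escape.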
From @khwilliamson
On Tue Aug 02 20:09:15 2016, sprout wrote:
Is the attached like what you mean?
From @khwilliamson
0001-Proof-of-concept-to-test-input-for-valid-UTF-8.patch
From c1d8cbda01e0b2f372e9341efeb4e306ec0c043d Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Wed, 31 Aug 2016 21:31:28 -0600
Subject: [PATCH] Proof-of-concept to test input for valid UTF-8.
This will fix #126310, and heaven knows what else.
I think we should die at the first malformed UTF-8 encountered in
parsing. To try to continue is asking for trouble, and not going to be
DWIM anyway.
---
toke.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/toke.c b/toke.c
index dbeecd1..eddfb29 100644
--- a/toke.c
+++ b/toke.c
@@ -1339,6 +1339,11 @@ Perl_lex_next_chunk(pTHX_ U32 flags)
     new_bufend_pos = SvCUR(linestr);
     PL_parser->bufend = buf + new_bufend_pos;
     PL_parser->bufptr = buf + bufptr_pos;
+
+    if (UTF && ! is_utf8_string((U8 *) PL_parser->bufptr, PL_parser->bufend - PL_parser->bufptr)) {
+        Perl_croak(aTHX_ "Malformed utf8");
+    }
+
     PL_parser->oldbufptr = buf + oldbufptr_pos;
     PL_parser->oldoldbufptr = buf + oldoldbufptr_pos;
     PL_parser->linestart = buf + linestart_pos;
--
2.5.0
From @cpansprout
On Wed Aug 31 20:35:02 2016, khw wrote:
Yes, that would work. It would be nice, too, if we could add the ‘near such and such’ that yyerror normally does. Maybe yyerror could have an extra option to croak instead of calling qerror. It already has a flags field.
--
Father Chrysostomos
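As a rough illustration of what a "near such and such" diagnostic for malformed input could look like, here is a small standalone C sketch. It is not yyerror and does not reproduce Perl's message format: `describe_malformation`, `CONTEXT_BYTES` and the message wording are all assumptions made for this example. Given the offset of the offending byte, it quotes a few surrounding bytes and shows non-printable ones as `\xHH` so the malformed byte stays visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CONTEXT_BYTES 8  /* arbitrary window size chosen for this sketch */

/* Hypothetical helper, not Perl's yyerror: describes a malformation at
 * offset `bad` in buf[0..len), quoting nearby bytes with non-printable
 * bytes shown as \xHH. The output is truncated if it does not fit. */
static void describe_malformation(const unsigned char *buf, size_t len,
                                  size_t bad, char *out, size_t outsz)
{
    assert(buf != NULL && out != NULL && outsz > 0 && bad < len);

    /* Window of up to CONTEXT_BYTES before and after the bad byte. */
    size_t start = bad > CONTEXT_BYTES ? bad - CONTEXT_BYTES : 0;
    size_t end = len - bad > CONTEXT_BYTES ? bad + CONTEXT_BYTES : len;

    if (snprintf(out, outsz, "Malformed UTF-8 at byte %zu, near \"", bad) < 0) {
        out[0] = '\0';
        return;
    }
    size_t pos = strlen(out);

    for (size_t i = start; i < end && pos + 1 < outsz; i++) {
        unsigned char c = buf[i];
        int n = (c >= 0x20 && c < 0x7F)
            ? snprintf(out + pos, outsz - pos, "%c", c)
            : snprintf(out + pos, outsz - pos, "\\x%02X", (unsigned)c);
        if (n < 0) {
            out[pos] = '\0';   /* keep the string terminated on error */
            break;
        }
        pos += strlen(out + pos);
    }
    if (pos + 1 < outsz)
        snprintf(out + pos, outsz - pos, "\"");
}
```

For the single-quoted case from the report, calling it on the bytes of `print '\xB0C';` with offset 7 produces a message that quotes the whole statement with the stray byte written as `\xB0`.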
From florian.schlichting@fu-berlin.de
Hi Karl,
Father Chrysostomos wrote:
thanks for looking into this issue. I tested your patch and can confirm
% ./perl -C0 -le 'print qq(print "\xB0C";)' | ./perl -I'lib' -Mutf8 -CS -l
% ./perl -C0 -le 'print qq(print \x27\xB0C\x27;)' | ./perl -I'lib' -Mutf8 -CS -l
However, I feel a little uneasy about dying altogether. Currently Perl
Florian
From @khwilliamson
On 09/16/2016 06:46 AM, Florian Schlichting wrote:
But we are running into segfaults because of trying to keep going in the
From @cpansprout
On Fri Sep 16 13:34:55 2016, khw wrote:
I agree. If perl keeps going, then even if it does not crash, it will die on those malformed strings later.
--
Father Chrysostomos
From @khwilliamson
On 09/16/2016 04:44 PM, Father Chrysostomos via RT wrote:
blead now has improved diagnostics for when malformations occur. I am
From @khwilliamson
This has been fixed in blead by
@khwilliamson - Status changed from 'open' to 'pending release'
From @khwilliamson
Thank you for filing this report. You have helped make Perl better.
With the release today of Perl 5.26.0, this and 210 other issues have been
Perl 5.26.0 may be downloaded via:
If you find that the problem persists, feel free to reopen this ticket.
@khwilliamson - Status changed from 'pending release' to 'resolved'
Migrated from rt.perl.org#126310 (status was 'resolved')
Searchable as RT126310$