Range Operator inconsistency? #16770
From @haukex

Dear P5P,

As first reported on PerlMonks in this thread: perlop says: "The range operator (in list context) makes use of the

And yet there are some really strange inconsistencies with respect to

Some more test cases from Perl 5.26.0 on Linux are below. (A note on the

$ perl -wMstrict -MData::Dump -e' dd "0".."-1" '

Thanks, Regards,
From @haukex

Hi all,

Now with a test file attached.

Best,

On Wed, 28 Nov 2018 07:15:33 -0800, haukex@zero-g.net wrote:
From @iabyn

On Wed, Nov 28, 2018 at 07:56:34AM -0800, Hauke D via RT wrote:

Perl internally tries very hard to treat the range args as numeric where

/* This code tries to decide if "$left .. $right" should use the

#define RANGE_IS_NUMERIC(left,right) (

Frankly I don't understand all those conditions; they are a lot more
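To make those conditions easier to follow, here is a sketch that transliterates the pre-change `RANGE_IS_NUMERIC` macro into a C function, one condition at a time. The old macro can be read off the option B patch further down, from its context lines plus the removed line. `ToySV`, `toy_looks_like_number`, `toy_str` and the function names are hypothetical stand-ins, not perl internals. `toy_looks_like_number` only approximates the real `looks_like_number()`: it accepts plain decimal numbers with an optional sign, fraction, exponent and surrounding whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Perl scalar, holding only the flags that
 * RANGE_IS_NUMERIC inspects. This is not a real perl type. */
typedef struct {
    bool ok;        /* SvOK(sv): the value is defined               */
    bool pok;       /* SvPOKp(sv): the value has a string form      */
    bool niok;      /* SvNIOKp(sv): the value has a numeric form    */
    const char *pv; /* the string form; only valid when pok is true */
} ToySV;

/* Rough approximation of looks_like_number() (assumption): optional
 * surrounding whitespace, optional sign, digits with an optional
 * fraction, and an optional exponent. Inf/NaN, "0 but true" and other
 * special cases accepted by the real function are ignored. */
static bool toy_looks_like_number(const char *s)
{
    bool digits = false;
    while (isspace((unsigned char)*s)) s++;
    if (*s == '+' || *s == '-') s++;
    while (isdigit((unsigned char)*s)) { s++; digits = true; }
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) { s++; digits = true; }
    }
    if (!digits) return false;
    if (*s == 'e' || *s == 'E') {
        const char *e = s + 1;
        if (*e == '+' || *e == '-') e++;
        if (!isdigit((unsigned char)*e)) return false;
        while (isdigit((unsigned char)*e)) e++;
        s = e;
    }
    while (isspace((unsigned char)*s)) s++;
    return *s == '\0';
}

/* The pre-#133695 RANGE_IS_NUMERIC(left,right), one condition at a time. */
static bool range_is_numeric_old(const ToySV *left, const ToySV *right)
{
    /* A side that is already numeric, or that is defined but has no
     * string form, makes the whole range numeric. */
    if (left->niok || (left->ok && !left->pok))
        return true;
    if (right->niok || (right->ok && !right->pok))
        return true;

    /* The left side qualifies if it is undef while the right side is
     * defined, or if it is a number-like string NOT beginning with '0'. */
    bool left_qualifies =
        (!left->ok && right->ok) ||
        ((!left->ok || toy_looks_like_number(left->pv)) &&
         left->pok && left->pv[0] != '0');

    /* The right side qualifies if it is undef or a number-like string. */
    bool right_qualifies = !right->ok || toy_looks_like_number(right->pv);

    return left_qualifies && right_qualifies;
}

/* Convenience wrapper for plain strings; NULL stands for undef. */
static ToySV toy_str(const char *s)
{
    ToySV sv = { s != NULL, s != NULL, false, s };
    return sv;
}

static bool strings_range_is_numeric_old(const char *l, const char *r)
{
    ToySV left = toy_str(l), right = toy_str(r);
    return range_is_numeric_old(&left, &right);
}
```

In this model, "-3".."-1" takes the numeric path but "0".."-1" does not. The only difference is that the old macro checks whether the left operand's first character is '0'.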
@jkeenan - Status changed from 'new' to 'open'
From @haukex

Hi,

Thanks for looking into this! The code comment in the code you showed [1] mentions #18165 [2], which references #18114 [3], where a reply by Slaven Rezic makes sense to me: 'There is a special handling for numeric strings beginning with a "0". This is to allow things like "01".."31" to preserve the leading zero for one-digit numbers.' The basic behavior appears to go all the way back to 5.000 [4].

[1] https://perl5.git.perl.org/perl.git/blob/23665de87341f4f3452009759d4fc95ce30b8ced:/pp_ctl.c#l1179

So my interpretation of the rules is this: if the left and right operands are strings, check whether they pass looks_like_number. If they do, treat them as integers. However, make an exception when the left-hand side begins with "0", for the reason stated above.

The key word here is *begins* with zero; the condition *SvPVX_const(left)!='0' causes this inconsistency:

-3..-1 and "-3".."-1" are (-3,-2,-1)

That latter behavior may be in line with "01".."-1", which is ("01","02","03",...), but IMO it's still surprising. In any case, the fact that strings that look like numbers are treated as numbers appears to be undocumented.

I have two alternative proposals: (A) leave the behavior as-is, but document it, or (B) change the behavior so that the above condition becomes 'if the LHS is a string that begins with 0, except for the string "0" itself' (and document it). Option B would keep the "01".."31" case working, but would also make "0".."-1" act like 0..-1.

Patches for both A (just document) and B (change behavior) are attached, with tests included (a full build passes all tests on my end). My internals knowledge is quite limited, so I hope my use of SvCUR in the second patch is correct.

My personal preference is option B, since it gets rid of the above inconsistency. However, if there are worries about backwards compatibility, I understand that option A may be better in that respect.
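The "treat them as integers" step in the rules above corresponds to the numeric branch of the range operator. Below is a minimal sketch, assuming both operands have already passed the numeric test. Each string is converted to a number and truncated toward zero, and the integers from the left value up to the right value are produced. The conversion uses `strtod` plus a cast as an approximation of `SvIV` for simple decimal strings; that is an assumption, and `toy_sv_iv` and `numeric_range` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Approximates SvIV() on a number-like decimal string (assumption):
 * parse as a double and truncate toward zero, so "2.18" -> 2 and
 * "-4\n" -> -4. The caller must already know the string looks like a
 * number; strtod() alone would also accept forms perl rejects. */
static long toy_sv_iv(const char *s)
{
    return (long)strtod(s, NULL);
}

/* Numeric branch of a list-context range: writes left..right into out
 * (at most cap values) and returns how many values there are. When the
 * right value is below the left one, the range is empty, as with 0..-1. */
static size_t numeric_range(const char *left, const char *right,
                            long *out, size_t cap)
{
    long i = toy_sv_iv(left), j = toy_sv_iv(right);
    size_t n = 0;
    for (long v = i; v <= j; v++) {
        assert(n < cap);
        out[n++] = v;
    }
    return n;
}
```

Given "0" and "-1", this branch returns nothing, which matches 0..-1. Under the current rule, however, "0".."-1" never reaches it, because its left operand begins with '0'.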
The way I've worded the documentation pretty much nails down the behavior and wouldn't allow for future changes. A third option might be to word the documentation more loosely and leave the door open for future changes.

Thanks, Regards,

P.S. The attachment "rt133695.pl" in my previous message contains an off-by-one error, but it is in an unused branch of code. The output and conclusions produced by the script are therefore still correct (as long as $inseq is always false, which it currently is).

On Thu, 29 Nov 2018 04:05:27 -0800, davem wrote:
From @haukex

rt133695_rangeop_zero_A_doc_only.patch

From 52296ca221128e2ed89d2f9e39520dcb96801eb9 Mon Sep 17 00:00:00 2001
From: Hauke D <haukex@zero-g.net>
Date: Fri, 30 Nov 2018 13:56:10 +0100
Subject: [PATCH] (perl #133695) Document range op details
"-2".."-1" is the same as -2..-1 and "1".."-1" is the same as 1..-1, but
"0".."-1" is the same as "0".."99". This patch documents the rules for
the range operator in list context with both operands being strings more
explicitly.
See also #18165 and #18114.
---
pod/perlop.pod | 85 +++++++++++++++++++++++++++++++++++++++-----------
pp_ctl.c | 3 +-
t/op/range.t | 24 +++++++++++++-
3 files changed, 92 insertions(+), 20 deletions(-)
diff --git a/pod/perlop.pod b/pod/perlop.pod
index d6adbd11f2..9ff980e9b4 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1081,26 +1081,82 @@ And now some examples as a list operator:
@foo = @foo[0 .. $#foo]; # an expensive no-op
@foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
-The range operator (in list context) makes use of the magical
-auto-increment algorithm if the operands are strings. You
-can say
+Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
+return two elements in list context.
- @alphabet = ("A" .. "Z");
+ @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
-to get all normal letters of the English alphabet, or
+The range operator in list context can make use of the magical
+auto-increment algorithm if both operands are strings, subject to the
+following rules:
- $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+=over
+
+=item *
+
+With one exception (below), if both strings look like numbers to Perl,
+the magic increment will not be applied, and the strings will be treated
+as numbers (more specifically, integers) instead.
+
+For example, C<"-2".."2"> is the same as C<-2..2>, C<"1".."-1"> is the
+same as C<1..-1> (producing the empty list), and C<"2.18".."3.14">
+produces C<2, 3>.
-to get a hexadecimal digit, or
+=item *
+
+The exception to the above rule is when the left-hand string begins with
+C<0>, including the string C<"0"> itself. In this case, the magic
+increment I<will> be applied, even though strings like C<"01"> would
+normally look like a number to Perl.
+
+For example, C<"01".."04"> produces C<"01", "02", "03", "04">, and
+C<"0".."-1"> produces C<"0"> through C<"99"> - this may seem
+surprising, but see the following rules for why it works this way.
+To get dates with leading zeros, you can say:
@z2 = ("01" .. "31");
print $z2[$mday];
-to get dates with leading zeros.
+If you want to force strings to be interpreted as numbers, you could say
+
+ @numbers = ( 0+$first .. 0+$last );
+
+=item *
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
+only the initial value will be returned.
+
+For example, C<"ax".."az"> produces C<"ax", "ay", "az">, but
+C<"*x".."az"> produces only C<"*x">.
+
+=item *
+
+For other initial values that are strings that do follow the rules of the
+magical increment, the corresponding sequence will be returned.
+
+For example, you can say
+
+ @alphabet = ("A" .. "Z");
+
+to get all normal letters of the English alphabet, or
+
+ $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+
+to get a hexadecimal digit.
+
+=item *
If the final value specified is not in the sequence that the magical
increment would produce, the sequence goes until the next value would
-be longer than the final value specified.
+be longer than the final value specified. If the length of the final
+string is shorter than the first, the empty list is returned.
+
+For example, C<"a".."--"> is the same as C<"a".."zz">, C<"0".."xx">
+produces C<"0"> through C<"99">, and C<"aaa".."--"> returns the empty
+list.
+
+=back
As of Perl 5.26, the list-context range operator on strings works as expected
in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
@@ -1108,10 +1164,8 @@ in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
depends on the internal encoding of the range endpoint.
-If the initial value specified isn't part of a magical increment
-sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
-only the initial value will be returned. So the following will only
-return an alpha:
+Because the magical increment only works on non-empty strings matching
+C</^[a-zA-Z]*[0-9]*\z/>, the following will only return an alpha:
use charnames "greek";
my @greek_small = ("\N{alpha}" .. "\N{omega}");
@@ -1131,11 +1185,6 @@ you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
L<experimental feature|perlrecharclass/Extended Bracketed Character
Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
-Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
-return two elements in list context.
-
- @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
-
=head2 Conditional Operator
X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
diff --git a/pp_ctl.c b/pp_ctl.c
index 17d4f0d14a..2da942aa88 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -1178,7 +1178,8 @@ PP(pp_flip)
/* This code tries to decide if "$left .. $right" should use the
magical string increment, or if the range is numeric (we make
- an exception for .."0" [#18165]). AMS 20021031. */
+ an exception for .."0" [#18165]). AMS 20021031.
+ See also [#133695] - the rules are now documented in perlop. */
#define RANGE_IS_NUMERIC(left,right) ( \
SvNIOKp(left) || (SvOK(left) && !SvPOKp(left)) || \
diff --git a/t/op/range.t b/t/op/range.t
index 19ae1269ca..18eaa1fe0c 100644
--- a/t/op/range.t
+++ b/t/op/range.t
@@ -9,7 +9,7 @@ BEGIN {
use Config;
-plan (146);
+plan (162);
is(join(':',1..5), '1:2:3:4:5');
@@ -112,6 +112,28 @@ is(join(":","-4".."-0") , "-4:-3:-2:-1:0");
is(join(":","-4\n".."0\n") , "-4:-3:-2:-1:0");
is(join(":","-4\n".."-0\n"), "-4:-3:-2:-1:0");
+# [#133695] document inconsistency between "0".."-1" and 0..-1
+is(join(":","-2".."-1") , "-2:-1");
+is(join(":","-1".."-1") , "-1");
+is(join(":", 0 .. -1 ) , "");
+is(join(":","0".."-1") , "0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","1".."-1") , "");
+
+# these test the statements made in the documentation
+# regarding the rules of string ranges
+is(join(":","-2".."2"), join(":",-2..2));
+is(join(":","2.18".."3.14"), "2:3");
+is(join(":","01".."04"), "01:02:03:04");
+# "0".."-1" tested above
+is(join(":","00".."31"), "00:01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31");
+is(join(":","ax".."az"), "ax:ay:az");
+is(join(":","*x".."az"), "*x");
+is(join(":","A".."Z"), "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z");
+is(join(":", 0..9,"a".."f"), "0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f");
+is(join(":","a".."--"), join(":","a".."zz"));
+is(join(":","0".."xx"), "0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","aaa".."--"), "");
+
# undef should be treated as 0 for numerical range
is(join(":",undef..2), '0:1:2');
is(join(":",-2..undef), '-2:-1:0');
--
2.19.2
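The string rules this patch documents can be sketched in C. They are: the magical increment, a start value outside /^[a-zA-Z]*[0-9]*\z/ yielding only itself, and stopping once values would outgrow the final string. This is a simplified model of the string branch of the range operator and of the magical increment on a plain ASCII string. It is not perl's code. `magic_inc`, `string_range` and `MAXLEN` are hypothetical names, and values are joined with ":" to mirror the tests in t/op/range.t.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAXLEN 64 /* hypothetical limit on value length in this sketch */

/* Magical increment of a plain ASCII string, in place. Returns false if
 * the string is empty or does not match /^[a-zA-Z]*[0-9]*\z/; perl would
 * then fall back to a numeric increment, which ends a string range.
 * cap is the size of the buffer that s points to. */
static bool magic_inc(char *s, size_t cap)
{
    size_t len = strlen(s);
    const char *p = s;
    if (len == 0)
        return false;
    while (isalpha((unsigned char)*p)) p++;
    while (isdigit((unsigned char)*p)) p++;
    if (*p != '\0')
        return false;

    /* Increment from the rightmost character, carrying leftwards. */
    for (size_t i = len; i-- > 0; ) {
        char c = s[i];
        if (isdigit((unsigned char)c)) {
            if (c != '9') { s[i] = (char)(c + 1); return true; }
            s[i] = '0';
        } else if (c == 'z' || c == 'Z') {
            s[i] = (char)(c - 25);   /* 'z' -> 'a', 'Z' -> 'A', carry */
        } else {
            s[i] = (char)(c + 1);
            return true;
        }
    }
    /* Every position carried, so the string grows by one character:
     * "99" -> "100", "zz" -> "aaa", "Zz" -> "AAa". */
    assert(len + 2 <= cap);
    memmove(s + 1, s, len + 1);
    s[0] = isdigit((unsigned char)s[1]) ? '1' : s[1];
    return true;
}

/* String branch of a list-context range. Starting from left, emit each
 * value while it is no longer than right, stop after emitting a value
 * equal to right, and stop when the increment is no longer magical.
 * Values are joined with ':' into out; returns how many were emitted. */
static size_t string_range(const char *left, const char *right,
                           char *out, size_t outcap)
{
    char cur[MAXLEN];
    size_t count = 0, used = 0, rlen = strlen(right);
    bool magic = true;

    assert(outcap > 0 && strlen(left) < sizeof cur);
    strcpy(cur, left);
    out[0] = '\0';
    while (magic && strlen(cur) <= rlen) {
        size_t clen = strlen(cur);
        assert(used + clen + 2 <= outcap); /* room for ':', value, NUL */
        if (count > 0)
            out[used++] = ':';
        memcpy(out + used, cur, clen + 1);
        used += clen;
        count++;
        if (strcmp(cur, right) == 0)
            break;
        magic = magic_inc(cur, sizeof cur);
    }
    return count;
}
```

With this model, "a".."--" and "a".."zz" both give 702 values, "0".."xx" gives "0" through "99", and "aaa".."--" gives nothing, because the start value is already longer than the final one.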
From @haukex

rt133695_rangeop_zero_B_change.patch

From cd2b39ae22f1a9e2090cea546da9a2c3884bf22e Mon Sep 17 00:00:00 2001
From: Hauke D <haukex@zero-g.net>
Date: Fri, 30 Nov 2018 13:06:07 +0100
Subject: [PATCH] (perl #133695) "0".."-1" should act like 0..-1
Previously, *any* string beginning with 0, including the string "0"
itself, would be subject to the magic string auto-increment, instead of
being treated like a number. This meant that "-2".."-1" was the same as
-2..-1 and "1".."-1" was the same as 1..-1, but "0".."-1" was the same
as "0".."99".
This patch fixes that inconsistency, while still allowing ranges like
"01".."31" to produce the strings "01", "02", ... "31", which is what
the "begins with 0" exception was intended for.
This patch also expands the documentation in perlop and states the rules
for the range operator in list context with both operands being strings
more explicitly.
See also #18165 and #18114.
---
pod/perlop.pod | 84 +++++++++++++++++++++++++++++++++++++++-----------
pp_ctl.c | 10 ++++--
t/op/range.t | 23 +++++++++++++-
3 files changed, 95 insertions(+), 22 deletions(-)
diff --git a/pod/perlop.pod b/pod/perlop.pod
index d6adbd11f2..d4101ff544 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1081,26 +1081,81 @@ And now some examples as a list operator:
@foo = @foo[0 .. $#foo]; # an expensive no-op
@foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
-The range operator (in list context) makes use of the magical
-auto-increment algorithm if the operands are strings. You
-can say
+Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
+return two elements in list context.
- @alphabet = ("A" .. "Z");
+ @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
-to get all normal letters of the English alphabet, or
+The range operator in list context can make use of the magical
+auto-increment algorithm if both operands are strings, subject to the
+following rules:
- $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+=over
+
+=item *
+
+With one exception (below), if both strings look like numbers to Perl,
+the magic increment will not be applied, and the strings will be treated
+as numbers (more specifically, integers) instead.
+
+For example, C<"-2".."2"> is the same as C<-2..2>, and
+C<"2.18".."3.14"> produces C<2, 3>.
-to get a hexadecimal digit, or
+=item *
+
+The exception to the above rule is when the left-hand string begins with
+C<0> and is longer than one character, in this case the magic increment
+I<will> be applied, even though strings like C<"01"> would normally look
+like a number to Perl.
+
+For example, C<"01".."04"> produces C<"01", "02", "03", "04">, and
+C<"00".."-1"> produces C<"00"> through C<"99"> - this may seem
+surprising, but see the following rules for why it works this way.
+To get dates with leading zeros, you can say:
@z2 = ("01" .. "31");
print $z2[$mday];
-to get dates with leading zeros.
+If you want to force strings to be interpreted as numbers, you could say
+
+ @numbers = ( 0+$first .. 0+$last );
+
+=item *
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
+only the initial value will be returned.
+
+For example, C<"ax".."az"> produces C<"ax", "ay", "az">, but
+C<"*x".."az"> produces only C<"*x">.
+
+=item *
+
+For other initial values that are strings that do follow the rules of the
+magical increment, the corresponding sequence will be returned.
+
+For example, you can say
+
+ @alphabet = ("A" .. "Z");
+
+to get all normal letters of the English alphabet, or
+
+ $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
+
+to get a hexadecimal digit.
+
+=item *
If the final value specified is not in the sequence that the magical
increment would produce, the sequence goes until the next value would
-be longer than the final value specified.
+be longer than the final value specified. If the length of the final
+string is shorter than the first, the empty list is returned.
+
+For example, C<"a".."--"> is the same as C<"a".."zz">, C<"0".."xx">
+produces C<"0"> through C<"99">, and C<"aaa".."--"> returns the empty
+list.
+
+=back
As of Perl 5.26, the list-context range operator on strings works as expected
in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
@@ -1108,10 +1163,8 @@ in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
depends on the internal encoding of the range endpoint.
-If the initial value specified isn't part of a magical increment
-sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
-only the initial value will be returned. So the following will only
-return an alpha:
+Because the magical increment only works on non-empty strings matching
+C</^[a-zA-Z]*[0-9]*\z/>, the following will only return an alpha:
use charnames "greek";
my @greek_small = ("\N{alpha}" .. "\N{omega}");
@@ -1131,11 +1184,6 @@ you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
L<experimental feature|perlrecharclass/Extended Bracketed Character
Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
-Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
-return two elements in list context.
-
- @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
-
=head2 Conditional Operator
X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
diff --git a/pp_ctl.c b/pp_ctl.c
index 17d4f0d14a..e820a9df02 100644
--- a/pp_ctl.c
+++ b/pp_ctl.c
@@ -1177,14 +1177,18 @@ PP(pp_flip)
}
/* This code tries to decide if "$left .. $right" should use the
- magical string increment, or if the range is numeric (we make
- an exception for .."0" [#18165]). AMS 20021031. */
+ magical string increment, or if the range is numeric. Initially,
+ an exception was made for *any* string beginning with "0" (see
+ [#18165], AMS 20021031), but now that is only applied when the
+ string's length is also >1 - see the rules now documented in
+ perlop [#133695] */
#define RANGE_IS_NUMERIC(left,right) ( \
SvNIOKp(left) || (SvOK(left) && !SvPOKp(left)) || \
SvNIOKp(right) || (SvOK(right) && !SvPOKp(right)) || \
(((!SvOK(left) && SvOK(right)) || ((!SvOK(left) || \
- looks_like_number(left)) && SvPOKp(left) && *SvPVX_const(left) != '0')) \
+ looks_like_number(left)) && SvPOKp(left) \
+ && !(*SvPVX_const(left) == '0' && SvCUR(left)>1 ) )) \
&& (!SvOK(right) || looks_like_number(right))))
PP(pp_flop)
diff --git a/t/op/range.t b/t/op/range.t
index 19ae1269ca..2deefc61cf 100644
--- a/t/op/range.t
+++ b/t/op/range.t
@@ -9,7 +9,7 @@ BEGIN {
use Config;
-plan (146);
+plan (162);
is(join(':',1..5), '1:2:3:4:5');
@@ -112,6 +112,27 @@ is(join(":","-4".."-0") , "-4:-3:-2:-1:0");
is(join(":","-4\n".."0\n") , "-4:-3:-2:-1:0");
is(join(":","-4\n".."-0\n"), "-4:-3:-2:-1:0");
+# [#133695] "0".."-1" should be the same as 0..-1
+is(join(":","-2".."-1") , "-2:-1");
+is(join(":","-1".."-1") , "-1");
+is(join(":","0".."-1") , "");
+is(join(":","1".."-1") , "");
+
+# these test the statements made in the documentation
+# regarding the rules of string ranges
+is(join(":","-2".."2"), join(":",-2..2));
+is(join(":","2.18".."3.14"), "2:3");
+is(join(":","01".."04"), "01:02:03:04");
+is(join(":","00".."-1"), "00:01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","00".."31"), "00:01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31");
+is(join(":","ax".."az"), "ax:ay:az");
+is(join(":","*x".."az"), "*x");
+is(join(":","A".."Z"), "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z");
+is(join(":", 0..9,"a".."f"), "0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f");
+is(join(":","a".."--"), join(":","a".."zz"));
+is(join(":","0".."xx"), "0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:34:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:51:52:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:69:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:86:87:88:89:90:91:92:93:94:95:96:97:98:99");
+is(join(":","aaa".."--"), "");
+
# undef should be treated as 0 for numerical range
is(join(":",undef..2), '0:1:2');
is(join(":",-2..undef), '-2:-1:0');
--
2.19.2
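Assume both operands are defined plain strings with no numeric flags. Then the first two lines of the macro are false, and the old and patched versions reduce to the two predicates sketched below. This is a sketch under that assumption only. `toy_looks_like_number` is a hypothetical, simplified stand-in for `looks_like_number()`, and the function names are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for looks_like_number() (hypothetical): optional
 * surrounding whitespace, optional sign, digits with optional fraction
 * and exponent. Perl's real function accepts more (e.g. Inf/NaN). */
static bool toy_looks_like_number(const char *s)
{
    bool digits = false;
    while (isspace((unsigned char)*s)) s++;
    if (*s == '+' || *s == '-') s++;
    while (isdigit((unsigned char)*s)) { s++; digits = true; }
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) { s++; digits = true; }
    }
    if (!digits) return false;
    if (*s == 'e' || *s == 'E') {
        const char *e = s + 1;
        if (*e == '+' || *e == '-') e++;
        if (!isdigit((unsigned char)*e)) return false;
        while (isdigit((unsigned char)*e)) e++;
        s = e;
    }
    while (isspace((unsigned char)*s)) s++;
    return *s == '\0';
}

/* Before the patch: a left operand beginning with '0' always forces the
 * magical string increment. */
static bool strings_numeric_before(const char *left, const char *right)
{
    return toy_looks_like_number(left) && left[0] != '0'
        && toy_looks_like_number(right);
}

/* Option B: only a left operand that begins with '0' AND is longer than
 * one character (the SvCUR(left)>1 test) forces the string increment. */
static bool strings_numeric_after(const char *left, const char *right)
{
    return toy_looks_like_number(left)
        && !(left[0] == '0' && strlen(left) > 1)
        && toy_looks_like_number(right);
}
```

Under option B, "0".."-1" moves to the numeric path and yields the empty list, like 0..-1. Meanwhile "01".."31", "00".."-1" and "0".."xx" keep the string behavior that the updated documentation describes.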
From @tonycoz

On Fri, 30 Nov 2018 06:09:07 -0800, haukex@zero-g.net wrote:

I think I prefer B too. It would be nice to find out what anyone else thinks. Unfortunately I don't think I'd want to put a change in behaviour into core at this point in the release cycle.

Tony
From @xenu

On Wed, 13 Feb 2019 15:59:02 -0800, tonyc wrote:

Now we're in a brand new release cycle, so I think it's time to revisit this ticket. Personally, I think option B is better; it's unlikely that anything relies on the current (broken) behaviour.
From @tonycoz

On Tue, 06 Aug 2019 23:58:10 -0700, me@xenu.pl wrote:

I've applied the B version to blead, so we should find out if anything depends on the old behaviour. Leaving open for now.

Tony
From @tonycoz

On Wed, 07 Aug 2019 18:19:54 -0700, tonyc wrote:

Closing.

Tony
@tonycoz - Status changed from 'open' to 'pending release'
This change documents the previous behavior of the range operator with magic string increment in Perl 5.30 and below - the change was introduced in commit d1bc97f from GitHub Perl#16770 (RT133695).
Migrated from rt.perl.org#133695 (status was 'pending release')
Searchable as RT133695