-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Negative indices with inclusive intervals are dangerous #1979
Comments
Our existing behavior optimizes towards the case where we use .. high or .. <len, in which case the bug doesn't exist because at len 0 the complete sequence is also the empty sequence. The bug category is most prominent when we are working in 0.. <n where n is some iterated variable. Users of slice will expect their range to count to some positive index on the right hand side, resulting in output which grows smaller as n grows smaller. And in cases where the range is 0.. <n and n goes to 1, the assumption holds. It explodes specifically when n continues downwards to 0, expecting the empty sequence, and instead gets the complete sequence. That is tricky behavior. So the problem is not specifically with negative indexes, because Python has those, but with the ends of the range and how they interact with our assumptions surrounding 0 and the empty sequence. We can see that in these examples I can make exact equivalents for each: print("abc")[0:3] # abc
print("abc")[0:-1] # ab
print("abc")[0:1] # a
print("abc")[1:3] # bc
print("abc")[1:-1] # b
print("abc")[1:1] # echo "abc"[0.. 2] # abc
echo "abc"[0.. -2] # ab
echo "abc"[0.. 0] # a
echo "abc"[1.. 2] # bc
echo "abc"[1.. -2] # b
echo "abc"[1.. 0] # But if I try to write Nim's As I suggested in IRC, the patch that would be most pleasing to new users(if it's possible) would be to build in the exact Python syntax and behavior as an alternative and equivalent operator. I think this is only going to come up over and over when we're porting code from languages using half-inclusive ranges, but if we have the exact same idiom in place, it ceases to be an issue, and cut and paste code will work. Another solution is to use unsigned integers in our slice iteration idioms, triggering OOB when it overflows ;) |
I agree with @triplefox. I think that the best solution is to implement Python's slice syntax and to deprecate our current one. That said, I believe that @Araq has his reasons for the way the current slices work and we should wait until he explains them here (I think he explained them on IRC but I wasn't really paying close attention). Isn't the |
Since it seems like Araq is inclined to remove special meaning of negative indices, I can only remind about the options that can ease the transition from them. Add a special syntax inside square brackets that expands to length of the sequence/whatever in question. With just removing negative indices If negative indices are no longer a problem, the syntax |
My background is ruby, and I really like the negative indexes. |
Something @nothings pointed out is that the behavior [0.. <n] implies is [0..n), but the actual behavior is [0..n-1]. This shows in a mathematical way exactly how our half-inclusive emulation isn't the same, regardless of the "negative index" aspect. From this perspective, a direct way to attack the issue would be to change <'s meaning when placed on the right hand of the range such that it enables a different mode with [0..n) behavior. Then in our error cases, n counts to 0 and never wraps around, and all is well. However, the effectiveness of this is muddied since < is useful elsewhere as a "-1" operator. |
@BlaXpirit "With just removing negative indices someSeq[-2] will need to be replaced by someSeq[someSeq.len-2], and I suggest some syntactic sugar for this. For example, someSeq[^-2], as demonstrated here " I agree with this. But wouldn't $ be more familiar, as an indication of the end, like in regular expressions? @reactormonk "My background is ruby, and I really like the negative indexes." Me too, they're intuitive and makes the code small and neat. But it's not very well thought out, it's strange that Nim shuns unsigned integers so much, but then have this blemish that causes similar issues, probably much more often. Adding one extra small character makes the feature logical and less bug-prone, and it's not that much more syntax. |
@skyfex I thought @triplefox |
Why not have one use |
@oderwat But can needless duplication be avoided for every interface which uses slices? |
this would make a lot of sense to me: "x[0 .. -1] -> []" because of what happens when you iterate through "x.low .. x.high" where x is empty:
in this case x.low is 0 and x.high is -1 |
@EXetoC your example shows that it works. But it mixes range with slice. As I said var x: seq[int] = @[]
var count = 0
for i in x.low .. x.high:
count += 1
assert(count == 0)
template `...`[T](low, high: T): Slice =
if high < low:
-high .. 0
else:
low .. high
echo "'", "0123456"[3 .. 5], "'" # '345'
echo "'", "0123456"[3 ... 5], "'" # '345'
echo "'", "0123456"[0 .. -3], "'" # '01234'
echo "'", "0123456"[0 ... -3], "'" # ''
var s = @[0,1,2,3,4,5,6]
write stdout, '\''
for e in s[0 .. 3]:
write stdout, e # '0123'
write stdout, "'\n'"
for e in s[0 ... 3]:
write stdout, e # '0123'
write stdout, "'\n'"
for e in s[0 .. -2]:
write stdout, e # '01245'
write stdout, "'\n'"
for e in s[0 ... -2]:
write stdout, e # ''
write stdout, "'\n" Couldn't this kinda go into the stdlib @Araq / @def- or am I missing something (again)? |
@oderwat I think that |
@oderwat slices are used in both cases regardless, which is why I think that those cases seem very similar EDIT: No, that's wrong. The fact that "for a in b..c" calls the |
I talked with Araq a bit about the solution he decided on as well as the possible alternatives. The solution he wants to implement currently is to use The counting will start at 1, so assuming Here are some more definitions of how var x = "123456"
assert x[0 .. 4] == "12345"
assert x[0 .. ^0] == "" # Araq stated that he is not yet sure about this.
assert x[0 .. ^1] == "123456"
assert x[0 .. ^2] == "12345"
assert x[0 .. ^3] == "1234"
assert x[0 .. ^4] == "123"
assert x[0 .. ^5] == "12"
assert x[0 .. ^6] == "1"
assert x[0 .. ^7] == "" Hopefully this will clear up some confusion. The current support for negative indices will go away. During a transition period negative indices in slices will result in a compile-time error (where possible) or a runtime error otherwise. After the transition period the use of negative indices will result in "". Slicing with positive indices will continue to work as previously. @Araq equates the current slice syntax to the For example: for i in 0 .. 5:
echo i Will iterate 6 times. This exactly why Other solutions we discussed included: Copying Python's This was rejected by @Araq because:
Using
|
Any particular reason |
@Varriount Likely because |
@dom96 didn't most people agree that it was an odd shortcut though? And is it not more common to specify up to the last element (in other words, 'high')? But I think it only matters if ^ can be used without an argument, otherwise it doesn't matter much whether one needs to do x[0 .. ^0] or x[0 .. ^1], in the case where ^ is short for x.high and x.len respectively. |
Araq voiced the plans about this: http://irclogs.nim-lang.org/26-03-2015.html Negative indices will give a warning, and will be disabled later. New syntax will be introduced: inside index square brackets This operation will work only if |
Negative indexing is gone, a[^1] works instead. Be happy. |
When asked to write a function to return first N items of a
seq
, almost everyone would write something like this:@[1,2,3,4,5].first(2) == @[1,2]
and all is great... Except@[1,2,3,4,5].first(0) == @[1,2,3,4,5]
. Can you see why this is? How long would it take you to detect such a bug?To elaborate, what happens is
s[0 .. -1]
, and -1 corresponds to the last item...That's okay. To fix this, what is the first thing that comes to your mind? Maybe this (and I'm not making this up, someone actually did come up with this):
But this would just return the first item when asked for 0 items.
So, the correct version will be...
This issue must be considered every single time you take a slice of a
seq
. Same thing goes forstring
s and probably many other things, but at least there is asubstr
function without this flaw.And this is not something obscure or rare. I ran into this at least twice in the last few days. Last time, when translating a library, it took me 1-2 hours to find:
left[:pos] + left[pos + 1:]
translated toleft[0 .. <pos] & left[pos+1 .. left.high]
.Just think of it, the relative rarity of this happening only makes it so much harder to find. You can only bet this bug is present in a lot of existing code, quite possibly in more code than benefits from the negative indices.
It is impossible to solve this while keeping backwards compatibility. My suggestion is to remove negative indices in slices, and possibly introduce an explicit syntax for them, maybe something like this.
And for now, this is a must-have function in my toolbelt...
The text was updated successfully, but these errors were encountered: