Negative indices with inclusive intervals are dangerous #1979

oprypin · 2015-01-17T08:44:35Z

When asked to write a function to return first N items of a seq, almost everyone would write something like this:

proc first[T](s: seq[T]; n: int): seq[T] =
  s[0 .. <n]

@[1,2,3,4,5].first(2) == @[1,2] and all is great... Except @[1,2,3,4,5].first(0) == @[1,2,3,4,5]. Can you see why this is? How long would it take you to detect such a bug?

To elaborate, what happens is s[0 .. -1], and -1 corresponds to the last item...

That's okay. To fix this, what is the first thing that comes to your mind? Maybe this (and I'm not making this up, someone actually did come up with this):

  s[0 .. max(0, n-1)]

But this would just return the first item when asked for 0 items.

So, the correct version will be...

proc first[T](s: seq[T]; n: int): seq[T] =
  if n > 0: s[0 .. <n]
  else: @[]

This issue must be considered every single time you take a slice of a seq. Same thing goes for strings and probably many other things, but at least there is a substr function without this flaw.

And this is not something obscure or rare. I ran into this at least twice in the last few days. Last time, when translating a library, it took me 1-2 hours to find:
left[:pos] + left[pos + 1:] translated to left[0 .. <pos] & left[pos+1 .. left.high].

Just think of it, the relative rarity of this happening only makes it so much harder to find. You can only bet this bug is present in a lot of existing code, quite possibly in more code than benefits from the negative indices.

It is impossible to solve this while keeping backwards compatibility. My suggestion is to remove negative indices in slices, and possibly introduce an explicit syntax for them, maybe something like this.

And for now, this is a must-have function in my toolbelt...

proc sub*[T](s: seq[T], a, b: int): seq[T] =
  ## Items from `a` to `b` non-inclusive
  if a < b: s[a .. <b]
  else: @[]

The text was updated successfully, but these errors were encountered:

triplefox · 2015-01-17T10:53:53Z

Our existing behavior optimizes towards the case where we use .. high or .. <len, in which case the bug doesn't exist because at len 0 the complete sequence is also the empty sequence.

The bug category is most prominent when we are working in 0.. <n where n is some iterated variable. Users of slice will expect their range to count to some positive index on the right hand side, resulting in output which grows smaller as n grows smaller. And in cases where the range is 0.. <n and n goes to 1, the assumption holds. It explodes specifically when n continues downwards to 0, expecting the empty sequence, and instead gets the complete sequence. That is tricky behavior.

So the problem is not specifically with negative indexes, because Python has those, but with the ends of the range and how they interact with our assumptions surrounding 0 and the empty sequence.

We can see that in these examples I can make exact equivalents for each:

print("abc")[0:3] # abc
print("abc")[0:-1] # ab
print("abc")[0:1] # a
print("abc")[1:3] # bc
print("abc")[1:-1] # b
print("abc")[1:1] #

echo "abc"[0.. 2] # abc
echo "abc"[0.. -2] # ab
echo "abc"[0.. 0] # a
echo "abc"[1.. 2] # bc
echo "abc"[1.. -2] # b
echo "abc"[1.. 0] #

But if I try to write Nim's "abc"[0.. -1] # "abc" I do not have an equivalent expressed numerically in Python. I am forced to write "abc"[0:-1] # "ab" to get anything remotely similar, and yet this is not something that happens when an iterated variable counts down to 0. The problem of counting from zero and subtracting one leading to a bad negative index is eliminated by eliminating the single combination of indices that will cause that issue.

As I suggested in IRC, the patch that would be most pleasing to new users(if it's possible) would be to build in the exact Python syntax and behavior as an alternative and equivalent operator. I think this is only going to come up over and over when we're porting code from languages using half-inclusive ranges, but if we have the exact same idiom in place, it ceases to be an issue, and cut and paste code will work.

Another solution is to use unsigned integers in our slice iteration idioms, triggering OOB when it overflows ;)

dom96 · 2015-01-17T16:56:47Z

I agree with @triplefox. I think that the best solution is to implement Python's slice syntax and to deprecate our current one. That said, I believe that @Araq has his reasons for the way the current slices work and we should wait until he explains them here (I think he explained them on IRC but I wasn't really paying close attention).

Isn't the "abc"[0.. -1] # "abc" equivalent in Python `"abc"[0:] # "ab"``?

oprypin · 2015-01-18T02:27:05Z

Since it seems like Araq is inclined to remove special meaning of negative indices, I can only remind about the options that can ease the transition from them.

Add a special syntax inside square brackets that expands to length of the sequence/whatever in question.

With just removing negative indices someSeq[-2] will need to be replaced by someSeq[someSeq.len-2], and I suggest some syntactic sugar for this. For example, someSeq[^-2], as demonstrated here (but it's probably best to add it directly to the language, and not use these weird templates).

If negative indices are no longer a problem, the syntax s[0 .. <n] will work just fine, including s[0 .. -1], so I don't think anything else needs to be added in regard to this.

reactormonk · 2015-01-18T06:48:12Z

My background is ruby, and I really like the negative indexes.

triplefox · 2015-01-18T10:17:11Z

Something @nothings pointed out is that the behavior [0.. <n] implies is [0..n), but the actual behavior is [0..n-1]. This shows in a mathematical way exactly how our half-inclusive emulation isn't the same, regardless of the "negative index" aspect. From this perspective, a direct way to attack the issue would be to change <'s meaning when placed on the right hand of the range such that it enables a different mode with [0..n) behavior. Then in our error cases, n counts to 0 and never wraps around, and all is well.

However, the effectiveness of this is muddied since < is useful elsewhere as a "-1" operator.

skyfex · 2015-01-18T20:58:45Z

@BlaXpirit "With just removing negative indices someSeq[-2] will need to be replaced by someSeq[someSeq.len-2], and I suggest some syntactic sugar for this. For example, someSeq[^-2], as demonstrated here "

I agree with this. But wouldn't $ be more familiar, as an indication of the end, like in regular expressions?

@reactormonk "My background is ruby, and I really like the negative indexes."

Me too, they're intuitive and makes the code small and neat. But it's not very well thought out, it's strange that Nim shuns unsigned integers so much, but then have this blemish that causes similar issues, probably much more often. Adding one extra small character makes the feature logical and less bug-prone, and it's not that much more syntax.

refi64 · 2015-01-18T21:18:11Z

@skyfex I thought $ was already used as a stringification operator?

@triplefox 'abc'[0:None] == 'abc'

oderwat · 2015-02-20T12:43:16Z

Why not have one use ... (counting like range / slice) and the other .. (leave it as is)?

ghost · 2015-03-14T16:21:47Z

@oderwat But can needless duplication be avoided for every interface which uses slices?

ghost · 2015-03-14T17:35:50Z

this would make a lot of sense to me: "x[0 .. -1] -> []" because of what happens when you iterate through "x.low .. x.high" where x is empty:

var x: seq[int] = @[]

var count = 0
for i in x.low .. x.high:
  count += 1

assert(count == 0)

in this case x.low is 0 and x.high is -1

oderwat · 2015-03-14T18:36:47Z

@EXetoC your example shows that it works. But it mixes range with slice. As I said ... imho would be the best way to introduce another behavior for this.

var x: seq[int] = @[]

var count = 0

for i in x.low .. x.high:
  count += 1

assert(count == 0)

template `...`[T](low, high: T): Slice =
  if high < low:
    -high .. 0
  else:
    low .. high

echo "'", "0123456"[3 .. 5], "'" # '345'
echo "'", "0123456"[3 ... 5], "'" # '345'

echo "'", "0123456"[0 .. -3], "'" # '01234'
echo "'", "0123456"[0 ... -3], "'" # ''


var s = @[0,1,2,3,4,5,6]


write stdout,  '\''

for e in s[0 .. 3]:
  write stdout,  e # '0123'

write stdout,  "'\n'"

for e in s[0 ... 3]:
  write stdout,  e # '0123'

write stdout,  "'\n'"

for e in s[0 .. -2]:
  write stdout,  e # '01245'

write stdout,  "'\n'"

for e in s[0 ... -2]:
  write stdout,  e # ''

write stdout,  "'\n"

Couldn't this kinda go into the stdlib @Araq / @def- or am I missing something (again)?

dom96 · 2015-03-14T18:47:26Z

@oderwat I think that ... should behave exactly in the same manner as Python's [:].

ghost · 2015-03-14T20:48:44Z

@oderwat slices are used in both cases regardless, which is why I think that those cases seem very similar

EDIT: No, that's wrong. The fact that "for a in b..c" calls the .. iterator rather than creating a slice and then invoking 'items' just seems like an odd implementation detail though.

dom96 · 2015-03-14T22:34:44Z

I talked with Araq a bit about the solution he decided on as well as the possible alternatives.

The solution he wants to implement currently is to use ^ to mean "counting from the end of the string" (where string can and will mean any collection type).

The counting will start at 1, so assuming let x = "123456" the following will hold true assert x[^1] == '6'.

Here are some more definitions of how ^ will work:

var x = "123456"
assert x[0 .. 4] == "12345"

assert x[0 .. ^0] == "" # Araq stated that he is not yet sure about this.
assert x[0 .. ^1] == "123456"
assert x[0 .. ^2] == "12345"
assert x[0 .. ^3] == "1234"
assert x[0 .. ^4] == "123"
assert x[0 .. ^5] == "12"
assert x[0 .. ^6] == "1"
assert x[0 .. ^7] == ""

Hopefully this will clear up some confusion.

The current support for negative indices will go away. During a transition period negative indices in slices will result in a compile-time error (where possible) or a runtime error otherwise. After the transition period the use of negative indices will result in "".

Slicing with positive indices will continue to work as previously. @Araq equates the current slice syntax to the .. used in for loops.

For example:

for i in 0 .. 5:
  echo i

Will iterate 6 times. This exactly why "123456789"[0 .. 5] also yields "123456" (6 characters).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Negative indices with inclusive intervals are dangerous #1979

Negative indices with inclusive intervals are dangerous #1979

oprypin commented Jan 17, 2015

triplefox commented Jan 17, 2015

dom96 commented Jan 17, 2015

oprypin commented Jan 18, 2015

reactormonk commented Jan 18, 2015

triplefox commented Jan 18, 2015

skyfex commented Jan 18, 2015

refi64 commented Jan 18, 2015

oderwat commented Feb 20, 2015

ghost commented Mar 14, 2015

ghost commented Mar 14, 2015

oderwat commented Mar 14, 2015

dom96 commented Mar 14, 2015

ghost commented Mar 14, 2015

dom96 commented Mar 14, 2015

Varriount commented Mar 15, 2015

dom96 commented Mar 15, 2015

ghost commented Mar 15, 2015

oprypin commented Mar 26, 2015

Araq commented Mar 27, 2015

Negative indices with inclusive intervals are dangerous #1979

Negative indices with inclusive intervals are dangerous #1979

Comments

oprypin commented Jan 17, 2015

triplefox commented Jan 17, 2015

dom96 commented Jan 17, 2015

oprypin commented Jan 18, 2015

reactormonk commented Jan 18, 2015

triplefox commented Jan 18, 2015

skyfex commented Jan 18, 2015

refi64 commented Jan 18, 2015

oderwat commented Feb 20, 2015

ghost commented Mar 14, 2015

ghost commented Mar 14, 2015

oderwat commented Mar 14, 2015

dom96 commented Mar 14, 2015

ghost commented Mar 14, 2015

dom96 commented Mar 14, 2015

Varriount commented Mar 15, 2015

dom96 commented Mar 15, 2015

ghost commented Mar 15, 2015

oprypin commented Mar 26, 2015

Araq commented Mar 27, 2015