Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative indices with inclusive intervals are dangerous #1979

Closed
oprypin opened this issue Jan 17, 2015 · 19 comments
Closed

Negative indices with inclusive intervals are dangerous #1979

oprypin opened this issue Jan 17, 2015 · 19 comments

Comments

@oprypin
Copy link
Contributor

oprypin commented Jan 17, 2015

When asked to write a function to return first N items of a seq, almost everyone would write something like this:

proc first[T](s: seq[T]; n: int): seq[T] =
  s[0 .. <n]

@[1,2,3,4,5].first(2) == @[1,2] and all is great... Except @[1,2,3,4,5].first(0) == @[1,2,3,4,5]. Can you see why this is? How long would it take you to detect such a bug?

To elaborate, what happens is s[0 .. -1], and -1 corresponds to the last item...

That's okay. To fix this, what is the first thing that comes to your mind? Maybe this (and I'm not making this up, someone actually did come up with this):

  s[0 .. max(0, n-1)]

But this would just return the first item when asked for 0 items.

So, the correct version will be...

proc first[T](s: seq[T]; n: int): seq[T] =
  if n > 0: s[0 .. <n]
  else: @[]

This issue must be considered every single time you take a slice of a seq. Same thing goes for strings and probably many other things, but at least there is a substr function without this flaw.

And this is not something obscure or rare. I ran into this at least twice in the last few days. Last time, when translating a library, it took me 1-2 hours to find:
left[:pos] + left[pos + 1:] translated to left[0 .. <pos] & left[pos+1 .. left.high].

Just think of it, the relative rarity of this happening only makes it so much harder to find. You can only bet this bug is present in a lot of existing code, quite possibly in more code than benefits from the negative indices.

It is impossible to solve this while keeping backwards compatibility. My suggestion is to remove negative indices in slices, and possibly introduce an explicit syntax for them, maybe something like this.

And for now, this is a must-have function in my toolbelt...

proc sub*[T](s: seq[T], a, b: int): seq[T] =
  ## Items from `a` to `b` non-inclusive
  if a < b: s[a .. <b]
  else: @[]
@triplefox
Copy link

Our existing behavior optimizes towards the case where we use .. high or .. <len, in which case the bug doesn't exist because at len 0 the complete sequence is also the empty sequence.

The bug category is most prominent when we are working in 0.. <n where n is some iterated variable. Users of slice will expect their range to count to some positive index on the right hand side, resulting in output which grows smaller as n grows smaller. And in cases where the range is 0.. <n and n goes to 1, the assumption holds. It explodes specifically when n continues downwards to 0, expecting the empty sequence, and instead gets the complete sequence. That is tricky behavior.

So the problem is not specifically with negative indexes, because Python has those, but with the ends of the range and how they interact with our assumptions surrounding 0 and the empty sequence.

We can see that in these examples I can make exact equivalents for each:

print("abc")[0:3] # abc
print("abc")[0:-1] # ab
print("abc")[0:1] # a
print("abc")[1:3] # bc
print("abc")[1:-1] # b
print("abc")[1:1] #
echo "abc"[0.. 2] # abc
echo "abc"[0.. -2] # ab
echo "abc"[0.. 0] # a
echo "abc"[1.. 2] # bc
echo "abc"[1.. -2] # b
echo "abc"[1.. 0] #

But if I try to write Nim's "abc"[0.. -1] # "abc" I do not have an equivalent expressed numerically in Python. I am forced to write "abc"[0:-1] # "ab" to get anything remotely similar, and yet this is not something that happens when an iterated variable counts down to 0. The problem of counting from zero and subtracting one leading to a bad negative index is eliminated by eliminating the single combination of indices that will cause that issue.

As I suggested in IRC, the patch that would be most pleasing to new users(if it's possible) would be to build in the exact Python syntax and behavior as an alternative and equivalent operator. I think this is only going to come up over and over when we're porting code from languages using half-inclusive ranges, but if we have the exact same idiom in place, it ceases to be an issue, and cut and paste code will work.

Another solution is to use unsigned integers in our slice iteration idioms, triggering OOB when it overflows ;)

@dom96
Copy link
Contributor

dom96 commented Jan 17, 2015

I agree with @triplefox. I think that the best solution is to implement Python's slice syntax and to deprecate our current one. That said, I believe that @Araq has his reasons for the way the current slices work and we should wait until he explains them here (I think he explained them on IRC but I wasn't really paying close attention).

Isn't the "abc"[0.. -1] # "abc" equivalent in Python `"abc"[0:] # "ab"``?

@oprypin
Copy link
Contributor Author

oprypin commented Jan 18, 2015

Since it seems like Araq is inclined to remove special meaning of negative indices, I can only remind about the options that can ease the transition from them.

Add a special syntax inside square brackets that expands to length of the sequence/whatever in question.

With just removing negative indices someSeq[-2] will need to be replaced by someSeq[someSeq.len-2], and I suggest some syntactic sugar for this. For example, someSeq[^-2], as demonstrated here (but it's probably best to add it directly to the language, and not use these weird templates).

If negative indices are no longer a problem, the syntax s[0 .. <n] will work just fine, including s[0 .. -1], so I don't think anything else needs to be added in regard to this.

@reactormonk
Copy link
Contributor

My background is ruby, and I really like the negative indexes.

@triplefox
Copy link

Something @nothings pointed out is that the behavior [0.. <n] implies is [0..n), but the actual behavior is [0..n-1]. This shows in a mathematical way exactly how our half-inclusive emulation isn't the same, regardless of the "negative index" aspect. From this perspective, a direct way to attack the issue would be to change <'s meaning when placed on the right hand of the range such that it enables a different mode with [0..n) behavior. Then in our error cases, n counts to 0 and never wraps around, and all is well.

However, the effectiveness of this is muddied since < is useful elsewhere as a "-1" operator.

@skyfex
Copy link
Contributor

skyfex commented Jan 18, 2015

@BlaXpirit "With just removing negative indices someSeq[-2] will need to be replaced by someSeq[someSeq.len-2], and I suggest some syntactic sugar for this. For example, someSeq[^-2], as demonstrated here "

I agree with this. But wouldn't $ be more familiar, as an indication of the end, like in regular expressions?

@reactormonk "My background is ruby, and I really like the negative indexes."

Me too, they're intuitive and makes the code small and neat. But it's not very well thought out, it's strange that Nim shuns unsigned integers so much, but then have this blemish that causes similar issues, probably much more often. Adding one extra small character makes the feature logical and less bug-prone, and it's not that much more syntax.

@refi64
Copy link
Contributor

refi64 commented Jan 18, 2015

@skyfex I thought $ was already used as a stringification operator?

@triplefox 'abc'[0:None] == 'abc'

@oderwat
Copy link
Contributor

oderwat commented Feb 20, 2015

Why not have one use ... (counting like range / slice) and the other .. (leave it as is)?

@ghost
Copy link

ghost commented Mar 14, 2015

@oderwat But can needless duplication be avoided for every interface which uses slices?

@ghost
Copy link

ghost commented Mar 14, 2015

this would make a lot of sense to me: "x[0 .. -1] -> []" because of what happens when you iterate through "x.low .. x.high" where x is empty:

var x: seq[int] = @[]

var count = 0
for i in x.low .. x.high:
  count += 1

assert(count == 0)

in this case x.low is 0 and x.high is -1

@oderwat
Copy link
Contributor

oderwat commented Mar 14, 2015

@EXetoC your example shows that it works. But it mixes range with slice. As I said ... imho would be the best way to introduce another behavior for this.

var x: seq[int] = @[]

var count = 0

for i in x.low .. x.high:
  count += 1

assert(count == 0)

template `...`[T](low, high: T): Slice =
  if high < low:
    -high .. 0
  else:
    low .. high

echo "'", "0123456"[3 .. 5], "'" # '345'
echo "'", "0123456"[3 ... 5], "'" # '345'

echo "'", "0123456"[0 .. -3], "'" # '01234'
echo "'", "0123456"[0 ... -3], "'" # ''


var s = @[0,1,2,3,4,5,6]


write stdout,  '\''

for e in s[0 .. 3]:
  write stdout,  e # '0123'

write stdout,  "'\n'"

for e in s[0 ... 3]:
  write stdout,  e # '0123'

write stdout,  "'\n'"

for e in s[0 .. -2]:
  write stdout,  e # '01245'

write stdout,  "'\n'"

for e in s[0 ... -2]:
  write stdout,  e # ''

write stdout,  "'\n"

Couldn't this kinda go into the stdlib @Araq / @def- or am I missing something (again)?

@dom96
Copy link
Contributor

dom96 commented Mar 14, 2015

@oderwat I think that ... should behave exactly in the same manner as Python's [:].

@ghost
Copy link

ghost commented Mar 14, 2015

@oderwat slices are used in both cases regardless, which is why I think that those cases seem very similar

EDIT: No, that's wrong. The fact that "for a in b..c" calls the .. iterator rather than creating a slice and then invoking 'items' just seems like an odd implementation detail though.

@dom96
Copy link
Contributor

dom96 commented Mar 14, 2015

I talked with Araq a bit about the solution he decided on as well as the possible alternatives.

The solution he wants to implement currently is to use ^ to mean "counting from the end of the string" (where string can and will mean any collection type).

The counting will start at 1, so assuming let x = "123456" the following will hold true assert x[^1] == '6'.

Here are some more definitions of how ^ will work:

var x = "123456"
assert x[0 .. 4] == "12345"

assert x[0 .. ^0] == "" # Araq stated that he is not yet sure about this.
assert x[0 .. ^1] == "123456"
assert x[0 .. ^2] == "12345"
assert x[0 .. ^3] == "1234"
assert x[0 .. ^4] == "123"
assert x[0 .. ^5] == "12"
assert x[0 .. ^6] == "1"
assert x[0 .. ^7] == ""

Hopefully this will clear up some confusion.


The current support for negative indices will go away. During a transition period negative indices in slices will result in a compile-time error (where possible) or a runtime error otherwise. After the transition period the use of negative indices will result in "".

Slicing with positive indices will continue to work as previously. @Araq equates the current slice syntax to the .. used in for loops.

For example:

for i in 0 .. 5:
  echo i

Will iterate 6 times. This exactly why "123456789"[0 .. 5] also yields "123456" (6 characters).


Other solutions we discussed included:

Copying Python's :

This was rejected by @Araq because:

  • : is used in Nim in too many contexts already.
  • x[-1] will still not be possible and so will not match Python.

Using x{-1} for negative indices

  • @Araq prefers to leave this operator for libraries to decide what they should do with it.

@Varriount
Copy link
Contributor

Any particular reason ^1 signifies the end of the string, rather than ^0 ?

@dom96
Copy link
Contributor

dom96 commented Mar 15, 2015

@Varriount Likely because -1 currently signifies the end of the string.

@ghost
Copy link

ghost commented Mar 15, 2015

@dom96 didn't most people agree that it was an odd shortcut though? And is it not more common to specify up to the last element (in other words, 'high')? But I think it only matters if ^ can be used without an argument, otherwise it doesn't matter much whether one needs to do x[0 .. ^0] or x[0 .. ^1], in the case where ^ is short for x.high and x.len respectively.

@oprypin
Copy link
Contributor Author

oprypin commented Mar 26, 2015

Araq voiced the plans about this: http://irclogs.nim-lang.org/26-03-2015.html

Negative indices will give a warning, and will be disabled later.

New syntax will be introduced: inside index square brackets ^ will have a special meaning.
arr[^n] will be rewritten into arr[arr.len - n], so arr[^1] is the last item and arr[^5] is the fifth item from the right.

This operation will work only if len == high-1 (or low == 0), to avoid accidents with array[a..b] etc.

@Araq
Copy link
Member

Araq commented Mar 27, 2015

Negative indexing is gone, a[^1] works instead. Be happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants