
Array#shift is slow #5126

Closed
benoist opened this issue Oct 15, 2017 · 59 comments · Fixed by #10081

Comments

@benoist
Contributor

benoist commented Oct 15, 2017

I was using Array#shift a lot, but found that it's quite slow compared to an index lookup.

This is the current implementation

  def shift
    if @size == 0
      yield
    else
      value = @buffer[0]
      @size -= 1
      @buffer.move_from(@buffer + 1, @size)
      (@buffer + @size).clear
      value
    end
  end

If I change this function by

  def shift
    if @size == 0
      yield
    else
      value   = @buffer[0]
      @size   -= 1
      @buffer += 1
      value
    end
  end

It is a lot faster
Shifting Array(String).new(100_000, "a") shows the following speed improvement

                 user     system      total        real
shift        1.600000   0.000000   1.600000 (  1.602299)
each         0.000000   0.000000   0.000000 (  0.000487)
fast shift   0.000000   0.000000   0.000000 (  0.000530)

The question is: can it really be implemented like this, or is there an important reason the buffer move and clear are required for this operation? I've made the change on the current master branch and all tests pass.

@benoist benoist changed the title Array.shift is slow Array#shift is slow Oct 15, 2017
@jhass
Member

jhass commented Oct 15, 2017

Well, doesn't your implementation leak memory? That is, the array will never shrink again?

However, I agree we should only move after some threshold: that is, move if there's X free space in front of the buffer, and also check that when pushing.

In any case, check whether Deque doesn't fit your usage pattern better.
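For context, here is a minimal Ruby sketch of the ring-buffer idea behind a deque (the class name RingDeque and the fixed capacity are invented for illustration; Crystal's actual Deque grows dynamically): shift only advances a head index, so it is O(1).

```ruby
# Illustrative fixed-capacity ring buffer: shift advances a head index
# instead of moving every remaining element.
class RingDeque
  attr_reader :size

  def initialize(capacity)
    @buf = Array.new(capacity)
    @head = 0       # index of the first live element
    @size = 0
    @capacity = capacity
  end

  def push(value)
    raise "full" if @size == @capacity
    @buf[(@head + @size) % @capacity] = value  # wrap past the end
    @size += 1
    self
  end

  def shift
    raise "empty" if @size == 0
    value = @buf[@head]
    @buf[@head] = nil                # drop the reference
    @head = (@head + 1) % @capacity  # O(1): just advance the head
    @size -= 1
    value
  end
end
```

Both push and shift touch a single slot and do constant arithmetic, which is why Deque is the natural fit for shift-heavy workloads.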

@asterite
Member

Duplicate of #573

@asterite asterite marked this as a duplicate of #573 Oct 15, 2017
@asterite
Member

We should probably document that Array#shift is O(n)

@benoist
Contributor Author

benoist commented Oct 15, 2017

Well Deque has the fast shift implementation, but I need other functions from array first before shifting.

@asterite
Member

@benoist As to why your implementation doesn't work: later, when you need to grow the array, you've lost the reference to the original pointer to invoke realloc on.

@asterite
Member

Alternatively we could store an offset inside the array, but maybe that's a big penalty for just this use case.
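A hedged sketch of that offset idea in Ruby (OffsetArray, compact!, the moves counter, and the compaction threshold are all invented for illustration, not Crystal's implementation): shift bumps an offset, and the O(n) move is only paid when the dead space in front outgrows the live data.

```ruby
# Keep an offset into the backing buffer; shift is O(1) in the common
# case, and compaction amortizes the occasional O(n) move.
class OffsetArray
  attr_reader :moves  # counts compactions, for inspection only

  def initialize(elements)
    @buf = elements.dup
    @offset = 0
    @moves = 0
  end

  def size
    @buf.size - @offset
  end

  def shift
    return nil if size == 0
    value = @buf[@offset]
    @offset += 1                 # O(1): just grow the front gap
    compact! if @offset > size   # dead space exceeds live data: move
    value
  end

  private

  def compact!
    @buf = @buf[@offset..-1]     # slide live elements back to the front
    @offset = 0
    @moves += 1
  end
end
```

Draining n elements this way performs only a handful of compactions rather than n element moves per shift.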

@benoist
Contributor Author

benoist commented Oct 15, 2017

Converting to a Deque before shifting has some overhead, of course, but it's already a 160x speedup.

                       user     system      total        real
shift              1.600000   0.000000   1.600000 (  1.606989)
each               0.000000   0.000000   0.000000 (  0.000606)
fast shift         0.000000   0.000000   0.000000 (  0.000649)
convert to deque   0.010000   0.000000   0.010000 (  0.001799)

@asterite
Member

In Ruby they do increase the base pointer... I don't know how they can later reallocate. It's worth investigating how they do it; it might be worth doing the same in our case.

@benoist
Contributor Author

benoist commented Oct 15, 2017

If I'm reading the Ruby implementation correctly, it seems they hold off the memmove until inserts, and they double the capacity upon insert. That would give you the move penalty only once, when you actually need to reallocate. If the array is shifted until empty, it would never need the reallocation. We just need to make sure it doesn't leak memory, as @jhass pointed out.

@asterite
Member

But how do they retain the original pointer and the current pointer?

@benoist
Contributor Author

benoist commented Oct 15, 2017

If the array is not shared when shift is called, it will call the ary_make_shared function. I think this makes a copy of the original pointer and freezes it. This pointer is used in ary_modify to determine the shift.

https://github.com/ruby/ruby/blob/trunk/array.c#L590

https://github.com/ruby/ruby/blob/trunk/array.c#L322

@RX14
Contributor

RX14 commented Oct 15, 2017

Well, what functions do you need that Deque doesn't have? I would have thought that all Array functions could work on Deque once implemented. If Deque is the right data structure to use here, then you should use it.

Another option is reverse! then pop, to keep using just Array.

@benoist
Contributor Author

benoist commented Oct 15, 2017

I'm currently using uniq and sort from Array, which might be a lot slower on a Deque, as those functions are not just the pushes and shifts that Deque is optimized for. That's just an assumption though; I might be wrong here.

@RX14
Contributor

RX14 commented Oct 15, 2017

Have you tried it? Benchmarked it? Don't work on assumptions.

You can always convert the array into a deque rather easily.

Or use reverse! and pop to get O(1) element removal from the end in Array. It just matters which end you remove from.
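The reverse!-then-pop workaround, shown here in Ruby (Array#pop is O(1) at the tail in both languages): one O(n) reversal up front replaces n O(n) shifts.

```ruby
# Drain a queue front-to-back without ever calling shift: reverse once,
# then pop from the tail, which is constant time per element.
queue = (1..5).to_a

queue.reverse!                           # one O(n) pass up front
drained = []
drained << queue.pop until queue.empty?  # each pop is O(1)

drained  # elements come out in the original front-to-back order
```

Total work is O(n) for the whole drain instead of O(n^2) with repeated shift.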

@oprypin
Member

oprypin commented Oct 15, 2017

There's a bit of a problem: Deque#sort! does not exist, because the sorting implementation is hardcoded in Array.

@RX14
Contributor

RX14 commented Oct 15, 2017

@oprypin which is exactly why I asked @benoist what functions he needed from array that were not on deque...

@benoist
Contributor Author

benoist commented Oct 15, 2017

@RX14 Yes, I know I can convert it to a Deque or use reverse and pop, but that's not really the issue. If we can make Array#shift a lot faster, isn't that worth investigating? It's not very intuitive to need all these workarounds because Array#shift is slow. The reason I'm making assumptions for the other parts now is that putting the effort into testing and verifying still leaves this problem unsolved.

If the general conclusion is that Array#shift is as fast as it's going to get for now, then I have no problem with this issue being closed. :-)

@oprypin
Member

oprypin commented Oct 15, 2017

It's not very intuitive to do all these workarounds because Array#shift is slow.

Yeah, exactly. The non-workaround is Deque.

@benoist
Contributor Author

benoist commented Oct 15, 2017

I would still consider that a workaround, but if that's just me, I can live with that :-)

@RX14
Contributor

RX14 commented Oct 15, 2017

@benoist Using the correct data structure for the job, with the correct algorithmic complexity, is a workaround?

@benoist
Contributor Author

benoist commented Oct 15, 2017

If Deque is the only data structure that should do shifts, then shift should be removed from Array. But I don't think that should be the case. If the overall operations on the same data are faster within one data structure, then it makes no sense to change. I think Array#shift can be made faster.

start = Time.now
a = Array.new(100_000, "a")
a.size.times do
  a.shift
end
puts Time.now - start
Crystal: 1.6069080
Ruby: 0.006131

@lbguilherme
Contributor

lbguilherme commented Oct 15, 2017

Ruby implements an optionally shared buffer for Array. Some Array instances own the memory they use, and will themselves reallocate and free it when needed. But some Array instances are shared, in which case they keep a reference to an external reference-counted buffer and maintain an offset into it. A shared Array will never touch the buffer unless it is the only array referencing it. This also gives copy-on-write semantics: when a shared Array tries to modify some data, it first allocates its own buffer and copies everything into it. This is all done without any external impact, so the user of an Array cannot tell the difference, except by timing.

Here is some proof of it, taking @benoist's sample:

start = Time.now
a = Array.new(100_000, "a")
a.size.times do
  a[0] = "b"  # Modifying the array will force the array to own its own memory.
  a.shift
end
puts Time.now - start

Running on my computer: (I did not run crystal with release optimizations!)

Ruby without changing array:     0.007283187
Ruby changing array:             3.364405281
Crystal without changing array:  2.3464860
Crystal changing array:          2.3259410

This kind of optimization is nice because it makes some things faster, and you only pay the costs when you actually need to. But it also brings inexplicable slowdowns in functions that shouldn't ever be slow. Who could tell that a[0] = "b" would take time proportional to the length of the array?


Refs:
rb_ary_shift calls ary_make_shared if the Array is not yet shared.
Everything that may modify the array calls rb_ary_modify, which will make shared Arrays unshared (by allocating memory and copying/moving).
It all happens here: https://github.com/ruby/ruby/blob/trunk/array.c

@benoist
Contributor Author

benoist commented Oct 15, 2017

@lbguilherme thank you for this explanation! Just to be sure: what would be your suggestion for the current Array#shift implementation, leave it as is or change it into something similar to Ruby's?

@lbguilherme
Contributor

I'm not sure in which direction Crystal wants to go here. We could:

  1. Leave it as it is now. shift pops from the front of a container, and it is slow. Document it, so that the burden is on the user to adapt to another structure (like copying the data into a Deque first) or to pay the cost if they know the array is small. The nice thing about this is that each operation has a clear cost that can be explained in documentation.

  2. Apply extensive optimizations in all sorts of places. This would have an impact on how data is stored and on pretty much all functions. The end result would be that everything is faster for the average user, but the code becomes more complicated to maintain and the behavior less predictable. Why did writing to a single byte of memory cause my program to slow down 462 times? This is hard to explain.

Still on point 2, there are simpler optimizations, like keeping a buffer and an offset and only reallocating when they differ too much. This would have much less impact than shared buffers, but still.

Just to be clear, there are possible optimizations for many other functions as well, not just shift. Taking a slice could benefit too, for example.

I don't dislike either solution. Maybe a speed-focused container could come as a shard, so not everybody would have to fear unpredictable performance. Or maybe it should be in the standard Array itself, so that everybody can benefit from the performance. I particularly like optimizing for the average user, even with surprising behavior.

I would like to hear from @asterite on this.

@asterite
Member

From a user perspective, I always liked how Array, Hash and String are super generally optimized data structures in Ruby. They can share data with other instances, they are mutable and can be made immutable, they have generally fast operations, etc. Of course that comes at the cost of implementing all of that. Maybe in Ruby it makes more sense because it's a dynamic language and implementing other data structures is inefficient unless done in C. In most (all?) compiled languages you have different data structures; in Java, for example, you have ArrayList, LinkedList, Deque, etc. That's nice, but it's more cumbersome for the user, because she has to pick a data structure.

So... I don't know. If you need to shift a lot, maybe just use another data structure? If sort! is missing in Deque, maybe we should implement that? Maybe you can just use reverse! and then pop afterwards? It's hard to know without knowing what problem you are trying to solve, @benoist

Maybe adding an offset to Array is acceptable, I don't know. Maybe String could also share memory with parent strings to form views. I think all of that falls in the field of optimization, and right now that's not important, because it can always be done later without changing the external API.

@RX14
Contributor

RX14 commented Oct 16, 2017

This discussion should happen later once optimizations like this come to the top of our agenda, instead of simply features and stability. At that time we'll have a lot more data on array performance in practice.

@benoist
Contributor Author

benoist commented Oct 16, 2017

@asterite I've written a simple column-storage database with encoding, and to convert the columns back into rows I was shifting the values to form the rows. I'm using iterators because the column storage is not always aligned to values that belong to the same row, due to compression.

I'm keeping an offset now and iterating through the array instead of shifting. This allows me to use the sort function without an extra conversion to Deque.

@ghost

ghost commented Oct 16, 2017

What I find interesting is that I can read more about how ruby implements arrays here on crystal, than I can find on the ruby issue tracker or what not. Sorry for the distraction. ;-)

@funny-falcon
Contributor

@asterite there is no need for a circular buffer. Just move the array's contents to the beginning of the allocation when the end of the allocation is reached. That is actually the way Ruby's Array works; the "shared array" is just an implementation trick.
Crystal can use just a pointer to the beginning of the allocation, the size of the allocation, a pointer to the first element, and the size. In other words, only a pointer to the first element needs to be added (or a pointer to the beginning of the allocation).

@asterite
Member

@funny-falcon Yes, that's a possibility. But if you do shift + push in a loop, doesn't the array grow infinitely? The current pointer would be increased but never decreased.

@funny-falcon
Contributor

If Array is to be used as a Deque, then there should be amortization room, so the move doesn't happen too often.

I made a fix for Ruby's Array exactly for this scenario (i.e. Ruby's Array as a Deque).

@funny-falcon
Contributor

@asterite

But if you do shift + push in a loop doesn't it always grow the array infinitely?

As I've said, when push reaches the bound of the capacity and there is room freed by shifts, the elements are moved to the beginning of the allocation.
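A sketch of that scheme in Ruby (the name FrontGapArray, the fixed capacity, and the moves counter are invented for illustration): shift grows a gap at the front, and push moves the live elements back to index 0 only when it hits the end of the allocation.

```ruby
# Front gap scheme: shift is O(1); the O(n) move is deferred until a
# push actually runs out of room at the end of the allocation.
class FrontGapArray
  attr_reader :moves  # counts deferred moves, for inspection only

  def initialize(capacity)
    @buf = Array.new(capacity)
    @start = 0   # index of the first live element
    @count = 0
    @capacity = capacity
    @moves = 0
  end

  def size
    @count
  end

  def shift
    return nil if @count == 0
    value = @buf[@start]
    @start += 1                  # O(1): just widen the front gap
    @count -= 1
    value
  end

  def push(value)
    if @start + @count == @capacity && @start > 0
      # Hit the end of the allocation, but shifts freed room up front:
      # move the live elements back to index 0 instead of reallocating.
      @count.times { |i| @buf[i] = @buf[@start + i] }
      @start = 0
      @moves += 1
    end
    raise "full" if @start + @count == @capacity
    @buf[@start + @count] = value
    @count += 1
    self
  end
end
```

A shift + push loop therefore pays for one move of the live elements only once per batch of shifts, not on every shift.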

@funny-falcon
Contributor

If I'm not persona non grata here, I can make a PR for this.

@asterite
Member

@funny-falcon Sure! No one is non-grata here.

I still have my doubts about this, though: adding four bytes to all arrays for just one method that's maybe not used all that often. And, for example, in the OP's use case there was really no need for a shift.

@oprypin
Member

oprypin commented Oct 18, 2017

Haven't we already mentioned that it's too early for this kind of optimization? Besides, this is nothing more than a workaround.

@oprypin
Member

oprypin commented Oct 18, 2017

Bleh nevermind, not having to use Deque would be tempting.
Uh nevermind on the nevermind. See my next comment

@RX14
Contributor

RX14 commented Oct 18, 2017

Just a quick question: what downsides does a deque have over an array with a start offset from the allocation start? Apart from the obvious to_unsafe one, which I don't think is a big deal. I wouldn't have thought array accesses would be slowed down much by the tiny bit of extra math.

@funny-falcon
Contributor

@asterite, the single benefit of implementing this is getting rid of Deque and being closer to Ruby.

All other issues could be solved with programmer discipline.

@RX14 the simplest way is to not change the usage of @buffer, just move it on shift (and possibly on unshift). We only need to save a pointer to the allocation, or the number of elements @buffer was shifted from the allocation (i.e. the offset). In the latter case the allocation start is calculated as @buffer - @offset.

@funny-falcon
Contributor

I'd prefer to store a pointer to the allocation.

@oprypin
Member

oprypin commented Oct 18, 2017

First, about the upsides of Deque and the downsides of this suggested approach (which, frankly, has NOT been clearly described so far).
This kind of workaround cannot achieve Deque's asymptotic complexity. Deque guarantees O(1) complexity for a balanced number of push+shift or unshift+pop. This kind of shifted array would instead occasionally (with a constant factor!) incur an O(n) operation. So the improvement is by a constant factor (or, as some would say, no improvement) over the basic Array.
Sure, it improves the situation of many consecutive shift calls (which Deque also covers very well), but why all the complexity just for this case? And now I'm sure that it definitely cannot replace Deque.

The downside of a Deque is, hmm, I don't know... that it probably destroys various kinds of caches and so is unforgivably slower at the typical operations you do with it?

@RX14
Contributor

RX14 commented Oct 18, 2017

I really don't think Deque will be any slower than Array in typical operations. The only downsides I can think of are that the prefetcher will get confused when wrapping, and that branch prediction might fail too. Hopefully LLVM can compile it to a conditional move to avoid stalling the pipeline.

In actual fact, I think 95% of arrays will never wrap. Adding elements to the end is by far the most common array mutation op, so I think most arrays won't ever wrap around.

@oprypin
Member

oprypin commented Oct 18, 2017

In fact, instead of this suggestion, why not actually mix the implementations of Array and Deque? Have a shortcut path for when it's aligned, and re-align it to zero when required (to_unsafe) and before expensive operations (sort!).

@RX14
Contributor

RX14 commented Oct 18, 2017

@oprypin what do you mean by a "shortcut scenario"? That the check offset + i < size will always be true and it'll just act like an array?

@oprypin
Member

oprypin commented Oct 18, 2017

That or just offset == 0

@RX14
Contributor

RX14 commented Oct 18, 2017

@oprypin no, why? That means there are more branches in the non-zero case, and for what gain? Avoiding a single one-cycle addition instruction in a piece of code that is memory-bound, not IPC-bound anyway?

@funny-falcon
Contributor

@oprypin, you are not fully correct about complexity. Yes, an occasional single operation is O(n). But the amortized complexity is still O(1), because the big move happens once every n/c operations, so its amortized cost is O(n / (n/c)) = O(c) = O(1) (IIRC c = 8 in Ruby). I wouldn't be surprised if Array's amortized performance were exactly the same as Deque's for the push+shift pattern.
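The amortized claim can be checked with a small counting simulation in Ruby (moved_per_op and the parameter choices are invented; this counts hypothetical element moves rather than timing real arrays): with slack proportional to the size, the moved-elements-per-operation ratio stays constant no matter how many operations run.

```ruby
# Count how many element copies a "move to front on reaching the slack
# bound" scheme performs per shift. No real buffer; pure bookkeeping.
def moved_per_op(ops, size, slack)
  head = 0
  moved = 0
  ops.times do
    head += 1          # each shift advances the head by one slot
    if head == slack   # hit the amortization bound: compact
      moved += size    # compaction copies all live elements back to 0
      head = 0
    end
  end
  moved.to_f / ops
end

# With slack = size / 8 (c = 8, as recalled above for Ruby), the cost
# settles near size / slack = 8 moves per operation, independent of ops.
```

moved_per_op(100_000, 1024, 128) and moved_per_op(10_000, 1024, 128) both come out near 8.0, matching the O(n / (n/c)) = O(c) argument.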

@akzhan
Contributor

akzhan commented Oct 19, 2017

@funny-falcon what about simple Hash implementation w/o normalization etc.?

Now I think that small PRs will be reviewed quickly.

@akzhan
Contributor

akzhan commented Oct 19, 2017

LBTM, there are no reasons to mix Array with Deque.

@funny-falcon
Contributor

there are no reasons to mix Array with Deque.

If I were designing the language, I'd repeat Ruby's trick, just "because I can".

But yeah, given Deque is already implemented, there is not much point.

@RX14
Contributor

RX14 commented Oct 19, 2017

@akzhan what are the reasons not to? Deque appears to be a superset of Array's functionality, with the cost being implementation complexity.

@akzhan
Contributor

akzhan commented Oct 19, 2017

That's just my point: I rarely use Arrays in a Deque manner.

funny-falcon added a commit to funny-falcon/crystal that referenced this issue Oct 19, 2017
To achieve it, pointer to allocation is preserved, and @buffer is moved
instead of moving elements on `#shift`.

For crystal-lang#5126
@funny-falcon
Contributor

I've created #5148. It now contains the smallest change in this direction.
To match Deque, #insert, #delete_at and #unshift should also be improved.
If you strongly think this is the way to go, then I will work in that direction.
Otherwise, let's close this and #5148.
