Optimize Indexable#sample(n, random) #10247

HertzDevil · 2021-01-13T18:32:50Z

Implements Algorithm L for multiple-element sampling, which reduces the time complexity from O(size) to O(k(1 + log(size / k))), where k is the number of elements sampled (n in the method signature). This requires the ability to skip multiple elements at once, which is possible in Indexable but not in Enumerable.

I personally think we may expose Random#rand_exclusive later. It is required here because the algorithm will overflow if #rand returns exactly 0.0.
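
For readers unfamiliar with Algorithm L, here is a minimal sketch of the idea as a free function. It is not the code from this PR: `sample_l` and `rand_open` are hypothetical names, and `rand_open` only approximates what a `Random#rand_exclusive` might do by redrawing whenever `#rand` returns exactly 0.0. The final shuffle mirrors what the existing `Enumerable#sample` does.

```crystal
# Hypothetical helper: a uniform Float64 in the open interval (0, 1).
# Random#rand returns a value in [0, 1), so an exact 0.0 is redrawn.
def rand_open(random : Random) : Float64
  loop do
    x = random.rand
    return x if x > 0.0
  end
end

# Sketch of Algorithm L: samples up to n elements without replacement.
def sample_l(source : Indexable(T), n : Int32, random : Random) : Array(T) forall T
  raise ArgumentError.new("Can't sample a negative number of elements") if n < 0

  size = source.size
  k = Math.min(n, size)

  # The reservoir starts out as the first k elements.
  reservoir = Array(T).new(k) { |i| source[i] }
  return reservoir.shuffle!(random) if k == 0 || k == size

  w = Math.exp(Math.log(rand_open(random)) / k)
  i = k - 1 # index of the last element considered so far

  loop do
    # Number of elements to skip before the next replacement.
    skip = (Math.log(rand_open(random)) / Math.log(1.0 - w)).floor

    # Compare as floats first so a huge or non-finite skip never reaches
    # the integer conversion below.
    break if !skip.finite? || skip >= size - i - 1

    i += skip.to_i + 1
    reservoir[random.rand(k)] = source[i]
    w *= Math.exp(Math.log(rand_open(random)) / k)
  end

  reservoir.shuffle!(random)
end
```

For example, `sample_l((0...1000).to_a, 10, Random.new(42))` returns 10 distinct elements while reading only the elements it lands on after each skip, instead of visiting all 1000.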

Sija · 2021-01-13T19:09:53Z

src/indexable.cr

@@ -571,13 +571,25 @@ module Indexable(T)

  # :nodoc:
  def sample(n : Int, random = Random::DEFAULT)
-    return super unless n == 1
+    # Unweighted reservoir sampling (Algorithm L):


Suggested change

-    # Unweighted reservoir sampling (Algorithm L):
+    return [] of T if empty?
+
+    # Unweighted reservoir sampling (Algorithm L):

asterite · 2021-01-14T12:58:14Z

Do you have benchmarks for this?

HertzDevil · 2021-01-14T16:34:45Z

Fixture:

require "benchmark"

module Indexable(T)
  def sample_old(n : Int, random = Random::DEFAULT)
    if n != 1
      # copied from Enumerable#sample
      ary = Array(T).new(n)
      return ary if n == 0

      each_with_index do |elem, i|
        if i < n
          ary << elem
        else
          j = random.rand(i + 1)
          if j < n
            ary.to_unsafe[j] = elem
          end
        end
      end

      return ary.shuffle!(random)
    end

    if empty?
      [] of T
    else
      [sample(random)]
    end
  end
end

# ARY_SIZE is selected from 10, 100, 1000, 10000
# N is selected from 2, 10, ARY_SIZE / 2

ary = Array(Int32).new(ENV["ARY_SIZE"].to_i) { 0 }
count = ENV["N"].to_i
puts "Sampling #{count} elements from #{ary.class} with size #{ary.size}"
rng = Random::DEFAULT

Benchmark.ips do |x|
  x.report("sample_old") do
    1000.times { ary.sample_old(count, rng) }
  end

  x.report("sample") do
    1000.times { ary.sample(count, rng) }
  end
end

Results:

Sampling 2 elements from Array(Int32) with size 10
sample_old   4.83k (207.07µs) (± 3.94%)  46.9kB/op        fastest
    sample   2.51k (397.98µs) (± 3.40%)  46.9kB/op   1.92× slower

Sampling 2 elements from Array(Int32) with size 100
sample_old 682.80  (  1.46ms) (± 1.00%)  46.9kB/op   1.92× slower
    sample   1.31k (761.36µs) (± 2.50%)  46.9kB/op        fastest

Sampling 2 elements from Array(Int32) with size 1000
sample_old  75.66  ( 13.22ms) (± 2.34%)  46.9kB/op  11.73× slower
    sample 887.65  (  1.13ms) (± 2.33%)  46.9kB/op        fastest

Sampling 2 elements from Array(Int32) with size 10000
sample_old   7.83  (127.66ms) (± 0.72%)  47.1kB/op  85.85× slower
    sample 672.50  (  1.49ms) (± 1.53%)  46.9kB/op        fastest

Sampling 10 elements from Array(Int32) with size 10
sample_old   5.02k (199.08µs) (± 2.04%)  78.1kB/op   1.13× slower
    sample   5.67k (176.25µs) (± 3.15%)  78.1kB/op        fastest

Sampling 10 elements from Array(Int32) with size 100
sample_old 594.11  (  1.68ms) (± 1.25%)  78.1kB/op        fastest
    sample 467.33  (  2.14ms) (± 2.03%)  78.1kB/op   1.27× slower

Sampling 10 elements from Array(Int32) with size 1000
sample_old  73.79  ( 13.55ms) (± 0.92%)  78.1kB/op   3.35× slower
    sample 246.88  (  4.05ms) (± 2.12%)  78.1kB/op        fastest

Sampling 10 elements from Array(Int32) with size 10000
sample_old   7.79  (128.31ms) (± 1.01%)  78.1kB/op  21.42× slower
    sample 166.96  (  5.99ms) (± 1.44%)  78.1kB/op        fastest

Sampling 5 elements from Array(Int32) with size 10
sample_old   4.41k (226.50µs) (±12.50%)  62.6kB/op        fastest
    sample   1.94k (516.04µs) (± 9.24%)  62.6kB/op   2.28× slower

Sampling 50 elements from Array(Int32) with size 100
sample_old 544.07  (  1.84ms) (± 1.32%)  234kB/op        fastest
    sample 258.26  (  3.87ms) (±10.84%)  234kB/op   2.11× slower

Sampling 500 elements from Array(Int32) with size 1000
sample_old  56.94  ( 17.56ms) (± 1.72%)  1.98MB/op        fastest
    sample  27.54  ( 36.31ms) (± 1.92%)  1.98MB/op   2.07× slower

Sampling 5000 elements from Array(Int32) with size 10000
sample_old   5.62  (177.99ms) (± 7.68%)  19.1MB/op        fastest
    sample   2.69  (371.93ms) (± 5.97%)  19.1MB/op   2.09× slower

A few takeaways here:

The new algorithm is indeed much faster than the old one when size is very large and n is small.
When size and n are closer together, the extra RNG calls and the mathematical functions become a bottleneck, so the old algorithm is faster even on Indexable. (The case with size = 10 and n = 10 is flawed; there shouldn't be any difference if both algorithms simply forward to shuffle.)
n can itself be O(size); in this case both algorithms are O(size), and the existing one once again beats the new algorithm if the n / size ratio is high enough. The break-even ratio seems to be around 0.075, but this is only for Int32 elements, and larger structs might be a different story. The most important case would be Pointer(Void), since that has the same size as any reference object.
The results here are valid only for Indexable types that have a contiguous storage with constant-size elements. The same cannot be said for BitArray and Tuple for example.

Maybe we should employ some kind of heuristic here to select which algorithm to use. That requires a lot more research.
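
As a rough illustration of what such a heuristic could look like, the sketch below picks an algorithm from the n / size ratio alone. The constant and the method name are hypothetical, and the 0.075 cutoff is just the ratio observed above for Int32 arrays, not a tuned value; other element types or non-contiguous containers would likely need a different rule.

```crystal
# Assumed cutoff, taken from the Int32 benchmark above; not a tuned constant.
SKIP_SAMPLING_MAX_RATIO = 0.075

# Hypothetical predicate: true when the skip-based algorithm is expected to
# win, i.e. when only a small fraction of the collection is being sampled.
def use_skip_sampling?(size : Int32, n : Int32) : Bool
  return false if size <= 0 || n <= 0

  # Int32 / Int32 yields a Float64 in Crystal.
  n / size < SKIP_SAMPLING_MAX_RATIO
end
```

Against the measurements above, this chooses the new algorithm for n = 2 with size = 100 and n = 10 with size = 1000, and the old one for n = 2 with size = 10, n = 10 with size = 100, and every n = size / 2 case.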

Optimize Indexable#sample(n, random)

b2d0b4b

Sija reviewed Jan 13, 2021

straight-shoota added kind:feature performance topic:stdlib:collection labels Jan 13, 2021

straight-shoota approved these changes Jan 13, 2021

HertzDevil marked this pull request as draft January 14, 2021 16:35
