Optimize Indexable#sample(n, random) #10247

Draft
wants to merge 1 commit into master


HertzDevil
Contributor

Implements Algorithm L for multiple-element sampling, which reduces the time complexity from O(size) to O(k(1 + log(size / k))), where k is the number of sampled elements (n in the method signature). This requires the ability to skip multiple elements at once, which is possible for Indexable but not for Enumerable.
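For a rough sense of scale, here is a worked estimate, writing k for the number of sampled elements and N for the collection size. It uses the standard expected number of reservoir replacements after the initial fill, which is the work a skip-based algorithm still has to do; the constants hidden by the big-O bound are not accounted for.

$$
E[\text{replacements}] = \sum_{i=k+1}^{N} \frac{k}{i} = k\,(H_N - H_k) \approx k \ln\frac{N}{k}
$$

With N = 10000 and k = 2 this is roughly 2 ln 5000 ≈ 17 replacements, compared with 10000 element visits for a linear scan. With k = 5000 it is roughly 5000 ln 2 ≈ 3466, so most of the advantage is gone before the extra logarithms per step are even counted.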

I personally think we may expose Random#rand_exclusive later. It is required here because the algorithm will overflow if #rand returns exactly 0.0.
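A minimal sketch of Algorithm L over an Array, to make the skipping and the 0.0 problem concrete. It is not the code in this PR: `sample_algorithm_l` and `open_unit_float` are hypothetical names, and the retry loop only stands in for whatever `Random#rand_exclusive` does.

```crystal
# Hypothetical helper (not in the standard library): a uniform Float64 strictly
# inside (0, 1). Random#rand may return exactly 0.0, and Math.log(0.0) is
# -Infinity, which would make the skip length infinite.
def open_unit_float(random : Random) : Float64
  loop do
    u = random.rand
    return u if u > 0.0
  end
end

# Sketch of unweighted reservoir sampling with skips (Algorithm L).
def sample_algorithm_l(source : Array(T), n : Int32, random : Random) : Array(T) forall T
  raise ArgumentError.new("Can't sample a negative number of elements") if n < 0
  return source.shuffle(random) if n >= source.size
  return [] of T if n == 0

  # Fill the reservoir with the first n elements.
  reservoir = source[0, n]
  w = Math.exp(Math.log(open_unit_float(random)) / n)
  i = n - 1 # index of the last element examined so far

  loop do
    # Number of elements to jump over before the next replacement.
    # log1p(-w) keeps precision when w becomes very small.
    skip = (Math.log(open_unit_float(random)) / Math.log1p(-w)).floor
    # Compare as floats first so a huge skip never reaches the integer cast.
    break if skip >= source.size - i - 1
    i += skip.to_i + 1
    reservoir[random.rand(n)] = source[i]
    w *= Math.exp(Math.log(open_unit_float(random)) / n)
  end

  # Untouched reservoir slots still sit in source order, so shuffle at the end,
  # as the existing Enumerable#sample does.
  reservoir.shuffle!(random)
end
```

For example, `sample_algorithm_l((0...10_000).to_a, 2, Random.new)` touches only a handful of indices rather than all 10000, which is why random access to the collection is needed.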

@@ -571,13 +571,25 @@ module Indexable(T)

# :nodoc:
def sample(n : Int, random = Random::DEFAULT)
return super unless n == 1
# Unweighted reservoir sampling (Algorithm L):
Contributor

Suggested change
# Unweighted reservoir sampling (Algorithm L):
return [] of T if empty?
# Unweighted reservoir sampling (Algorithm L):

@asterite
Member

Do you have benchmarks for this?

@HertzDevil
Contributor Author

Fixture:

require "benchmark"

module Indexable(T)
  def sample_old(n : Int, random = Random::DEFAULT)
    if n != 1
      # copied from Enumerable#sample
      ary = Array(T).new(n)
      return ary if n == 0

      each_with_index do |elem, i|
        if i < n
          ary << elem
        else
          j = random.rand(i + 1)
          if j < n
            ary.to_unsafe[j] = elem
          end
        end
      end

      return ary.shuffle!(random)
    end

    if empty?
      [] of T
    else
      [sample(random)]
    end
  end
end

# ARY_SIZE is selected from 10, 100, 1000, 10000
# N is selected from 2, 10, ARY_SIZE / 2

ary = Array(Int32).new(ENV["ARY_SIZE"].to_i) { 0 }
count = ENV["N"].to_i
puts "Sampling #{count} elements from #{ary.class} with size #{ary.size}"
rng = Random::DEFAULT

Benchmark.ips do |x|
  x.report("sample_old") do
    1000.times { ary.sample_old(count, rng) }
  end

  x.report("sample") do
    1000.times { ary.sample(count, rng) }
  end
end

Results:

Sampling 2 elements from Array(Int32) with size 10
sample_old   4.83k (207.07µs) (± 3.94%)  46.9kB/op        fastest
    sample   2.51k (397.98µs) (± 3.40%)  46.9kB/op   1.92× slower

Sampling 2 elements from Array(Int32) with size 100
sample_old 682.80  (  1.46ms) (± 1.00%)  46.9kB/op   1.92× slower
    sample   1.31k (761.36µs) (± 2.50%)  46.9kB/op        fastest

Sampling 2 elements from Array(Int32) with size 1000
sample_old  75.66  ( 13.22ms) (± 2.34%)  46.9kB/op  11.73× slower
    sample 887.65  (  1.13ms) (± 2.33%)  46.9kB/op        fastest

Sampling 2 elements from Array(Int32) with size 10000
sample_old   7.83  (127.66ms) (± 0.72%)  47.1kB/op  85.85× slower
    sample 672.50  (  1.49ms) (± 1.53%)  46.9kB/op        fastest

Sampling 10 elements from Array(Int32) with size 10
sample_old   5.02k (199.08µs) (± 2.04%)  78.1kB/op   1.13× slower
    sample   5.67k (176.25µs) (± 3.15%)  78.1kB/op        fastest

Sampling 10 elements from Array(Int32) with size 100
sample_old 594.11  (  1.68ms) (± 1.25%)  78.1kB/op        fastest
    sample 467.33  (  2.14ms) (± 2.03%)  78.1kB/op   1.27× slower

Sampling 10 elements from Array(Int32) with size 1000
sample_old  73.79  ( 13.55ms) (± 0.92%)  78.1kB/op   3.35× slower
    sample 246.88  (  4.05ms) (± 2.12%)  78.1kB/op        fastest

Sampling 10 elements from Array(Int32) with size 10000
sample_old   7.79  (128.31ms) (± 1.01%)  78.1kB/op  21.42× slower
    sample 166.96  (  5.99ms) (± 1.44%)  78.1kB/op        fastest

Sampling 5 elements from Array(Int32) with size 10
sample_old   4.41k (226.50µs) (±12.50%)  62.6kB/op        fastest
    sample   1.94k (516.04µs) (± 9.24%)  62.6kB/op   2.28× slower

Sampling 50 elements from Array(Int32) with size 100
sample_old 544.07  (  1.84ms) (± 1.32%)  234kB/op        fastest
    sample 258.26  (  3.87ms) (±10.84%)  234kB/op   2.11× slower

Sampling 500 elements from Array(Int32) with size 1000
sample_old  56.94  ( 17.56ms) (± 1.72%)  1.98MB/op        fastest
    sample  27.54  ( 36.31ms) (± 1.92%)  1.98MB/op   2.07× slower

Sampling 5000 elements from Array(Int32) with size 10000
sample_old   5.62  (177.99ms) (± 7.68%)  19.1MB/op        fastest
    sample   2.69  (371.93ms) (± 5.97%)  19.1MB/op   2.09× slower

A few takeaways here:

  • The new algorithm is indeed much faster than the old one when size is very large and n is small.
  • When size and n are closer together, the extra RNG calls and the mathematical functions become a bottleneck, so the old algorithm is faster even on Indexable. (The case with size = 10 and n = 10 is flawed; there shouldn't be any difference if both algorithms simply forward to shuffle.)
  • n can itself be O(size); in this case both algorithms are O(size), and the existing one once again beats the new algorithm if the n / size ratio is high enough. The break-even ratio seems to be around 0.075, but this is only for Int32 elements, and larger structs might be a different story. The most important case would be Pointer(Void), since that has the same size as any reference object.
  • The results here are valid only for Indexable types that have contiguous storage with constant-size elements. The same cannot be said for BitArray and Tuple, for example.

Maybe we should employ some kind of heuristic here to select which algorithm to use. That requires a lot more research.
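One possible shape for such a heuristic, purely as an illustration: the method name, the returned symbols, and the constant are hypothetical, and the 0.075 cutoff is only the rough estimate measured above for Array(Int32).

```crystal
# Hypothetical cutoff taken from the Array(Int32) benchmarks in this thread;
# other element types or storage layouts may need a different value.
SAMPLE_RATIO_CUTOFF = 0.075

# Picks a sampling strategy from the collection size and the sample count.
def choose_sampling_algorithm(size : Int, n : Int) : Symbol
  # Sampling everything (or more) is just a shuffle of the whole collection.
  return :shuffle if n >= size

  # Int / Int is floating-point division in Crystal.
  if n / size < SAMPLE_RATIO_CUTOFF
    :algorithm_l # sparse sample: skipping elements pays off
  else
    :algorithm_r # dense sample: the plain linear scan is cheaper
  end
end
```

With this cutoff the choice agrees with every benchmark case listed above, for example `choose_sampling_algorithm(10_000, 2)` gives `:algorithm_l` and `choose_sampling_algorithm(100, 10)` gives `:algorithm_r`, but that says nothing about types such as BitArray or Tuple.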

@HertzDevil HertzDevil marked this pull request as draft January 14, 2021 16:35