Faster iteration #38

Closed
wants to merge 5 commits into from

christopherzimmerman
Member

Before merging, I need to:

[ ] Remove old mapping methods
[ ] Port over matrix/axis iters to new yielding macro methods
[ ] Determine if unsafe_iter is needed, or if a workaround is possible.

@jkthorne

Just curious: since this PR is about performance, did you run any benchmarks with it?

@christopherzimmerman
Member Author

christopherzimmerman commented Sep 14, 2020

@wontruefree I'm still optimizing a bit, mostly by removing overhead and speeding up lookups, but here's a small benchmark on 2 million elements (1 million for the strided tensors, which take every second element).

require "./num"
require "benchmark"

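n = 2000000

# Three contiguous tensors of n random values drawn from 0.0...1.0
a = Tensor.random(0.0...1.0, [n])
b = Tensor.random(0.0...1.0, [n])
c = Tensor.random(0.0...1.0, [n])

# Strided slices taking every second element (n / 2 elements each)
a_strided = a[{..., 2}]
b_strided = b[{..., 2}]
c_strided = c[{..., 2}]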

Benchmark.ips do |bench|
  bench.report("old map") { a.map { |i| i / 2 } }
  bench.report("new map") { a.map_new { |i| i / 2 } }

  bench.report("old map2") { a.map(b) { |i, j| i + j / 2 } }
  bench.report("new map2") { a.map_new(b) { |i, j| i + j / 2 } }

  bench.report("old map3") { a.map(b, c) { |i, j, k| i + j * 2 - k } }
  # same expression as "old map3" so both variants do identical work
  bench.report("new_map3") { a.map_new(b, c) { |i, j, k| i + j * 2 - k } }

  bench.report("old map strided") { a_strided.map { |i| i / 2 } }
  bench.report("new map strided") { a_strided.map_new { |i| i / 2 } }

  bench.report("old map2 strided") { a_strided.map(b_strided) { |i, j| i + j / 2 } }
  bench.report("new map2 strided") { a_strided.map_new(b_strided) { |i, j| i + j / 2 } }

  bench.report("old map3 strided") { a_strided.map(b_strided, c_strided) { |i, j, k| i + j * 2 - k } }
  # same expression as "old map3 strided" so both variants do identical work
  bench.report("new_map3 strided") { a_strided.map_new(b_strided, c_strided) { |i, j, k| i + j * 2 - k } }
end
         old map 187.72  (  5.33ms) (± 2.30%)  15.3MB/op   1.99× slower
         new map 295.96  (  3.38ms) (± 2.39%)  15.3MB/op   1.26× slower

        old map2 179.66  (  5.57ms) (± 3.07%)  15.3MB/op   2.08× slower
        new map2 260.18  (  3.84ms) (± 1.92%)  15.3MB/op   1.44× slower

        old map3 175.49  (  5.70ms) (± 2.13%)  15.3MB/op   2.13× slower
        new_map3 253.36  (  3.95ms) (± 2.07%)  15.3MB/op   1.47× slower

 old map strided 255.69  (  3.91ms) (± 2.49%)  7.63MB/op   1.46× slower
 new map strided 373.54  (  2.68ms) (± 1.47%)  7.63MB/op        fastest

old map2 strided 183.26  (  5.46ms) (± 2.56%)  7.63MB/op   2.04× slower
new map2 strided 296.31  (  3.37ms) (± 1.62%)  7.63MB/op   1.26× slower

old map3 strided 159.19  (  6.28ms) (± 1.45%)  7.63MB/op   2.35× slower
new_map3 strided 220.19  (  4.54ms) (± 2.02%)  7.63MB/op   1.70× slower

@jkthorne

That seems like a pretty big improvement!

christopherzimmerman mentioned this pull request on Sep 17, 2020.