Reduce allocations in broadcast #19639

pabloferz · 2016-12-17T23:18:29Z

With this PR:

julia> function foo(x, n)
           for i = 1:n
               broadcast!(x -> 2x+1, x, x)
           end
           return x
       end
foo (generic function with 1 method)

julia> @time foo([0,0,0], 10^4);
  0.027883 seconds (25.78 k allocations: 1.108 MB)

julia> @time foo([0,0,0], 10^4);
  0.000121 seconds (6 allocations: 288 bytes)

julia> using BenchmarkTools

julia> @benchmark [1,2,3] .+ 1
BenchmarkTools.Trial: 
  memory estimate:  224.00 bytes
  allocs estimate:  2
  --------------
  minimum time:     63.186 ns (0.00% GC)
  median time:      67.704 ns (0.00% GC)
  mean time:        74.881 ns (7.55% GC)
  maximum time:     841.374 ns (89.36% GC)
  --------------
  samples:          10000
  evals/sample:     982
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark broadcast(+, [1,2,3], 1)
BenchmarkTools.Trial: 
  memory estimate:  224.00 bytes
  allocs estimate:  2
  --------------
  minimum time:     65.979 ns (0.00% GC)
  median time:      71.068 ns (0.00% GC)
  mean time:        78.431 ns (7.44% GC)
  maximum time:     1.139 μs (92.01% GC)
  --------------
  samples:          10000
  evals/sample:     982
  time tolerance:   5.00%
  memory tolerance: 1.00%

Compare this with #19608 (comment) and #16285 (comment)
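The `@time` pairs above measure a second call so that compilation is excluded from the numbers. Below is a minimal sketch of the same measurement using `@allocated`, assuming a recent Julia 1.x. The lambda argument is renamed to `y` only for readability. `n = 10` keeps the entries small: with `n = 10^4` the `Int` entries overflow and wrap around, which does not affect timing. The byte count it prints depends on the Julia version and is not a result from this PR.

```julia
# Same loop as in the PR description; the lambda argument is renamed to `y`
# so it does not shadow the array `x`.
function foo(x, n)
    for i = 1:n
        broadcast!(y -> 2y + 1, x, x)   # in place: x[j] = 2x[j] + 1
    end
    return x
end

# The first call compiles `foo` and the broadcast machinery, so allocations
# seen here mostly reflect compilation rather than the loop itself.
foo([0, 0, 0], 1)

# Measure an already-compiled call. `@allocated` reports the bytes allocated
# while evaluating the expression; the exact figure is version dependent.
x = [0, 0, 0]
bytes = @allocated foo(x, 10)
println("bytes allocated: ", bytes)

# Each iteration maps v -> 2v + 1, so starting from 0 every entry ends at 2^n - 1.
println(x)
```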

martinholters · 2016-12-18T12:14:44Z

Are the changes to the sparse matrix code related to the addressed problem?

KristofferC · 2016-12-18T12:21:47Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-12-18T15:52:55Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

nalimilan · 2016-12-18T17:23:33Z

base/sparse/sparsematrix.jl

@@ -1403,13 +1403,19 @@ sparse(S::UniformScaling, m::Integer, n::Integer=m) = speye_scaled(S.λ, m, n)
 # map/map! entry points
 function map!{Tf,N}(f::Tf, C::SparseMatrixCSC, A::SparseMatrixCSC, Bs::Vararg{SparseMatrixCSC,N})
    _checksameshape(C, A, Bs...)
+    return map_nocheck!(f, C, A, Bs...)


Maybe this could be tied to the bounds checking mechanism? Or would it be an abuse?
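One way to read the bounds-checking suggestion is to put the shape check inside a `@boundscheck` block, so that a caller that has already validated shapes can skip it with `@inbounds`. This is a minimal sketch assuming Julia 1.x semantics. `checksameshape_sketch`, `map_sketch!` and `add_prechecked_sketch!` are hypothetical names, not the sparse code from the diff. `@boundscheck` blocks are only elided when the method is inlined into the `@inbounds` caller, so the check may or may not actually disappear.

```julia
# Hypothetical shape check, standing in for `_checksameshape` in the diff above.
function checksameshape_sketch(C::AbstractArray, As::AbstractArray...)
    for A in As
        axes(A) == axes(C) ||
            throw(DimensionMismatch("expected axes $(axes(C)), got $(axes(A))"))
    end
    return nothing
end

# Hypothetical `map!`-like function. The shape check sits in a `@boundscheck`
# block: if this method is inlined into a caller that uses `@inbounds`, the
# compiler may drop the check; in every other call it runs as usual.
@inline function map_sketch!(f, C::AbstractArray, A::AbstractArray, Bs::AbstractArray...)
    @boundscheck checksameshape_sketch(C, A, Bs...)
    # From here on, all arrays are known (or promised via @inbounds) to match C.
    @inbounds for i in eachindex(C)
        C[i] = f(A[i], map(B -> B[i], Bs)...)
    end
    return C
end

# A caller that validates shapes once and then opts out of the repeated check.
function add_prechecked_sketch!(C, A, B)
    checksameshape_sketch(C, A, B)
    @inbounds map_sketch!(+, C, A, B)   # the inner check may be elided here
    return C
end
```

The trade-off behind "or would it be an abuse?" is visible here: `@inbounds` at the call site would then silence dimension mismatches as well as index errors, so passing mismatched arrays under `@inbounds` becomes undefined behavior rather than a `DimensionMismatch`.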

Sacha0 · 2016-12-18

This looks great!

Perhaps having @timholy sign off on the inlining changes would be prudent?

The broadcast-fusion and linalg-arithmetic benchmark improvements are lovely. The scalar-floatexp-ldexp, sparse-arithmetic-unary minus, and string-join regressions should be noise. Might the linalg-factorization and array regressions be real?

I agree with @martinholters, the sparse matrix changes are orthogonal to the other changes in this pull request. I would prefer those changes appear in a separate pull request. (I might advocate holding off with that pull request for now, having left that TODO outstanding for two reasons: I wasn't certain whether avoiding the redundant shape check is worth the extra code complexity, and I plan to restructure that code somewhat in the near future in any case.)

Thanks again @pabloferz!

pabloferz · 2016-12-20T00:36:29Z

I removed the sparse-related changes. The initial changes seemed to somehow affect some of the svd and eigvecs methods for Diagonal and Bidiagonal, so I took the chance to also improve them. It should be better now.

The reason for the @noinline on the _broadcast! methods is no longer a concern, so I don't think there's any risk in changing them.

Sacha0 · 2016-12-20T01:00:25Z

base/broadcast.jl

+function broadcast_t(f, ::Type{Any}, T::Type, shape, iter, As...)
+    if isempty(iter)
+        return similar(Array{T}, shape)
+    end


Why move the code handling the empty case inside this method and add a second type argument?

pabloferz · 2016-12-20

Oops. I was playing around with reorganizing the code and left this in, but it shouldn't be necessary. I'll put it back as it was.
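The empty branch under discussion exists because, when the input has no elements, `f` is never called. The result's element type therefore has to come from a type `T` computed ahead of time. Below is a minimal sketch of that idea for a single array argument, assuming Julia 1.x. `broadcast_empty_sketch` is a hypothetical name. `T` is obtained here with `Base.promote_op`, an internal, inference-based helper whose behavior may differ between versions.

```julia
# Hypothetical stand-in for the empty-input branch of `broadcast_t`.
function broadcast_empty_sketch(f, A::AbstractArray)
    if isempty(A)
        # No values to inspect, so ask type inference for the result type.
        T = Base.promote_op(f, eltype(A))
        # Allocate an empty array of that element type, as `similar(Array{T}, shape)` does.
        return similar(Array{T}, size(A))
    end
    # Non-empty input: let the regular broadcast machinery handle it.
    return broadcast(f, A)
end

r = broadcast_empty_sketch(x -> x / 2, Int[])
println(typeof(r), " ", length(r))
```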

Sacha0 · 2016-12-20T02:17:38Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-12-20T05:34:57Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

stevengj · 2016-12-20T13:53:06Z

Combined with dot ops, we now have:

julia> function bar(x, n)
           for i = 1:n
               x .= 2 .* x .+ 1
           end
           return x
       end
bar (generic function with 1 method)

julia> @time bar([0,0,0], 10^4); # warmup
  0.020100 seconds (17.47 k allocations: 713.317 KB)

julia> @time bar([0,0,0], 10^4);
  0.000226 seconds (6 allocations: 288 bytes)
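With dotted operators, `x .= 2 .* x .+ 1` is fused into a single in-place broadcast over `x`. That is the same work as the explicit `broadcast!(x -> 2x+1, x, x)` in the PR description. Below is a small sketch comparing three spellings, assuming current Julia syntax. `@.` is a convenience macro that inserts the dots automatically.

```julia
x1 = [0, 1, 2]
x2 = copy(x1)
x3 = copy(x1)

# Dotted syntax: the whole right-hand side fuses into one in-place loop,
# with no temporary array for `2 .* x1`.
x1 .= 2 .* x1 .+ 1

# The explicit call from the PR description (argument renamed to `y`).
broadcast!(y -> 2y + 1, x2, x2)

# `@.` turns this into `x3 .= 2 .* x3 .+ 1`.
@. x3 = 2x3 + 1

println(x1, " ", x2, " ", x3)
```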

pabloferz force-pushed the pz/inline-broadcast branch from e55ed1c to 4134488 December 18, 2016 00:20

kshyatt requested a review from Sacha0 December 18, 2016 02:29

kshyatt added the domain:broadcast label (Applying a function over a collection) Dec 18, 2016

pabloferz mentioned this pull request Dec 18, 2016

unnecessary allocations in broadcast!? #19608 (Closed)

nalimilan reviewed Dec 18, 2016

Sacha0 reviewed Dec 18, 2016

pabloferz force-pushed the pz/inline-broadcast branch from 4134488 to 361db25 December 20, 2016 00:05

Sacha0 reviewed Dec 20, 2016

pabloferz added 3 commits December 19, 2016 19:37

Reduce allocations in broadcast (4e63b89)

svd(::Diagonal) speedup (e5c8309)

eigvecs(::Bidiagonal) speedup (4c964d3)

pabloferz force-pushed the pz/inline-broadcast branch from 361db25 to 4c964d3 December 20, 2016 01:38

stevengj merged commit 99b6a8c into JuliaLang:master Dec 20, 2016

pabloferz deleted the pz/inline-broadcast branch December 20, 2016 13:58

This was referenced Dec 20, 2016:

Vectorization Roadmap #16285 (Closed)

Reconsider uses of promote_op #19669 (Closed)

remove obsolete performance workaround using broadcast_elwise_op #19672 (Merged)
