
Retrieving elements from an inhomogeneous tuple is slow #19850

Closed
wsshin opened this issue Jan 4, 2017 · 10 comments
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster

Comments

@wsshin
Contributor

wsshin commented Jan 4, 2017

Consider the following function:

julia> function tuple_elems_slow(t::NTuple{3,Real})
     const i1 = 1; const i2 = 2; const i3 = 3
     t1, t2, t3 = t[i1], t[i2], t[i3]
end

The performance of this function varies greatly depending on whether t is homogeneous (e.g., t = (0,0,0) with all Int64 elements) or inhomogeneous (e.g., t = (0.,0,0) with the first element being Float64). When t is inhomogeneous, the function incurs extra allocations:

julia> using BenchmarkTools

julia> @benchmark tuple_elems_slow((0,0,0))
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  median time:      5.533 ns (0.00% GC)

julia> @benchmark tuple_elems_slow((0.,0,0))
BenchmarkTools.Trial:
  memory estimate:  112.00 bytes
  allocs estimate:  4
  median time:      116.735 ns (0.00% GC)

This is not caused by type instability: @code_warntype shows that the function is type-stable for both calls.

In comparison, if integer literals are used to index the tuple elements, the function does not experience such performance degradation even when the tuple is inhomogeneous:

julia> function tuple_elems_fast(t::NTuple{3,Real})
     t1, t2, t3 = t[1], t[2], t[3]
end
tuple_elems_fast (generic function with 1 method)

julia> @benchmark tuple_elems_fast((0.,0,0))
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  median time:      5.581 ns (0.00% GC)

julia> @benchmark tuple_elems_fast((0,0,0))
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  median time:      5.537 ns (0.00% GC)
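A common workaround (a sketch, not something suggested in this thread) is to lift the index into the type domain with Val, so that each call specializes on the index just as it would on a literal. The helper name getval is hypothetical, not a Base function, and the snippet assumes a modern Julia (0.7+) where Val(i) is a constructor:

```julia
# Hypothetical helper: the index i becomes a type parameter, so the
# compiler sees it as a compile-time constant, mirroring the literal-index
# fast path above.
getval(t::Tuple, ::Val{i}) where {i} = t[i]

t = (0.0, 0, 0)
x = getval(t, Val(1))   # specializes on i == 1; inferred as Float64
```

The trade-off is that a new method specialization is compiled per index, so this only pays off when the set of indices is small and known.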

Here is the versioninfo() result:

julia> versioninfo()
Julia Version 0.6.0-dev.1877
Commit ea73c84* (2017-01-03 20:54 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin16.0.0)
  CPU: Intel(R) Core(TM)2 Duo CPU     T9900  @ 3.06GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Penryn)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, penryn)

I could be wrong, but I believe this problem was introduced recently; I don't recall seeing it on the master branch as recently as about a month ago.

@JeffBezanson
Sponsor Member

Thanks. Would you be willing to bisect this to see if it regressed in the past month or so?

@giordano
Contributor

giordano commented Jan 4, 2017

Julia 0.5 shows the same behavior:

julia> @benchmark tuple_elems_slow((0.,0,0))
BenchmarkTools.Trial: 
  memory estimate:  112.00 bytes
  allocs estimate:  4
  --------------
  minimum time:     72.274 ns (0.00% GC)
  median time:      75.505 ns (0.00% GC)
  mean time:        89.214 ns (13.57% GC)
  maximum time:     2.915 μs (96.81% GC)
  --------------
  samples:          10000
  evals/sample:     974
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark tuple_elems_slow((0,0,0))
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.271 ns (0.00% GC)
  median time:      3.375 ns (0.00% GC)
  mean time:        3.389 ns (0.00% GC)
  maximum time:     15.121 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%

Hope this helps.

@JeffBezanson
Sponsor Member

OK, I can understand why this might be slow, but I'm definitely concerned if it regressed recently.

@andyferris
Member

Why "might" this be slow? If it is type-stable I don't see why it would be allocating.

julia> @code_warntype tuple_elems_slow((0.0,0,0))
Variables:
  #self#::#tuple_elems_slow
  t::Tuple{Float64,Int64,Int64}
  i1::Int64
  i2::Int64
  i3::Int64
  t1::Float64
  t2::Int64
  t3::Int64

Body:
  begin 
      i1::Int64 = 1 # line 2:
      i2::Int64 = 2 # line 2:
      i3::Int64 = 3 # line 3:
      SSAValue(0) = (Base.getfield)(t::Tuple{Float64,Int64,Int64},i1::Int64)::Float64
      SSAValue(1) = (Base.getfield)(t::Tuple{Float64,Int64,Int64},i2::Int64)::Int64
      SSAValue(2) = (Base.getfield)(t::Tuple{Float64,Int64,Int64},i3::Int64)::Int64
      t1::Float64 = SSAValue(0)
      t2::Int64 = SSAValue(1)
      t3::Int64 = SSAValue(2)
      return (Core.tuple)(SSAValue(0),SSAValue(1),SSAValue(2))::Tuple{Float64,Int64,Int64}
  end::Tuple{Float64,Int64,Int64}

@wsshin
Contributor Author

wsshin commented Jan 4, 2017

@JeffBezanson Since @giordano reports the same behavior on v0.5, my memory may simply be wrong and the problem may have existed all along. I'll test the Julia version where I remember the problem being absent and report back.

@wsshin
Contributor Author

wsshin commented Jan 4, 2017

@JeffBezanson My memory turns out to be wrong. The problem must have existed for quite a while.

I used to use Julia v0.5, but picked up the v0.6 master branch when the "Vectorization Roadmap" (#16285) was completed, which corresponds to the commit 99b6a8c. That's also when I wrote some functions like the above in my project.

I don't remember seeing any significant performance difference between homogeneous and inhomogeneous tuples back then, so I had an impression that the commit didn't have the problem. Maybe I was passing only homogeneous tuples to the functions.

Anyway, I just built Julia fresh with the commit 99b6a8c and ran the same tests, but the problem didn't go away. Together with @giordano's report of the same problem on v0.5, I think it is safe to assume that the problem wasn't introduced recently, but has been there for quite a while.

Sorry about the false information; I hope this helps solve the problem!

@JeffBezanson JeffBezanson added compiler:codegen Generation of LLVM IR and native code performance Must go faster labels Jan 4, 2017
@chethega
Contributor

What happens is that Julia actually makes a copy of the tuple for the access. I don't understand why codegen decides to do this, but it makes retrieving an element O(N) instead of O(1).

using BenchmarkTools
f(x, i) = x[i];
for N in (2, 10, 100, 1000, 10_000)
    nnta = Any[i for i in 1:N]; nnta[2] = 1.0; nnt = (nnta...,)
    println("tuple with $N")
    @btime f($nnt, $1)
end

######

tuple with 2
  31.694 ns (1 allocation: 32 bytes)
tuple with 10
  36.790 ns (1 allocation: 96 bytes)
tuple with 100
  115.433 ns (1 allocation: 896 bytes)
tuple with 1000
  802.000 ns (1 allocation: 7.88 KiB)
tuple with 10000
  6.995 μs (1 allocation: 78.19 KiB)

@KristofferC
Sponsor Member

Ref: #24596, #15274, #15482

@ChrisRackauckas
Member

ChrisRackauckas commented Aug 25, 2018

It looks like it's fixed in v1.0?

using BenchmarkTools

function tuple_elems_slow(t::NTuple{3,Real})
     i1 = 1; i2 = 2; i3 = 3
     t1, t2, t3 = t[i1], t[i2], t[i3]
end

@btime tuple_elems_slow((0,0,0)) # 1.170 ns (0 allocations: 0 bytes)
@btime tuple_elems_slow((0.0,0,0)) # 1.463 ns (0 allocations: 0 bytes)  

Seems like constant propagation of literals (#24011) may have done it (though that is slightly different)?
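To make the constant-propagation explanation concrete, here is a minimal sketch (the names getelem and first_elem are mine, for illustration; assumes Julia 1.0+): a literal index that flows through a wrapper call still lets the compiler infer a concrete element type from a heterogeneous tuple, so no boxing or allocation is needed.

```julia
# getelem's index argument is not a literal at its own call site, but the
# literal 1 in first_elem propagates into it, so t[i] is resolved at
# compile time and the return type is inferred as Float64.
getelem(t, i) = t[i]
first_elem(t) = getelem(t, 1)

t = (0.0, 0, 0)
first_elem(t)
```

This matches the original observation that tuple_elems_fast (literal indices) was always fast; constant propagation extends the same fast path to indices bound to local constants.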

@KristofferC
Sponsor Member

Yep, looks fixed.

7 participants