Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in 1.0.1 #29464

Closed
jmichel7 opened this issue Oct 1, 2018 · 11 comments
Closed

Performance regression in 1.0.1 #29464

jmichel7 opened this issue Oct 1, 2018 · 11 comments
Labels
kind:potential benchmark Could make a good benchmark in BaseBenchmarks kind:regression Regression in behavior compared to a previous version performance Must go faster

Comments

@jmichel7
Copy link

jmichel7 commented Oct 1, 2018

One of my programs which took 1mn on 1.0.0 takes 5mn on 1.0.1
I tracked at least part of the problem to the following code snippet:

function foo(gens,elts::Vararg{Vector{Int}})
  ee=collect(elts)
  for i in eachindex(gens)
    if let i=i
      all(w->w[i]>w[i+1],ee)
       end
      error("should not happen")
    end
  end
end

l=map(i->collect(1:5),1:4)
for i in 1:4 l[i][[i,i+1]]=l[i][[i+1,i]] end
const gens=l
const a=[2,1,3,4,5]
const b=[1,2,4,3,5]
test()=foo(gens,a,b)

On 1.0.0

julia> @btime test()
  176.287 ns (2 allocations: 128 bytes)

On 1.0.1:

julia> @btime test()
  695.735 ns (6 allocations: 192 bytes)

Does anyone know what’s happening here?

@KristofferC
Copy link
Sponsor Member

I'm bisecting

@KristofferC
Copy link
Sponsor Member

commit b5e2c2ff2f563140fa707674ec0789e03951b170
Author: Klaus Crusius <KlausC@users.noreply.github.com>
Date:   Wed Sep 12 23:35:11 2018 +0200

    Allow `Iterators.flatten` for empty tuple (#29112)

    (cherry picked from commit b4c370dd05ac33d0c8c02b440b861e8582d1fc74)

:040000 040000 b0f6ee2c6abb7b729a78c14b39dc9f823289976c 817f36cd24bdc1696ce76067b707506378748205 M      base
:040000 040000 07f50c8dce7d7a9b2fb21608bd1a0c94ac991c20 adf4a997f804fbcdab58c7df9356eb74ddde2a6d M      test

cc @KlausC #29112

@KristofferC KristofferC added performance Must go faster kind:regression Regression in behavior compared to a previous version kind:potential benchmark Could make a good benchmark in BaseBenchmarks labels Oct 2, 2018
@KristofferC
Copy link
Sponsor Member

@code_typed is identical for the function on the commit with the regression and the one before...

@jmichel7 jmichel7 closed this as completed Oct 4, 2018
@jmichel7 jmichel7 reopened this Oct 4, 2018
@jmichel7
Copy link
Author

jmichel7 commented Oct 4, 2018

Oops! I closed the issue by a misclick. I wanted to say: I do not know if the issue has been identified, but if this may help here is a simplified snippet which still shows exactly the same problem:

 function foo(elts::Vararg{Vector{Int}})
  ee=collect(elts)
  for i in 1:4
    if all(w->w[i]>w[i+1],ee)
      error("should not happen")
    end
  end
end

const a=[2,1,3,4,5]
const b=[1,2,4,3,5]
test()=foo(a,b)

@KristofferC
Copy link
Sponsor Member

KristofferC commented Oct 4, 2018

Here is a way to generate arbitrary slowdown:

N = 10000
a = collect(1:N)
b = collect(2:N+1)

function foo(elts::Vararg{Vector{Int}})
    ee=collect(elts)
    for i in 1:length(ee[1])-1
       f = w -> w[i] > w[i+1]
       all(f,ee)
    end
end

using BenchmarkTools

@btime foo($a,$b)

0.7:

julia> @btime foo($a,$b)
  12.316 μs (2 allocations: 128 bytes)

1.0.1

julia> @btime foo($a,$b)
  3.672 ms (38979 allocations: 765.39 KiB)

Profiling shows that we spend a lot in the runtime.

julia> Profile.print(noisefloor=5, C=true)
3255 /Users/kristoffer/julia/src/task.c:271; start_task
 3255 /Users/kristoffer/julia/src/./julia.h:1559; jl_apply
  3255 ./task.jl:259; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
   3255 .../julia/stdlib/v1.1/REPL/src/REPL.jl:117; macro expansion
    3255 .../julia/stdlib/v1.1/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
     3255 ./boot.jl:319; eval(::Module, ::Any)
      3255 ...rs/kristoffer/julia/src/builtins.c:622; jl_toplevel_eval_in
       3255 ...rs/kristoffer/julia/src/toplevel.c:812; jl_toplevel_eval_flex
        3255 /Users/kristoffer/julia/src/gf.c:1843; jl_fptr_trampoline
         3255 ...tdlib/v1.1/Profile/src/Profile.jl:25; top-level scope
          3255 ./REPL[9]:1; macro expansion
           291  ./REPL[4]:3; foo(::Array{Int64,1}, ::Vararg{Array{Int64,1},N} ...
            204 .../kristoffer/julia/src/builtins.c:731; jl_f_getfield
             79 .../kristoffer/julia/src/datatype.c:543; jl_new_bits
              74 .../kristoffer/julia/src/datatype.c:681; jl_box_int64
               63 ...er/julia/src/./julia_internal.h:263; jl_gc_alloc_
           1388 ./REPL[4]:4; foo(::Array{Int64,1}, ::Vararg{Array{Int64,1},N} ...
            778 ...s/kristoffer/julia/src/jltypes.c:879; jl_apply_type
             755 ...s/kristoffer/julia/src/jltypes.c:925; jl_instantiate_unionall
              535 .../kristoffer/julia/src/jltypes.c:1499; inst_type_w_
               535 .../kristoffer/julia/src/jltypes.c:1307; inst_datatype
                493 ...kristoffer/julia/src/jltypes.c:1301; inst_datatype_env
                 437 ...kristoffer/julia/src/jltypes.c:1297; inst_datatype_env
                  293 ...ristoffer/julia/src/jltypes.c:1092; inst_datatype_inner
                   221 ...ristoffer/julia/src/jltypes.c:716; lookup_type
                    118 ...ristoffer/julia/src/jltypes.c:678; lookup_type_idx
           1553 ./REPL[4]:5; foo(::Array{Int64,1}, ::Vararg{Array{Int64,1},N} ...
            828 /Users/kristoffer/julia/src/gf.c:2194; jl_apply_generic
             643 /Users/kristoffer/julia/src/gf.c:2148; jl_lookup_generic_
              628 ...er/julia/src/./julia_internal.h:876; jl_typemap_assoc_exact
               431 .../kristoffer/julia/src/typemap.c:798; jl_typemap_entry_assoc_exact
                385 ...kristoffer/julia/src/subtype.c:1229; jl_tuple_isa
                 261 ...kristoffer/julia/src/subtype.c:1283; jl_isa
                  261 ...ristoffer/julia/src/subtype.c:1198; jl_subtype
                   261 ...ristoffer/julia/src/subtype.c:1180; jl_subtype_env
                    115 ...ristoffer/julia/src/subtype.c:1120; forall_exists_subtype
                     111 ...istoffer/julia/src/subtype.c:1092; exists_subtype
            280 ...er/julia/usr/lib/julia/sys.dylib:?; jfptr_iterate_7082
             156 /Users/kristoffer/julia/src/gc.c:965; jl_gc_pool_alloc
             328 ./reducedim.jl:658; all(::Function, ::Array{Array{Int64,1},1})
              285 ./reducedim.jl:658; #all#550(::Function, ::Function, ::Function, ::A...
               94  /Users/kristoffer/julia/src/gf.c:2194; jl_apply_generic

@KristofferC
Copy link
Sponsor Member

KristofferC commented Oct 4, 2018

Ok, it is a code specialization heuristic. Changing the function declaration from

foo(elts::Vararg{Vector{Int}})

to

foo(elts::Vararg{Vector{Int}, N}) where {N}

fixes it:

julia> @btime foo($a,$b)
  11.868 μs (1 allocation: 96 bytes)

@mauro3
Copy link
Contributor

mauro3 commented Oct 4, 2018

Is this more or less what is described here: https://docs.julialang.org/en/v1/devdocs/functions/#Compiler-efficiency-issues-1?

@KristofferC
Copy link
Sponsor Member

I guess, don't know how it is related to #29112 though. Or if I just worked around the problem with the extra specialization but the original issue is not fixed.

@martinholters
Copy link
Member

The collect call there cannot be inferred in the non-specialized case. (Check code_warntype(foo, Tuple{Vararg{Vector{Int}}})). That's a consequence of Base.IteratorEltype not be inferable in that case:

julia> code_warntype(Base.IteratorEltype, Tuple{Tuple{Vararg{Vector{Int}}}})
Body::Union{EltypeUnknown, HasEltype}
123 1%1 = (Base.typeof)(x)::Type{#s55} where #s55<:Tuple{Vararg{Array{Int64,1},N} where N}                   │%2 = (isa)(%1, Type{Union{}})::Bool                                                                     │
    └──      goto #3 if not %2                                                                                  │
    2 ─      goto #4                                                                                            │
    3%5 = (Base.IteratorEltype)(%1)::Union{EltypeUnknown, HasEltype}                                         │
    └──      goto #4                                                                                            │
    4%7 = φ (#2 => $(QuoteNode(Base.EltypeUnknown())), #3 => %5)::Union{EltypeUnknown, HasEltype}            │
    └──      return %7     

Removing this method, added in #29112:

IteratorEltype(::Type{Union{}}) = EltypeUnknown()

we get:

julia> code_warntype(Base.IteratorEltype, Tuple{Tuple{Vararg{Vector{Int}}}})
Body::Base.HasEltype
123 1return $(QuoteNode(Base.HasEltype()))     

and performance of the original example is back at 0.7 levels. Not sure what to do here.

@StefanKarpinski
Copy link
Sponsor Member

@JeffBezanson, any ideas how to do better here?

@KristofferC
Copy link
Sponsor Member

Fixed #29548

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:potential benchmark Could make a good benchmark in BaseBenchmarks kind:regression Regression in behavior compared to a previous version performance Must go faster
Projects
None yet
Development

No branches or pull requests

5 participants