Regression in broadcast assignment to a SlowSubArray on nightly #53158

Open
jishnub opened this issue Feb 2, 2024 · 2 comments
Labels
domain:arrays ([a, r, r, a, y, s]), kind:regression (Regression in behavior compared to a previous version), performance (Must go faster), regression 1.11

Comments


jishnub commented Feb 2, 2024

On v1.10.0

julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  293.806 ms … 296.067 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     294.500 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   294.634 ms ± 639.044 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁    ▁▁  ▁ ▁▁▁▁   ▁  ▁ ▁  █   ▁        ▁                  ▁ ▁  
  █▁▁▁▁██▁▁█▁████▁▁▁█▁▁█▁█▁▁█▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁█ ▁
  294 ms           Histogram: frequency by time          296 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Compare with v"1.11.0-DEV.1442", and also the current master (d54a455):

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 10 samples with 1 evaluation.
 Range (min … max):  547.709 ms … 551.888 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     548.422 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   548.925 ms ±   1.418 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █      ▁▁▁▁  ▁▁                                   ▁         ▁  
  █▁▁▁▁▁▁████▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█ ▁
  548 ms           Histogram: frequency by time          552 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

versioninfo:

julia> versioninfo()
Julia Version 1.11.0-DEV.1442
Commit c16472b0014 (2024-02-01 14:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
  JULIA_EDITOR = subl

Curiously, profiling points to the integer comparison checks performed while iterating over CartesianIndices as the most expensive step:

julia> @bprofile $a[1:end, 1:end] .= $b;

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎4638 @Base/client.jl:535; _start()
    ╎ 4638 @Base/client.jl:561; repl_main
    ╎  4638 @Base/client.jl:424; run_main_repl(interactive::Bool, quiet::Bool, banner::Symbol, history_file::Bool, color_set::Bool)
    ╎   4638 @Base/essentials.jl:1017; invokelatest
    ╎    4638 @Base/essentials.jl:1020; #invokelatest#2
    ╎     4638 @Base/client.jl:440; (::Base.var"#1100#1102"{Bool, Symbol, Bool})(REPL::Module)
    ╎    ╎ 4638 a-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:447; run_repl(repl::REPL.AbstractREPL, consumer::Any)
    ╎    ╎  4638 -master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:461; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::
    ╎    ╎   4638 -master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:302; kwcall(::NamedTuple, ::typeof(REPL.start_repl_backend), backend::REPL.REPLBackend, consu
    ╎    ╎    4638 -master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:305; start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
    ╎    ╎     4638 master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:320; repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
    ╎    ╎    ╎ 4638 master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:224; eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
  13╎    ╎    ╎  4638 @Base/boot.jl:428; eval
    ╎    ╎    ╎   4624 @BenchmarkTools/src/execution.jl:126; run(b::BenchmarkTools.Benchmark)
    ╎    ╎    ╎    4624 @BenchmarkTools/src/execution.jl:126; run
    ╎    ╎    ╎     4624 @BenchmarkTools/src/execution.jl:134; run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, n
    ╎    ╎    ╎    ╎ 4624 @BenchmarkTools/src/execution.jl:40; run_result
    ╎    ╎    ╎    ╎  4624 @BenchmarkTools/src/execution.jl:41; #run_result#45
    ╎    ╎    ╎    ╎   4624 @Base/essentials.jl:1017; invokelatest
  24╎    ╎    ╎    ╎    4624 @Base/essentials.jl:1020; #invokelatest#2
    ╎    ╎    ╎    ╎     2    @Base/compiler/typeinfer.jl:1073; typeinf_ext_toplevel(mi::Core.MethodInstance, world::UInt64)
    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/typeinfer.jl:1077; typeinf_ext_toplevel(interp::Core.Compiler.NativeInterpreter, mi::Core.MethodInstance)
    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/typeinfer.jl:1039; typeinf_ext(interp::Core.Compiler.NativeInterpreter, mi::Core.MethodInstance)
    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/typeinfer.jl:216; typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
    ╎    ╎    ╎    ╎    ╎    2    @Base/compiler/typeinfer.jl:246; _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
    ╎    ╎    ╎    ╎    ╎     2    @Base/compiler/abstractinterpretation.jl:3373; typeinf_nocycle(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.Infere
    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/abstractinterpretation.jl:3295; typeinf_local(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.Inferen
    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/abstractinterpretation.jl:3041; abstract_eval_basic_statement(interp::Core.Compiler.NativeInterpreter, stmt::Any, 
    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/abstractinterpretation.jl:2730; abstract_eval_statement(interp::Core.Compiler.NativeInterpreter, e::Any, vtypes::
    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/compiler/abstractinterpretation.jl:2425; abstract_eval_statement_expr(interp::Core.Compiler.NativeInterpreter, e::Expr, v
    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/compiler/abstractinterpretation.jl:2409; abstract_eval_call(interp::Core.Compiler.NativeInterpreter, e::Expr, vtypes::Ve
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/abstractinterpretation.jl:2394; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::Core.Compiler.
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/abstractinterpretation.jl:2249; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::Core.Compiler
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/abstractinterpretation.jl:2256; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::Core.Compile
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/compiler/abstractinterpretation.jl:2174; abstract_call_known(interp::Core.Compiler.NativeInterpreter, f::Any, arginf
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/compiler/abstractinterpretation.jl:102; abstract_call_gf_by_type(interp::Core.Compiler.NativeInterpreter, f::Any, a
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/abstractinterpretation.jl:650; abstract_call_method(interp::Core.Compiler.NativeInterpreter, method::Meth
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/typeinfer.jl:867; typeinf_edge(interp::Core.Compiler.NativeInterpreter, method::Method, atype::Any, spar
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/typeinfer.jl:216; typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/typeinfer.jl:246; _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceStat
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/abstractinterpretation.jl:3373; typeinf_nocycle(interp::Core.Compiler.NativeInterpreter, frame::Core.
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/abstractinterpretation.jl:3295; typeinf_local(interp::Core.Compiler.NativeInterpreter, frame::Core.C
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/abstractinterpretation.jl:3041; abstract_eval_basic_statement(interp::Core.Compiler.NativeInterpret
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/abstractinterpretation.jl:2730; abstract_eval_statement(interp::Core.Compiler.NativeInterpreter, e
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    ase/compiler/abstractinterpretation.jl:2425; abstract_eval_statement_expr(interp::Core.Compiler.NativeInterpret
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    ase/compiler/abstractinterpretation.jl:2409; abstract_eval_call(interp::Core.Compiler.NativeInterpreter, e::Ex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    se/compiler/abstractinterpretation.jl:2394; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::C
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    se/compiler/abstractinterpretation.jl:2249; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    se/compiler/abstractinterpretation.jl:2256; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo:
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    e/compiler/abstractinterpretation.jl:2174; abstract_call_known(interp::Core.Compiler.NativeInterpreter, f:
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    e/compiler/abstractinterpretation.jl:111; abstract_call_gf_by_type(interp::Core.Compiler.NativeInterprete
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    e/compiler/abstractinterpretation.jl:813; abstract_call_method_with_const_args(interp::Core.Compiler.Nat
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    /compiler/abstractinterpretation.jl:837; abstract_call_method_with_const_args(interp::Core.Compiler.Nat
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /compiler/abstractinterpretation.jl:1201; semi_concrete_eval_call(interp::Core.Compiler.NativeInterpre
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/ssair/irinterp.jl:440; ir_abstract_constant_propagation(interp::Core.Compiler.NativeInt
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/ssair/irinterp.jl:280; _ir_abstract_constant_propagation(interp::Core.Compiler.NativeI
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/ssair/irinterp.jl:294; _ir_abstract_constant_propagation(interp::Core.Compiler.Native
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/ssair/irinterp.jl:248; scan!(callback::Core.Compiler.var"#559#562"{Nothing, Core.Com
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/ssair/irinterp.jl:326; (::Core.Compiler.var"#559#562"{Nothing, Core.Compiler.Native
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/ssair/irinterp.jl:141; reprocess_instruction!(interp::Core.Compiler.NativeInterpre
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    mpiler/abstractinterpretation.jl:2428; abstract_eval_statement_expr(interp::Core.Compiler.Nativ
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/tfuncs.jl:99; instanceof_tfunc(t::Any, astag::Bool)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/tfuncs.jl:100; instanceof_tfunc(t::Any, astag::Bool, troot::Core.Const)
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/typeutils.jl:115; valid_as_lattice(x::Any, astag::Bool)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/typeinfer.jl:264; _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceStat
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/optimize.jl:950; optimize(interp::Core.Compiler.NativeInterpreter, opt::Core.Compiler.OptimizationSta
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/optimize.jl:976; run_passes_ipo_safe
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/optimize.jl:961; run_passes_ipo_safe(ci::Core.CodeInfo, sv::Core.Compiler.OptimizationState{Core.Co
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/ssair/passes.jl:2037; adce_pass!(ir::Core.Compiler.IRCode, inlining::Core.Compiler.InliningState{C
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/ssair/ir.jl:1725; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/ssair/ir.jl:1802; iterate_compact(compact::Core.Compiler.IncrementalCompact)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/ssair/ir.jl:276; setindex!
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/array.jl:972; setindex!
    ╎    ╎    ╎    ╎     4598 @BenchmarkTools/src/execution.jl:102; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎ 510  @BenchmarkTools/src/execution.jl:109; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwarg
    ╎    ╎    ╎    ╎    ╎  510  @BenchmarkTools/src/execution.jl:556; var"##sample#224"(::Tuple{Matrix{Float64}, Matrix{Float64}}, __params::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎   510  @BenchmarkTools/src/execution.jl:547; var"##core#223"(a#221::Matrix{Float64}, b#222::Matrix{Float64})
    ╎    ╎    ╎    ╎    ╎    510  @Base/broadcast.jl:875; materialize!
    ╎    ╎    ╎    ╎    ╎     510  @Base/broadcast.jl:878; materialize!
    ╎    ╎    ╎    ╎    ╎    ╎ 510  @Base/broadcast.jl:920; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎  510  @Base/broadcast.jl:961; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎   510  @Base/abstractarray.jl:1061; copyto!
  60╎    ╎    ╎    ╎    ╎    ╎    60   @Base/abstractarray.jl:0; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRa
    ╎    ╎    ╎    ╎    ╎    ╎    64   @Base/abstractarray.jl:1116; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni
    ╎    ╎    ╎    ╎    ╎    ╎     64   @Base/abstractarray.jl:1411; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 64   @Base/abstractarray.jl:1441; _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  64   @Base/subarray.jl:366; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   64   @Base/array.jl:979; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    64   @Base/abstractarray.jl:1345; _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     64   @Base/abstractarray.jl:2975; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 64   @Base/abstractarray.jl:2991; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  64   @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   64   @Base/abstractarray.jl:3007; _sub2ind_recurse
  64╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    64   @Base/int.jl:88; *
    ╎    ╎    ╎    ╎    ╎    ╎    386  @Base/abstractarray.jl:1120; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni
    ╎    ╎    ╎    ╎    ╎    ╎     386  @Base/multidimensional.jl:422; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 62   @Base/multidimensional.jl:446; __inc
  62╎    ╎    ╎    ╎    ╎    ╎    ╎  62   @Base/int.jl:87; +
   1╎    ╎    ╎    ╎    ╎    ╎    ╎ 324  @Base/multidimensional.jl:447; __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  323  @Base/operators.jl:276; !=
 323╎    ╎    ╎    ╎    ╎    ╎    ╎   323  @Base/promotion.jl:620; ==
    ╎    ╎    ╎    ╎    ╎ 4088 @BenchmarkTools/src/execution.jl:115; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwarg
    ╎    ╎    ╎    ╎    ╎  4088 @BenchmarkTools/src/execution.jl:556; var"##sample#224"(::Tuple{Matrix{Float64}, Matrix{Float64}}, __params::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎   4088 @BenchmarkTools/src/execution.jl:547; var"##core#223"(a#221::Matrix{Float64}, b#222::Matrix{Float64})
    ╎    ╎    ╎    ╎    ╎    4088 @Base/broadcast.jl:875; materialize!
    ╎    ╎    ╎    ╎    ╎     4088 @Base/broadcast.jl:878; materialize!
    ╎    ╎    ╎    ╎    ╎    ╎ 4088 @Base/broadcast.jl:920; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎  4088 @Base/broadcast.jl:961; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎   4088 @Base/abstractarray.jl:1061; copyto!
 488╎    ╎    ╎    ╎    ╎    ╎    488  @Base/abstractarray.jl:0; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRa
    ╎    ╎    ╎    ╎    ╎    ╎    449  @Base/abstractarray.jl:1116; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni
    ╎    ╎    ╎    ╎    ╎    ╎     448  @Base/abstractarray.jl:1411; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 448  @Base/abstractarray.jl:1441; _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  448  @Base/subarray.jl:366; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   445  @Base/array.jl:979; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    445  @Base/abstractarray.jl:1345; _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     445  @Base/abstractarray.jl:2975; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 445  @Base/abstractarray.jl:2991; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  445  @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   445  @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/abstractarray.jl:3014; offsetin
   2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/int.jl:86; -
 439╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    439  @Base/int.jl:88; *
   4╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    4    @Base/int.jl:87; +
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   3    @Base/subarray.jl:293; reindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    3    @Base/array.jl:3058; getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     3    @Base/range.jl:932; _getindex
   3╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 3    @Base/int.jl:87; +
   1╎    ╎    ╎    ╎    ╎    ╎     1    @Base/essentials.jl:882; getindex
    ╎    ╎    ╎    ╎    ╎    ╎    3151 @Base/abstractarray.jl:1120; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni
    ╎    ╎    ╎    ╎    ╎    ╎     3151 @Base/multidimensional.jl:422; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 499  @Base/multidimensional.jl:446; __inc
 499╎    ╎    ╎    ╎    ╎    ╎    ╎  499  @Base/int.jl:87; +
   3╎    ╎    ╎    ╎    ╎    ╎    ╎ 2652 @Base/multidimensional.jl:447; __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2649 @Base/operators.jl:276; !=
2649╎    ╎    ╎    ╎    ╎    ╎    ╎   2649 @Base/promotion.jl:620; ==
Total snapshots: 4645. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
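
For context, the hot frames above (`__inc` at multidimensional.jl:447 and `_sub2ind`) correspond to stepping a `CartesianIndex` and converting it to a linear offset for every element. Below is a rough sketch of that per-element work, assuming the destination of `a[1:end, 1:end] .= b` is a `SubArray` with `IndexCartesian` indexing (as the `copyto_unaliased!` signature in the profile suggests). The loop is a simplified stand-in, not the actual `Base` implementation, and `cartesian_copy!` is a hypothetical name.

```julia
# Hypothetical helper: a simplified stand-in for the element-by-element copy
# that the profile attributes to copyto_unaliased!; not the Base code itself.
function cartesian_copy!(dest::AbstractMatrix, src::AbstractMatrix)
    axes(dest) == axes(src) || throw(DimensionMismatch("axes must match"))
    # Iterating CartesianIndices increments the index (the __inc frames), and
    # each dest[I] = ... maps I to a linear offset in the parent (_sub2ind).
    for I in CartesianIndices(dest)
        @inbounds dest[I] = src[I]
    end
    return dest
end

a = zeros(4, 3); b = rand(4, 3)
v = view(a, 1:size(a, 1), 1:size(a, 2))  # same ranges as a[1:end, 1:end]
IndexStyle(v)          # IndexCartesian() for a view with two UnitRange indices
cartesian_copy!(v, b)
a == b                 # true: the view writes through to the parent array
```

The small arrays here only illustrate the code path; they are not meant for timing.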
jishnub added the performance, kind:regression, and domain:arrays labels Feb 2, 2024

jishnub commented Feb 2, 2024

Bisected to 9aa7980:

9aa7980358349ee7017fa614525f571ffa92c55d is the first bad commit
commit 9aa7980358349ee7017fa614525f571ffa92c55d
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Fri Nov 17 13:58:01 2023 -0500

    codegen: ensure i1 bool is widened to i8 before storing (#52189)
    
    Teach value_to_pointer to convert primitive types to their stored
    representation first, to avoid exposing undef bits later (via memcpy).
    
    Take this opportunity to also generalizes the support for zext Bool to
    anywhere inside any struct for changing any bitwidth to a multiple of 8
    bytes. This would change a vector like <2 x i4> from occupying i8 to i16
    (c.f. LLVM's LangRef), if such an operation were expressible in Julia
    today. And take this opportunity to do a bit of code cleanup, now that
    codegen is better and using helpers from LLVM.
    
    Fixes #52127

 src/cgutils.cpp    |   3 --
 src/codegen.cpp    |  27 ++++--------
 src/intrinsics.cpp | 119 ++++++++++++++++++++++++++++++++++++-----------------
 test/llvmcall2.jl  |   9 ++++
 4 files changed, 98 insertions(+), 60 deletions(-)

On this commit,

julia> a = zeros(4000,4000); b = rand(size(a)...);

julia> @btime $a[1:end,1:end] .= $b;
  61.351 ms (0 allocations: 0 bytes)

vs on 045b6f9:

julia> @btime $a[1:end,1:end] .= $b;
  20.189 ms (0 allocations: 0 bytes)
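
To repeat this comparison on a build where BenchmarkTools is not installed (for example, while bisecting), a rough sketch using only `Base` is given below. `assign!` and `time_assign` are hypothetical helpers. The timing takes the minimum of a few `@elapsed` samples after a warm-up call, so it is coarser than `@btime`, and no timings are claimed here.

```julia
# Hypothetical helpers for timing the assignment with Base only.
function assign!(a, b)
    a[1:end, 1:end] .= b   # the broadcast assignment under investigation
    return a
end

function time_assign(a, b; samples::Int = 5)
    assign!(a, b)          # warm-up call so compilation is not measured
    # Minimum over a few samples, in seconds.
    return minimum(@elapsed(assign!(a, b)) for _ in 1:samples)
end

a = zeros(4000, 4000); b = rand(size(a)...)
t = time_assign(a, b)
println("min time: ", round(t * 1e3; digits = 3), " ms on ", VERSION)
```

Running the same script on each candidate commit gives numbers that can be compared directly, provided the machine is otherwise idle.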

vtjnash pushed a commit that referenced this issue Feb 20, 2024
…3383)

With this, the following (and equivalent calls) work:
```julia
julia> copyto!(view(zeros(BigInt, 2), 1:2), Vector{BigInt}(undef,2))
2-element view(::Vector{BigInt}, 1:2) with eltype BigInt:
 #undef
 #undef

julia> copyto!(view(zeros(BigInt, 2), 1:2), view(Vector{BigInt}(undef,2), 1:2))
2-element view(::Vector{BigInt}, 1:2) with eltype BigInt:
 #undef
 #undef
```

Close #53098. With this, all
the `_unsetindex!` branches in `copyto_unaliased!` work for
`Array`-views, and this makes certain indexing operations vectorize and
speed up:
```julia
julia> using BenchmarkTools

julia> a = view(rand(100,100), 1:100, 1:100); b = view(similar(a), axes(a)...);

julia> @btime copyto!($b, $a);
  16.427 μs (0 allocations: 0 bytes) # master
  2.308 μs (0 allocations: 0 bytes) # PR
``` 

Improves (but doesn't resolve) #40962 and #53158

```julia
julia> a = rand(40,40); b = rand(40,40);

julia> @btime $a[1:end,1:end] .= $b;
  5.383 μs (0 allocations: 0 bytes) # v"1.12.0-DEV.16"
  3.194 μs (0 allocations: 0 bytes) # PR
```
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
tecosaur pushed a commit to tecosaur/julia that referenced this issue Mar 4, 2024
…liaLang#53383)

oscardssmith added this to the 1.11 milestone Mar 6, 2024
mkitti pushed a commit to mkitti/julia that referenced this issue Mar 7, 2024
…liaLang#53383)

KristofferC pushed a commit that referenced this issue Mar 27, 2024
…3383)


(cherry picked from commit 1a90409)

jishnub commented May 14, 2024

This seems to have regressed on the current nightly (v"1.12.0-DEV.528").
On v"1.11.0-beta1":

julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  311.599 ms … 332.538 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     313.798 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   315.770 ms ±   5.354 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▁█▁▁▁██   ▁         ▁ ▁     ▁                              ▁  
  ████████▁▁▁█▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  312 ms           Histogram: frequency by time          333 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

vs on nightly:

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 12 samples with 1 evaluation.
 Range (min … max):  448.373 ms … 452.969 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     450.255 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   450.305 ms ±   1.671 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ██ ██                █  █      █    █           █   █     █  
  █▁██▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁█ ▁
  448 ms           Histogram: frequency by time          453 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> VERSION
v"1.12.0-DEV.528"
