Performance disparity in expression operations #85

Open
jd-lara opened this issue May 26, 2021 · 33 comments

@jd-lara
Contributor

jd-lara commented May 26, 2021

I made a test of the performance impact of having parameters in an affine expression, for different numbers of expressions in the vector product operation.

The test compares 3 possible cases:

  1. Expressions with parameters: ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}
  2. Expressions without parameters: JuMP.GenericAffExpr{Float64, JuMP.VariableRef}
  3. A mix of expressions using ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef} and JuMP.GenericAffExpr{Float64, JuMP.VariableRef}

Here is some reproducible code:

using JuMP
using ParameterJuMP
using Random

function model_with_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
    end

    col = rand(size_1,)
    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed @constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_without_params(size_1, size_2)
    m2 = Model()

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m2, [v in set_names_1, t in time])
    x = @variable(m2, [v in set_names_2, t in time])

    expression_container = JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
    end

    col = rand(size_1,)
    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed @constraint(m2, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_with_mixed_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    param_ixs = shuffle(1:size_1)[1:div(size_1, 2)]  # half the rows get a parameter
    for i in 1:size_1, t in 1:size_2
        if i in param_ixs
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
        else
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
        end
    end

    col = rand(size_1,)
    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed @constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)
    return build_time, build_bytes_alloc, build_sec_in_gc
end


model_with_params(10, 48)
model_without_params(10, 48)
model_with_mixed_params(10, 48)

results = Dict()
results_params = Dict()
results_mixed = Dict()
for i in [100, 1000, 5000, 10_000, 50_000, 100_000, 500_000]
    println(i)
    @show results[i] = model_without_params(i, 48)
    @show results_params[i] = model_with_params(i, 48)
    @show results_mixed[i] = model_with_mixed_params(i, 48)
end

Results:

100
results[i] = model_without_params(i, 48) = (0.00015558, 29488, 0.0)
results_params[i] = model_with_params(i, 48) = (0.000182823, 76048, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.000172758, 67904, 0.0)
1000
results[i] = model_without_params(i, 48) = (0.000626027, 197072, 0.0)
results_params[i] = model_with_params(i, 48) = (0.001597523, 560976, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.001377474, 462640, 0.0)
5000
results[i] = model_without_params(i, 48) = (0.00356614, 1241216, 0.0)
results_params[i] = model_with_params(i, 48) = (0.008178023, 3541104, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.006022408, 2940048, 0.0)
10000
results[i] = model_without_params(i, 48) = (0.006917794, 2477760, 0.0)
results_params[i] = model_with_params(i, 48) = (0.019982686, 7071424, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.014490844, 6131968, 0.0)
50000
results[i] = model_without_params(i, 48) = (0.048876917, 10544256, 0.0)
results_params[i] = model_with_params(i, 48) = (0.234299942, 31453584, 0.11880938)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.322152996, 25348384, 0.203175162)
100000
results[i] = model_without_params(i, 48) = (0.106661881, 18983120, 0.0)
results_params[i] = model_with_params(i, 48) = (7.070552948, 54502512, 6.839998649)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.384903389, 48584096, 0.173239921)
@jd-lara
Contributor Author

jd-lara commented May 26, 2021

The 500,000 case finished in 4 seconds for GenericAffExpr and crashed my Julia session for the cases with parameters.

@odow
Member

odow commented May 26, 2021

I think your results are getting mixed up with when the GC runs.

using JuMP
using ParameterJuMP
using Random

function model_with_params(m)
    model = Model()
    @variable(model, x[1:m])
    @variable(model, y[1:m])
    @variable(model, p[1:m] == 1.0, ParameterJuMP.Param())
    @expression(model, expr[i=1:m], x[i] + y[i] + p[i] + rand())
    col = rand(m)
    @time @constraint(model, sum(col[i] * expr[i] for i in 1:m) == 0.0)
    return
end

function model_without_params(m)
    model = Model()
    @variable(model, x[1:m])
    @variable(model, y[1:m])
    @variable(model, p[1:m] == 1.0)
    @expression(model, expr[i=1:m], x[i] + y[i] + p[i] + rand())
    col = rand(m)
    @time @constraint(model, sum(col[i] * expr[i] for i in 1:m) == 0.0)
    return
end

model_with_params(10)
model_without_params(10)
for i in [100, 1000, 10_000, 50_000, 100_000, 500_000]
    println(i)
    GC.gc()
    model_without_params(i)
    GC.gc()
    model_with_params(i)
end

Yields

100
  0.000066 seconds (33 allocations: 35.188 KiB)
  0.000130 seconds (102 allocations: 69.453 KiB)
1000
  0.000607 seconds (45 allocations: 299.047 KiB)
  0.001099 seconds (137 allocations: 543.016 KiB)
10000
  0.006241 seconds (57 allocations: 3.534 MiB)
  0.027691 seconds (184 allocations: 6.740 MiB, 53.47% gc time)
50000
  0.038874 seconds (65 allocations: 13.946 MiB)
  0.072116 seconds (214 allocations: 29.975 MiB)
100000
  0.081927 seconds (70 allocations: 25.032 MiB)
  0.178597 seconds (228 allocations: 51.979 MiB)
500000
  0.631749 seconds (79 allocations: 116.645 MiB)
  1.029031 seconds (259 allocations: 202.852 MiB)

So there is a ~2x penalty to using parameters, which makes sense: every expression has two dictionaries.
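
For reference, the parametrized expression type pairs two single-dict expressions (the ParametrizedGenericAffExpr used above appears to be an alias of DoubleGenericAffExpr), so every expression carries two OrderedDicts plus two constants, roughly:

mutable struct DoubleGenericAffExpr{C,V,P} <: JuMP.AbstractJuMPScalar
    v::JuMP.GenericAffExpr{C,V}   # variable half: a constant plus an OrderedDict of terms
    p::JuMP.GenericAffExpr{C,P}   # parameter half: a second constant plus a second OrderedDict
end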

@jd-lara
Contributor Author

jd-lara commented May 26, 2021

Thanks, that makes sense.

@odow
Member

odow commented May 26, 2021

@jd-lara can you try the following instead?

Replace

@constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)

with

ex = @expression(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
c = ScalarConstraint(ex, MOI.EqualTo(0.0))
add_constraint(m, c, "")

@jd-lara
Contributor Author

jd-lara commented May 28, 2021

@odow I have updated the test as follows:

using JuMP
using ParameterJuMP
using Random

function _make_constraint(model::JuMP.Model, col::Vector{Float64}, expression_container::JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}, size_1::Int)
    ex = @expression(model, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
    c = ScalarConstraint(ex, MOI.EqualTo(0.0))
    add_constraint(model, c, "")
end

function _make_constraint(model::JuMP.Model, col::Vector{Float64}, expression_container::JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}, size_1::Int)
    ex = @expression(model, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
    c = ScalarConstraint(ex, MOI.EqualTo(0.0))
    add_constraint(model, c, "")
end

function model_with_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
    end

    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed _make_constraint(m, rand(size_1,), expression_container, size_1)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_without_params(size_1, size_2)
    m2 = Model()

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m2, [v in set_names_1, t in time])
    x = @variable(m2, [v in set_names_2, t in time])

    expression_container = JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
    end

    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed _make_constraint(m2, rand(size_1,), expression_container, size_1)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_with_mixed_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    param_ixs = shuffle(1:size_1)[1:div(size_1, 2)]  # half the rows get a parameter
    for i in 1:size_1, t in 1:size_2
        if i in param_ixs
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
        else
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
        end
    end


    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed _make_constraint(m, rand(size_1,), expression_container, size_1)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

# First run to trigger compilation
model_with_params(10, 48)
model_without_params(10, 48)
model_with_mixed_params(10, 48)

results = Dict()
results_params = Dict()
results_mixed = Dict()
for i in [100, 1000, 5000, 10_000, 50_000, 100_000]
    println(i)
    @show results[i] = model_without_params(i, 48)
    GC.gc()
    @show results_params[i] = model_with_params(i, 48)
    GC.gc()
    @show results_mixed[i] = model_with_mixed_params(i, 48)
end

Looks like up to 10,000 expressions the penalty is about 2x, as expected, and over 50,000 the penalty is 4x or more:

100
results[i] = model_without_params(i, 48) = (9.7006e-5, 30384, 0.0)
results_params[i] = model_with_params(i, 48) = (0.000171875, 76912, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.000131348, 68880, 0.0)
1000
results[i] = model_without_params(i, 48) = (0.000831869, 205200, 0.0)
results_params[i] = model_with_params(i, 48) = (0.001245529, 569072, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.001065928, 470624, 0.0)
5000
results[i] = model_without_params(i, 48) = (0.003808976, 1281296, 0.0)
results_params[i] = model_with_params(i, 48) = (0.008973684, 3581152, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.005882702, 2980096, 0.0)
10000
results[i] = model_without_params(i, 48) = (0.008080068, 2557840, 0.0)
results_params[i] = model_with_params(i, 48) = (0.01753953, 7151472, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.01423789, 6212016, 0.0)
50000
results[i] = model_without_params(i, 48) = (0.050672775, 10944336, 0.0)
results_params[i] = model_with_params(i, 48) = (0.228111582, 31853632, 0.116043022)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.085427278, 25748320, 0.0)
100000
results[i] = model_without_params(i, 48) = (0.117079187, 19783200, 0.0)
results_params[i] = model_with_params(i, 48) = (0.428364249, 55302560, 0.13931007)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.668327524, 49384144, 0.377572102)

@odow
Member

odow commented May 28, 2021

That's all just GC differences though. If you subtract the GC time, you get 2.5x again:

julia> (0.428364249 - 0.13931007) / 0.117079187
2.4688775725782923

julia> (0.228111582 - 0.116043022) / 0.050672775
2.211612843385822

julia> (0.668327524 - 0.377572102) / 0.117079187
2.4834082764855543

@jd-lara
Contributor Author

jd-lara commented May 28, 2021

So the question is how to reduce the GC times.

@odow
Member

odow commented May 28, 2021

It'd be good to confirm why the 2x exists first. Is it creating the extra dictionary? Extra key lookups? Resizing the two dictionaries?

One option that might be faster is to refactor everything into a single dictionary with Tuple{Bool,VariableRef} keys.

That might also help GC by creating fewer dictionaries.
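
A minimal sketch of that single-dictionary layout, assuming parameters are addressed through the same VariableRef-style handle (CombinedAffExpr is an illustration, not ParameterJuMP's API):

using DataStructures

struct CombinedAffExpr{C,V}
    constant::C
    # key = (is_parameter, ref): false tags a variable term, true a parameter term
    terms::OrderedDict{Tuple{Bool,V},C}
end

# e.g. terms[(false, x)] would hold the coefficient of variable x,
# and terms[(true, p)] the coefficient of parameter p.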

@joaquimg
Collaborator

joaquimg commented Jun 3, 2021

I don't understand why having two dicts would be much worse than having a single larger one.
Did you run any profiling?

@jd-lara
Contributor Author

jd-lara commented Jun 3, 2021

I have been using @code_warntype to check type stability, but I haven't explored further since opening this issue.

@joaquimg
Collaborator

joaquimg commented Jun 7, 2021

Have you tried: https://github.com/timholy/ProfileView.jl ?

@jd-lara
Contributor Author

jd-lara commented Jun 7, 2021

@joaquimg I made some observations with @kdheepak using pprof(), but we haven't found anything definitive.

@jd-lara
Contributor Author

jd-lara commented Jun 16, 2021

Profile using Parameters:

[profile screenshot]

Profile without parameters:

[profile screenshot]

@odow
Member

odow commented Jun 17, 2021

Okay, I have an MWE:

julia> using JuMP, ParameterJuMP

julia> function foo(M, N; param)
           model = Model()
           @variable(model, x[1:N])
           if param
               @variable(model, p[1:N], ParameterJuMP.Param())
           else
               @variable(model, p[1:N])
           end
           @expression(model, y[i=1:N], x[i] + p[i])
           for j in 1:M
               @constraint(model, sum(j * i * y[i] for i in 1:N) == 0)
           end
       end
foo (generic function with 2 methods)

julia> M, N = 1000, 100
(1000, 100)

julia> GC.gc(); @time foo(M, N; param = true);
  0.219966 seconds (432.51 k allocations: 60.510 MiB, 4.07% gc time, 72.76% compilation time)

julia> GC.gc(); @time foo(N, N; param = false);
  0.023459 seconds (19.14 k allocations: 3.442 MiB, 87.16% compilation time)

There's a big difference in allocations there.

@odow
Member

odow commented Jun 17, 2021

I think I understand the problem:

julia> using JuMP, ParameterJuMP

julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> @variable(model, x[1:2])
2-element Vector{VariableRef}:
 x[1]
 x[2]

julia> @variable(model, p, ParameterJuMP.Param())
p

julia> f(x) = ScalarConstraint(x, MOI.EqualTo(0.0))
f (generic function with 1 method)

julia> A = x[1] + x[2]
x[1] + x[2]

julia> B = x[1] + p
x[1] + p

julia> @benchmark f($A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.665 ns (0.00% GC)
  median time:      1.845 ns (0.00% GC)
  mean time:        1.813 ns (0.00% GC)
  maximum time:     20.717 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark f($B)
BenchmarkTools.Trial: 
  memory estimate:  1.56 KiB
  allocs estimate:  19
  --------------
  minimum time:     669.399 ns (0.00% GC)
  median time:      696.278 ns (0.00% GC)
  mean time:        917.536 ns (18.24% GC)
  maximum time:     81.787 μs (98.82% GC)
  --------------
  samples:          10000
  evals/sample:     153

This is Julia's heap/stack optimization in practice. ScalarConstraint is an immutable struct, but it contains fields that reference mutable heap objects. Normally, constructing it requires a heap allocation. But Julia recently (I don't remember when, @blegat? Since v1.5?) introduced an optimization that means in some cases it will keep the entire thing on the stack, skipping allocations. I guess that heuristic distinguishes between 1 dictionary and 2!

The heap allocations explain why there is much more GC time in the parameter version, and why creating the parametrized constraint is slower, even though we don't yet have a good explanation for why the heuristic treats the two cases differently.
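
A hedged illustration of that optimization in isolation (whether the allocation is elided depends on the Julia version and inlining):

struct Holder
    x::Vector{Float64}  # field referencing a heap-allocated object
    c::Float64
end

make_holder(x) = Holder(x, 0.0)

v = [1.0, 2.0]
make_holder(v)             # warm up / compile
@allocated make_holder(v)  # expect 0 on Julia >= 1.5: the wrapper itself never hits the heap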

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

We saw something similar with structs in PowerSystems: Julia spent quite some time moving them between the stack and the heap, so we made them mutable to prevent that. @daniel-thom, do you recall how we measured that?

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

For reference, it seems that this issue could be related to JuliaLang/julia#34847. Nope, this seems to have been solved in JuliaLang/julia#34126.

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

For reference, here is the result of @code_llvm for each case in the example above:

julia> @code_llvm f(A)
;  @ REPL[8]:1 within `f'
define void @julia_f_3849({ {}*, [1 x double] }* noalias nocapture sret %0, [1 x {}*]* noalias nocapture %1, {}* nonnull align 8 dereferenceable(16) %2) {
top:
  %3 = getelementptr inbounds [1 x {}*], [1 x {}*]* %1, i64 0, i64 0
  store {}* %2, {}** %3, align 8
  %.repack = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 0
  store {}* %2, {}** %.repack, align 8
  %4 = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 1, i64 0
  store double 0.000000e+00, double* %4, align 8
  ret void
}
julia> @code_llvm f(B)
;  @ REPL[8]:1 within `f'
define void @julia_f_3834({ {}*, [1 x double] }* noalias nocapture sret %0, [1 x {}*]* noalias nocapture %1, {}* nonnull align 8 dereferenceable(16) %2) {
top:
  %3 = alloca [2 x {}*], align 8
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %3, i64 0, i64 0
; ┌ @ /Users/jdlara/cache/JuMP.jl/src/constraints.jl:437 within `ScalarConstraint' @ /Users/jdlara/cache/JuMP.jl/src/constraints.jl:437
   store {}* inttoptr (i64 5878957680 to {}*), {}** %.sub, align 8
   %4 = getelementptr inbounds [2 x {}*], [2 x {}*]* %3, i64 0, i64 1
   store {}* %2, {}** %4, align 8
   %5 = call nonnull {}* @j1_convert_3836({}* inttoptr (i64 4637115040 to {}*), {}** nonnull %.sub, i32 2)
; └
  %6 = getelementptr inbounds [1 x {}*], [1 x {}*]* %1, i64 0, i64 0
  store {}* %5, {}** %6, align 8
  %.repack = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 0
  store {}* %5, {}** %.repack, align 8
  %7 = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 1, i64 0
  store double 0.000000e+00, double* %7, align 8
  ret void
}

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

@odow I can't seem to reproduce the issue outside of JuMP:

using BenchmarkTools
using DataStructures

struct fooOD{T, U, V}
    a::OrderedDict{T, V}
    b::OrderedDict{U, V}
end

struct barOD{T, V}
   a::OrderedDict{T, V}
end

struct FooBar{F,S}
    a::F
    b::S
end

f(x) = FooBar(x, 1.0)

test_foo_od = fooOD(OrderedDict("A" => 10.0, "B" => 11.0), OrderedDict(3 => 13.0, 4 => 19.0))
test_bar_od = barOD(OrderedDict(1 => 10.0, 2 => 11.0, 3 => 13.0, 4 => 19.0))

@benchmark f($test_bar_od)
@benchmark f($test_foo_od)


bar: one dict
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.664 ns (0.00% GC)
  median time:      2.064 ns (0.00% GC)
  mean time:        2.210 ns (0.00% GC)
  maximum time:     53.048 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

foo: two dicts
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.662 ns (0.00% GC)
  median time:      2.051 ns (0.00% GC)
  mean time:        2.281 ns (0.00% GC)
  maximum time:     133.248 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

This is a second attempt to reproduce without any success.

using BenchmarkTools
using DataStructures

mutable struct model_
    field1::Dict
end

model_foobar = model_(Dict("key" => 12.3))

abstract type AbstractKey end

struct Key1 <: AbstractKey
    index::Int
    model::model_
end

struct Key2 <: AbstractKey
    index::Int
    model::model_
end

struct FooBar{F,S}
    a::F
    b::S
end

mutable struct fooOD{T, U, V}
    a::OrderedDict{T, V}
    b::OrderedDict{U, V}
end

mutable struct barOD{T, V}
   a::OrderedDict{T, V}
end

test_foo_od = fooOD(OrderedDict(Key1(1, model_foobar) => 10.0, Key1(2, model_foobar) => 11.0), OrderedDict(Key2(1, model_foobar) => 13.0,  Key2(2, model_foobar) => 19.0))
test_bar_od = barOD(OrderedDict(Key1(1, model_foobar) => 10.0, Key1(2, model_foobar) => 11.0, Key1(3, model_foobar) => 13.0, Key1(4, model_foobar) => 19.0))

f(x) = FooBar(x, 1.0)

@benchmark f($test_bar_od)
@benchmark f($test_foo_od)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.757 ns (0.00% GC)
  median time:      1.871 ns (0.00% GC)
  mean time:        1.994 ns (0.00% GC)
  maximum time:     33.388 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.357 ns (0.00% GC)
  median time:      1.413 ns (0.00% GC)
  mean time:        1.533 ns (0.00% GC)
  maximum time:     60.137 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@odow
Member

odow commented Jun 19, 2021

Yeah. What's weird is that the following with MyVar doesn't have the problem, even though it is identical to the definition of ParameterRef. Therefore, there must be something about the methods defined in ParameterJuMP that is causing this...

julia> using JuMP, ParameterJuMP, BenchmarkTools

julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> x = VariableRef(model, MOI.VariableIndex(1))
noname

julia> p = ParameterJuMP.ParameterRef(1, model);

julia> struct Foo{A,B}
           a::A
           b::B
       end

julia> A = 2 * x;

julia> B = 2 * p;

julia> foo(x) = Foo(x, 0.0)
foo (generic function with 1 method)

julia> @benchmark foo($A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.617 ns (0.00% GC)
  median time:      1.873 ns (0.00% GC)
  mean time:        1.929 ns (0.00% GC)
  maximum time:     31.496 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark foo($B)
BenchmarkTools.Trial: 
  memory estimate:  1.27 KiB
  allocs estimate:  16
  --------------
  minimum time:     502.902 ns (0.00% GC)
  median time:      520.249 ns (0.00% GC)
  mean time:        653.548 ns (15.73% GC)
  maximum time:     30.005 μs (97.00% GC)
  --------------
  samples:          10000
  evals/sample:     193

julia> using JuMP, ParameterJuMP, BenchmarkTools

julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> x = VariableRef(model, MOI.VariableIndex(1))
noname

julia> struct MyVar <: JuMP.AbstractVariableRef 
           x::Int64
           model::Model
       end

julia> p = MyVar(1, model);

julia> struct Foo{A,B}
           a::A
           b::B
       end

julia> A = 2 * x;

julia> B = 2 * p;

julia> foo(x) = Foo(x, 0.0)
foo (generic function with 1 method)

julia> @benchmark foo($A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.617 ns (0.00% GC)
  median time:      1.877 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     20.480 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark foo($B)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.619 ns (0.00% GC)
  median time:      1.870 ns (0.00% GC)
  mean time:        1.903 ns (0.00% GC)
  maximum time:     22.370 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@joaquimg
Collaborator

@odow's last example suggests that we can try commenting out stuff from the package to find the culprit.

@jd-lara
Contributor Author

jd-lara commented Jun 19, 2021

From profiling the same code @odow pasted:

[profiler screenshot]

[profiler screenshot]

Just the creation of 2 * ParameterRef is significantly different from 2 * VariableRef. When 2 * x gets called, it goes to JuMP._build_aff_expr, while in the case of 2 * p it creates a pair (rhs => lhs).
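
One quick way to confirm which method each product dispatches to, using the x and p from @odow's session above (a hedged sketch; the exact methods shown depend on the installed JuMP/ParameterJuMP versions):

using InteractiveUtils  # provides @which (loaded automatically in the REPL)
@which 2 * x  # should point at the JuMP method that calls _build_aff_expr
@which 2 * p  # should point at the ParameterJuMP method that builds the pair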

@jd-lara
Contributor Author

jd-lara commented Jun 19, 2021

using JuMP
using ParameterJuMP
using Profile
using PProf
using BenchmarkTools

model = Model()
x = VariableRef(model, MOI.VariableIndex(1))
p = ParameterJuMP.ParameterRef(1, model)

struct Foo{A,B}
    a::A
    b::B
end

A1 = 2 * x
B1 = 2 * p

A2 = JuMP._build_aff_expr(0.0, 2.0, x)
B2 = JuMP._build_aff_expr(0.0, 2.0, p)

foo(x) = Foo(x, 0.0)

@benchmark foo($A1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.393 ns (0.00% GC)
  median time:      1.401 ns (0.00% GC)
  mean time:        1.473 ns (0.00% GC)
  maximum time:     42.435 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@benchmark foo($B1)

BenchmarkTools.Trial: 
  memory estimate:  1.27 KiB
  allocs estimate:  16
  --------------
  minimum time:     537.564 ns (0.00% GC)
  median time:      562.479 ns (0.00% GC)
  mean time:        755.226 ns (19.27% GC)
  maximum time:     43.067 μs (98.49% GC)
  --------------
  samples:          10000
  evals/sample:     188

@benchmark foo($A2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.473 ns (0.00% GC)
  median time:      1.519 ns (0.00% GC)
  mean time:        1.697 ns (0.00% GC)
  maximum time:     55.618 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@benchmark foo($B2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.474 ns (0.00% GC)
  median time:      1.642 ns (0.00% GC)
  mean time:        1.744 ns (0.00% GC)
  maximum time:     38.107 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@joaquimg it seems that in fact there is something off with the code for PAEs.

@odow
Member

odow commented Jun 19, 2021

> Just the creation of 2 * ParameterRef is significantly different from 2 * VariableRef.

That's because it's happening outside the macros; this path doesn't usually get called. Oh wait, I misread your benchmark code. Let me try it.

@odow
Member

odow commented Jun 19, 2021

Ah. I'm an idiot.

I assumed that 2 * p made a GenericAffExpr{Float64,ParameterRef}, but it doesn't:

julia> typeof(A1)
AffExpr (alias for GenericAffExpr{Float64, VariableRef})

julia> typeof(B1)
ParameterJuMP.DoubleGenericAffExpr{Int64, VariableRef, ParameterRef}

This is just the difference between GenericAffExpr and DoubleGenericAffExpr.

It seems reasonable that Julia would put the first on the stack and the second on the heap.

@joaquimg
Collaborator

The rationale for directly creating a DoubleGenericAffExpr was that variables would inevitably be used, hence avoiding some type instability and allowing in-place updates all the way.
Maybe a GenericAffExpr{Float64,ParameterRef} should be created first, and a DoubleGenericAffExpr should only appear when really needed...
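
A minimal sketch of that lazy promotion, with hypothetical helper names (cheap_mul and promote_mixed are illustrations, not ParameterJuMP's API; the constructor call assumes the DoubleGenericAffExpr definition quoted in the next comment):

using JuMP, ParameterJuMP

# parameter-only products stay in a cheap single-dict expression
cheap_mul(c::Number, p::ParameterJuMP.ParameterRef) =
    JuMP._build_aff_expr(0.0, Float64(c), p)  # a GenericAffExpr{Float64,ParameterRef}

# the two-dict type appears only at the first mixed operation
promote_mixed(v::JuMP.GenericAffExpr{Float64,JuMP.VariableRef},
              p::JuMP.GenericAffExpr{Float64,ParameterJuMP.ParameterRef}) =
    ParameterJuMP.DoubleGenericAffExpr(v, p)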

@joaquimg
Collaborator

A few other things to consider:

1 - Make DoubleGenericAffExpr immutable.

2 - Replace:

mutable struct DoubleGenericAffExpr{C,V,P} <: JuMP.AbstractJuMPScalar
    v::JuMP.GenericAffExpr{C,V}
    p::JuMP.GenericAffExpr{C,P}
end

with a flattened version that holds the two OrderedDicts directly:

mutable struct DoubleGenericAffExpr{CoefType,VarType,ParType} <: JuMP.AbstractJuMPScalar
    constant::CoefType
    v_terms::OrderedDict{VarType,CoefType}
    p_terms::OrderedDict{ParType,CoefType}
end
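
A note on suggestion 1: making the wrapper immutable would not block in-place coefficient updates, because the OrderedDicts themselves stay mutable; only rebinding the fields becomes illegal. A standalone sketch:

using DataStructures

struct ImmutableDouble{C,V,P}
    v_terms::OrderedDict{V,C}
    p_terms::OrderedDict{P,C}
end

d = ImmutableDouble(OrderedDict(1 => 1.0), OrderedDict(:a => 2.0))
d.v_terms[2] = 5.0  # fine: mutates the dict in place
# d.v_terms = OrderedDict{Int,Float64}()  # error: cannot set a field of an immutable struct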

@odow
Member

odow commented Jun 20, 2021

> Make DoubleGenericAffExpr immutable.

This might work.

Option 2 is unlikely to make a difference. If Julia won't put a DoubleGenericAffExpr on the stack, it's unlikely to do so with the two OrderedDicts.

@jd-lara
Contributor Author

jd-lara commented Jun 20, 2021

> This is a second attempt to reproduce without any success. [code and benchmark results quoted verbatim from the earlier comment above]

@odow in this example with 2 ordered dicts it worked.

@joaquimg
Collaborator

> @odow in this example with 2 ordered dicts it worked.

This is the reason for suggestion 2 of my post.
Currently, there is an extra layer, a JuMP.GenericAffExpr, before each OrderedDict.
This might be bad for the compiler.
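
A hedged sketch of that extra layer, mirroring the flat fooOD test above but with each OrderedDict wrapped in an inner expression struct first (InnerAff, NestedDouble, and Wrapper are hypothetical names):

using BenchmarkTools
using DataStructures

mutable struct InnerAff{K}
    constant::Float64
    terms::OrderedDict{K,Float64}
end

mutable struct NestedDouble{K1,K2}
    v::InnerAff{K1}  # the extra layer sits between the wrapper and each OrderedDict
    p::InnerAff{K2}
end

struct Wrapper{F,S}
    a::F
    b::S
end

g(x) = Wrapper(x, 1.0)

nested = NestedDouble(InnerAff(0.0, OrderedDict(1 => 1.0)),
                      InnerAff(0.0, OrderedDict(:a => 2.0)))
@benchmark g($nested)  # compare allocations against the flat two-dict case above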

@jd-lara
Contributor Author

jd-lara commented Jul 25, 2021

>> @odow in this example with 2 ordered dicts it worked.
>
> This is the reason for suggestion 2 of my post.
>
> Currently, there is an extra layer, a JuMP.GenericAffExpr, before each OrderedDict.
>
> This might be bad for the compiler.

I started implementing suggestion 2. Progress has been slow.
