Performance disparity in expression operations #85

Open
jd-lara opened this issue May 26, 2021 · 33 comments

@jd-lara
Contributor

jd-lara commented May 26, 2021

I made a test of the performance impact of having parameters in an affine expression, for different numbers of expressions in the vector product operation.

The test compares 3 possible cases:

  1. Expressions with parameters: ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}
  2. Expressions without parameters: JuMP.GenericAffExpr{Float64, JuMP.VariableRef}
  3. A mix of expressions using ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef} and JuMP.GenericAffExpr{Float64, JuMP.VariableRef}

Here is some reproducible code:

using JuMP
using ParameterJuMP
using Random

function model_with_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
    end

    col = rand(size_1,)
    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed @constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_without_params(size_1, size_2)
    m2 = Model()

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m2, [v in set_names_1, t in time])
    x = @variable(m2, [v in set_names_2, t in time])

    expression_container = JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
    end

    col = rand(size_1,)
    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed @constraint(m2, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_with_mixed_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    param_ixs = shuffle(1:size_1)[1:div(size_1, 2)]  # half the rows get a parameter
    for i in 1:size_1, t in 1:size_2
        if i in param_ixs
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
        else
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
        end
    end

    col = rand(size_1,)
    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed @constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)
    return build_time, build_bytes_alloc, build_sec_in_gc
end


model_with_params(10, 48)
model_without_params(10, 48)
model_with_mixed_params(10, 48)

results = Dict()
results_params = Dict()
results_mixed = Dict()
for i in [100, 1000, 5000, 10_000, 50_000, 100_000, 500_000]
    println(i)
    @show results[i] = model_without_params(i, 48)
    @show results_params[i] = model_with_params(i, 48)
    @show results_mixed[i] = model_with_mixed_params(i, 48)
end

Results:

100
results[i] = model_without_params(i, 48) = (0.00015558, 29488, 0.0)
results_params[i] = model_with_params(i, 48) = (0.000182823, 76048, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.000172758, 67904, 0.0)
1000
results[i] = model_without_params(i, 48) = (0.000626027, 197072, 0.0)
results_params[i] = model_with_params(i, 48) = (0.001597523, 560976, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.001377474, 462640, 0.0)
5000
results[i] = model_without_params(i, 48) = (0.00356614, 1241216, 0.0)
results_params[i] = model_with_params(i, 48) = (0.008178023, 3541104, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.006022408, 2940048, 0.0)
10000
results[i] = model_without_params(i, 48) = (0.006917794, 2477760, 0.0)
results_params[i] = model_with_params(i, 48) = (0.019982686, 7071424, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.014490844, 6131968, 0.0)
50000
results[i] = model_without_params(i, 48) = (0.048876917, 10544256, 0.0)
results_params[i] = model_with_params(i, 48) = (0.234299942, 31453584, 0.11880938)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.322152996, 25348384, 0.203175162)
100000
results[i] = model_without_params(i, 48) = (0.106661881, 18983120, 0.0)
results_params[i] = model_with_params(i, 48) = (7.070552948, 54502512, 6.839998649)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.384903389, 48584096, 0.173239921)
@jd-lara
Contributor Author

jd-lara commented May 26, 2021

The 500,000 case finished in 4 seconds for GenericAffExpr and crashed my Julia session for the cases with parameters.

@odow
Member

odow commented May 26, 2021

I think your results are getting mixed up with when the GC runs.

using JuMP
using ParameterJuMP
using Random

function model_with_params(m)
    model = Model()
    @variable(model, x[1:m])
    @variable(model, y[1:m])
    @variable(model, p[1:m] == 1.0, ParameterJuMP.Param())
    @expression(model, expr[i=1:m], x[i] + y[i] + p[i] + rand())
    col = rand(m)
    @time @constraint(model, sum(col[i] * expr[i] for i in 1:m) == 0.0)
    return
end

function model_without_params(m)
    model = Model()
    @variable(model, x[1:m])
    @variable(model, y[1:m])
    @variable(model, p[1:m] == 1.0)
    @expression(model, expr[i=1:m], x[i] + y[i] + p[i] + rand())
    col = rand(m)
    @time @constraint(model, sum(col[i] * expr[i] for i in 1:m) == 0.0)
    return
end

model_with_params(10)
model_without_params(10)
for i in [100, 1000, 10_000, 50_000, 100_000, 500_000]
    println(i)
    GC.gc()
    model_without_params(i)
    GC.gc()
    model_with_params(i)
end

Yields

100
  0.000066 seconds (33 allocations: 35.188 KiB)
  0.000130 seconds (102 allocations: 69.453 KiB)
1000
  0.000607 seconds (45 allocations: 299.047 KiB)
  0.001099 seconds (137 allocations: 543.016 KiB)
10000
  0.006241 seconds (57 allocations: 3.534 MiB)
  0.027691 seconds (184 allocations: 6.740 MiB, 53.47% gc time)
50000
  0.038874 seconds (65 allocations: 13.946 MiB)
  0.072116 seconds (214 allocations: 29.975 MiB)
100000
  0.081927 seconds (70 allocations: 25.032 MiB)
  0.178597 seconds (228 allocations: 51.979 MiB)
500000
  0.631749 seconds (79 allocations: 116.645 MiB)
  1.029031 seconds (259 allocations: 202.852 MiB)

So there is a ~2x penalty to using parameters, which makes sense: every expression has two dictionaries.
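
For reference, the parametrized expression type pairs two single-dict expressions (the ParametrizedGenericAffExpr used above appears to be an alias of DoubleGenericAffExpr), so every expression carries two OrderedDicts plus two constants, roughly:

mutable struct DoubleGenericAffExpr{C,V,P} <: JuMP.AbstractJuMPScalar
    v::JuMP.GenericAffExpr{C,V}   # variable half: a constant plus an OrderedDict of terms
    p::JuMP.GenericAffExpr{C,P}   # parameter half: a second constant plus a second OrderedDict
end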

@jd-lara
Contributor Author

jd-lara commented May 26, 2021

Thanks, that makes sense.

@odow
Member

odow commented May 26, 2021

@jd-lara can you try the following instead?

Replace

@constraint(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1) == 0.0)

with

ex = @expression(m, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
c = ScalarConstraint(ex, MOI.EqualTo(0.0))
add_constraint(m, c, "")

@jd-lara
Contributor Author

jd-lara commented May 28, 2021

@odow I have updated the test as follows:

using JuMP
using ParameterJuMP
using Random

function _make_constraint(model::JuMP.Model, col::Vector{Float64}, expression_container::JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}, size_1::Int)
    ex = @expression(model, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
    c = ScalarConstraint(ex, MOI.EqualTo(0.0))
    add_constraint(model, c, "")
end

function _make_constraint(model::JuMP.Model, col::Vector{Float64}, expression_container::JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}, size_1::Int)
    ex = @expression(model, sum(col[i]*expression_container[i, 1] for i in 1:size_1))
    c = ScalarConstraint(ex, MOI.EqualTo(0.0))
    add_constraint(model, c, "")
end

function model_with_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
    end

    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed _make_constraint(m, rand(size_1,), expression_container, size_1)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_without_params(size_1, size_2)
    m2 = Model()

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m2, [v in set_names_1, t in time])
    x = @variable(m2, [v in set_names_2, t in time])

    expression_container = JuMP.Containers.DenseAxisArray{JuMP.GenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    for i in 1:size_1, t in 1:size_2
        expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
    end

    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed _make_constraint(m2, rand(size_1,), expression_container, size_1)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

function model_with_mixed_params(size_1, size_2)
    m = Model()
    ParameterJuMP.enable_parameters(m)

    set_names_1 = [randstring() for _ in 1:size_1]
    set_names_2 = [randstring() for _ in 1:size_1]
    time = 1:size_2

    z = @variable(m, [v in set_names_1, t in time])
    x = @variable(m, [v in set_names_2, t in time])

    param_array = Matrix(undef, size_1, size_2)
    for i in 1:size_1, t in 1:size_2
        param_array[i, t] = JuMP.@variable(m, variable_type = ParameterJuMP.Param())
        ParameterJuMP.set_value(param_array[i, t], rand())
    end

    expression_container = JuMP.Containers.DenseAxisArray{ParameterJuMP.ParametrizedGenericAffExpr{Float64, JuMP.VariableRef}}(undef, 1:size_1, 1:size_2)

    param_ixs = shuffle(1:size_1)[1:div(size_1, 2)]  # half the rows get a parameter
    for i in 1:size_1, t in 1:size_2
        if i in param_ixs
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + param_array[i, t] + rand()
        else
            expression_container[i, t] = z[set_names_1[i], t] + 5.0*x[set_names_2[i], t] + rand()
        end
    end


    _, build_time, build_bytes_alloc, build_sec_in_gc =
            @timed _make_constraint(m, rand(size_1,), expression_container, size_1)
    return build_time, build_bytes_alloc, build_sec_in_gc
end

# First run to trigger compilation
model_with_params(10, 48)
model_without_params(10, 48)
model_with_mixed_params(10, 48)

results = Dict()
results_params = Dict()
results_mixed = Dict()
for i in [100, 1000, 5000, 10_000, 50_000, 100_000]
    println(i)
    @show results[i] = model_without_params(i, 48)
    GC.gc()
    @show results_params[i] = model_with_params(i, 48)
    GC.gc()
    @show results_mixed[i] = model_with_mixed_params(i, 48)
end

Looks like up to 10,000 expressions the penalty is about 2x, as expected, and over 50,000 the penalty is 4x or more:

100
results[i] = model_without_params(i, 48) = (9.7006e-5, 30384, 0.0)
results_params[i] = model_with_params(i, 48) = (0.000171875, 76912, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.000131348, 68880, 0.0)
1000
results[i] = model_without_params(i, 48) = (0.000831869, 205200, 0.0)
results_params[i] = model_with_params(i, 48) = (0.001245529, 569072, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.001065928, 470624, 0.0)
5000
results[i] = model_without_params(i, 48) = (0.003808976, 1281296, 0.0)
results_params[i] = model_with_params(i, 48) = (0.008973684, 3581152, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.005882702, 2980096, 0.0)
10000
results[i] = model_without_params(i, 48) = (0.008080068, 2557840, 0.0)
results_params[i] = model_with_params(i, 48) = (0.01753953, 7151472, 0.0)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.01423789, 6212016, 0.0)
50000
results[i] = model_without_params(i, 48) = (0.050672775, 10944336, 0.0)
results_params[i] = model_with_params(i, 48) = (0.228111582, 31853632, 0.116043022)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.085427278, 25748320, 0.0)
100000
results[i] = model_without_params(i, 48) = (0.117079187, 19783200, 0.0)
results_params[i] = model_with_params(i, 48) = (0.428364249, 55302560, 0.13931007)
results_mixed[i] = model_with_mixed_params(i, 48) = (0.668327524, 49384144, 0.377572102)

@odow
Member

odow commented May 28, 2021

That's all just GC differences though. If you subtract the GC time, you get 2.5x again:

julia> (0.428364249 - 0.13931007) / 0.117079187
2.4688775725782923

julia> (0.228111582 - 0.116043022) / 0.050672775
2.211612843385822

julia> (0.668327524 - 0.377572102) / 0.117079187
2.4834082764855543

@jd-lara
Contributor Author

jd-lara commented May 28, 2021

So the question is how to reduce the GC times.

@odow
Member

odow commented May 28, 2021

It'd be good to confirm why the 2x exists first. Is it creating the extra dictionary? Extra key lookups? Resizing the two dictionaries?

One option that might be faster is to refactor everything into a single dictionary with Tuple{Bool,VariableRef} keys.

That might also help GC by creating fewer dictionaries.
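
A minimal sketch of that single-dictionary layout, assuming parameters are addressed through the same VariableRef-style handle (CombinedAffExpr is an illustration, not ParameterJuMP's API):

using DataStructures

struct CombinedAffExpr{C,V}
    constant::C
    # key = (is_parameter, ref): false tags a variable term, true a parameter term
    terms::OrderedDict{Tuple{Bool,V},C}
end

# e.g. terms[(false, x)] would hold the coefficient of variable x,
# and terms[(true, p)] the coefficient of parameter p.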

@joaquimg
Collaborator

joaquimg commented Jun 3, 2021

I don't understand why having two dicts would be much worse than having a single larger one.
Did you run any profiling?

@jd-lara
Contributor Author

jd-lara commented Jun 3, 2021

I have been using @code_warntype to check type stability, but I haven't explored further since opening this issue.

@joaquimg
Collaborator

joaquimg commented Jun 7, 2021

Have you tried: https://github.com/timholy/ProfileView.jl ?

@jd-lara
Contributor Author

jd-lara commented Jun 7, 2021

@joaquimg I made some observations with @kdheepak using pprof(), but we haven't found anything definitive.

@jd-lara
Contributor Author

jd-lara commented Jun 16, 2021

Profile using Parameters:

[profile screenshot]

Profile without parameters:

[profile screenshot]

@odow
Member

odow commented Jun 17, 2021

Okay, I have an MWE:

julia> using JuMP, ParameterJuMP

julia> function foo(M, N; param)
           model = Model()
           @variable(model, x[1:N])
           if param
               @variable(model, p[1:N], ParameterJuMP.Param())
           else
               @variable(model, p[1:N])
           end
           @expression(model, y[i=1:N], x[i] + p[i])
           for j in 1:M
               @constraint(model, sum(j * i * y[i] for i in 1:N) == 0)
           end
       end
foo (generic function with 2 methods)

julia> M, N = 1000, 100
(1000, 100)

julia> GC.gc(); @time foo(M, N; param = true);
  0.219966 seconds (432.51 k allocations: 60.510 MiB, 4.07% gc time, 72.76% compilation time)

julia> GC.gc(); @time foo(N, N; param = false);
  0.023459 seconds (19.14 k allocations: 3.442 MiB, 87.16% compilation time)

There's a big difference in allocations there.

@odow
Member

odow commented Jun 17, 2021

I think I understand the problem:

julia> using JuMP, ParameterJuMP

julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> @variable(model, x[1:2])
2-element Vector{VariableRef}:
 x[1]
 x[2]

julia> @variable(model, p, ParameterJuMP.Param())
p

julia> f(x) = ScalarConstraint(x, MOI.EqualTo(0.0))
f (generic function with 1 method)

julia> A = x[1] + x[2]
x[1] + x[2]

julia> B = x[1] + p
x[1] + p

julia> @benchmark f($A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.665 ns (0.00% GC)
  median time:      1.845 ns (0.00% GC)
  mean time:        1.813 ns (0.00% GC)
  maximum time:     20.717 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark f($B)
BenchmarkTools.Trial: 
  memory estimate:  1.56 KiB
  allocs estimate:  19
  --------------
  minimum time:     669.399 ns (0.00% GC)
  median time:      696.278 ns (0.00% GC)
  mean time:        917.536 ns (18.24% GC)
  maximum time:     81.787 μs (98.82% GC)
  --------------
  samples:          10000
  evals/sample:     153

This is Julia's heap/stack optimization in practice. ScalarConstraint is an immutable struct, but it contains fields that reference mutable heap objects. Normally, constructing it requires a heap allocation. But Julia recently (I don't remember when, @blegat? Since v1.5?) introduced an optimization that means in some cases it will keep the entire thing on the stack, skipping allocations. I guess that heuristic distinguishes between 1 dictionary and 2!

The heap allocations explain why there is much more GC time in the parameter version, and why creating the parametrized constraint is slower, even though we don't yet have a good explanation for why the heuristic treats the two cases differently.
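
A hedged illustration of that optimization in isolation (whether the allocation is elided depends on the Julia version and inlining):

struct Holder
    x::Vector{Float64}  # field referencing a heap-allocated object
    c::Float64
end

make_holder(x) = Holder(x, 0.0)

v = [1.0, 2.0]
make_holder(v)             # warm up / compile
@allocated make_holder(v)  # expect 0 on Julia >= 1.5: the wrapper itself never hits the heap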

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

We saw something similar with structs in PowerSystems: Julia spent quite some time moving them between the stack and the heap, so we made them mutable to prevent that. @daniel-thom, do you recall how we measured that?

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

For reference, it seems that this issue could be related to JuliaLang/julia#34847. Nope, this seems to have been solved in JuliaLang/julia#34126.

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

For reference, here is the result of @code_llvm for each case in the example above:

julia> @code_llvm f(A)
;  @ REPL[8]:1 within `f'
define void @julia_f_3849({ {}*, [1 x double] }* noalias nocapture sret %0, [1 x {}*]* noalias nocapture %1, {}* nonnull align 8 dereferenceable(16) %2) {
top:
  %3 = getelementptr inbounds [1 x {}*], [1 x {}*]* %1, i64 0, i64 0
  store {}* %2, {}** %3, align 8
  %.repack = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 0
  store {}* %2, {}** %.repack, align 8
  %4 = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 1, i64 0
  store double 0.000000e+00, double* %4, align 8
  ret void
}
julia> @code_llvm f(B)
;  @ REPL[8]:1 within `f'
define void @julia_f_3834({ {}*, [1 x double] }* noalias nocapture sret %0, [1 x {}*]* noalias nocapture %1, {}* nonnull align 8 dereferenceable(16) %2) {
top:
  %3 = alloca [2 x {}*], align 8
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %3, i64 0, i64 0
; ┌ @ /Users/jdlara/cache/JuMP.jl/src/constraints.jl:437 within `ScalarConstraint' @ /Users/jdlara/cache/JuMP.jl/src/constraints.jl:437
   store {}* inttoptr (i64 5878957680 to {}*), {}** %.sub, align 8
   %4 = getelementptr inbounds [2 x {}*], [2 x {}*]* %3, i64 0, i64 1
   store {}* %2, {}** %4, align 8
   %5 = call nonnull {}* @j1_convert_3836({}* inttoptr (i64 4637115040 to {}*), {}** nonnull %.sub, i32 2)
; └
  %6 = getelementptr inbounds [1 x {}*], [1 x {}*]* %1, i64 0, i64 0
  store {}* %5, {}** %6, align 8
  %.repack = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 0
  store {}* %5, {}** %.repack, align 8
  %7 = getelementptr inbounds { {}*, [1 x double] }, { {}*, [1 x double] }* %0, i64 0, i32 1, i64 0
  store double 0.000000e+00, double* %7, align 8
  ret void
}

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

@odow I can't seem to reproduce the issue outside of JuMP:

using BenchmarkTools
using DataStructures

struct fooOD{T, U, V}
    a::OrderedDict{T, V}
    b::OrderedDict{U, V}
end

struct barOD{T, V}
   a::OrderedDict{T, V}
end

struct FooBar{F,S}
    a::F
    b::S
end

f(x) = FooBar(x, 1.0)

test_foo_od = fooOD(OrderedDict("A" => 10.0, "B" => 11.0), OrderedDict(3 => 13.0, 4 => 19.0))
test_bar_od = barOD(OrderedDict(1 => 10.0, 2 => 11.0, 3 => 13.0, 4 => 19.0))

@benchmark f($test_bar_od)
@benchmark f($test_foo_od)


bar: one dict
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.664 ns (0.00% GC)
  median time:      2.064 ns (0.00% GC)
  mean time:        2.210 ns (0.00% GC)
  maximum time:     53.048 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

foo: two dicts
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.662 ns (0.00% GC)
  median time:      2.051 ns (0.00% GC)
  mean time:        2.281 ns (0.00% GC)
  maximum time:     133.248 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@jd-lara
Contributor Author

jd-lara commented Jun 17, 2021

This is a second attempt to reproduce without any success.

using BenchmarkTools
using DataStructures

mutable struct model_
    field1::Dict
end

model_foobar = model_(Dict("key" => 12.3))

abstract type AbstractKey end

struct Key1 <: AbstractKey
    index::Int
    model::model_
end

struct Key2 <: AbstractKey
    index::Int
    model::model_
end

struct FooBar{F,S}
    a::F
    b::S
end

mutable struct fooOD{T, U, V}
    a::OrderedDict{T, V}
    b::OrderedDict{U, V}
end

mutable struct barOD{T, V}
   a::OrderedDict{T, V}
end

test_foo_od = fooOD(OrderedDict(Key1(1, model_foobar) => 10.0, Key1(2, model_foobar) => 11.0), OrderedDict(Key2(1, model_foobar) => 13.0,  Key2(2, model_foobar) => 19.0))
test_bar_od = barOD(OrderedDict(Key1(1, model_foobar) => 10.0, Key1(2, model_foobar) => 11.0, Key1(3, model_foobar) => 13.0, Key1(4, model_foobar) => 19.0))

f(x) = FooBar(x, 1.0)

@benchmark f($test_bar_od)
@benchmark f($test_foo_od)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.757 ns (0.00% GC)
  median time:      1.871 ns (0.00% GC)
  mean time:        1.994 ns (0.00% GC)
  maximum time:     33.388 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.357 ns (0.00% GC)
  median time:      1.413 ns (0.00% GC)
  mean time:        1.533 ns (0.00% GC)
  maximum time:     60.137 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@odow
Member

odow commented Jun 19, 2021

Yeah. What's weird is that the following with MyVar doesn't have the problem, even though it is identical to the definition of ParameterRef. Therefore, there must be something about the methods defined in ParameterJuMP that is causing this...

julia> using JuMP, ParameterJuMP, BenchmarkTools

julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> x = VariableRef(model, MOI.VariableIndex(1))
noname

julia> p = ParameterJuMP.ParameterRef(1, model);

julia> struct Foo{A,B}
           a::A
           b::B
       end

julia> A = 2 * x;

julia> B = 2 * p;

julia> foo(x) = Foo(x, 0.0)
foo (generic function with 1 method)

julia> @benchmark foo($A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.617 ns (0.00% GC)
  median time:      1.873 ns (0.00% GC)
  mean time:        1.929 ns (0.00% GC)
  maximum time:     31.496 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark foo($B)
BenchmarkTools.Trial: 
  memory estimate:  1.27 KiB
  allocs estimate:  16
  --------------
  minimum time:     502.902 ns (0.00% GC)
  median time:      520.249 ns (0.00% GC)
  mean time:        653.548 ns (15.73% GC)
  maximum time:     30.005 μs (97.00% GC)
  --------------
  samples:          10000
  evals/sample:     193

julia> using JuMP, ParameterJuMP, BenchmarkTools

julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> x = VariableRef(model, MOI.VariableIndex(1))
noname

julia> struct MyVar <: JuMP.AbstractVariableRef 
           x::Int64
           model::Model
       end

julia> p = MyVar(1, model);

julia> struct Foo{A,B}
           a::A
           b::B
       end

julia> A = 2 * x;

julia> B = 2 * p;

julia> foo(x) = Foo(x, 0.0)
foo (generic function with 1 method)

julia> @benchmark foo($A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.617 ns (0.00% GC)
  median time:      1.877 ns (0.00% GC)
  mean time:        1.904 ns (0.00% GC)
  maximum time:     20.480 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark foo($B)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.619 ns (0.00% GC)
  median time:      1.870 ns (0.00% GC)
  mean time:        1.903 ns (0.00% GC)
  maximum time:     22.370 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@joaquimg
Collaborator

@odow's last example suggests that we can try commenting out stuff from the package to find the culprit.

@jd-lara
Contributor Author

jd-lara commented Jun 19, 2021

From profiling the same code @odow pasted:

[profiler screenshot]

[profiler screenshot]

Just the creation of 2 * ParameterRef is significantly different from 2 * VariableRef. When 2 * x gets called, it goes to JuMP._build_aff_expr, while in the case of 2 * p it creates a pair (rhs => lhs).
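
One quick way to confirm which method each product dispatches to, using the x and p from @odow's session above (a hedged sketch; the exact methods shown depend on the installed JuMP/ParameterJuMP versions):

using InteractiveUtils  # provides @which (loaded automatically in the REPL)
@which 2 * x  # should point at the JuMP method that calls _build_aff_expr
@which 2 * p  # should point at the ParameterJuMP method that builds the pair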

@jd-lara
Contributor Author

jd-lara commented Jun 19, 2021

using JuMP
using ParameterJuMP
using Profile
using PProf
using BenchmarkTools

model = Model()
x = VariableRef(model, MOI.VariableIndex(1))
p = ParameterJuMP.ParameterRef(1, model)

struct Foo{A,B}
    a::A
    b::B
end

A1 = 2 * x
B1 = 2 * p

A2 = JuMP._build_aff_expr(0.0, 2.0, x)
B2 = JuMP._build_aff_expr(0.0, 2.0, p)

foo(x) = Foo(x, 0.0)

@benchmark foo($A1)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.393 ns (0.00% GC)
  median time:      1.401 ns (0.00% GC)
  mean time:        1.473 ns (0.00% GC)
  maximum time:     42.435 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@benchmark foo($B1)

BenchmarkTools.Trial: 
  memory estimate:  1.27 KiB
  allocs estimate:  16
  --------------
  minimum time:     537.564 ns (0.00% GC)
  median time:      562.479 ns (0.00% GC)
  mean time:        755.226 ns (19.27% GC)
  maximum time:     43.067 μs (98.49% GC)
  --------------
  samples:          10000
  evals/sample:     188

@benchmark foo($A2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.473 ns (0.00% GC)
  median time:      1.519 ns (0.00% GC)
  mean time:        1.697 ns (0.00% GC)
  maximum time:     55.618 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@benchmark foo($B2)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.474 ns (0.00% GC)
  median time:      1.642 ns (0.00% GC)
  mean time:        1.744 ns (0.00% GC)
  maximum time:     38.107 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

@joaquimg it seems that in fact there is something off with the code for PAEs.

@odow
Member

odow commented Jun 19, 2021

> Just the creation of 2 * ParameterRef is significantly different from 2 * VariableRef.

That's because it's happening outside the macros; this path doesn't usually get called. Oh wait, I misread your benchmark code. Let me try it.

@odow
Member

odow commented Jun 19, 2021

Ah. I'm an idiot.

I assumed that 2 * p made a GenericAffExpr{Float64,ParameterRef}, but it doesn't:

julia> typeof(A1)
AffExpr (alias for GenericAffExpr{Float64, VariableRef})

julia> typeof(B1)
ParameterJuMP.DoubleGenericAffExpr{Int64, VariableRef, ParameterRef}

This is just the difference between GenericAffExpr and DoubleGenericAffExpr.

It seems reasonable that Julia would put the first on the stack and the second on the heap.

@joaquimg
Collaborator

The rationale for directly creating a DoubleGenericAffExpr was that variables would inevitably be used, hence avoiding some type instability and allowing in-place updates all the way.
Maybe a GenericAffExpr{Float64,ParameterRef} should be created first, and a DoubleGenericAffExpr should only appear when really needed...
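
A minimal sketch of that lazy promotion, with hypothetical helper names (cheap_mul and promote_mixed are illustrations, not ParameterJuMP's API; the constructor call assumes the DoubleGenericAffExpr definition quoted in the next comment):

using JuMP, ParameterJuMP

# parameter-only products stay in a cheap single-dict expression
cheap_mul(c::Number, p::ParameterJuMP.ParameterRef) =
    JuMP._build_aff_expr(0.0, Float64(c), p)  # a GenericAffExpr{Float64,ParameterRef}

# the two-dict type appears only at the first mixed operation
promote_mixed(v::JuMP.GenericAffExpr{Float64,JuMP.VariableRef},
              p::JuMP.GenericAffExpr{Float64,ParameterJuMP.ParameterRef}) =
    ParameterJuMP.DoubleGenericAffExpr(v, p)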

@joaquimg
Collaborator

A few other things to consider:

1 - Make DoubleGenericAffExpr immutable.

2 - Replace:

mutable struct DoubleGenericAffExpr{C,V,P} <: JuMP.AbstractJuMPScalar
    v::JuMP.GenericAffExpr{C,V}
    p::JuMP.GenericAffExpr{C,P}
end

with a flattened version that holds the two OrderedDicts directly:

mutable struct DoubleGenericAffExpr{CoefType,VarType,ParType} <: JuMP.AbstractJuMPScalar
    constant::CoefType
    v_terms::OrderedDict{VarType,CoefType}
    p_terms::OrderedDict{ParType,CoefType}
end
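
A note on suggestion 1: making the wrapper immutable would not block in-place coefficient updates, because the OrderedDicts themselves stay mutable; only rebinding the fields becomes illegal. A standalone sketch:

using DataStructures

struct ImmutableDouble{C,V,P}
    v_terms::OrderedDict{V,C}
    p_terms::OrderedDict{P,C}
end

d = ImmutableDouble(OrderedDict(1 => 1.0), OrderedDict(:a => 2.0))
d.v_terms[2] = 5.0  # fine: mutates the dict in place
# d.v_terms = OrderedDict{Int,Float64}()  # error: cannot set a field of an immutable struct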

@odow
Member

odow commented Jun 20, 2021

> Make DoubleGenericAffExpr immutable.

This might work.

Option 2 is unlikely to make a difference. If Julia won't put a DoubleGenericAffExpr on the stack, it's unlikely to do so with the two OrderedDicts.

@jd-lara
Contributor Author

jd-lara commented Jun 20, 2021

> This is a second attempt to reproduce without any success. [code and benchmark results quoted verbatim from the earlier comment above]

@odow in this example with 2 ordered dicts it worked.

@joaquimg
Collaborator

> @odow in this example with 2 ordered dicts it worked.

This is the reason for suggestion 2 of my post.
Currently, there is an extra layer, a JuMP.GenericAffExpr, before each OrderedDict.
This might be bad for the compiler.
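
A hedged sketch of that extra layer, mirroring the flat fooOD test above but with each OrderedDict wrapped in an inner expression struct first (InnerAff, NestedDouble, and Wrapper are hypothetical names):

using BenchmarkTools
using DataStructures

mutable struct InnerAff{K}
    constant::Float64
    terms::OrderedDict{K,Float64}
end

mutable struct NestedDouble{K1,K2}
    v::InnerAff{K1}  # the extra layer sits between the wrapper and each OrderedDict
    p::InnerAff{K2}
end

struct Wrapper{F,S}
    a::F
    b::S
end

g(x) = Wrapper(x, 1.0)

nested = NestedDouble(InnerAff(0.0, OrderedDict(1 => 1.0)),
                      InnerAff(0.0, OrderedDict(:a => 2.0)))
@benchmark g($nested)  # compare allocations against the flat two-dict case above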

@jd-lara
Contributor Author

jd-lara commented Jul 25, 2021

>> @odow in this example with 2 ordered dicts it worked.
>
> This is the reason for suggestion 2 of my post.
>
> Currently, there is an extra layer, a JuMP.GenericAffExpr, before each OrderedDict.
>
> This might be bad for the compiler.

I started implementing suggestion 2. Progress has been slow.
