[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jolin-io/fall-in-love-with-julia/main?filepath=10%20SciML%20-%2002%20Optimization.ipynb)

<a href="https://www.jolin.io" target="_blank" rel="noreferrer noopener">
<img src="https://www.jolin.io/assets/Jolin/Jolin-Banner-Website-v1.1-darkmode.webp">
</a>

# Fall-in-love-with-Julia: Optimization in Julia 101

an introduction session


I am Stephan Sahm, and today we are going to learn about Optimization.jl:

1. Common interface to different optimizers
2. Derivatives
3. Constraints
4. Neural networks - special support for minibatches
5. Symbolic problem specification

# Optimization.jl: A Unified Optimization Package

> Optimization.jl is a package with a scope that is beyond your normal global optimization package. Optimization.jl seeks to bring together all of the optimization packages it can find, local and global, into one unified Julia interface. This means, you learn one package and you learn them all! Optimization.jl adds a few high-level features, such as integrating with automatic differentiation, to make its usage fairly simple for most cases, while allowing all of the options in a single unified interface.

In [None]:
using CommonSolve: solve  # generic interface
import Optimization  # meta package
import OptimizationOptimJL, OptimizationBBO, OptimizationMOI  # specific optimizers
import ModelingToolkit  # symbolic support
import Juniper, Ipopt, AmplNLWriter  # MOI (MathOptInterface ~ JuMP) optimizers
import ForwardDiff, Zygote  # automatic derivatives

import Plots  # plotting
Plots.plotlyjs();

## 1. Common interface to different optimizers

Rosenbrock function

In [None]:
rosenbrock(u, p) =  (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
p  = [1.0, 100.0]


![wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Rosenbrock-contour.svg/450px-Rosenbrock-contour.svg.png)

For `p = [1.0, 100.0]` the global minimum is at (1, 1).
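
Why (1, 1)? A short derivation, assuming $p_2 > 0$ as in `p = [1.0, 100.0]`, with the function written in the same form as the code above:

$$
f(u, p) = (p_1 - u_1)^2 + p_2 \, (u_2 - u_1^2)^2 \;\ge\; 0
$$

Both terms are squares, so $f = 0$ exactly when $u_1 = p_1$ and $u_2 = u_1^2$, that is at $u^\ast = (p_1, p_1^2) = (1, 1)$.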

In [None]:
x, y = -2:0.01:2, 0:0.01:2
Plots.surface(x, y, (x,y)->rosenbrock([x,y], p),
    linealpha = 0.3, xlabel="x (u1)", ylabel="y (u2)",
    zlabel="rosenbrock", zscale=:log, c=:deep)

Specify the optimization problem with `Optimization.jl`:

In [None]:
u0 = zeros(2)
prob = Optimization.OptimizationProblem(rosenbrock, u0, p)

Solve the problem using specific solvers:

In [None]:
solve(prob, OptimizationOptimJL.ParticleSwarm())

In [None]:
solve(prob, OptimizationOptimJL.SimulatedAnnealing())

In [None]:
solve(prob, OptimizationOptimJL.NelderMead())
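
Since every solver goes through the same `solve(prob, optimizer)` call, comparing them can be a plain loop. The following is a minimal sketch that redefines the problem from above so it runs on its own; the names `optimizers` and `results` are illustrative only, and the objective is recomputed with `rosenbrock` rather than read from the solution object.

```julia
using CommonSolve: solve
import Optimization, OptimizationOptimJL

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
p = [1.0, 100.0]
prob = Optimization.OptimizationProblem(rosenbrock, zeros(2), p)

# the three derivative-free optimizers used above
optimizers = (
    OptimizationOptimJL.NelderMead(),
    OptimizationOptimJL.SimulatedAnnealing(),
    OptimizationOptimJL.ParticleSwarm(),
)

# run every optimizer through the identical `solve` call
results = Dict(nameof(typeof(opt)) => solve(prob, opt) for opt in optimizers)

for (name, sol) in results
    println(rpad(name, 20), " u = ", sol.u, "  objective = ", rosenbrock(sol.u, p))
end
```

The stochastic optimizers (SimulatedAnnealing, ParticleSwarm) will generally print different values on each run.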

Overview of the available optimizers (see also http://optimization.sciml.ai/stable):

| Package | Local Gradient-Based | Local Hessian-Based | Local Derivative-Free | Local Constrained | Global Unconstrained | Global Constrained |
|--:|:-:|:-:|:-:|:-:|:-:|:-:|
| BlackBoxOptim | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| CMAEvolutionaryStrategy | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Evolutionary | ❌ | ❌ | ❌ | ❌ | ✅ | 🟡 |
| Flux | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| GCMAES | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| MathOptInterface | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 |
| MultistartOptimization | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Metaheuristics | ❌ | ❌ | ❌ | ❌ | ✅ | 🟡 |
| NOMAD | ❌ | ❌ | ❌ | ❌ | ✅ | 🟡 |
| NLopt | ✅ | ❌ | ✅ | 🟡 | ✅ | 🟡 |
| Nonconvex | ✅ | ✅ | ✅ | 🟡 | ✅ | 🟡 |
| Optim | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| QuadDIRECT | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |

In [None]:
sol = solve(prob, OptimizationOptimJL.NelderMead())
sol

In [None]:
fieldnames(typeof(sol))

In [None]:
sol.u

In [None]:
Array(sol)  # very common

In [None]:
sol.minimum

In [None]:
sol.retcode

In [None]:
sol.original

👉 check the extra information in `original` for a ParticleSwarm or SimulatedAnnealing solution

In [None]:
# your space

## 2. Derivatives

You can define derivatives manually, but most often you want to use automatic differentiation.

There are many kinds of automatic differentiation, but two are most important:
- forward mode
    - use for a small number of learned parameters
    - supports full Julia
- reverse mode
    - use for a large number of learned parameters
    - supports a large subset of Julia
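
To see both modes at work outside of an optimizer, the sketch below computes the gradient of the Rosenbrock function at the starting point with `ForwardDiff.gradient` and `Zygote.gradient` and compares both with a hand-derived gradient. The variable names are illustrative, and the function is redefined so the block runs on its own.

```julia
import ForwardDiff, Zygote

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
p = [1.0, 100.0]
u = [0.0, 0.0]

# forward mode: propagates derivatives alongside the values, input by input
grad_forward = ForwardDiff.gradient(v -> rosenbrock(v, p), u)

# reverse mode: one backward sweep yields all partial derivatives;
# Zygote returns a tuple with one entry per argument, hence `first`
grad_reverse = first(Zygote.gradient(v -> rosenbrock(v, p), u))

# hand-derived gradient for comparison
grad_exact = [
    -2 * (p[1] - u[1]) - 4 * p[2] * u[1] * (u[2] - u[1]^2),
    2 * p[2] * (u[2] - u[1]^2),
]

println(grad_forward, " ", grad_reverse, " ", grad_exact)
```

For two parameters the choice of mode hardly matters; the difference shows up once the parameter vector gets large, as in the neural network section.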

In [None]:
# forward mode
optf = Optimization.OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
prob = Optimization.OptimizationProblem(optf, u0, p)
solve(prob, OptimizationOptimJL.BFGS())

In [None]:
# reverse mode
optf = Optimization.OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = Optimization.OptimizationProblem(optf, u0, p)
solve(prob, OptimizationOptimJL.BFGS())

👉 check the extra information in `original` on those solutions

👉 compare `minimum` to our derivative-free NelderMead solution above 

In [None]:
# your space

## 3. Constraints

Lower and upper bounds:

In [None]:
# using the gradient-free BlackBoxOptim
prob = Optimization.OptimizationProblem(rosenbrock, u0, p, lb = [-1.0,-1.0], ub = [1.0,1.0])
solve(prob, OptimizationBBO.BBO_adaptive_de_rand_1_bin_radiuslimited())

In [None]:
# gradient method
optf = Optimization.OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = Optimization.OptimizationProblem(optf, u0, p, lb = [-1.0,-1.0], ub = [1.0,1.0])
solve(prob, OptimizationOptimJL.BFGS())

Extra constraint functions with bounds, for example:

$$
\begin{aligned}
-\infty \le u_1^2 + u_2^2 &\le 0.8 \\
-1.0 \le u_1 \cdot u_2 &\le 2.0
\end{aligned}
$$

In [None]:
# constraints = (sum of squares, product)
function cons(res, x, p)
    res .= [x[1]^2+x[2]^2, x[1]*x[2]]
end

In [None]:
optprob = Optimization.OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff(), cons = cons)
prob = Optimization.OptimizationProblem(optprob, u0, p, lcons = [-Inf, -1.0], ucons = [0.8, 2.0])
sol = solve(prob, OptimizationOptimJL.IPNewton())

In [None]:
sol.original

To inspect our constraints manually, we need a small helper array, because `cons` writes its result in place.

In [None]:
res = zeros(2)
cons(res, sol.u, p)
res
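
The same check can be wrapped into a small function that also compares the values against the bounds. This is a sketch only: `is_feasible` and its tolerance `atol` are hypothetical names introduced here, not part of Optimization.jl.

```julia
# constraint function from above: (sum of squares, product), written in place
function cons(res, x, p)
    res .= [x[1]^2 + x[2]^2, x[1] * x[2]]
end

# hypothetical helper: evaluate the constraints at `u` and compare with the bounds
function is_feasible(cons, u, p, lcons, ucons; atol = 1e-6)
    res = zeros(length(lcons))
    cons(res, u, p)
    return all((lcons .- atol) .<= res .<= (ucons .+ atol))
end

lcons, ucons = [-Inf, -1.0], [0.8, 2.0]
inside = is_feasible(cons, [0.5, 0.5], nothing, lcons, ucons)   # constraints evaluate to (0.5, 0.25)
outside = is_feasible(cons, [1.0, 1.0], nothing, lcons, ucons)  # sum of squares is 2.0 > 0.8
println((inside, outside))
```

Calling `is_feasible(cons, sol.u, p, lcons, ucons)` on a solver result gives a quick sanity check; the tolerance allows for a solver that stops very slightly outside the bounds.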

You can express equality constraints by setting the lower and upper bound of a constraint to the same value.

$$
\begin{aligned}
u_1^2 + u_2^2 &= 1.0 \\
u_1 \cdot u_2 &= 0.5
\end{aligned}
$$

In [None]:
# using symbolic derivatives via ModelingToolkit
optprob = Optimization.OptimizationFunction(rosenbrock, Optimization.AutoModelingToolkit(), cons = cons)
prob = Optimization.OptimizationProblem(optprob, u0, p, lcons = [1.0, 0.5], ucons = [1.0, 0.5])

In [None]:
# using OptimizationMOI under the hood
# here we can directly use the original Optimizer
sol = solve(prob, Ipopt.Optimizer())

In [None]:
res = zeros(2)
cons(res, sol.u, p)
println(res)
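
This equality-constrained problem is small enough to check by hand. A short derivation (my own, not solver output) combining the two constraints:

$$
(u_1 - u_2)^2 = (u_1^2 + u_2^2) - 2\,u_1 u_2 = 1.0 - 2 \cdot 0.5 = 0
\quad\Rightarrow\quad
u_1 = u_2 = \pm \tfrac{1}{\sqrt{2}} \approx \pm 0.7071
$$

Only these two points are feasible. The Rosenbrock value is about 4.4 at the positive point and about 148.6 at the negative one, so a solver that converges should end up near (0.7071, 0.7071), and the printed constraint values should be close to (1.0, 0.5).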

👉 try out another constraint of your choice and see whether you can solve it

In [None]:
# your space 

## 4. Neural networks - special support for minibatches

In [None]:
import Flux, OptimizationOptimisers
import IterTools, MLUtils, NNlib

👉 Generate 128 datapoints from the polynomial $y = x^2 - 2x$ and add some noise.

You need `randn`, and you might use `range` for x.

Plot it using `Plots.plot(x, y)`.

In [None]:
# your space
# x = ...
# y = ...

In [None]:
# make sure x and y are matrices of size (1, 128)
@assert length(x) == length(y) == 128
x = size(x) == (128,) ? collect(x') : x
y = size(y) == (128,) ? collect(y') : y

In [None]:
nn_flux = Flux.Chain(
    Flux.Dense(1, 16, NNlib.relu),
    Flux.Dense(16, 1),
)
# we need to make explicit the parameters that we want to optimize
# in Flux we can do this via destructure
parameters_initial, reconstruct_nn_flux = Flux.destructure(nn_flux)

In [None]:
# this is how to get a prediction
y_pred = reconstruct_nn_flux(parameters_initial)(x)

👉 plot both the true solution and our prediction

(you may need `Plots.plot!` and `transpose`)

In [None]:
# your space

In [None]:
# a loss function is a simple function
function loss_flux(parameters, x, y)
    y_pred = reconstruct_nn_flux(parameters)(x)
    sum(abs2, y .- y_pred)
end

👉 calculate our initial loss

In [None]:
# your space

Minibatches 🙂

In [None]:
k = 10
minibatches = MLUtils.DataLoader((x, y), batchsize = k)
fieldnames(typeof(minibatches))

In [None]:
x1_batch, y1_batch = first(minibatches)
Plots.plot(x1_batch', y1_batch')

In [None]:
losses = Float64[]
function callback(p, l)  # further outputs of the loss function are given as further input arguments to this callback function
    push!(losses, l)
    if length(losses) % 50 == 0
        Plots.plot(losses, show = :inline, yscale = :log10,
            label = "loss", xlabel = "#iterations", ylabel="loss (log10 scale)")
    end
    return false  # return bool `halt`
end

_optfun = (θ, _, x_batch, y_batch) -> loss_flux(θ, x_batch, y_batch)
optfun = Optimization.OptimizationFunction(_optfun, Optimization.AutoZygote())
optprob = Optimization.OptimizationProblem(optfun, parameters_initial)

numEpochs = 500
sol = solve(
    optprob,
    OptimizationOptimisers.ADAM(0.01),
    IterTools.ncycle(minibatches, numEpochs),
    callback = callback,
)
parameters_learned = sol.minimizer
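
How many optimizer steps does this run? The sketch below counts them, assuming placeholder data of the same shape as `x` and `y` (the names `x_demo`, `y_demo` and `loader` are illustrative). It relies on `MLUtils.DataLoader` keeping the final, smaller batch by default.

```julia
import IterTools, MLUtils

x_demo = rand(1, 128)  # placeholder data with the same shape as x
y_demo = rand(1, 128)  # placeholder data with the same shape as y
loader = MLUtils.DataLoader((x_demo, y_demo), batchsize = 10)

# 128 observations in batches of 10: twelve full batches plus one of 8
batches_per_epoch = length(loader)
batch_sizes = [size(xb, 2) for (xb, _) in loader]

# cycling the loader 500 times yields one batch per optimizer step
steps = count(_ -> true, IterTools.ncycle(loader, 500))

println((batches_per_epoch, last(batch_sizes), steps))
```

Each step sees only one minibatch, so the loss values pushed by the callback are per-batch losses and are noisier than the loss on the full dataset.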

👉 calculate the final loss

In [None]:
# your space

👉 plot our prediction vs the true solution

In [None]:
# more space

👉 plot our prediction vs the true solution over a much larger range (remember that x and y need to be row vectors)

In [None]:
# your space

⚠️ IMPORTANT: Note that this minibatch data support is not universal; only a few optimizer backends support it. OptimizationOptimisers and OptimizationOptimJL, for instance, do support this extra data argument. Others, like OptimizationMOI, do not.

This is not just a missing feature: minibatching may really be ill-defined, depending on your solver and your problem. Minibatches introduce a kind of randomness. Some problems (like differential equations) and some solvers lose their mathematical guarantees as soon as you add such stochasticity.

⚠️ So always check carefully whether minibatches are really appropriate for your optimization problem!


## 5. Symbolic problem specification

In [None]:
using ModelingToolkit: @variables, @parameters, @named

In [None]:
@variables x y
@parameters a b

In [None]:
loss = (a - x)^2 + b * (y - x^2)^2
@named sys = ModelingToolkit.OptimizationSystem(loss, [x, y], [a, b])

Specifying parameters and initial state is a bit more complex: we need to map the symbols to values.

In [None]:
u0 = [
    x => 1.0
    y => 2.0
]
p = [
    a => 6.0
    b => 7.0
];

Now we get numerous benefits: Symbolic auto differentiation, auto-parallelism, sparsification, & many more. You can even hierarchically nest systems to have it generate huge optimization problems.

In [None]:
prob = Optimization.OptimizationProblem(sys, u0, p, grad=true, hess=true)
solve(prob, OptimizationOptimJL.Newton())
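
With `grad=true` and `hess=true`, the derivatives are generated symbolically from `loss`. Written out by hand (my own derivation, not ModelingToolkit output), the gradient is:

$$
\frac{\partial\,\mathrm{loss}}{\partial x} = -2\,(a - x) - 4\,b\,x\,(y - x^2),
\qquad
\frac{\partial\,\mathrm{loss}}{\partial y} = 2\,b\,(y - x^2)
$$

Both components vanish at $(x, y) = (a, a^2)$, so with $a = 6$ a converged solution should be close to $(6, 36)$.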

For more details about symbolic problem descriptions check out the ModelingToolkit.jl [OptimizationSystem documentation](https://mtk.sciml.ai/dev/systems/OptimizationSystem/).

Further details about Optimization.jl can be found in its [official documentation](http://optimization.sciml.ai/stable/), which was also the main source for this Jupyter notebook.

# Thank you for your participation

For questions or suggestions, please contact me at stephan.sahm@jolin.io


#### Sponsored by [Jolin.io](https://www.jolin.io)

Jolin.io is an IT-consultancy focussing on Julia

We are here to help you if you want to
- try out Julia at your company,
- transition Matlab, Fortran, R, Python, etc. to Julia, or
- speed up your existing Julia code
