# Optimization in Julia - Basics

## Contents

- [Getting Started with Optimization.jl](#getting_started)
- [Equality and Inequality Constraints](#equality_and_inequality_constraints)

## Getting Started with Optimization.jl <a id="getting_started" />

In this tutorial, we introduce the basics of Optimization.jl by showing how to easily mix local and global optimizers on the Rosenbrock function. The simplest copy-pasteable code, which uses a quasi-Newton method (LBFGS) to solve the Rosenbrock problem, is the following:

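```julia
using Optimization, Zygote

# Rosenbrock function; `u` holds the optimization variables, `p` the parameters.
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)        # initial guess
p = [1.0, 100.0]     # parameter values

# AutoZygote() asks Optimization.jl to build the gradient with Zygote.jl.
optf = OptimizationFunction(rosenbrock, AutoZygote())
prob = OptimizationProblem(optf, u0, p)

sol = solve(prob, Optimization.LBFGS())
```

```
retcode: Success
u: 2-element Vector{Float64}:
 0.9999997057368228
 0.999999398151528
```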
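As a quick sanity check on this result, the objective coded above can be written out explicitly. The subscript notation below is introduced here for illustration, reading `u` and `p` as the vectors $(u_1, u_2)$ and $(p_1, p_2)$:

$$
f(u, p) = (p_1 - u_1)^2 + p_2 \,(u_2 - u_1^2)^2
$$

For $p_2 > 0$ both terms are non-negative, so $f(u, p) \ge 0$, with equality exactly when $u_1 = p_1$ and $u_2 = u_1^2 = p_1^2$. With `p = [1.0, 100.0]` the minimizer is therefore $u = (1, 1)$ with $f = 0$, which matches the values returned by the solver up to its tolerance.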

### Import a different solver package

OptimizationOptimJL is a wrapper for [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) and OptimizationBBO is a wrapper for [BlackBoxOptim.jl](https://github.com/robertfeldt/BlackBoxOptim.jl).

First, let's use `NelderMead`, a derivative-free solver from Optim.jl:

```julia
using OptimizationOptimJL

sol = solve(prob, Optim.NelderMead())
```

```
retcode: Success
u: 2-element Vector{Float64}:
 0.9999634355313174
 0.9999315506115275
```

BlackBoxOptim.jl offers derivative-free global optimization solvers that require the bounds to be set via `lb` and `ub` in the `OptimizationProblem`. Let's use the `BBO_adaptive_de_rand_1_bin_radiuslimited()` solver:

```julia
using OptimizationBBO

prob = OptimizationProblem(rosenbrock, u0, p, lb = [-1.0, -1.0], ub = [1.0, 1.0])
sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited())
```

```
retcode: MaxIters
u: 2-element Vector{Float64}:
 0.9999999999999996
 0.999999999999999
```

The result object returned by the underlying solver can always be obtained via the `original` field:

```julia
sol.original
```

```
BlackBoxOptim.OptimizationResults("adaptive_de_rand_1_bin_radiuslimited", "Max number of steps (10000) reached", 10001, 1.744326347439e9, 0.25999999046325684, BlackBoxOptim.ParamsDictChain[BlackBoxOptim.ParamsDictChain[Dict{Symbol, Any}(:RngSeed => 417099, :SearchRange => [(-1.0, 1.0), (-1.0, 1.0)], :TraceMode => :silent, :Method => :adaptive_de_rand_1_bin_radiuslimited, :MaxSteps => 10000),Dict{Symbol, Any}()],Dict{Symbol, Any}(:CallbackInterval => -1.0, :TargetFitness => nothing, :TraceMode => :compact, :FitnessScheme => BlackBoxOptim.ScalarFitnessScheme{true}(), :MinDeltaFitnessTolerance => 1.0e-50, :NumDimensions => :NotSpecified, :FitnessTolerance => 1.0e-8, :TraceInterval => 0.5, :MaxStepsWithoutProgress => 10000, :MaxSteps => 10000…)], 10129, BlackBoxOptim.ScalarFitnessScheme{true}(), BlackBoxOptim.TopListArchiveOutput{Float64, Vector{Float64}}(1.4298103907130839e-30, [0.9999999999999996, 0.999999999999999]), BlackBoxOptim.PopulationOptimizerOutput{BlackBoxOptim.FitPopulation{Fl
```
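Besides `original`, the solution object also carries a few solver-independent fields. The sketch below assumes the `u`, `objective`, and `retcode` fields of the solution type and reuses the Nelder-Mead setup from earlier; it is an illustration rather than a complete reference.

```julia
using Optimization, OptimizationOptimJL

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# No AD backend is needed here because Nelder-Mead is derivative-free.
prob = OptimizationProblem(rosenbrock, zeros(2), [1.0, 100.0])
sol = solve(prob, Optim.NelderMead())

println("minimizer: ", sol.u)          # optimization variables at the solution
println("objective: ", sol.objective)  # objective value at sol.u
println("retcode:   ", sol.retcode)    # solver-independent return code
```

A `MaxIters` return code, as in the BlackBoxOptim run above, only says that the solver stopped at its step limit, so it is worth looking at the objective value before trusting or discarding such a result.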

### Defining the objective function

Optimization.jl assumes that your objective function takes two arguments, `objective(x, p)`:

1. The optimization variables `x`.
2. Other parameters `p`, such as hyperparameters of the cost function. If you have no “other parameters”, you can safely disregard this argument.

> Note: If your objective function is defined by someone else, you can create an anonymous function that just discards the extra parameters like this:

```julia
obj = (x, p) -> objective(x) # Pass this function into OptimizationFunction
```
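To make the two-argument convention concrete, here is a small sketch in plain Julia. The names `sphere`, `shifted_sphere`, and the field `center` are made up for this illustration; only the `(x, p)` calling convention comes from the text above, and passing a `NamedTuple` as `p` is an assumption about how one might organize the parameters.

```julia
# A one-argument objective, e.g. taken from another package (hypothetical).
sphere(x) = sum(abs2, x)

# Wrapper that accepts the parameter argument and ignores it.
obj = (x, p) -> sphere(x)

# An objective that does use `p`: here a NamedTuple carrying a shift vector.
shifted_sphere(x, p) = sum(abs2, x .- p.center)

x = [1.0, 2.0]
println(obj(x, nothing))                            # `p` is ignored
println(shifted_sphere(x, (center = [1.0, 1.0],)))  # `p` supplies the data
```

Routing data through `p` instead of global variables keeps the objective self-contained, so the same function can be reused with different parameter values.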

### Controlling Gradient Calculations (Automatic Differentiation)

Notice that the Nelder-Mead and BlackBoxOptim solvers used above are derivative-free, so no gradients were required for those runs. However, first-order optimization (i.e., using gradients) is often much more efficient. Gradients can be defined in two ways. One is to manually provide a gradient definition in the `OptimizationFunction` constructor. The more convenient way is to provide an AD backend type.
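The manual route can be sketched as follows. The hand-derived gradient `rosenbrock_grad!` and the finite-difference checker `central_diff` are names introduced here for illustration. The in-place `(G, u, p)` signature and the `grad` keyword mentioned in the final comment are assumptions about the `OptimizationFunction` constructor, so check them against the documentation of your installed version. The block itself runs in plain Julia and only verifies that the derivative is correct.

```julia
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# Hand-derived gradient, written in place into G.
function rosenbrock_grad!(G, u, p)
    G[1] = -2 * (p[1] - u[1]) - 4 * p[2] * u[1] * (u[2] - u[1]^2)
    G[2] = 2 * p[2] * (u[2] - u[1]^2)
    return G
end

# Central finite differences, used only to check the manual gradient.
function central_diff(f, u, p; h = 1e-6)
    g = similar(u)
    for i in eachindex(u)
        step = zeros(length(u))
        step[i] = h
        g[i] = (f(u .+ step, p) - f(u .- step, p)) / (2h)
    end
    return g
end

p = [1.0, 100.0]
u = [-1.2, 1.0]
G = zeros(2)
rosenbrock_grad!(G, u, p)
fd = central_diff(rosenbrock, u, p)

println("manual gradient:   ", G)
println("finite difference: ", fd)

# Assumed usage (not executed here):
#   optf = OptimizationFunction(rosenbrock; grad = rosenbrock_grad!)
```

Checking a hand-written gradient against finite differences before handing it to a solver is cheap and catches most sign and indexing mistakes.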

As an example of the AD route, let's now use the OptimizationOptimJL `BFGS` method to solve the same problem. We load the forward-mode automatic differentiation library (`using ForwardDiff`) and pass `AutoForwardDiff()` to the `OptimizationFunction` so that the derivative functions are constructed automatically with ForwardDiff.jl. This looks like:

```julia
using ForwardDiff

optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, BFGS())
```

```
retcode: Success
u: 2-element Vector{Float64}:
 0.9999999999373614
 0.999999999868622
```

We can inspect `sol.original` to see statistics on the number of iterations required and gradients computed:

```julia
sol.original
```

```
 * Status: success

 * Candidate solution
    Final objective value:     7.645684e-21

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.48e-07 ≰ 0.0e+00
    |x - x'|/|x'|          = 3.48e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.91e-14 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 9.03e+06 ≰ 0.0e+00
    |g(x)|                 = 2.32e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    16
    f(x) calls:    53
    ∇f(x) calls:   53
```

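Sure enough, BFGS needed far less work than the derivative-free methods: 16 iterations and 53 objective evaluations, whereas the BlackBoxOptim run above stopped only after reaching its limit of 10,000 steps.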

However, the compute cost of forward-mode automatic differentiation scales with the number of inputs, so it slows down as the optimization problem grows. For larger optimization problems (more than 100 state variables), one would normally use reverse-mode automatic differentiation instead. One common choice for reverse-mode automatic differentiation is Zygote.jl. We can demonstrate this via:

```julia
using Zygote

optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, BFGS())
```

```
retcode: Success
u: 2-element Vector{Float64}:
 0.9999999999373614
 0.999999999868622
```
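To see why the cost of forward mode grows with the number of inputs, here is a toy dual-number implementation in plain Julia. The `Dual` type and the `forward_gradient` helper are invented for this illustration and are not how ForwardDiff.jl is implemented (ForwardDiff.jl propagates several directions per pass in chunks). The cost pattern is the same, though: each pass yields the derivative along one input direction, so a full gradient needs derivative information for every input, whereas reverse mode obtains the whole gradient from one forward and one backward sweep.

```julia
# Minimal dual number: a value plus the derivative along one input direction.
struct Dual
    val::Float64
    der::Float64
end

# Only the operations needed by the Rosenbrock function are defined.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)
Base.:-(a::Real, b::Dual) = Dual(a - b.val, -b.der)
Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
Base.:^(a::Dual, n::Integer) = Dual(a.val^n, n * a.val^(n - 1) * a.der)

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# One evaluation of `f` per input: seed direction i with derivative 1.
function forward_gradient(f, u::Vector{Float64}, p)
    grad = zeros(length(u))
    passes = 0
    for i in eachindex(u)
        duals = [Dual(u[j], j == i ? 1.0 : 0.0) for j in eachindex(u)]
        grad[i] = f(duals, p).der
        passes += 1
    end
    return grad, passes
end

grad, passes = forward_gradient(rosenbrock, [-1.2, 1.0], [1.0, 100.0])
println("gradient: ", grad)
println("passes:   ", passes)
```

With two inputs the toy version needs two passes; with a thousand inputs it would need a thousand, which is the scaling behavior described above.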

### Setting Box Constraints

In many cases, one knows the potential bounds on the solution values. In Optimization.jl, these can be supplied as the `lb` and `ub` arguments for the lower and upper bounds respectively, each given as a vector with one value per state variable.

```julia
prob = OptimizationProblem(optf, u0, p, lb = [-1.0, -1.0], ub = [1.0, 1.0])
sol = solve(prob, BFGS())
```

```
retcode: Success
u: 2-element Vector{Float64}:
 0.9999999993561103
 0.9999999987161009
```

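A bound can be left open by setting it to `Inf`, as in the following problem, which is bounded only from below:

```julia
prob = OptimizationProblem(optf, u0, p, lb = [-1.0, -1.0], ub = [Inf, Inf])
sol = solve(prob, BFGS())
```

```
retcode: Success
u: 2-element Vector{Float64}:
 1.0000000007498584
 1.0000000015022157
```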

Bounds are not restricted to vectors. When the optimization variable is a matrix, `lb` and `ub` are matrices of the same shape:

```julia
x0 = fill(0.5, 3, 3)   # 3×3 initial guess
A = ones(3, 3)

# Linear objective, so the minimum lies on the boundary of the box.
opt_fun(x, p) = sum(A * x)
p = 1                  # unused placeholder parameter (rebinds the earlier `p`)

optf = OptimizationFunction(opt_fun, Optimization.AutoZygote())
prob = OptimizationProblem(optf, x0, p, lb = fill(-1.0, 3, 3), ub = fill(1.0, 3, 3))
sol = solve(prob, BFGS())
```

```
retcode: Success
u: 3×3 Matrix{Float64}:
 -1.0  -1.0  -1.0
 -1.0  -1.0  -1.0
 -1.0  -1.0  -1.0
```
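When working with bounds, it can be handy to confirm that a returned solution actually lies inside the box. The helper `in_box` below is a hypothetical utility written for this tutorial, not part of Optimization.jl. It is applied to the solution values printed above.

```julia
# True when every entry of `u` lies within its bounds.
# An infinite bound leaves that side open.
function in_box(u, lb, ub)
    size(u) == size(lb) == size(ub) ||
        throw(DimensionMismatch("u, lb and ub must have the same shape"))
    return all(lb .<= u) && all(u .<= ub)
end

u_two_sided = [0.9999999993561103, 0.9999999987161009]  # from lb = -1, ub = 1
u_one_sided = [1.0000000007498584, 1.0000000015022157]  # from lb = -1, ub = Inf

println(in_box(u_two_sided, [-1.0, -1.0], [1.0, 1.0]))
println(in_box(u_one_sided, [-1.0, -1.0], [Inf, Inf]))
println(in_box(u_one_sided, [-1.0, -1.0], [1.0, 1.0]))
println(in_box(fill(-1.0, 3, 3), fill(-1.0, 3, 3), fill(1.0, 3, 3)))
```

The third call shows the effect of the bounds: the solution of the one-sided problem sits marginally above 1 in both components, so it would not be feasible for the two-sided box.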

## Equality and Inequality Constraints <a id="equality_and_inequality_constraints" />