/
search_index.js
3 lines (3 loc) · 183 KB
/
search_index.js
1
2
3
var documenterSearchIndex = {"docs":
[{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"EditURL = \"../rasch.jl\"","category":"page"},{"location":"examples/generated/rasch/#Conditional-Maximum-Likelihood-for-the-Rasch-Model","page":"Conditional maximum likelihood estimation","title":"Conditional Maximum Likelihood for the Rasch Model","text":"","category":"section"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"tip: Tip\nThis example is also available as a Jupyter notebook: raschipynb","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"using Optim, Random #hide","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"The Rasch model is used in psychometrics as a model for assessment data such as student responses to a standardized test. Let X_pi be the response accuracy of student p to item i where X_pi=1 if the item was answered correctly and X_pi=0 otherwise for p=1ldotsn and i=1ldotsm. The model for this accuracy is","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":" P(mathbfX_p=mathbfx_pxi_p mathbfepsilon) = prod_i=1^m dfrac(xi_p epsilon_j)^x_pi1 + xi_pepsilon_i","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"where xi_p 0 the latent ability of person p and epsilon_i 0 is the difficulty of item i.","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"We simulate data from this model:","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"Random.seed!(123)\nn = 1000\nm = 5\ntheta = randn(n)\ndelta = randn(m)\nr = zeros(n)\ns = zeros(m)\n\nfor i in 1:n\n p = exp.(theta[i] .- delta) ./ (1.0 .+ exp.(theta[i] .- delta))\n for j in 1:m\n if rand() < p[j] ##correct\n r[i] += 1\n s[j] += 1\n end\n end\nend\nf = [sum(r.==j) for j in 1:m];\nnothing #hide","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"Since the number of parameters increases with sample size standard maximum likelihood will not provide us consistent estimates. Instead we consider the conditional likelihood. It can be shown that the Rasch model is an exponential family model and that the sum score r_p = sum_i x_pi is the sufficient statistic for xi_p. If we condition on the sum score we should be able to eliminate xi_p. Indeed, with a bit of algebra we can show","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"P(mathbfX_p = mathbfx_p r_p mathbfepsilon) = dfracprod_i=1^m epsilon_i^xijgamma_r_i(mathbfepsilon)","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"where gamma_r(mathbfepsilon) is the elementary symmetric function of order r","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"gamma_r(mathbfepsilon) = sum_mathbfy mathbf1^intercal mathbfy = r prod_j=1^m epsilon_j^y_j","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"where the sum is over all possible answer configurations that give a sum score of r. Algorithms to efficiently compute gamma and its derivatives are available in the literature (see eg Baker (1996) for a review and Biscarri (2018) for a more modern approach)","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"function esf_sum!(S::AbstractArray{T,1}, x::AbstractArray{T,1}) where T <: Real\n n = length(x)\n fill!(S,zero(T))\n S[1] = one(T)\n @inbounds for col in 1:n\n for r in 1:col\n row = col - r + 1\n S[row+1] = S[row+1] + x[col] * S[row]\n end\n end\nend\n\nfunction esf_ext!(S::AbstractArray{T,1}, H::AbstractArray{T,3}, x::AbstractArray{T,1}) where T <: Real\n n = length(x)\n esf_sum!(S, x)\n H[:,:,1] .= zero(T)\n H[:,:,2] .= one(T)\n\n @inbounds for i in 3:n+1\n for j in 1:n\n H[j,j,i] = S[i-1] - x[j] * H[j,j,i-1]\n for k in j+1:n\n H[k,j,i] = S[i-1] - ((x[j]+x[k])*H[k,j,i-1] + x[j]*x[k]*H[k,j,i-2])\n H[j,k,i] = H[k,j,i]\n end\n end\n end\nend","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"The objective function we want to minimize is the negative log conditional likelihood","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"beginaligned\nlogL_C(mathbfepsilonmathbfr) = sum_p=1^n sum_i=1^m x_pi logepsilon_i - loggamma_r_p(mathbfepsilon)\n = sum_i=1^m s_i logepsilon_i - sum_r=1^m f_r loggamma_r(mathbfepsilon)\nendaligned","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"ϵ = ones(Float64, m)\nβ0 = zeros(Float64, m)\nlast_β = fill(NaN, m)\nS = zeros(Float64, m+1)\nH = zeros(Float64, m, m, m+1)\n\nfunction calculate_common!(x, last_x)\n if x != last_x\n copyto!(last_x, x)\n ϵ .= exp.(-x)\n esf_ext!(S, H, ϵ)\n end\nend\nfunction neglogLC(β)\n calculate_common!(β, last_β)\n return -s'log.(ϵ) + f'log.(S[2:end])\nend","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"Parameter estimation is usually performed with respect to the unconstrained parameter beta_i = -logepsilon_i. Taking the derivative with respect to beta_i (and applying the chain rule) one obtains","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":" dfracpartiallog L_C(mathbfepsilonmathbfr)partial beta_i = -s_i + epsilon_isum_r=1^m dfracf_r gamma_r-1^(j)gamma_r","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"where gamma_r-1^(i) = partial gamma_r(mathbfepsilon)partialepsilon_i.","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"function g!(storage, β)\n calculate_common!(β, last_β)\n for j in 1:m\n storage[j] = s[j]\n for l in 1:m\n storage[j] -= ϵ[j] * f[l] * (H[j,j,l+1] / S[l+1])\n end\n end\nend","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"Similarly the Hessian matrix can be computed","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":" dfracpartial^2 log L_C(mathbfepsilonmathbfr)partial beta_ipartialbeta_j = begincases displaystyle -epsilon_i sum_r=1^m dfracf_rgamma_r-1^(i)gamma_rleft(1 - dfracgamma_r-1^(i)gamma_rright) textif i=j\n displaystyle -epsilon_iepsilon_jsum_r=1^m dfracf_r gamma_r-2^(ij)gamma_r - dfracf_rgamma_r-1^(i)gamma_r-1^(j)gamma_r^2 textif ineq j\n endcases","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"where gamma_r-2^(ij) = partial^2 gamma_r(mathbfepsilon)partialepsilon_ipartialepsilon_j.","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"function h!(storage, β)\n calculate_common!(β, last_β)\n for j in 1:m\n for k in 1:m\n storage[k,j] = 0.0\n for l in 1:m\n if j == k\n storage[j,j] += f[l] * (ϵ[j]*H[j,j,l+1] / S[l+1]) *\n (1 - ϵ[j]*H[j,j,l+1] / S[l+1])\n elseif k > j\n storage[k,j] += ϵ[j] * ϵ[k] * f[l] *\n ((H[k,j,l] / S[l+1]) - (H[j,j,l+1] * H[k,k,l+1]) / S[l+1] ^ 2)\n else #k < j\n storage[k,j] += ϵ[j] * ϵ[k] * f[l] *\n ((H[j,k,l] / S[l+1]) - (H[j,j,l+1] * H[k,k,l+1]) / S[l+1] ^ 2)\n end\n end\n end\n end\nend","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"The estimates of the item parameters are then obtained via standard optimization algorithms (either Newton-Raphson or L-BFGS). One last issue is that the model is not identifiable (multiplying the xi_p by a constant and dividing the epsilon_i by the same constant results in the same likelihood). Therefore some kind of constraint must be imposed when estimating the parameters. Typically either epsilon_1 = 0 or prod_i=1^m epsilon_i = 1 (which is equivalent to sum_i=1^m beta_i = 0).","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"con_c!(c, x) = (c[1] = sum(x); c)\nfunction con_jacobian!(J, x)\n J[1,:] .= ones(length(x))\nend\nfunction con_h!(h, x, λ)\n for i in 1:size(h)[1]\n for j in 1:size(h)[2]\n h[i,j] += (i == j) ? λ[1] : 0.0\n end\n end\nend\nlx = Float64[]; ux = Float64[]\nlc = [0.0]; uc = [0.0]\ndf = TwiceDifferentiable(neglogLC, g!, h!, β0)\ndfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!, lx, ux, lc, uc)\nres = optimize(df, dfc, β0, IPNewton())","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"Compare the estimate to the truth","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"delta_hat = res.minimizer\n[delta delta_hat]","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"","category":"page"},{"location":"examples/generated/rasch/","page":"Conditional maximum likelihood estimation","title":"Conditional maximum likelihood estimation","text":"This page was generated using Literate.jl.","category":"page"},{"location":"algo/samin/#SAMIN","page":"Simulated Annealing w/ bounds","title":"SAMIN","text":"","category":"section"},{"location":"algo/samin/#Constructor","page":"Simulated Annealing w/ bounds","title":"Constructor","text":"","category":"section"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"SAMIN(; nt::Int = 5 # reduce temperature every nt*ns*dim(x_init) evaluations\n ns::Int = 5 # adjust bounds every ns*dim(x_init) evaluations\n rt::T = 0.9 # geometric temperature reduction factor: when temp changes, new temp is t=rt*t\n neps::Int = 5 # number of previous best values the final result is compared to\n f_tol::T = 1e-12 # the required tolerance level for function value comparisons\n x_tol::T = 1e-6 # the required tolerance level for x\n coverage_ok::Bool = false, # if false, increase temperature until initial parameter space is covered\n verbosity::Int = 0) # scalar: 0, 1, 2 or 3 (default = 0).","category":"page"},{"location":"algo/samin/#Description","page":"Simulated Annealing w/ bounds","title":"Description","text":"","category":"section"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"The SAMIN method implements the Simulated Annealing algorithm for problems with bounds constraints as described in Goffe et. al. (1994) and Goffe (1996). A key control parameter is rt, the geometric temperature reduction rate, which should be between zero and one. Setting rt lower will cause the algorithm to contract the search space more quickly, reducing the run time. Setting rt too low will cause the algorithm to narrow the search too quickly, and the true minimizer may be skipped over. If possible, run the algorithm multiple times to verify that the same solution is found each time. If this is not the case, increase rt. When in doubt, start with a conservative rt, for example, rt=0.95, and allow for a generous iteration limit. The algorithm requires lower and upper bounds on the parameters, although these bounds are often set rather wide, and are not necessarily meant to reflect constraints in the model, but rather bounds that enclose the parameter space. If the final xs are very close to the boundary (which can be checked by setting verbosity=1), it is a good idea to restart the optimizer with wider bounds, unless the bounds actually reflect hard constraints on x.","category":"page"},{"location":"algo/samin/#Example","page":"Simulated Annealing w/ bounds","title":"Example","text":"","category":"section"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"This example shows a successful minimization:","category":"page"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"julia> using Optim, OptimTestProblems\n\njulia> prob = UnconstrainedProblems.examples[\"Rosenbrock\"];\n\njulia> res = Optim.optimize(prob.f, fill(-100.0, 2), fill(100.0, 2), prob.initial_x, SAMIN(), Optim.Options(iterations=10^6))\n================================================================================\nSAMIN results\n==> Normal convergence <==\ntotal number of objective function evaluations: 23701\n\n Obj. value: 0.0000000000\n\n parameter search width\n 1.00000 0.00000\n 1.00000 0.00000\n================================================================================\n\nResults of Optimization Algorithm\n * Algorithm: SAMIN\n * Starting Point: [-1.2,1.0]\n * Minimizer: [0.9999999893140956,0.9999999765350857]\n * Minimum: 5.522977e-16\n * Iterations: 23701\n * Convergence: false\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = NaN\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = NaN |f(x)|\n * |g(x)| ≤ 0.0e+00: false\n |g(x)| = NaN\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 23701\n * Gradient Calls: 0","category":"page"},{"location":"algo/samin/#Example-2","page":"Simulated Annealing w/ bounds","title":"Example","text":"","category":"section"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"This example shows an unsuccessful minimization, because the cooling rate, rt=0.5, is too rapid:","category":"page"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"julia> using Optim, OptimTestProblems\n\njulia> prob = UnconstrainedProblems.examples[\"Rosenbrock\"];\njulia> res = Optim.optimize(prob.f, fill(-100.0, 2), fill(100.0, 2), prob.initial_x, SAMIN(rt=0.5), Optim.Options(iterations=10^6))\n================================================================================\nSAMIN results\n==> Normal convergence <==\ntotal number of objective function evaluations: 12051\n\n Obj. value: 0.0011613045\n\n parameter search width\n 0.96592 0.00000\n 0.93301 0.00000\n================================================================================\n\nResults of Optimization Algorithm\n * Algorithm: SAMIN\n * Starting Point: [-1.2,1.0]\n * Minimizer: [0.9659220825756248,0.9330054696322896]\n * Minimum: 1.161304e-03\n * Iterations: 12051\n * Convergence: false\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = NaN\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = NaN |f(x)|\n * |g(x)| ≤ 0.0e+00: false\n |g(x)| = NaN\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 12051\n * Gradient Calls: 0\n","category":"page"},{"location":"algo/samin/#References","page":"Simulated Annealing w/ bounds","title":"References","text":"","category":"section"},{"location":"algo/samin/","page":"Simulated Annealing w/ bounds","title":"Simulated Annealing w/ bounds","text":"Goffe, et. al. (1994) \"Global Optimization of Statistical Functions with Simulated Annealing\", Journal of Econometrics, V. 60, N. 1/2.\nGoffe, William L. (1996) \"SIMANN: A Global Optimization Algorithm using Simulated Annealing \" Studies in Nonlinear Dynamics & Econometrics, Oct96, Vol. 1 Issue 3.","category":"page"},{"location":"dev/#Using-Nelder-Mead","page":"-","title":"Using Nelder Mead","text":"","category":"section"},{"location":"algo/newton_trust_region/#Newton's-Method-With-a-Trust-Region","page":"Newton with Trust Region","title":"Newton's Method With a Trust Region","text":"","category":"section"},{"location":"algo/newton_trust_region/#Constructor","page":"Newton with Trust Region","title":"Constructor","text":"","category":"section"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"NewtonTrustRegion(; initial_delta = 1.0,\n delta_hat = 100.0,\n eta = 0.1,\n rho_lower = 0.25,\n rho_upper = 0.75)","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"The constructor takes keywords that determine the initial and maximal size of the trust region, when to grow and shrink the region, and how close the function should be to the quadratic approximation. The notation follows chapter four of Numerical Optimization. Below, rho =rho refers to the ratio of the actual function change to the change in the quadratic approximation for a given step.","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"initial_delta:The starting trust region radius\ndelta_hat: The largest allowable trust region radius\neta: When rho is at least eta, accept the step.\nrho_lower: When rho is less than rho_lower, shrink the trust region.\nrho_upper: When rho is greater than rho_upper, grow the trust region (though no greater than delta_hat).","category":"page"},{"location":"algo/newton_trust_region/#Description","page":"Newton with Trust Region","title":"Description","text":"","category":"section"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"Newton's method with a trust region is designed to take advantage of the second-order information in a function's Hessian, but with more stability than Newton's method when functions are not globally well-approximated by a quadratic. This is achieved by repeatedly minimizing quadratic approximations within a dynamically-sized \"trust region\" in which the function is assumed to be locally quadratic [1].","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"Newton's method optimizes a quadratic approximation to a function. When a function is well approximated by a quadratic (for example, near an optimum), Newton's method converges very quickly by exploiting the second-order information in the Hessian matrix. However, when the function is not well-approximated by a quadratic, either because the starting point is far from the optimum or the function has a more irregular shape, Newton steps can be erratically large, leading to distant, irrelevant areas of the space.","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"Trust region methods use second-order information but restrict the steps to be within a \"trust region\" where the function is believed to be approximately quadratic. At iteration k, a trust region method chooses a step p to minimize a quadratic approximation to the objective such that the step size is no larger than a given trust region size, Delta_k.","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"undersetpinmathbbR^nmin m_k(p) = f_k + g_k^T p + frac12p^T B_k p quadtextrmsuch that ple Delta_k","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"Here, p is the step to take at iteration k, so that x_k+1 = x_k + p. In the definition of m_k(p), f_k = f(x_k) is the value at the previous location, g_k=nabla f(x_k) is the gradient at the previous location, B_k = nabla^2 f(x_k) is the Hessian matrix at the previous iterate, and cdot is the Euclidian norm.","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"If the trust region size, Delta_k, is large enough that the minimizer of the quadratic approximation m_k(p) has p le Delta_k, then the step is the same as an ordinary Newton step. However, if the unconstrained quadratic minimizer lies outside the trust region, then the minimizer to the constrained problem will occur on the boundary, i.e. we will have p = Delta_k. It turns out that when the Cholesky decomposition of B_k can be computed, the optimal p can be found numerically with relative ease. ([1], section 4.3) This is the method currently used in Optim.","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"It makes sense to adapt the trust region size, Delta_k, as one moves through the space and assesses the quality of the quadratic fit. This adaptation is controlled by the parameters eta, rho_lower, and rho_upper, which are parameters to the NewtonTrustRegion optimization method. For each step, we calculate","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"rho_k = fracf(x_k+1) - f(x_k)m_k(p) - m_k(0)","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"Intuitively, rho_k measures the quality of the quadratic approximation: if rho_k approx 1, then our quadratic approximation is reasonable. If p was on the boundary and rho_k rho_upper, then perhaps we can benefit from larger steps. In this case, for the next iteration we grow the trust region geometrically up to a maximum of hatDelta:","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"rho_k rho_upper Rightarrow Delta_k+1 = min(2 Delta_k hatDelta)","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"Conversely, if rho_k rho_lower, then we shrink the trust region geometrically:","category":"page"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"rho_k rho_lower Rightarrow Delta_k+1 = 025 Delta_k. Finally, we only accept a point if its decrease is appreciable compared to the quadratic approximation. Specifically, a step is only accepted rho_k eta. As long as we choose eta to be less than rho_lower, we will shrink the trust region whenever we reject a step. Eventually, if the objective function is locally quadratic, Delta_k will become small enough that a quadratic approximation will be accurate enough to make progress again.","category":"page"},{"location":"algo/newton_trust_region/#Example","page":"Newton with Trust Region","title":"Example","text":"","category":"section"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"using Optim, OptimTestProblems\nprob = UnconstrainedProblems.examples[\"Rosenbrock\"];\nres = Optim.optimize(prob.f, prob.g!, prob.h!, prob.initial_x, NewtonTrustRegion())","category":"page"},{"location":"algo/newton_trust_region/#References","page":"Newton with Trust Region","title":"References","text":"","category":"section"},{"location":"algo/newton_trust_region/","page":"Newton with Trust Region","title":"Newton with Trust Region","text":"[1] Nocedal, Jorge, and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.","category":"page"},{"location":"algo/#Solvers","page":"Solvers","title":"Solvers","text":"","category":"section"},{"location":"algo/particle_swarm/#Particle-Swarm","page":"Particle Swarm","title":"Particle Swarm","text":"","category":"section"},{"location":"algo/particle_swarm/#Constructor","page":"Particle Swarm","title":"Constructor","text":"","category":"section"},{"location":"algo/particle_swarm/","page":"Particle Swarm","title":"Particle Swarm","text":"ParticleSwarm(; lower = [],\n upper = [],\n n_particles = 0)","category":"page"},{"location":"algo/particle_swarm/","page":"Particle Swarm","title":"Particle Swarm","text":"The constructor takes three keywords:","category":"page"},{"location":"algo/particle_swarm/","page":"Particle Swarm","title":"Particle Swarm","text":"lower = [], a vector of lower bounds, unbounded below if empty or Inf's\nupper = [], a vector of upper bounds, unbounded above if empty or Inf's\nn_particles = 0, number of particles in the swarm, defaults to least three","category":"page"},{"location":"algo/particle_swarm/#Description","page":"Particle Swarm","title":"Description","text":"","category":"section"},{"location":"algo/particle_swarm/","page":"Particle Swarm","title":"Particle Swarm","text":"The Particle Swarm implementation in Optim.jl is the so-called Adaptive Particle Swarm algorithm in [1]. It attempts to improve global coverage and convergence by switching between four evolutionary states: exploration, exploitation, convergence, and jumping out. In the jumping out state it intentially tries to take the best particle and move it away from its (potentially and probably) local optimum, to improve the ability to find a global optimum. Of course, this comes a the cost of slower convergence, but hopefully converges to the global optimum as a result.","category":"page"},{"location":"algo/particle_swarm/#References","page":"Particle Swarm","title":"References","text":"","category":"section"},{"location":"algo/particle_swarm/","page":"Particle Swarm","title":"Particle Swarm","text":"[1] Zhan, Zhang, and Chung. Adaptive particle swarm optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part B: CyberneticsVolume 39, Issue 6, 2009, Pages 1362-1381 (2009)","category":"page"},{"location":"user/config/#Configurable-options","page":"Configurable Options","title":"Configurable options","text":"","category":"section"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"There are several options that simply take on some default values if the user doesn't supply anything else than a function (and gradient) and a starting point.","category":"page"},{"location":"user/config/#Solver-options","page":"Configurable Options","title":"Solver options","text":"","category":"section"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"There quite a few different solvers available in Optim, and they are all listed below. Notice that the constructors are written without input here, but they generally take keywords to tweak the way they work. See the pages describing each solver for more detail.","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Requires only a function handle:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"NelderMead()\nSimulatedAnnealing()","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Requires a function and gradient (will be approximated if omitted):","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"BFGS()\nLBFGS()\nConjugateGradient()\nGradientDescent()\nMomentumGradientDescent()\nAcceleratedGradientDescent()","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Requires a function, a gradient, and a Hessian (cannot be omitted):","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Newton()\nNewtonTrustRegion()","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Box constrained minimization:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Fminbox()","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Special methods for bounded univariate optimization:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Brent()\nGoldenSection()","category":"page"},{"location":"user/config/#General-Options","page":"Configurable Options","title":"General Options","text":"","category":"section"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"In addition to the solver, you can alter the behavior of the Optim package by using the following keywords:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"x_tol: Absolute tolerance in changes of the input vector x, in infinity norm. Defaults to 0.0.\nf_tol: Relative tolerance in changes of the objective value. Defaults to 0.0.\ng_tol: Absolute tolerance in the gradient, in infinity norm. Defaults to 1e-8. For gradient free methods, this will control the main convergence tolerance, which is solver specific.\nf_calls_limit: A soft upper limit on the number of objective calls. Defaults to 0 (unlimited).\ng_calls_limit: A soft upper limit on the number of gradient calls. Defaults to 0 (unlimited).\nh_calls_limit: A soft upper limit on the number of Hessian calls. Defaults to 0 (unlimited).\nallow_f_increases: Allow steps that increase the objective value. Defaults to false. Note that, when setting this to true, the last iterate will be returned as the minimizer even if the objective increased.\niterations: How many iterations will run before the algorithm gives up? Defaults to 1_000.\nstore_trace: Should a trace of the optimization algorithm's state be stored? Defaults to false.\nshow_trace: Should a trace of the optimization algorithm's state be shown on stdout? Defaults to false.\nextended_trace: Save additional information. Solver dependent. Defaults to false.\nshow_warnings: Should warnings due to NaNs or Inf be shown? Defaults to true.\ntrace_simplex: Include the full simplex in the trace for NelderMead. Defaults to false.\nshow_every: Trace output is printed every show_everyth iteration.\ncallback: A function to be called during tracing. A return value of true stops the optimize call. The callback function is called every show_everyth iteration. If store_trace is false, the argument to the callback is of the type OptimizationState, describing the state of the current iteration. If store_trace is true, the argument is a list of all the states from the first iteration to the current. \ntime_limit: A soft upper limit on the total run time. Defaults to NaN (unlimited).","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Box constrained optimization has additional keywords to alter the behavior of the outer solver:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"outer_x_tol: Absolute tolerance in changes of the input vector x, in infinity norm. Defaults to 0.0.\nouter_f_tol: Relative tolerance in changes of the objective value. Defaults to 0.0.\nouter_g_tol: Absolute tolerance in the gradient, in infinity norm. Defaults to 1e-8. For gradient free methods, this will control the main convergence tolerance, which is solver specific.\nallow_outer_f_increases: Allow steps that increase the objective value. Defaults to false. Note that, when setting this to true, the last iterate will be returned as the minimizer even if the objective increased.\nouter_iterations: How many iterations will run before the algorithm gives up? Defaults to 1_000.","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"If you specify outer_iterations = 10 and iterations = 100, the outer algorithm will run for 10 iterations, and for each outer iteration the inner algorithm will run for 100 iterations.","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"We currently recommend the statically dispatched interface by using the Optim.Options constructor:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"res = optimize(f, g!,\n [0.0, 0.0],\n GradientDescent(),\n Optim.Options(g_tol = 1e-12,\n iterations = 10,\n store_trace = true,\n show_trace = false,\n show_warnings = true))","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Another interface is also available, based directly on keywords:","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"res = optimize(f, g!,\n [0.0, 0.0],\n method = GradientDescent(),\n g_tol = 1e-12,\n iterations = 10,\n store_trace = true,\n show_trace = false,\n show_warnings = true)","category":"page"},{"location":"user/config/","page":"Configurable Options","title":"Configurable Options","text":"Notice the need to specify the method using a keyword if this syntax is used. This approach might be deprecated in the future, and as a result we recommend writing code that has to maintained using the Optim.Options approach.","category":"page"},{"location":"algo/cg/#Conjugate-Gradient-Descent","page":"Conjugate Gradient","title":"Conjugate Gradient Descent","text":"","category":"section"},{"location":"algo/cg/#Constructor","page":"Conjugate Gradient","title":"Constructor","text":"","category":"section"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"ConjugateGradient(; alphaguess = LineSearches.InitialHagerZhang(),\n linesearch = LineSearches.HagerZhang(),\n eta = 0.4,\n P = nothing,\n precondprep = (P, x) -> nothing)","category":"page"},{"location":"algo/cg/#Description","page":"Conjugate Gradient","title":"Description","text":"","category":"section"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"The ConjugateGradient method implements Hager and Zhang (2006) and elements from Hager and Zhang (2013). Notice, that the default linesearch is HagerZhang from LineSearches.jl. This line search is exactly the one proposed in Hager and Zhang (2006). The constant eta is used in determining the next step direction, and the default here deviates from the one used in the original paper (001). It needs to be a strictly positive number.","category":"page"},{"location":"algo/cg/#Example","page":"Conjugate Gradient","title":"Example","text":"","category":"section"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"Let's optimize the 2D Rosenbrock function. The function and gradient are given by","category":"page"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\nfunction g!(storage, x)\n storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\n storage[2] = 200.0 * (x[2] - x[1]^2)\nend","category":"page"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"we can then try to optimize this function from x=[0.0, 0.0]","category":"page"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"julia> optimize(f, g!, zeros(2), ConjugateGradient())\nResults of Optimization Algorithm\n * Algorithm: Conjugate Gradient\n * Starting Point: [0.0,0.0]\n * Minimizer: [1.000000002262018,1.0000000045408348]\n * Minimum: 5.144946e-18\n * Iterations: 21\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 2.09e-10\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 1.55e+00 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 3.36e-09\n * stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 54\n * Gradient Calls: 39","category":"page"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"We can compare this to the default first order solver in Optim.jl","category":"page"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":" julia> optimize(f, g!, zeros(2))\n\n Results of Optimization Algorithm\n * Algorithm: L-BFGS\n * Starting Point: [0.0,0.0]\n * Minimizer: [0.9999999999373614,0.999999999868622]\n * Minimum: 7.645684e-21\n * Iterations: 16\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 3.48e-07\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 9.03e+06 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 2.32e-09\n * stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 53\n * Gradient Calls: 53","category":"page"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"We see that for this objective and starting point, ConjugateGradient() requires fewer gradient evaluations to reach convergence.","category":"page"},{"location":"algo/cg/#References","page":"Conjugate Gradient","title":"References","text":"","category":"section"},{"location":"algo/cg/","page":"Conjugate Gradient","title":"Conjugate Gradient","text":"W. W. Hager and H. Zhang (2006) Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Transactions on Mathematical Software 32: 113-137.\nW. W. Hager and H. Zhang (2013), The Limited Memory Conjugate Gradient Method. SIAM Journal on Optimization, 23, pp. 2150-2168.","category":"page"},{"location":"algo/brent/#Brent's-Method","page":"Brent's Method","title":"Brent's Method","text":"","category":"section"},{"location":"algo/brent/#Constructor","page":"Brent's Method","title":"Constructor","text":"","category":"section"},{"location":"algo/brent/#Description","page":"Brent's Method","title":"Description","text":"","category":"section"},{"location":"algo/brent/#Example","page":"Brent's Method","title":"Example","text":"","category":"section"},{"location":"algo/brent/#References","page":"Brent's Method","title":"References","text":"","category":"section"},{"location":"algo/brent/","page":"Brent's Method","title":"Brent's Method","text":"R. P. Brent (2002) Algorithms for Minimization Without Derivatives. Dover edition.","category":"page"},{"location":"algo/lbfgs/#(L-)BFGS","page":"(L-)BFGS","title":"(L-)BFGS","text":"","category":"section"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"This page contains information about BFGS and its limited memory version L-BFGS.","category":"page"},{"location":"algo/lbfgs/#Constructors","page":"(L-)BFGS","title":"Constructors","text":"","category":"section"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"BFGS(; alphaguess = LineSearches.InitialStatic(),\n linesearch = LineSearches.HagerZhang(),\n initial_invH = nothing,\n initial_stepnorm = nothing,\n manifold = Flat())","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"initial_invH has a default value of nothing. If the user has a specific initial matrix they want to supply, it should be supplied as a function of an array similar to the initial point x0.","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"If initial_stepnorm is set to a number z, the initial matrix will be the identity matrix scaled by z times the sup-norm of the gradient at the initial point x0.","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"LBFGS(; m = 10,\n alphaguess = LineSearches.InitialStatic(),\n linesearch = LineSearches.HagerZhang(),\n P = nothing,\n precondprep = (P, x) -> nothing,\n manifold = Flat(),\n scaleinvH0::Bool = true && (typeof(P) <: Nothing))","category":"page"},{"location":"algo/lbfgs/#Description","page":"(L-)BFGS","title":"Description","text":"","category":"section"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"This means that it takes steps according to","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"x_n+1 = x_n - P^-1nabla f(x_n)","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"where P is a positive definite matrix. If P is the Hessian, we get Newton's method. In (L-)BFGS, the matrix is an approximation to the Hessian built using differences in the gradient across iterations. As long as the initial matrix is positive definite it is possible to show that all the follow matrices will be as well. The starting matrix could simply be the identity matrix, such that the first step is identical to the Gradient Descent algorithm, or even the actual Hessian.","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"There are two versions of BFGS in the package: BFGS, and L-BFGS. The latter is different from the former because it doesn't use a complete history of the iterative procedure to construct P, but rather only the latest m steps. It doesn't actually build the Hessian approximation matrix either, but computes the direction directly. This makes more suitable for large scale problems, as the memory requirement to store the relevant vectors will grow quickly in large problems.","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"As with the other quasi-Newton solvers in this package, a scalar alpha is introduced as follows","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"x_n+1 = x_n - alpha P^-1nabla f(x_n)","category":"page"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"and is chosen by a linesearch algorithm such that each step gives sufficient descent.","category":"page"},{"location":"algo/lbfgs/#Example","page":"(L-)BFGS","title":"Example","text":"","category":"section"},{"location":"algo/lbfgs/#References","page":"(L-)BFGS","title":"References","text":"","category":"section"},{"location":"algo/lbfgs/","page":"(L-)BFGS","title":"(L-)BFGS","text":"Wright, Stephen, and Jorge Nocedal (2006) \"Numerical optimization.\" Springer","category":"page"},{"location":"algo/gradientdescent/#Gradient-Descent","page":"Gradient Descent","title":"Gradient Descent","text":"","category":"section"},{"location":"algo/gradientdescent/#Constructor","page":"Gradient Descent","title":"Constructor","text":"","category":"section"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"GradientDescent(; alphaguess = LineSearches.InitialPrevious(),\n linesearch = LineSearches.HagerZhang(),\n P = nothing,\n precondprep = (P, x) -> nothing)","category":"page"},{"location":"algo/gradientdescent/#Description","page":"Gradient Descent","title":"Description","text":"","category":"section"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"Gradient Descent a common name for a quasi-Newton solver. This means that it takes steps according to","category":"page"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"x_n+1 = x_n - P^-1nabla f(x_n)","category":"page"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"where P is a positive definite matrix. If P is the Hessian, we get Newton's method. In Gradient Descent, P is simply an appropriately dimensioned identity matrix, such that we go in the exact opposite direction of the gradient. This means that we do not use the curvature information from the Hessian, or an approximation of it. While it does seem quite logical to go in the opposite direction of the fastest increase in objective value, the procedure can be very slow if the problem is ill-conditioned. See the section on preconditioners for ways to remedy this when using Gradient Descent.","category":"page"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"As with the other quasi-Newton solvers in this package, a scalar alpha is introduced as follows","category":"page"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"x_n+1 = x_n - alpha P^-1nabla f(x_n)","category":"page"},{"location":"algo/gradientdescent/","page":"Gradient Descent","title":"Gradient Descent","text":"and is chosen by a linesearch algorithm such that each step gives sufficient descent.","category":"page"},{"location":"algo/gradientdescent/#Example","page":"Gradient Descent","title":"Example","text":"","category":"section"},{"location":"algo/gradientdescent/#References","page":"Gradient Descent","title":"References","text":"","category":"section"},{"location":"algo/complex/#Complex-optimization","page":"Complex optimization","title":"Complex optimization","text":"","category":"section"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Optimization of functions defined on complex inputs (mathbbC^n to mathbbR) is supported by simply passing a complex x as input. The algorithms supported are all those which can naturally be extended to work with complex numbers: simulated annealing and all the first-order methods.","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"The gradient of a complex-to-real function is defined as the only vector g such that","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"f(x+h) = f(x) + mboxRe(g * h) + mathcalO(h^2)","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"This is sometimes written","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"g = fracdfd(z*) = fracdfd(mboxRe(z)) + i fracdfd(mboxIm(z))","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"The gradient of a mathbbC^n to mathbbR function is a mathbbC^n to mathbbC^n map. Even if it is differentiable when seen as a function of mathbbR^2n to mathbbR^2n, it might not be complex-differentiable. For instance, take f(z) = mboxRe(z)^2. Then g(z) = 2 mboxRe(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a mathbbC^n to mathbbR function is in general not well-defined as a n times n complex matrix (only as a 2n times 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.","category":"page"},{"location":"algo/complex/#Examples","page":"Complex optimization","title":"Examples","text":"","category":"section"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"We show how to minimize a quadratic plus quartic function with the LBFGS optimization algorithm.","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"using Random\nRandom.seed!(0) # Set the seed for reproducibility\n# μ is the strength of the quartic. μ = 0 is just a quadratic problem\nn = 4\nA = randn(n,n) + im*randn(n,n)\nA = A'A + I\nb = randn(n) + im*randn(n)\nμ = 1.0\n\nfcomplex(x) = real(dot(x,A*x)/2 - dot(b,x)) + μ*sum(abs.(x).^4)\ngcomplex(x) = A*x-b + 4μ*(abs.(x).^2).*x\ngcomplex!(stor,x) = copyto!(stor,gcomplex(x))\n\nx0 = randn(n)+im*randn(n)\n\nres = optimize(fcomplex, gcomplex!, x0, LBFGS())","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"The output of the optimization is","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Results of Optimization Algorithm\n * Algorithm: L-BFGS\n * Starting Point: [0.48155603952425174 - 1.477880724921868im,-0.3219431528959694 - 0.18542418173298963im, ...]\n * Minimizer: [0.14163543901272568 - 0.034929496785515886im,-0.1208600058040362 - 0.6125620908171383im, ...]\n * Minimum: -1.568997e+00\n * Iterations: 16\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 3.28e-09\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = -4.25e-16 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 6.33e-11\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 48\n * Gradient Calls: 48","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Similarly, with ConjugateGradient.","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"res = optimize(fcomplex, gcomplex!, x0, ConjugateGradient())","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Results of Optimization Algorithm\n * Algorithm: Conjugate Gradient\n * Starting Point: [0.48155603952425174 - 1.477880724921868im,-0.3219431528959694 - 0.18542418173298963im, ...]\n * Minimizer: [0.1416354378490425 - 0.034929499492595516im,-0.12086000949769983 - 0.6125620892675705im, ...]\n * Minimum: -1.568997e+00\n * Iterations: 23\n * Convergence: false\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 8.54e-10\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = -4.25e-16 |f(x)|\n * |g(x)| ≤ 1.0e-08: false\n |g(x)| = 3.72e-08\n * Stopped by an increasing objective: true\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 51\n * Gradient Calls: 29","category":"page"},{"location":"algo/complex/#Differentation","page":"Complex optimization","title":"Differentation","text":"","category":"section"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"The finite difference methods used by Optim support real functions with complex inputs.","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"res = optimize(fcomplex, x0, LBFGS())","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Results of Optimization Algorithm\n * Algorithm: L-BFGS\n * Starting Point: [0.48155603952425174 - 1.477880724921868im,-0.3219431528959694 - 0.18542418173298963im, ...]\n * Minimizer: [0.1416354390108624 - 0.034929496786122484im,-0.12086000580073922 - 0.6125620908025359im, ...]\n * Minimum: -1.568997e+00\n * Iterations: 16\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 3.28e-09\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true\n |f(x) - f(x')| = 0.00e+00 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 1.04e-10\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 48\n * Gradient Calls: 48","category":"page"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Automatic differentiation support for complex inputs may come when Cassete.jl is ready.","category":"page"},{"location":"algo/complex/#References","page":"Complex optimization","title":"References","text":"","category":"section"},{"location":"algo/complex/","page":"Complex optimization","title":"Complex optimization","text":"Sorber, L., Barel, M. V., & Lathauwer, L. D. (2012). Unconstrained optimization of real functions in complex variables. SIAM Journal on Optimization, 22(3), 879-898.\nKreutz-Delgado, K. (2009). The complex gradient operator and the CR-calculus. arXiv preprint arXiv:0906.4835.","category":"page"},{"location":"algo/nelder_mead/#Nelder-Mead","page":"Nelder Mead","title":"Nelder-Mead","text":"","category":"section"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Nelder-Mead is currently the standard algorithm when no derivatives are provided.","category":"page"},{"location":"algo/nelder_mead/#Constructor","page":"Nelder Mead","title":"Constructor","text":"","category":"section"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"NelderMead(; parameters = AdaptiveParameters(),\n initial_simplex = AffineSimplexer())","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"The keywords in the constructor are used to control the following parts of the solver:","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"parameters is a an instance of either AdaptiveParameters or FixedParameters, and is","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"used to generate parameters for the Nelder-Mead Algorithm.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"initial_simplex is an instance of AffineSimplexer. See more","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"details below.","category":"page"},{"location":"algo/nelder_mead/#Description","page":"Nelder Mead","title":"Description","text":"","category":"section"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Our current implementation of the Nelder-Mead algorithm is based on Nelder and Mead (1965) and Gao and Han (2010). Gradient free methods can be a bit sensitive to starting values and tuning parameters, so it is a good idea to be careful with the defaults provided in Optim.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Instead of using gradient information, Nelder-Mead is a direct search method. It keeps track of the function value at a number of points in the search space. Together, the points form a simplex. Given a simplex, we can perform one of four actions: reflect, expand, contract, or shrink. Basically, the goal is to iteratively replace the worst point with a better point. More information can be found in Nelder and Mead (1965), Lagarias, et al (1998) or Gao and Han (2010).","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"The stopping rule is the same as in the original paper, and is the standard error of the function values at the vertices. To set the tolerance level for this convergence criterion, set the g_tol level as described in the Configurable Options section.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"When the solver finishes, we return a minimizer which is either the centroid or one of the vertices. The function value at the centroid adds a function evaluation, as we need to evaluate the objection at the centroid to choose the smallest function value. However, even if the function value at the centroid can be returned as the minimum, we do not trace it during the optimization iterations. This is to avoid too many evaluations of the objective function which can be computationally expensive. Typically, there should be no more than twice as many f_calls than iterations. Adding an evaluation at the centroid when tracing could considerably increase the total run-time of the algorithm.","category":"page"},{"location":"algo/nelder_mead/#Specifying-the-initial-simplex","page":"Nelder Mead","title":"Specifying the initial simplex","text":"","category":"section"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"The default choice of initial_simplex is AffineSimplexer(). A simplex is represented by an (n+1)-dimensional vector of n-dimensional vectors. It is used together with the initial x to create the initial simplex. To construct the ith vertex, it simply multiplies entry i in the initial vector with a constant b, and adds a constant a. This means that the ith of the n additional vertices is of the form","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"(x_0^1 x_0^2 ldots x_0^i ldots 00) + (0 0 ldots x_0^icdot b+aldots 00)","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"If an x_0^i is zero, we need the a to make sure all vertices are unique. Generally, it is advised to start with a relatively large simplex.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"If a specific simplex is wanted, it is possible to construct the (n+1)-vector of n-dimensional vectors, and pass it to the solver using a new type definition and a new method for the function simplexer. For example, let us minimize the two-dimensional Rosenbrock function, and choose three vertices that have elements that are simply standard uniform draws.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"using Optim\nstruct MySimplexer <: Optim.Simplexer end\nOptim.simplexer(S::MySimplexer, initial_x) = [rand(length(initial_x)) for i = 1:length(initial_x)+1]\nf(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\noptimize(f, [.0, .0], NelderMead(initial_simplex = MySimplexer()))","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Say we want to implement the initial simplex as in Matlab's fminsearch. This is very close to the AffineSimplexer above, but with a small twist. Instead of always adding the a, a constant is only added to entries that are zero. If the entry is non-zero, five percent of the level is added. This might be implemented (by the user) as","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"struct MatlabSimplexer{T} <: Optim.Simplexer\n a::T\n b::T\nend\nMatlabSimplexer(;a = 0.00025, b = 0.05) = MatlabSimplexer(a, b)\n\nfunction Optim.simplexer(S::MatlabSimplexer, initial_x::AbstractArray{T, N}) where {T, N}\n n = length(initial_x)\n initial_simplex = Array{T, N}[copy(initial_x) for i = 1:n+1]\n for j = 1:n\n initial_simplex[j+1][j] += initial_simplex[j+1][j] != zero(T) ? S.b * initial_simplex[j+1][j] : S.a\n end\n initial_simplex\nend","category":"page"},{"location":"algo/nelder_mead/#The-parameters-of-Nelder-Mead","page":"Nelder Mead","title":"The parameters of Nelder-Mead","text":"","category":"section"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"The different types of steps in the algorithm are governed by four parameters: alpha for the reflection, beta for the expansion, gamma for the contraction, and delta for the shrink step. We default to the adaptive parameters scheme in Gao and Han (2010). These are based on the dimensionality of the problem, and are given by","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"alpha = 1 quad beta = 1+2nquad gamma =075 - 12nquad delta = 1-1n","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"It is also possible to specify the original parameters from Nelder and Mead (1965)","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"alpha = 1quad beta = 2 quadgamma = 12 quaddelta = 12","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"by specifying parameters = Optim.FixedParameters(). For specifying custom values, parameters = Optim.FixedParameters(α = a, β = b, γ = g, δ = d) is used, where a, b, g, d are the chosen values. If another parameter specification is wanted, it is possible to create a custom sub-type ofOptim.NMParameters, and add a method to the parameters function. It should take the new type as the first positional argument, and the dimensionality of x as the second positional argument, and return a 4-tuple of parameters. However, it will often be easier to simply supply the wanted parameters to FixedParameters.","category":"page"},{"location":"algo/nelder_mead/#References","page":"Nelder Mead","title":"References","text":"","category":"section"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Nelder, John A. and R. Mead (1965). \"A simplex method for function minimization\". Computer Journal 7: 308–313. doi:10.1093/comjnl/7.4.308.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Lagarias, Jeffrey C., et al. \"Convergence properties of the Nelder–Mead simplex method in low dimensions.\" SIAM Journal on optimization 9.1 (1998): 112-147.","category":"page"},{"location":"algo/nelder_mead/","page":"Nelder Mead","title":"Nelder Mead","text":"Gao, Fuchang and Lixing Han (2010). \"Implementing the Nelder-Mead simplex algorithm with adaptive parameters\". Computational Optimization and Applications [DOI 10.1007/s10589-010-9329-3]","category":"page"},{"location":"dev/contributing/#Notes-for-contributing","page":"Contributing","title":"Notes for contributing","text":"","category":"section"},{"location":"dev/contributing/","page":"Contributing","title":"Contributing","text":"We are always happy to get help from people who normally do not contribute to the package. However, to make the process run smoothly, we ask you to read this page before creating your pull request. That way it is more probable that your changes will be incorporated, and in the end it will mean less work for everyone.","category":"page"},{"location":"dev/contributing/#Things-to-consider","page":"Contributing","title":"Things to consider","text":"","category":"section"},{"location":"dev/contributing/","page":"Contributing","title":"Contributing","text":"When proposing a change to Optim.jl, there are a few things to consider. If you're in doubt feel free to reach out. A simple way to get in touch, is to join our gitter channel.","category":"page"},{"location":"dev/contributing/","page":"Contributing","title":"Contributing","text":"Before submitting a pull request, please consider the following bullets:","category":"page"},{"location":"dev/contributing/","page":"Contributing","title":"Contributing","text":"Did you remember to provide tests for your changes? If not, please do so, or ask for help.\nDid your change add new functionality? Remember to add a section in the documentation.\nDid you change existing code in a breaking way? Then remember to use Julia's deprecation tools to help users migrate to the new syntax.\nAdd a note in the NEWS.md file, so we can keep track of changes between versions.","category":"page"},{"location":"dev/contributing/#Adding-a-solver","page":"Contributing","title":"Adding a solver","text":"","category":"section"},{"location":"dev/contributing/","page":"Contributing","title":"Contributing","text":"If you're contributing a new solver, you shouldn't need to touch any of the code in src/optimize.jl. You should rather add a file named (solver is the name of the solver) solver.jl in src, and make sure that you define an Optimizer subtype struct Solver <: Optimizer end with appropriate fields, a default constructor with a keyword for each field, a state type that holds all variables that are (re)used throughout the iterative procedure, an initial_state that initializes such a state, and an update! method that does the actual work. Say you want to contribute a solver called Minim, then your src/minim.jl file would look something like","category":"page"},{"location":"dev/contributing/","page":"Contributing","title":"Contributing","text":"struct Minim{IF, F<:Function, T} <: Optimizer\n alphaguess!::IF\n linesearch!::F\n minim_parameter::T\nend\n\nMinim(; alphaguess = LineSearches.InitialStatic(), linesearch = LineSearches.HagerZhang(), minim_parameter = 1.0) =\n Minim(linesearch, minim_parameter)\n\ntype MinimState{T,N,G}\n x::AbstractArray{T,N}\n x_previous::AbstractArray{T,N}\n f_x_previous::T\n s::AbstractArray{T,N}\n @add_linesearch_fields()\nend\n\nfunction initial_state(method::Minim, options, d, initial_x)\n# prepare cache variables etc here\n\nend\n\nfunction update!{T}(d, state::MinimState{T}, method::Minim)\n # code for Minim here\n false # should the procedure force quit?\nend","category":"page"},{"location":"user/minimization/#Unconstrained-Optimization","page":"Minimizing a function","title":"Unconstrained Optimization","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"To show how the Optim package can be used, we minimize the Rosenbrock function, a classical test problem for numerical optimization. We'll assume that you've already installed the Optim package using Julia's package manager. First, we load Optim and define the Rosenbrock function:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"using Optim\nf(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Once we've defined this function, we can find the minimizer (the input that minimizes the objective) and the minimum (the value of the objective at the minimizer) using any of our favorite optimization algorithms. With a function defined, we just specify an initial point x and call optimize with a starting point x0:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"x0 = [0.0, 0.0]\noptimize(f, x0)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Note: it is important to pass initial_x as an array. If your problem is one-dimensional, you have to wrap it in an array. An easy way to do so is to write optimize(x->f(first(x)), [initial_x]) which make sure the input is an array, but the anonymous function automatically passes the first (and only) element onto your given f.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Optim will default to using the Nelder-Mead method in the multivariate case, as we did not provide a gradient. This can also be explicitly specified using:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, x0, NelderMead())","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Other solvers are available. Below, we use L-BFGS, a quasi-Newton method that requires a gradient. If we pass f alone, Optim will construct an approximate gradient for us using central finite differencing:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, x0, LBFGS())","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"For better performance and greater precision, you can pass your own gradient function. If your objective is written in all Julia code with no special calls to external (that is non-Julia) libraries, you can also use automatic differentiation, by using the autodiff keyword and setting it to :forward:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, x0, LBFGS(); autodiff = :forward)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"For the Rosenbrock example, the analytical gradient can be shown to be:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"function g!(G, x)\n G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\n G[2] = 200.0 * (x[2] - x[1]^2)\nend","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Note, that the functions we're using to calculate the gradient (and later the Hessian h!) of the Rosenbrock function mutate a fixed-sized storage array, which is passed as an additional argument called G (or H for the Hessian) in these examples. By mutating a single array over many iterations, this style of function definition removes the sometimes considerable costs associated with allocating a new array during each call to the g! or h! functions. If you prefer to have your gradients simply accept an x, you can still use optimize by setting the inplace keyword to false:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, g, x0; inplace = false)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"where g is a function of x only.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Returning to our in-place version, you simply pass g! together with f from before to use the gradient:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, g!, x0, LBFGS())","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"For some methods, like simulated annealing, the gradient will be ignored:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, g!, x0, SimulatedAnnealing())","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"In addition to providing gradients, you can provide a Hessian function h! as well. In our current case this is:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"function h!(H, x)\n H[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2\n H[1, 2] = -400.0 * x[1]\n H[2, 1] = -400.0 * x[1]\n H[2, 2] = 200.0\nend","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Now we can use Newton's method for optimization by running:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, g!, h!, x0)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Which defaults to Newton() since a Hessian function was provided. Like gradients, the Hessian function will be ignored if you use a method that does not require it:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, g!, h!, x0, LBFGS())","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Note that Optim will not generate approximate Hessians using finite differencing because of the potentially low accuracy of approximations to the Hessians. Other than Newton's method, none of the algorithms provided by the Optim package employ exact Hessians.","category":"page"},{"location":"user/minimization/#Box-Constrained-Optimization","page":"Minimizing a function","title":"Box Constrained Optimization","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"A primal interior-point algorithm for simple \"box\" constraints (lower and upper bounds) is available. Reusing our Rosenbrock example from above, boxed minimization is performed as follows:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"lower = [1.25, -2.1]\nupper = [Inf, Inf]\ninitial_x = [2.0, 2.0]\ninner_optimizer = GradientDescent()\nresults = optimize(f, g!, lower, upper, initial_x, Fminbox(inner_optimizer))","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"This performs optimization with a barrier penalty, successively scaling down the barrier coefficient and using the chosen inner_optimizer (GradientDescent() above) for convergence at each step. To change algorithm specific options, such as the line search algorithm, specify it directly in the inner_optimizer constructor:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"lower = [1.25, -2.1]\nupper = [Inf, Inf]\ninitial_x = [2.0, 2.0]\n# requires using LineSearches\ninner_optimizer = GradientDescent(linesearch=LineSearches.BackTracking(order=3))\nresults = optimize(f, g!, lower, upper, initial_x, Fminbox(inner_optimizer))","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"This algorithm uses diagonal preconditioning to improve the accuracy, and hence is a good example of how to use ConjugateGradient or LBFGS with preconditioning. Other methods will currently not use preconditioning. Only the box constraints are used. If you can analytically compute the diagonal of the Hessian of your objective function, you may want to consider writing your own preconditioner.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"There are two iterations parameters: an outer iterations parameter used to control Fminbox and an inner iterations parameter used to control the inner optimizer. For example, the following restricts the optimization to 2 major iterations","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"results = optimize(f, g!, lower, upper, initial_x, Fminbox(GradientDescent()), Optim.Options(outer_iterations = 2))","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"In contrast, the following sets the maximum number of iterations for each GradientDescent() optimization to 2","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"results = optimize(f, g!, lower, upper, initial_x, Fminbox(GradientDescent()), Optim.Options(iterations = 2))","category":"page"},{"location":"user/minimization/#Using-second-order-information","page":"Minimizing a function","title":"Using second order information","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"When the Hessian of the objective function is available it is possible to use the primal-dual algorithm implemented in IPNewton. The interface is similar","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"results = optimize(f, lower, upper, initial_x, IPNewton())\nresults = optimize(f, g!, lower, upper, initial_x, IPNewton())\nresults = optimize(f, g!, h!, lower, upper, initial_x, IPNewton())","category":"page"},{"location":"user/minimization/#Minimizing-a-univariate-function-on-a-bounded-interval","page":"Minimizing a function","title":"Minimizing a univariate function on a bounded interval","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Minimization of univariate functions without derivatives is available through the optimize interface:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"optimize(f, lower, upper, method; kwargs...)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Notice the lack of initial x. A specific example is the following quadratic function.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"julia> f_univariate(x) = 2x^2+3x+1\nf_univariate (generic function with 1 method)\n\njulia> optimize(f_univariate, -2.0, 1.0)\nResults of Optimization Algorithm\n * Algorithm: Brent's Method\n * Search Interval: [-2.000000, 1.000000]\n * Minimizer: -7.500000e-01\n * Minimum: -1.250000e-01\n * Iterations: 7\n * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true\n * Objective Function Calls: 8","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"The output shows that we provided an initial lower and upper bound, that there is a final minimizer and minimum, and that it used seven major iterations. Importantly, we also see that convergence was declared. The default method is Brent's method, which is one out of two available methods:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Brent's method, the default (can be explicitly selected with Brent()).\nGolden section search, available with GoldenSection().","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"If we want to manually specify this method, we use the usual syntax as for multivariate optimization.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":" optimize(f, lower, upper, Brent(); kwargs...)\n optimize(f, lower, upper, GoldenSection(); kwargs...)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Keywords are used to set options for this special type of optimization. In addition to the iterations, store_trace, show_trace, show_warnings, and extended_trace options, the following options are also available:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"rel_tol: The relative tolerance used for determining convergence. Defaults to sqrt(eps(T)).\nabs_tol: The absolute tolerance used for determining convergence. Defaults to eps(T).","category":"page"},{"location":"user/minimization/#Obtaining-results","page":"Minimizing a function","title":"Obtaining results","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"After we have our results in res, we can use the API for getting optimization results. This consists of a collection of functions. They are not exported, so they have to be prefixed by Optim.. Say we do the following optimization:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"res = optimize(x->dot(x,[1 0. 0; 0 3 0; 0 0 1]*x), zeros(3))","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"If we can't remember what method we used, we simply use","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"summary(res)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"which will return \"Nelder Mead\". A bit more useful information is the minimizer and minimum of the objective functions, which can be found using","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"julia> Optim.minimizer(res)\n3-element Array{Float64,1}:\n -0.499921\n -0.3333\n -1.49994\n\njulia> Optim.minimum(res)\n -2.8333333205768865","category":"page"},{"location":"user/minimization/#Complete-list-of-functions","page":"Minimizing a function","title":"Complete list of functions","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"A complete list of functions can be found below.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Defined for all methods:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"summary(res)\nminimizer(res)\nminimum(res)\niterations(res)\niteration_limit_reached(res)\ntrace(res)\nx_trace(res)\nf_trace(res)\nf_calls(res)\nconverged(res)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Defined for univariate optimization:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"lower_bound(res)\nupper_bound(res)\nx_lower_trace(res)\nx_upper_trace(res)\nrel_tol(res)\nabs_tol(res)","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Defined for multivariate optimization:","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"g_norm_trace(res)\ng_calls(res)\nx_converged(res)\nf_converged(res)\ng_converged(res)\ninitial_state(res)","category":"page"},{"location":"user/minimization/#Input-types","page":"Minimizing a function","title":"Input types","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Most users will input Vector's as their initial_x's, and get an Optim.minimizer(res) out that is also a vector. For zeroth and first order methods, it is also possible to pass in matrices, or even higher dimensional arrays. The only restriction imposed by leaving the Vector case is, that it is no longer possible to use finite difference approximations or automatic differentiation. Second order methods (variants of Newton's method) do not support this more general input type.","category":"page"},{"location":"user/minimization/#Notes-on-convergence-flags-and-checks","page":"Minimizing a function","title":"Notes on convergence flags and checks","text":"","category":"section"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"Currently, it is possible to access a minimizer using Optim.minimizer(result) even if all convergence flags are false. This means that the user has to be a bit careful when using the output from the solvers. It is advised to include checks for convergence if the minimizer or minimum is used to carry out further calculations.","category":"page"},{"location":"user/minimization/","page":"Minimizing a function","title":"Minimizing a function","text":"A related note is that first and second order methods makes a convergence check on the gradient before entering the optimization loop. This is done to prevent line search errors if initial_x is a stationary point. Notice, that this is only a first order check. If initial_x is any type of stationary point, g_converged will be true. This includes local minima, saddle points, and local maxima. If iterations is 0 and g_converged is true, the user needs to keep this point in mind.","category":"page"},{"location":"algo/ngmres/#Acceleration-methods:-N-GMRES-and-O-ACCEL","page":"Acceleration","title":"Acceleration methods: N-GMRES and O-ACCEL","text":"","category":"section"},{"location":"algo/ngmres/#Constructors","page":"Acceleration","title":"Constructors","text":"","category":"section"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"NGMRES(;\n alphaguess = LineSearches.InitialStatic(),\n linesearch = LineSearches.HagerZhang(),\n manifold = Flat(),\n wmax::Int = 10,\n ϵ0 = 1e-12,\n nlprecon = GradientDescent(\n alphaguess = LineSearches.InitialStatic(alpha=1e-4,scaled=true),\n linesearch = LineSearches.Static(),\n manifold = manifold),\n nlpreconopts = Options(iterations = 1, allow_f_increases = true),\n )","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"OACCEL(;manifold::Manifold = Flat(),\n alphaguess = LineSearches.InitialStatic(),\n linesearch = LineSearches.HagerZhang(),\n nlprecon = GradientDescent(\n alphaguess = LineSearches.InitialStatic(alpha=1e-4,scaled=true),\n linesearch = LineSearches.Static(),\n manifold = manifold),\n nlpreconopts = Options(iterations = 1, allow_f_increases = true),\n ϵ0 = 1e-12,\n wmax::Int = 10)","category":"page"},{"location":"algo/ngmres/#Description","page":"Acceleration","title":"Description","text":"","category":"section"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"These algorithms take a step given by the nonlinear preconditioner nlprecon and proposes an accelerated step on a subspace spanned by the previous wmax iterates.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"N-GMRES accelerates based on a minimization of an approximation to the ell_2 norm of the","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"gradient.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"O-ACCEL accelerates based on a minimization of a n approximation to the objective.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"N-GMRES was originally developed for solving nonlinear systems [1], and reduces to GMRES for linear problems. Application of the algorithm to optimization is covered, for example, in [2]. A description of O-ACCEL and its connection to N-GMRES can be found in [3].","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"We recommend trying LBFGS on your problem before N-GMRES or O-ACCEL. All three algorithms have similar computational cost and memory requirements, however, L-BFGS is more efficient for many problems.","category":"page"},{"location":"algo/ngmres/#Example","page":"Acceleration","title":"Example","text":"","category":"section"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"This example shows how to accelerate GradientDescent on the Extended Rosenbrock problem. First, we try to optimize using GradientDescent.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"using Optim, OptimTestProblems\nUP = UnconstrainedProblems\nprob = UP.examples[\"Extended Rosenbrock\"]\noptimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, GradientDescent())","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"The algorithm does not converge within 1000 iterations.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Results of Optimization Algorithm\n * Algorithm: Gradient Descent\n * Starting Point: [-1.2,1.0, ...]\n * Minimizer: [0.8923389282461412,0.7961268644300445, ...]\n * Minimum: 2.898230e-01\n * Iterations: 1000\n * Convergence: false\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 4.02e-04\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 2.38e-03 |f(x)|\n * |g(x)| ≤ 1.0e-08: false\n |g(x)| = 8.23e-02\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: true\n * Objective Calls: 2525\n * Gradient Calls: 2525","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Now, we use OACCEL to accelerate GradientDescent.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"# Default nonlinear procenditioner for `OACCEL`\nnlprecon = GradientDescent(alphaguess=LineSearches.InitialStatic(alpha=1e-4,scaled=true),\n linesearch=LineSearches.Static())\n# Default size of subspace that OACCEL accelerates over is `wmax = 10`\noacc10 = OACCEL(nlprecon=nlprecon, wmax=10)\noptimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc10)","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"This drastically improves the GradientDescent algorithm, converging in 87 iterations.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Results of Optimization Algorithm\n * Algorithm: O-ACCEL preconditioned with Gradient Descent\n * Starting Point: [-1.2,1.0, ...]\n * Minimizer: [1.0000000011361219,1.0000000022828495, ...]\n * Minimum: 3.255053e-17\n * Iterations: 87\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 6.51e-08\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 7.56e+02 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 1.06e-09\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 285\n * Gradient Calls: 285","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"We can improve the acceleration further by changing the acceleration subspace size wmax.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"oacc5 = OACCEL(nlprecon=nlprecon, wmax=5)\noptimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc5)","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Now, the O-ACCEL algorithm has accelerated GradientDescent to converge in 50 iterations.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Results of Optimization Algorithm\n * Algorithm: O-ACCEL preconditioned with Gradient Descent\n * Starting Point: [-1.2,1.0, ...]\n * Minimizer: [0.9999999999392858,0.9999999998784691, ...]\n * Minimum: 9.218164e-20\n * Iterations: 50\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 2.76e-07\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 5.18e+06 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 4.02e-11\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 181\n * Gradient Calls: 181","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"As a final comparison, we can do the same with N-GMRES.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"ngmres5 = NGMRES(nlprecon=nlprecon, wmax=5)\noptimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, ngmres5)","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Again, this significantly improves the GradientDescent algorithm, and converges in 63 iterations.","category":"page"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"Results of Optimization Algorithm\n * Algorithm: Nonlinear GMRES preconditioned with Gradient Descent\n * Starting Point: [-1.2,1.0, ...]\n * Minimizer: [0.9999999998534468,0.9999999997063993, ...]\n * Minimum: 5.375569e-19\n * Iterations: 63\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 9.94e-09\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 1.29e+03 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 4.94e-11\n * Stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 222\n * Gradient Calls: 222","category":"page"},{"location":"algo/ngmres/#References","page":"Acceleration","title":"References","text":"","category":"section"},{"location":"algo/ngmres/","page":"Acceleration","title":"Acceleration","text":"[1] De Sterck. Steepest descent preconditioning for nonlinear GMRES optimization. NLAA, 2013. [2] Washio and Oosterlee. Krylov subspace acceleration for nonlinear multigrid schemes. ETNA, 1997. [3] Riseth. Objective acceleration for unconstrained optimization. 2018.","category":"page"},{"location":"algo/manifolds/#Manifold-optimization","page":"Manifolds","title":"Manifold optimization","text":"","category":"section"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"Optim.jl supports the minimization of functions defined on Riemannian manifolds, i.e. with simple constraints such as normalization and orthogonality. The basic idea of such algorithms is to project back (\"retract\") each iterate of an unconstrained minimization method onto the manifold. This is used by passing a manifold keyword argument to the optimizer.","category":"page"},{"location":"algo/manifolds/#Howto","page":"Manifolds","title":"Howto","text":"","category":"section"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"Here is a simple test case where we minimize the Rayleigh quotient <x, A x> of a symmetric matrix A under the constraint ||x|| = 1, finding an eigenvector associated with the lowest eigenvalue of A.","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"n = 10\nA = Diagonal(range(1, stop=2, length=n))\nf(x) = dot(x,A*x)/2\ng(x) = A*x\ng!(stor,x) = copyto!(stor,g(x))\nx0 = randn(n)\n\nmanif = Optim.Sphere()\nOptim.optimize(f, g!, x0, Optim.ConjugateGradient(manifold=manif))","category":"page"},{"location":"algo/manifolds/#Supported-solvers-and-manifolds","page":"Manifolds","title":"Supported solvers and manifolds","text":"","category":"section"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"All first-order optimization methods are supported.","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"The following manifolds are currently supported:","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"Flat: Euclidean space, default. Standard unconstrained optimization.\nSphere: spherical constraint ||x|| = 1, where x is a real or complex array of any dimension.\nStiefel: Stiefel manifold of N by n matrices with orthogonal columns, i.e. X'*X = I","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"The following meta-manifolds construct manifolds out of pre-existing ones:","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"PowerManifold: identical copies of a specified manifold\nProductManifold: product of two (potentially different) manifolds","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"See test/multivariate/manifolds.jl for usage examples.","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"Implementing new manifolds is as simple as adding methods project_tangent!(M::YourManifold,g,x) and retract!(M::YourManifold,x). If you implement another manifold or optimization method, please contribute a PR!","category":"page"},{"location":"algo/manifolds/#References","page":"Manifolds","title":"References","text":"","category":"section"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"The Geometry of Algorithms with Orthogonality Constraints, Alan Edelman, Tomás A. Arias, Steven T. Smith, SIAM. J. Matrix Anal. & Appl., 20(2), 303–353","category":"page"},{"location":"algo/manifolds/","page":"Manifolds","title":"Manifolds","text":"Optimization Algorithms on Matrix Manifolds, P.-A. Absil, R. Mahony, R. Sepulchre, Princeton University Press, 2008","category":"page"},{"location":"algo/newton/#Newton's-Method","page":"Newton","title":"Newton's Method","text":"","category":"section"},{"location":"algo/newton/#Constructor","page":"Newton","title":"Constructor","text":"","category":"section"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"Newton(; alphaguess = LineSearches.InitialStatic(),\n linesearch = LineSearches.HagerZhang())","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"The constructor takes two keywords:","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"linesearch = a(d, x, p, x_new, g_new, phi0, dphi0, c), a function performing line search, see the line search section.\nalphaguess = a(state, dphi0, d), a function for setting the initial guess for the line search algorithm, see the line search section.","category":"page"},{"location":"algo/newton/#Description","page":"Newton","title":"Description","text":"","category":"section"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"Newton's method for optimization has a long history, and is in some sense the gold standard in unconstrained optimization of smooth functions, at least from a theoretical viewpoint. The main benefit is that it has a quadratic rate of convergence near a local optimum. The main disadvantage is that the user has to provide a Hessian. This can be difficult, complicated, or simply annoying. It can also be computationally expensive to calculate it.","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"Newton's method for optimization consists of applying Newton's method for solving systems of equations, where the equations are the first order conditions, saying that the gradient should equal the zero vector.","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"nabla f(x) = 0","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"A second order Taylor expansion of the left-hand side leads to the iterative scheme","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"x_n+1 = x_n - H(x_n)^-1nabla f(x_n)","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"where the inverse is not calculated directly, but the step size is instead calculated by solving","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"H(x) textbfs = nabla f(x_n)","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"This is equivalent to minimizing a quadratic model, m_k around the current x_n","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"m_k(s) = f(x_n) + nabla f(x_n)^top textbfs + frac12 textbfs^top H(x_n) textbfs","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"For functions where H(x_n) is difficult, or computationally expensive to obtain, we might replace the Hessian with another positive definite matrix that approximates it. Such methods are called Quasi-Newton methods; see (L-)BFGS and Gradient Descent.","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"In a sufficiently small neighborhood around the minimizer, Newton's method has quadratic convergence, but globally it might have slower convergence, or it might even diverge. To ensure convergence, a line search is performed for each textbfs. This amounts to replacing the step formula above with","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"x_n+1 = x_n - alpha textbfs","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"and finding a scalar alpha such that we get sufficient descent; see the line search section for more information.","category":"page"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"Additionally, if the function is locally concave, the step taken in the formulas above will go in a direction of ascent, as the Hessian will not be positive (semi)definite. To avoid this, we use a specialized method to calculate the step direction. If the Hessian is positive semidefinite then the method used is standard, but if it is not, a correction is made using the functionality in PositiveFactorizations.jl.","category":"page"},{"location":"algo/newton/#Example","page":"Newton","title":"Example","text":"","category":"section"},{"location":"algo/newton/","page":"Newton","title":"Newton","text":"show the example from the issue","category":"page"},{"location":"algo/newton/#References","page":"Newton","title":"References","text":"","category":"section"},{"location":"algo/linesearch/#Line-search","page":"Linesearch","title":"Line search","text":"","category":"section"},{"location":"algo/linesearch/#Description","page":"Linesearch","title":"Description","text":"","category":"section"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"The line search functionality has been moved to LineSearches.jl.","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"Line search is used to decide the step length along the direction computed by an optimization algorithm.","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"The following Optim algorithms use line search:","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"Accelerated Gradient Descent\n(L-)BFGS\nConjugate Gradient\nGradient Descent\nMomentum Gradient Descent\nNewton","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"By default Optim calls the line search algorithm HagerZhang() provided by LineSearches. Different line search algorithms can be assigned with the linesearch keyword argument to the given algorithm.","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"LineSearches also allows the user to decide how the initial step length for the line search algorithm is chosen. This is set with the alphaguess keyword argument for the Optim algorithm. The default procedure varies.","category":"page"},{"location":"algo/linesearch/#Example","page":"Linesearch","title":"Example","text":"","category":"section"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"This example compares two different line search algorithms on the Rosenbrock problem.","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"First, run Newton with the default line search algorithm:","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"using Optim, LineSearches\nprob = Optim.UnconstrainedProblems.examples[\"Rosenbrock\"]\n\nalgo_hz = Newton(;alphaguess = LineSearches.InitialStatic(), linesearch = LineSearches.HagerZhang())\nres_hz = Optim.optimize(prob.f, prob.g!, prob.h!, prob.initial_x, method=algo_hz)","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"This gives the result","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":" * Algorithm: Newton's Method\n * Starting Point: [0.0,0.0]\n * Minimizer: [0.9999999999999994,0.9999999999999989]\n * Minimum: 3.081488e-31\n * Iterations: 14\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 3.06e-09\n * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false\n |f(x) - f(x')| = 2.94e+13 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 1.11e-15\n * stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 44\n * Gradient Calls: 44\n * Hessian Calls: 14","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"Now we can try Newton with the More-Thuente line search:","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"algo_mt = Newton(;alphaguess = LineSearches.InitialStatic(), linesearch = LineSearches.MoreThuente())\nres_mt = Optim.optimize(prob.f, prob.g!, prob.h!, prob.initial_x, method=algo_mt)","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"This gives the following result, reducing the number of function and gradient calls:","category":"page"},{"location":"algo/linesearch/","page":"Linesearch","title":"Linesearch","text":"Results of Optimization Algorithm\n * Algorithm: Newton's Method\n * Starting Point: [0.0,0.0]\n * Minimizer: [0.9999999999999992,0.999999999999998]\n * Minimum: 2.032549e-29\n * Iterations: 14\n * Convergence: true\n * |x - x'| ≤ 0.0e+00: false\n |x - x'| = 3.67e-08\n * |f(x) - f(x')| ≤ 0.0e00 |f(x)|: false\n |f(x) - f(x')| = 1.66e+13 |f(x)|\n * |g(x)| ≤ 1.0e-08: true\n |g(x)| = 1.76e-13\n * stopped by an increasing objective: false\n * Reached Maximum Number of Iterations: false\n * Objective Calls: 17\n * Gradient Calls: 17\n * Hessian Calls: 14","category":"page"},{"location":"algo/linesearch/#References","page":"Linesearch","title":"References","text":"","category":"section"},{"location":"user/algochoice/#Algorithm-choice","page":"Algorithm choice","title":"Algorithm choice","text":"","category":"section"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"There are two main settings you must choose in Optim: the algorithm and the linesearch.","category":"page"},{"location":"user/algochoice/#Algorithms","page":"Algorithm choice","title":"Algorithms","text":"","category":"section"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"The first choice to be made is that of the order of the method. Zeroth-order methods do not have gradient information, and are very slow to converge, especially in high dimension. First-order methods do not have access to curvature information and can take a large number of iterations to converge for badly conditioned problems. Second-order methods can converge very quickly once in the vicinity of a minimizer. Of course, this enhanced performance comes at a cost: the objective function has to be differentiable, you have to supply gradients and Hessians, and, for second order methods, a linear system has to be solved at each step.","category":"page"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"If you can provide analytic gradients and Hessians, and the dimension of the problem is not too large, then second order methods are very efficient. The Newton method with trust region is the method of choice. ","category":"page"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"When you do not have an explicit Hessian or when the dimension becomes large enough that the linear solve in the Newton method becomes the bottleneck, first order methods should be preferred. BFGS is a very efficient method, but also requires a linear system solve. LBFGS usually has a performance very close to that of BFGS, and avoids linear system solves (the parameter m can be tweaked: increasing it can improve the convergence, at the expense of memory and time spent in linear algebra operations). The conjugate gradient method usually converges less quickly than LBFGS, but requires less memory. Gradient descent should only be used for testing. Acceleration methods are experimental.","category":"page"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"When the objective function is non-differentiable or you do not want to use gradients, use zeroth-order methods. Nelder-Mead is currently the most robust.","category":"page"},{"location":"user/algochoice/#Linesearches","page":"Algorithm choice","title":"Linesearches","text":"","category":"section"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"Linesearches are used in every first- and second-order method except for the trust-region Newton method. Linesearch routines attempt to locate quickly an approximate minimizer of the univariate function alpha to f(x+ alpha d), where d is the descent direction computed by the algorithm. They vary in how accurate this minimization is. Two good linesearches are BackTracking and HagerZhang, the former being less stringent than the latter. For well-conditioned objective functions and methods where the step is usually well-scaled (such as LBFGS or Newton), a rough linesearch such as BackTracking is usually the most performant. For badly behaved problems or when extreme accuracy is needed (gradients below the square root of the machine epsilon, about 10^-8 with Float64), the HagerZhang method proves more robust. An exception is the conjugate gradient method which requires an accurate linesearch to be efficient, and should be used with the HagerZhang linesearch.","category":"page"},{"location":"user/algochoice/#Summary","page":"Algorithm choice","title":"Summary","text":"","category":"section"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"As a very crude heuristic:","category":"page"},{"location":"user/algochoice/","page":"Algorithm choice","title":"Algorithm choice","text":"For a low-dimensional problem with analytic gradients and Hessians, use the Newton method with trust region. For larger problems or when there is no analytic Hessian, use LBFGS, and tweak the parameter m if needed. If the function is non-differentiable, use Nelder-Mead. Use the HagerZhang linesearch for robustness and BackTracking for speed.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"EditURL = \"../maxlikenlm.jl\"","category":"page"},{"location":"examples/generated/maxlikenlm/#Maximum-Likelihood-Estimation:-The-Normal-Linear-Model","page":"Maximum likelihood estimation","title":"Maximum Likelihood Estimation: The Normal Linear Model","text":"","category":"section"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"tip: Tip\nThis example is also available as a Jupyter notebook: maxlikenlm.ipynb","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The following tutorial will introduce maximum likelihood estimation in Julia for the normal linear model.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The normal linear model (sometimes referred to as the OLS model) is the workhorse of regression modeling and is utilized across a number of diverse fields. In this tutorial, we will utilize simulated data to demonstrate how Julia can be used to recover the parameters of interest.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The first order of business is to use the Optim package and also include the NLSolversBase routine:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"using Optim, NLSolversBase\nusing LinearAlgebra: diag\nusing ForwardDiff","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"tip: Tip\nAdd Optim with the following command at the Julia command prompt: Pkg.add(\"Optim\")","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The first item that needs to be addressed is the data generating process or DGP. The following code will produce data from a normal linear model:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"n = 40 # Number of observations\nnvar = 2 # Number of variables\nβ = ones(nvar) * 3.0 # True coefficients\nx = [ 1.0 0.156651\t\t\t\t# X matrix of explanatory variables plus constant\n 1.0 -1.34218\n 1.0 0.238262\n 1.0 -0.496572\n 1.0 1.19352\n 1.0 0.300229\n 1.0 0.409127\n 1.0 -0.88967\n 1.0 -0.326052\n 1.0 -1.74367\n 1.0 -0.528113\n 1.0 1.42612\n 1.0 -1.08846\n 1.0 -0.00972169\n 1.0 -0.85543\n 1.0 1.0301\n 1.0 1.67595\n 1.0 -0.152156\n 1.0 0.26666\n 1.0 -0.668618\n 1.0 -0.36883\n 1.0 -0.301392\n 1.0 0.0667779\n 1.0 -0.508801\n 1.0 -0.352346\n 1.0 0.288688\n 1.0 -0.240577\n 1.0 -0.997697\n 1.0 -0.362264\n 1.0 0.999308\n 1.0 -1.28574\n 1.0 -1.91253\n 1.0 0.825156\n 1.0 -0.136191\n 1.0 1.79925\n 1.0 -1.10438\n 1.0 0.108481\n 1.0 0.847916\n 1.0 0.594971\n 1.0 0.427909]\n\nε = [0.5539830489065279 # Errors\n -0.7981494315544392\n 0.12994853889935182\n 0.23315434715658184\n -0.1959788033050691\n -0.644463980478783\n -0.04055657880388486\n -0.33313251280917094\n -0.315407370840677\n 0.32273952815870866\n 0.56790436131181\n 0.4189982390480762\n -0.0399623088796998\n -0.2900421677961449\n -0.21938513655749814\n -0.2521429229103657\n 0.0006247891825243118\n -0.694977951759846\n -0.24108791530910414\n 0.1919989647431539\n 0.15632862280544485\n -0.16928298502504732\n 0.08912288359190582\n 0.0037707641031662006\n -0.016111044809837466\n 0.01852191562589722\n -0.762541135294584\n -0.7204431774719634\n -0.04394527523005201\n -0.11956323865320413\n -0.6713329013627437\n -0.2339928433338628\n -0.6200532213195297\n -0.6192380993792371\n 0.08834918731846135\n -0.5099307915921438\n 0.41527207925609494\n -0.7130133329859893\n -0.531213372742777\n -0.09029672309221337]\n\ny = x * β + ε; # Generate Data\nnothing #hide","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"In the above example, we have 500 observations, 2 explanatory variables plus an intercept, an error variance equal to 0.5, coefficients equal to 3.0, and all of these are subject to change by the user. Since we know the true value of these parameters, we should obtain these values when we maximize the likelihood function.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The next step in our tutorial is to define a Julia function for the likelihood function. The following function defines the likelihood function for the normal linear model:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"function Log_Likelihood(X, Y, β, log_σ)\n σ = exp(log_σ)\n llike = -n/2*log(2π) - n/2* log(σ^2) - (sum((Y - X * β).^2) / (2σ^2))\n llike = -llike\nend","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The log likelihood function accepts 4 inputs: the matrix of explanatory variables (X), the dependent variable (Y), the β's, and the error varicance. Note that we exponentiate the error variance in the second line of the code because the error variance cannot be negative and we want to avoid this situation when maximizing the likelihood.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The next step in our tutorial is to optimize our function. We first use the TwiceDifferentiable command in order to obtain the Hessian matrix later on, which will be used to help form t-statistics:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"func = TwiceDifferentiable(vars -> Log_Likelihood(x, y, vars[1:nvar], vars[nvar + 1]),\n ones(nvar+1); autodiff=:forward);\nnothing #hide","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The above statment accepts 4 inputs: the x matrix, the dependent variable y, and a vector of β's and the error variance. The vars[1:nvar] is how we pass the vector of β's and the vars[nvar + 1] is how we pass the error variance. You can think of this as a vector of parameters with the first 2 being β's and the last one is the error variance.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The ones(nvar+1) are the starting values for the parameters and the autodiff=:forward command performs forward mode automatic differentiation.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The actual optimization of the likelihood function is accomplished with the following command:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"opt = optimize(func, ones(nvar+1))","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The first input to the command is the function we wish to optimize and the second input are the starting values.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"After a brief period of time, you should see output of the optimization routine, with the parameter estimates being very close to our simulated values.","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"The optimization routine stores several quantities and we can obtain the maximim likelihood estimates with the following command:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"parameters = Optim.minimizer(opt)","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"!!! Note Fieldnames for all of the quantities can be obtained with the following command: fieldnames(opt)","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"In order to obtain the correct Hessian matrix, we have to \"push\" the actual parameter values that maximizes the likelihood function since the TwiceDifferentiable command uses the next to last values to calculate the Hessian:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"numerical_hessian = hessian!(func,parameters)","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"Let's find the estimated value of σ, rather than log σ, and it's standard error To do this, we will use the Delta Method: https://en.wikipedia.org/wiki/Delta_method","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"this function exponetiates log σ","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"function transform(parameters)\n parameters[end] = exp(parameters[end])\n parameters\nend","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"get the Jacobian of the transformation","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"J = ForwardDiff.jacobian(transform, parameters)'\nparameters = transform(parameters)","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"We can now invert our Hessian matrix and use the Delta Method, to obtain the variance-covariance matrix:","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"var_cov_matrix = J*inv(numerical_hessian)*J'","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"test the estimated parameters and t-stats for correctness","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"t_stats = parameters./sqrt.(diag(var_cov_matrix))","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"see the results","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"println(\"parameter estimates:\", parameters)\nprintln(\"t-statsitics: \", t_stats)","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"From here, one may examine other statistics of interest using the output from the optimization routine.","category":"page"},{"location":"examples/generated/maxlikenlm/#maxlikenlm-plain-program","page":"Maximum likelihood estimation","title":"Plain Program","text":"","category":"section"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"Below follows a version of the program without any comments. The file is also available here: maxlikenlm.jl","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"using Optim, NLSolversBase\nusing LinearAlgebra: diag\nusing ForwardDiff\n\nn = 40 # Number of observations\nnvar = 2 # Number of variables\nβ = ones(nvar) * 3.0 # True coefficients\nx = [ 1.0 0.156651\t\t\t\t# X matrix of explanatory variables plus constant\n 1.0 -1.34218\n 1.0 0.238262\n 1.0 -0.496572\n 1.0 1.19352\n 1.0 0.300229\n 1.0 0.409127\n 1.0 -0.88967\n 1.0 -0.326052\n 1.0 -1.74367\n 1.0 -0.528113\n 1.0 1.42612\n 1.0 -1.08846\n 1.0 -0.00972169\n 1.0 -0.85543\n 1.0 1.0301\n 1.0 1.67595\n 1.0 -0.152156\n 1.0 0.26666\n 1.0 -0.668618\n 1.0 -0.36883\n 1.0 -0.301392\n 1.0 0.0667779\n 1.0 -0.508801\n 1.0 -0.352346\n 1.0 0.288688\n 1.0 -0.240577\n 1.0 -0.997697\n 1.0 -0.362264\n 1.0 0.999308\n 1.0 -1.28574\n 1.0 -1.91253\n 1.0 0.825156\n 1.0 -0.136191\n 1.0 1.79925\n 1.0 -1.10438\n 1.0 0.108481\n 1.0 0.847916\n 1.0 0.594971\n 1.0 0.427909]\n\nε = [0.5539830489065279 # Errors\n -0.7981494315544392\n 0.12994853889935182\n 0.23315434715658184\n -0.1959788033050691\n -0.644463980478783\n -0.04055657880388486\n -0.33313251280917094\n -0.315407370840677\n 0.32273952815870866\n 0.56790436131181\n 0.4189982390480762\n -0.0399623088796998\n -0.2900421677961449\n -0.21938513655749814\n -0.2521429229103657\n 0.0006247891825243118\n -0.694977951759846\n -0.24108791530910414\n 0.1919989647431539\n 0.15632862280544485\n -0.16928298502504732\n 0.08912288359190582\n 0.0037707641031662006\n -0.016111044809837466\n 0.01852191562589722\n -0.762541135294584\n -0.7204431774719634\n -0.04394527523005201\n -0.11956323865320413\n -0.6713329013627437\n -0.2339928433338628\n -0.6200532213195297\n -0.6192380993792371\n 0.08834918731846135\n -0.5099307915921438\n 0.41527207925609494\n -0.7130133329859893\n -0.531213372742777\n -0.09029672309221337]\n\ny = x * β + ε; # Generate Data\n\nfunction Log_Likelihood(X, Y, β, log_σ)\n σ = exp(log_σ)\n llike = -n/2*log(2π) - n/2* log(σ^2) - (sum((Y - X * β).^2) / (2σ^2))\n llike = -llike\nend\n\nfunc = TwiceDifferentiable(vars -> Log_Likelihood(x, y, vars[1:nvar], vars[nvar + 1]),\n ones(nvar+1); autodiff=:forward);\n\nopt = optimize(func, ones(nvar+1))\n\nparameters = Optim.minimizer(opt)\n\nnumerical_hessian = hessian!(func,parameters)\n\nfunction transform(parameters)\n parameters[end] = exp(parameters[end])\n parameters\nend\n\nJ = ForwardDiff.jacobian(transform, parameters)'\nparameters = transform(parameters)\n\nvar_cov_matrix = J*inv(numerical_hessian)*J'\n\nt_stats = parameters./sqrt.(diag(var_cov_matrix))\n\nprintln(\"parameter estimates:\", parameters)\nprintln(\"t-statsitics: \", t_stats)\n\n# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"","category":"page"},{"location":"examples/generated/maxlikenlm/","page":"Maximum likelihood estimation","title":"Maximum likelihood estimation","text":"This page was generated using Literate.jl.","category":"page"},{"location":"user/gradientsandhessians/#Gradients-and-Hessians","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"","category":"section"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"To use first- and second-order methods, you need to provide gradients and Hessians, either in-place or out-of-place. There are three main ways of specifying derivatives: analytic, finite-difference and automatic differentiation.","category":"page"},{"location":"user/gradientsandhessians/#Analytic","page":"Gradients and Hessians","title":"Analytic","text":"","category":"section"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"This results in the fastest run times, but requires the user to perform the often tedious task of computing the derivatives by hand. The gradient of complicated objective functions (e.g. involving the solution of algebraic equations, differential equations, eigendecompositions, etc.) can be computed efficiently using the adjoint method (see e.g. these lecture notes). In particular, assuming infinite memory, the gradient of a mathbbR^N to mathbbR function f can always be computed with a runtime comparable with only one evaluation of f, no matter how large N.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"To use analytic derivatives, simply pass g! and h! functions to optimize.","category":"page"},{"location":"user/gradientsandhessians/#Finite-differences","page":"Gradients and Hessians","title":"Finite differences","text":"","category":"section"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"This uses the functionality in DiffEqDiffTools.jl to compute gradients and Hessians through central finite differences: f(x) approx fracf(x+h)-f(x-h)2h. For a mathbbR^N to mathbbR objective function f, this requires 2N evaluations of f. It is therefore efficient in low dimensions but slow when N is large. It is also inaccurate: h is chosen equal to epsilon^13 where epsilon is the machine epsilon (about 10^-16 for Float64) to balance the truncation and rounding errors, resulting in an error of epsilon^23 (about 10^-11 for Float64) for the derivative.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Finite differences are on by default if gradients and Hessians are not supplied to the optimize call.","category":"page"},{"location":"user/gradientsandhessians/#Automatic-differentiation","page":"Gradients and Hessians","title":"Automatic differentiation","text":"","category":"section"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Automatic differentiation techniques are a middle ground between finite differences and analytic computations. They are exact up to machine precision, and do not require intervention from the user. They come in two main flavors: forward and reverse mode. Forward-mode automatic differentiation is relatively straightforward to implement by propagating the sensitivities of the input variables, and is often faster than finite differences. The disadvantage is that the objective function has to be written using only Julia code. Forward-mode automatic differentiation still requires a runtime comparable to N evaluations of f, and is therefore costly in large dimensions, like finite differences.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Reverse-mode automatic differentiation can be seen as an automatic implementation of the adjoint method mentioned above, and requires a runtime comparable to only one evaluation of f. It is however considerably more complex to implement, requiring to record the execution of the program to then run it backwards, and incurs a larger overhead.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Forward-mode automatic differentiation is supported through the ForwardDiff.jl package by providing the autodiff=:forward keyword to optimize. Reverse-mode automatic differentiation is not supported explicitly yet (although you can use it by writing your own g! function). There are a number of implementations in Julia, such as ReverseDiff.jl.","category":"page"},{"location":"user/gradientsandhessians/#Example","page":"Gradients and Hessians","title":"Example","text":"","category":"section"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Let us consider the Rosenbrock example again.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"function f(x)\n return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\nend\n\nfunction g!(G, x)\n G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\n G[2] = 200.0 * (x[2] - x[1]^2)\nend\n\nfunction h!(H, x)\n H[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2\n H[1, 2] = -400.0 * x[1]\n H[2, 1] = -400.0 * x[1]\n H[2, 2] = 200.0\nend\n\ninitial_x = zeros(2)","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Let us see if BFGS and Newton's Method can solve this problem with the functions provided.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"julia> Optim.minimizer(optimize(f, g!, h!, initial_x, BFGS()))\n2-element Array{Float64,1}:\n 1.0\n 1.0\n\njulia> Optim.minimizer(optimize(f, g!, h!, initial_x, Newton()))\n\n2-element Array{Float64,1}:\n 1.0\n 1.0","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"This is indeed the case. Now let us use finite differences for BFGS.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"julia> Optim.minimizer(optimize(f, initial_x, BFGS()))\n2-element Array{Float64,1}:\n 1.0\n 1.0","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Still looks good. Returning to automatic differentiation, let us try both solvers using this method. We enable forward mode automatic differentiation by using the autodiff = :forward keyword.","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"julia> Optim.minimizer(optimize(f, initial_x, BFGS(); autodiff = :forward))\n2-element Array{Float64,1}:\n 1.0\n 1.0\n\njulia> Optim.minimizer(optimize(f, initial_x, Newton(); autodiff = :forward))\n2-element Array{Float64,1}:\n 1.0\n 1.0","category":"page"},{"location":"user/gradientsandhessians/","page":"Gradients and Hessians","title":"Gradients and Hessians","text":"Indeed, the minimizer was found, without providing any gradients or Hessians.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"EditURL = \"../ipnewton_basics.jl\"","category":"page"},{"location":"examples/generated/ipnewton_basics/#Nonlinear-constrained-optimization","page":"Interior point Newton","title":"Nonlinear constrained optimization","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"tip: Tip\nThis example is also available as a Jupyter notebook: ipnewton_basics.ipynb","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"The nonlinear constrained optimization interface in Optim assumes that the user can write the optimization problem in the following way.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"min_xinmathbbR^n f(x) quad textsuch that\nl_x leq phantomc(xphantom) leq u_x \nl_c leq c(x) leq u_c","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"For equality constraints on x_j or c(x)_j you set those particular entries of bounds to be equal, l_j=u_j. Likewise, setting l_j=-infty or u_j=infty means that the constraint is unbounded from below or above respectively.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"using Optim, NLSolversBase #hide\nimport NLSolversBase: clear! #hide","category":"page"},{"location":"examples/generated/ipnewton_basics/#Constrained-optimization-with-IPNewton","page":"Interior point Newton","title":"Constrained optimization with IPNewton","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We will go through examples on how to use the constraints interface with the interior-point Newton optimization algorithm IPNewton.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"Throughout these examples we work with the standard Rosenbrock function. The objective and its derivatives are given by","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"fun(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n\nfunction fun_grad!(g, x)\ng[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\ng[2] = 200.0 * (x[2] - x[1]^2)\nend\n\nfunction fun_hess!(h, x)\nh[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2\nh[1, 2] = -400.0 * x[1]\nh[2, 1] = -400.0 * x[1]\nh[2, 2] = 200.0\nend;\nnothing #hide","category":"page"},{"location":"examples/generated/ipnewton_basics/#Optimization-interface","page":"Interior point Newton","title":"Optimization interface","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"To solve a constrained optimization problem we call the optimize method","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"optimize(d::AbstractObjective, constraints::AbstractConstraints, initial_x::Tx, method::ConstrainedOptimizer, options::Options)","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We can create instances of AbstractObjective and AbstractConstraints using the types TwiceDifferentiable and TwiceDifferentiableConstraints from the package NLSolversBase.jl.","category":"page"},{"location":"examples/generated/ipnewton_basics/#Box-minimization","page":"Interior point Newton","title":"Box minimization","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We want to optimize the Rosenbrock function in the box -05 leq x leq 05, starting from the point x_0=(00). Box constraints are defined using, for example, TwiceDifferentiableConstraints(lx, ux).","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"x0 = [0.0, 0.0]\ndf = TwiceDifferentiable(fun, fun_grad!, fun_hess!, x0)\n\nlx = [-0.5, -0.5]; ux = [0.5, 0.5]\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nres = optimize(df, dfc, x0, IPNewton())","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"Like the rest of Optim, you can also use autodiff=:forward and just pass in fun.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"If we only want to set lower bounds, use ux = fill(Inf, 2)","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"ux = fill(Inf, 2)\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nclear!(df)\nres = optimize(df, dfc, x0, IPNewton())","category":"page"},{"location":"examples/generated/ipnewton_basics/#Defining-\"unconstrained\"-problems","page":"Interior point Newton","title":"Defining \"unconstrained\" problems","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"An unconstrained problem can be defined either by passing Inf bounds or empty arrays. Note that we must pass the correct type information to the empty lx and ux","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"lx = fill(-Inf, 2); ux = fill(Inf, 2)\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nclear!(df)\nres = optimize(df, dfc, x0, IPNewton())\n\nlx = Float64[]; ux = Float64[]\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nclear!(df)\nres = optimize(df, dfc, x0, IPNewton())","category":"page"},{"location":"examples/generated/ipnewton_basics/#Generic-nonlinear-constraints","page":"Interior point Newton","title":"Generic nonlinear constraints","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We now consider the Rosenbrock problem with a constraint on","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":" c(x)_1 = x_1^2 + x_2^2","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We pass the information about the constraints to optimize by defining a vector function c(x) and its Jacobian J(x).","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"The Hessian information is treated differently, by considering the Lagrangian of the corresponding slack-variable transformed optimization problem. This is similar to how the CUTEst library works. Let H_j(x) represent the Hessian of the jth component c(x)_j of the generic constraints. and lambda_j the corresponding dual variable in the Lagrangian. Then we want the constraint object to add the values of H_j(x) to the Hessian of the objective, weighted by lambda_j.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"The Julian form for the supplied function c(x) and the derivative information is then added in the following way.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"con_c!(c, x) = (c[1] = x[1]^2 + x[2]^2; c)\nfunction con_jacobian!(J, x)\n J[1,1] = 2*x[1]\n J[1,2] = 2*x[2]\n J\nend\nfunction con_h!(h, x, λ)\n h[1,1] += λ[1]*2\n h[2,2] += λ[1]*2\nend;\nnothing #hide","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"Note that con_h! adds the λ-weighted Hessian value of each element of c(x) to the Hessian of fun.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We can then optimize the Rosenbrock function inside the ball of radius 05.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"lx = Float64[]; ux = Float64[]\nlc = [-Inf]; uc = [0.5^2]\ndfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!,\n lx, ux, lc, uc)\nres = optimize(df, dfc, x0, IPNewton())","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We can add a lower bound on the constraint, and thus optimize the objective on the annulus with inner and outer radii 01 and 05 respectively.","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"lc = [0.1^2]\ndfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!,\n lx, ux, lc, uc)\nres = optimize(df, dfc, x0, IPNewton())","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"Note that the algorithm warns that the Initial guess is not an interior point. IPNewton can often handle this, however, if the initial guess is such that c(x) = u_c, then the algorithm currently fails. We may fix this in the future.","category":"page"},{"location":"examples/generated/ipnewton_basics/#Multiple-constraints","page":"Interior point Newton","title":"Multiple constraints","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"The following example illustrates how to add an additional constraint. In particular, we add a constraint function","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":" c(x)_2 = x_2sin(x_1)-x_1","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"function con2_c!(c, x)\n c[1] = x[1]^2 + x[2]^2 ## First constraint\n c[2] = x[2]*sin(x[1])-x[1] ## Second constraint\n c\nend\nfunction con2_jacobian!(J, x)\n # First constraint\n J[1,1] = 2*x[1]\n J[1,2] = 2*x[2]\n # Second constraint\n J[2,1] = x[2]*cos(x[1])-1.0\n J[2,2] = sin(x[1])\n J\nend\nfunction con2_h!(h, x, λ)\n # First constraint\n h[1,1] += λ[1]*2\n h[2,2] += λ[1]*2\n # Second constraint\n h[1,1] += λ[2]*x[2]*-sin(x[1])\n h[1,2] += λ[2]*cos(x[1])\n # Symmetrize h\n h[2,1] = h[1,2]\n h\nend;\nnothing #hide","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"We generate the constraint objects and call IPNewton with initial guess x_0 = (025025).","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"x0 = [0.25, 0.25]\nlc = [-Inf, 0.0]; uc = [0.5^2, 0.0]\ndfc = TwiceDifferentiableConstraints(con2_c!, con2_jacobian!, con2_h!,\n lx, ux, lc, uc)\nres = optimize(df, dfc, x0, IPNewton())","category":"page"},{"location":"examples/generated/ipnewton_basics/#ipnewton_basics-plain-program","page":"Interior point Newton","title":"Plain Program","text":"","category":"section"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"Below follows a version of the program without any comments. The file is also available here: ipnewton_basics.jl","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"using Optim, NLSolversBase #hide\nimport NLSolversBase: clear! #hide\n\nfun(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n\nfunction fun_grad!(g, x)\ng[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\ng[2] = 200.0 * (x[2] - x[1]^2)\nend\n\nfunction fun_hess!(h, x)\nh[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2\nh[1, 2] = -400.0 * x[1]\nh[2, 1] = -400.0 * x[1]\nh[2, 2] = 200.0\nend;\n\nx0 = [0.0, 0.0]\ndf = TwiceDifferentiable(fun, fun_grad!, fun_hess!, x0)\n\nlx = [-0.5, -0.5]; ux = [0.5, 0.5]\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nres = optimize(df, dfc, x0, IPNewton())\n\nux = fill(Inf, 2)\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nclear!(df)\nres = optimize(df, dfc, x0, IPNewton())\n\nlx = fill(-Inf, 2); ux = fill(Inf, 2)\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nclear!(df)\nres = optimize(df, dfc, x0, IPNewton())\n\nlx = Float64[]; ux = Float64[]\ndfc = TwiceDifferentiableConstraints(lx, ux)\n\nclear!(df)\nres = optimize(df, dfc, x0, IPNewton())\n\ncon_c!(c, x) = (c[1] = x[1]^2 + x[2]^2; c)\nfunction con_jacobian!(J, x)\n J[1,1] = 2*x[1]\n J[1,2] = 2*x[2]\n J\nend\nfunction con_h!(h, x, λ)\n h[1,1] += λ[1]*2\n h[2,2] += λ[1]*2\nend;\n\nlx = Float64[]; ux = Float64[]\nlc = [-Inf]; uc = [0.5^2]\ndfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!,\n lx, ux, lc, uc)\nres = optimize(df, dfc, x0, IPNewton())\n\nlc = [0.1^2]\ndfc = TwiceDifferentiableConstraints(con_c!, con_jacobian!, con_h!,\n lx, ux, lc, uc)\nres = optimize(df, dfc, x0, IPNewton())\n\nfunction con2_c!(c, x)\n c[1] = x[1]^2 + x[2]^2 ## First constraint\n c[2] = x[2]*sin(x[1])-x[1] ## Second constraint\n c\nend\nfunction con2_jacobian!(J, x)\n # First constraint\n J[1,1] = 2*x[1]\n J[1,2] = 2*x[2]\n # Second constraint\n J[2,1] = x[2]*cos(x[1])-1.0\n J[2,2] = sin(x[1])\n J\nend\nfunction con2_h!(h, x, λ)\n # First constraint\n h[1,1] += λ[1]*2\n h[2,2] += λ[1]*2\n # Second constraint\n h[1,1] += λ[2]*x[2]*-sin(x[1])\n h[1,2] += λ[2]*cos(x[1])\n # Symmetrize h\n h[2,1] = h[1,2]\n h\nend;\n\nx0 = [0.25, 0.25]\nlc = [-Inf, 0.0]; uc = [0.5^2, 0.0]\ndfc = TwiceDifferentiableConstraints(con2_c!, con2_jacobian!, con2_h!,\n lx, ux, lc, uc)\nres = optimize(df, dfc, x0, IPNewton())\n\n# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"","category":"page"},{"location":"examples/generated/ipnewton_basics/","page":"Interior point Newton","title":"Interior point Newton","text":"This page was generated using Literate.jl.","category":"page"},{"location":"user/tipsandtricks/#Dealing-with-constant-parameters","page":"Tips and tricks","title":"Dealing with constant parameters","text":"","category":"section"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"In many applications, there may be factors that are relevant to the function evaluations, but are fixed throughout the optimization. An obvious example is using data in a likelihood function, but it could also be parameters we wish to hold constant.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Consider a squared error loss function that depends on some data x and y, and parameters betas. As far as the solver is concerned, there should only be one input argument to the function we want to minimize, call it sqerror.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"The problem is that we want to optimize a function sqerror that really depends on three inputs, and two of them are constant throughout the optimization procedure. To do this, we need to define the variables x and y","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"x = [1.0, 2.0, 3.0]\ny = 1.0 .+ 2.0 .* x .+ [-0.3, 0.3, -0.1]","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"We then simply define a function in three variables","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"function sqerror(betas, X, Y)\n err = 0.0\n for i in 1:length(X)\n pred_i = betas[1] + betas[2] * X[i]\n err += (Y[i] - pred_i)^2\n end\n return err\nend","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"and then optimize the following anonymous function","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"res = optimize(b -> sqerror(b, x, y), [0.0, 0.0])","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Alternatively, we can define a closure sqerror(betas) that is aware of the variables we just defined","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"function sqerror(betas)\n err = 0.0\n for i in 1:length(x)\n pred_i = betas[1] + betas[2] * x[i]\n err += (y[i] - pred_i)^2\n end\n return err\nend","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"We can then optimize the sqerror function just like any other function","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"res = optimize(sqerror, [0.0, 0.0])","category":"page"},{"location":"user/tipsandtricks/#Avoid-repeating-computations","page":"Tips and tricks","title":"Avoid repeating computations","text":"","category":"section"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Say you are optimizing a function","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"f(x) = x[1]^2+x[2]^2\ng!(storage, x) = copyto!(storage, [2x[1], 2x[2]])","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"In this situation, no calculations from f could be reused in g!. However, sometimes there is a substantial similarity between the objective function, and gradient, and some calculations can be reused.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"To avoid repeating calculations, define functions fg! or fgh! that compute the objective function, the gradient and the Hessian (if needed) simultaneously. These functions internally can be written to avoid repeating common calculations.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"For example, here we define a function fg! to compute the objective function and the gradient, as required:","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"function fg!(F, G, x)\n # do common computations here\n # ...\n if G !== nothing\n # code to compute gradient here\n # writing the result to the vector G\n # G .= ...\n end\n if F !== nothing\n # value = ... code to compute objective function\n return value\n end\nend","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Optim will only call this function with an argument G that is nothing (if the gradient is not required) or a Vector that should be filled (in-place) with the gradient. This flexibility is convenient for algorithms that only use the gradient in some iterations but not in others.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Now we call optimize with the following syntax:","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Optim.optimize(Optim.only_fg!(fg!), [0., 0.], Optim.LBFGS())","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Similarly, for a computation that requires the Hessian, we can write:","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"function fgh!(F, G, H, x)\n G === nothing || # compute gradient and store in G\n H === nothing || # compute Hessian and store in H\n F === nothing || return f(x)\n nothing\nend\n\nOptim.optimize(Optim.only_fgh!(fgh!), [0., 0.], Optim.Newton())","category":"page"},{"location":"user/tipsandtricks/#Provide-gradients","page":"Tips and tricks","title":"Provide gradients","text":"","category":"section"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"As mentioned in the general introduction, passing analytical gradients can have an impact on performance. To show an example of this, consider the separable extension of the Rosenbrock function in dimension 5000, see SROSENBR in CUTEst.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Below, we use the gradients and objective functions from mastsif through CUTEst.jl. We only show the first five iterations of an attempt to minimize the function using Gradient Descent.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"julia> @time optimize(f, initial_x, GradientDescent(),\n Optim.Options(show_trace=true, iterations = 5))\nIter Function value Gradient norm\n 0 4.850000e+04 2.116000e+02\n 1 1.018734e+03 2.704951e+01\n 2 3.468449e+00 5.721261e-01\n 3 2.966899e+00 2.638790e-02\n 4 2.511859e+00 5.237768e-01\n 5 2.107853e+00 1.020287e-01\n 21.731129 seconds (1.61 M allocations: 63.434 MB, 0.03% gc time)\nResults of Optimization Algorithm\n * Algorithm: Gradient Descent\n * Starting Point: [1.2,1.0, ...]\n * Minimizer: [1.0287767703731154,1.058769439356144, ...]\n * Minimum: 2.107853e+00\n * Iterations: 5\n * Convergence: false\n * |x - x'| < 0.0: false\n * |f(x) - f(x')| / |f(x)| < 0.0: false\n * |g(x)| < 1.0e-08: false\n * Reached Maximum Number of Iterations: true\n * Objective Function Calls: 23\n * Gradient Calls: 23\n\njulia> @time optimize(f, g!, initial_x, GradientDescent(),\n Optim.Options(show_trace=true, iterations = 5))\nIter Function value Gradient norm\n 0 4.850000e+04 2.116000e+02\n 1 1.018769e+03 2.704998e+01\n 2 3.468488e+00 5.721481e-01\n 3 2.966900e+00 2.638792e-02\n 4 2.511828e+00 5.237919e-01\n 5 2.107802e+00 1.020415e-01\n 0.009889 seconds (915 allocations: 270.266 KB)\nResults of Optimization Algorithm\n * Algorithm: Gradient Descent\n * Starting Point: [1.2,1.0, ...]\n * Minimizer: [1.0287763814102757,1.05876866832087, ...]\n * Minimum: 2.107802e+00\n * Iterations: 5\n * Convergence: false\n * |x - x'| < 0.0: false\n * |f(x) - f(x')| / |f(x)| < 0.0: false\n * |g(x)| < 1.0e-08: false\n * Reached Maximum Number of Iterations: true\n * Objective Function Calls: 23\n * Gradient Calls: 23","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"The objective has obtained a value that is very similar between the two runs, but the run with the analytical gradient is way faster. It is possible that the finite differences code can be improved, but generally the optimization will be slowed down by all the function evaluations required to do the central finite differences calculations.","category":"page"},{"location":"user/tipsandtricks/#Separating-time-spent-in-Optim's-code-and-user-provided-functions","page":"Tips and tricks","title":"Separating time spent in Optim's code and user provided functions","text":"","category":"section"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Consider the Rosenbrock problem.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"using Optim, OptimTestProblems\nprob = UnconstrainedProblems.examples[\"Rosenbrock\"];","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Say we optimize this function, and look at the total run time of optimize using the Newton Trust Region method, and we are surprised that it takes a long time to run. We then wonder if time is spent in Optim's own code (solving the sub-problem for example) or in evaluating the objective, gradient or hessian that we provided. Then it can be very useful to use the TimerOutputs.jl package. This package allows us to run an over-all timer for optimize, and add individual timers for f, g!, and h!. Consider the example below, that is due to the author of the package (Kristoffer Carlsson).","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"using TimerOutputs\nconst to = TimerOutput()\n\nf(x ) = @timeit to \"f\" prob.f(x)\ng!(x, g) = @timeit to \"g!\" prob.g!(x, g)\nh!(x, h) = @timeit to \"h!\" prob.h!(x, h)\n\nbegin\nreset_timer!(to)\n@timeit to \"Trust Region\" begin\n res = Optim.optimize(f, g!, h!, prob.initial_x, NewtonTrustRegion())\nend\nshow(to; allocations = false)\nend","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"We see that the time is actually not spent in our provided functions, but most of the time is spent in the code for the trust region method.","category":"page"},{"location":"user/tipsandtricks/#Early-stopping","page":"Tips and tricks","title":"Early stopping","text":"","category":"section"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"Sometimes it might be of interest to stop the optimizer early. The simplest way to do this is to set the iterations keyword in Optim.Options to some number. This will prevent the iteration counter exceeding some limit, with the standard value being 1000. Alternatively, it is possible to put a soft limit on the run time of the optimization procedure by setting the time_limit keyword in the Optim.Options constructor.","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"using Optim, OptimTestProblems\nproblem = UnconstrainedProblems.examples[\"Rosenbrock\"]\n\nf = problem.f\ninitial_x = problem.initial_x\n\nfunction slow(x)\n sleep(0.1)\n f(x)\nend\n\nstart_time = time()\n\noptimize(slow, zeros(2), NelderMead(), Optim.Options(time_limit = 3.0))","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"This will stop after about three seconds. If it is more important that we stop before the limit is reached, it is possible to use a callback with a simple model for predicting how much time will have passed when the next iteration is over. Consider the following code","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"using Optim, OptimTestProblems\nproblem = UnconstrainedProblems.examples[\"Rosenbrock\"]\n\nf = problem.f\ninitial_x = problem.initial_x\n\nfunction very_slow(x)\n sleep(.5)\n f(x)\nend\n\nstart_time = time()\ntime_to_setup = zeros(1)\nfunction advanced_time_control(x)\n println(\" * Iteration: \", x.iteration)\n so_far = time()-start_time\n println(\" * Time so far: \", so_far)\n if x.iteration == 0\n time_to_setup .= time()-start_time\n else\n expected_next_time = so_far + (time()-start_time-time_to_setup[1])/(x.iteration)\n println(\" * Next iteration ≈ \", expected_next_time)\n println()\n return expected_next_time < 13 ? false : true\n end\n println()\n false\nend\noptimize(very_slow, zeros(2), NelderMead(), Optim.Options(callback = advanced_time_control))","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"It will try to predict the elapsed time after the next iteration is over, and stop now if it is expected to exceed the limit of 13 seconds. Running it, we get something like the following output","category":"page"},{"location":"user/tipsandtricks/","page":"Tips and tricks","title":"Tips and tricks","text":"julia> optimize(very_slow, zeros(2), NelderMead(), Optim.Options(callback = advanced_time_control))\n * Iteration: 0\n * Time so far: 2.219298839569092\n\n * Iteration: 1\n * Time so far: 3.4006409645080566\n * Next iteration ≈ 4.5429909229278564\n\n * Iteration: 2\n * Time so far: 4.403923988342285\n * Next iteration ≈ 5.476739525794983\n\n * Iteration: 3\n * Time so far: 5.407265901565552\n * Next iteration ≈ 6.4569235642751055\n\n * Iteration: 4\n * Time so far: 5.909044027328491\n * Next iteration ≈ 6.821732044219971\n\n * Iteration: 5\n * Time so far: 6.912338972091675\n * Next iteration ≈ 7.843148183822632\n\n * Iteration: 6\n * Time so far: 7.9156060218811035\n * Next iteration ≈ 8.85849153995514\n\n * Iteration: 7\n * Time so far: 8.918903827667236\n * Next iteration ≈ 9.870419979095459\n\n * Iteration: 8\n * Time so far: 9.922197818756104\n * Next iteration ≈ 10.880185931921005\n\n * Iteration: 9\n * Time so far: 10.925468921661377\n * Next iteration ≈ 11.888488478130764\n\n * Iteration: 10\n * Time so far: 11.92870283126831\n * Next iteration ≈ 12.895747828483582\n\n * Iteration: 11\n * Time so far: 12.932114839553833\n * Next iteration ≈ 13.902462200684981\n\nResults of Optimization Algorithm\n * Algorithm: Nelder-Mead\n * Starting Point: [0.0,0.0]\n * Minimizer: [0.23359374999999996,0.042187499999999996, ...]\n * Minimum: 6.291677e-01\n * Iterations: 11\n * Convergence: false\n * √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: false\n * Reached Maximum Number of Iterations: false\n * Objective Function Calls: 24","category":"page"},{"location":"algo/goldensection/#Golden-Section","page":"Golden Section","title":"Golden Section","text":"","category":"section"},{"location":"algo/goldensection/#Constructor","page":"Golden Section","title":"Constructor","text":"","category":"section"},{"location":"algo/goldensection/#Description","page":"Golden Section","title":"Description","text":"","category":"section"},{"location":"algo/goldensection/#Example","page":"Golden Section","title":"Example","text":"","category":"section"},{"location":"algo/goldensection/#References","page":"Golden Section","title":"References","text":"","category":"section"},{"location":"algo/simulated_annealing/#Simulated-Annealing","page":"Simulated Annealing","title":"Simulated Annealing","text":"","category":"section"},{"location":"algo/simulated_annealing/#Constructor","page":"Simulated Annealing","title":"Constructor","text":"","category":"section"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"SimulatedAnnealing(; neighbor = default_neighbor!,\n T = default_temperature,\n p = kirkpatrick)","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"The constructor takes three keywords:","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"neighbor = a!(x_proposed, x_current), a mutating function of the current x, and the proposed x\nT = b(iteration), a function of the current iteration that returns a temperature\np = c(f_proposal, f_current, T), a function of the current temperature, current function value and proposed function value that returns an acceptance probability","category":"page"},{"location":"algo/simulated_annealing/#Description","page":"Simulated Annealing","title":"Description","text":"","category":"section"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"Simulated Annealing is a derivative free method for optimization. It is based on the Metropolis-Hastings algorithm that was originally used to generate samples from a thermodynamics system, and is often used to generate draws from a posterior when doing Bayesian inference. As such, it is a probabilistic method for finding the minimum of a function, often over a quite large domains. For the historical reasons given above, the algorithm uses terms such as cooling, temperature, and acceptance probabilities.","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"As the constructor shows, a simulated annealing implementation is characterized by a temperature, a neighbor function, and an acceptance probability. The temperature controls how volatile the changes in minimizer candidates are allowed to be, as it enters the acceptance probability. For example, the original Kirkpatrick et al. acceptance probability function can be written as follows","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"p(f_proposal, f_current, T) = exp(-(f_proposal - f_current)/T)","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"A high temperature makes it more likely that a draw is accepted, by pushing acceptance probability to 1. As in the Metropolis-Hastings algorithm, we always accept a smaller function value, but we also sometimes accept a larger value. As the temperature decreases, we're more and more likely to only accept candidate x's that lowers the function value. To obtain a new f_proposal, we need a neighbor function. A simple neighbor function adds a standard normal draw to each dimension of x","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"function neighbor!(x_proposal::Array, x::Array)\n for i in eachindex(x)\n x_proposal[i] = x[i]+randn()\n end\nend","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"As we see, it is not really possible to disentangle the role of the different components of the algorithm. For example, both the functional form of the acceptance function, the temperature and (indirectly) the neighbor function determine if the next draw of x is accepted or not.","category":"page"},{"location":"algo/simulated_annealing/","page":"Simulated Annealing","title":"Simulated Annealing","text":"The current implementation of Simulated Annealing is very rough. It lacks quite a few features which are normally part of a proper SA implementation. A better implementation is under way, see this issue.","category":"page"},{"location":"algo/simulated_annealing/#Example","page":"Simulated Annealing","title":"Example","text":"","category":"section"},{"location":"algo/simulated_annealing/#References","page":"Simulated Annealing","title":"References","text":"","category":"section"},{"location":"LICENSE/","page":"License","title":"License","text":"Optim.jl is licensed under the MIT License:","category":"page"},{"location":"LICENSE/","page":"License","title":"License","text":"Copyright (c) 2012: John Myles White, Tim Holy, and other contributors. Copyright (c) 2016: Patrick Kofod Mogensen, John Myles White, Tim Holy, and other contributors. Copyright (c) 2017: Patrick Kofod Mogensen, Asbjørn Nilsen Riseth, John Myles White, Tim Holy, and other contributors.Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","category":"page"},{"location":"algo/precondition/#Preconditioning","page":"Preconditioners","title":"Preconditioning","text":"","category":"section"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"The GradientDescent, ConjugateGradient and LBFGS methods support preconditioning. A preconditioner can be thought of as a change of coordinates under which the Hessian is better conditioned. With a good preconditioner substantially improved convergence is possible.","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"A preconditioner Pcan be of any type as long as the following two methods are implemented:","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"A_ldiv_B!(pgr, P, gr) : apply P to a vector gr and store in pgr (intuitively, pgr = P \\ gr)\ndot(x, P, y) : the inner product induced by P (intuitively, dot(x, P * y))","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"Precisely what these operations mean, depends on how P is stored. Commonly, we store a matrix P which approximates the Hessian in some vague sense. In this case,","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"A_ldiv_B!(pgr, P, gr) = copyto!(pgr, P \\ A)\ndot(x, P, y) = dot(x, P * y)","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"Finally, it is possible to update the preconditioner as the state variable x changes. This is done through precondprep! which is passed to the optimizers as kw-argument, e.g.,","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":" method=ConjugateGradient(P = precond(100), precondprep! = precond(100))","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"though in this case it would always return the same matrix. (See fminbox.jl for a more natural example.)","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"Apart from preconditioning with matrices, Optim.jl provides a type InverseDiagonal, which represents a diagonal matrix by its inverse elements.","category":"page"},{"location":"algo/precondition/#Example","page":"Preconditioners","title":"Example","text":"","category":"section"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"Below, we see an example where a function is minimized without and with a preconditioner applied.","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"using ForwardDiff, Optim, SparseArrays\ninitial_x = zeros(100)\nplap(U; n = length(U)) = (n-1)*sum((0.1 .+ diff(U).^2).^2 ) - sum(U) / (n-1)\nplap1(x) = ForwardDiff.gradient(plap,x)\nprecond(n) = spdiagm(-1 => -ones(n-1), 0 => 2ones(n), 1 => -ones(n-1)) * (n+1)\nf(x) = plap([0; x; 0])\ng!(G, x) = copyto!(G, (plap1([0; x; 0]))[2:end-1])\nresult = Optim.optimize(f, g!, initial_x, method = ConjugateGradient(P = nothing))\nresult = Optim.optimize(f, g!, initial_x, method = ConjugateGradient(P = precond(100)))","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"The former optimize call converges at a slower rate than the latter. Looking at a plot of the 2D version of the function shows the problem.","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"(Image: plap)","category":"page"},{"location":"algo/precondition/","page":"Preconditioners","title":"Preconditioners","text":"The contours are shaped like ellipsoids, but we would rather want them to be circles. Using the preconditioner effectively changes the coordinates such that the contours becomes less ellipsoid-like. Benchmarking shows that using preconditioning provides an approximate speed-up factor of 15 in this 100 dimensional case.","category":"page"},{"location":"algo/precondition/#References","page":"Preconditioners","title":"References","text":"","category":"section"},{"location":"algo/adam_adamax/#Adam-and-AdaMax","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"","category":"section"},{"location":"algo/adam_adamax/","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"This page contains information about Adam and AdaMax. Notice, that these algorithms do not use line search algorithms, so some tuning of alpha may be necessary to obtain sufficiently fast convergence on your specific problem.","category":"page"},{"location":"algo/adam_adamax/#Constructors","page":"Adam and AdaMax","title":"Constructors","text":"","category":"section"},{"location":"algo/adam_adamax/","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"Adam(; alpha=0.0001,\n beta_mean=0.9,\n beta_var=0.999,\n epsilon=1e-8)","category":"page"},{"location":"algo/adam_adamax/","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"where alpha is the step length or learning parameter. beta_mean and beta_var are exponential decay parameters for the first and second moments estimates. Setting these closer to 0 will cause past iterates to matter less for the current steps and setting them closer to 1 means emphasizing past iterates more. epsilon should rarely be changed, and just exists to avoid a division by 0.","category":"page"},{"location":"algo/adam_adamax/","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"AdaMax(; alpha=0.002,\n beta_mean=0.9,\n beta_var=0.999,\n epsilon=1e-8)","category":"page"},{"location":"algo/adam_adamax/","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"where alpha is the step length or learning parameter. beta_mean and beta_var are exponential decay parameters for the first and second moments estimates. Setting these closer to 0 will cause past iterates to matter less for the current steps and setting them closer to 1 means emphasizing past iterates more.","category":"page"},{"location":"algo/adam_adamax/#References","page":"Adam and AdaMax","title":"References","text":"","category":"section"},{"location":"algo/adam_adamax/","page":"Adam and AdaMax","title":"Adam and AdaMax","text":"Kingma, Diederik P., and Jimmy Ba. \"Adam: A method for stochastic optimization.\" arXiv preprint arXiv:1412.6980 (2014).","category":"page"},{"location":"#Optim.jl","page":"Home","title":"Optim.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Univariate and multivariate optimization in Julia.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Optim.jl is part of the JuliaNLSolvers family.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Source Build Status Social References to cite\n(Image: Source) (Image: Build Status) (Image: ) (Image: JOSS)\n(Image: Codecov branch) (Image: Build Status) (Image: DOI)","category":"page"},{"location":"#What","page":"Home","title":"What","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Optim is a Julia package for optimizing functions of various kinds. While there is some support for box constrained and Riemannian optimization, most of the solvers try to find an x that minimizes a function f(x) without any constraints. Thus, the main focus is on unconstrained optimization. The provided solvers, under certain conditions, will converge to a local minimum. In the case where a global minimum is desired we supply some methods such as (bounded) simulated annealing and particle swarm. For a dedicated package for global optimization techniques, see e.g. BlackBoxOptim.","category":"page"},{"location":"#Why","page":"Home","title":"Why","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"There are many solvers available from both free and commercial sources, and many of them are accessible from Julia. Few of them are written in Julia. Performance-wise this is rarely a problem, as they are often written in either Fortran or C. However, solvers written directly in Julia does come with some advantages.","category":"page"},{"location":"","page":"Home","title":"Home","text":"When writing Julia software (packages) that require something to be optimized, the programmer can either choose to write their own optimization routine, or use one of the many available solvers. For example, this could be something from the NLopt suite. This means adding a dependency which is not written in Julia, and more assumptions have to be made as to the environment the user is in. Does the user have the proper compilers? Is it possible to use GPL'ed code in the project? Optim is released under the MIT license, and installation is a simple Pkg.add, so it really doesn't get much freer, easier, and lightweight than that.","category":"page"},{"location":"","page":"Home","title":"Home","text":"It is also true, that using a solver written in C or Fortran makes it impossible to leverage one of the main benefits of Julia: multiple dispatch. Since Optim is entirely written in Julia, we can currently use the dispatch system to ease the use of custom preconditioners. A planned feature along these lines is to allow for user controlled choice of solvers for various steps in the algorithm, entirely based on dispatch, and not predefined possibilities chosen by the developers of Optim.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Being a Julia package also means that Optim has access to the automatic differentiation features through the packages in JuliaDiff.","category":"page"},{"location":"#How","page":"Home","title":"How","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The package is a registered package, and can be installed with Pkg.add.","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using Pkg; Pkg.add(\"Optim\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"or through the pkg REPL mode by typing","category":"page"},{"location":"","page":"Home","title":"Home","text":"] add Optim","category":"page"},{"location":"algo/ipnewton/#Interior-point-Newton-method","page":"Interior point Newton","title":"Interior point Newton method","text":"","category":"section"},{"location":"algo/ipnewton/","page":"Interior point Newton","title":"Interior point Newton","text":"IPNewton","category":"page"},{"location":"algo/ipnewton/#Optim.IPNewton","page":"Interior point Newton","title":"Optim.IPNewton","text":"Interior-point Newton\n\nConstructor\n\nIPNewton(; linesearch::Function = Optim.backtrack_constrained_grad,\n μ0::Union{Symbol,Number} = :auto,\n show_linesearch::Bool = false)\n\nThe initial barrier penalty coefficient μ0 can be chosen as a number, or set to :auto to let the algorithm decide its value, see initialize_μ_λ!.\n\nNote: For constrained optimization problems, we recommend always enabling allow_f_increases and successive_f_tol in the options passed to optimize. The default is set to Optim.Options(allow_f_increases = true, successive_f_tol = 2).\n\nAs of February 2018, the line search algorithm is specialised for constrained interior-point methods. In future we hope to support more algorithms from LineSearches.jl.\n\nDescription\n\nThe IPNewton method implements an interior-point primal-dual Newton algorithm for solving nonlinear, constrained optimization problems. See Nocedal and Wright (Ch. 19, 2006) for a discussion of interior-point methods for constrained optimization.\n\nReferences\n\nThe algorithm was originally written by Tim Holy (@timholy, tim.holy@gmail.com).\n\nJ Nocedal, SJ Wright (2006), Numerical optimization, second edition. Springer.\nA Wächter, LT Biegler (2006), On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106 (1), 25-57.\n\n\n\n\n\n","category":"type"},{"location":"algo/ipnewton/#Examples","page":"Interior point Newton","title":"Examples","text":"","category":"section"},{"location":"algo/ipnewton/","page":"Interior point Newton","title":"Interior point Newton","text":"Nonlinear constrained optimization in Optim","category":"page"}]
}