release-0.12/search/search_index.json

{
    "docs": [
        {
            "location": "/", 
            "text": "Optim.jl\n\n\n\n\nWhat\n\n\nOptim is a Julia package for optimizing functions of various kinds. While there is some support for box constrained and Riemannian optimization, most of the solvers try to find an $x$ that minimizes a function $f(x)$ without any constraints. Thus, the main focus is on unconstrained optimization. The provided solvers, under certain conditions, will converge to a local minimum. In the case where a global minimum is desired, global optimization techniques should be employed instead (see e.g. \nBlackBoxOptim\n).\n\n\n\n\nWhy\n\n\nThere are many solvers available from both free and commercial sources, and many of them are accessible from Julia. Few of them are written in Julia. Performance-wise this is rarely a problem, as they are often written in either Fortran or C. However, solvers written directly in Julia does come with some advantages.\n\n\nWhen writing Julia software (packages) that require something to be optimized, the programmer can either choose to write their own optimization routine, or use one of the many available solvers. For example, this could be something from the \nNLOpt\n suite. This means adding a dependency which is not written in Julia, and more assumptions have to be made as to the environment the user is in. Does the user have the proper compilers? Is it possible to use GPL'ed code in the project? Optim is released under the MIT license, and installation is a simple \nPkg.add\n, so it really doesn't get much freer, easier, and lightweight than that.\n\n\nIt is also true, that using a solver written in C or Fortran makes it impossible to leverage one of the main benefits of Julia: multiple dispatch. Since Optim is entirely written in Julia, we can currently use the dispatch system to ease the use of custom preconditioners. 
A planned feature along these lines is to allow for user-controlled choice of solvers for various steps in the algorithm, entirely based on dispatch, and not predefined possibilities chosen by the developers of Optim.\n\n\nBeing a Julia package also means that Optim has access to the automatic differentiation features through the packages in \nJuliaDiff\n.\n\n\n\n\nHow\n\n\nOptim is registered in \nMETADATA.jl\n. This means that all you need to do to install Optim is to run\n\n\nPkg\n.\nadd\n(\n\"Optim\"\n)\n\n\n\n\n\n\n\n\nBut...\n\n\nOptim is a work in progress. There are still some rough edges to be sanded down, and features we want to implement. There are also planned breaking changes that are good to be aware of. Please see the section on Planned Changes.", 
            "title": "Home"
        }, 
        {
            "location": "/#optimjl", 
            "text": "", 
            "title": "Optim.jl"
        }, 
        {
            "location": "/#what", 
            "text": "Optim is a Julia package for optimizing functions of various kinds. While there is some support for box constrained and Riemannian optimization, most of the solvers try to find an $x$ that minimizes a function $f(x)$ without any constraints. Thus, the main focus is on unconstrained optimization. The provided solvers, under certain conditions, will converge to a local minimum. In the case where a global minimum is desired, global optimization techniques should be employed instead (see e.g.  BlackBoxOptim ).", 
            "title": "What"
        }, 
        {
            "location": "/#why", 
            "text": "There are many solvers available from both free and commercial sources, and many of them are accessible from Julia. Few of them are written in Julia. Performance-wise this is rarely a problem, as they are often written in either Fortran or C. However, solvers written directly in Julia does come with some advantages.  When writing Julia software (packages) that require something to be optimized, the programmer can either choose to write their own optimization routine, or use one of the many available solvers. For example, this could be something from the  NLOpt  suite. This means adding a dependency which is not written in Julia, and more assumptions have to be made as to the environment the user is in. Does the user have the proper compilers? Is it possible to use GPL'ed code in the project? Optim is released under the MIT license, and installation is a simple  Pkg.add , so it really doesn't get much freer, easier, and lightweight than that.  It is also true, that using a solver written in C or Fortran makes it impossible to leverage one of the main benefits of Julia: multiple dispatch. Since Optim is entirely written in Julia, we can currently use the dispatch system to ease the use of custom preconditioners. A planned feature along these lines is to allow for user controlled choice of solvers for various steps in the algorithm, entirely based on dispatch, and not predefined possibilities chosen by the developers of Optim.  Being a Julia package also means that Optim has access to the automatic differentiation features through the packages in  JuliaDiff .", 
            "title": "Why"
        }, 
        {
            "location": "/#how", 
            "text": "Optim is registered in  METADATA.jl . This means that all you need to do to install Optim, is to run  Pkg . add ( Optim )", 
            "title": "How"
        }, 
        {
            "location": "/#but", 
            "text": "Optim is a work in progress. There are still some rough edges to be sanded down, and features we want to implement. There are also planned breaking changes that are good to be aware of. Please see the section on Planned Changes.", 
            "title": "But..."
        }, 
        {
            "location": "/user/minimization/", 
            "text": "Minimizing a multivariate function\n\n\nTo show how the Optim package can be used, we implement the \nRosenbrock function\n, a classic problem in numerical optimization. We'll assume that you've already installed the Optim package using Julia's package manager. First, we load Optim and define the Rosenbrock function:\n\n\nusing\n \nOptim\n\n\nf\n(\nx\n)\n \n=\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n^\n2\n \n+\n \n100.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n^\n2\n\n\n\n\n\n\nOnce we've defined this function, we can find the minimum of the Rosenbrock function using any of our favorite optimization algorithms. With a function defined, we just specify an initial point \nx\n and run:\n\n\noptimize\n(\nf\n,\n \n[\n0.0\n,\n \n0.0\n])\n\n\n\n\n\n\n!!! note\n    It is important to pass \ninitial_x\n as an array. If your problem is one-dimensional, you have to wrap it in an array. An easy way to do so is to write \noptimize(x-\nf(first(x)), [initial_x], ...)\n\n\nOptim will default to using the Nelder-Mead method in this case, as we did not provide a gradient. This can also be explicitly specified using:\n\n\noptimize\n(\nf\n,\n \n[\n0.0\n,\n \n0.0\n],\n \nNelderMead\n())\n\n\n\n\n\n\nOther solvers are available. Below, we use L-BFGS, a quasi-Newton method that requires a gradient. If we pass \nf\n alone, Optim will construct an approximate gradient for us using central finite differencing:\n\n\noptimize\n(\nf\n,\n \n[\n0.0\n,\n \n0.0\n],\n \nLBFGS\n())\n\n\n\n\n\n\nFor better performance and greater precision, you can pass your own gradient function. 
For the Rosenbrock example, the analytical gradient can be shown to be:\n\n\nfunction\n \ng!\n(\nstorage\n,\n \nx\n)\n\n\nstorage\n[\n1\n]\n \n=\n \n-\n2.0\n \n*\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n \n-\n \n400.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n \n*\n \nx\n[\n1\n]\n\n\nstorage\n[\n2\n]\n \n=\n \n200.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n\n\nend\n\n\n\n\n\n\nNote that the functions we're using to calculate the gradient (and later the Hessian \nh!\n) of the Rosenbrock function mutate a fixed-sized storage array, which is passed as an additional argument called \nstorage\n. By mutating a single array over many iterations, this style of function definition removes the sometimes considerable costs associated with allocating a new array during each call to the \ng!\n or \nh!\n functions. You can use \nOptim\n without manually defining a gradient or Hessian function, but if you do define these functions, they must take these two arguments in this order. Returning to our optimization problem, you simply pass \ng!\n together with \nf\n from before to use the gradient:\n\n\noptimize\n(\nf\n,\n \ng!\n,\n \n[\n0.0\n,\n \n0.0\n],\n \nLBFGS\n())\n\n\n\n\n\n\nFor some methods, like simulated annealing, the gradient will be ignored:\n\n\noptimize\n(\nf\n,\n \ng!\n,\n \n[\n0.0\n,\n \n0.0\n],\n \nSimulatedAnnealing\n())\n\n\n\n\n\n\nIn addition to providing gradients, you can provide a Hessian function \nh!\n as well. 
In our current case this is:\n\n\nfunction\n \nh!\n(\nstorage\n,\n \nx\n)\n\n    \nstorage\n[\n1\n,\n \n1\n]\n \n=\n \n2.0\n \n-\n \n400.0\n \n*\n \nx\n[\n2\n]\n \n+\n \n1200.0\n \n*\n \nx\n[\n1\n]\n^\n2\n\n    \nstorage\n[\n1\n,\n \n2\n]\n \n=\n \n-\n400.0\n \n*\n \nx\n[\n1\n]\n\n    \nstorage\n[\n2\n,\n \n1\n]\n \n=\n \n-\n400.0\n \n*\n \nx\n[\n1\n]\n\n    \nstorage\n[\n2\n,\n \n2\n]\n \n=\n \n200.0\n\n\nend\n\n\n\n\n\n\nNow we can use Newton's method for optimization by running:\n\n\noptimize\n(\nf\n,\n \ng!\n,\n \nh!\n,\n \n[\n0.0\n,\n \n0.0\n])\n\n\n\n\n\n\nWhich defaults to \nNewton()\n since a Hessian was provided. Like gradients, the Hessian function will be ignored if you use a method that does not require it:\n\n\noptimize\n(\nf\n,\n \ng!\n,\n \nh!\n,\n \n[\n0.0\n,\n \n0.0\n],\n \nLBFGS\n())\n\n\n\n\n\n\nNote that Optim will not generate approximate Hessians using finite differencing because of the potentially low accuracy of approximations to the Hessians. Other than Newton's method, none of the algorithms provided by the Optim package employ exact Hessians.\n\n\n\n\nBox minimization\n\n\nA primal interior-point algorithm for simple \"box\" constraints (lower and upper bounds) is also available. Reusing our Rosenbrock example from above, boxed minimization is performed as follows:\n\n\nlower\n \n=\n \n[\n1.25\n,\n \n-\n2.1\n]\n\n\nupper\n \n=\n \n[\nInf\n,\n \nInf\n]\n\n\ninitial_x\n \n=\n \n[\n2.0\n,\n \n2.0\n]\n\n\nod\n \n=\n \nOnceDifferentiable\n(\nf\n,\n \ng!\n,\n \ninitial_x\n)\n\n\nresults\n \n=\n \noptimize\n(\nod\n,\n \ninitial_x\n,\n \nlower\n,\n \nupper\n,\n \nFminbox\n{\nGradientDescent\n}())\n\n\n\n\n\n\nThis performs optimization with a barrier penalty, successively scaling down the barrier coefficient and using the chosen \noptimizer\n (\nGradientDescent\n above) for convergence at each step. 
Notice that the \nOptimizer\n type, not an instance, should be passed (\nGradientDescent\n, not \nGradientDescent()\n).\n\n\nThis algorithm uses diagonal preconditioning to improve the accuracy, and hence is a good example of how to use \nConjugateGradient\n or \nLBFGS\n with preconditioning. Other methods will currently not use preconditioning. Only the box constraints are used. If you can analytically compute the diagonal of the Hessian of your objective function, you may want to consider writing your own preconditioner.\n\n\nThere are two iterations parameters: an outer iterations parameter used to control \nFminbox\n and an inner iterations parameter used to control the inner optimizer. For this reason, the options syntax is a bit different from the rest of the package. All parameters regarding the outer iterations are passed as keyword arguments, and options for the inner optimizer are passed as an \nOptim.Options\n type using the keyword \noptimizer_o\n.\n\n\nFor example, the following restricts the optimization to 2 major iterations:\n\n\nod\n \n=\n \nOnceDifferentiable\n(\nf\n,\n \ng!\n,\n \ninitial_x\n)\n\n\nresults\n \n=\n \noptimize\n(\nod\n,\n \ninitial_x\n,\n \nlower\n,\n \nupper\n,\n \nFminbox\n{\nGradientDescent\n}();\n \niterations\n \n=\n \n2\n)\n\n\n\n\n\n\nIn contrast, the following sets the maximum number of iterations for each inner \nGradientDescent\n optimization to 2:\n\n\nod\n \n=\n \nOnceDifferentiable\n(\nf\n,\n \ng!\n,\n \ninitial_x\n)\n\n\nresults\n \n=\n \nOptim\n.\noptimize\n(\nod\n,\n \ninitial_x\n,\n \nlower\n,\n \nupper\n,\n \nFminbox\n{\nGradientDescent\n}();\n \noptimizer_o\n \n=\n \nOptim\n.\nOptions\n(\niterations\n \n=\n \n2\n))\n\n\n\n\n\n\n\n\nMinimizing a univariate function on a bounded interval\n\n\nMinimization of univariate functions without derivatives is available through the \noptimize\n interface:\n\n\n    \noptimize\n(\nf\n,\n \nlower\n,\n \nupper\n,\n \nmethod\n;\n \nkwargs\n...\n)\n\n\n\n\n\n\nNotice the lack of 
initial \nx\n. A specific example is the following quadratic function.\n\n\njulia>\n \nf_univariate\n(\nx\n)\n \n=\n \n2\nx\n^\n2\n+\n3\nx\n+\n1\n\n\nf_univariate\n \n(\ngeneric\n \nfunction\n \nwith\n \n1\n \nmethod\n)\n\n\n\njulia>\n \noptimize\n(\nf_univariate\n,\n \n-\n2.0\n,\n \n1.0\n)\n\n\nResults\n \nof\n \nOptimization\n \nAlgorithm\n\n \n*\n \nAlgorithm\n:\n \nBrent's\n \nMethod\n\n \n*\n \nSearch\n \nInterval\n:\n \n[\n-\n2.000000\n,\n \n1.000000\n]\n\n \n*\n \nMinimizer\n:\n \n-\n7.500000e-01\n\n \n*\n \nMinimum\n:\n \n-\n1.250000e-01\n\n \n*\n \nIterations\n:\n \n7\n\n \n*\n \nConvergence\n:\n \nmax\n(\n|\nx\n \n-\n \nx_upper\n|\n,\n \n|\nx\n \n-\n \nx_lower\n|\n)\n \n<=\n \n2\n*\n(\n1.5e-08\n*|\nx\n|+\n2.2e-16\n)\n:\n \ntrue\n\n \n*\n \nObjective\n \nFunction\n \nCalls\n:\n \n8\n\n\n\n\n\n\nThe output shows that we provided an initial lower and upper bound, that there is a final minimizer and minimum, and that it used seven major iterations. Importantly, we also see that convergence was declared. The default method is Brent's method, which is one of two available methods:\n\n\n\n\nBrent's method, the default (can be explicitly selected with \nBrent()\n).\n\n\nGolden section search, available with \nGoldenSection()\n.\n\n\n\n\nIf we want to manually specify this method, we use the same syntax as for multivariate optimization.\n\n\n    \noptimize\n(\nf\n,\n \nlower\n,\n \nupper\n,\n \nBrent\n();\n \nkwargs\n...\n)\n\n    \noptimize\n(\nf\n,\n \nlower\n,\n \nupper\n,\n \nGoldenSection\n();\n \nkwargs\n...\n)\n\n\n\n\n\n\nKeywords are used to set options for this special type of optimization. In addition to the \niterations\n, \nstore_trace\n, \nshow_trace\n and \nextended_trace\n options, the following options are also available:\n\n\n\n\nrel_tol\n: The relative tolerance used for determining convergence. Defaults to \nsqrt(eps(T))\n.\n\n\nabs_tol\n: The absolute tolerance used for determining convergence. 
Defaults to \neps(T)\n.\n\n\n\n\n\n\nObtaining results\n\n\nAfter we have our results in \nres\n, we can use the API for getting optimization results. This consists of a collection of functions. They are not exported, so they have to be prefixed by \nOptim.\n. Say we do the following optimization:\n\n\nres\n \n=\n \noptimize\n(\nx\n->\ndot\n(\nx\n,[\n1\n \n0.\n \n0\n;\n \n0\n \n3\n \n0\n;\n \n0\n \n0\n \n1\n]\n*\nx\n),\n \nzeros\n(\n3\n))\n\n\n\n\n\n\nIf we can't remember what method we used, we simply use\n\n\nOptim\n.\nsummary\n(\nres\n)\n\n\n\n\n\n\nwhich will return \n\"Nelder Mead\"\n. More useful information is the minimizer and minimum of the objective function, which can be found using\n\n\njulia>\n \nOptim\n.\nminimizer\n(\nres\n)\n\n\n3-element Array{Float64,1}:\n\n\n -0.499921\n\n\n -0.3333\n\n\n -1.49994\n\n\n\njulia>\n \nOptim\n.\nminimum\n(\nres\n)\n\n\n -2.8333333205768865\n\n\n\n\n\n\n\n\nComplete list of functions\n\n\nA complete list of functions can be found below.\n\n\nDefined for all methods:\n\n\n\n\nsummary(res)\n\n\nminimizer(res)\n\n\nminimum(res)\n\n\niterations(res)\n\n\niteration_limit_reached(res)\n\n\ntrace(res)\n\n\nx_trace(res)\n\n\nf_trace(res)\n\n\nf_calls(res)\n\n\nconverged(res)\n\n\n\n\nDefined for univariate optimization:\n\n\n\n\nlower_bound(res)\n\n\nupper_bound(res)\n\n\nx_lower_trace(res)\n\n\nx_upper_trace(res)\n\n\nrel_tol(res)\n\n\nabs_tol(res)\n\n\n\n\nDefined for multivariate optimization:\n\n\n\n\ng_norm_trace(res)\n\n\ng_calls(res)\n\n\nx_converged(res)\n\n\nf_converged(res)\n\n\ng_converged(res)\n\n\ninitial_state(res)\n\n\n\n\n\n\nInput types\n\n\nMost users will input \nVector\n's as their \ninitial_x\n's, and get an \nOptim.minimizer(res)\n out that is also a vector. For zeroth and first order methods, it is also possible to pass in matrices, or even higher dimensional arrays. 
The only restriction imposed by leaving the \nVector\n case is that it is no longer possible to use finite difference approximations or automatic differentiation. Second order methods (variants of Newton's method) do not support this more general input type.\n\n\n\n\nNotes on convergence flags and checks\n\n\nCurrently, it is possible to access a minimizer using \nOptim.minimizer(result)\n even if all convergence flags are \nfalse\n. This means that the user has to be a bit careful when using the output from the solvers. It is advised to include checks for convergence if the minimizer or minimum is used to carry out further calculations.\n\n\nA related note is that first and second order methods make a convergence check on the gradient before entering the optimization loop. This is done to prevent line search errors if \ninitial_x\n is a stationary point. Note that this is only a first order check. If \ninitial_x\n is any type of stationary point, \ng_converged\n will be true. This includes local minima, saddle points, and local maxima. If \niterations\n is \n0\n and \ng_converged\n is \ntrue\n, the user needs to keep this point in mind.", 
            "title": "Minimizing a function"
        }, 
        {
            "location": "/user/minimization/#minimizing-a-multivariate-function", 
            "text": "To show how the Optim package can be used, we implement the  Rosenbrock function , a classic problem in numerical optimization. We'll assume that you've already installed the Optim package using Julia's package manager. First, we load Optim and define the Rosenbrock function:  using   Optim  f ( x )   =   ( 1.0   -   x [ 1 ]) ^ 2   +   100.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 ) ^ 2   Once we've defined this function, we can find the minimum of the Rosenbrock function using any of our favorite optimization algorithms. With a function defined, we just specify an initial point  x  and run:  optimize ( f ,   [ 0.0 ,   0.0 ])   !!! note\n    It is important to pass  initial_x  as an array. If your problem is one-dimensional, you have to wrap it in an array. An easy way to do so is to write  optimize(x- f(first(x)), [initial_x], ...)  Optim will default to using the Nelder-Mead method in this case, as we did not provide a gradient. This can also be explicitly specified using:  optimize ( f ,   [ 0.0 ,   0.0 ],   NelderMead ())   Other solvers are available. Below, we use L-BFGS, a quasi-Newton method that requires a gradient. If we pass  f  alone, Optim will construct an approximate gradient for us using central finite differencing:  optimize ( f ,   [ 0.0 ,   0.0 ],   LBFGS ())   For better performance and greater precision, you can pass your own gradient function. For the Rosenbrock example, the analytical gradient can be shown to be:  function   g! ( storage ,   x )  storage [ 1 ]   =   - 2.0   *   ( 1.0   -   x [ 1 ])   -   400.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 )   *   x [ 1 ]  storage [ 2 ]   =   200.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 )  end   Note that the functions we're using to calculate the gradient (and later the Hessian  h! ) of the Rosenbrock function mutate a fixed-sized storage array, which is passed as an additional argument called  storage . 
By mutating a single array over many iterations, this style of function definition removes the sometimes considerable costs associated with allocating a new array during each call to the  g!  or  h!  functions. You can use  Optim  without manually defining a gradient or Hessian function, but if you do define these functions, they must take these two arguments in this order. Returning to our optimization problem, you simply pass  g!  together with  f  from before to use the gradient:  optimize ( f ,   g! ,   [ 0.0 ,   0.0 ],   LBFGS ())   For some methods, like simulated annealing, the gradient will be ignored:  optimize ( f ,   g! ,   [ 0.0 ,   0.0 ],   SimulatedAnnealing ())   In addition to providing gradients, you can provide a Hessian function  h!  as well. In our current case this is:  function   h! ( storage ,   x ) \n     storage [ 1 ,   1 ]   =   2.0   -   400.0   *   x [ 2 ]   +   1200.0   *   x [ 1 ] ^ 2 \n     storage [ 1 ,   2 ]   =   - 400.0   *   x [ 1 ] \n     storage [ 2 ,   1 ]   =   - 400.0   *   x [ 1 ] \n     storage [ 2 ,   2 ]   =   200.0  end   Now we can use Newton's method for optimization by running:  optimize ( f ,   g! ,   h! ,   [ 0.0 ,   0.0 ])   Which defaults to  Newton()  since a Hessian was provided. Like gradients, the Hessian function will be ignored if you use a method that does not require it:  optimize ( f ,   g! ,   h! ,   [ 0.0 ,   0.0 ],   LBFGS ())   Note that Optim will not generate approximate Hessians using finite differencing because of the potentially low accuracy of approximations to the Hessians. Other than Newton's method, none of the algorithms provided by the Optim package employ exact Hessians.", 
            "title": "Minimizing a multivariate function"
        }, 
        {
            "location": "/user/minimization/#box-minimization", 
            "text": "A primal interior-point algorithm for simple \"box\" constraints (lower and upper bounds) is also available. Reusing our Rosenbrock example from above, boxed minimization is performed as follows:  lower   =   [ 1.25 ,   - 2.1 ]  upper   =   [ Inf ,   Inf ]  initial_x   =   [ 2.0 ,   2.0 ]  od   =   OnceDifferentiable ( f ,   g! ,   initial_x )  results   =   optimize ( od ,   initial_x ,   lower ,   upper ,   Fminbox { GradientDescent }())   This performs optimization with a barrier penalty, successively scaling down the barrier coefficient and using the chosen  optimizer  ( GradientDescent  above) for convergence at each step. Notice that the  Optimizer  type, not an instance should be passed ( GradientDescent , not  GradientDescent() ).  This algorithm uses diagonal preconditioning to improve the accuracy, and hence is a good example of how to use  ConjugateGradient  or  LBFGS  with preconditioning. Other methods will currently not use preconditioning. Only the box constraints are used. If you can analytically compute the diagonal of the Hessian of your objective function, you may want to consider writing your own preconditioner.  There are two iterations parameters: an outer iterations parameter used to control  Fminbox  and an inner iterations parameter used to control the inner optimizer. For this reason, the options syntax is a bit different from the rest of the package. All parameters regarding the outer iterations are passed as keyword arguments, and options for the interior optimizer is passed as an  Optim.Options  type using the keyword  optimizer_o .  For example, the following restricts the optimization to 2 major iterations  od   =   OnceDifferentiable ( f ,   g! 
,   initial_x )  results   =   optimize ( od ,   initial_x ,   lower ,   upper ,   Fminbox { GradientDescent }();   iterations   =   2 )   In contrast, the following sets the maximum number of iterations for each inner  GradientDescent  optimization to 2:  od   =   OnceDifferentiable ( f ,   g! ,   initial_x )  results   =   Optim . optimize ( od ,   initial_x ,   lower ,   upper ,   Fminbox { GradientDescent }();   optimizer_o   =   Optim . Options ( iterations   =   2 ))", 
            "title": "Box minimization"
        }, 
        {
            "location": "/user/minimization/#minimizing-a-univariate-function-on-a-bounded-interval", 
            "text": "Minimization of univariate functions without derivatives is available through the  optimize  interface:       optimize ( f ,   lower ,   upper ,   method ;   kwargs ... )   Notice the lack of initial  x . A specific example is the following quadratic function.  julia   f_univariate ( x )   =   2 x ^ 2 + 3 x + 1  f_univariate   ( generic   function   with   1   method )  julia   optimize ( f_univariate ,   - 2.0 ,   1.0 )  Results   of   Optimization   Algorithm \n  *   Algorithm :   Brent s   Method \n  *   Search   Interval :   [ - 2.000000 ,   1.000000 ] \n  *   Minimizer :   - 7.500000e-01 \n  *   Minimum :   - 1.250000e-01 \n  *   Iterations :   7 \n  *   Convergence :   max ( | x   -   x_upper | ,   | x   -   x_lower | )   =   2 * ( 1.5e-08 *| x |+ 2.2e-16 ) :   true \n  *   Objective   Function   Calls :   8   The output shows that we provided an initial lower and upper bound, that there is a final minimizer and minimum, and that it used seven major iterations. Importantly, we also see that convergence was declared. The default method is Brent's method, which is one out of two available methods:   Brent's method, the default (can be explicitly selected with  Brent() ).  Golden section search, available with  GoldenSection() .   If we want to manually specify this method, we use the usual syntax as for multivariate optimization.       optimize ( f ,   lower ,   upper ,   Brent ();   kwargs ... ) \n     optimize ( f ,   lower ,   upper ,   GoldenSection ();   kwargs ... )   Keywords are used to set options for this special type of optimization. In addition to the  iterations ,  store_trace ,  show_trace  and  extended_trace  options, the following options are also available:   rel_tol : The relative tolerance used for determining convergence. Defaults to  sqrt(eps(T)) .  abs_tol : The absolute tolerance used for determining convergence. Defaults to  eps(T) .", 
            "title": "Minimizing a univariate function on a bounded interval"
        }, 
        {
            "location": "/user/minimization/#obtaining-results", 
            "text": "After we have our results in  res , we can use the API for getting optimization results. This consists of a collection of functions. They are not exported, so they have to be prefixed by  Optim. . Say we do the following optimization:  res   =   optimize ( x - dot ( x ,[ 1   0.   0 ;   0   3   0 ;   0   0   1 ] * x ),   zeros ( 3 ))   If we can't remember what method we used, we simply use  Optim . summary ( res )   which will return  \"Nelder Mead\" . A bit more useful information is the minimizer and minimum of the objective functions, which can be found using  julia   Optim . minimizer ( res )  3-element Array{Float64,1}:   -0.499921   -0.3333   -1.49994  julia   Optim . minimum ( res )   -2.8333333205768865", 
            "title": "Obtaining results"
        }, 
        {
            "location": "/user/minimization/#complete-list-of-functions", 
            "text": "A complete list of functions can be found below.  Defined for all methods:   summary(res)  minimizer(res)  minimum(res)  iterations(res)  iteration_limit_reached(res)  trace(res)  x_trace(res)  f_trace(res)  f_calls(res)  converged(res)   Defined for univariate optimization:   lower_bound(res)  upper_bound(res)  x_lower_trace(res)  x_upper_trace(res)  rel_tol(res)  abs_tol(res)   Defined for multivariate optimization:   g_norm_trace(res)  g_calls(res)  x_converged(res)  f_converged(res)  g_converged(res)  initial_state(res)", 
            "title": "Complete list of functions"
        }, 
        {
            "location": "/user/minimization/#input-types", 
            "text": "Most users will input  Vector 's as their  initial_x 's, and get an  Optim.minimizer(res)  out that is also a vector. For zeroth and first order methods, it is also possible to pass in matrices, or even higher dimensional arrays. The only restriction imposed by leaving the  Vector  case is, that it is no longer possible to use finite difference approximations or autmatic differentiation. Second order methods (variants of Newton's method) do not support this more general input type.", 
            "title": "Input types"
        }, 
        {
            "location": "/user/minimization/#notes-on-convergence-flags-and-checks", 
            "text": "Currently, it is possible to access a minimizer using  Optim.minimizer(result)  even if all convergence flags are  false . This means that the user has to be a bit careful when using the output from the solvers. It is advised to include checks for convergence if the minimizer or minimum is used to carry out further calculations.  A related note is that first and second order methods makes a convergence check on the gradient before entering the optimization loop. This is done to prevent line search errors if  initial_x  is a stationary point. Notice, that this is only a first order check. If  initial_x  is any type of stationary point,  g_converged  will be true. This includes local minima, saddle points, and local maxima. If  iterations  is  0  and  g_converged  is  true , the user needs to keep this point in mind.", 
            "title": "Notes on convergence flags and checks"
        }, 
        {
            "location": "/user/config/", 
            "text": "Configurable options\n\n\nThere are several options that simply take on some default values if the user doensn't supply anything else than a function (and gradient) and a starting point.\n\n\n\n\nSolver options\n\n\nThere quite a few different solvers available in Optim, and they are all listed below. Notice that the constructors are written without input here, but they generally take keywords to tweak the way they work. See the pages describing each solver for more detail.\n\n\nRequires only a function handle:\n\n\n\n\nNelderMead()\n\n\nSimulatedAnnealing()\n\n\n\n\nRequires a function and gradient (will be approximated if omitted):\n\n\n\n\nBFGS()\n\n\nLBFGS()\n\n\nConjugateGradient()\n\n\nGradientDescent()\n\n\nMomentumGradientDescent()\n\n\nAcceleratedGradientDescent()\n\n\n\n\nRequires a function, a gradient, and a Hessian (cannot be omitted):\n\n\n\n\nNewton()\n\n\nNewtonTrustRegion()\n\n\n\n\nBox constrained minimization:\n\n\n\n\nFminbox()\n\n\n\n\nSpecial methods for bounded univariate optimization:\n\n\n\n\nBrent()\n\n\nGoldenSection()\n\n\n\n\n\n\nGeneral Options\n\n\nIn addition to the solver, you can alter the behavior of the Optim package by using the following keywords:\n\n\n\n\nx_tol\n: What is the threshold for determining convergence in the input vector? Defaults to \n1e-32\n.\n\n\nf_tol\n: What is the threshold for determining convergence in the objective value? Defaults to \n1e-32\n.\n\n\ng_tol\n: What is the threshold for determining convergence in the gradient? Defaults to \n1e-8\n. For gradient free methods, this will control the main convergence tolerance, which is solver specific.\n\n\nf_calls_limit\n: A soft upper limit on the number of objective calls. Defaults to \n0\n (unlimited).\n\n\ng_calls_limit\n: A soft upper limit on the number of gradient calls. Defaults to \n0\n (unlimited).\n\n\nh_calls_limit\n: A soft upper limit on the number of Hessian calls. 
Defaults to \n0\n (unlimited).\n\n\nallow_f_increases\n: Allow steps that increase the objective value. Defaults to \nfalse\n. Note that, when setting this to \ntrue\n, the last iterate will be returned as the minimizer even if the objective increased.\n\n\niterations\n: How many iterations will run before the algorithm gives up? Defaults to \n1_000\n.\n\n\nstore_trace\n: Should a trace of the optimization algorithm's state be stored? Defaults to \nfalse\n.\n\n\nshow_trace\n: Should a trace of the optimization algorithm's state be shown on \nSTDOUT\n? Defaults to \nfalse\n.\n\n\nextended_trace\n: Save additional information. Solver dependent. Defaults to \nfalse\n.\n\n\nshow_every\n: Trace output is printed every \nshow_every\nth iteration.\n\n\ncallback\n: A function to be called during tracing. A return value of \ntrue\n stops the \noptimize\n call.\n\n\ntime_limit\n: A soft upper limit on the total run time. Defaults to \nNaN\n (unlimited).\n\n\n\n\nWe currently recommend the statically dispatched interface by using the \nOptim.Options\n constructor:\n\n\nres\n \n=\n \noptimize\n(\nf\n,\n \ng!\n,\n\n               \n[\n0.0\n,\n \n0.0\n],\n\n               \nGradientDescent\n(),\n\n               \nOptim\n.\nOptions\n(\ng_tol\n \n=\n \n1e-12\n,\n\n                             \niterations\n \n=\n \n10\n,\n\n                             \nstore_trace\n \n=\n \ntrue\n,\n\n                             \nshow_trace\n \n=\n \nfalse\n))\n\n\n\n\n\n\nAnother interface is also available, based directly on keywords:\n\n\nres\n \n=\n \noptimize\n(\nf\n,\n \ng!\n,\n\n               \n[\n0.0\n,\n \n0.0\n],\n\n               \nmethod\n \n=\n \nGradientDescent\n(),\n\n               \ng_tol\n \n=\n \n1e-12\n,\n\n               \niterations\n \n=\n \n10\n,\n\n               \nstore_trace\n \n=\n \ntrue\n,\n\n               \nshow_trace\n \n=\n \nfalse\n)\n\n\n\n\n\n\nNotice the need to specify the method using a keyword if this syntax is used. 
This approach might be deprecated in the future, and as a result we recommend writing code that has to be maintained using the \nOptim.Options\n approach.", 
            "title": "Configurable Options"
        }, 
        {
            "location": "/user/config/#configurable-options", 
            "text": "There are several options that simply take on some default values if the user doensn't supply anything else than a function (and gradient) and a starting point.", 
            "title": "Configurable options"
        }, 
        {
            "location": "/user/config/#solver-options", 
            "text": "There quite a few different solvers available in Optim, and they are all listed below. Notice that the constructors are written without input here, but they generally take keywords to tweak the way they work. See the pages describing each solver for more detail.  Requires only a function handle:   NelderMead()  SimulatedAnnealing()   Requires a function and gradient (will be approximated if omitted):   BFGS()  LBFGS()  ConjugateGradient()  GradientDescent()  MomentumGradientDescent()  AcceleratedGradientDescent()   Requires a function, a gradient, and a Hessian (cannot be omitted):   Newton()  NewtonTrustRegion()   Box constrained minimization:   Fminbox()   Special methods for bounded univariate optimization:   Brent()  GoldenSection()", 
            "title": "Solver options"
        }, 
        {
            "location": "/user/config/#general-options", 
            "text": "In addition to the solver, you can alter the behavior of the Optim package by using the following keywords:   x_tol : What is the threshold for determining convergence in the input vector? Defaults to  1e-32 .  f_tol : What is the threshold for determining convergence in the objective value? Defaults to  1e-32 .  g_tol : What is the threshold for determining convergence in the gradient? Defaults to  1e-8 . For gradient free methods, this will control the main convergence tolerance, which is solver specific.  f_calls_limit : A soft upper limit on the number of objective calls. Defaults to  0  (unlimited).  g_calls_limit : A soft upper limit on the number of gradient calls. Defaults to  0  (unlimited).  h_calls_limit : A soft upper limit on the number of Hessian calls. Defaults to  0  (unlimited).  allow_f_increases : Allow steps that increase the objective value. Defaults to  false . Note that, when setting this to  true , the last iterate will be returned as the minimizer even if the objective increased.  iterations : How many iterations will run before the algorithm gives up? Defaults to  1_000 .  store_trace : Should a trace of the optimization algorithm's state be stored? Defaults to  false .  show_trace : Should a trace of the optimization algorithm's state be shown on  STDOUT ? Defaults to  false .  extended_trace : Save additional information. Solver dependent. Defaults to  false .  show_every : Trace output is printed every  show_every th iteration.  callback : A function to be called during tracing. A return value of  true  stops the  optimize  call.  time_limit : A soft upper limit on the total run time. Defaults to  NaN  (unlimited).   We currently recommend the statically dispatched interface by using the  Optim.Options  constructor:  res   =   optimize ( f ,   g! , \n                [ 0.0 ,   0.0 ], \n                GradientDescent (), \n                Optim . 
Options ( g_tol   =   1e-12 , \n                              iterations   =   10 , \n                              store_trace   =   true , \n                              show_trace   =   false ))   Another interface is also available, based directly on keywords:  res   =   optimize ( f ,   g! , \n                [ 0.0 ,   0.0 ], \n                method   =   GradientDescent (), \n                g_tol   =   1e-12 , \n                iterations   =   10 , \n                store_trace   =   true , \n                show_trace   =   false )   Notice the need to specify the method using a keyword if this syntax is used. This approach might be deprecated in the future, and as a result we recommend writing code that has to be maintained using the  Optim.Options  approach.", 
            "title": "General Options"
        }, 
        {
            "location": "/user/tipsandtricks/", 
            "text": "Dealing with constant parameters\n\n\nIn many applications, there may be factors that are relevant to the function evaluations, but are fixed throughout the optimization. An obvious example is using data in a likelihood function, but it could also be parameters we wish to hold constant.\n\n\nConsider a squared error loss function that depends on some data \nx\n and \ny\n, and parameters \nbetas\n. As far as the solver is concerned, there should only be one input argument to the function we want to minimize, call it \nsqerror\n.\n\n\nThe problem is that we want to optimize a function \nsqerror\n that really depends on three inputs, and two of them are constant throught the optimization procedure. To do this, we need to define the variables \nx\n and \ny\n\n\nx\n \n=\n \n[\n1.0\n,\n \n2.0\n,\n \n3.0\n]\n\n\ny\n \n=\n \n1.0\n \n+\n \n2.0\n \n*\n \nx\n \n+\n \n[\n-\n0.3\n,\n \n0.3\n,\n \n-\n0.1\n]\n\n\n\n\n\n\nWe then simply define a function in three variables\n\n\nfunction\n \nsqerror\n(\nbetas\n,\n \nX\n,\n \nY\n)\n\n    \nerr\n \n=\n \n0.0\n\n    \nfor\n \ni\n \nin\n \n1\n:\nlength\n(\nX\n)\n\n        \npred_i\n \n=\n \nbetas\n[\n1\n]\n \n+\n \nbetas\n[\n2\n]\n \n*\n \nX\n[\ni\n]\n\n        \nerr\n \n+=\n \n(\nY\n[\ni\n]\n \n-\n \npred_i\n)\n^\n2\n\n    \nend\n\n    \nreturn\n \nerr\n\n\nend\n\n\n\n\n\n\nand then optimize the following anonymous function\n\n\nres\n \n=\n \noptimize\n(\nb\n \n-\n \nsqerror\n(\nb\n,\n \nx\n,\n \ny\n),\n \n[\n0.0\n,\n \n0.0\n])\n\n\n\n\n\n\nAlternatively, we can define a closure \nsqerror(betas)\n that is aware of the variables we just defined\n\n\nfunction\n \nsqerror\n(\nbetas\n)\n\n    \nerr\n \n=\n \n0.0\n\n    \nfor\n \ni\n \nin\n \n1\n:\nlength\n(\nx\n)\n\n        \npred_i\n \n=\n \nbetas\n[\n1\n]\n \n+\n \nbetas\n[\n2\n]\n \n*\n \nx\n[\ni\n]\n\n        \nerr\n \n+=\n \n(\ny\n[\ni\n]\n \n-\n \npred_i\n)\n^\n2\n\n    \nend\n\n    \nreturn\n \nerr\n\n\nend\n\n\n\n\n\n\nWe can then optimize the \nsqerror\n 
function just like any other function\n\n\nres\n \n=\n \noptimize\n(\nsqerror\n,\n \n[\n0.0\n,\n \n0.0\n])\n\n\n\n\n\n\n\n\nAvoid repeating computations\n\n\nSay you are optimizing a function\n\n\nf\n(\nx\n)\n \n=\n \nx\n[\n1\n]\n^\n2\n+\nx\n[\n2\n]\n^\n2\n\n\ng!\n(\nstorage\n,\n \nx\n)\n \n=\n \ncopy!\n(\nstorage\n,\n \n[\n2\nx\n[\n1\n],\n \n2\nx\n[\n2\n]])\n\n\n\n\n\n\nIn this situation, no calculations from \nf\n could be reused in \ng!\n. However, sometimes there is a substantial similarity between the objective function and the gradient, and some calculations can be reused. The trick here is essentially the same as above. We use a closure or an anonymous function. Basically, we define\n\n\nfunction\n \ncalculate_common!\n(\nx\n,\n \nlast_x\n,\n \nbuffer\n)\n\n    \nif\n \nx\n \n!=\n \nlast_x\n\n        \ncopy!\n(\nlast_x\n,\n \nx\n)\n\n        \n#do whatever common calculations and save to buffer\n\n    \nend\n\n\nend\n\n\n\nfunction\n \nf\n(\nx\n,\n \nbuffer\n,\n \nlast_x\n)\n\n    \ncalculate_common!\n(\nx\n,\n \nlast_x\n,\n \nbuffer\n)\n\n    \nf_body\n \n# depends on buffer\n\n\nend\n\n\n\nfunction\n \ng!\n(\nx\n,\n \nstor\n,\n \nbuffer\n,\n \nlast_x\n)\n\n    \ncalculate_common!\n(\nx\n,\n \nlast_x\n,\n \nbuffer\n)\n\n    \ng_body!\n \n# depends on buffer\n\n\nend\n\n\n\n\n\n\nand then the following\n\n\nusing\n \nOptim\n\n\ninitial_x\n \n=\n \n...\n\n\nbuffer\n \n=\n \nArray\n{\neltype\n(\ninitial_x\n)}(\n...\n)\n \n# Preallocate an appropriate buffer\n\n\nlast_x\n \n=\n \nsimilar\n(\ninitial_x\n)\n\n\ndf\n \n=\n \nTwiceDifferentiable\n(\nx\n \n->\n \nf\n(\nx\n,\n \nbuffer\n,\n \nlast_x\n),\n\n                                \n(\nstor\n,\n \nx\n)\n \n->\n \ng!\n(\nx\n,\n \nstor\n,\n \nbuffer\n,\n \nlast_x\n))\n\n\noptimize\n(\ndf\n,\n \ninitial_x\n)\n\n\n\n\n\n\n\n\nProvide gradients\n\n\nAs mentioned in the general introduction, passing analytical gradients can have an impact on performance. 
To show an example of this, consider the separable extension of the Rosenbrock function in dimension 5000, see \nSROSENBR\n in CUTEst.\n\n\nBelow, we use the gradients and objective functions from \nmastsif\n through \nCUTEst.jl\n. We only show the first five iterations of an attempt to minimize the function using Gradient Descent.\n\n\njulia>\n \n@time\n \noptimize\n(\nf\n,\n \ninitial_x\n,\n \nGradientDescent\n(),\n\n                      \nOptim\n.\nOptions\n(\nshow_trace\n=\ntrue\n,\n \niterations\n \n=\n \n5\n))\n\n\nIter     Function value   Gradient norm\n\n\n     0     4.850000e+04     2.116000e+02\n\n\n     1     1.018734e+03     2.704951e+01\n\n\n     2     3.468449e+00     5.721261e-01\n\n\n     3     2.966899e+00     2.638790e-02\n\n\n     4     2.511859e+00     5.237768e-01\n\n\n     5     2.107853e+00     1.020287e-01\n\n\n 21.731129 seconds (1.61 M allocations: 63.434 MB, 0.03% gc time)\n\n\nResults of Optimization Algorithm\n\n\n * Algorithm: Gradient Descent\n\n\n * Starting Point: [1.2,1.0, ...]\n\n\n * Minimizer: [1.0287767703731154,1.058769439356144, ...]\n\n\n * Minimum: 2.107853e+00\n\n\n * Iterations: 5\n\n\n * Convergence: false\n\n\n   * |x - x'| < 1.0e-32: false\n\n\n   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false\n\n\n   * |g(x)| < 1.0e-08: false\n\n\n   * Reached Maximum Number of Iterations: true\n\n\n * Objective Function Calls: 23\n\n\n * Gradient Calls: 23\n\n\n\njulia>\n \n@time\n \noptimize\n(\nf\n,\n \ng!\n,\n \ninitial_x\n,\n \nGradientDescent\n(),\n\n                      \nOptim\n.\nOptions\n(\nshow_trace\n=\ntrue\n,\n \niterations\n \n=\n \n5\n))\n\n\nIter     Function value   Gradient norm\n\n\n     0     4.850000e+04     2.116000e+02\n\n\n     1     1.018769e+03     2.704998e+01\n\n\n     2     3.468488e+00     5.721481e-01\n\n\n     3     2.966900e+00     2.638792e-02\n\n\n     4     2.511828e+00     5.237919e-01\n\n\n     5     2.107802e+00     1.020415e-01\n\n\n  0.009889 seconds (915 allocations: 270.266 
KB)\n\n\nResults of Optimization Algorithm\n\n\n * Algorithm: Gradient Descent\n\n\n * Starting Point: [1.2,1.0, ...]\n\n\n * Minimizer: [1.0287763814102757,1.05876866832087, ...]\n\n\n * Minimum: 2.107802e+00\n\n\n * Iterations: 5\n\n\n * Convergence: false\n\n\n   * |x - x'| < 1.0e-32: false\n\n\n   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false\n\n\n   * |g(x)| < 1.0e-08: false\n\n\n   * Reached Maximum Number of Iterations: true\n\n\n * Objective Function Calls: 23\n\n\n * Gradient Calls: 23\n\n\n\n\n\n\nThe objective has obtained a value that is very similar between the two runs, but the run with the analytical gradient is much faster.  It is possible that the finite differences code can be improved, but generally the optimization will be slowed down by all the function evaluations required to do the central finite differences calculations.\n\n\n\n\nSeparating time spent in Optim's code and user provided functions\n\n\nConsider the Rosenbrock problem.\n\n\nusing\n \nOptim\n\n\nprob\n \n=\n \nOptim\n.\nUnconstrainedProblems\n.\nexamples\n[\n\"Rosenbrock\"\n];\n\n\n\n\n\n\nSay we optimize this function, and look at the total run time of \noptimize\n using the Newton Trust Region method, and we are surprised that it takes a long time to run. We then wonder if time is spent in Optim's own code (solving the sub-problem for example) or in evaluating the objective, gradient or Hessian that we provided. Then it can be very useful to use the \nTimerOutputs.jl\n package. This package allows us to run an overall timer for \noptimize\n, and add individual timers for \nf\n, \ng!\n, and \nh!\n. 
Consider the example below, which is due to the author of the package (Kristoffer Carlsson).\n\n\nusing\n \nTimerOutputs\n\n\nconst\n \nto\n \n=\n \nTimerOutput\n()\n\n\n\nf\n(\nx\n    \n)\n \n=\n  \n@timeit\n \nto\n \n\"f\"\n  \nprob\n.\nf\n(\nx\n)\n\n\ng!\n(\nx\n,\n \ng\n)\n \n=\n  \n@timeit\n \nto\n \n\"g!\"\n \nprob\n.\ng!\n(\nx\n,\n \ng\n)\n\n\nh!\n(\nx\n,\n \nh\n)\n \n=\n  \n@timeit\n \nto\n \n\"h!\"\n \nprob\n.\nh!\n(\nx\n,\n \nh\n)\n\n\n\nbegin\n\n\nreset_timer!\n(\nto\n)\n\n\n@timeit\n \nto\n \n\"Trust Region\"\n \nbegin\n\n    \nres\n \n=\n \nOptim\n.\noptimize\n(\nf\n,\n \ng!\n,\n \nh!\n,\n \nprob\n.\ninitial_x\n,\n \nNewtonTrustRegion\n())\n\n\nend\n\n\nshow\n(\nto\n;\n \nallocations\n \n=\n \nfalse\n)\n\n\nend\n\n\n\n\n\n\nWe see that the time is actually \nnot\n spent in our provided functions, but most of the time is spent in the code for the trust region method.\n\n\n\n\nEarly stopping\n\n\nSometimes it might be of interest to stop the optimizer early. The simplest way to do this is to set the \niterations\n keyword in \nOptim.Options\n to some number. This will prevent the iteration counter exceeding some limit, with the standard value being 1000. Alternatively, it is possible to put a soft limit on the run time of the optimization procedure by setting the \ntime_limit\n keyword in the \nOptim.Options\n constructor.\n\n\nusing\n \nOptim\n\n\nproblem\n \n=\n \nOptim\n.\nUnconstrainedProblems\n.\nexamples\n[\n\"Rosenbrock\"\n]\n\n\n\nf\n \n=\n \nproblem\n.\nf\n\n\ninitial_x\n \n=\n \nproblem\n.\ninitial_x\n\n\n\nfunction\n \nslow\n(\nx\n)\n\n    \nsleep\n(\n0.1\n)\n\n    \nf\n(\nx\n)\n\n\nend\n\n\n\nstart_time\n \n=\n \ntime\n()\n\n\n\noptimize\n(\nslow\n,\n \nzeros\n(\n2\n),\n \nNelderMead\n(),\n \nOptim\n.\nOptions\n(\ntime_limit\n \n=\n \n3.0\n))\n\n\n\n\n\n\nThis will stop after about three seconds. 
If it is more important that we stop before the limit is reached, it is possible to use a callback with a simple model for predicting how much time will have passed when the next iteration is over. Consider the following code\n\n\nusing\n \nOptim\n\n\nproblem\n \n=\n \nOptim\n.\nUnconstrainedProblems\n.\nexamples\n[\n\"Rosenbrock\"\n]\n\n\n\nf\n \n=\n \nproblem\n.\nf\n\n\ninitial_x\n \n=\n \nproblem\n.\ninitial_x\n\n\n\nfunction\n \nvery_slow\n(\nx\n)\n\n    \nsleep\n(\n.\n5\n)\n\n    \nf\n(\nx\n)\n\n\nend\n\n\n\nstart_time\n \n=\n \ntime\n()\n\n\ntime_to_setup\n \n=\n \nzeros\n(\n1\n)\n\n\nfunction\n \nadvanced_time_control\n(\nx\n)\n\n    \nprintln\n(\n\" * Iteration:       \"\n,\n \nx\n.\niteration\n)\n\n    \nso_far\n \n=\n  \ntime\n()\n-\nstart_time\n\n    \nprintln\n(\n\" * Time so far:     \"\n,\n \nso_far\n)\n\n    \nif\n \nx\n.\niteration\n \n==\n \n0\n\n        \ntime_to_setup\n[\n:\n]\n \n=\n \ntime\n()\n-\nstart_time\n\n    \nelse\n\n        \nexpected_next_time\n \n=\n \nso_far\n \n+\n \n(\ntime\n()\n-\nstart_time\n-\ntime_to_setup\n[\n1\n])\n/\n(\nx\n.\niteration\n)\n\n        \nprintln\n(\n\" * Next iteration \u2248 \"\n,\n \nexpected_next_time\n)\n\n        \nprintln\n()\n\n        \nreturn\n \nexpected_next_time\n \n<\n \n13\n \n?\n \nfalse\n \n:\n \ntrue\n\n    \nend\n\n    \nprintln\n()\n\n    \nfalse\n\n\nend\n\n\noptimize\n(\nvery_slow\n,\n \nzeros\n(\n2\n),\n \nNelderMead\n(),\n \nOptim\n.\nOptions\n(\ncallback\n \n=\n \nadvanced_time_control\n))\n\n\n\n\n\n\nIt will try to predict the elapsed time after the next iteration is over, and stop now if it is expected to exceed the limit of 13 seconds. 
Running it, we get something like the following output\n\n\njulia\n \noptimize\n(\nvery_slow\n,\n \nzeros\n(\n2\n),\n \nNelderMead\n(),\n \nOptim\n.\nOptions\n(\ncallback\n \n=\n \nadvanced_time_control\n))\n\n\n * Iteration:       0\n\n\n * Time so far:     2.219298839569092\n\n\n\n * Iteration:       1\n\n\n * Time so far:     3.4006409645080566\n\n\n * Next iteration \u2248 4.5429909229278564\n\n\n\n * Iteration:       2\n\n\n * Time so far:     4.403923988342285\n\n\n * Next iteration \u2248 5.476739525794983\n\n\n\n * Iteration:       3\n\n\n * Time so far:     5.407265901565552\n\n\n * Next iteration \u2248 6.4569235642751055\n\n\n\n * Iteration:       4\n\n\n * Time so far:     5.909044027328491\n\n\n * Next iteration \u2248 6.821732044219971\n\n\n\n * Iteration:       5\n\n\n * Time so far:     6.912338972091675\n\n\n * Next iteration \u2248 7.843148183822632\n\n\n\n * Iteration:       6\n\n\n * Time so far:     7.9156060218811035\n\n\n * Next iteration \u2248 8.85849153995514\n\n\n\n * Iteration:       7\n\n\n * Time so far:     8.918903827667236\n\n\n * Next iteration \u2248 9.870419979095459\n\n\n\n * Iteration:       8\n\n\n * Time so far:     9.922197818756104\n\n\n * Next iteration \u2248 10.880185931921005\n\n\n\n * Iteration:       9\n\n\n * Time so far:     10.925468921661377\n\n\n * Next iteration \u2248 11.888488478130764\n\n\n\n * Iteration:       10\n\n\n * Time so far:     11.92870283126831\n\n\n * Next iteration \u2248 12.895747828483582\n\n\n\n * Iteration:       11\n\n\n * Time so far:     12.932114839553833\n\n\n * Next iteration \u2248 13.902462200684981\n\n\n\nResults of Optimization Algorithm\n\n\n * Algorithm: Nelder-Mead\n\n\n * Starting Point: [0.0,0.0]\n\n\n * Minimizer: [0.23359374999999996,0.042187499999999996, ...]\n\n\n * Minimum: 6.291677e-01\n\n\n * Iterations: 11\n\n\n * Convergence: false\n\n\n   *  \u221a(\u03a3(y\u1d62-y\u0304)\u00b2)/n \n 1.0e-08: false\n\n\n   * Reached Maximum Number of Iterations: false\n\n\n * 
Objective Function Calls: 24", 
            "title": "Tips and tricks"
        }, 
        {
            "location": "/user/tipsandtricks/#dealing-with-constant-parameters", 
            "text": "In many applications, there may be factors that are relevant to the function evaluations, but are fixed throughout the optimization. An obvious example is using data in a likelihood function, but it could also be parameters we wish to hold constant.  Consider a squared error loss function that depends on some data  x  and  y , and parameters  betas . As far as the solver is concerned, there should only be one input argument to the function we want to minimize, call it  sqerror .  The problem is that we want to optimize a function  sqerror  that really depends on three inputs, and two of them are constant throught the optimization procedure. To do this, we need to define the variables  x  and  y  x   =   [ 1.0 ,   2.0 ,   3.0 ]  y   =   1.0   +   2.0   *   x   +   [ - 0.3 ,   0.3 ,   - 0.1 ]   We then simply define a function in three variables  function   sqerror ( betas ,   X ,   Y ) \n     err   =   0.0 \n     for   i   in   1 : length ( X ) \n         pred_i   =   betas [ 1 ]   +   betas [ 2 ]   *   X [ i ] \n         err   +=   ( Y [ i ]   -   pred_i ) ^ 2 \n     end \n     return   err  end   and then optimize the following anonymous function  res   =   optimize ( b   -   sqerror ( b ,   x ,   y ),   [ 0.0 ,   0.0 ])   Alternatively, we can define a closure  sqerror(betas)  that is aware of the variables we just defined  function   sqerror ( betas ) \n     err   =   0.0 \n     for   i   in   1 : length ( x ) \n         pred_i   =   betas [ 1 ]   +   betas [ 2 ]   *   x [ i ] \n         err   +=   ( y [ i ]   -   pred_i ) ^ 2 \n     end \n     return   err  end   We can then optimize the  sqerror  function just like any other function  res   =   optimize ( sqerror ,   [ 0.0 ,   0.0 ])", 
            "title": "Dealing with constant parameters"
        }, 
        {
            "location": "/user/tipsandtricks/#avoid-repeating-computations", 
            "text": "Say you are optimizing a function  f ( x )   =   x [ 1 ] ^ 2 + x [ 2 ] ^ 2  g! ( storage ,   x )   =   copy! ( storage ,   [ 2 x [ 1 ],   2 x [ 2 ]])   In this situation, no calculations from  f  could be reused in  g! . However, sometimes there is a substantial similarity between the objective function, and gradient, and some calculations can be reused. The trick here is essentially the same as above. We use a closure or an anonymous function. Basically, we define  function   calculate_common! ( x ,   last_x ,   buffer ) \n     if   x   !=   last_x \n         copy! ( last_x ,   x ) \n         #do whatever common calculations and save to buffer \n     end  end  function   f ( x ,   buffer ,   last_x ) \n     calculate_common! ( x ,   last_x ,   buffer ) \n     f_body   # depends on buffer  end  function   g! ( x ,   stor ,   buffer ,   last_x ) \n     calculate_common! ( x ,   last_x ,   buffer ) \n     g_body!   # depends on buffer  end   and then the following  using   Optim  initial_x   =   ...  buffer   =   Array { eltype ( initial_x )}( ... )   # Preallocate an appropriate buffer  last_x   =   similar ( initial_x )  df   =   TwiceDifferentiable ( x   -   f ( x ,   buffer ,   initial_x ), \n                                 ( stor ,   x )   -   g! ( x ,   stor ,   buffer ,   last_x ))  optimize ( df ,   initial_x )", 
            "title": "Avoid repeating computations"
        }, 
        {
            "location": "/user/tipsandtricks/#provide-gradients", 
            "text": "As mentioned in the general introduction, passing analytical gradients can have an impact on performance. To show an example of this, consider the separable extension of the Rosenbrock function in dimension 5000, see  SROSENBR  in CUTEst.  Below, we use the gradients and objective functions from  mastsif  through  CUTEst.jl . We only show the first five iterations of an attempt to minimize the function using Gradient Descent.  julia   @time   optimize ( f ,   initial_x ,   GradientDescent (), \n                       Optim . Options ( show_trace = true ,   iterations   =   5 ))  Iter     Function value   Gradient norm       0     4.850000e+04     2.116000e+02       1     1.018734e+03     2.704951e+01       2     3.468449e+00     5.721261e-01       3     2.966899e+00     2.638790e-02       4     2.511859e+00     5.237768e-01       5     2.107853e+00     1.020287e-01   21.731129 seconds (1.61 M allocations: 63.434 MB, 0.03% gc time)  Results of Optimization Algorithm   * Algorithm: Gradient Descent   * Starting Point: [1.2,1.0, ...]   * Minimizer: [1.0287767703731154,1.058769439356144, ...]   * Minimum: 2.107853e+00   * Iterations: 5   * Convergence: false     * |x - x |   1.0e-32: false     * |f(x) - f(x )| / |f(x)|   1.0e-32: false     * |g(x)|   1.0e-08: false     * Reached Maximum Number of Iterations: true   * Objective Function Calls: 23   * Gradient Calls: 23  julia   @time   optimize ( f ,   g! ,   initial_x ,   GradientDescent (), \n                       Optim . 
Options ( show_trace = true ,   iterations   =   5 ))  Iter     Function value   Gradient norm       0     4.850000e+04     2.116000e+02       1     1.018769e+03     2.704998e+01       2     3.468488e+00     5.721481e-01       3     2.966900e+00     2.638792e-02       4     2.511828e+00     5.237919e-01       5     2.107802e+00     1.020415e-01    0.009889 seconds (915 allocations: 270.266 KB)  Results of Optimization Algorithm   * Algorithm: Gradient Descent   * Starting Point: [1.2,1.0, ...]   * Minimizer: [1.0287763814102757,1.05876866832087, ...]   * Minimum: 2.107802e+00   * Iterations: 5   * Convergence: false     * |x - x'| < 1.0e-32: false     * |f(x) - f(x')| / |f(x)| < 1.0e-32: false     * |g(x)| < 1.0e-08: false     * Reached Maximum Number of Iterations: true   * Objective Function Calls: 23   * Gradient Calls: 23   The objective has obtained a value that is very similar between the two runs, but the run with the analytical gradient is much faster.  It is possible that the finite differences code can be improved, but generally the optimization will be slowed down by all the function evaluations required to do the central finite differences calculations.", 
            "title": "Provide gradients"
        }, 
        {
            "location": "/user/tipsandtricks/#separating-time-spent-in-optims-code-and-user-provided-functions", 
            "text": "Consider the Rosenbrock problem.  using   Optim  prob   =   Optim . UnconstrainedProblems . examples [ Rosenbrock ];   Say we optimize this function, and look at the total run time of  optimize  using the Newton Trust Region method, and we are surprised that it takes a long time to run. We then wonder if time is spent in Optim's own code (solving the sub-problem for example) or in evaluating the objective, gradient or hessian that we provided. Then it can be very useful to use the  TimerOutputs.jl  package. This package allows us to run an over-all timer for  optimize , and add individual timers for  f ,  g! , and  h! . Consider the example below, that is due to the author of the package (Kristoffer Carlsson).  using   TimerOutputs  const   to   =   TimerOutput ()  f ( x      )   =    @timeit   to   f    prob . f ( x )  g! ( x ,   g )   =    @timeit   to   g!   prob . g! ( x ,   g )  h! ( x ,   h )   =    @timeit   to   h!   prob . h! ( x ,   h )  begin  reset_timer! ( to )  @timeit   to   Trust Region   begin \n     res   =   Optim . optimize ( f ,   g! ,   h! ,   prob . initial_x ,   NewtonTrustRegion ())  end  show ( to ;   allocations   =   false )  end   We see that the time is actually  not  spent in our provided functions, but most of the time is spent in the code for the trust region method.", 
            "title": "Separating time spent in Optim's code and user provided functions"
        }, 
        {
            "location": "/user/tipsandtricks/#early-stopping", 
            "text": "Sometimes it might be of interest to stop the optimizer early. The simplest way to do this is to set the  iterations  keyword in  Optim.Options  to some number. This will prevent the iteration counter exceeding some limit, with the standard value being 1000. Alternatively, it is possible to put a soft limit on the run time of the optimization procedure by setting the  time_limit  keyword in the  Optim.Options  constructor.  using   Optim  problem   =   Optim . UnconstrainedProblems . examples [ Rosenbrock ]  f   =   problem . f  initial_x   =   problem . initial_x  function   slow ( x ) \n     sleep ( 0.1 ) \n     f ( x )  end  start_time   =   time ()  optimize ( slow ,   zeros ( 2 ),   NelderMead (),   Optim . Options ( time_limit   =   3.0 ))   This will stop after about three seconds. If it is more important that we stop before the limit is reached, it is possible to use a callback with a simple model for predicting how much time will have passed when the next iteration is over. Consider the following code  using   Optim  problem   =   Optim . UnconstrainedProblems . examples [ Rosenbrock ]  f   =   problem . f  initial_x   =   problem . initial_x  function   very_slow ( x ) \n     sleep ( . 5 ) \n     f ( x )  end  start_time   =   time ()  time_to_setup   =   zeros ( 1 )  function   advanced_time_control ( x ) \n     println (  * Iteration:        ,   x . iteration ) \n     so_far   =    time () - start_time \n     println (  * Time so far:      ,   so_far ) \n     if   x . iteration   ==   0 \n         time_to_setup [ : ]   =   time () - start_time \n     else \n         expected_next_time   =   so_far   +   ( time () - start_time - time_to_setup [ 1 ]) / ( x . iteration ) \n         println (  * Next iteration \u2248  ,   expected_next_time ) \n         println () \n         return   expected_next_time     13   ?   
false   :   true \n     end \n     println () \n     false  end  optimize ( very_slow ,   zeros ( 2 ),   NelderMead (),   Optim . Options ( callback   =   advanced_time_control ))   It will try to predict the elapsed time after the next iteration is over, and stop now if it is expected to exceed the limit of 13 seconds. Running it, we get something like the following output  julia   optimize ( very_slow ,   zeros ( 2 ),   NelderMead (),   Optim . Options ( callback   =   advanced_time_control ))   * Iteration:       0   * Time so far:     2.219298839569092   * Iteration:       1   * Time so far:     3.4006409645080566   * Next iteration \u2248 4.5429909229278564   * Iteration:       2   * Time so far:     4.403923988342285   * Next iteration \u2248 5.476739525794983   * Iteration:       3   * Time so far:     5.407265901565552   * Next iteration \u2248 6.4569235642751055   * Iteration:       4   * Time so far:     5.909044027328491   * Next iteration \u2248 6.821732044219971   * Iteration:       5   * Time so far:     6.912338972091675   * Next iteration \u2248 7.843148183822632   * Iteration:       6   * Time so far:     7.9156060218811035   * Next iteration \u2248 8.85849153995514   * Iteration:       7   * Time so far:     8.918903827667236   * Next iteration \u2248 9.870419979095459   * Iteration:       8   * Time so far:     9.922197818756104   * Next iteration \u2248 10.880185931921005   * Iteration:       9   * Time so far:     10.925468921661377   * Next iteration \u2248 11.888488478130764   * Iteration:       10   * Time so far:     11.92870283126831   * Next iteration \u2248 12.895747828483582   * Iteration:       11   * Time so far:     12.932114839553833   * Next iteration \u2248 13.902462200684981  Results of Optimization Algorithm   * Algorithm: Nelder-Mead   * Starting Point: [0.0,0.0]   * Minimizer: [0.23359374999999996,0.042187499999999996, ...]   
* Minimum: 6.291677e-01   * Iterations: 11   * Convergence: false     *  \u221a(\u03a3(y\u1d62-y\u0304)\u00b2)/n < 1.0e-08: false     * Reached Maximum Number of Iterations: false   * Objective Function Calls: 24", 
            "title": "Early stopping"
        }, 
        {
            "location": "/algo/nelder_mead/", 
            "text": "Nelder-Mead\n\n\nNelder-Mead is currently the standard algorithm when no derivatives are provided.\n\n\n\n\nConstructor\n\n\nNelderMead\n(;\n \nparameters\n \n=\n \nAdaptiveParameters\n(),\n\n             \ninitial_simplex\n \n=\n \nAffineSimplexer\n())\n\n\n\n\n\n\nThe keywords in the constructor are used to control the following parts of the solver:\n\n\n\n\nparameters\n is a an instance of either \nAdaptiveParameters\n or \nFixedParameters\n, and is\n\n\n\n\nused to generate parameters for the Nelder-Mead Algorithm.\n\n\n\n\ninitial_simplex\n is an instance of \nAffineSimplexer\n. See more\n\n\n\n\ndetails below.\n\n\n\n\nDescription\n\n\nOur current implementation of the Nelder-Mead algorithm is based on Nelder and Mead (1965) and Gao and Han (2010). Gradient free methods can be a bit sensitive to starting values and tuning parameters, so it is a good idea to be careful with the defaults provided in Optim.\n\n\nInstead of using gradient information, Nelder-Mead is a direct search method. It keeps track of the function value at a number of points in the search space. Together, the points form a simplex. Given a simplex, we can perform one of four actions: reflect, expand, contract, or shrink. Basically, the goal is to iteratively replace the worst point with a better point. More information can be found in Nelder and Mead (1965), Lagarias, et al (1998) or Gao and Han (2010).\n\n\nThe stopping rule is the same as in the original paper, and is the standard error of the function values at the vertices. To set the tolerance level for this convergence criterion, set the \ng_tol\n level as described in the Configurable Options section.\n\n\nWhen the solver finishes, we return a minimizer which is either the centroid or one of the vertices. The function value at the centroid adds a function evaluation, as we need to evaluate the objection at the centroid to choose the smallest function value. 
However, even if the function value at the centroid can be returned as the minimum, we do not trace it during the optimization iterations. This is to avoid too many evaluations of the objective function which can be computationally expensive. Typically, there should be no more than twice as many \nf_calls\n as \niterations\n.  Adding an evaluation at the centroid when tracing could considerably increase the total run-time of the algorithm.\n\n\n\n\nSpecifying the initial simplex\n\n\nThe default choice of \ninitial_simplex\n is \nAffineSimplexer()\n. A simplex is represented by an $(n+1)$-dimensional vector of $n$-dimensional vectors. It is used together with the initial \nx\n to create the initial simplex. To construct the $i$th vertex, it simply multiplies entry $i$ in the initial vector with a constant \nb\n, and adds a constant \na\n. This means that the $i$th of the $n$ additional vertices is of the form\n\n\n\n\n\n(x_0^1, x_0^2, \\ldots, x_0^i, \\ldots, 0,0) + (0, 0, \\ldots, x_0^i\\cdot b+a,\\ldots, 0,0)\n\n\n\n\n\nIf an $x_0^i$ is zero, we need the $a$ to make sure all vertices are unique. Generally, it is advised to start with a relatively large simplex.\n\n\nIf a specific simplex is wanted, it is possible to construct the $(n+1)$-vector of $n$-dimensional vectors, and pass it to the solver using a new type definition and a new method for the function \nsimplexer\n. 
For example, let us minimize the two-dimensional Rosenbrock function, and choose three vertices whose elements are simply standard uniform draws.\n\n\nusing\n \nOptim\n\n\nstruct\n \nMySimplexer\n \n<:\n \nOptim\n.\nSimplexer\n \nend\n\n\nOptim\n.\nsimplexer\n(\nS\n::\nMySimplexer\n,\n \ninitial_x\n)\n \n=\n \n[\nrand\n(\nlength\n(\ninitial_x\n))\n \nfor\n \ni\n \n=\n \n1\n:\nlength\n(\ninitial_x\n)\n+\n1\n]\n\n\nf\n(\nx\n)\n \n=\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n^\n2\n \n+\n \n100.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n^\n2\n\n\noptimize\n(\nf\n,\n \n[\n0.0\n,\n \n0.0\n],\n \nNelderMead\n(\ninitial_simplex\n \n=\n \nMySimplexer\n()))\n\n\n\n\n\n\nSay we want to implement the initial simplex as in Matlab's \nfminsearch\n. This is very close to the \nAffineSimplexer\n above, but with a small twist. Instead of always adding the \na\n, a constant is only added to entries that are zero. If the entry is non-zero, five percent of the level is added. This might be implemented (by the user) as\n\n\nstruct\n \nMatlabSimplexer\n \n<:\n \nOptim\n.\nSimplexer\n\n    \na\n::\nFloat64\n\n    \nb\n::\nFloat64\n\n\nend\n\n\nMatlabSimplexer\n(;\na\n \n=\n \n0.00025\n,\n \nb\n \n=\n \n0.05\n)\n \n=\n \nMatlabSimplexer\n(\na\n,\n \nb\n)\n\n\n\nfunction\n \nOptim\n.\nsimplexer\n(\nS\n::\nMatlabSimplexer\n,\n \ninitial_x\n::\nArray\n{\nT\n,\n \nN\n})\n \nwhere\n \n{\nT\n,\n \nN\n}\n\n    \nn\n \n=\n \nlength\n(\ninitial_x\n)\n\n    \ninitial_simplex\n \n=\n \nArray\n{\nT\n,\n \nN\n}[\ncopy\n(\ninitial_x\n)\n \nfor\n \ni\n \n=\n \n1\n:\nn\n+\n1\n]\n\n    \nfor\n \nj\n \n=\n \n1\n:\nn\n\n        \ninitial_simplex\n[\nj\n+\n1\n][\nj\n]\n \n+=\n \ninitial_simplex\n[\nj\n+\n1\n][\nj\n]\n \n==\n \nzero\n(\nT\n)\n \n?\n \nS\n.\na\n \n:\n \nS\n.\nb\n \n*\n \ninitial_simplex\n[\nj\n+\n1\n][\nj\n]\n\n    \nend\n\n    \ninitial_simplex\n\n\nend\n\n\n\n\n\n\n\n\nThe parameters of Nelder-Mead\n\n\nThe different types of steps in the algorithm are governed by four parameters: 
$\\alpha$ for the reflection, $\\beta$ for the expansion, $\\gamma$ for the contraction, and $\\delta$ for the shrink step. We default to the adaptive parameters scheme in Gao and Han (2010). These are based on the dimensionality of the problem, and are given by\n\n\n\n\n\n\\alpha = 1, \\quad \\beta = 1+2/n,\\quad \\gamma = 0.75 - 1/(2n),\\quad \\delta = 1-1/n\n\n\n\n\n\nIt is also possible to specify the original parameters from Nelder and Mead (1965)\n\n\n\n\n\n\\alpha = 1,\\quad \\beta = 2, \\quad\\gamma = 1/2, \\quad\\delta = 1/2\n\n\n\n\n\nby specifying \nparameters  = Optim.FixedParameters()\n. For specifying custom values, \nparameters  = Optim.FixedParameters(\u03b1 = a, \u03b2 = b, \u03b3 = g, \u03b4 = d)\n is used, where a, b, g, d are the chosen values. If another parameter specification is wanted, it is possible to create a custom sub-type of \nOptim.NMParameters\n, and add a method to the \nparameters\n function. It should take the new type as the first positional argument, and the dimensionality of \nx\n as the second positional argument, and return a 4-tuple of parameters. However, it will often be easier to simply supply the wanted parameters to \nFixedParameters\n.\n\n\n\n\nReferences\n\n\nNelder, John A. and R. Mead (1965). \"A simplex method for function minimization\". Computer Journal 7: 308\u2013313. doi:10.1093/comjnl/7.4.308.\n\n\nLagarias, Jeffrey C., et al. \"Convergence properties of the Nelder\u2013Mead simplex method in low dimensions.\" SIAM Journal on optimization 9.1 (1998): 112-147.\n\n\nGao, Fuchang and Lixing Han (2010). \"Implementing the Nelder-Mead simplex algorithm with adaptive parameters\". Computational Optimization and Applications [DOI 10.1007/s10589-010-9329-3]", 
            "title": "Nelder Mead"
        }, 
        {
            "location": "/algo/nelder_mead/#nelder-mead", 
            "text": "Nelder-Mead is currently the standard algorithm when no derivatives are provided.", 
            "title": "Nelder-Mead"
        }, 
        {
            "location": "/algo/nelder_mead/#constructor", 
            "text": "NelderMead (;   parameters   =   AdaptiveParameters (), \n              initial_simplex   =   AffineSimplexer ())   The keywords in the constructor are used to control the following parts of the solver:   parameters  is a an instance of either  AdaptiveParameters  or  FixedParameters , and is   used to generate parameters for the Nelder-Mead Algorithm.   initial_simplex  is an instance of  AffineSimplexer . See more   details below.", 
            "title": "Constructor"
        }, 
        {
            "location": "/algo/nelder_mead/#description", 
            "text": "Our current implementation of the Nelder-Mead algorithm is based on Nelder and Mead (1965) and Gao and Han (2010). Gradient free methods can be a bit sensitive to starting values and tuning parameters, so it is a good idea to be careful with the defaults provided in Optim.  Instead of using gradient information, Nelder-Mead is a direct search method. It keeps track of the function value at a number of points in the search space. Together, the points form a simplex. Given a simplex, we can perform one of four actions: reflect, expand, contract, or shrink. Basically, the goal is to iteratively replace the worst point with a better point. More information can be found in Nelder and Mead (1965), Lagarias, et al (1998) or Gao and Han (2010).  The stopping rule is the same as in the original paper, and is the standard error of the function values at the vertices. To set the tolerance level for this convergence criterion, set the  g_tol  level as described in the Configurable Options section.  When the solver finishes, we return a minimizer which is either the centroid or one of the vertices. The function value at the centroid adds a function evaluation, as we need to evaluate the objection at the centroid to choose the smallest function value. However, even if the function value at the centroid can be returned as the minimum, we do not trace it during the optimization iterations. This is to avoid too many evaluations of the objective function which can be computationally expensive. Typically, there should be no more than twice as many  f_calls  than  iterations .  Adding an evaluation at the centroid when tracing could considerably increase the total run-time of the algorithm.", 
            "title": "Description"
        }, 
        {
            "location": "/algo/nelder_mead/#specifying-the-initial-simplex", 
            "text": "The default choice of  initial_simplex  is  AffineSimplexer() . A simplex is represented by an $(n+1)$-dimensional vector of $n$-dimensional vectors. It is used together  with the initial  x  to create the initial simplex. To construct the $i$th vertex, it simply multiplies entry $i$ in the initial vector with a constant  b , and adds a constant  a . This means that the $i$th of the $n$ additional vertices is of the form   \n(x_0^1, x_0^2, \\ldots, x_0^i, \\ldots, 0,0) + (0, 0, \\ldots, x_0^i\\cdot b+a,\\ldots, 0,0)   If an $x_0^i$ is zero, we need the $a$ to make sure all vertices are unique. Generally, it is advised to start with a relatively large simplex.  If a specific simplex is wanted, it is possible to construct the $(n+1)$-vector of $n$-dimensional vectors, and pass it to the solver using a new type definition and a new method for the function  simplexer . For example, let us minimize the two-dimensional Rosenbrock function, and choose three vertices that have elements that are simply standard uniform draws.  using   Optim  struct   MySimplexer   :   Optim . Simplexer   end  Optim . simplexer ( S :: MySimplexer ,   initial_x )   =   [ rand ( length ( initial_x ))   for   i   =   1 : length ( initial_x ) + 1 ]  f ( x )   =   ( 1.0   -   x [ 1 ]) ^ 2   +   100.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 ) ^ 2  optimize ( f ,   [ . 0 ,   . 0 ],   NelderMead ( initial_simplex   =   MySimplexer ()))   Say we want to implement the initial simplex as in Matlab's  fminsearch . This is very close to the  AffineSimplexer  above, but with a small twist. Instead of always adding the  a , a constant is only added to entries that are zero. If the entry is non-zero, five percent of the level is added. This might be implemented (by the user) as  struct   MatlabSimplexer   :   Optim . Simplexer \n     a :: Float64 \n     b :: Float64  end  MatlabSimplexer (; a   =   0.00025 ,   b   =   0.05 )   =   MatlabSimplexer ( a ,   b )  function   Optim . 
simplexer ( A :: MatlabSimplexer ,   initial_x :: Array { T ,   N })   where   { T ,   N } \n     n   =   length ( initial_x ) \n     initial_simplex   =   Array { T ,   N }[ initial_x   for   i   =   1 : n + 1 ] \n     for   j   =   1 : n \n         initial_simplex [ j + 1 ][ j ]   +=   initial_simplex [ j + 1 ][ j ]   ==   zero ( T )   ?   S . b   *   initial_simplex [ j + 1 ][ j ]   :   S . a \n     end \n     initial_simplex  end", 
            "title": "Specifying the initial simplex"
        }, 
        {
            "location": "/algo/nelder_mead/#the-parameters-of-nelder-mead", 
            "text": "The different types of steps in the algorithm are governed by four parameters: $\\alpha$ for the reflection, $\\beta$ for the expansion, $\\gamma$ for the contraction, and $\\delta$ for the shrink step. We default to the adaptive parameters scheme in Gao and Han (2010). These are based on the dimensionality of the problem, and are given by   \n\\alpha = 1, \\quad \\beta = 1+2/n,\\quad \\gamma =0.75 + 1/2n,\\quad \\delta = 1-1/n   It is also possible to specify the original parameters from Nelder and Mead (1965)   \n\\alpha = 1,\\quad \\beta = 2, \\quad\\gamma = 1/2, \\quad\\delta = 1/2   by specifying  parameters  = Optim.FixedParameters() . For specifying custom values,  parameters  = Optim.FixedParameters(\u03b1 = a, \u03b2 = b, \u03b3 = g, \u03b4 = d)  is used, where a, b, g, d are the chosen values. If another parameter specification is wanted, it is possible to create a custom sub-type of Optim.NMParameters , and add a method to the  parameters  function. It should take the new type as the first positional argument, and the dimensionality of  x  as the second positional argument, and return a 4-tuple of parameters. However, it will often be easier to simply supply the wanted parameters to  FixedParameters .", 
            "title": "The parameters of Nelder-Mead"
        }, 
        {
            "location": "/algo/nelder_mead/#references", 
            "text": "Nelder, John A. and R. Mead (1965). \"A simplex method for function minimization\". Computer Journal 7: 308\u2013313. doi:10.1093/comjnl/7.4.308.  Lagarias, Jeffrey C., et al. \"Convergence properties of the Nelder\u2013Mead simplex method in low dimensions.\" SIAM Journal on optimization 9.1 (1998): 112-147.  Gao, Fuchang and Lixing Han (2010). \"Implementing the Nelder-Mead simplex algorithm with adaptive parameters\". Computational Optimization and Applications [DOI 10.1007/s10589-010-9329-3]", 
            "title": "References"
        }, 
        {
            "location": "/algo/simulated_annealing/", 
            "text": "Simulated Annealing\n\n\n\n\nConstructor\n\n\nSimulatedAnnealing\n(;\n \nneighbor\n \n=\n \ndefault_neighbor!\n,\n\n                    \nT\n \n=\n \ndefault_temperature\n,\n\n                    \np\n \n=\n \nkirkpatrick\n)\n\n\n\n\n\n\nThe constructor takes three keywords:\n\n\n\n\nneighbor = a!(x_proposed, x_current)\n, a mutating function of the current x, and the proposed x\n\n\nT = b(iteration)\n, a function of the current iteration that returns a temperature\n\n\np = c(f_proposal, f_current, T)\n, a function of the current temperature, current function value and proposed function value that returns an acceptance probability\n\n\n\n\n\n\nDescription\n\n\nSimulated Annealing is a derivative free method for optimization. It is based on the Metropolis-Hastings algorithm that was originally used to generate samples from a thermodynamics system, and is often used to generate draws from a posterior when doing Bayesian inference. As such, it is a probabilistic method for finding the minimum of a function, often over a quite large domains. For the historical reasons given above, the algorithm uses terms such as cooling, temperature, and acceptance probabilities.\n\n\nAs the constructor shows, a simulated annealing implementation is characterized by a temperature, a neighbor function, and an acceptance probability. The temperature controls how volatile the changes in minimizer candidates are allowed to be, as it enters the acceptance probability. For example, the original Kirkpatrick et al. acceptance probability function can be written as follows\n\n\np\n(\nf_proposal\n,\n \nf_current\n,\n \nT\n)\n \n=\n \nexp\n(\n-\n(\nf_proposal\n \n-\n \nf_current\n)\n/\nT\n)\n\n\n\n\n\n\nA high temperature makes it more likely that a draw is accepted, by pushing acceptance probability to 1. As in the Metropolis-Hastings algorithm, we always accept a smaller function value, but we also sometimes accept a larger value. 
As the temperature decreases, we're more and more likely to only accept candidate \nx\n's that lower the function value. To obtain a new \nf_proposal\n, we need a neighbor function. A simple neighbor function adds a standard normal draw to each dimension of \nx\n\n\nfunction\n \nneighbor!\n(\nx_proposal\n::\nArray\n,\n \nx\n::\nArray\n)\n\n    \nfor\n \ni\n \nin\n \neachindex\n(\nx\n)\n\n        \nx_proposal\n[\ni\n]\n \n=\n \nx\n[\ni\n]\n+\nrandn\n()\n\n    \nend\n\n\nend\n\n\n\n\n\n\nAs we see, it is not really possible to disentangle the role of the different components of the algorithm. For example, the functional form of the acceptance function, the temperature, and (indirectly) the neighbor function all determine whether the next draw of \nx\n is accepted or not.\n\n\nThe current implementation of Simulated Annealing is very rough. It lacks quite a few features which are normally part of a proper SA implementation. A better implementation is under way, see \nthis issue\n.\n\n\n\n\nExample\n\n\n\n\nReferences", 
            "title": "Simulated Annealing"
        }, 
        {
            "location": "/algo/simulated_annealing/#simulated-annealing", 
            "text": "", 
            "title": "Simulated Annealing"
        }, 
        {
            "location": "/algo/simulated_annealing/#constructor", 
            "text": "SimulatedAnnealing (;   neighbor   =   default_neighbor! , \n                     T   =   default_temperature , \n                     p   =   kirkpatrick )   The constructor takes three keywords:   neighbor = a!(x_proposed, x_current) , a mutating function of the current x, and the proposed x  T = b(iteration) , a function of the current iteration that returns a temperature  p = c(f_proposal, f_current, T) , a function of the current temperature, current function value and proposed function value that returns an acceptance probability", 
            "title": "Constructor"
        }, 
        {
            "location": "/algo/simulated_annealing/#description", 
            "text": "Simulated Annealing is a derivative free method for optimization. It is based on the Metropolis-Hastings algorithm that was originally used to generate samples from a thermodynamics system, and is often used to generate draws from a posterior when doing Bayesian inference. As such, it is a probabilistic method for finding the minimum of a function, often over a quite large domains. For the historical reasons given above, the algorithm uses terms such as cooling, temperature, and acceptance probabilities.  As the constructor shows, a simulated annealing implementation is characterized by a temperature, a neighbor function, and an acceptance probability. The temperature controls how volatile the changes in minimizer candidates are allowed to be, as it enters the acceptance probability. For example, the original Kirkpatrick et al. acceptance probability function can be written as follows  p ( f_proposal ,   f_current ,   T )   =   exp ( - ( f_proposal   -   f_current ) / T )   A high temperature makes it more likely that a draw is accepted, by pushing acceptance probability to 1. As in the Metropolis-Hastings algorithm, we always accept a smaller function value, but we also sometimes accept a larger value. As the temperature decreases, we're more and more likely to only accept candidate  x 's that lowers the function value. To obtain a new  f_proposal , we need a neighbor function. A simple neighbor function adds a standard normal draw to each dimension of  x  function   neighbor! ( x_proposal :: Array ,   x :: Array ) \n     for   i   in   eachindex ( x ) \n         x_proposal [ i ]   =   x [ i ] + randn () \n     end  end   As we see, it is not really possible to disentangle the role of the different components of the algorithm. For example, both the functional form of the acceptance function, the temperature and (indirectly) the neighbor function determine if the next draw of  x  is accepted or not.  
The current implementation of Simulated Annealing is very rough.  It lacks quite a few features which are normally part of a proper SA implementation. A better implementation is under way, see  this issue .", 
            "title": "Description"
        }, 
        {
            "location": "/algo/simulated_annealing/#example", 
            "text": "", 
            "title": "Example"
        }, 
        {
            "location": "/algo/simulated_annealing/#references", 
            "text": "", 
            "title": "References"
        }, 
        {
            "location": "/algo/cg/", 
            "text": "Conjugate Gradient Descent\n\n\n\n\nConstructor\n\n\nConjugateGradient\n(;\n \nalphaguess\n \n=\n \nLineSearches\n.\nInitialHagerZhang\n(),\n\n                    \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n(),\n\n                    \neta\n \n=\n \n0.4\n,\n\n                    \nP\n \n=\n \nnothing\n,\n\n                    \nprecondprep\n \n=\n \n(\nP\n,\n \nx\n)\n \n-\n \nnothing\n)\n\n\n\n\n\n\n\n\nDescription\n\n\nThe \nConjugateGradient\n method implements Hager and Zhang (2006) and elements from Hager and Zhang (2013). Notice, that the default \nlinesearch\n is \nHagerZhang\n from LineSearches.jl. This line search is exactly the one proposed in Hager and Zhang (2006). The constant $eta$ is used in determining the next step direction, and the default here deviates from the one used in the original paper ($0.01$). It needs to be a strictly positive number.\n\n\n\n\nExample\n\n\nLet's optimize the 2D Rosenbrock function. The function and gradient are given by\n\n\nf\n(\nx\n)\n \n=\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n^\n2\n \n+\n \n100.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n^\n2\n\n\nfunction\n \ng\n!\n(\nstorage\n,\n \nx\n)\n\n    \nstorage\n[\n1\n]\n \n=\n \n-\n2.0\n \n*\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n \n-\n \n400.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n \n*\n \nx\n[\n1\n]\n\n    \nstorage\n[\n2\n]\n \n=\n \n200.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n\n\nend\n\n\n\n\n\n\nwe can then try to optimize this function from \nx=[0.0, 0.0]\n\n\njulia\n optimize(f, g!, zeros(2), ConjugateGradient())\nResults of Optimization Algorithm\n * Algorithm: Conjugate Gradient\n * Starting Point: [0.0,0.0]\n * Minimizer: [1.000000002262018,1.0000000045408348]\n * Minimum: 5.144946e-18\n * Iterations: 21\n * Convergence: true\n   * |x - x\n| \n 1.0e-32: false\n     |x - x\n| = 2.09e-10\n   * |f(x) - f(x\n)| / |f(x)| \n 1.0e-32: false\n     |f(x) - f(x\n)| / |f(x)| = 1.55e+00\n   * |g(x)| \n 
1.0e-08: true\n     |g(x)| = 3.36e-09\n   * stopped by an increasing objective: false\n   * Reached Maximum Number of Iterations: false\n * Objective Calls: 54\n * Gradient Calls: 39\n\n\n\n\n\nWe can compare this to the default first order solver in Optim.jl\n\n\n julia\n optimize(f, g!, zeros(2))\n\n Results of Optimization Algorithm\n  * Algorithm: L-BFGS\n  * Starting Point: [0.0,0.0]\n  * Minimizer: [0.9999999999373614,0.999999999868622]\n  * Minimum: 7.645684e-21\n  * Iterations: 16\n  * Convergence: true\n    * |x - x\n| \n 1.0e-32: false\n      |x - x\n| = 3.48e-07\n    * |f(x) - f(x\n)| / |f(x)| \n 1.0e-32: false\n      |f(x) - f(x\n)| / |f(x)| = 9.03e+06\n    * |g(x)| \n 1.0e-08: true\n      |g(x)| = 2.32e-09\n    * stopped by an increasing objective: false\n    * Reached Maximum Number of Iterations: false\n  * Objective Calls: 53\n  * Gradient Calls: 53\n\n\n\n\n\nWe see that for this objective and starting point, \nConjugateGradient()\n requires fewer gradient evaluations to reach convergence.\n\n\n\n\nReferences\n\n\n\n\nW. W. Hager and H. Zhang (2006) Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Transactions on Mathematical Software 32: 113-137.\n\n\nW. W. Hager and H. Zhang (2013), The Limited Memory Conjugate Gradient Method. SIAM Journal on Optimization, 23, pp. 2150-2168.", 
            "title": "Conjugate Gradient"
        }, 
        {
            "location": "/algo/cg/#conjugate-gradient-descent", 
            "text": "", 
            "title": "Conjugate Gradient Descent"
        }, 
        {
            "location": "/algo/cg/#constructor", 
            "text": "ConjugateGradient (;   alphaguess   =   LineSearches . InitialHagerZhang (), \n                     linesearch   =   LineSearches . HagerZhang (), \n                     eta   =   0.4 , \n                     P   =   nothing , \n                     precondprep   =   ( P ,   x )   -   nothing )", 
            "title": "Constructor"
        }, 
        {
            "location": "/algo/cg/#description", 
            "text": "The  ConjugateGradient  method implements Hager and Zhang (2006) and elements from Hager and Zhang (2013). Notice, that the default  linesearch  is  HagerZhang  from LineSearches.jl. This line search is exactly the one proposed in Hager and Zhang (2006). The constant $eta$ is used in determining the next step direction, and the default here deviates from the one used in the original paper ($0.01$). It needs to be a strictly positive number.", 
            "title": "Description"
        }, 
        {
            "location": "/algo/cg/#example", 
            "text": "Let's optimize the 2D Rosenbrock function. The function and gradient are given by  f ( x )   =   ( 1.0   -   x [ 1 ]) ^ 2   +   100.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 ) ^ 2  function   g ! ( storage ,   x ) \n     storage [ 1 ]   =   - 2.0   *   ( 1.0   -   x [ 1 ])   -   400.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 )   *   x [ 1 ] \n     storage [ 2 ]   =   200.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 )  end   we can then try to optimize this function from  x=[0.0, 0.0]  julia  optimize(f, g!, zeros(2), ConjugateGradient())\nResults of Optimization Algorithm\n * Algorithm: Conjugate Gradient\n * Starting Point: [0.0,0.0]\n * Minimizer: [1.000000002262018,1.0000000045408348]\n * Minimum: 5.144946e-18\n * Iterations: 21\n * Convergence: true\n   * |x - x |   1.0e-32: false\n     |x - x | = 2.09e-10\n   * |f(x) - f(x )| / |f(x)|   1.0e-32: false\n     |f(x) - f(x )| / |f(x)| = 1.55e+00\n   * |g(x)|   1.0e-08: true\n     |g(x)| = 3.36e-09\n   * stopped by an increasing objective: false\n   * Reached Maximum Number of Iterations: false\n * Objective Calls: 54\n * Gradient Calls: 39  We can compare this to the default first order solver in Optim.jl   julia  optimize(f, g!, zeros(2))\n\n Results of Optimization Algorithm\n  * Algorithm: L-BFGS\n  * Starting Point: [0.0,0.0]\n  * Minimizer: [0.9999999999373614,0.999999999868622]\n  * Minimum: 7.645684e-21\n  * Iterations: 16\n  * Convergence: true\n    * |x - x |   1.0e-32: false\n      |x - x | = 3.48e-07\n    * |f(x) - f(x )| / |f(x)|   1.0e-32: false\n      |f(x) - f(x )| / |f(x)| = 9.03e+06\n    * |g(x)|   1.0e-08: true\n      |g(x)| = 2.32e-09\n    * stopped by an increasing objective: false\n    * Reached Maximum Number of Iterations: false\n  * Objective Calls: 53\n  * Gradient Calls: 53  We see that for this objective and starting point,  ConjugateGradient()  requires fewer gradient evaluations to reach convergence.", 
            "title": "Example"
        }, 
        {
            "location": "/algo/cg/#references", 
            "text": "W. W. Hager and H. Zhang (2006) Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Transactions on Mathematical Software 32: 113-137.  W. W. Hager and H. Zhang (2013), The Limited Memory Conjugate Gradient Method. SIAM Journal on Optimization, 23, pp. 2150-2168.", 
            "title": "References"
        }, 
        {
            "location": "/algo/gradientdescent/", 
            "text": "Gradient Descent\n\n\n\n\nConstructor\n\n\nGradientDescent\n(;\n \nalphaguess\n \n=\n \nLineSearches\n.\nInitialPrevious\n(),\n\n                  \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n(),\n\n                  \nP\n \n=\n \nnothing\n,\n\n                  \nprecondprep\n \n=\n \n(\nP\n,\n \nx\n)\n \n-\n \nnothing\n)\n\n\n\n\n\n\n\n\nDescription\n\n\nGradient Descent a common name for a quasi-Newton solver. This means that it takes steps according to\n\n\n\n\n\nx_{n+1} = x_n - P^{-1}\\nabla f(x_n)\n\n\n\n\n\nwhere $P$ is a positive definite matrix. If $P$ is the Hessian, we get Newton's method. In Gradient Descent, $P$ is simply an appropriately dimensioned identity matrix, such that we go in the exact opposite direction of the gradient. This means that we do not use the curvature information from the Hessian, or an approximation of it. While it does seem quite logical to go in the opposite direction of the fastest increase in objective value, the procedure can be very slow if the problem is ill-conditioned. See the section on preconditioners for ways to remedy this when using Gradient Descent.\n\n\nAs with the other quasi-Newton solvers in this package, a scalar $\\alpha$ is introduced as follows\n\n\n\n\n\nx_{n+1} = x_n - \\alpha P^{-1}\\nabla f(x_n)\n\n\n\n\n\nand is chosen by a linesearch algorithm such that each step gives sufficient descent.\n\n\n\n\nExample\n\n\n\n\nReferences", 
            "title": "Gradient Descent"
        }, 
        {
            "location": "/algo/gradientdescent/#gradient-descent", 
            "text": "", 
            "title": "Gradient Descent"
        }, 
        {
            "location": "/algo/gradientdescent/#constructor", 
            "text": "GradientDescent (;   alphaguess   =   LineSearches . InitialPrevious (), \n                   linesearch   =   LineSearches . HagerZhang (), \n                   P   =   nothing , \n                   precondprep   =   ( P ,   x )   -   nothing )", 
            "title": "Constructor"
        }, 
        {
            "location": "/algo/gradientdescent/#description", 
            "text": "Gradient Descent a common name for a quasi-Newton solver. This means that it takes steps according to   \nx_{n+1} = x_n - P^{-1}\\nabla f(x_n)   where $P$ is a positive definite matrix. If $P$ is the Hessian, we get Newton's method. In Gradient Descent, $P$ is simply an appropriately dimensioned identity matrix, such that we go in the exact opposite direction of the gradient. This means that we do not use the curvature information from the Hessian, or an approximation of it. While it does seem quite logical to go in the opposite direction of the fastest increase in objective value, the procedure can be very slow if the problem is ill-conditioned. See the section on preconditioners for ways to remedy this when using Gradient Descent.  As with the other quasi-Newton solvers in this package, a scalar $\\alpha$ is introduced as follows   \nx_{n+1} = x_n - \\alpha P^{-1}\\nabla f(x_n)   and is chosen by a linesearch algorithm such that each step gives sufficient descent.", 
            "title": "Description"
        }, 
        {
            "location": "/algo/gradientdescent/#example", 
            "text": "", 
            "title": "Example"
        }, 
        {
            "location": "/algo/gradientdescent/#references", 
            "text": "", 
            "title": "References"
        }, 
        {
            "location": "/algo/lbfgs/", 
            "text": "(L-)BFGS\n\n\nThis page contains information about BFGS and its limited memory version L-BFGS.\n\n\n\n\nConstructors\n\n\nBFGS\n(;\n \nalphaguess\n \n=\n \nLineSearches\n.\nInitialStatic\n(),\n\n       \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n(),\n\n       \nP\n \n=\n \nnothing\n,\n\n       \nprecondprep\n \n=\n \n(\nP\n,\n \nx\n)\n \n-\n \nnothing\n)\n\n\n\n\n\n\nLBFGS\n(;\n \nm\n \n=\n \n10\n,\n\n        \nalphaguess\n \n=\n \nLineSearches\n.\nInitialStatic\n(),\n\n        \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n(),\n\n        \nP\n \n=\n \nnothing\n,\n\n        \nprecondprep\n \n=\n \n(\nP\n,\n \nx\n)\n \n-\n \nnothing\n,\n\n        \nmanifold\n \n=\n \nFlat\n(),\n\n        \nscaleinvH0\n::\nBool\n \n=\n \ntrue\n \n \n(\ntypeof\n(\nP\n)\n \n:\n \nVoid\n))\n\n\n\n\n\n\n\n\nDescription\n\n\nThis means that it takes steps according to\n\n\n\n\n\nx_{n+1} = x_n - P^{-1}\\nabla f(x_n)\n\n\n\n\n\nwhere $P$ is a positive definite matrix. If $P$ is the Hessian, we get Newton's method. In (L-)BFGS, the matrix is an approximation to the Hessian built using differences in the gradient across iterations. As long as the initial matrix is positive definite  it is possible to show that all the follow matrices will be as well. The starting matrix could simply be the identity matrix, such that the first step is identical to the Gradient Descent algorithm, or even the actual Hessian.\n\n\nThere are two versions of BFGS in the package: BFGS, and L-BFGS. The latter is different from the former because it doesn't use a complete history of the iterative procedure to construct $P$, but rather only the latest $m$ steps. It doesn't actually build the Hessian approximation matrix either, but computes the direction directly. 
This makes it more suitable for large scale problems, as the memory required to store a full Hessian approximation grows quickly with the problem size.\n\n\nAs with the other quasi-Newton solvers in this package, a scalar $\\alpha$ is introduced as follows\n\n\n\n\n\nx_{n+1} = x_n - \\alpha P^{-1}\\nabla f(x_n)\n\n\n\n\n\nand is chosen by a linesearch algorithm such that each step gives sufficient descent.\n\n\n\n\nExample\n\n\n\n\nReferences\n\n\nWright, Stephen, and Jorge Nocedal (2006) \"Numerical optimization.\" Springer", 
            "title": "(L-)BFGS"
        }, 
        {
            "location": "/algo/lbfgs/#l-bfgs", 
            "text": "This page contains information about BFGS and its limited memory version L-BFGS.", 
            "title": "(L-)BFGS"
        }, 
        {
            "location": "/algo/lbfgs/#constructors", 
            "text": "BFGS (;   alphaguess   =   LineSearches . InitialStatic (), \n        linesearch   =   LineSearches . HagerZhang (), \n        P   =   nothing , \n        precondprep   =   ( P ,   x )   -   nothing )   LBFGS (;   m   =   10 , \n         alphaguess   =   LineSearches . InitialStatic (), \n         linesearch   =   LineSearches . HagerZhang (), \n         P   =   nothing , \n         precondprep   =   ( P ,   x )   -   nothing , \n         manifold   =   Flat (), \n         scaleinvH0 :: Bool   =   true     ( typeof ( P )   :   Void ))", 
            "title": "Constructors"
        }, 
        {
            "location": "/algo/lbfgs/#description", 
            "text": "This means that it takes steps according to   \nx_{n+1} = x_n - P^{-1}\\nabla f(x_n)   where $P$ is a positive definite matrix. If $P$ is the Hessian, we get Newton's method. In (L-)BFGS, the matrix is an approximation to the Hessian built using differences in the gradient across iterations. As long as the initial matrix is positive definite  it is possible to show that all the follow matrices will be as well. The starting matrix could simply be the identity matrix, such that the first step is identical to the Gradient Descent algorithm, or even the actual Hessian.  There are two versions of BFGS in the package: BFGS, and L-BFGS. The latter is different from the former because it doesn't use a complete history of the iterative procedure to construct $P$, but rather only the latest $m$ steps. It doesn't actually build the Hessian approximation matrix either, but computes the direction directly. This makes more suitable for large scale problems, as the memory requirement to store the relevant vectors will grow quickly in large problems.  As with the other quasi-Newton solvers in this package, a scalar $\\alpha$ is introduced as follows   \nx_{n+1} = x_n - \\alpha P^{-1}\\nabla f(x_n)   and is chosen by a linesearch algorithm such that each step gives sufficient descent.", 
            "title": "Description"
        }, 
        {
            "location": "/algo/lbfgs/#example", 
            "text": "", 
            "title": "Example"
        }, 
        {
            "location": "/algo/lbfgs/#references", 
            "text": "Wright, Stephen, and Jorge Nocedal (2006) \"Numerical optimization.\" Springer", 
            "title": "References"
        }, 
        {
            "location": "/algo/newton/", 
            "text": "Newton's Method\n\n\n\n\nConstructor\n\n\nNewton\n(;\n \nalphaguess\n \n=\n \nLineSearches\n.\nInitialStatic\n(),\n\n         \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n())\n\n\n\n\n\n\nThe constructor takes two keywords:\n\n\n\n\nlinesearch = a(d, x, p, x_new, g_new, lsr, c, mayterminate)\n, a function performing line search, see the line search section.\n\n\nalphaguess = a(state, dphi0, d)\n, a function for setting the initial guess for the line search algorithm, see the line search section.\n\n\n\n\n\n\nDescription\n\n\nNewton's method for optimization has a long history, and is in some sense the gold standard in unconstrained optimization of smooth functions, at least from a theoretical viewpoint. The main benefit is that it has a quadratic rate of convergence near a local optimum. The main disadvantage is that the user has to provide a Hessian. This can be difficult, complicated, or simply annoying. It can also be computationally expensive to calculate it.\n\n\nNewton's method for optimization consists of applying Newton's method for solving systems of equations, where the equations are the first order conditions, saying that the gradient should equal the zero vector.\n\n\n\n\n\n\\nabla f(x) = 0\n\n\n\n\n\nA second order Taylor expansion of the left-hand side leads to the iterative scheme\n\n\n\n\n\nx_{n+1} = x_n - H(x_n)^{-1}\\nabla f(x_n)\n\n\n\n\n\nwhere the inverse is not calculated directly, but the step size is instead calculated by solving\n\n\n\n\n\nH(x) \\textbf{s} = \\nabla f(x_n).\n\n\n\n\n\nThis is equivalent to minimizing a quadratic model, $m_k$ around the current $x_n$\n\n\n\n\n\nm_k(s) = f(x_n) + \\nabla f(x_n)^\\top \\textbf{s} + \\frac{1}{2} \\textbf{s}^\\top H(x_n) \\textbf{s}\n\n\n\n\n\nFor functions where $H(x_n)$ is difficult, or computationally expensive to obtain, we might replace the Hessian with another positive definite matrix that approximates it. 
Such methods are called Quasi-Newton methods; see (L-)BFGS and Gradient Descent.\n\n\nIn a sufficiently small neighborhood around the minimizer, Newton's method has quadratic convergence, but globally it might have slower convergence, or it might even diverge. To ensure convergence, a line search is performed for each $\\textbf{s}$. This amounts to replacing the step formula above with\n\n\n\n\n\nx_{n+1} = x_n - \\alpha \\textbf{s}\n\n\n\n\n\nand finding a scalar $\\alpha$ such that we get sufficient descent; see the line search section for more information.\n\n\nAdditionally, if the function is locally concave, the step taken in the formulas above will go in a direction of ascent,  as the Hessian will not be positive (semi)definite. To avoid this, we use a specialized method to calculate the step direction. If the Hessian is positive semidefinite then the method used is standard, but if it is not, a correction is made using the functionality in \nPositiveFactorizations.jl\n.\n\n\n\n\nExample\n\n\nshow the example from the issue\n\n\n\n\nReferences", 
            "title": "Newton"
        }, 
        {
            "location": "/algo/newton/#newtons-method", 
            "text": "", 
            "title": "Newton's Method"
        }, 
        {
            "location": "/algo/newton/#constructor", 
            "text": "Newton (;   alphaguess   =   LineSearches . InitialStatic (), \n          linesearch   =   LineSearches . HagerZhang ())   The constructor takes two keywords:   linesearch = a(d, x, p, x_new, g_new, lsr, c, mayterminate) , a function performing line search, see the line search section.  alphaguess = a(state, dphi0, d) , a function for setting the initial guess for the line search algorithm, see the line search section.", 
            "title": "Constructor"
        }, 
        {
            "location": "/algo/newton/#description", 
            "text": "Newton's method for optimization has a long history, and is in some sense the gold standard in unconstrained optimization of smooth functions, at least from a theoretical viewpoint. The main benefit is that it has a quadratic rate of convergence near a local optimum. The main disadvantage is that the user has to provide a Hessian. This can be difficult, complicated, or simply annoying. It can also be computationally expensive to calculate it.  Newton's method for optimization consists of applying Newton's method for solving systems of equations, where the equations are the first order conditions, saying that the gradient should equal the zero vector.   \n\\nabla f(x) = 0   A second order Taylor expansion of the left-hand side leads to the iterative scheme   \nx_{n+1} = x_n - H(x_n)^{-1}\\nabla f(x_n)   where the inverse is not calculated directly, but the step size is instead calculated by solving   \nH(x) \\textbf{s} = \\nabla f(x_n).   This is equivalent to minimizing a quadratic model, $m_k$ around the current $x_n$   \nm_k(s) = f(x_n) + \\nabla f(x_n)^\\top \\textbf{s} + \\frac{1}{2} \\textbf{s}^\\top H(x_n) \\textbf{s}   For functions where $H(x_n)$ is difficult, or computationally expensive to obtain, we might replace the Hessian with another positive definite matrix that approximates it. Such methods are called Quasi-Newton methods; see (L-)BFGS and Gradient Descent.  In a sufficiently small neighborhood around the minimizer, Newton's method has quadratic convergence, but globally it might have slower convergence, or it might even diverge. To ensure convergence, a line search is performed for each $\\textbf{s}$. This amounts to replacing the step formula above with   \nx_{n+1} = x_n - \\alpha \\textbf{s}   and finding a scalar $\\alpha$ such that we get sufficient descent; see the line search section for more information.  
Additionally, if the function is locally concave, the step taken in the formulas above will go in a direction of ascent,  as the Hessian will not be positive (semi)definite. To avoid this, we use a specialized method to calculate the step direction. If the Hessian is positive semidefinite then the method used is standard, but if it is not, a correction is made using the functionality in  PositiveFactorizations.jl .", 
            "title": "Description"
        }, 
        {
            "location": "/algo/newton/#example", 
            "text": "show the example from the issue", 
            "title": "Example"
        }, 
        {
            "location": "/algo/newton/#references", 
            "text": "", 
            "title": "References"
        }, 
        {
            "location": "/algo/newton_trust_region/", 
            "text": "Newton's Method With a Trust Region\n\n\n\n\nConstructor\n\n\nNewtonTrustRegion\n(;\n \ninitial_delta\n \n=\n \n1.0\n,\n\n                    \ndelta_hat\n \n=\n \n100.0\n,\n\n                    \neta\n \n=\n \n0.1\n,\n\n                    \nrho_lower\n \n=\n \n0.25\n,\n\n                    \nrho_upper\n \n=\n \n0.75\n)\n\n\n\n\n\n\nThe constructor takes keywords that determine the initial and maximal size of the trust region, when to grow and shrink the region, and how close the function should be to the quadratic approximation.  The notation follows chapter four of Numerical Optimization.  Below, \nrho\n $=\\rho$ refers to the ratio of the actual function change to the change in the quadratic approximation for a given step.\n\n\n\n\ninitial_delta:\nThe starting trust region radius\n\n\ndelta_hat:\n The largest allowable trust region radius\n\n\neta:\n When \nrho\n is at least \neta\n, accept the step.\n\n\nrho_lower:\n When \nrho\n is less than \nrho_lower\n, shrink the trust region.\n\n\nrho_upper:\n When \nrho\n is greater than \nrho_upper\n, grow the trust region (though no greater than \ndelta_hat\n).\n\n\n\n\n\n\nDescription\n\n\nNewton's method with a trust region is designed to take advantage of the second-order information in a function's Hessian, but with more stability that Newton's method when functions are not globally well-approximated by a quadratic.  This is achieved by repeatedly minimizing quadratic approximations within a dynamically-sized \"trust region\" in which the function is assumed to be locally quadratic [1].\n\n\nNewton's method optimizes a quadratic approximation to a function.  When a function is well approximated by a quadratic (for example, near an optimum), Newton's method converges very quickly by exploiting the second-order information in the Hessian matrix.  
However, when the function is not well-approximated by a quadratic, either because the starting point is far from the optimum or the function has a more irregular shape, Newton steps can be erratically large, leading to distant, irrelevant areas of the space.\n\n\nTrust region methods use second-order information but restrict the steps to be within a \"trust region\" where the function is believed to be approximately quadratic.  At iteration $k$, a trust region method chooses a step $p$ to minimize a quadratic approximation to the objective such that the step size is no larger than a given trust region size, $\\Delta_k$.\n\n\n\n\n\n\\underset{p\\in\\mathbb{R}^n}\\min m_k(p) = f_k + g_k^T p + \\frac{1}{2}p^T B_k p \\quad\\textrm{such that } ||p||\\le \\Delta_k\n\n\n\n\n\nHere, $p$ is the step to take at iteration $k$, so that $x_{k+1} = x_k + p$.   In the definition of $m_k(p)$, $f_k = f(x_k)$ is the function value at the current iterate, $g_k=\\nabla f(x_k)$ is the gradient at the current iterate, $B_k = \\nabla^2 f(x_k)$ is the Hessian matrix at the current iterate, and $||\\cdot||$ is the Euclidean norm.\n\n\nIf the trust region size, $\\Delta_k$, is large enough that the minimizer of the quadratic approximation $m_k(p)$ has $||p|| \\le \\Delta_k$, then the step is the same as an ordinary Newton step.  However, if the unconstrained quadratic minimizer lies outside the trust region, then the minimizer to the constrained problem will occur on the boundary, i.e. we will have $||p|| = \\Delta_k$.  It turns out that when the Cholesky decomposition of $B_k$ can be computed, the optimal $p$ can be found numerically with relative ease.  ([1], section 4.3)  This is the method currently used in Optim.\n\n\nIt makes sense to adapt the trust region size, $\\Delta_k$, as one moves through the space and assesses the quality of the quadratic fit.  
This adaptation is controlled by the parameters $\\eta$, $\\rho_{lower}$, and $\\rho_{upper}$, which are parameters to the \nNewtonTrustRegion\n optimization method.  For each step, we calculate\n\n\n\n\n\n\\rho_k := \\frac{f(x_{k+1}) - f(x_k)}{m_k(p) - m_k(0)}\n\n\n\n\n\nIntuitively, $\\rho_k$ measures the quality of the quadratic approximation: if $\\rho_k \\approx 1$, then our quadratic approximation is reasonable.  If  $p$ was on the boundary and $\\rho_k > \\rho_{upper}$, then perhaps we can benefit from larger steps.  In this case, for the next iteration we grow the trust region geometrically up to a maximum of $\\hat\\Delta$:\n\n\n\n\n\n\\rho_k > \\rho_{upper} \\Rightarrow \\Delta_{k+1} = \\min(2 \\Delta_k, \\hat\\Delta).\n\n\n\n\n\nConversely, if $\\rho_k < \\rho_{lower}$, then we shrink the trust region geometrically:\n\n\n$\\rho_k < \\rho_{lower} \\Rightarrow \\Delta_{k+1} = 0.25 \\Delta_k$. Finally, we only accept a point if its decrease is appreciable compared to the quadratic approximation.  Specifically, a step is only accepted if $\\rho_k > \\eta$.  As long as we choose $\\eta$ to be less than $\\rho_{lower}$, we will shrink the trust region whenever we reject a step.  Eventually, if the objective function is locally quadratic, $\\Delta_k$ will become small enough that a quadratic approximation will be accurate enough to make progress again.\n\n\n\n\nExample\n\n\nusing\n \nOptim\n\n\nprob\n \n=\n \nOptim\n.\nUnconstrainedProblems\n.\nexamples\n[\n\"Rosenbrock\"\n];\n\n\nres\n \n=\n \nOptim\n.\noptimize\n(\nprob\n.\nf\n,\n \nprob\n.\ng!\n,\n \nprob\n.\nh!\n,\n \nprob\n.\ninitial_x\n,\n \nmethod\n=\nNewtonTrustRegion\n())\n\n\n\n\n\n\n\n\nReferences\n\n\n[1] Nocedal, Jorge, and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.", 
            "title": "Newton with Trust Region"
        }, 
        {
            "location": "/algo/newton_trust_region/#newtons-method-with-a-trust-region", 
            "text": "", 
            "title": "Newton's Method With a Trust Region"
        }, 
        {
            "location": "/algo/newton_trust_region/#constructor", 
            "text": "NewtonTrustRegion (;   initial_delta   =   1.0 , \n                     delta_hat   =   100.0 , \n                     eta   =   0.1 , \n                     rho_lower   =   0.25 , \n                     rho_upper   =   0.75 )   The constructor takes keywords that determine the initial and maximal size of the trust region, when to grow and shrink the region, and how close the function should be to the quadratic approximation.  The notation follows chapter four of Numerical Optimization.  Below,  rho  $=\\rho$ refers to the ratio of the actual function change to the change in the quadratic approximation for a given step.   initial_delta: The starting trust region radius  delta_hat:  The largest allowable trust region radius  eta:  When  rho  is at least  eta , accept the step.  rho_lower:  When  rho  is less than  rho_lower , shrink the trust region.  rho_upper:  When  rho  is greater than  rho_upper , grow the trust region (though no greater than  delta_hat ).", 
            "title": "Constructor"
        }, 
        {
            "location": "/algo/newton_trust_region/#description", 
            "text": "Newton's method with a trust region is designed to take advantage of the second-order information in a function's Hessian, but with more stability that Newton's method when functions are not globally well-approximated by a quadratic.  This is achieved by repeatedly minimizing quadratic approximations within a dynamically-sized \"trust region\" in which the function is assumed to be locally quadratic [1].  Newton's method optimizes a quadratic approximation to a function.  When a function is well approximated by a quadratic (for example, near an optimum), Newton's method converges very quickly by exploiting the second-order information in the Hessian matrix.  However, when the function is not well-approximated by a quadratic, either because the starting point is far from the optimum or the function has a more irregular shape, Newton steps can be erratically large, leading to distant, irrelevant areas of the space.  Trust region methods use second-order information but restrict the steps to be within a \"trust region\" where the function is believed to be approximately quadratic.  At iteration $k$, a trust region method chooses a step $p$ to minimize a quadratic approximation to the objective such that the step size is no larger than a given trust region size, $\\Delta_k$.   \n\\underset{p\\in\\mathbb{R}^n}\\min m_k(p) = f_k + g_k^T p + \\frac{1}{2}p^T B_k p \\quad\\textrm{such that } ||p||\\le \\Delta_k   Here, $p$ is the step to take at iteration $k$, so that $x_{k+1} = x_k + p$.   In the definition of $m_k(p)$, $f_k = f(x_k)$ is the value at the previous location, $g_k=\\nabla f(x_k)$ is the gradient at the previous location, $B_k = \\nabla^2 f(x_k)$ is the Hessian matrix at the previous iterate, and $||\\cdot||$ is the Euclidian norm.  If the trust region size, $\\Delta_k$, is large enough that the minimizer of the quadratic approximation $m_k(p)$ has $||p|| \\le \\Delta_k$, then the step is the same as an ordinary Newton step.  
However, if the unconstrained quadratic minimizer lies outside the trust region, then the minimizer to the constrained problem will occur on the boundary, i.e. we will have $||p|| = \\Delta_k$.  It turns out that when the Cholesky decomposition of $B_k$ can be computed, the optimal $p$ can be found numerically with relative ease.  ([1], section 4.3)  This is the method currently used in Optim.  It makes sense to adapt the trust region size, $\\Delta_k$, as one moves through the space and assesses the quality of the quadratic fit.  This adaptation is controlled by the parameters $\\eta$, $\\rho_{lower}$, and $\\rho_{upper}$, which are parameters to the  NewtonTrustRegion  optimization method.  For each step, we calculate   \n\\rho_k := \\frac{f(x_{k+1}) - f(x_k)}{m_k(p) - m_k(0)}   Intuitively, $\\rho_k$ measures the quality of the quadratic approximation: if $\\rho_k \\approx 1$, then our quadratic approximation is reasonable.  If  $p$ was on the boundary and $\\rho_k > \\rho_{upper}$, then perhaps we can benefit from larger steps.  In this case, for the next iteration we grow the trust region geometrically up to a maximum of $\\hat\\Delta$:   \n\\rho_k > \\rho_{upper} \\Rightarrow \\Delta_{k+1} = \\min(2 \\Delta_k, \\hat\\Delta).   Conversely, if $\\rho_k < \\rho_{lower}$, then we shrink the trust region geometrically:  $\\rho_k < \\rho_{lower} \\Rightarrow \\Delta_{k+1} = 0.25 \\Delta_k$. Finally, we only accept a point if its decrease is appreciable compared to the quadratic approximation.  Specifically, a step is only accepted if $\\rho_k > \\eta$.  As long as we choose $\\eta$ to be less than $\\rho_{lower}$, we will shrink the trust region whenever we reject a step.  Eventually, if the objective function is locally quadratic, $\\Delta_k$ will become small enough that a quadratic approximation will be accurate enough to make progress again.", 
            "title": "Description"
        }, 
        {
            "location": "/algo/newton_trust_region/#example", 
            "text": "using   Optim  prob   =   Optim . UnconstrainedProblems . examples [ Rosenbrock ];  res   =   Optim . optimize ( prob . f ,   prob . g! ,   prob . h! ,   prob . initial_x ,   method = NewtonTrustRegion ())", 
            "title": "Example"
        }, 
        {
            "location": "/algo/newton_trust_region/#references", 
            "text": "[1] Nocedal, Jorge, and Stephen Wright. Numerical optimization. Springer Science   Business Media, 2006.", 
            "title": "References"
        }, 
        {
            "location": "/algo/autodiff/", 
            "text": "Automatic Differentiation\n\n\nAs mentioned in the \nMinimizing a function\n section, it is possible to avoid passing gradients even when using gradient based methods. This is because Optim will call the finite central differences functionality in \nCalculus.jl\n in those cases. The advantages are clear: you do not have to write the gradients yourself, and it works for any function you can pass to Optim. However, there is another good way of making the computer provide gradients: automatic differentiation. Again, the advantage is that you can easily get gradients from the objective function alone. As opposed to finite difference, these gradients are exact and we also get Hessians for Newton's method. They can perform better than a finite differences scheme, depending on the exact problem. The disadvantage is that the objective function has to be written using only Julia code, so no calls to BLAS or Fortran functions.\n\n\nLet us consider the Rosenbrock example again.\n\n\nfunction\n \nf\n(\nx\n)\n\n    \nreturn\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n^\n2\n \n+\n \n100.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n^\n2\n\n\nend\n\n\n\nfunction\n \ng!\n(\nstorage\n,\n \nx\n)\n\n    \nstorage\n[\n1\n]\n \n=\n \n-\n2.0\n \n*\n \n(\n1.0\n \n-\n \nx\n[\n1\n])\n \n-\n \n400.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n \n*\n \nx\n[\n1\n]\n\n    \nstorage\n[\n2\n]\n \n=\n \n200.0\n \n*\n \n(\nx\n[\n2\n]\n \n-\n \nx\n[\n1\n]\n^\n2\n)\n\n\nend\n\n\n\nfunction\n \nh!\n(\nstorage\n,\n \nx\n)\n\n    \nstorage\n[\n1\n,\n \n1\n]\n \n=\n \n2.0\n \n-\n \n400.0\n \n*\n \nx\n[\n2\n]\n \n+\n \n1200.0\n \n*\n \nx\n[\n1\n]\n^\n2\n\n    \nstorage\n[\n1\n,\n \n2\n]\n \n=\n \n-\n400.0\n \n*\n \nx\n[\n1\n]\n\n    \nstorage\n[\n2\n,\n \n1\n]\n \n=\n \n-\n400.0\n \n*\n \nx\n[\n1\n]\n\n    \nstorage\n[\n2\n,\n \n2\n]\n \n=\n \n200.0\n\n\nend\n\n\n\ninitial_x\n \n=\n \nzeros\n(\n2\n)\n\n\n\n\n\n\nLet us see if BFGS and Newton's Method can solve this 
problem with the functions provided.\n\n\njulia>\n \nOptim\n.\nminimizer\n(\noptimize\n(\nf\n,\n \ng!\n,\n \nh!\n,\n \ninitial_x\n,\n \nBFGS\n()))\n\n\n2-element Array{Float64,1}:\n\n\n 1.0\n\n\n 1.0\n\n\n\njulia>\n \nOptim\n.\nminimizer\n(\noptimize\n(\nf\n,\n \ng!\n,\n \nh!\n,\n \ninitial_x\n,\n \nNewton\n()))\n\n\n\n2-element Array{Float64,1}:\n\n\n 1.0\n\n\n 1.0\n\n\n\n\n\n\nThis is indeed the case. Now let us use finite differences for BFGS.\n\n\njulia>\n \nOptim\n.\nminimizer\n(\noptimize\n(\nf\n,\n \ninitial_x\n,\n \nBFGS\n()))\n\n\n2-element Array{Float64,1}:\n\n\n 1.0\n\n\n 1.0\n\n\n\n\n\n\nStill looks good. Returning to automatic differentiation, let us try both solvers using this method.  We enable \nforward mode\n automatic differentiation by adding \nautodiff = :forward\n when we construct a \nOnceDifferentiable\n instance.\n\n\njulia>\n \nod\n \n=\n \nOnceDifferentiable\n(\nf\n,\n \ninitial_x\n;\n \nautodiff\n \n=\n \n:\nforward\n);\n\n\n\njulia>\n \nOptim\n.\nminimizer\n(\noptimize\n(\nod\n,\n \ninitial_x\n,\n \nBFGS\n()))\n\n\n2-element Array{Float64,1}:\n\n\n 1.0\n\n\n 1.0\n\n\n\njulia>\n \ntd\n \n=\n \nTwiceDifferentiable\n(\nf\n,\n \ninitial_x\n;\n \nautodiff\n \n=\n \n:\nforward\n)\n\n\n\njulia>\n \nOptim\n.\nminimizer\n(\noptimize\n(\ntd\n,\n \ninitial_x\n,\n \nNewton\n()))\n\n\n2-element Array{Float64,1}:\n\n\n 1.0\n\n\n 1.0\n\n\n\n\n\n\nIndeed, the minimizer was found, without providing any gradients or Hessians.", 
            "title": "Automatic Differentiation"
        }, 
        {
            "location": "/algo/autodiff/#automatic-differentiation", 
            "text": "As mentioned in the  Minimizing a function  section, it is possible to avoid passing gradients even when using gradient based methods. This is because Optim will call the finite central differences functionality in  Calculus.jl  in those cases. The advantages are clear: you do not have to write the gradients yourself, and it works for any function you can pass to Optim. However, there is another good way of making the computer provide gradients: automatic differentiation. Again, the advantage is that you can easily get gradients from the objective function alone. As opposed to finite difference, these gradients are exact and we also get Hessians for Newton's method. They can perform better than a finite differences scheme, depending on the exact problem. The disadvantage is that the objective function has to be written using only Julia code, so no calls to BLAS or Fortran functions.  Let us consider the Rosenbrock example again.  function   f ( x ) \n     return   ( 1.0   -   x [ 1 ]) ^ 2   +   100.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 ) ^ 2  end  function   g! ( storage ,   x ) \n     storage [ 1 ]   =   - 2.0   *   ( 1.0   -   x [ 1 ])   -   400.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 )   *   x [ 1 ] \n     storage [ 2 ]   =   200.0   *   ( x [ 2 ]   -   x [ 1 ] ^ 2 )  end  function   h! ( storage ,   x ) \n     storage [ 1 ,   1 ]   =   2.0   -   400.0   *   x [ 2 ]   +   1200.0   *   x [ 1 ] ^ 2 \n     storage [ 1 ,   2 ]   =   - 400.0   *   x [ 1 ] \n     storage [ 2 ,   1 ]   =   - 400.0   *   x [ 1 ] \n     storage [ 2 ,   2 ]   =   200.0  end  initial_x   =   zeros ( 2 )   Let us see if BFGS and Newton's Method can solve this problem with the functions provided.  julia   Optim . minimizer ( optimize ( f ,   g! ,   h! ,   initial_x ,   BFGS ()))  2-element Array{Float64,1}:   1.0   1.0  julia   Optim . minimizer ( optimize ( f ,   g! ,   h! ,   initial_x ,   Newton ()))  2-element Array{Float64,1}:   1.0   1.0   This is indeed the case. 
Now let us use finite differences for BFGS.  julia>   Optim . minimizer ( optimize ( f ,   initial_x ,   BFGS ()))  2-element Array{Float64,1}:   1.0   1.0   Still looks good. Returning to automatic differentiation, let us try both solvers using this method.  We enable  forward mode  automatic differentiation by adding  autodiff = :forward  when we construct a  OnceDifferentiable  instance.  julia>   od   =   OnceDifferentiable ( f ,   initial_x ;   autodiff   =   : forward );  julia>   Optim . minimizer ( optimize ( od ,   initial_x ,   BFGS ()))  2-element Array{Float64,1}:   1.0   1.0  julia>   td   =   TwiceDifferentiable ( f ,   initial_x ;   autodiff   =   : forward )  julia>   Optim . minimizer ( optimize ( td ,   initial_x ,   Newton ()))  2-element Array{Float64,1}:   1.0   1.0   Indeed, the minimizer was found, without providing any gradients or Hessians.", 
            "title": "Automatic Differentiation"
        }, 
        {
            "location": "/algo/linesearch/", 
            "text": "Line search\n\n\n\n\nDescription\n\n\nThe line search functionality has been moved to \nLineSearches.jl\n.\n\n\nLine search is used to decide the step length along the direction computed by an optimization algorithm.\n\n\nThe following \nOptim\n algorithms use line search:\n\n\n\n\nAccelerated Gradient Descent\n\n\n(L-)BFGS\n\n\nConjugate Gradient\n\n\nGradient Descent\n\n\nMomentum Gradient Descent\n\n\nNewton\n\n\n\n\nBy default \nOptim\n calls the line search algorithm \nHagerZhang()\n provided by \nLineSearches\n. Different line search algorithms can be assigned with the \nlinesearch\n keyword argument to the given algorithm.\n\n\nLineSearches\n also allows the user to decide how the initial step length for the line search algorithm is chosen. This is set with the \nalphaguess\n keyword argument for the \nOptim\n algorithm. The default procedure varies.\n\n\n\n\nExample\n\n\nThis example compares two different line search algorithms on the Rosenbrock problem.\n\n\nFirst, run \nNewton\n with the default line search algorithm:\n\n\nusing\n \nOptim\n,\n \nLineSearches\n\n\nprob\n \n=\n \nOptim\n.\nUnconstrainedProblems\n.\nexamples\n[\nRosenbrock\n]\n\n\n\nalgo_hz\n \n=\n \nNewton\n(;\nalphaguess\n \n=\n \nLineSearches\n.\nInitialStatic\n(),\n \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n())\n\n\nres_hz\n \n=\n \nOptim\n.\noptimize\n(\nprob\n.\nf\n,\n \nprob\n.\ng!\n,\n \nprob\n.\nh!\n,\n \nprob\n.\ninitial_x\n,\n \nmethod\n=\nalgo_hz\n)\n\n\n\n\n\n\nThis gives the result\n\n\n \n*\n \nAlgorithm\n:\n \nNewton\ns\n \nMethod\n\n \n*\n \nStarting\n \nPoint\n:\n \n[\n0.0\n,\n0.0\n]\n\n \n*\n \nMinimizer\n:\n \n[\n0.9999999999999994\n,\n0.9999999999999989\n]\n\n \n*\n \nMinimum\n:\n \n3.081488e-31\n\n \n*\n \nIterations\n:\n \n14\n\n \n*\n \nConvergence\n:\n \ntrue\n\n   \n*\n \n|\nx\n \n-\n \nx\n|\n \n \n1.0e-32\n:\n \nfalse\n\n     \n|\nx\n \n-\n \nx\n|\n \n=\n \n3.06e-09\n\n   \n*\n \n|\nf\n(\nx\n)\n \n-\n \nf\n(\nx\n)\n|\n \n/\n 
\n|\nf\n(\nx\n)\n|\n \n<\n \n1.0e-32\n:\n \nfalse\n\n     \n|\nf\n(\nx\n)\n \n-\n \nf\n(\nx\n'\n)\n|\n \n/\n \n|\nf\n(\nx\n)\n|\n \n=\n \n2.94e+13\n\n   \n*\n \n|\ng\n(\nx\n)\n|\n \n<\n \n1.0e-08\n:\n \ntrue\n\n     \n|\ng\n(\nx\n)\n|\n \n=\n \n1.11e-15\n\n   \n*\n \nstopped\n \nby\n \nan\n \nincreasing\n \nobjective\n:\n \nfalse\n\n   \n*\n \nReached\n \nMaximum\n \nNumber\n \nof\n \nIterations\n:\n \nfalse\n\n \n*\n \nObjective\n \nCalls\n:\n \n44\n\n \n*\n \nGradient\n \nCalls\n:\n \n44\n\n \n*\n \nHessian\n \nCalls\n:\n \n14\n\n\n\n\n\n\nNow we can try \nNewton\n with the More-Thuente line search:\n\n\nalgo_mt\n \n=\n \nNewton\n(;\nalphaguess\n \n=\n \nLineSearches\n.\nInitialStatic\n(),\n \nlinesearch\n \n=\n \nLineSearches\n.\nMoreThuente\n())\n\n\nres_mt\n \n=\n \nOptim\n.\noptimize\n(\nprob\n.\nf\n,\n \nprob\n.\ng!\n,\n \nprob\n.\nh!\n,\n \nprob\n.\ninitial_x\n,\n \nmethod\n=\nalgo_mt\n)\n\n\n\n\n\n\nThis gives the following result, reducing the number of function and gradient calls:\n\n\nResults\n \nof\n \nOptimization\n \nAlgorithm\n\n \n*\n \nAlgorithm\n:\n \nNewton\n'\ns\n \nMethod\n\n \n*\n \nStarting\n \nPoint\n:\n \n[\n0.0\n,\n0.0\n]\n\n \n*\n \nMinimizer\n:\n \n[\n0.9999999999999992\n,\n0.999999999999998\n]\n\n \n*\n \nMinimum\n:\n \n2.032549e-29\n\n \n*\n \nIterations\n:\n \n14\n\n \n*\n \nConvergence\n:\n \ntrue\n\n   \n*\n \n|\nx\n \n-\n \nx\n'\n|\n \n<\n \n1.0e-32\n:\n \nfalse\n\n     \n|\nx\n \n-\n \nx\n'\n|\n \n=\n \n3.67e-08\n\n   \n*\n \n|\nf\n(\nx\n)\n \n-\n \nf\n(\nx\n'\n)\n|\n \n/\n \n|\nf\n(\nx\n)\n|\n \n<\n \n1.0e-32\n:\n \nfalse\n\n     \n|\nf\n(\nx\n)\n \n-\n \nf\n(\nx\n'\n)\n|\n \n/\n \n|\nf\n(\nx\n)\n|\n \n=\n \n1.66e+13\n\n   \n*\n \n|\ng\n(\nx\n)\n|\n \n<\n \n1.0e-08\n:\n \ntrue\n\n     \n|\ng\n(\nx\n)\n|\n \n=\n \n1.76e-13\n\n   \n*\n \nstopped\n \nby\n \nan\n \nincreasing\n \nobjective\n:\n \nfalse\n\n   \n*\n \nReached\n \nMaximum\n \nNumber\n \nof\n \nIterations\n:\n \nfalse\n\n \n*\n \nObjective\n \nCalls\n:\n \n17\n\n \n*\n \nGradient\n \nCalls\n:\n 
\n17\n\n \n*\n \nHessian\n \nCalls\n:\n \n14\n\n\n\n\n\n\n\n\nReferences", 
            "title": "Linesearch"
        }, 
        {
            "location": "/algo/linesearch/#line-search", 
            "text": "", 
            "title": "Line search"
        }, 
        {
            "location": "/algo/linesearch/#description", 
            "text": "The line search functionality has been moved to  LineSearches.jl .  Line search is used to decide the step length along the direction computed by an optimization algorithm.  The following  Optim  algorithms use line search:   Accelerated Gradient Descent  (L-)BFGS  Conjugate Gradient  Gradient Descent  Momentum Gradient Descent  Newton   By default  Optim  calls the line search algorithm  HagerZhang()  provided by  LineSearches . Different line search algorithms can be assigned with the  linesearch  keyword argument to the given algorithm.  LineSearches  also allows the user to decide how the initial step length for the line search algorithm is chosen. This is set with the  alphaguess  keyword argument for the  Optim  algorithm. The default procedure varies.", 
            "title": "Description"
        }, 
        {
            "location": "/algo/linesearch/#example", 
            "text": "This example compares two different line search algorithms on the Rosenbrock problem.  First, run  Newton  with the default line search algorithm:  using   Optim ,   LineSearches  prob   =   Optim . UnconstrainedProblems . examples [ Rosenbrock ]  algo_hz   =   Newton (; alphaguess   =   LineSearches . InitialStatic (),   linesearch   =   LineSearches . HagerZhang ())  res_hz   =   Optim . optimize ( prob . f ,   prob . g! ,   prob . h! ,   prob . initial_x ,   method = algo_hz )   This gives the result    *   Algorithm :   Newton s   Method \n  *   Starting   Point :   [ 0.0 , 0.0 ] \n  *   Minimizer :   [ 0.9999999999999994 , 0.9999999999999989 ] \n  *   Minimum :   3.081488e-31 \n  *   Iterations :   14 \n  *   Convergence :   true \n    *   | x   -   x |     1.0e-32 :   false \n      | x   -   x |   =   3.06e-09 \n    *   | f ( x )   -   f ( x ) |   /   | f ( x ) |     1.0e-32 :   false \n      | f ( x )   -   f ( x ) |   /   | f ( x ) |   =   2.94e+13 \n    *   | g ( x ) |     1.0e-08 :   true \n      | g ( x ) |   =   1.11e-15 \n    *   stopped   by   an   increasing   objective :   false \n    *   Reached   Maximum   Number   of   Iterations :   false \n  *   Objective   Calls :   44 \n  *   Gradient   Calls :   44 \n  *   Hessian   Calls :   14   Now we can try  Newton  with the More-Thuente line search:  algo_mt   =   Newton (; alphaguess   =   LineSearches . InitialStatic (),   linesearch   =   LineSearches . MoreThuente ())  res_mt   =   Optim . optimize ( prob . f ,   prob . g! ,   prob . h! ,   prob . 
initial_x ,   method = algo_mt )   This gives the following result, reducing the number of function and gradient calls:  Results   of   Optimization   Algorithm \n  *   Algorithm :   Newton's   Method \n  *   Starting   Point :   [ 0.0 , 0.0 ] \n  *   Minimizer :   [ 0.9999999999999992 , 0.999999999999998 ] \n  *   Minimum :   2.032549e-29 \n  *   Iterations :   14 \n  *   Convergence :   true \n    *   | x   -   x' |   <   1.0e-32 :   false \n      | x   -   x' |   =   3.67e-08 \n    *   | f ( x )   -   f ( x' ) |   /   | f ( x ) |   <   1.0e-32 :   false \n      | f ( x )   -   f ( x' ) |   /   | f ( x ) |   =   1.66e+13 \n    *   | g ( x ) |   <   1.0e-08 :   true \n      | g ( x ) |   =   1.76e-13 \n    *   stopped   by   an   increasing   objective :   false \n    *   Reached   Maximum   Number   of   Iterations :   false \n  *   Objective   Calls :   17 \n  *   Gradient   Calls :   17 \n  *   Hessian   Calls :   14", 
            "title": "Example"
        }, 
        {
            "location": "/algo/linesearch/#references", 
            "text": "", 
            "title": "References"
        }, 
        {
            "location": "/algo/precondition/", 
            "text": "Preconditioning\n\n\nThe \nGradientDescent\n, \nConjugateGradient\n and \nLBFGS\n methods support preconditioning. A preconditioner can be thought of as a change of coordinates under which the Hessian is better conditioned. With a good preconditioner, substantially improved convergence is possible.\n\n\nA preconditioner \nP\n can be of any type as long as the following two methods are implemented:\n\n\n\n\nA_ldiv_B!(pgr, P, gr)\n : apply the inverse of \nP\n to a vector \ngr\n and store the result in \npgr\n     (intuitively, \npgr = P \\ gr\n)\n\n\ndot(x, P, y)\n : the inner product induced by \nP\n     (intuitively, \ndot(x, P * y)\n)\n\n\n\n\nPrecisely what these operations mean depends on how \nP\n is stored. Commonly, we store a matrix \nP\n that roughly approximates the Hessian. In this case,\n\n\n\n\nA_ldiv_B!(pgr, P, gr) = copy!(pgr, P \\ gr)\n\n\ndot(x, P, y) = dot(x, P * y)\n\n\n\n\nFinally, it is possible to update the preconditioner as the state variable \nx\n changes. This is done through  \nprecondprep!\n, which is passed to the optimizers as a keyword argument, e.g.,\n\n\n   \nmethod\n=\nConjugateGradient\n(\nP\n \n=\n \nprecond\n(\n100\n),\n \nprecondprep!\n \n=\n \nprecond\n(\n100\n))\n\n\n\n\n\n\nthough in this case it would always return the same matrix. 
(See \nfminbox.jl\n for a more natural example.)\n\n\nApart from preconditioning with matrices, \nOptim.jl\n provides a type \nInverseDiagonal\n, which represents a diagonal matrix by the inverses of its diagonal elements.\n\n\n\n\nExample\n\n\nBelow, we see an example where a function is minimized without and with a preconditioner applied.\n\n\nusing\n \nOptim\n,\n \nForwardDiff\n\n\ninitial_x\n \n=\n \nzeros\n(\n100\n)\n\n\nplap\n(\nU\n;\n \nn\n \n=\n \nlength\n(\nU\n))\n \n=\n \n(\nn\n-\n1\n)\n*\nsum\n((\n0.1\n \n+\n \ndiff\n(\nU\n)\n.^\n2\n)\n.^\n2\n \n)\n \n-\n \nsum\n(\nU\n)\n \n/\n \n(\nn\n-\n1\n)\n\n\nplap1\n(\nx\n)\n \n=\n \nForwardDiff\n.\ngradient\n(\nplap\n,\nx\n)\n\n\nprecond\n(\nn\n)\n \n=\n \nspdiagm\n((\n-\nones\n(\nn\n-\n1\n),\n \n2\n*\nones\n(\nn\n),\n \n-\nones\n(\nn\n-\n1\n)),\n \n(\n-\n1\n,\n0\n,\n1\n),\n \nn\n,\n \nn\n)\n*\n(\nn\n+\n1\n)\n\n\ndf\n \n=\n \nOnceDifferentiable\n(\nx\n \n->\n \nplap\n([\n0\n;\n \nx\n;\n \n0\n]),\n\n                            \n(\ng\n,\n \nx\n)\n \n->\n \ncopy!\n(\ng\n,\n \n(\nplap1\n([\n0\n;\n \nx\n;\n \n0\n]))[\n2\n:\nend\n-\n1\n]))\n\n\nresult\n \n=\n \nOptim\n.\noptimize\n(\ndf\n,\n \ninitial_x\n,\n \nmethod\n \n=\n \nConjugateGradient\n(\nP\n \n=\n \nnothing\n))\n\n\nresult\n \n=\n \nOptim\n.\noptimize\n(\ndf\n,\n \ninitial_x\n,\n \nmethod\n \n=\n \nConjugateGradient\n(\nP\n \n=\n \nprecond\n(\n100\n)))\n\n\n\n\n\n\nThe former optimize call converges at a slower rate than the latter. Looking at a  plot of the 2D version of the function shows the problem.\n\n\n\n\nThe contours are shaped like ellipses, but we would prefer them to be circles. Using the preconditioner effectively changes the coordinates such that the contours become less elongated. Benchmarking shows that using preconditioning provides  an approximate speed-up factor of 15 in this 100-dimensional case.\n\n\n\n\nReferences", 
            "title": "Preconditioners"
        }, 
        {
            "location": "/algo/precondition/#preconditioning", 
            "text": "The  GradientDescent ,  ConjugateGradient  and  LBFGS  methods support preconditioning. A preconditioner can be thought of as a change of coordinates under which the Hessian is better conditioned. With a good preconditioner, substantially improved convergence is possible.  A preconditioner  P  can be of any type as long as the following two methods are implemented:   A_ldiv_B!(pgr, P, gr)  : apply the inverse of  P  to a vector  gr  and store the result in  pgr      (intuitively,  pgr = P \\ gr )  dot(x, P, y)  : the inner product induced by  P      (intuitively,  dot(x, P * y) )   Precisely what these operations mean depends on how  P  is stored. Commonly, we store a matrix  P  that roughly approximates the Hessian. In this case,   A_ldiv_B!(pgr, P, gr) = copy!(pgr, P \\ gr)  dot(x, P, y) = dot(x, P * y)   Finally, it is possible to update the preconditioner as the state variable  x  changes. This is done through   precondprep! , which is passed to the optimizers as a keyword argument, e.g.,      method = ConjugateGradient ( P   =   precond ( 100 ),   precondprep!   =   precond ( 100 ))   though in this case it would always return the same matrix. (See  fminbox.jl  for a more natural example.)  Apart from preconditioning with matrices,  Optim.jl  provides a type  InverseDiagonal , which represents a diagonal matrix by the inverses of its diagonal elements.", 
            "title": "Preconditioning"
        }, 
        {
            "location": "/algo/precondition/#example", 
            "text": "Below, we see an example where a function is minimized without and with a preconditioner applied.  using   Optim ,   ForwardDiff  initial_x   =   zeros ( 100 )  plap ( U ;   n   =   length ( U ))   =   ( n - 1 ) * sum (( 0.1   +   diff ( U ) .^ 2 ) .^ 2   )   -   sum ( U )   /   ( n - 1 )  plap1 ( x )   =   ForwardDiff . gradient ( plap , x )  precond ( n )   =   spdiagm (( - ones ( n - 1 ),   2 * ones ( n ),   - ones ( n - 1 )),   ( - 1 , 0 , 1 ),   n ,   n ) * ( n + 1 )  df   =   OnceDifferentiable ( x   ->   plap ([ 0 ;   x ;   0 ]), \n                             ( g ,   x )   ->   copy! ( g ,   ( plap1 ([ 0 ;   x ;   0 ]))[ 2 : end - 1 ]))  result   =   Optim . optimize ( df ,   initial_x ,   method   =   ConjugateGradient ( P   =   nothing ))  result   =   Optim . optimize ( df ,   initial_x ,   method   =   ConjugateGradient ( P   =   precond ( 100 )))   The former optimize call converges at a slower rate than the latter. Looking at a  plot of the 2D version of the function shows the problem.   The contours are shaped like ellipses, but we would prefer them to be circles. Using the preconditioner effectively changes the coordinates such that the contours become less elongated. Benchmarking shows that using preconditioning provides  an approximate speed-up factor of 15 in this 100-dimensional case.", 
            "title": "Example"
        }, 
        {
            "location": "/algo/precondition/#references", 
            "text": "", 
            "title": "References"
        }, 
        {
            "location": "/algo/complex/", 
            "text": "Complex optimization\n\n\nOptimization of functions defined on complex inputs (C^n to R) is supported by simply passing a complex \nx0\n as input. All zeroth and first order optimization algorithms are supported. For now, only explicit gradients are supported.\n\n\nThe gradient of a complex-to-real function is defined as the unique vector \ng\n such that \nf(x+h) = f(x) + real(g' * h) + O(h^2)\n. This is sometimes written \ng = df/d(z*) = df/d(re(z)) + i df/d(im(z))\n.\n\n\nThe gradient of a C^n to R function is a C^n to C^n map. Even if it is differentiable when seen as a function of R^2n to R^2n, it might not be complex-differentiable. For instance, take f(z) = Re(z)^2. Then g(z) = 2 Re(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a C^n to R function is in general not well-defined as an n x n complex matrix (only as a 2n x 2n real matrix), so second-order optimization algorithms are not directly applicable. To use second-order optimization, convert to real variables.", 
            "title": "Complex optimization"
        }, 
        {
            "location": "/algo/complex/#complex-optimization", 
            "text": "Optimization of functions defined on complex inputs (C^n to R) is supported by simply passing a complex  x0  as input. All zeroth and first order optimization algorithms are supported. For now, only explicit gradients are supported.  The gradient of a complex-to-real function is defined as the unique vector  g  such that  f(x+h) = f(x) + real(g' * h) + O(h^2) . This is sometimes written  g = df/d(z*) = df/d(re(z)) + i df/d(im(z)) .  The gradient of a C^n to R function is a C^n to C^n map. Even if it is differentiable when seen as a function of R^2n to R^2n, it might not be complex-differentiable. For instance, take f(z) = Re(z)^2. Then g(z) = 2 Re(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a C^n to R function is in general not well-defined as an n x n complex matrix (only as a 2n x 2n real matrix), so second-order optimization algorithms are not directly applicable. To use second-order optimization, convert to real variables.", 
            "title": "Complex optimization"
        }, 
        {
            "location": "/algo/manifolds/", 
            "text": "Manifold optimization\n\n\nOptim.jl supports the minimization of functions defined on Riemannian manifolds, i.e. with simple constraints such as normalization and orthogonality. The basic idea of such algorithms is to project back (\"retract\") each iterate of an unconstrained minimization method onto the manifold. To use this, pass a \nmanifold\n keyword argument to the optimizer.\n\n\n\n\nHowto\n\n\nHere is a simple test case where we minimize the Rayleigh quotient \n<x, A x>\n of a symmetric matrix \nA\n under the constraint \n||x|| = 1\n, finding an eigenvector associated with the lowest eigenvalue of \nA\n.\n\n\nn\n \n=\n \n10\n\n\nA\n \n=\n \nDiagonal\n(\nlinspace\n(\n1\n,\n2\n,\nn\n))\n\n\nf\n(\nx\n)\n \n=\n \nvecdot\n(\nx\n,\nA\n*\nx\n)\n/\n2\n\n\ng\n(\nx\n)\n \n=\n \nA\n*\nx\n\n\ng!\n(\nstor\n,\nx\n)\n \n=\n \ncopy!\n(\nstor\n,\ng\n(\nx\n))\n\n\nx0\n \n=\n \nrandn\n(\nn\n)\n\n\n\nmanif\n \n=\n \nOptim\n.\nSphere\n()\n\n\nOptim\n.\noptimize\n(\nf\n,\n \ng!\n,\n \nx0\n,\n \nOptim\n.\nConjugateGradient\n(\nmanifold\n=\nmanif\n))\n\n\n\n\n\n\n\n\nSupported solvers and manifolds\n\n\nAll first-order optimization methods are supported.\n\n\nThe following manifolds are currently supported:\n\n\n\n\nFlat: Euclidean space, default. Standard unconstrained optimization.\n\n\nSphere: spherical constraint \n||x|| = 1\n\n\nStiefel: Stiefel manifold of N by n matrices with orthonormal columns, i.e. \nX'*X = I\n\n\n\n\nThe following meta-manifolds construct manifolds out of pre-existing ones:\n\n\n\n\nPowerManifold: identical copies of a specified manifold\n\n\nProductManifold: product of two (potentially different) manifolds\n\n\n\n\nSee \ntest/multivariate/manifolds.jl\n for usage examples.\n\n\nImplementing new manifolds is as simple as adding methods \nproject_tangent!(M::YourManifold,x)\n and \nretract!(M::YourManifold,g,x)\n. 
If you implement another manifold or optimization method, please contribute a PR!\n\n\n\n\nReferences\n\n\nThe Geometry of Algorithms with Orthogonality Constraints, Alan Edelman, Tom\u00e1s A. Arias, Steven T. Smith, SIAM J. Matrix Anal. Appl., 20(2), 303\u2013353\n\n\nOptimization Algorithms on Matrix Manifolds, P.-A. Absil, R. Mahony, R. Sepulchre, Princeton University Press, 2008", 
            "title": "Manifolds"
        }, 
        {
            "location": "/algo/manifolds/#manifold-optimization", 
            "text": "Optim.jl supports the minimization of functions defined on Riemannian manifolds, i.e. with simple constraints such as normalization and orthogonality. The basic idea of such algorithms is to project back (\"retract\") each iterate of an unconstrained minimization method onto the manifold. To use this, pass a  manifold  keyword argument to the optimizer.", 
            "title": "Manifold optimization"
        }, 
        {
            "location": "/algo/manifolds/#howto", 
            "text": "Here is a simple test case where we minimize the Rayleigh quotient  <x, A x>  of a symmetric matrix  A  under the constraint  ||x|| = 1 , finding an eigenvector associated with the lowest eigenvalue of  A .  n   =   10  A   =   Diagonal ( linspace ( 1 , 2 , n ))  f ( x )   =   vecdot ( x , A * x ) / 2  g ( x )   =   A * x  g! ( stor , x )   =   copy! ( stor , g ( x ))  x0   =   randn ( n )  manif   =   Optim . Sphere ()  Optim . optimize ( f ,   g! ,   x0 ,   Optim . ConjugateGradient ( manifold = manif ))", 
            "title": "Howto"
        }, 
        {
            "location": "/algo/manifolds/#supported-solvers-and-manifolds", 
            "text": "All first-order optimization methods are supported.  The following manifolds are currently supported:   Flat: Euclidean space, default. Standard unconstrained optimization.  Sphere: spherical constraint  ||x|| = 1  Stiefel: Stiefel manifold of N by n matrices with orthonormal columns, i.e.  X'*X = I   The following meta-manifolds construct manifolds out of pre-existing ones:   PowerManifold: identical copies of a specified manifold  ProductManifold: product of two (potentially different) manifolds   See  test/multivariate/manifolds.jl  for usage examples.  Implementing new manifolds is as simple as adding methods  project_tangent!(M::YourManifold,x)  and  retract!(M::YourManifold,g,x) . If you implement another manifold or optimization method, please contribute a PR!", 
            "title": "Supported solvers and manifolds"
        }, 
        {
            "location": "/algo/manifolds/#references", 
            "text": "The Geometry of Algorithms with Orthogonality Constraints, Alan Edelman, Tom\u00e1s A. Arias, Steven T. Smith, SIAM J. Matrix Anal. Appl., 20(2), 303\u2013353  Optimization Algorithms on Matrix Manifolds, P.-A. Absil, R. Mahony, R. Sepulchre, Princeton University Press, 2008", 
            "title": "References"
        }, 
        {
            "location": "/dev/contributing/", 
            "text": "Notes for contributing\n\n\nWe are always happy to get help from people who normally do not contribute to the package. However, to make the process run smoothly, we ask you to read this page before creating your pull request. That way it is more likely that your changes will be incorporated, and in the end it will mean less work for everyone.\n\n\n\n\nThings to consider\n\n\nWhen proposing a change to \nOptim.jl\n, there are a few things to consider. If you're in doubt, feel free to reach out. A simple way to get in touch is to join our \ngitter channel\n.\n\n\nBefore submitting a pull request, please consider the following points:\n\n\n\n\nDid you remember to provide tests for your changes? If not, please do so, or ask for help.\n\n\nDid your change add new functionality? Remember to add a section in the documentation.\n\n\nDid you change existing code in a breaking way? Then remember to use Julia's deprecation tools to help users migrate to the new syntax.\n\n\nAdd a note in the NEWS.md file, so we can keep track of changes between versions.\n\n\n\n\n\n\nAdding a solver\n\n\nIf you're contributing a new solver, you shouldn't need to touch any of the code in \nsrc/optimize.jl\n. Instead, add a file named \nsolver.jl\n in \nsrc\n (where \nsolver\n is the name of the solver), and make sure that you define an \nOptimizer\n subtype \nstruct Solver <: Optimizer end\n with appropriate fields, a default constructor with a keyword for each field, a state type that holds all variables that are (re)used throughout the iterative procedure, an \ninitial_state\n that initializes such a state, and  an \nupdate!\n method that does the actual work. 
Say you want to contribute a solver called \nMinim\n, then your \nsrc/minim.jl\n file would look something like\n\n\nstruct\n \nMinim\n{\nIF\n,\n \nF\n<:\nFunction\n,\n \nT\n}\n \n<:\n \nOptimizer\n\n    \nalphaguess!\n::\nIF\n\n    \nlinesearch!\n::\nF\n\n    \nminim_parameter\n::\nT\n\n\nend\n\n\n\nMinim\n(;\n \nalphaguess\n \n=\n \nLineSearches\n.\nInitialStatic\n(),\n \nlinesearch\n \n=\n \nLineSearches\n.\nHagerZhang\n(),\n \nminim_parameter\n \n=\n \n1.0\n)\n \n=\n\n  \nMinim\n(\nalphaguess\n,\n \nlinesearch\n,\n \nminim_parameter\n)\n\n\n\nmutable\n \nstruct\n \nMinimState\n{\nT\n,\nN\n,\nG\n}\n\n  \nx\n::\nArray\n{\nT\n,\nN\n}\n\n  \nx_previous\n::\nArray\n{\nT\n,\nN\n}\n\n  \nf_x_previous\n::\nT\n\n  \ns\n::\nArray\n{\nT\n,\nN\n}\n\n  \n@add_linesearch_fields\n()\n\n\nend\n\n\n\nfunction\n \ninitial_state\n(\nmethod\n::\nMinim\n,\n \noptions\n,\n \nd\n,\n \ninitial_x\n)\n\n\n#\n \nprepare\n \ncache\n \nvariables\n \netc\n \nhere\n\n\n\nend\n\n\n\nfunction\n \nupdate!\n{\nT\n}(\nd\n,\n \nstate\n::\nMinimState\n{\nT\n},\n \nmethod\n::\nMinim\n)\n\n    \n#\n \ncode\n \nfor\n \nMinim\n \nhere\n\n    \nfalse\n \n#\n \nshould\n \nthe\n \nprocedure\n \nforce\n \nquit\n?\n\n\nend", 
            "title": "Contributing"
        }, 
        {
            "location": "/dev/contributing/#notes-for-contributing", 
            "text": "We are always happy to get help from people who normally do not contribute to the package. However, to make the process run smoothly, we ask you to read this page before creating your pull request. That way it is more likely that your changes will be incorporated, and in the end it will mean less work for everyone.", 
            "title": "Notes for contributing"
        }, 
        {
            "location": "/dev/contributing/#things-to-consider", 
            "text": "When proposing a change to  Optim.jl , there are a few things to consider. If you're in doubt, feel free to reach out. A simple way to get in touch is to join our  gitter channel .  Before submitting a pull request, please consider the following points:   Did you remember to provide tests for your changes? If not, please do so, or ask for help.  Did your change add new functionality? Remember to add a section in the documentation.  Did you change existing code in a breaking way? Then remember to use Julia's deprecation tools to help users migrate to the new syntax.  Add a note in the NEWS.md file, so we can keep track of changes between versions.", 
            "title": "Things to consider"
        }, 
        {
            "location": "/dev/contributing/#adding-a-solver", 
            "text": "If you're contributing a new solver, you shouldn't need to touch any of the code in  src/optimize.jl . Instead, add a file named  solver.jl  in  src  (where  solver  is the name of the solver), and make sure that you define an  Optimizer  subtype  struct Solver <: Optimizer end  with appropriate fields, a default constructor with a keyword for each field, a state type that holds all variables that are (re)used throughout the iterative procedure, an  initial_state  that initializes such a state, and  an  update!  method that does the actual work. Say you want to contribute a solver called  Minim , then your  src/minim.jl  file would look something like  struct   Minim { IF ,   F <: Function ,   T }   <:   Optimizer \n     alphaguess! :: IF \n     linesearch! :: F \n     minim_parameter :: T  end  Minim (;   alphaguess   =   LineSearches . InitialStatic (),   linesearch   =   LineSearches . HagerZhang (),   minim_parameter   =   1.0 )   = \n   Minim ( alphaguess ,   linesearch ,   minim_parameter )  mutable   struct   MinimState { T , N , G } \n   x :: Array { T , N } \n   x_previous :: Array { T , N } \n   f_x_previous :: T \n   s :: Array { T , N } \n   @add_linesearch_fields ()  end  function   initial_state ( method :: Minim ,   options ,   d ,   initial_x )  #   prepare   cache   variables   etc   here  end  function   update! { T }( d ,   state :: MinimState { T },   method :: Minim ) \n     #   code   for   Minim   here \n     false   #   should   the   procedure   force   quit ?  end", 
            "title": "Adding a solver"
        }, 
        {
            "location": "/LICENSE/", 
            "text": "Optim.jl is licensed under the MIT License:\n\n\nCopyright (c) 2012: John Myles White and other contributors.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \nSoftware\n), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \nAS IS\n, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.", 
            "title": "License"
        }
    ]
}