WIP: constrained optimization #50

Closed
wants to merge 29 commits into
from

Conversation

6 participants
@timholy
Contributor

timholy commented Mar 11, 2014

This is not done yet, but I thought I should put it out there so people can comment sooner rather than later, if it looks like this is going in the wrong direction.

When finished, this should implement all the standard inequality constraints; equality constraints may come, but likely will be a little later (I have a pressing need for the inequalities, but not the equalities). The only algorithm I've implemented is a central-path-following interior-point solver---but I'd venture to say that if you could only pick one, interior-point is the way to go.

Using this, a version of nnls (just two lines: objective = linlsq(A, b); result = interior(objective, initial_x, bounds, method=:newton)) is 100x faster on a 50x50 problem than the existing implementation of nnls in Optim.

Left to do:

  • Add support for constraints to other linesearch algorithms
  • Implement value, gradient, and hessian for barrier functions of linear and nonlinear constraints
  • Add more tests

In addition to adding support for constrained optimization, this makes several relevant improvements to hz_linesearch, in particular ones that are smarter about some of the odd behaviors of floating-point numbers, and others that will make constrained optimization perform significantly better. Thanks to @JeffBezanson for his quick work on JuliaLang/julia#6097, making it feasible to debug some of the darker corners of hz_linesearch.

One final point: note that this changes how the problems directory is handled. I changed it because it wasn't obvious to me that we wanted this to be a regular part of loading Optim. Perhaps we should go one step further, and only load it when running tests (and not even have testpaths in Optim.jl).

@mlubin


mlubin Mar 11, 2014

Contributor

Nice! How difficult would it be to support sparse Hessians and constraint Jacobians? We could then pretty easily use this as the first pure-Julia backend for JuMP.


@timholy


timholy Mar 11, 2014

Contributor

As you might notice, I began a process of switching declarations over to AbstractMatrix and AbstractVector. So it should not be hard. The actual fill-in will of course be up to the user, just as the user currently has to write the objective function.

For the constraints, my thinking is that the user writes the functions so they add to an existing gradient/Hessian (one computed by the objective function), rather than having the algorithm provide a blank vector/matrix and then copy the results into the combined gradient/Hessian. (See the beginning of interior.jl.) That's precisely so we can better support sparsity (all that copying would destroy any advantage of sparsity).
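A minimal sketch of this accumulation convention (the function names and the barrier term are illustrative, not the PR's actual API):

```julia
# Illustrative sketch: the objective's gradient function INITIALIZES g,
# and each constraint's gradient function ADDS its term to g in place,
# so nothing is copied and sparsity/structure in g is preserved.

# Objective f(x) = 0.5*||A*x - b||^2: initializes g.
function objective_gradient!(g, x, A, b)
    g .= A' * (A * x - b)
    return g
end

# Log-barrier term -mu*sum(log.(x)) for x .>= 0: adds to g, never overwrites.
function barrier_gradient!(g, x, mu)
    g .-= mu ./ x
    return g
end

A = [2.0 0.0; 0.0 3.0]
b = [2.0, 3.0]
x = [0.5, 0.5]
g = similar(x)
objective_gradient!(g, x, A, b)   # g = A'*(A*x - b) = [-2.0, -4.5]
barrier_gradient!(g, x, 0.1)      # g is now [-2.2, -4.7]
```

Because the constraint term only ever adds into g, the same barrier function can be reused unchanged with any objective.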


@mlubin


mlubin Mar 11, 2014

Contributor

The typical interface for these sorts of solvers is to provide a callback that computes the Hessian of the Lagrangian directly, with the solver supplying the weights to the user. Check out http://ipoptjl.readthedocs.org/en/latest/ipopt.html.

We're soon going to be developing a standardized interface for this in MathProgBase, as we have for linear programming.
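As a rough illustration of that callback shape (the toy problem and the function name are hypothetical; see the Ipopt.jl docs linked above for the real signature):

```julia
# Sketch of an Ipopt-style "Hessian of the Lagrangian" callback: the solver
# supplies the objective scale sigma and the constraint multipliers lambda,
# and the user fills H = sigma*∇²f(x) + Σ lambda[i]*∇²g_i(x).
# Toy problem (hypothetical): f(x) = x[1]^2 + x[2]^2, g1(x) = x[1]*x[2].
function hessian_lagrangian!(H, x, sigma, lambda)
    fill!(H, 0.0)
    H[1, 1] += 2sigma          # sigma * ∇²f contribution
    H[2, 2] += 2sigma
    H[1, 2] += lambda[1]       # lambda[1] * ∇²g1 contribution
    H[2, 1] += lambda[1]
    return H
end

H = hessian_lagrangian!(zeros(2, 2), [0.0, 0.0], 1.0, [3.0])
# H == [2.0 3.0; 3.0 2.0]
```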


@johnmyleswhite


johnmyleswhite Mar 11, 2014

Contributor

This seems like a great start. I don't see anything that troubles me in a quick review. Will do a more detailed pass through tomorrow morning.


@timholy


timholy Mar 11, 2014

Contributor

I can see passing Lagrange multipliers/coefficients to the user---that's indeed probably a better design than my attempt to hide it from the user. But there are some other things I don't understand (or don't like):

  • Why do you need to store the gradient of each constraint separately? (I.e., why do you even need the full Jacobian?) At least for interior-point problems without equality constraints, the only thing you need is the summed gradient of both the objective and the constraints. But perhaps other solvers benefit from having the full Jacobian? In which case we'd better plan for this.

  • Why should the user have to write a single function that incorporates all the constraints together with the objective? It seems to encourage better design to have the solver make the calls that accumulate each term separately, i.e.:

    objective_gradient!(x, g)                # This initializes g and fills it with values
    constraintA_gradient!(x, g, lambda1)     # Adds to g, doesn't replace g
    constraintB_gradient!(x, g, lambda2)
    ...
    

    The key advantage is that once I implement the functions for constraintA, I can re-use it again without modification for other problems, simply by saying something like

    add_nlineq!(constraints, DifferentiableFunction(constraintA_objective, constraintA_gradient!))
    

    In many fields there are a small handful of standard penalty functions or constraints, which can be used in conjunction with many different objective functions.

  • It seems inconsistent to provide the weights for computing the Hessian but not the Jacobian.

In your case, I recognize that you simply have to live with the design choices made by Ipopt, but I'm not sure we should mimic it entirely.


@mlubin


mlubin Mar 11, 2014

Contributor

Different interior-point approaches do require the full Jacobian. If there are equality constraints, the KKT system is typically formulated as

[ H A^T ]
[ A  0  ]

where A is the Jacobian of the equality constraints (might have a transpose incorrect here). For sparse problems it's reasonable to formulate the KKT conditions using the explicit inequalities instead of eliminating them from the system. I'm mostly familiar with primal-dual interior-point methods, so things might be slightly different with primal methods. Anyway it doesn't make too much sense to have a "summing" interface to provide inequality constraints and another interface for equality constraints. The interface should also try to be uniform over different algorithms.

It's reasonable to have an interface where the terms are summed separately. This is more difficult if you have a sparse Hessian, since you can't easily add to a sparse matrix (but the trick is that solvers let you provide the matrix in triplet format, where duplicate entries are summed together). I don't have any objections to providing a convenient interface like this, as long as there's a way to write a small wrapper that transforms Ipopt-style input to use this code.

Note that almost all high-performance nonlinear solvers out there (like KNITRO, MOSEK) have an interface that looks like Ipopt's. They're certainly not user friendly, though they're meant to be used from modeling systems like AMPL, GAMS, and now JuMP that can automatically provide sparse derivatives.
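A small sketch of both points, using Julia's SparseArrays (the toy H and A are illustrative):

```julia
using SparseArrays, LinearAlgebra

# Assembling the KKT block system sketched above for a toy problem.
# H is the (sparse) Hessian, A the equality-constraint Jacobian; in practice
# the lower-right block gets an inertia-correcting perturbation rather than
# being exactly zero.
n, m = 3, 1
H = sparse(1.0I, n, n)
A = sparse([1.0 1.0 1.0])          # one linear constraint: x1 + x2 + x3 = c
K = [H A'; A spzeros(m, m)]        # (n+m) x (n+m) sparse KKT matrix

# Triplet (COO) construction sums duplicate (i, j) entries, which is the
# trick that lets objective and constraint Hessian contributions be merged
# without dense copies:
S = sparse([1, 1, 2], [1, 1, 2], [2.0, 3.0, 1.0], 2, 2)
S[1, 1]   # 5.0: the two (1,1) entries were added together
```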


@timholy


timholy Mar 11, 2014

Contributor

I did specify in my (edited) comment problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles. Suppose you add linear equality constraints; for such constraints, there's no need to ask the user to supply anything other than the matrix and rhs---the derivatives can all be handled internally, so again I don't see the need to trouble the user for the full Jacobian. Where one has to think about it is for nonlinear equality constraints, but my suspicion is that there one would be going with an augmented Lagrangian anyway, making it a somewhat different problem. I'll think about it more, however.

The solver here "looks" primal but it exploits duality---eps_gap is the duality gap. One advantage of this formulation (which is straight out of Hindi's tutorial, based in turn on Boyd & Vandenberghe) is that you can use it with methods that don't require the Hessian (cg, etc). That's already implemented, in fact. I was quite surprised that even for a nnls problem, where you have the whole matrix, cg was only ~10x slower than Newton's method.
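For intuition, a toy sketch of the central-path idea (the separable objective is hypothetical, chosen so the barrier minimizer has a closed form):

```julia
# Central-path sketch in the Boyd & Vandenberghe style: minimize
# f(x) = (x1 + 1)^2 + (x2 - 2)^2 subject to x .>= 0 by minimizing
# phi_t(x) = t*f(x) - sum(log.(x)) for increasing t. With m inequality
# constraints, the central-path point has duality gap m/t, which is the
# role eps_gap plays in the stopping rule above.
#
# For this separable f each coordinate minimizer has a closed form:
# d/dx [t*(x - c)^2 - log(x)] = 2t*(x - c) - 1/x = 0
#   =>  x = (c + sqrt(c^2 + 2/t)) / 2
central_coord(t, c) = (c + sqrt(c^2 + 2 / t)) / 2

path = [central_coord.(t, [-1.0, 2.0]) for t in (1.0, 100.0, 10_000.0)]
# As t grows, the path approaches the constrained solution (0, 2), even
# though the unconstrained minimum (-1, 2) is infeasible in x1.
```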


src/bfgs.jl
@@ -138,6 +138,18 @@ function bfgs{T}(d::Union(DifferentiableFunction,
f_x_previous, f_x = f_x, d.fg!(x, gr)
f_calls, g_calls = f_calls + 1, g_calls + 1
+ x_converged,


@johnmyleswhite

johnmyleswhite Mar 11, 2014

Contributor

Does moving the convergence check here change the inputs? If so, were the previous inputs not correct?


@timholy

timholy Mar 11, 2014

Contributor

The issue was that there were other exit points from the function (https://github.com/JuliaOpt/Optim.jl/blob/master/src/bfgs.jl#L149). So the code was reporting that it hadn't converged, even though that exit path often gets triggered precisely because it did converge.


src/constraints.jl
+ConstraintsBox{T,N}(l::AbstractArray{T,N}, u::Nothing) = ConstraintsBox{T,N}(l::AbstractArray{T,N}, infs(T, size(l)))
+ConstraintsBox{T,N}(l::Nothing, u::AbstractArray{T,N}) = ConstraintsBox{T,N}(-infs(T,size(u)), u::AbstractArray{T,N})
+
+# FIXME: need a way to indicate equality/inequality


@johnmyleswhite

johnmyleswhite Mar 11, 2014

Contributor

Can we use a BitArray to indicate equality vs. inequality?


@timholy

timholy Mar 11, 2014

Contributor

Sorry, I meant "are these to be interpreted as equality constraints, or inequality constraints?" If you like the Constraints type below, then this isn't an issue.


src/constraints.jl
+
+
+## project!
+project!(x, constraints::AbstractConstraints) = x


@johnmyleswhite

johnmyleswhite Mar 11, 2014

Contributor

Is the assumption here that specific constraints will implement their own project! function?


@timholy

timholy Mar 11, 2014

Contributor

Great question. For complicated constraints, I don't think project!, toedge, and tocorner can be of much benefit. I plan to handle those cases by having the constraint return Inf when the point has wandered into forbidden territory; the linesearch algorithm then has to backtrack.

However, for the simple case of bounds constraints, you can quickly calculate the farthest you should search, and this can be very helpful information that speeds convergence by eliminating the need to test useless points. So they seemed worth having, but not worth trying to generalize to difficult cases.
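For box constraints those helpers reduce to a clamp and a ratio test; a hedged sketch (the names echo the discussion, and the PR's actual implementations may differ):

```julia
# project! clamps x into [l, u]; maxstep returns the largest alpha such that
# x + alpha*d stays inside the box, so a linesearch never evaluates
# infeasible points.
project!(x, l, u) = (x .= clamp.(x, l, u); x)

function maxstep(x, d, l, u)
    alpha = Inf
    for i in eachindex(x)
        if d[i] > 0
            alpha = min(alpha, (u[i] - x[i]) / d[i])   # distance to upper bound
        elseif d[i] < 0
            alpha = min(alpha, (l[i] - x[i]) / d[i])   # distance to lower bound
        end
    end
    return alpha
end

maxstep([0.5, 0.5], [1.0, -1.0], [0.0, 0.0], [1.0, 1.0])  # 0.5
```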


@johnmyleswhite


johnmyleswhite Mar 11, 2014

Contributor

Having read this more carefully, one question I have is why we should have a linlsq function in Optim. What's the use case for it?


@tkelman


tkelman Mar 11, 2014

problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles

Most of the serious large-scale solvers use a row-wise interface with a lower and an upper bound for each constraint, one of which can be +/- inf. Equality constraints have lb[i] = ub[i]. From https://projects.coin-or.org/Ipopt/wiki

   min      f(x)
 x in R^n
   s.t.     g_L <= g(x) <= g_U
            x_L <=  x   <= x_U

Or pick your favorite canonical form, I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed.

The KKT system has the Jacobians of both equality and inequality constraints in it, in the sparse non-convex case you don't want to eliminate anything ahead of time, the sparse symmetric indefinite linear solver will do a better job at reducing fill-in. The zero block in the lower right isn't usually actually zero, an inertia perturbation is required to guarantee convergence properties.

When some constraint rows are linear, you could technically leave those nonzeros in the Jacobian untouched between iterations and save a few copies, but the savings there are minimal compared to the KKT factorization cost.
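A minimal sketch of that row-wise convention (toy bounds, not from any particular solver):

```julia
# Each constraint row i carries bounds (g_L[i], g_U[i]); an equality row is
# simply g_L[i] == g_U[i], and a one-sided inequality uses +/-Inf on the
# free side. Classifying rows, and counting the slack variables needed to
# convert to the equality-plus-bounds canonical form:
g_L = [1.0, -Inf, 0.0]
g_U = [1.0,  5.0, Inf]

iseq   = g_L .== g_U      # [true, false, false]: row 1 is an equality
nslack = count(!, iseq)   # one slack variable per inequality row: 2
```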


@timholy


timholy Mar 11, 2014

Contributor

one question I have is why we should have a linlsq function in Optim

This creates an objective function for least squares, but then you can supplement this with constraints. Using this, nonnegative least-squares becomes a few lines:

objective = linlsq(A, b)
bounds = ConstraintsBox(zeros(eltype(initial_x), length(initial_x)), nothing)
results = interior(objective, initial_x, bounds, method=:newton)

As I mentioned above, I tested this on a particular 50x50 problem and found this version to be roughly 100x faster than the nnls implementation I wrote initially, which is currently in Optim.


@mlubin


mlubin Mar 11, 2014

Contributor

linlsq seems a bit restrictive. What if you want to do L1- or L2-regularized least squares? It's okay to have it as an internal helper, but should it be exported?


@johnmyleswhite


johnmyleswhite Mar 11, 2014

Contributor

Building nnls in three lines of code is a really great example for Optim. But I guess I think nnls belongs in a separate package, since it's more of a use case for Optim than part of Optim per se.


@timholy


timholy Mar 12, 2014

Contributor

Or pick your favorite canonical form, I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed.

Got it, I hadn't thought through the implications of having two bounds on each (non)linear constraint carefully enough. That will work well for a Newton method, but I have to think through the implications for implementation with CG (which for my problems is the much more important algorithm)---it seems like unifying inequalities and equalities could result in troublesome behavior with respect to roundoff errors.

linlsq seems a bit restrictive. What if you want to do L1 or L2 regularized least squares? It's okay to have as an internal helper, but should it be exported?

But I guess I think nnls belongs in a separate package

OK, these can go.

Since this PR may take some redesign and I have deadlines, this may sit for a while.


@johnmyleswhite


johnmyleswhite Mar 14, 2014

Contributor

Ok. I'll be excited to see where this heads whenever you pick it up again.


@timholy timholy referenced this pull request in JuliaOpt/MathProgBase.jl May 24, 2014

Merged

RFC: NLP interface #32

@Ken-B Ken-B referenced this pull request Nov 30, 2014

Closed

extra cg arguments in fminbox #89

Cody-G and others added some commits Feb 4, 2015

fixed some deprecation warnings thrown by julia 0.4. Used Compat package to ensure compatibility with earlier Julia versions.
Merge pull request #99 from Cody-G/fix_deprecations
fixed deprecation warnings thrown by julia 0.4.  Did NOT use the Compat ...

@mlubin mlubin referenced this pull request Mar 11, 2015

Open

usage with JuMP? #107

timholy added some commits Apr 9, 2015

timholy added some commits Sep 23, 2015

Julia 0.4 update
Binding deprecations, mostly.
@johnmyleswhite


johnmyleswhite Oct 14, 2015

Contributor

@timholy: Any interest in getting this branch ready to merge? It seems like the desire for consistency with JuMP had the unfortunate effect of delaying all progress on making your work widely available. If we're going to consider redesigning the outward facing API, we should make sure we have basic support for constrained optimization in place first.


src/interior.jl
+ end
+ end
+ if method == :newton
+ return iterior_newton(objective, initial_x, constraints; t=t, mu=mu, eps_gap=eps_gap) # FIXME


@johnmyleswhite

johnmyleswhite Oct 14, 2015

Contributor

Typo here?


@timholy


timholy Oct 14, 2015

Contributor

In just the last two weeks, I've abandoned this in my lab's internal code and switched to MathProgBase. Not because I don't think this is promising, but because the effort to rewrite/finish this to make it consistent with the MathProgBase architecture is not small. Optimization is really important to me, but between my lab's internal needs and the big revamps occurring in the world of Images I just don't have time.


@timholy


timholy Oct 14, 2015

Contributor

I should add that I have no reservations about anyone grabbing this and running with it, if it seems like a useful starting point.


@mlubin


mlubin Oct 14, 2015

Contributor

Which solver have you been using? Ipopt?


@timholy


timholy Oct 14, 2015

Contributor

Yes. (I also have a fix to Ipopt I need to make---it sometimes finds a solution that's worse than the original warmstart solution, and we should at least check for that. The fix is to twiddle the barrier-penalty options; the bottleneck is finding a simple test case that reproduces this behavior.)


@tkelman


tkelman Oct 14, 2015

Are you warmstarting all of primal, dual, and the barrier parameter? It can be tricky to entirely satisfy all of the convergence requirements of an IP solver when warmstarting (there's quite a bit of internal algorithmic state, in the filter line search and elsewhere), and you may end up converging to local optima fairly far from your initial point (even when that initial point is or is close to a better solution) in non-convex problems.

It would be good to continue brainstorming APIs that enable matrix-free operator approaches to large scale constrained optimization, but several of the key operations for that are already standardized in MathProgBase (just not used yet by any existing solvers, to my knowledge).


@timholy

Contributor

timholy commented Oct 14, 2015

Ipopt's warmstart only sets the primal parameters, as far as I can tell. And yes, this is a case of multiple minima, where I'm able to give a better initial guess than the minimum found by Ipopt after the barrier pushes it out of its starting basin.

@timholy

Contributor

timholy commented Oct 14, 2015

Speaking of API, I'll start one aspect of discussion over here.

@tkelman

tkelman commented Oct 14, 2015

This is off topic for this repo, but without duals there's no way for Ipopt to know if the point you gave it satisfies KKT conditions. Dual warmstart might not be exposed in one or more of Ipopt.jl, MathProgBase.jl, or JuMP, but it should be available in Ipopt itself (if a bit buried).

Tweaking options or doing extra checks should usually be in user code, not solver wrappers, unless there's an existing Julia API for them.
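As a reminder of why the duals matter here: for Ipopt's standard form, minimize f(x) subject to c(x) = 0 and x ≥ 0, the first-order KKT conditions it checks involve the constraint multipliers λ and bound multipliers z explicitly, so a primal-only point cannot be certified optimal:

```latex
\begin{aligned}
\nabla f(x) + \nabla c(x)\,\lambda - z &= 0 && \text{(stationarity)} \\
c(x) &= 0 && \text{(primal feasibility)} \\
x \ge 0,\quad z &\ge 0 && \text{(dual feasibility)} \\
x_i z_i &= 0 \ \ \forall i && \text{(complementarity)}
\end{aligned}
```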

@mlubin

Contributor

mlubin commented Oct 14, 2015

Yes, dual warm starts are possible in Ipopt but not currently exposed.

timholy added some commits Oct 21, 2015

Guarantee that the initial direction is a descent direction for objective w/o constraints

This prevents the barrier penalty from kicking the solution out of the starting basin
Pass constraints through to minimizers
For box constraints, this allows the algorithm to pick alphamax so the constraints will not be violated
@timholy

Contributor

timholy commented Oct 21, 2015

I'm back to being interested in this branch again. Reference: https://groups.google.com/forum/#!topic/julia-opt/TVmuXFWfeBM. In contrast, here I see no overhead from the solver, and the progress towards the minimum per iteration also seems better. I haven't yet measured it per function evaluation; that's likely to look a little less favorable, because I noticed that our linesearch not infrequently uses more evaluations than Ipopt. (There's some evidence that might be a good thing (p. 2164), but it seems likely to reduce the speed of optimization.)

I kinda wanted to get out of the business of writing my own optimization code, and maybe I still will if someone comes up with another solution. But I thought I'd at least ping this issue to say it may not be dead yet. Another thing about Optim that I like is that I can use Float32s (my 100GB disk file would be a 200GB disk file had I used Float64s), and I worry a little about the minimizer detecting "edges" among adjacent Float32 values if it's using Float64s as coordinates for the optimization.

Since I've just been playing with this, here are a couple of random API thoughts:

  • With barrier methods, you have a combined objective. With show_trace=true, it's nice to show just the part that corresponds to the user's objective. The architecture here doesn't yet make that easy, because what the "outer" optimizer sees is the total function penalty (user-objective + barrier terms).
  • The API here isn't yet as flexible as MathProgBase. However, I note that it takes substantially fewer "user" lines of code. (Making it more flexible might reduce that advantage, of course.)
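To make the first bullet concrete, here is a minimal sketch (hypothetical names, not code from this branch) of a wrapper that lets the trace report the user's f(x) separately from the combined f(x) + μ·B(x) that the outer optimizer minimizes:

```julia
# Hedged sketch: a callable wrapper that stashes the user-objective value
# so show_trace can display it separately from the barrier-augmented total.
mutable struct BarrierObjective{F,B}
    f::F              # user objective
    barrier::B        # barrier term B(x)
    mu::Float64       # barrier penalty weight
    last_f::Float64   # last user-objective value, for tracing
end

function (obj::BarrierObjective)(x)
    fx = obj.f(x)
    obj.last_f = fx                    # record the user part for the trace
    return fx + obj.mu * obj.barrier(x)  # what the outer optimizer sees
end

obj = BarrierObjective(x -> sum(abs2, x), x -> -sum(log, x), 0.1, NaN)
combined = obj([1.0, 2.0])  # combined objective f + μB
obj.last_f                  # just the user objective, 5.0
```

The outer solver only ever calls `obj`, while the trace printer reads `obj.last_f`.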
@mlubin

Contributor

mlubin commented Oct 22, 2015

@timholy, I'm open to suggestions like JuliaOpt/MathProgBase.jl#87 on how to make the MPB API easier to both use and implement on the solver side.

Automatically try a restart if we think we've converged
This should cover cases where dphi < 0 "by a hair," i.e., where the search direction is almost orthogonal to the gradient. It's an extra layer of security beyond commit 50b9d60 to increase the likelihood that when we terminate, we really have reached a minimum.
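The "by a hair" case described in this commit message can be detected by comparing the directional derivative against a tolerance scaled by the gradient and step norms. A hypothetical helper (an illustration of the idea, not the commit's actual code):

```julia
# Hedged sketch: decide whether dphi = gr'*s is meaningfully negative, or so
# close to zero that the search direction s is nearly orthogonal to the
# gradient and a restart (e.g. resetting s to the steepest-descent direction)
# is warranted.
using LinearAlgebra

function needs_restart(gr::AbstractVector, s::AbstractVector; rtol = sqrt(eps(Float64)))
    dphi = dot(gr, s)
    dphi >= 0 && return true                   # not a descent direction at all
    return -dphi <= rtol * norm(gr) * norm(s)  # descent only "by a hair"
end

needs_restart([1.0, 0.0], [-1.0, 0.0])    # strong descent: no restart
needs_restart([1.0, 0.0], [-1e-20, -1.0]) # nearly orthogonal: restart
```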
@timholy

Contributor

timholy commented Oct 22, 2015

Other than what I suggested already, I can't think of any other ways to improve on what you've done---at the end of the day you have to be able to interact with C code, and the API you've already designed is quite good for that. I just wanted to throw in a tweak or two.

But of course new ideas may crop up in the course of implementation.

@@ -89,7 +89,7 @@ macro cgtrace()
dt["g(x)"] = copy(gr)
dt["Current step size"] = alpha
end
- grnorm = norm(gr, Inf)
+ grnorm = norm(gr[:], Inf)

@johnmyleswhite

johnmyleswhite Jan 12, 2016

Contributor

Can you use vecnorm(gr, Inf) here?

@timholy

timholy Jan 26, 2016

Contributor

Yeah.

Thanks for the comment!

@timholy

Contributor

timholy commented Jan 26, 2016

I am under the illusion that I'll soon have a few free days to work on Optim, and might want to put this into shape for merger. Any API thoughts about this PR would be welcome; I think I'll need to make quite a few changes.

@mlubin

Contributor

mlubin commented Jan 26, 2016

@timholy, I'd say choose the API that seems most natural to you, so long as it's possible to write a lightweight wrapper to take in MathProgBase input as well. There's no need for the native API to the solver to be exactly MathProgBase.

@timholy


Contributor

timholy commented Jan 26, 2016

Aside from the integration into MathProgBase, what I wonder most about is the proper API for specifying constraints. Ipopt, for example, does not appear to have a special category for linear constraints, but I assume that would be a useful thing to specialize on. Likewise, #50 (comment) suggests two possible APIs for specifying equality or inequality constraints. As I think about it, I lean towards making everything an equality constraint and then specifying bounds on the slack variables, but any comments would be welcome.

From the user's perspective, the good news is that no matter how crappy an API I design, it will still look pretty when accessed via the remarkable JuMP 😄. But that's probably not something to shoot for.
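The slack-variable formulation suggested above turns every inequality into an equality plus a simple bound, so the solver only ever specializes on one constraint type:

```latex
\min_x f(x) \ \text{s.t.}\ c(x) \le 0
\quad\Longrightarrow\quad
\min_{x,s} f(x) \ \text{s.t.}\ c(x) + s = 0,\ s \ge 0
```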

@tkelman

tkelman commented Jan 27, 2016

If you want the API to be appropriate for problems that may have general nonlinear equality constraints, then my personal favorite canonical form (equalities and bounds) hasn't changed over the last 2 years. Conversions between canonical forms are simple so the user-facing API can use something more general like the row-wise form even if the algorithm internally uses something different. For other classes of problems you may want the internal implementation to use a different form (and/or API) though.

In the Ipopt algorithm, linear constraints can at best save you a subset of Jacobian row evaluations, and those are typically not the bottleneck relative to the solution of the KKT system for the Newton step. You can set an option flag if all constraints happen to be linear then the Jacobian will only be evaluated once. Whether it's worth the API complication of allowing situations in between, either via a linearity bitmask or separate inputs for linear vs nonlinear constraints (Matlab fmincon style), depends mostly on whether you expect Jacobian evaluations to be expensive.
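The "option flag" mentioned above corresponds to real Ipopt options; a hedged sketch of setting them, again assuming the MathProgBase-era `IpoptSolver` keyword interface forwards them:

```julia
# Hedged sketch: telling Ipopt the constraint Jacobian is constant (all
# constraints linear), so it is evaluated only once. Requires Ipopt.jl.
using Ipopt

solver = IpoptSolver(
    jac_c_constant = "yes",  # equality-constraint Jacobian is constant
    jac_d_constant = "yes",  # inequality-constraint Jacobian is constant
)
```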

timholy added some commits Jan 27, 2016

Change convergence criteria in interior_newton to monitor solution, not eps_gap

eps_gap introduced an absolute scale for convergence, which seemed problematic. This should be more robust, as it monitors the characteristics of the solution. It should also avoid making changes to the barrier that no longer have a consequence for the solution; previous approaches introduced numerical stability problems when the barrier penalty became extreme.
interior_newton: pass options, bail if encounter numeric instability
Also cleans up a bit of the tracing & convergence-testing

@pkofod pkofod referenced this pull request in cortner/ConstrainedOptim.jl Sep 24, 2016

Open

API #5

@timholy timholy closed this Nov 8, 2016

@timholy

Contributor

timholy commented Nov 8, 2016

Don't delete this branch just yet, though. People may be using it.

@pkofod

Collaborator

pkofod commented Nov 8, 2016

Well, we could let it stay forever. It could be used as a "masterclass in git-fu"... I know I've tried to rebase this quite a few times, only to give up halfway through!

@timholy

Contributor

timholy commented Nov 9, 2016

Probably a good idea. Certainly people in my lab are using it!
