
1. Projected CG

Antonio Horta Ribeiro edited this page Aug 25, 2017 · 3 revisions

During my first week I intend to work on an iterative solver for equality constrained quadratic programming (EQP) problems, of the form:

min 1/2 x.T H x + g.T x
subject to A x = b

I intend to implement a solver known as projected conjugate gradient (CG). This method is very similar to the conjugate gradient method, with a few modifications to take the equality constraints into account. The implementation will be based on Section 16.3 of Nocedal and Wright's book [1].
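For reference, this EQP can also be solved directly from its KKT conditions with a dense linear solver. A minimal sketch (the function name and interface are illustrative, not the planned scipy API):

```python
import numpy as np
from scipy.linalg import solve

def eqp_kkt(H, g, A, b):
    """Solve min 1/2 x.T H x + g.T x  s.t.  A x = b via the KKT system.

    The KKT conditions are H x + A.T lam = -g and A x = b, assembled as
    [[H, A.T], [A, 0]] [x; lam] = [-g; b].
    """
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, b])
    sol = solve(K, rhs)
    return sol[:n], sol[n:]  # primal solution x, multipliers lam
```

For example, minimizing x1^2 + x2^2 subject to x1 + x2 = 1 (i.e. H = 2I, g = 0) gives x = (0.5, 0.5).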

  • Theoretical revision
    • Read Sections 16.1-16.4 from Nocedal and Wright's book [1];
    • Read the paper about EQP problems [2];
    • Write a blog post about the method.
  • Implementation
    • Implement an EQP solver based on direct factorization of the KKT equations (for comparison purposes);
    • Implement the normal equation approach to compute the projection matrix (described in [1], p. 462);
    • Implement the augmented system approach to compute the projection matrix (described in [1], p. 463);
    • Implement projections for dense matrices;
    • Implement the iterative refinement described in [2];
    • Implement Algorithm 16.2 from [1];
    • Add the capability to deal with trust-region constraints (using the Steihaug approach described on pp. 171-172 of [1]).
  • Testing
    • Test the solver on EQP problems from CUTEst;
    • Test iterative refinement.
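The normal equation approach mentioned above ([1], p. 462) computes the projection of a vector v onto the null space of A as P v = v - A.T (A A.T)^{-1} A v, factoring only A A.T. A minimal dense sketch, assuming A has full row rank (the function name is illustrative only):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def nullspace_projection(A):
    """Return a callable projecting vectors onto null(A).

    Uses the normal equation approach: the Cholesky factorization of
    A A.T is computed once and reused for every projection.
    """
    c = cho_factor(A @ A.T)  # requires A to have full row rank

    def project(v):
        # P v = v - A.T (A A.T)^{-1} A v
        return v - A.T @ cho_solve(c, A @ v)

    return project
```

For A = [[1, 1]] the null space is spanned by (1, -1), so projecting (1, 0) yields (0.5, -0.5).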

[1] Nocedal, Jorge, and Stephen J. Wright. "Numerical Optimization." Second Edition (2006).

[2] Gould, Nicholas I. M., Mary E. Hribar, and Jorge Nocedal. "On the solution of equality constrained quadratic programming problems arising in optimization." SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.


Questions from Nikolay

At the moment it is not very clear to me which methods are going to be used in the final solver.

  1. For problems with dense G: do you plan to use the null space method with QR factorization to find the matrices Y and Z? What about a direct solution of the KKT system using a general linear solver such as scipy.linalg.solve? I figured it doesn't make much sense to use a sparse solver when G is dense, am I right? I guess the null space method is preferable because it accounts for the special structure of the problem and should have lower computational complexity, whereas a general solver will ignore the block of zeros in the matrix.
  2. For problems with sparse G: here I'm confused from both theoretical and practical points of view. As far as I can tell, the savings from using projected CG (compared to solving the KKT system) depend on the method used:
    1. When using the normal equation approach: you need to factorize only A A^T, as opposed to the full system.
    2. When using the augmented system approach: the resulting matrix is simpler than the KKT matrix, since G is replaced by some preconditioner matrix (the identity in the simplest case). Do you know how to measure the savings in this case? It doesn't look obvious to me, as it will depend on how the sparse factorization procedure works internally. Am I correct here?
  3. Now to the practical issues of sparse problems: your implementation currently depends on cholesky_AAt, but what are the possible replacements from scipy? I'm aware only of LSMR/LSQR, which are iterative procedures themselves, so stacking two iterative procedures at this stage is a bit too much in my opinion (and each requires its own accuracy settings and is not particularly fast).
  4. What if we use the augmented system approach with the SuperLU implementation instead? Again, I don't fully understand how to measure the time savings in this case (or maybe I'm missing something). In any case, they can be measured by run time directly.
  5. Another alternative is to simply use direct KKT factorization with LU, as is already implemented. Could it be a good, simple, and reliable approach (perhaps not the most efficient)?
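For what it's worth, the augmented system approach with SuperLU that question 4 raises could look roughly like the sketch below: the matrix [[I, A.T], [A, 0]] is factorized once with scipy.sparse.linalg.splu and the factorization is reused for each projection (the identity block plays the role of the simplest preconditioner; the function name is illustrative only):

```python
import numpy as np
import scipy.sparse as sps
from scipy.sparse.linalg import splu

def augmented_projection(A):
    """Return a callable projecting vectors onto null(A), for sparse A.

    Solving [[I, A.T], [A, 0]] [x; y] = [v; 0] gives x = P v in the
    first block: x = v - A.T y with A x = 0.
    """
    m, n = A.shape
    K = sps.bmat([[sps.eye(n), A.T],
                  [A, None]], format="csc")
    lu = splu(K)  # one sparse LU factorization, reused for every v

    def project(v):
        rhs = np.concatenate([v, np.zeros(m)])
        return lu.solve(rhs)[:n]

    return project
```

The run-time comparison against the normal equation approach can then be done directly, as suggested.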

Response

So far I have implemented three main functions, eqp_kktfact, projections and projected_cg.

  • The method eqp_kktfact solves the EQP problem by the direct factorization of the KKT matrix. It will not be used by the nonlinear trust-region solver and was implemented for comparison purposes.
  • The method projections provides projections onto the row space and the null space of a given matrix. These projections are needed by the projected CG method and by other substeps of the nonlinear trust-region solver as well.
  • The method projected_cg is a substep of the solver.

1 - It is important to highlight that the trust-region substep of the method will add trust-region constraints to the EQP subproblem. And while we can add those constraints seamlessly to the projected CG method, direct factorization methods cannot take them into account as easily. So the idea is to use projected CG as a substep regardless of whether the matrix is sparse or not. In the function projections, however, I could include variations depending on whether the matrix A is sparse or dense.

2 - I have just finished implementing the three approaches: direct factorization, and projected_cg with both the normal equation and the augmented system approach. I will try to do some comparisons between them. It is worth noting, however, that in [2] the authors point out benefits of using the augmented system approach. But it is not clear to me either how much cheaper it is to factorize the augmented system than the KKT matrix itself. I think some numerical tests may help us find out.
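For concreteness, the core projected CG iteration (Algorithm 16.2 of [1], without the trust-region safeguards) can be sketched as follows. This is a simplified sketch, not the actual projected_cg implementation: it assumes a feasible starting point x0 with A x0 = b and a `project` callable that maps vectors onto the null space of A.

```python
import numpy as np

def projected_cg(H, g, project, x0, tol=1e-10, maxiter=100):
    """Minimize 1/2 x.T H x + g.T x over the null-space directions of A.

    Standard CG on the residual, except each residual is projected onto
    null(A) so that every iterate stays feasible (A x = b).
    """
    x = x0.copy()
    r = H @ x + g          # gradient of the quadratic at x
    gp = project(r)        # projected residual
    d = -gp                # initial search direction
    rg = r @ gp
    for _ in range(maxiter):
        if rg < tol:
            break
        Hd = H @ d
        alpha = rg / (d @ Hd)
        x = x + alpha * d
        r = r + alpha * Hd
        gp = project(r)
        rg_new = r @ gp
        d = -gp + (rg_new / rg) * d   # conjugate direction update
        rg = rg_new
    return x
```

On the toy problem min x1^2 + x2^2 subject to x1 + x2 = 1, starting from the feasible point (1, 0), this converges to (0.5, 0.5) in a single CG step.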

4 - I just implemented that in my last commit. My idea is to compare the normal equation approach with the augmented system approach to see how they perform on some test problems.

5 - I did implement a simple approach using direct factorization in the method eqp_kktfact, and I think it will be useful for comparison purposes. Nevertheless, I don't think we will be able to use it as a substep of the nonlinear solver, since there isn't (at least as far as I know) an easy way to add the trust-region constraints to the problem.

From Nikolay

  1. I got your point about the necessity to account for trust-region constraints. From this description I got the impression that a direct solution of the KKT system might also be used in some variations of the full nonlinear algorithm.
  2. After comparison between normal equation and augmented system approaches we can decide whether it is worth it to provide CHOLMOD option (via sksparse or try to add an implementation to scipy).
  3. As for the "iterative refinement" --- I think that until you really feel the need to implement it (by seeing some poor results), it is perhaps not necessary at this point.

So I see the plan perhaps as follows:

  1. Implement tests from CUTEst (at least some of them).
  2. Understand the difference between the two approaches and determine what will be used in the final solver using those tests.
  3. Move on to the main solver, returning back to improve EQP part as necessary. I guess Matt said a similar thing.

Questions from Matt

None at this time : ) I like this presentation format.

Regarding the normal equation approach vs. the augmented system approach: if it is anything like IP methods for linear programming, then although the augmented system is larger, it can be an easier system to solve in special cases. However, for the IP LP solver the system was solved directly, and the advantage of the augmented system had to do with avoiding fill-in, so that may not be relevant for an iterative solver.

So all I really have to suggest is do some preliminary tests as you plan, pick a default, and leave the other one in as an option. I don't think we have to think about it terribly carefully right now.