# CHAPTER 8 - Equality Constrained Minimisation

---
---

**Author:** Dr Giordano Scarciotti (g.scarciotti@imperial.ac.uk) - Imperial College London 

**Module:** ELEC70066 - Advanced Optimisation

**Version:** 1.1.2 - 24/02/2023

---
---

The material of this chapter is adapted from $[1]$.

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/TWHJeijnmsw"></iframe>')

In this chapter we describe methods for solving a convex optimisation problem with equality constraints

$$
\begin{array}{ll}
\min & f(x) \\
s.t. & Ax=b 
\end{array} \tag{1}
$$

where $f:\mathbb{R}^n\to\mathbb{R}$ is convex and twice continuously differentiable and $A \in \mathbb{R}^{p\times n}$ with $\textbf{rank }A=p < n $, i.e. the equality constraints are independent and there are more variables than constraints. We also assume that the optimal solution $x^*$ exists (and so the problem is feasible).

From the KKT optimality conditions, we know that $x^*$ is optimal if and only if there exists $\nu^*$ such that

$$
\begin{array}{ll}
Ax^*=b & \textit{Primal feasibility equations} \\
\nabla f(x^*) + A^\top \nu^* = 0 & \textit{Dual feasibility equations} \tag{2}
\end{array}
$$

This is a set of $n+p$ equations. The primal feasibility equations are linear while the dual feasibility equations are nonlinear. This nonlinearity makes it in general difficult to solve these equations.

In this chapter we will cover four different methods to solve problem $(1)$:

1.   (Section 8.1.1) Reduction to unconstrained problems by eliminating the equality constraints.
2.   (Section 8.1.2) Solving the dual with an unconstrained method and then recovering the solution to $(1)$ from the dual solution.

However, these methods usually destroy the structure (i.e. sparsity) that the problem may have. Exploiting structure motivates us to develop methods that solve $(2)$ directly. We will cover two methods to solve $(2)$:

3.   (Section 8.2) Newton’s method with equality constraints.
4.   (Section 8.3) Infeasible start Newton's method.

We will also see that methods 1 and 3 are equivalent (they produce the same iterations) while method 4 is a generalisation of the previous methods. The chapter concludes with a discussion of the implementation and a comparison (Section 8.4).


# 8.1 Reduction to Unconstrained Problems

## 8.1.1 Eliminating Equality Constraints

A general method for solving $(1)$ is to eliminate the equality constraint and solve the resulting unconstrained problem. To this end, we find a matrix $F\in\mathbb{R}^{n\times(n-p)}$ such that $\mathcal{R}(F) = \mathcal{N}(A)$ (i.e. the range of $F$ is the nullspace of $A$) and a vector $\hat x$ that solves $Ax=b$. Then, the feasible set can be parametrized as

$$
\{x:Ax=b\} = \{Fz + \hat x: z\in\mathbb{R}^{n-p}\}.
$$

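This parametrization is easy to verify. On the one hand, $A (Fz + \hat x) = AFz + A\hat x = 0 + b$, because $Fz\in\mathcal{N}(A)$ and $\hat x$ is a solution of the primal feasibility equation. On the other hand, if $Ax=b$ then $A(x-\hat x)=0$, so $x-\hat x\in\mathcal{N}(A)=\mathcal{R}(F)$ and $x = Fz+\hat x$ for some $z$. Thus problem $(1)$ is equivalent to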

$$
\begin{array}{ll}
\displaystyle \min_z & f(Fz + \hat x). 
\end{array}\tag{3}
$$

Solving $(3)$, we recover the solution of $(1)$ as $x^* = Fz^* + \hat x$. We can also trivially find the optimal dual variable $\nu^*$ by simply solving the dual feasibility equation, yielding

$$
\nu^* = - (AA^\top)^{-1}A \nabla f(x^*).
$$

There are many possible choices for the elimination matrix $F$. It can be shown that changing the elimination matrix is equivalent to changing coordinates of the reduced problem. A standard method to find $F$ is by using the so-called [QR factorisation](https://en.wikipedia.org/wiki/QR_decomposition). However, if $A$ is sparse, the QR factorisation should be avoided as it usually produces a dense $F$. In this case, one could use a sparse $LU$ factorisation.
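
As an illustration of the elimination approach, the following minimal sketch builds $F$ from a full QR factorisation of $A^\top$ and checks numerically that every point $Fz+\hat x$ is feasible. The random data, the sizes `p` and `n`, and the use of `numpy.linalg.lstsq` to obtain $\hat x$ are assumptions made only for this example; any other solution of $Ax=b$ would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 7
A = rng.standard_normal((p, n))  # a random A has rank p with probability one
b = rng.standard_normal(p)

# Full QR factorisation of A^T (n x p): A^T = Q R with Q orthogonal (n x n).
Q, R = np.linalg.qr(A.T, mode="complete")
F = Q[:, p:]  # the last n - p columns of Q span the nullspace of A

# Any solution of A x = b will do; lstsq returns the minimum-norm one.
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]

# Every point F z + x_hat is feasible, whatever z is.
z = rng.standard_normal(n - p)
x = F @ z + x_hat
print("||A F||     =", np.linalg.norm(A @ F))
print("||A x - b|| =", np.linalg.norm(A @ x - b))
```

Both printed norms should be at the level of rounding errors. Note that this $F$ is dense, which is consistent with the remark above about sparse problems.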


---

**Example 8.1:** (*Optimal allocation with resource constraint*) Consider the problem

$$
\begin{array}{ll}
\displaystyle \min_{x_1,\dots,x_{n}} & \sum_{i=1}^n f_i(x_i) \\
s.t. & \sum_{i=1}^n x_i=b.
\end{array}
$$

This problem can be interpreted as the problem of optimally allocating a
single resource, with a fixed total amount $b$ (the budget), to $n$ otherwise independent activities. We can eliminate the constraint by using (for instance)

$$
x_n = b - x_1 - \cdots - x_{n-1} \tag{4}
$$

yielding

$$
\begin{array}{ll}
\min & f_n(b - x_1 - \cdots - x_{n-1})  + \sum_{i=1}^{n-1} f_i(x_i)
\end{array}
$$

**Exercise 8.1:** work out $\hat x$ and $F$ corresponding to the choice $(4)$.

***EDIT THE FILE TO ADD YOUR SOLUTION HERE***

---

## 8.1.2 Solving Equality Constrained Problems via the Dual

The second approach we cover consists in solving the dual and then solving $\nabla L(x, \nu^*) = 0$ (we have already seen this in Section 6.4.1 on Duality). The dual function of $(1)$ is 

$$
g(\nu) = -b^\top \nu + \inf_x (f(x) + \nu^\top Ax) = -b^\top \nu - f^*(-A^\top \nu)
$$

where $f^*$ is the conjugate of $f$. So the dual problem of $(1)$ is

$$
\begin{array}{ll}
\max & -b^\top \nu - f^*(-A^\top \nu) 
\end{array} \tag{5}
$$

Since by assumption there is an optimal point, the problem is strictly feasible, so Slater’s condition holds. Therefore strong duality holds, and the dual optimum is attained. If $g$ is twice differentiable, the optimal solution $\nu^*$ can be found by solving $(5)$ with an unconstrained minimisation method. The optimal primal solution $x^*$ can then be found by solving $\nabla L(x, \nu^*) = 0$. An example (*Entropy maximisation*) is available in Section 6.4.1.

---

**Example 8.2:** (*Equality constrained analytic center*) Consider the problem

$$
\begin{array}{ll}
\min & f(x) = - \sum_{i=1}^n \log x_i\\
s.t. & Ax = b   
\end{array}
$$

where $A \in \mathbb{R}^{p\times n}$, with implicit constraint $x \succ 0$. Using $f^*(y) = -n - \sum_{i=1}^n \log(- y_i)$, the dual problem is

$$
\max \,\,\, g(\nu) = -b^\top \nu + n + \sum_{i=1}^n \log(A^\top \nu)_i
$$

with implicit constraint $A^\top \nu \succ 0$. 

For this example we can easily solve the dual feasibility equation, i.e. find the minimiser of $L(x,\nu^*)$

$$
\nabla f(x) + A^\top \nu^* = -(1/x_1,\dots,1/x_n) + A^\top \nu^* = 0
$$

which gives $x_i(\nu^*) = 1/(A^\top \nu^*)_i$. Thus, to solve the analytic centering problem, we first solve the unconstrained dual problem and then recover the solution of the primal problem as $x_i(\nu^*) = 1/(A^\top \nu^*)_i$.
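
The two-stage procedure of this example can be sketched as follows. This is only an illustration under stated assumptions: the dual is maximised with a plain Newton iteration with backtracking (one possible unconstrained method), the function names `dual_objective` and `solve_dual` are hypothetical, and the data are random, with positive entries in $A$ so that the primal feasible set is bounded and $\nu^{(0)}=\mathbf{1}$ satisfies $A^\top\nu^{(0)}\succ 0$. The gradient $-b + Ay$ and Hessian $-A\,\textbf{diag}(y)^2A^\top$ of $g$, with $y_i = 1/(A^\top\nu)_i$, follow by direct differentiation.

```python
import numpy as np

def dual_objective(nu, A, b):
    """g(nu) = -b^T nu + n + sum_i log (A^T nu)_i, for A^T nu > 0."""
    return -b @ nu + A.shape[1] + np.sum(np.log(A.T @ nu))

def solve_dual(A, b, nu0, eps=1e-12, alpha=0.1, beta=0.5, max_iter=100):
    nu = nu0.copy()
    for _ in range(max_iter):
        y = 1.0 / (A.T @ nu)
        grad = -b + A @ y                  # gradient of g
        S = (A * y**2) @ A.T               # A diag(y)^2 A^T = minus the Hessian of g
        dnu = np.linalg.solve(S, grad)     # Newton (ascent) direction
        lam2 = grad @ dnu                  # squared Newton decrement
        if lam2 / 2 <= eps:
            break
        t = 1.0
        while np.min(A.T @ (nu + t * dnu)) <= 0:   # stay in the domain of g
            t *= beta
        # Backtracking for a maximisation problem: require sufficient increase.
        while dual_objective(nu + t * dnu, A, b) < dual_objective(nu, A, b) + alpha * t * lam2:
            t *= beta
        nu = nu + t * dnu
    return nu

rng = np.random.default_rng(0)
p, n = 5, 20
A = rng.uniform(0.1, 1.0, size=(p, n))      # positive entries keep the feasible set bounded
b = A @ rng.uniform(0.5, 1.5, size=n)       # guarantees that a feasible x > 0 exists
nu_star = solve_dual(A, b, nu0=np.ones(p))  # A^T 1 > 0, so nu0 is in the domain
x_star = 1.0 / (A.T @ nu_star)              # primal recovery
print("||A x* - b|| =", np.linalg.norm(A @ x_star - b))
```

The recovered point is positive by construction, and its primal residual should be small because $\nabla g(\nu^*) = -b + Ax(\nu^*) = 0$ at the dual optimum.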

---

# 8.2 Newton’s Method with Equality Constraints

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/IvmC5XGLok0"></iframe>')

## 8.2.1 Special Case: Quadratic Minimisation

Before discussing the general case, we look at the case in which $f$ is quadratic as this will help us to understand the method in the general case. Consider the problem

$$
\begin{array}{ll}
\min & \frac{1}{2}x^\top P x + q^\top x + r \\
s.t. & Ax=b 
\end{array} \tag{6}
$$

where $P\in\mathbb{S}^n_+$ and $A\in\mathbb{R}^{p\times n}$. In this case the optimality conditions $(2)$ become

$$
\left[\begin{array}{ll}P & A^\top\\ A & 0 \end{array}\right]\left[\begin{array}{l} x^* \\ \nu^* \end{array}\right]=\left[\begin{array}{r} -q\\b \end{array}\right].
$$

This is a set of $n+p$ linear equations in $n+p$ variables called the **KKT system**. The coefficient matrix is called the **KKT matrix**. When the KKT matrix is nonsingular, there is a unique optimal primal-dual pair $(x^*, \nu^*)$. If the KKT matrix is singular, but the KKT system is solvable, any solution yields an optimal pair $(x^*, \nu^*)$. If the KKT system is not solvable, the quadratic optimisation problem is unbounded below or infeasible.

Given the assumption that $P\in\mathbb{S}^n_+$ and $\textbf{rank }A=p < n$ there are several ways in which the nonsingularity of the KKT matrix can be characterised. For instance, nonsingularity is achieved when $P$ and $A$ have no nontrivial common nullspace, or equivalently, if $x^\top Px>0$ for all $x\ne0$ belonging to the nullspace of $A$. It follows that if $P\succ 0$, then the KKT matrix is nonsingular.  
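
To make the quadratic case concrete, here is a minimal sketch that assembles the KKT matrix and solves the KKT system with a dense solver. The random data are assumptions for illustration only; $P$ is built to be positive definite so that, by the last remark above, the KKT matrix is nonsingular. The last lines compare $\nu^*$ with the formula $\nu^* = -(AA^\top)^{-1}A\nabla f(x^*)$ of Section 8.1.1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
M = rng.standard_normal((n, n))
P = M.T @ M + np.eye(n)          # positive definite by construction
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

# Assemble and solve the KKT system [[P, A^T], [A, 0]] [x; nu] = [-q; b].
KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

# Check the optimality conditions (2): here grad f(x) = P x + q.
print("primal residual:", np.linalg.norm(A @ x_star - b))
print("dual residual:  ", np.linalg.norm(P @ x_star + q + A.T @ nu_star))

# The dual variable can also be recovered from x* alone.
nu_formula = -np.linalg.solve(A @ A.T, A @ (P @ x_star + q))
print("difference:     ", np.linalg.norm(nu_formula - nu_star))
```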

## 8.2.2 The Newton Step

We now describe a Newton method that handles equality constraints. The method is almost identical to the unconstrained method but for two differences: we assume that the initial point is feasible and that the Newton step is a feasible direction, i.e. $A \Delta x_{nt}=0$.

The derivation of the Newton step follows exactly the same ideas as the unconstrained Newton's method: we replace the objective function in $(1)$ with its second-order Taylor approximation near $x$, yielding

$$
\begin{array}{ll}
\min & \hat f(x+v) = f(x) + \nabla f(x)^\top v + \frac{1}{2}v^\top \nabla^2 f(x) v \\
s.t. & A(x+v)=b 
\end{array}
$$

with variable $v$ (note that here $x$ is fixed). This is a convex quadratic problem of the form $(6)$ in $v$. Let us rename $v$ as $\Delta x_{nt}$ and call the dual variable $w$. Then from the KKT system of $(6)$ we have that the optimal solution at this $x$ is characterized by

$$
\left[\begin{array}{cc}\nabla^2 f(x) & A^\top\\ A & 0 \end{array}\right]\left[\begin{array}{c} \Delta x_{nt} \\ w \end{array}\right]=\left[\begin{array}{c} -\nabla f(x)\\ 0 \end{array}\right].\tag{7}
$$

The Newton step is defined only when the KKT matrix is nonsingular. As in Newton's method for unconstrained problems, we observe that when the objective $f$ is quadratic then the Newton update $x+\Delta x_{nt}$ solves the problem exactly (and $w$ is the optimal dual solution). This suggests that as in the unconstrained case when $f$ is nearly quadratic (e.g. when we are close to the optimum) then $x+\Delta x_{nt}$ is a very good estimate of the solution $x^*$.

We can interpret the Newton step $\Delta x_{nt}$ and the associated vector $w$ as the solutions of the linearisation of the optimality conditions $(2)$, namely

$$
\begin{array}{l}
A(x+\Delta x_{nt})=b \\
\nabla f(x+\Delta x_{nt}) + A^\top w \approx \nabla f(x) + \nabla^2 f(x) \Delta x_{nt} + A^\top w = 0.
\end{array}\tag{8}
$$

By assumption the starting point (and all the subsequent points) is feasible, i.e. $Ax=b$. So the conditions reduce to 

$$
\begin{array}{l}
A\Delta x_{nt}=0 \\
\nabla^2 f(x) \Delta x_{nt} + A^\top w = -\nabla f(x)
\end{array}\tag{9}
$$

which is exactly $(7)$.

We can define the Newton decrement in exactly the same way as for the unconstrained problem, namely

$$
\lambda(x) = \left(\Delta x_{nt}^\top \nabla^2 f(x)\Delta x_{nt}\right)^{\frac{1}{2}}.
$$

The same interpretation holds, namely $\lambda(x)$ is the norm of the Newton step in the norm weighted by the Hessian. Moreover, like in the unconstrained case $\lambda(x)^2/2$ gives us an estimate of $f(x) - p^*$ because

$$
\displaystyle f(x) - \inf \{\hat{f}(x+v) : A(x+v)=b\} = \frac{1}{2} \lambda(x)^2.
$$

Thus $\lambda(x)$ can be used as a stopping criterion.
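
The computation of the Newton step and decrement can be sketched as below. The helper name `newton_step` and the test function $f(x) = -\sum_i \log x_i$ with random data are assumptions chosen for illustration; the KKT system $(7)$ is solved with a dense solver, which ignores any structure. The script checks that the step is a feasible direction and that the decrease predicted by the quadratic model equals $\lambda(x)^2/2$.

```python
import numpy as np

def newton_step(H, g, A):
    """Solve the KKT system (7); return the step, the dual vector w and lambda^2."""
    n, p = H.shape[0], A.shape[0]
    KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
    sol = np.linalg.solve(KKT, np.concatenate([-g, np.zeros(p)]))
    dx, w = sol[:n], sol[n:]
    lam2 = dx @ H @ dx
    return dx, w, lam2

# Illustration on f(x) = -sum_i log x_i at a point x > 0 (feasible for b = A x).
rng = np.random.default_rng(0)
p, n = 3, 8
A = rng.standard_normal((p, n))
x = rng.uniform(0.5, 1.5, size=n)
g = -1.0 / x                           # gradient of f at x
H = np.diag(1.0 / x**2)                # Hessian of f at x

dx, w, lam2 = newton_step(H, g, A)
# Decrease predicted by the quadratic model: f(x) - f_hat(x + dx).
model_decrease = -(g @ dx + 0.5 * dx @ H @ dx)
print("||A dx||       =", np.linalg.norm(A @ dx))
print("lambda^2 / 2   =", lam2 / 2)
print("model decrease =", model_decrease)
```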

Suppose that $Ax = b$. We say that $v\in\mathbb{R}^n$ is a **feasible direction** if $Av = 0$. In this case, every point of the form $x+tv$ is also feasible, i.e., $A(x+tv) = b$. We say that $v$ is a descent direction for $f$ at $x$, if for small $t > 0$, $f(x + tv) < f(x)$.
The Newton step is always a **feasible descent** direction (except when $x$ is optimal). Feasibility comes from the KKT equation $A\Delta x_{nt}=0$. To show that it is a descent direction, take $\hat f(x+v)$, evaluate it at $v=t\Delta x_{nt}$, take the derivative with respect to $t$ and set $t=0$. This yields

$$
\left.\frac{d}{dt}\hat f(x+t\Delta x_{nt})\right|_{t=0} = \nabla f(x)^\top \Delta x_{nt}.
$$

This quantity is equal to $-\lambda(x)^2$, so it is negative and consequently $\Delta x_{nt}$ is a descent direction.

---

**Exercise 8.2:** Show that $\nabla f(x)^\top \Delta x_{nt} = -\lambda(x)^2$. *Hint*: use the KKT equations $(9)$.

***EDIT THE FILE TO ADD YOUR SOLUTION HERE***

---

Similarly to the Newton step and decrement for unconstrained optimisation, the Newton step and decrement for equality constrained optimisation are affine invariant.

## 8.2.3 Newton’s method with equality constraints

In summary, the Newton method with equality constraints is identical to the unconstrained one as long as $x^{(0)}$ satisfies $Ax^{(0)}=b$ and we replace the definition of $\Delta x_{nt}$ with the solution of $(7)$ (for the unconstrained method it was $\Delta x_{nt} = -\nabla^2 f(x)^{-1}\nabla f(x)$).

$$
\begin{array}{l}
\textbf{given} \text{ a starting point } x^{(0)} \in \textbf{dom }f\text{ with }Ax^{(0)}=b\text{, tolerance }\varepsilon >0.\\
\textbf{repeat}\\
\quad 1.\textit{ Compute the Newton step and decrement. }\Delta x^{(k)}_{nt} \text{ and } \lambda(x^{(k)})^2\\
\quad 2.\textit{ Stopping criterion. }\textbf{quit} \text{ if }\lambda^2/2 \le \varepsilon\\
\quad 3.\textit{ Line search. }\text{Choose a step size }t^{(k)}\\
\quad 4.\textit{ Update. }x^{(k+1)}= x^{(k)}+t^{(k)}\Delta x^{(k)}_{nt}
\end{array}
$$

This method is an example of a **feasible descent method** because all the iterates are feasible and such that $f(x^{(k+1)}) < f(x^{(k)})$.

It is possible to show that the iterates of Newton's method with equality constraints coincide with the iterates of the unconstrained Newton's method applied to the reduced problem $(3)$ (i.e. the problem obtained by eliminating the equality constraints). We omit the details of the proof. This equivalence has an important consequence. Everything we know about the convergence of Newton's method for unconstrained problems transfers to Newton's method for equality constrained problems. In particular, the practical performance of Newton’s method with equality constraints is exactly like the performance of Newton’s method for unconstrained problems. Once $x^{(k)}$ is near $x^*$, convergence is quadratic.
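
The algorithm above can be sketched in a few lines for the analytic centering objective $f(x) = -\sum_i \log x_i$. This is a minimal illustration, not a reference implementation: the function name `feasible_newton`, the backtracking parameters `alpha` and `beta`, the extra loop that keeps the iterate in $\textbf{dom }f$, and the random data are all assumptions. The matrix $A$ is given positive entries so that the feasible set is bounded, and $b$ is defined as $Ax^{(0)}$ so that the starting point is feasible.

```python
import numpy as np

def f(x):
    return -np.sum(np.log(x))

def feasible_newton(A, x0, eps=1e-10, alpha=0.1, beta=0.5, max_iter=50):
    x = x0.copy()
    p, n = A.shape
    for k in range(max_iter):
        g = -1.0 / x
        H = np.diag(1.0 / x**2)
        # 1. Newton step and decrement from the KKT system (7).
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        dx = np.linalg.solve(KKT, np.concatenate([-g, np.zeros(p)]))[:n]
        lam2 = dx @ H @ dx
        # 2. Stopping criterion.
        if lam2 / 2 <= eps:
            break
        # 3. Backtracking line search (first stay inside dom f).
        t = 1.0
        while np.min(x + t * dx) <= 0:
            t *= beta
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        # 4. Update.
        x = x + t * dx
    return x, lam2, k

rng = np.random.default_rng(0)
p, n = 5, 20
A = rng.uniform(0.1, 1.0, size=(p, n))   # positive entries: bounded feasible set
x0 = rng.uniform(0.5, 1.5, size=n)       # starting point in dom f
b = A @ x0                               # makes x0 feasible by construction

x_opt, lam2, iters = feasible_newton(A, x0)
print("iterations:", iters)
print("||A x - b|| =", np.linalg.norm(A @ x_opt - b))
print("f(x0) =", f(x0), " f(x_opt) =", f(x_opt))
```

Since each step satisfies $A\Delta x_{nt}=0$, the constraint residual stays at the level of rounding errors throughout the iterations.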

# 8.3 Infeasible Start Newton's Method

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/ZxJF6-aiHVM"></iframe>')

The method described above requires that the starting point is feasible. When $\textbf{dom }f = \mathbb{R}^n$ finding a feasible point simply requires finding a solution to $Ax=b$. However, when $\textbf{dom }f \ne \mathbb{R}^n$ finding a feasible point becomes a challenge. Here we describe a generalisation of Newton's method which works well when $\textbf{dom }f \ne \mathbb{R}^n$ is relatively simple and known to contain a point satisfying $Ax=b$ (i.e. we know that the problem is feasible). We postpone to the next chapter the presentation of a method for when we do not know if the problem is feasible. 

Let $x$ denote the current point, which we do not assume to be feasible, but we do assume satisfies $x \in \textbf{dom }f$. We apply the same derivation that we have done above to yield $(8)$ (i.e. the Newton step is the solution of the linearised optimality conditions), that we rewrite here for convenience


$$
\begin{array}{l}
A(x+\Delta x_{nt})=b \\
\nabla f(x+\Delta x_{nt}) + A^\top w \approx \nabla f(x) + \nabla^2 f(x) \Delta x_{nt} + A^\top w = 0.
\end{array}
$$

Now, differently from above, $x$ is not such that $Ax=b$. So the KKT system becomes 

$$
\left[\begin{array}{cc}\nabla^2 f(x) & A^\top\\ A & 0 \end{array}\right]\left[\begin{array}{c} \Delta x_{nt} \\ w \end{array}\right]=-\left[\begin{array}{c} \nabla f(x)\\ Ax-b \end{array}\right].\tag{10}
$$

These equations differ from $(7)$ just in the second block of the right-hand side which contains $Ax-b$ instead of $0$. This is the residual of the linear equality constraint. When $x$ is feasible, the residual is zero and the method reduces to the feasible Newton's method of Section 8.2. This also implies that as soon as the iterate becomes feasible, then all subsequent iterates will be feasible.

We can provide a primal-dual interpretation of the Newton step which will be useful in its analysis. By a primal-dual interpretation, we mean that we can show that the method is related to the update of both the primal variable $x$ and the dual variable $\nu$. Define the residual

$$
r(y)=r(x,\nu)=(r_{d}(x,\nu),r_{p}(x,\nu))=(\nabla f(x) + A^\top \nu, Ax-b).
$$

Then the optimality condition can be expressed as $r(x^*,\nu^*)=0$. We compute the Taylor approximation at $y$, namely

$$
r(y + \Delta y) \approx \hat r(y + \Delta y) = r(y)+ D r(y) \Delta y
$$

where $D r(y)$ is the derivative of $r$ evaluated at $y$. We define the Newton step $\Delta y_{pd}$ as the step that makes the approximation vanish, namely

$$
D r(y) \Delta y_{pd} = -r(y).
$$

Writing this equation explicitly and defining $\Delta y_{pd} = (\Delta x_{pd},\Delta \nu_{pd})$ yields


$$
\left[\begin{array}{cc}\nabla^2 f(x) & A^\top\\ A & 0 \end{array}\right]\left[\begin{array}{c} \Delta x_{pd} \\ \Delta \nu_{pd} \end{array}\right]=-\left[\begin{array}{c} \nabla f(x) + A^\top \nu\\ Ax-b \end{array}\right].\tag{11}
$$



Now note that $(10)$ and $(11)$ are identical once we set $w = \nu + \Delta \nu_{pd}$ and $\Delta x_{nt}=\Delta x_{pd}$. This shows that the (infeasible) Newton step is the same as the primal part of the primal-dual step, and the associated dual vector $w$ is the updated dual variable $\nu + \Delta \nu_{pd}$.

This interpretation is useful once we realise that the infeasible Newton step (at an infeasible point) is not a descent direction. In fact note that (exploiting twice $(10)$)

$$
\left.\frac{d}{dt} f(x+t\Delta x)\right|_{t=0} = \nabla f(x)^\top \Delta x = - \Delta x^\top (\nabla^2 f(x)\Delta x + A^\top w) = - \Delta x^\top \nabla^2 f(x)\Delta x + (Ax-b)^\top w
$$

which is not necessarily negative (unless $x$ is feasible and so $Ax-b=0$).

However, exploiting the primal-dual interpretation, we can show that the norm of the residual decreases in the Newton direction, namely

$$
\left.\frac{d}{dt} || r(y+t\Delta y_{pd})||_2^2\right|_{t=0} = 2 r(y)^\top Dr(y) \Delta y_{pd} =  -2 r(y)^\top r(y).\tag{12}
$$

This means that the algorithm must be adapted to descend along $||r||_2$ instead of $f$. For instance, the line search must be performed on the norm of the residual.

Moreover, note that since the Newton step is constructed in $(10)$ as $A(x+\Delta x_{nt})=b$, it follows that if a step $t$ of length $1$ is taken along $\Delta x_{nt}$ then the next iterate will be feasible (and all future iterates will be feasible). If the step $t$ is less than $1$, then (using $(10)$)

$$
r_{p}^+ = A(x + t\Delta x_{nt})-b = (1-t)(Ax-b) = (1-t) r_p 
$$

where $r_{p}^+$ is the value of the primal residual at the next step. So we see that for any $t \in (0,1)$ the primal residual will reduce (and for $t=1$, it becomes $0$).
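
Both properties can be checked numerically. The sketch below, which assumes the objective $f(x) = -\sum_i \log x_i$ and random data, computes the primal-dual step from $(11)$, compares a central finite-difference estimate of the derivative in $(12)$ with $-2\,||r(y)||_2^2$, and verifies that the primal residual is scaled by $(1-t)$. The step size `t = 0.3` and the difference increment `h` are arbitrary choices.

```python
import numpy as np

def r(x, nu, A, b):
    """Residual (r_d, r_p) for f(x) = -sum_i log x_i."""
    return np.concatenate([-1.0 / x + A.T @ nu, A @ x - b])

rng = np.random.default_rng(0)
p, n = 3, 8
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
x = rng.uniform(0.5, 1.5, size=n)      # in dom f, but A x != b in general
nu = rng.standard_normal(p)

# Primal-dual Newton step from (11).
H = np.diag(1.0 / x**2)
KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
step = np.linalg.solve(KKT, -r(x, nu, A, b))
dx, dnu = step[:n], step[n:]

# (12): derivative of ||r||_2^2 along the step, by central differences.
phi = lambda t: np.sum(r(x + t * dx, nu + t * dnu, A, b) ** 2)
h = 1e-6
slope = (phi(h) - phi(-h)) / (2 * h)
print("finite difference:", slope, " -2 ||r||^2:", -2 * phi(0.0))

# Primal residual after a step of length t.
t = 0.3
r_p = A @ x - b
r_p_plus = A @ (x + t * dx) - b
print("||r_p^+ - (1 - t) r_p|| =", np.linalg.norm(r_p_plus - (1 - t) * r_p))
```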

We can now summarize the algorithm. We define $\Delta \nu_{nt}=\Delta \nu_{pd}=w-\nu$.

$$
\begin{array}{l}
\textbf{given} \text{ a starting point } x^{(0)} \in \textbf{dom }f\text{, tolerance }\varepsilon >0,\,\alpha\in(0,0.5),\,\beta\in(0,1).\\
\textbf{repeat}\\
\quad 1.\textit{ Compute the primal and dual Newton steps. }\Delta x_{nt}^{(k)},\,\Delta \nu_{nt}^{(k)}.\\
\quad 2.\textit{ Backtracking line search on }||r||_2.\\
\qquad t:=1\\
\qquad \textbf{while } ||r(x^{(k)}+t\Delta x_{nt}^{(k)},\nu^{(k)}+t\Delta \nu_{nt}^{(k)})||_2 > (1-\alpha t) ||r(x^{(k)},\nu^{(k)})||_2, \qquad t:=\beta t.\\
\quad 3.\textit{ Update. }x^{(k+1)}= x^{(k)}+t\Delta x^{(k)}_{nt},\,\nu^{(k+1)}= \nu^{(k)}+t\Delta \nu^{(k)}_{nt}\\
\textbf{until }Ax=b \text{ and }||r(x,\nu)||_2 \le \varepsilon.
\end{array}
$$
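
A minimal sketch of this algorithm for $f(x) = -\sum_i \log x_i$ is given below. The names `residual` and `infeasible_newton`, the parameter values, the extra loop that keeps $x$ in $\textbf{dom }f$ during backtracking, and the random data are assumptions made for illustration. The stopping test uses $||r||_2\le\varepsilon$ only, which already bounds the primal residual $||Ax-b||_2$ by $\varepsilon$. The starting point $x^{(0)}=\mathbf{1}$ lies in $\textbf{dom }f$ but does not satisfy $Ax=b$.

```python
import numpy as np

def residual(x, nu, A, b):
    """r(x, nu) = (grad f(x) + A^T nu, A x - b) for f(x) = -sum_i log x_i."""
    return np.concatenate([-1.0 / x + A.T @ nu, A @ x - b])

def infeasible_newton(A, b, x0, nu0, eps=1e-8, alpha=0.1, beta=0.5, max_iter=100):
    x, nu = x0.copy(), nu0.copy()
    p, n = A.shape
    for k in range(max_iter):
        r = residual(x, nu, A, b)
        if np.linalg.norm(r) <= eps:
            break
        # 1. Primal and dual Newton steps from (11).
        H = np.diag(1.0 / x**2)
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        step = np.linalg.solve(KKT, -r)
        dx, dnu = step[:n], step[n:]
        # 2. Backtracking line search on ||r||_2 (first stay inside dom f).
        t = 1.0
        while np.min(x + t * dx) <= 0:
            t *= beta
        while (np.linalg.norm(residual(x + t * dx, nu + t * dnu, A, b))
               > (1 - alpha * t) * np.linalg.norm(r)):
            t *= beta
        # 3. Update.
        x, nu = x + t * dx, nu + t * dnu
    return x, nu, k

rng = np.random.default_rng(0)
p, n = 5, 20
A = rng.uniform(0.1, 1.0, size=(p, n))   # positive entries: bounded feasible set
b = A @ rng.uniform(0.5, 1.5, size=n)    # guarantees that a feasible x > 0 exists

x_inf, nu_inf, iters = infeasible_newton(A, b, x0=np.ones(n), nu0=np.zeros(p))
print("iterations:", iters)
print("||A x - b|| =", np.linalg.norm(A @ x_inf - b))
print("||r||       =", np.linalg.norm(residual(x_inf, nu_inf, A, b)))
```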



The main differences with the feasible Newton's method are as follows:

*   The search direction includes the extra term related to the primal residual
*   The line search is done on the norm of the residual instead of on $f$. This may have increased computational cost, but it is usually negligible. Note that $(12)$ guarantees that the search will terminate for small $t$.
*   The algorithm terminates when primal feasibility is achieved and the residual is small (instead of being based on $\lambda(x)$).

If at any iteration the step size is selected as $1$, then all the following iterates will be feasible and the directions will be identical to the ones generated by the feasible Newton's method. At that point the only differences are in the line search and the stopping criterion, and the two methods have very similar performance.


We have mentioned at the beginning of this section that the infeasible start Newton's method is useful to initialise Newton's method when a feasible point is unknown and $\textbf{dom }f \ne \mathbb{R}^n$ (otherwise the initialisation would be trivial). We could even use the infeasible start Newton's method for unconstrained problems when a point in $\textbf{dom }f$ is unknown. For instance, for the dual problem 

$$
\begin{array}{ll}
\max & -b^\top \nu + n + \sum_{i=1}^n \log\left(A^\top \nu\right)_i
\end{array}
$$

we may not know a point $\nu^{(0)}$ that satisfies $A^\top \nu^{(0)} \succ 0$. The problem is equivalent to

$$
\begin{array}{ll}
\max & -b^\top \nu + n + \sum_{i=1}^n \log\left(y_i\right) \\
s.t. & y = A^\top \nu 
\end{array}
$$

We can now use the infeasible start Newton's method, starting with any positive $y^{(0)}$.

We have stated that the infeasible start Newton's method works well when $\textbf{dom }f$ is relatively simple and known to contain a point satisfying $Ax=b$. However, the method has no way of detecting whether the problem is infeasible. In fact, even when $\textbf{dom }f\, \cap \{x:Ax=b\} = \emptyset $, the algorithm will return a value (in this case the residual will converge slowly to a small positive value and the line search never selects $1$). We will see in the next chapter algorithms that will detect infeasibility.

The convergence analysis of the infeasible start Newton's method is analogous to the previous case and so it is omitted. In particular, we have a damped phase and a (feasible) quadratic phase.




# 8.4 Implementation

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/UDOVcgPCO_I"></iframe>')

The methods seen above rely on the solution of a system of the type

$$
\left[\begin{array}{cc}H & A^\top\\ A & 0 \end{array}\right]\left[\begin{array}{c} v \\ w \end{array}\right]=-\left[\begin{array}{c} g\\ h \end{array}\right].\tag{13}
$$

where $H \in\mathbb{S}^n_+$ and $\textbf{rank }A=p$. We have the following approaches.

*   The KKT matrix is symmetric but not positive definite, so we can use the $LDL^\top$ factorisation. If no structure is exploited the cost will be $\frac{1}{3}(n+p)^3$ flops.
*   Since the KKT matrix has a special structure, we can use the block elimination method. If $H\succ 0$, from the first equation $Hv + A^\top w = -g$ we solve for $v$ and substitute in the second. This yields $w = (AH^{-1}A^\top)^{-1}(h-AH^{-1}g)$ and $v=-H^{-1}(g+A^\top w)$. The matrix $AH^{-1}A^\top$ is the (negative) Schur complement of $H$ in the KKT matrix. If no structure of $H$ is exploited, the cost is analogous to before. However, if we exploit the structure of $H$, the cost is much less (quadratic or linear in $n$). The point here is that it is much more likely to have a structured $H$ than a structured full KKT matrix. 
*   A variation of the block elimination method above can be applied also when $H$ is singular (as long as the KKT matrix is not singular). In fact, it can be shown that the KKT matrix is not singular if and only if $H + A^\top Q A \succ 0$ for some $Q\succcurlyeq 0$. So let  $Q\succcurlyeq 0$ be a matrix for which $H + A^\top Q A \succ 0$. Then the KKT system is equivalent to 
$$
\left[\begin{array}{cc}H+ A^\top Q A  & A^\top\\ A & 0 \end{array}\right]\left[\begin{array}{c} v \\ w \end{array}\right]=-\left[\begin{array}{c} g+ A^\top Q h \\ h \end{array}\right].
$$
which can be solved by block elimination since now the first block is positive definite.
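
The block elimination formulas for $H\succ 0$ can be sketched as follows for the special case of a diagonal $H$, where applying $H^{-1}$ costs only $O(n)$. The function name `solve_kkt_block`, the diagonal structure and the random data are assumptions for this example; the result is compared with a direct dense solve of the full system $(13)$.

```python
import numpy as np

def solve_kkt_block(h_diag, A, g, h):
    """Solve (13) by block elimination, assuming H = diag(h_diag) with h_diag > 0."""
    Hinv_g = g / h_diag                      # H^{-1} g
    Hinv_At = A.T / h_diag[:, None]          # H^{-1} A^T
    S = A @ Hinv_At                          # A H^{-1} A^T, a p x p matrix
    w = np.linalg.solve(S, h - A @ Hinv_g)
    v = -(g + A.T @ w) / h_diag
    return v, w

rng = np.random.default_rng(0)
p, n = 4, 12
A = rng.standard_normal((p, n))
h_diag = rng.uniform(0.5, 2.0, size=n)
g = rng.standard_normal(n)
h = rng.standard_normal(p)

v, w = solve_kkt_block(h_diag, A, g, h)

# Reference: solve the full (n + p) x (n + p) system directly.
KKT = np.block([[np.diag(h_diag), A.T], [A, np.zeros((p, p))]])
ref = np.linalg.solve(KKT, -np.concatenate([g, h]))
print("difference:", np.linalg.norm(np.concatenate([v, w]) - ref))
```

Only a $p\times p$ system is factorised here, which is where the saving comes from when $H$ is structured.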



## 8.4.1 Equality Constrained Analytic Centering

We now consider an example where we compare all the algorithms that we have seen above. Consider the equality constrained analytic centering problem

$$
\begin{array}{ll}
\min & -\sum_{i=1}^n \log x_i \\
s.t. & Ax=b 
\end{array}
$$

with $p=100$ and $n=500$. 



The first method is Newton's method with equality constraints, which on paper is a problem with $500$ variables. For this method the KKT system $(13)$ has $H = \textbf{diag}(1/x_1^2,\dots,1/x_n^2)$, $g = -(1/x_1,\dots,1/x_n)$ and $h=0$. We use the block elimination method, which boils down to solving

$$
A \,\textbf{diag}(x)^2 A^\top w = b \tag{14}
$$

with the Newton step given by $\Delta x_{nt} = -\textbf{diag}(x)^2 A^\top w + x$.



The second method is Newton's method applied to the dual

$$
\begin{array}{ll}
\max & -b^\top \nu + n + \sum_{i=1}^n \log\left(A^\top \nu\right)_i
\end{array}
$$

which on paper is a problem with $100$ variables. Here the Newton step is obtained by solving 

$$
A \,\textbf{diag}(y)^2 A^\top \Delta \nu_{nt} = -b + Ay \tag{15}
$$

where $y = (1/(A^\top \nu)_1,\dots,1/(A^\top \nu)_n)$.

The third method is the infeasible start Newton's method, which on paper is a problem with 600 variables. We apply again block elimination to solve the KKT system. The Newton step is given by $\Delta \nu_{nt}= w-\nu$ and $\Delta x_{nt} = x -\textbf{diag}(x)^2 A^\top w$, where $w$ is the solution of

$$
A \,\textbf{diag}(x)^2 A^\top w = 2Ax -b. \tag{16}
$$

In conclusion we see that all three methods require solving an equation of the type $ADA^\top w = d$ with $D$ positive and diagonal. So despite the different number of variables, the three methods have similar computational complexity. 
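
As a sanity check of the reduced systems $(14)$ and $(16)$, the sketch below computes both Newton steps from a solve with $A\,\textbf{diag}(x)^2A^\top$ and compares them with a direct solve of the full KKT system $(13)$. The small random instance and the helper name `full_kkt_step` are assumptions for illustration; in practice the $p\times p$ matrix would typically be handled with a Cholesky factorisation rather than a generic solver.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 12
A = rng.standard_normal((p, n))
x = rng.uniform(0.5, 1.5, size=n)
ADAt = (A * x**2) @ A.T                      # A diag(x)^2 A^T

def full_kkt_step(x, A, h):
    """Reference: solve (13) with H = diag(1/x^2), g = -1/x and the given h."""
    KKT = np.block([[np.diag(1.0 / x**2), A.T], [A, np.zeros((p, p))]])
    sol = np.linalg.solve(KKT, -np.concatenate([-1.0 / x, h]))
    return sol[:n], sol[n:]

# Feasible method: take b = A x, so that x is feasible and h = 0; use (14).
b_feas = A @ x
w14 = np.linalg.solve(ADAt, b_feas)
dx14 = x - x**2 * (A.T @ w14)

# Infeasible start method: a generic b, so h = A x - b is nonzero; use (16).
b = rng.standard_normal(p)
w16 = np.linalg.solve(ADAt, 2 * A @ x - b)
dx16 = x - x**2 * (A.T @ w16)

dx_ref14, w_ref14 = full_kkt_step(x, A, np.zeros(p))
dx_ref16, w_ref16 = full_kkt_step(x, A, A @ x - b)
print("(14):", np.linalg.norm(dx14 - dx_ref14), np.linalg.norm(w14 - w_ref14))
print("(16):", np.linalg.norm(dx16 - dx_ref16), np.linalg.norm(w16 - w_ref16))
```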

Let's now see a simulation. The plots below compare the three methods. Each plot shows $4$ different initialisations.

<div>
<img src="https://drive.google.com/uc?export=view&id=1DU3W4cgCS9R6m97160LOXT5mz_tTizv0" width="350"/>
</div>

Figure 8.1. *Error $f(x^{(k)}) - p^*$ in Newton's method with equality constraints for four different initial points $x^{(0)}$. Source: page 549 of $[1]$.*

<div>
<img src="https://drive.google.com/uc?export=view&id=1CDrMjXRf7wLvH02r-Q_DuPzroLtc7kQa" width="350"/>
</div>

Figure 8.2. *Error $|g(\nu^{(k)})-p^*|$ in Newton's method applied to the dual  for four different initial points $\nu^{(0)}$. Source: page 549 of $[1]$.*

<div>
<img src="https://drive.google.com/uc?export=view&id=18kTEzErCm78HwUg_gNih_t0FIHKqlnLe" width="350"/>
</div>

Figure 8.3. *Residual $||r(x^{(k)},\nu^{(k)})||_2$ in  infeasible start Newton's method for four different initial points $(x^{(0)},\nu^{(0)})$. Source: page 550 of $[1]$.*

The figures show that the dual method takes $6$-$7$ iterations to reach quadratic convergence, the primal method $12$-$15$ and the infeasible start method $10$-$20$. The dual method is faster by a factor of $2$ or $3$ (but not $5$ or $6$ as one may think from the number of variables involved). Also, the figures do not include the time required to find a feasible point for the dual method ($A^\top \nu^{(0)}\succ 0$) and for the primal method ($A x^{(0)}=b$, $ x^{(0)}\succ 0$). The infeasible start method requires no initialisation.

# End of CHAPTER 8