# Unconstrained optimization

minimize $f(x)$
- $f$ convex, twice continuously differentiable (hence $dom f$ open)
- we assume optimal value $p^*=\inf_x f(x)$ is attained (and finite)
solving the unconstrained optimization problem = finding a solution $x^*$ of
$\nabla f(x^*)=0$ ($n$ equations in $n$ unknowns)

usually need iterative algorithms:
- produce a sequence of $x^{(k)}\in dom(f), k=1,2,\dots$ with $f(x^{(k)}) \to p^*$ (i.e., $\nabla f(x^{(k)}) \to 0$)

Descent methods
- start with $x^{(0)}$
- update $x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$ s.t. $f(x^{(k+1)}) < f(x^{(k)})$
- other notations: $x^+=x+ t \Delta x$, $x:=x+t\Delta x$
- $\Delta x$ is the step, or search direction; $t$ is the step size, or step length
- from convexity, $f(x^+) \ge f(x) + t\nabla f(x)^T \Delta x$, so $f(x^+) < f(x)$ implies $\nabla f(x)^T \Delta x < 0$ (i.e., $\Delta x$ is a descent direction)

General descent method.
- given a starting point $x\in dom(f)$.
- repeat 
1. Determine a descent direction $\Delta x$.
2. Line search: choose a step size $t>0$.
3. Update: $x:=x+t \Delta x$.
- until the stopping criterion is satisfied.

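Determine a descent direction
- one approach is gradient descent: $\Delta x = -\nabla f(x)$
- choose a step size
1. exact line search: $t=\operatorname{argmin}_{s>0} f(x+s \Delta x)$
2. backtracking line search (parameters $\alpha\in(0,1/2)$, $\beta\in(0,1)$): start at $t:=1$ and repeat $t:=\beta t$ while $f(x+t\Delta x) > f(x) + \alpha t \nabla f(x)^T \Delta x$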
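A minimal NumPy sketch of gradient descent with backtracking line search, where $t$ starts at 1 and is multiplied by $\beta$ until $f(x+t\Delta x) \le f(x) + \alpha t \nabla f(x)^T \Delta x$. The stopping rule $\|\nabla f(x)\|_2 \le$ `tol`, the values $\alpha=0.25$, $\beta=0.5$, and the quadratic test function are assumptions for illustration.

```python
import numpy as np

def backtracking(f, x, dx, grad_x, alpha=0.25, beta=0.5):
    """Backtracking line search with alpha in (0, 1/2), beta in (0, 1)."""
    t = 1.0
    fx = f(x)
    slope = grad_x @ dx  # directional derivative; negative for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    """Gradient descent; stops when ||grad f(x)||_2 <= tol (an assumed stopping rule)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        dx = -g                          # 1. descent direction
        t = backtracking(f, x, dx, g)    # 2. line search
        x = x + t * dx                   # 3. update
    return x

# Hypothetical example: f(x) = 1/2 x^T P x - q^T x, minimized at x* = P^{-1} q = (0.2, 0.4).
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
x_star = gradient_descent(lambda x: 0.5 * x @ P @ x - q @ x, lambda x: P @ x - q, np.zeros(2))
print(x_star)  # close to [0.2, 0.4]
```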

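steepest descent method: find the steepest descent direction
normalized steepest descent direction (for a given norm $\|\cdot\|$):
$\Delta x_{nsd} = \operatorname{argmin} \{ \nabla f(x)^T v \mid \|v\| = 1 \}$
for the Euclidean norm, $\Delta x_{nsd} = -\nabla f(x)/\|\nabla f(x)\|_2$, so the (unnormalized) steepest descent direction is $\Delta x = -\nabla f(x)$, i.e., steepest descent reduces to gradient descent

in general, the performance of steepest descent is similar to that of gradient descent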
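A small numerical check of the Euclidean case, assuming NumPy: scan unit vectors on the circle in $\mathbf{R}^2$, pick the one minimizing $\nabla f(x)^T v$, and compare with the closed form $-\nabla f(x)/\|\nabla f(x)\|_2$. The gradient value is hypothetical. The unnormalized direction uses $\Delta x_{sd} = \|\nabla f(x)\|_* \Delta x_{nsd}$; the dual of the Euclidean norm is the Euclidean norm, so this gives $-\nabla f(x)$.

```python
import numpy as np

def nsd_bruteforce(g, num=100_000):
    """Approximate argmin{g^T v : ||v||_2 = 1} in R^2 by scanning points on the unit circle."""
    theta = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    V = np.column_stack([np.cos(theta), np.sin(theta)])  # candidate unit vectors
    return V[np.argmin(V @ g)]

g = np.array([3.0, -4.0])                 # hypothetical gradient value
v_scan = nsd_bruteforce(g)
v_closed = -g / np.linalg.norm(g)         # normalized negative gradient
dx_sd = np.linalg.norm(g) * v_closed      # ||g||_* * dx_nsd with the Euclidean dual norm
print(v_scan, v_closed, dx_sd)            # dx_sd equals -g = [-3, 4]
```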

### Newton's method
Newton's direction $\Delta x_{nt} = - \nabla^2 f(x)^{-1} \nabla f(x)$

One interpretation
- $x+\Delta x_{nt}$ minimizes the second-order approximation
$\hat{f}(x+v)=f(x) + \nabla f(x)^T v + \frac{1}{2} v^T \nabla^2 f(x) v$

`intuition`: $f$ twice differentiable, so this quadratic model is very accurate when $x$ is near $x^*$.
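A short sketch (assuming NumPy; the test function $f(x)=\sum_i e^{x_i} + \frac{1}{2}\|x\|_2^2$ is hypothetical) checking that the Newton step minimizes the quadratic model $\hat f$: the model's gradient $\nabla f(x) + \nabla^2 f(x) v$ vanishes at $v=\Delta x_{nt}$, and perturbing the step increases $\hat f$. The step is computed by solving $\nabla^2 f(x)\,\Delta x = -\nabla f(x)$ rather than by forming the inverse.

```python
import numpy as np

# Hypothetical convex, twice differentiable test function.
def f(x):
    return np.sum(np.exp(x)) + 0.5 * x @ x

def grad(x):
    return np.exp(x) + x

def hess(x):
    return np.diag(np.exp(x)) + np.eye(x.size)

def f_hat(x, v):
    """Second-order model of f around x, evaluated at x + v."""
    return f(x) + grad(x) @ v + 0.5 * v @ hess(x) @ v

x = np.array([0.5, -0.2])
dx_nt = np.linalg.solve(hess(x), -grad(x))   # Newton step

model_grad = grad(x) + hess(x) @ dx_nt        # gradient of f_hat at dx_nt, should be ~0
perturbed = [f_hat(x, dx_nt + p) for p in
             (np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05]))]
print(model_grad, f_hat(x, dx_nt), perturbed)
```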

Newton decrement (stopping criterion):
$\lambda(x) = (\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x))^{1/2}$, a measure of the proximity of $x$ to $x^*$
- gives an estimate of $f(x) - p^*$, using the quadratic approximation $\hat{f}$:
$f(x) - \inf_v \hat{f} (x+v) = \frac{1}{2} \lambda(x)^2$
- directional derivative in the Newton direction: $\nabla f(x)^T \Delta x_{nt} = -\lambda(x)^2$
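A quick numerical check of the two identities for the Newton decrement, assuming NumPy and hypothetical values for the gradient $g$ and a positive definite Hessian $H$ at some point $x$. Since $\Delta x_{nt} = -H^{-1}g$, we get $\lambda^2 = g^T H^{-1} g = -g^T \Delta x_{nt}$, which avoids forming $H^{-1}$.

```python
import numpy as np

def newton_step_and_decrement(g, H):
    """Return (dx_nt, lambda) with dx_nt = -H^{-1} g and lambda = (g^T H^{-1} g)^{1/2}."""
    dx = np.linalg.solve(H, -g)
    lam = np.sqrt(-g @ dx)   # g^T H^{-1} g = -g^T dx_nt >= 0 for positive definite H
    return dx, lam

# Hypothetical gradient and (symmetric, diagonally dominant, hence positive definite) Hessian.
g = np.array([1.0, -2.0, 0.5])
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
dx, lam = newton_step_and_decrement(g, H)

# f(x) - inf_v f_hat(x + v) = -(g^T dx + 1/2 dx^T H dx), which should equal lambda^2 / 2.
model_gap = -(g @ dx + 0.5 * dx @ H @ dx)
print(model_gap, 0.5 * lam**2)
print(g @ dx, -lam**2)       # directional derivative in the Newton direction
```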

(damped or guarded) Newton's method:

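given a starting point $x\in dom f$, tolerance $\epsilon > 0$.
repeat 
1. Compute the Newton step and decrement: $\Delta x_{nt} = -\nabla^2 f(x)^{-1}\nabla f(x)$, $\lambda^2 = \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)$.
2. Stopping criterion: quit if $\lambda^2/2 \le \epsilon$.
3. Line search: choose a step size $t$ by backtracking line search.
4. Update: $x := x + t\Delta x_{nt}$.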

- damped Newton phase: $t<1$
- quadratically convergent phase: $t=1$
- pros: convergence is rapid in general and quadratic near $x^*$
- cons: cost of computing and storing $\nabla^2 f(x)$ and of solving with it (applying $\nabla^2 f(x)^{-1}$) at every iteration
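A minimal NumPy sketch of damped Newton's method: Newton step and decrement, stop when $\lambda^2/2 \le \epsilon$, backtracking line search, update. The parameter values and the test function $f(x)=\sum_i (e^{x_i} - c_i x_i)$ (minimized at $x_i = \log c_i$) are assumptions for illustration. From the chosen starting point the first steps are damped ($t<1$), later ones take $t=1$.

```python
import numpy as np

def damped_newton(f, grad, hess, x0, eps=1e-12, alpha=0.25, beta=0.5, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = np.linalg.solve(H, -g)         # 1. Newton step
        lam2 = -g @ dx                      #    Newton decrement squared
        if lam2 / 2.0 <= eps:               # 2. stopping criterion
            break
        t = 1.0                             # 3. backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx                      # 4. update
    return x

# Hypothetical example: f(x) = sum(exp(x_i) - c_i x_i), minimized at x_i = log(c_i).
c = np.array([1.0, 2.0, 0.5])
x_opt = damped_newton(lambda x: np.sum(np.exp(x) - c * x),
                      lambda x: np.exp(x) - c,
                      lambda x: np.diag(np.exp(x)),
                      x0=np.array([3.0, -2.0, 1.0]))
print(x_opt, np.log(c))
```

Solving the linear system with `np.linalg.solve` instead of inverting the Hessian is the usual way to keep the per-iteration cost down.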

## Missing one lecture [todo]

### infeasible start Newton method

A generalization that deals with infeasible initial points and iterates
- let $x \in dom f$ be a point that we do not assume to be feasible ($Ax \ne b$ is allowed)
- find $x+\Delta x_{nt}$ that solves the second-order approximation
$\min_v \hat{f}(x+v)=f(x)+\nabla f(x)^T v + (1/2)v^T\nabla^2 f(x) v$ s.t. $A(x+v) =b$

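primal Newton step $\Delta x_{nt}$ and dual variable $w$ solve the KKT system
$\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x_{nt} \\ w \end{bmatrix} = -\begin{bmatrix} \nabla f(x) \\ Ax-b \end{bmatrix}$;
the dual Newton step is $\Delta v_{nt} = w - v$, where $v$ is the current dual variable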
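A sketch, assuming NumPy, of computing the infeasible-start Newton step by solving the KKT system $\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x_{nt} \\ w \end{bmatrix} = -\begin{bmatrix} \nabla f(x) \\ Ax-b \end{bmatrix}$ and returning $\Delta v_{nt} = w - v$. The example ($f(x)=\|x\|_2^2$, constraint $x_1+x_2=1$, start $x=(3,1)$) is hypothetical; since $f$ is quadratic, a single full step lands on the constrained minimizer $(0.5, 0.5)$.

```python
import numpy as np

def infeasible_newton_step(g, H, A, b, x, v):
    """Solve [[H, A^T], [A, 0]] [dx; w] = -[g; A x - b]; return dx and dv = w - v."""
    n, p = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((p, p))]])
    sol = np.linalg.solve(K, -np.concatenate([g, A @ x - b]))
    dx, w = sol[:n], sol[n:]
    return dx, w - v

# Hypothetical example: f(x) = ||x||_2^2, so grad f(x) = 2x and hess f(x) = 2I.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x = np.array([3.0, 1.0])     # infeasible: A x = 4 != 1
v = np.array([0.0])
dx, dv = infeasible_newton_step(2 * x, 2 * np.eye(2), A, b, x, v)
print(x + dx, A @ (x + dx))  # full step is feasible; here also optimal
```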

primal-dual Newton step (an alternative way to derive it)
- write the optimality conditions as $r(y) = 0$, where 
$y=(x,v)$, $r(y) = (\nabla f(x) + A^T v, Ax - b)$, which can be read as $(r_{dual}(x,v), r_{pri}(x,v))$
- linearizing $r(y) = 0$ gives $r(y+ \Delta y) \approx r(y) + D r(y) \Delta y = 0$, with $Dr(y) = \begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix}$
`first-order approx of first-order approx = second-order`


- given starting point $x\in dom f$, $v$, tolerance $\epsilon > 0$, $\alpha\in (0, 1/2)$, $\beta \in (0,1)$.
- repeat
1. Compute primal and dual Newton steps $\Delta x_{nt}$, $\Delta v_{nt}$.
2. Backtracking line search on $\|r\|_2$: $t:=1$;
while $\|r(x+t\Delta x_{nt}, v+t \Delta v_{nt})\|_2 > (1-\alpha t) \| r(x,v)\|_2$, $t:=\beta t$.
3. Update. $x:=x+t\Delta x_{nt}$, $v:=v+ t\Delta v_{nt}$.
- until $Ax = b$ and $\|r(x,v)\|_2 \le \epsilon$.

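since $\Delta y = (\Delta x_{nt}, \Delta v_{nt})$ solves $r(y) + D r(y) \Delta y = 0$:

- not a descent method: $f(x^{(k)})$ need not decrease from one iterate to the next
- the norm of $r$ decreases in the Newton direction: $\frac{d}{dt}\|r(y+t\Delta y)\|_2\big|_{t=0} = -\|r(y)\|_2$
- if $t=1$, the next iterate will be feasible ($A(x+\Delta x_{nt}) = b$), and all the following iterates will be feasible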
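A minimal NumPy sketch of the infeasible start Newton method: solve $Dr(y)\Delta y = -r(y)$, backtrack on $\|r\|_2$, update both $x$ and $v$. Two assumptions: the exact test "$Ax=b$" is replaced by the floating-point check $\|r\|_2 \le \epsilon$ (which also bounds $\|Ax-b\|_2$), and the example problem (minimize $\sum_i e^{x_i}$ s.t. $x_1+x_2+x_3=1$, $x_1=x_2$, whose solution is $x=(1/3,1/3,1/3)$) is hypothetical. The starting point violates both constraints.

```python
import numpy as np

def residual(grad, A, b, x, v):
    """r(x, v) = (grad f(x) + A^T v, A x - b) = (r_dual, r_pri)."""
    return np.concatenate([grad(x) + A.T @ v, A @ x - b])

def infeasible_start_newton(grad, hess, A, b, x0, v0, eps=1e-10,
                            alpha=0.25, beta=0.5, max_iter=100):
    x, v = np.asarray(x0, dtype=float), np.asarray(v0, dtype=float)
    n, p = x.size, b.size
    for _ in range(max_iter):
        r = residual(grad, A, b, x, v)
        if np.linalg.norm(r) <= eps:           # stopping criterion (tolerance on ||r||_2)
            break
        # 1. primal-dual Newton step: Dr(y) dy = -r(y)
        K = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        dy = np.linalg.solve(K, -r)
        dx, dv = dy[:n], dy[n:]
        # 2. backtracking line search on ||r||_2
        t = 1.0
        while (np.linalg.norm(residual(grad, A, b, x + t * dx, v + t * dv))
               > (1 - alpha * t) * np.linalg.norm(r)):
            t *= beta
        # 3. update
        x, v = x + t * dx, v + t * dv
    return x, v

# Hypothetical example: minimize sum_i exp(x_i) s.t. x_1 + x_2 + x_3 = 1, x_1 - x_2 = 0.
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
x_opt, v_opt = infeasible_start_newton(lambda x: np.exp(x), lambda x: np.diag(np.exp(x)),
                                       A, b, x0=np.array([1.0, -1.0, 2.0]), v0=np.zeros(2))
print(x_opt, A @ x_opt - b)
```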

## Algorithms for Equality Constrained Optimization


## Interior-Point methods

### basic ideas

move the inequality constraints into the objective function via `indicator functions`
- $\min f_0(x) + \sum_{i=1}^m I_- (f_i(x))$
- s.t. $Ax=b$

where $I_-(u)=0$ if $u\le 0$, $I_-(u)=\infty$ otherwise (indicator function of $R_-$).

#### Logarithmic barrier function

approximation via `logarithmic barrier`: fix some $t>0$
- $\min f_0(x) - (1/t) \sum_{i=1}^m \log (-f_i(x))$
- s.t. $Ax=b$
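A tiny illustration in plain Python of how the barrier term $-(1/t)\log(-u)$ approximates the indicator $I_-(u)$: for a fixed interior value $u<0$ the term shrinks toward $0$ as $t$ grows, while for fixed $t$ it blows up as $u \to 0^-$. The sample values of $u$ and $t$ are arbitrary.

```python
import math

def barrier_term(u, t):
    """-(1/t) * log(-u), the log-barrier approximation of I_-(u); +inf outside its domain u < 0."""
    return -math.log(-u) / t if u < 0 else math.inf

for t in (1.0, 10.0, 100.0):
    # Interior point u = -0.5 versus a point very close to the boundary u = 0.
    print(t, barrier_term(-0.5, t), barrier_term(-1e-8, t))
```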

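$\phi(x)=-\sum_{i=1}^m \log(-f_i(x))$, $dom\, \phi = \{x\mid f_1(x) < 0, \dots, f_m(x)<0 \}$
- convex (follows from composition rules)
- twice continuously differentiable, with gradient and Hessian
$\nabla \phi(x) = \sum_{i=1}^m \frac{1}{-f_i(x)} \nabla f_i(x)$, $\quad \nabla^2 \phi(x) = \sum_{i=1}^m \frac{1}{f_i(x)^2} \nabla f_i(x) \nabla f_i(x)^T + \sum_{i=1}^m \frac{1}{-f_i(x)} \nabla^2 f_i(x)$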
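For linear constraints $f_i(x) = a_i^T x - b_i$ we have $\nabla f_i = a_i$ and $\nabla^2 f_i = 0$, so $\nabla\phi(x) = \sum_i a_i/(b_i - a_i^T x)$ and $\nabla^2\phi(x) = \sum_i a_i a_i^T/(b_i - a_i^T x)^2$. A NumPy sketch that evaluates these and checks them against central finite differences; the triangle and the point $x$ are hypothetical.

```python
import numpy as np

def log_barrier(A, b, x):
    """phi(x) = -sum_i log(b_i - a_i^T x); returns (value, gradient, Hessian)."""
    s = b - A @ x                         # s_i = -f_i(x), must be > 0
    if np.any(s <= 0):
        return np.inf, None, None         # x is outside dom phi
    value = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                # sum_i a_i / (-f_i(x))
    hess = A.T @ (A / s[:, None] ** 2)    # sum_i a_i a_i^T / f_i(x)^2 (hess f_i = 0 here)
    return value, grad, hess

# Hypothetical triangle x_1 <= 1, x_2 <= 1, x_1 + x_2 >= -1, and a strictly feasible point.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])
x = np.array([0.2, -0.3])
val, g, H = log_barrier(A, b, x)

# Central-difference check of the gradient and Hessian formulas.
h = 1e-6
E = np.eye(2)
g_fd = np.array([(log_barrier(A, b, x + h * e)[0] - log_barrier(A, b, x - h * e)[0]) / (2 * h)
                 for e in E])
H_fd = np.column_stack([(log_barrier(A, b, x + h * e)[1] - log_barrier(A, b, x - h * e)[1]) / (2 * h)
                        for e in E])
print(g, g_fd)
```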

in terms of $\phi$, the problem becomes (multiplying the objective by $t>0$ does not change the minimizer)
- $\min t f_0(x) + \phi(x)$
- s.t. $Ax=b$

- difficult to minimize using Newton's method (from a random starting point) when $t$ is large,

because the Hessian varies rapidly near the boundary of the feasible set

- can be circumvented by solving a sequence of problems with increasing $t$,

starting each Newton minimization from the solution of the problem with the previous $t$

#### Central path

- central path: $\{ x^*(t) \mid t>0 \}$, where $x^*(t)$ is the solution of $\min t f_0(x) + \phi(x)$ s.t. $Ax=b$ (assumed unique)
- $x^*(t)$: central points

example: central path for an LP
- $\min c^T x$
- s.t. $a_i^T x \le b_i$, $i=1,\dots,6$

hyperplane $c^T x = c^T x^*(t)$ is tangent to level curve of $\phi$ through $x^*(t)$

- the central path passes through the interior of the feasible set
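A NumPy sketch that traces a few central points $x^*(t)$ for a hypothetical LP, the box problem $\min x_1 + x_2$ s.t. $-1 \le x_1, x_2 \le 1$ (rows $a_i^T x \le b_i$ of `A` below are the inequalities, $m=4$, $p^*=-2$ at $(-1,-1)$). Each centering step is a damped Newton minimization of $t c^T x + \phi(x)$, warm-started from the previous central point. For this problem setting the derivative to zero by hand gives each coordinate of $x^*(t)$ as $(1-\sqrt{1+t^2})/t$, which serves as a check.

```python
import numpy as np

def center(c, A, b, x, t, tol=1e-12, alpha=0.25, beta=0.5, max_iter=100):
    """Damped Newton minimization of t c^T x + phi(x), phi(x) = -sum_i log(b_i - a_i^T x).
    Assumes x is strictly feasible (A x < b)."""
    for _ in range(max_iter):
        s = b - A @ x                                  # slacks, all > 0
        g = t * c + A.T @ (1.0 / s)                    # gradient
        H = A.T @ (A / s[:, None] ** 2)                # Hessian
        dx = np.linalg.solve(H, -g)                    # Newton step
        if -(g @ dx) / 2.0 <= tol:                     # lambda^2 / 2 <= tol
            break
        Adx = A @ dx
        step = 1.0
        while np.any(step * Adx >= s):                 # backtrack into dom phi first
            step *= beta
        def change(u):
            # F(x + u dx) - F(x), computed without cancelling two large numbers.
            return u * t * (c @ dx) - np.sum(np.log1p(-u * Adx / s))
        while change(step) > alpha * step * (g @ dx):  # sufficient-decrease condition
            step *= beta
        x = x + step * dx
    return x

# Hypothetical box LP: minimize x_1 + x_2 subject to -1 <= x_1, x_2 <= 1.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])

x = np.zeros(2)                       # strictly feasible start
path = {}
for t in (1.0, 10.0, 100.0, 1000.0):
    x = center(c, A, b, x, t)         # warm start from the previous central point
    path[t] = x
    print(t, x, c @ x + 2.0)          # suboptimality c^T x*(t) - p*, at most m/t = 4/t
```

Evaluating the change in the objective through `np.log1p` is a design choice for this sketch: once $t$ is large, computing $F(x+s\Delta x)$ and $F(x)$ separately and subtracting loses most of the significant digits.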

`analytic center` of a set of convex inequalities and linear equations
$f_i(x) \le 0$, $i=1,\dots,m$, $Fx=g$
is defined as the optimal point of 
- $\min -\sum_{i=1}^m \log(-f_i(x))$
- s.t. $Fx=g$
analytic center of linear inequalities $a_i^T x\le b_i$, $i=1,\dots,m$:

$x_{ac}$ is the minimizer of $\phi(x) = -\sum_{i=1}^m \log(b_i-a_i^T x)$
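A minimal NumPy sketch computing the analytic center of linear inequalities by damped Newton on $\phi(x) = -\sum_i \log(b_i - a_i^T x)$, starting from a strictly feasible point and backtracking first to stay inside $dom\, \phi$. The triangle $x_1 \ge 0$, $x_2 \ge 0$, $x_1 + x_2 \le 1$ is a hypothetical example; setting $\nabla\phi = 0$ gives $1/x_1 = 1/x_2 = 1/(1-x_1-x_2)$, so $x_{ac} = (1/3, 1/3)$.

```python
import numpy as np

def analytic_center(A, b, x0, tol=1e-12, alpha=0.25, beta=0.5, max_iter=100):
    """Minimize phi(x) = -sum_i log(b_i - a_i^T x) by damped Newton from a strictly feasible x0."""
    phi = lambda z: -np.sum(np.log(b - A @ z))
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = b - A @ x
        g = A.T @ (1.0 / s)                 # gradient of phi
        H = A.T @ (A / s[:, None] ** 2)     # Hessian of phi
        dx = np.linalg.solve(H, -g)
        if -(g @ dx) / 2.0 <= tol:          # lambda^2 / 2 <= tol
            break
        step = 1.0
        while np.any(b - A @ (x + step * dx) <= 0):   # stay strictly feasible
            step *= beta
        while phi(x + step * dx) > phi(x) + alpha * step * (g @ dx):
            step *= beta
        x = x + step * dx
    return x

# Hypothetical triangle x_1 >= 0, x_2 >= 0, x_1 + x_2 <= 1, written as A x <= b.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
x_ac = analytic_center(A, b, x0=np.array([0.1, 0.1]))
print(x_ac)  # close to [1/3, 1/3]
```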

### Barrier method (one interior-point method)

- given strictly feasible $x$, $t:=t^{(0)}>0$, $\mu>1$, tolerance $\epsilon > 0$.
- repeat 
1. Centering step. Compute $x^*(t)$ by minimizing $t f_0 + \phi$, subject to $Ax=b$, starting at $x$.
2. Update. $x:=x^*(t)$.
3. Stopping criterion. Quit if $m/t < \epsilon$.
4. Increase $t$. $t:=\mu t$.

choice of $\mu$ involves a trade-off: large $\mu$ means fewer outer iterations but more inner (Newton) iterations per centering step; typical values: $\mu = 10$ to $20$
(for more practical choices of parameters, see p. 570 of the textbook)
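A minimal NumPy sketch of the barrier method for an inequality-only LP (no equality constraints, for simplicity; the rows $a_i^T$ of `A` are the inequalities $a_i^T x \le b_i$). It repeats centering, stops when the duality-gap bound $m/t < \epsilon$, and otherwise sets $t := \mu t$. The box LP ($\min x_1+x_2$, $-1 \le x_i \le 1$, $p^*=-2$) and the parameters $t^{(0)}=1$, $\mu=10$, $\epsilon=10^{-6}$ are assumptions for illustration.

```python
import numpy as np

def center(c, A, b, x, t, tol=1e-12, alpha=0.25, beta=0.5, max_iter=100):
    """Damped Newton minimization of t c^T x + phi(x) from a strictly feasible x."""
    for _ in range(max_iter):
        s = b - A @ x
        g = t * c + A.T @ (1.0 / s)
        H = A.T @ (A / s[:, None] ** 2)
        dx = np.linalg.solve(H, -g)
        if -(g @ dx) / 2.0 <= tol:
            break
        Adx = A @ dx
        step = 1.0
        while np.any(step * Adx >= s):                 # stay inside dom phi
            step *= beta
        def change(u):
            # F(x + u dx) - F(x), computed without cancelling large terms.
            return u * t * (c @ dx) - np.sum(np.log1p(-u * Adx / s))
        while change(step) > alpha * step * (g @ dx):
            step *= beta
        x = x + step * dx
    return x

def barrier_method(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    x, t, m = np.asarray(x0, dtype=float), t0, A.shape[0]
    outer = 0
    while True:
        x = center(c, A, b, x, t)   # 1-2. centering step, x := x*(t)
        outer += 1
        if m / t < eps:             # 3. stopping criterion (duality gap m/t)
            return x, t, outer
        t *= mu                     # 4. increase t

# Hypothetical box LP: minimize x_1 + x_2 subject to -1 <= x_1, x_2 <= 1.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])
x_opt, t_final, outer = barrier_method(c, A, b, x0=np.zeros(2))
print(x_opt, t_final, outer)
```

Trying `mu=2.0` or `mu=100.0` in this sketch is a cheap way to see the trade-off described above: more outer steps with small $\mu$, more Newton iterations per centering step with large $\mu$.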

#### dual points from central path

every $x^*(t)$ corresponds to a dual feasible point (`of the original inequality constrained problem`)
$\lambda_i^*(t) = 1/(-tf_i(x^*(t)))$ and $\nu^*(t) = w/t$

verification:
$x^*(t)$ solves 
- $\min t f_0(x) + \phi(x)$
- s.t. $Ax=b$

implies 
- $Ax^* = b$, $f_i(x^*) < 0$, $i=1, \dots, m$
- $\exists w$, $t\nabla f_0(x^*) + \sum \frac{1}{-f_i(x^*)} \nabla f_i(x^*) + A^T w=0$

implies 
- $x^*(t)$ minimizes the Lagrangian (of the `original problem`)
- $L(x,\lambda^*(t), \nu^*(t))=f_0(x) + \sum \lambda_i^*(t) f_i(x) + \nu^*(t)^T (Ax-b)$ 
- at: $\lambda_i^*(t) = 1/(-t f_i(x^*(t)))$ and $\nu^*(t) = w/t$

Duality gap $m/t$: 
$f_0(x^*(t)) \ge p^* \ge d^* \ge g(\lambda^*(t), \nu^*(t)) = L(x^*(t), \lambda^*(t),\nu^*(t))=f_0(x^*(t))-m/t$,
so $f_0(x^*(t)) - p^* \le m/t$, i.e., $x^*(t)$ is no more than $m/t$-suboptimal
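A numerical check of the dual point construction on a hypothetical LP, the box problem $\min c^T x$ s.t. $a_i^T x \le b_i$ with $c=(1,1)$ and $-1 \le x_1, x_2 \le 1$ ($m=4$, no equality constraints, so there is no $\nu$). For an LP, $g(\lambda) = -b^T\lambda$ when $A^T\lambda + c = 0$. The central point has the closed form derived in the comment, so no Newton solver is needed; $\lambda_i^*(t) = 1/(-t f_i(x^*(t)))$ should be dual feasible with gap exactly $m/t$.

```python
import numpy as np

# Hypothetical box LP: minimize x_1 + x_2 s.t. a_i^T x <= b_i (rows of A), p* = -2 at (-1, -1).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])
m = A.shape[0]

def central_point(t):
    # d/dx [t x - log(1 - x) - log(1 + x)] = 0 gives t x^2 - 2x - t = 0; take the root in (-1, 1).
    xi = (1.0 - np.sqrt(1.0 + t * t)) / t
    return np.array([xi, xi])

for t in (1.0, 10.0, 100.0):
    x = central_point(t)
    lam = 1.0 / (t * (b - A @ x))    # lambda_i*(t) = 1 / (-t f_i(x*(t))), f_i(x) = a_i^T x - b_i
    dual_residual = A.T @ lam + c    # dual feasibility for the LP: A^T lambda + c = 0
    gap = c @ x - (-b @ lam)         # f_0(x*(t)) - g(lambda*(t))
    print(t, dual_residual, gap, m / t)
```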

#### Interpretation via KKT condition
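$x=x^*(t)$, $\lambda=\lambda^*(t)$, $\nu=\nu^*(t)$ satisfy
1. primal constraints: $f_i(x)\le 0$, $i=1,\dots,m$, $Ax=b$
2. dual constraints: $\lambda\succeq 0$
3. approximate complementary slackness: $-\lambda_i f_i(x) = 1/t$, $i=1,\dots,m$ (as $t\to\infty$ this approaches the exact condition $\lambda_i f_i(x) = 0$)
4. gradient of the Lagrangian with respect to $x$ vanishes: $\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + A^T\nu = 0$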

#### Convergence 
The number of outer (centering) steps needed to reach tolerance $\epsilon$:
$\left\lceil \frac{\log(m/(\epsilon t^{(0)}))}{\log \mu} \right\rceil$

plus the initial centering step (to compute $x^*(t^{(0)})$)
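The outer-iteration count $\lceil \log(m/(\epsilon t^{(0)}))/\log \mu \rceil$ (not counting the initial centering step) is easy to tabulate. A plain-Python sketch; the values of $m$, $\epsilon$, $t^{(0)}$ and $\mu$ below are hypothetical.

```python
import math

def outer_steps(m, eps, t0, mu):
    """Outer (centering) steps after the initial one: ceil(log(m / (eps * t0)) / log(mu))."""
    ratio = m / (eps * t0)
    if ratio <= 1.0:
        return 0   # m / t0 is already within the tolerance after the initial centering
    return math.ceil(math.log(ratio) / math.log(mu))

# Larger mu means fewer outer steps (each centering step may need more Newton iterations).
for mu in (2.0, 10.0, 20.0, 100.0):
    print(mu, outer_steps(m=50, eps=1e-6, t0=1.0, mu=mu))
```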

Example: geometric program