# 9 Proximal Gradient Method

In the last section we introduced the subgradient method, which converges slowly. We now introduce the proximal gradient method, a faster method for solving 
$$\min \left\{g(x)+h(x)\right\}$$
where $g$ is differentiable over $\mathbb R^n$ while $h$ is convex but not necessarily differentiable. We shall see how to solve this problem when $h$ has special structure.


### Proximal Gradient Method

At each step, given a step size $t_k > 0$, we update
$$x_{k+1} = \argmin_x\left\{\nabla g(x_k)^T(x - x_k) + \frac{1}{2t_k}\Vert x - x_k\Vert^2 + h(x)\right\}.$$

### Special Cases

#### Gradient Descent

When $h(x)\equiv 0$, the iteration reduces to gradient descent. 

Proof: Completing the square gives
$$\nabla g(x_k)^T(x - x_k) + \frac{1}{2t_k}\Vert x - x_k\Vert^2 
= \frac{1}{2t_k} \Vert t_k\nabla g(x_k)  +( x - x_k)\Vert^2 - \frac{t_k}{2}\Vert \nabla g(x_k)\Vert^2.
$$
The last term does not depend on $x$, so the minimizer is $x_{k+1} = x_k -  t_k\nabla g(x_k)$, which is precisely the gradient descent iteration.
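
As a quick numerical sanity check of the completing-the-square identity above, the following sketch evaluates both sides at an arbitrary point. The function $g(x) = \frac12(x_1^2 + 3x_2^2)$ and all helper names are illustrative assumptions, not part of these notes.

```python
def grad_g(x):
    # Hypothetical example: g(x) = 0.5 * (x1^2 + 3 * x2^2), so grad g(x) = (x1, 3 * x2).
    return [x[0], 3.0 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def model(x, xk, t):
    # Left-hand side: linear term plus the quadratic penalty (1 / 2t) * ||x - xk||^2.
    d = [a - b for a, b in zip(x, xk)]
    return dot(grad_g(xk), d) + dot(d, d) / (2.0 * t)

def completed_square(x, xk, t):
    # Right-hand side: (1 / 2t) * ||t * grad + (x - xk)||^2 - (t / 2) * ||grad||^2.
    gk = grad_g(xk)
    r = [t * gi + (a - b) for gi, a, b in zip(gk, x, xk)]
    return dot(r, r) / (2.0 * t) - 0.5 * t * dot(gk, gk)

def gradient_step(xk, t):
    # Minimizer of the model when h = 0: x_k - t * grad g(x_k).
    return [a - t * gi for a, gi in zip(xk, grad_g(xk))]
```

At the point returned by `gradient_step` the squared norm vanishes, so the model attains its minimum value $-\frac{t_k}{2}\Vert\nabla g(x_k)\Vert^2$.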

#### Projected Gradient Method

When $C$ is closed and convex and $h(x)  = I_C(x) =\left\{\begin{array}{ll} 0 & x\in C\\ +\infty & x \notin C\end{array}\right.$ is the indicator function of $C$, the iteration reduces to the projected gradient method.

Proof: Since $h(x) = +\infty$ whenever $x\notin C$, it suffices to consider $x\in C$, so the update solves
$$\argmin_{x\in C}\left\{\frac{1}{2t_k} \Vert t_k\nabla g(x_k)  +( x - x_k)\Vert^2 - \frac{t_k}{2}\Vert \nabla g(x_k)\Vert^2\right\}.$$
The second term is constant in $x$, so the minimizer is the point of $C$ closest to $x_k - t_k\nabla g(x_k)$, i.e. the projection
$$x_{k+1} = \Pi_C (x_k - t_k \nabla g(x_k)).$$
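
The projected update can be sketched in a few lines for a set whose projection is easy to compute. The example below assumes $C$ is the box $[\text{lo}, \text{hi}]^n$, where projection is coordinate-wise clipping; the function names, the fixed step size, and the test objective $g(x) = \frac12\Vert x - c\Vert^2$ are illustrative choices, not taken from the notes.

```python
def project_box(x, lo, hi):
    # Euclidean projection onto C = [lo, hi]^n: clip each coordinate.
    return [min(max(xi, lo), hi) for xi in x]

def projected_gradient(grad_g, x0, t, lo, hi, iters=200):
    # Repeats x <- projection of (x - t * grad g(x)) onto the box, with a fixed step t.
    x = list(x0)
    for _ in range(iters):
        g = grad_g(x)
        x = project_box([xi - t * gi for xi, gi in zip(x, g)], lo, hi)
    return x

# Hypothetical example: g(x) = 0.5 * ||x - c||^2 with c outside the unit box.
c = [2.0, -0.5]
x_star = projected_gradient(
    lambda x: [xi - ci for xi, ci in zip(x, c)], [0.0, 0.0], 0.5, 0.0, 1.0
)
```

For this objective the constrained minimizer is the projection of `c` onto the box, so `x_star` should agree with `project_box(c, 0.0, 1.0)`.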


### Proximal Mapping 

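As the special cases suggest, the key is to solve the minimization problem obtained by completing the square, dropping the term that is constant in $x$, and multiplying by $t_k$:
$$\argmin_x\left\{\frac12 \Vert t_k\nabla g(x_k)  +( x - x_k)\Vert^2 +t_k h(x)\right\}. $$
If we use the notation
$${\rm prox}_f(x_0)= \argmin_x\{f(x) +\frac 12 \Vert x -x_0\Vert^2  \},$$
then, since $\Vert t_k\nabla g(x_k) + (x - x_k)\Vert = \Vert x - (x_k - t_k\nabla g(x_k))\Vert$, the problem is equivalent to computing ${\rm prox}_{t_kh}(x_k - t_k\nabla g(x_k))$. The map ${\rm prox}_f$ is called the proximal mapping of $f$. 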
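
To make the update $x_{k+1} = {\rm prox}_{t_kh}(x_k - t_k\nabla g(x_k))$ concrete, here is a minimal sketch for one choice of $h$ whose proximal mapping has a closed form. It assumes $h(x) = \lambda\Vert x\Vert_1$, for which ${\rm prox}_{\tau\Vert\cdot\Vert_1}$ is the coordinate-wise soft-thresholding operator; this choice of $h$, the fixed step size, and all function names are assumptions made for illustration.

```python
def soft_threshold(v, tau):
    # prox of tau * ||.||_1: shrink each coordinate toward zero by tau.
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def proximal_gradient(grad_g, prox, x0, t, iters=500):
    # prox(v, t) is expected to return prox_{t h}(v); the step size t is fixed.
    x = list(x0)
    for _ in range(iters):
        g = grad_g(x)
        v = [xi - t * gi for xi, gi in zip(x, g)]  # gradient step on g
        x = prox(v, t)                             # proximal step on h
    return x

# Hypothetical example: g(x) = 0.5 * ||x - b||^2 and h(x) = lam * ||x||_1.
b = [3.0, -0.2, 1.0]
lam = 0.5
x_hat = proximal_gradient(
    lambda x: [xi - bi for xi, bi in zip(x, b)],
    lambda v, t: soft_threshold(v, t * lam),
    [0.0, 0.0, 0.0],
    0.5,
)
```

For this particular $g$, the minimizer of $g + h$ is `soft_threshold(b, lam)`, which gives a simple way to check the iterates.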