## References 

1. Ruszczyński, Andrzej. Nonlinear Optimization. Princeton University Press, Princeton, New Jersey, 2006.

# 8 Subgradient Method

In this chapter we develop optimization strategies for non-differentiable functions. Throughout, we assume that $f$ is convex.

### Subgradient

Recall that for a differentiable convex function $f$, the gradient $\nabla f$ has the property that 
$$f(y)\geqslant f(x) + \nabla f(x)^T(y - x).$$

In the same manner, a vector $g$ is called a **subgradient** of a convex (possibly non-differentiable) function $f$ at $x$ if, for all $y$,
$$f(y)\geqslant f(x)+g^T(y-x).$$
Geometrically, a subgradient defines a supporting hyperplane of the epigraph of $f$ at $(x, f(x))$.
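The defining inequality can be checked numerically. The sketch below is a minimal illustration, assuming the one-dimensional function $f(x) = |x|$, whose subgradients at $x = 0$ are exactly the numbers $g\in[-1,1]$. The helper name `is_subgradient` and the test grid are hypothetical choices, and passing the check on a finite grid is only a necessary condition, not a proof.

```python
import numpy as np

def is_subgradient(f, x, g, test_points, tol=1e-12):
    """Return True if f(y) >= f(x) + g*(y - x) holds at every test point y."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

# Grid of test points y (an arbitrary choice for this sketch).
ys = np.linspace(-2.0, 2.0, 401)

inside = is_subgradient(abs, 0.0, 0.5, ys)   # 0.5 lies in [-1, 1]
outside = is_subgradient(abs, 0.0, 1.5, ys)  # 1.5 lies outside [-1, 1]
print(inside, outside)
```

The slope $0.5$ passes because $|y|\geqslant 0.5\,y$ for every $y$, while the slope $1.5$ fails at any $y>0$.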

### Subdifferential

The subdifferential of $f$ at $x$ is the set of all subgradients:
$$\partial f(x) = \{g:\ g^T(y - x)\leqslant f(y) - f(x)\quad  \forall y\in {\rm dom}(f )\}.$$

## Basic Rules


### Calculus

Proofs of the basic properties below can be found in [1]. Suppose the functions involved are convex with nonempty open domains. Then

1. $\partial (\alpha f) = \alpha \partial f$ for $\alpha>0$. 
2. $\partial (f_1 + f_2) = \partial f_1 + \partial f_2$. 
3. If $g(x) = f(Ax+b)$ is the composition of $f$ with an affine map, then $$\partial g(x) = A^T\partial f(Ax+b).$$
4. If $f(x) = \max_{1\leqslant i\leqslant n}f_i(x)$ is a pointwise maximum, then
$$\partial f(x) = {\rm Conv\ Hull\ }\bigcup_{ f_i(x)= f(x)}  \partial f_i(x).$$
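As a quick illustration of the pointwise maximum rule, consider the absolute value on $\mathbb{R}$ written as $|x| = \max\{x, -x\}$, with $f_1(x) = x$ and $f_2(x) = -x$ (an example chosen here for illustration). Both pieces are active only at $x = 0$, so
$$\partial |x| = \left\{\begin{array}{ll}\{-1\} & x<0\\ {\rm Conv\ Hull\ }\{-1, 1\} = [-1,1] & x = 0\\ \{1\} & x>0.\end{array}\right.$$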

### Optimality Conditions
 
A point $x$ minimizes $f$ if and only if $0\in \partial f(x)$. Indeed, $0\in\partial f(x)$ means exactly that $f(y)\geqslant f(x) + 0^T(y-x) = f(x)$ for all $y$.


### Examples




#### Maximum of Affine Functions

Let $f(x) = \max_{1\leqslant i\leqslant m}(a_i^Tx+b_i)$. Characterize the minimizers of $f$.

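Solution: As a pointwise maximum of the affine functions $f_i(x) = a_i^Tx+b_i$, $f$ is convex, and $\partial f_i(x) = \{a_i\}$. By the pointwise maximum rule, for arbitrary $x$,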
$$\partial f(x) = {\rm Conv\ Hull\ }\bigcup_{ f_i(x)= f(x)}  \{a_i\}.$$
A point $x$ is a minimizer if and only if $0\in \partial f(x)$, that is, $0$ is a convex combination of the active $a_i$. Equivalently, there exist $\lambda_i\geqslant 0$ such that
$$0 = \sum_{i=1}^m \lambda_i a_i\quad{\rm and}\quad \lambda_i = 0 {\ \rm if\ }f_i(x)\neq f(x)\quad{\rm and}\quad \sum_{i=1}^m\lambda_i = 1.$$

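These conditions are equivalent to the KKT conditions of the epigraph form $\min_{x,t} t$ subject to $a_i^Tx+b_i\leqslant t$ (obtained by introducing the auxiliary variable $t$), and the second condition corresponds to complementary slackness.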
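The certificate above is easy to verify for a small instance. The following sketch assumes a hypothetical example in $\mathbb{R}^2$ with $a_1 = (1,0)$, $a_2 = (0,1)$, $a_3 = (-1,-1)$ and $b = 0$. The candidate point `x_star` and the multipliers `lam` are picked by hand rather than computed by a solver.

```python
import numpy as np

# Rows of A are a_i^T; this instance is an illustrative assumption.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)

def f(x):
    """Pointwise maximum of the affine functions a_i^T x + b_i."""
    return np.max(A @ x + b)

def active_set(x, tol=1e-9):
    """Indices i with a_i^T x + b_i equal to f(x), up to a tolerance."""
    vals = A @ x + b
    return np.flatnonzero(vals >= vals.max() - tol)

x_star = np.zeros(2)          # candidate minimizer
lam = np.full(3, 1.0 / 3.0)   # candidate multipliers, chosen by hand

active = active_set(x_star)
inactive = np.setdiff1d(np.arange(len(b)), active)

# Check the three conditions: 0 = sum_i lam_i a_i, lam_i = 0 off the
# active set, and lam is a vector of convex weights.
certificate_ok = bool(
    np.all(lam >= 0)
    and np.isclose(lam.sum(), 1.0)
    and np.allclose(A.T @ lam, 0.0)
    and np.all(lam[inactive] == 0)
)
print(active, certificate_ok)
```

Here all three pieces are active at the origin and the uniform weights combine the $a_i$ to zero, so the origin is a minimizer with $f = 0$.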

## Indicator

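Define the indicator function of a set $C$ by $I_C(x) = \left\{\begin{array}{ll}0 & x\in C\\ +\infty & x\notin C\end{array}\right.$. Then for $x\in C$, the subdifferential $\partial I_C(x) = N_C(x)$ is the normal cone, where $$N_C(x) = \{g:\ g^T(y - x)\leqslant 0\quad \forall y\in C\}.$$

Proof: Since $x\in C$, we have $I_C(x) = 0$, so $g\in\partial I_C(x)$ requires $I_C(y)\geqslant I_C(x) + g^T(y - x) = g^T(y-x)$ for all $y$. When $y\notin C$, the left-hand side is $+\infty$ and the inequality holds trivially. When $y\in C$, it becomes $0\geqslant g^T(y-x)$.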
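For a concrete case (an illustrative choice), take $C = [0,1]\subset\mathbb{R}$. Applying the definition of $N_C$ at each $x\in C$ gives
$$N_C(x) = \left\{\begin{array}{ll}(-\infty, 0] & x = 0\\ \{0\} & 0<x<1\\ [0,+\infty) & x = 1\end{array}\right.$$
so the normal cone is trivial in the interior and points outward at the boundary.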

### Minimum on a Set
Suppose $f$ is convex and $C$ is a convex set. Then $$\min_{x\in C}f(x) = \min_x \left(f(x) + I_C(x)\right),$$ so a constrained problem can be written as an unconstrained one.


### Projection

Let $\Pi_C(x)$ be the projection of $x$ onto a closed convex set $C$, that is, the point $y$ in $C$ with minimum distance to $x$:
$$y = \Pi_C(x) = {\rm argmin}_{u\in C}\frac12 \Vert u - x\Vert_2^2
= {\rm argmin}_{u}\left(\frac12 \Vert u - x\Vert_2^2 + I_C(u)\right).$$
By the sum rule, the subdifferential of this objective at a point $y\in C$ is
$$\{y - x+g:\quad g^T(u-y)\leqslant 0\quad\forall u\in C\}.$$
The point $y$ is the minimizer if and only if $0$ belongs to this set, which forces $g = x-y$ and therefore
$$(x - y)^T(u - y)\leqslant 0 \quad \forall u \in C.$$
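The final inequality can be tested numerically on a set where the projection has a closed form. The sketch below assumes the box $C = [-1,1]^3$, for which the Euclidean projection is coordinatewise clipping. The function name `project_box`, the point `x`, and the random samples are illustrative choices.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n via coordinatewise clipping."""
    return np.clip(x, lo, hi)

rng = np.random.default_rng(0)
lo, hi = -1.0, 1.0

x = np.array([2.5, -0.3, -4.0])   # point to project (lies outside the box)
y = project_box(x, lo, hi)        # its projection onto C

# Sample points u in C and evaluate (x - y)^T (u - y) for each of them.
U = rng.uniform(lo, hi, size=(1000, 3))
inner = (U - y) @ (x - y)
max_inner = inner.max()

print(y, max_inner <= 1e-12)
```

Every sampled inner product is nonpositive, which is consistent with $x - y$ lying in the normal cone $N_C(y)$.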