[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RikVoorhaar/optimization-II-2021/blob/master/notebooks/week10.ipynb)

# Week 10

## Exercise 1
<div class="alert alert-info"> Exercise

Consider the following convex problem
$$
	\begin{array}{ll}
	\mbox{minimize} & f_0(x)\\
	\mbox{subject to} & f_i(x)\leq0\,,\quad i=1,\ldots,m\,.
	\end{array}
$$
    
where all $f_i$ are differentiable. Assume that $x^*\in\mathbb{R}^n$ and $\lambda^*\in\mathbb{R}^m$ satisfy the KKT conditions. Prove by using properties of convex functions and the KKT conditions that
$$
	\nabla f_0(x^*)^T(x-x^*)\geq0
$$

for all feasible $x$. (We have seen a similar result when $x^*$ is the global minimum. Here you need to prove it for a point that satisfies the KKT conditions.)

_Hint: Show that_ $(x^*,\lambda^*)$ _satisfies_
$$\sum_{i=1}^m \lambda_i^*(f_i(x^*)+\nabla f_i(x^*)^\top (x-x^*)) = -\nabla f_0(x^*)^\top (x-x^*)$$

_for any_ $x$_, and use convexity of the_ $f_i$ _together with feasibility of_ $x$ _to prove that the left-hand side is nonpositive._

</div>
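A numerical sanity check (not a proof) can make the claim concrete. The sketch below uses a toy instance chosen purely for illustration: minimize $\|x-c\|^2$ subject to $\|x\|^2-1\leq 0$ with $\|c\|>1$, for which the KKT point is $x^*=c/\|c\|$ with multiplier $\lambda^*=\|c\|-1$. The names `c`, `x_star` and `lam_star` are assumptions of this sketch and not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Toy instance (an assumption for illustration):
#   minimize ||x - c||^2  subject to  ||x||^2 - 1 <= 0,  with ||c|| > 1.
c = rng.normal(size=n)
c *= 3.0 / np.linalg.norm(c)          # place c outside the unit ball

x_star = c / np.linalg.norm(c)        # projection of c onto the unit ball
lam_star = np.linalg.norm(c) - 1.0    # multiplier of the single constraint

grad_f0 = 2.0 * (x_star - c)          # gradient of the objective at x_star
grad_f1 = 2.0 * x_star                # gradient of the constraint at x_star

# KKT residuals: stationarity and complementary slackness.
stationarity = np.linalg.norm(grad_f0 + lam_star * grad_f1)
slackness = abs(lam_star * (x_star @ x_star - 1.0))

# Sample feasible points: points outside the ball are rescaled onto its boundary.
samples = rng.normal(size=(1000, n))
norms = np.linalg.norm(samples, axis=1, keepdims=True)
feasible = samples / np.maximum(norms, 1.0)

# Smallest value of grad f0(x*)^T (x - x*) over the sampled feasible points.
min_inner = np.min((feasible - x_star) @ grad_f0)

print(stationarity, slackness, min_inner)
```

Up to rounding error, both KKT residuals vanish and `min_inner` is nonnegative, in line with the statement to be proved.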

## Exercise 2: Inequalities for steepest descent
<div class="alert alert-info"> Exercise

In this exercise we will derive some inequalities that will be important for the analysis of gradient descent.

Recall that for $\beta>0$ a function $f\colon \mathbb R^n\to \mathbb R$ is called $\beta$-smooth if $f$ is differentiable and for all $x,y\in \mathbb R^n$ we have the following inequality:
$$
    \| \nabla f(x) - \nabla f(y)\|\leq \beta\|x-y\|
$$
    
(in other words, the gradient is $\beta$-Lipschitz).


</div>

### Exercise 2a)

<div class="alert alert-info"> Exercise

Let $f$ be $\beta$-smooth. Show that for all $x,y\in \mathbb R^n$ we have
$$
    |f(x)-f(y)-\nabla f(y)^\top (x-y)|\leq \frac\beta2\|x-y\|^2
$$

_Hint: Write_ $f(x)-f(y)$ _as_ 
$$
\int_0^1 \frac{d}{dt} f(y+t(x-y))\,\mathrm dt= \int_0^1\nabla f(y+t(x-y))^\top (x-y)\,\mathrm dt,
$$
    
_Use this to write the entire expression under the absolute value signs as an integral over an inner product_ $\int_0^1 a(t)^\top b(t)\,\mathrm dt$_, and apply Cauchy-Schwarz to this inner product._

_Remark: If we also assume that_ $f$ _is convex, then we can drop the absolute value sign, since this expression is always nonnegative for convex functions._

</div>
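The inequality can be checked numerically before proving it. The sketch below assumes a quadratic test function $f(x)=\frac12 x^\top Qx+q^\top x$ with a symmetric, generally indefinite $Q$, for which the gradient is Lipschitz with constant $\beta=\max_i|\lambda_i(Q)|$. The function and the names `Q`, `q` and `worst_gap` are illustrative choices, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Test function (an assumption): f(x) = 0.5 x^T Q x + q^T x with symmetric Q.
M = rng.normal(size=(n, n))
Q = 0.5 * (M + M.T)                           # symmetric, generally indefinite
q = rng.normal(size=n)
beta = np.max(np.abs(np.linalg.eigvalsh(Q)))  # Lipschitz constant of the gradient

def f(x):
    return 0.5 * x @ Q @ x + q @ x

def grad_f(x):
    return Q @ x + q

# Track the smallest value of (right-hand side) - (left-hand side).
worst_gap = np.inf
for _ in range(1000):
    x, y = rng.normal(size=n), rng.normal(size=n)
    lhs = abs(f(x) - f(y) - grad_f(y) @ (x - y))
    rhs = 0.5 * beta * np.linalg.norm(x - y) ** 2
    worst_gap = min(worst_gap, rhs - lhs)

print(worst_gap)
```

Because $Q$ is indefinite here, the expression inside the absolute value takes both signs, which is why the absolute value is needed without convexity.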

### Exercise 2b)

<div class="alert alert-info"> Exercise

Assume $f$ is convex and $\beta$-smooth. Show that for all $x,y\in\mathbb R^n$ we have
$$
    f(x)-f(y)\leq \nabla f(x)^\top (x-y)-\frac1{2\beta} \|\nabla f(y)-\nabla f(x)\|^2
$$

_Hint: define_ $z=y-\frac1\beta (\nabla f(y)-\nabla f(x))$ _and consider_ 
$$f(x)-f(y) = (f(x)-f(z))+(f(z)-f(y)).
$$
    
_Use convexity to bound the first term, and use the inequality of 2a) to bound the second term._

_Remark: We only need_ $\beta$_-smoothness here to use the inequality of 2a), so if_ $f$ _is convex and satisfies the inequality of 2a), it also satisfies the inequality of this exercise. This observation will be useful in 2d)._

</div>
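As a sanity check on a non-quadratic function, the sketch below assumes the softplus sum $f(x)=\sum_i\log(1+e^{a_i^\top x})$, which is convex and $\beta$-smooth with $\beta=\frac14\|A\|_2^2$ because its Hessian is $A^\top DA$ with $0\preceq D\preceq \frac14 I$. The test function and the names `A` and `worst_gap` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 5
A = rng.normal(size=(m, n))

# Test function (an assumption): f(x) = sum_i log(1 + exp(a_i^T x)).
# Its Hessian is A^T D A with 0 <= D <= I/4, so beta = ||A||_2^2 / 4.
beta = 0.25 * np.linalg.norm(A, 2) ** 2

def f(x):
    # logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    return np.sum(np.logaddexp(0.0, A @ x))

def grad_f(x):
    return A.T @ (1.0 / (1.0 + np.exp(-(A @ x))))

# Track the smallest value of (right-hand side) - (left-hand side).
worst_gap = np.inf
for _ in range(1000):
    x, y = rng.normal(size=n), rng.normal(size=n)
    lhs = f(x) - f(y)
    grad_diff = grad_f(y) - grad_f(x)
    rhs = grad_f(x) @ (x - y) - (grad_diff @ grad_diff) / (2 * beta)
    worst_gap = min(worst_gap, rhs - lhs)

print(worst_gap)
```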

### Exercise 2c)
<div class="alert alert-info"> Exercise

Prove the converse of 2b), that is, prove that if $f:\mathbb R^n\to \mathbb R$ is differentiable and if for all $x,y\in\mathbb R^n$ we have 
$$
    f(x)-f(y)\leq \nabla f(x)^\top (x-y)-\frac1{2\beta} \|\nabla f(y)-\nabla f(x)\|^2
$$

then $f$ is convex and $\beta$-smooth. To show $\beta$-smoothness, use the inequality of 2b) to show the stronger result that
$$
\|\nabla f(y)-\nabla f(x)\|^2\leq  \beta (\nabla f(x)-\nabla f(y))^\top (x-y)
$$

</div>

### Exercise 2d)
<div class="alert alert-info"> Exercise

Suppose that $f\colon \mathbb R^n\to \mathbb R$ is $\alpha$-strongly convex, and $\beta$-smooth. Show that for all $x,y\in\mathbb R^n$ we have
$$
    (\nabla f(x)-\nabla f(y))^\top (x-y) \geq \frac{\alpha\beta}{\beta+\alpha}\|x-y\|^2 +\frac{1}{\beta+\alpha}\|\nabla f(x) - \nabla f(y)\|^2
$$

This inequality is useful for sharpening the basic convergence analysis of gradient descent.

_Hint: Use_ $\alpha$_-strong convexity to show that_ $\phi(x):=f(x)-\frac\alpha2 \|x\|^2$ _is convex and_ $(\beta-\alpha)$_-smooth. To show_ $(\beta-\alpha)$_-smoothness, rewrite the inequality of 2a) in terms of_ $\phi$. _Then finally apply the stronger inequality mentioned in 2c) to_ $\phi$ _to obtain the required result after some algebraic manipulation._

</div>
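A quick numerical check of a bound of this type, with coefficient $\frac{\alpha\beta}{\beta+\alpha}$ on $\|x-y\|^2$ and $\frac{1}{\beta+\alpha}$ on the squared gradient difference, is sketched below. It assumes a quadratic $f(x)=\frac12x^\top Qx$ whose eigenvalues lie in $[\alpha,\beta]$, so that $f$ is $\alpha$-strongly convex and $\beta$-smooth. The values of `alpha` and `beta` and the construction of `Q` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
alpha, beta = 0.5, 4.0

# Test function (an assumption): f(x) = 0.5 x^T Q x with spectrum in [alpha, beta].
eigs = np.concatenate(([alpha, beta], rng.uniform(alpha, beta, size=n - 2)))
U, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal basis
Q = U @ np.diag(eigs) @ U.T

# Track the smallest value of (left-hand side) - (right-hand side).
worst_gap = np.inf
for _ in range(1000):
    d = rng.normal(size=n)     # d = x - y
    g = Q @ d                  # g = grad f(x) - grad f(y)
    lhs = g @ d
    rhs = alpha * beta / (alpha + beta) * (d @ d) + (g @ g) / (alpha + beta)
    worst_gap = min(worst_gap, lhs - rhs)

print(worst_gap)
```

In an eigenbasis of $Q$ the gap is a sum of terms proportional to $(\lambda_i-\alpha)(\beta-\lambda_i)$, so the bound is tight along the eigenvectors for $\alpha$ and $\beta$.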

## Exercise 3: Boolean least squares
<div class="alert alert-info"> Exercise

Let $A\in \mathbb R^{m\times n}$ and $b\in \mathbb R^m$. We consider the Boolean least squares problem
$$
\begin{array}{ll}
    \text{minimize} & \|Ax-b\|^2\\
    \text{subject to} & x_i\in \{-1,\,1\}, \quad i=1,\dots,n
\end{array}
$$

This is not a convex problem, and we thus want to relax it to a convex problem giving a useful lower bound.

</div>
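For small $n$ the problem can be solved exactly by enumerating all $2^n$ sign vectors, which gives a reference value against which any lower bound can be compared. The sketch below is a brute-force baseline under that assumption; the helper name `boolean_least_squares_bruteforce` and the random data are hypothetical and only serve as an illustration.

```python
import itertools
import numpy as np

def boolean_least_squares_bruteforce(A, b):
    """Return (best_value, best_x) over all x in {-1, 1}^n by enumeration."""
    n = A.shape[1]
    best_value, best_x = np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(signs)
        value = np.sum((A @ x - b) ** 2)
        if value < best_value:
            best_value, best_x = value, x
    return best_value, best_x

rng = np.random.default_rng(4)
A = rng.normal(size=(8, 6))
b = rng.normal(size=8)
p_star, x_best = boolean_least_squares_bruteforce(A, b)

# Dropping the constraints entirely gives ordinary least squares,
# a simple (often loose) lower bound on the Boolean optimum.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
ls_value = np.sum((A @ x_ls - b) ** 2)

print(p_star, ls_value)
```

The enumeration costs $2^n$ objective evaluations, so it is only practical for small $n$; this is the motivation for a convex relaxation.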

### Exercise 3a)
<div class="alert alert-info"> Exercise

Show that the Boolean least squares problem is equivalent to 
$$
\begin{array}{ll}
\text{minimize} & \operatorname{tr}(A^\top A X)-2b^\top Ax+b^\top b \\ 
\text{subject to} & X = xx^\top\\
& X_{ii} = 1,\quad i=1,\dots,n
\end{array}
$$

Here the minimization is over the variables $x\in \mathbb R^n$ and $X\in\mathbb S(n)$.

</div>
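The rewriting of the objective can be checked numerically on random data. The sketch below draws a random Boolean vector, forms the lifted variable $X=xx^\top$, and compares both objective values; the data `A`, `b` and the dimensions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 4
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

x = rng.choice([-1.0, 1.0], size=n)   # a random Boolean vector
X = np.outer(x, x)                    # lifted variable X = x x^T

original = np.sum((A @ x - b) ** 2)
lifted = np.trace(A.T @ A @ X) - 2 * b @ A @ x + b @ b

# Both objectives agree, and the diagonal of X encodes x_i^2 = 1.
print(np.isclose(original, lifted), np.allclose(np.diag(X), 1.0))
```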

### Exercise 3b)
<div class="alert alert-info"> Exercise

We want to write the objective function of 3a) as an SDP objective of the form $\operatorname{tr}(BY)$. To this end, let 
$$
Y = \begin{pmatrix}
X & x \\ 
x^\top & 1
\end{pmatrix},\qquad B=\begin{pmatrix}C&d\\d^\top &\alpha\end{pmatrix}
$$

Find the symmetric block matrix $B$ such that
$$
\operatorname{tr}(A^\top A X)-2b^\top Ax+b^\top b = \operatorname{tr}(BY)
$$

</div>

### Exercise 3c)
<div class="alert alert-info"> Exercise

Let $B$ be as in the previous exercise. Show that the following SDP is a convex relaxation of the Boolean least squares problem, i.e. its optimal value is a lower bound on the optimal value of the Boolean least squares problem:
$$
\begin{array}{ll}
\text{minimize} & \operatorname{tr}(BY) \\
\text{subject to} & Y\succeq0\\
& Y_{ii}=1,\quad i=1,\dots,n+1
\end{array}
$$

</div>
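One ingredient of the argument can be illustrated numerically: every Boolean vector $x$ produces a matrix $Y$ that is feasible for the SDP. The sketch below builds $Y$ from the stacked vector $(x,1)$ and checks positive semidefiniteness and the unit diagonal; it also shows that the feasible set contains matrices that do not come from any Boolean $x$. The dimension `n` and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4

x = rng.choice([-1.0, 1.0], size=n)   # a random Boolean vector
v = np.append(x, 1.0)                 # stacked vector (x, 1)
Y = np.outer(v, v)                    # Y = [[x x^T, x], [x^T, 1]]

min_eig = np.min(np.linalg.eigvalsh(Y))      # >= 0 up to rounding if Y is PSD
unit_diag = np.allclose(np.diag(Y), 1.0)
rank_Y = np.linalg.matrix_rank(Y)

# The identity is PSD with unit diagonal as well, but it has full rank,
# so it cannot be written in the rank-one form above.
I = np.eye(n + 1)
rank_I = np.linalg.matrix_rank(I)

print(min_eig, unit_diag, rank_Y, rank_I)
```

Since the SDP minimizes over a set that contains all such rank-one $Y$, its optimal value cannot exceed the Boolean optimum.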