# Optim HW 2 : Gradient methods
___
Martin Guyard

**Due date: 20 Nov. 2019**
___

## Problem 1 
___
The goal is to minimize $f$ without constraints:

$$ f(x_1,x_2,...,x_m) = \sum_{i=1}^{m} a_i\cdot(x_i-b_i)^2 + 3 $$

Gradient descent with step size $t$ updates each coordinate as:

$$ x_{i}^{k+1} = x_{i}^{k} - t\,\frac{\partial f}{\partial x_i}(x^{k}) $$
$$ x_{i}^{k+1} = x_{i}^{k} - t\cdot2\cdot a_i(x_{i}^{k} - b_i) $$

___
**Analytical solution**:

$f$ is minimal when $\forall i, x_i = b_i$ (assuming $a_i > 0$). Moreover, gradient descent solves this problem in one step when
$\forall i, x_{i}^{k} - t\cdot2\cdot a_i(x_{i}^{k} - b_i) = b_i $, which means that $\forall{i},  2\cdot t \cdot a_i = 1$.

**One trivial solution is $t=0.5$ and $\forall i, a_i=1$**
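
As a quick numerical check of the one-step claim, the sketch below applies a single gradient update with $t=0.5$ and $a_i=1$. The helper name `gradient_step` and the three-dimensional vectors are made up for illustration; they are not the homework data.

```python
# One gradient step on f(x) = sum(a_i * (x_i - b_i)**2) + 3.
def gradient_step(x, a, b, t):
    """Return x - t * grad f(x), where grad f(x)_i = 2 * a_i * (x_i - b_i)."""
    return [xi - t * 2 * ai * (xi - bi) for xi, ai, bi in zip(x, a, b)]

# Illustrative values (assumed, not from the assignment).
a = [1, 1, 1]
b = [4.0, -2.0, 7.5]
x0 = [0.0, 10.0, 3.0]

x1 = gradient_step(x0, a, b, t=0.5)
print(x1)  # with 2 * t * a_i = 1 the step lands exactly on b
```

Any starting point gives the same result, because the factor $1 - 2ta_i$ that multiplies the error $x_i - b_i$ is exactly zero.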
___
**Convergence analysis**:

Gradient descent converges for step sizes $t \leq 1/L$, given that $f$ is convex and $\nabla f$ is $L$-Lipschitz.

$\nabla f$ is $L$-Lipschitz if $\| \nabla f(x) - \nabla f(y)\|_2 \leq L\|x-y\|_2$

Here $\nabla f(X) = 2A(X-B)$ with $A = diag(a_1, ..., a_m)$, so we need:

$\|2A(X-B) - 2A(Y-B)\|_2 \leq L \|X-Y\|_2$

$\|2AX - 2AY\|_2 \leq L \|X-Y\|_2$

$\|2A(X-Y)\|_2 \leq L \|X-Y\|_2$

Since $\|2A(X-Y)\|_2 \leq 2\,max(A)\|X-Y\|_2$, it is enough that:

$2\,max(A)\|X-Y\|_2 \leq L \|X-Y\|_2$

$L \geq 2\,max(A)$

**if $\forall i, a_i=1$, then the convergence criterion is $L \geq 2$ and $t \leq 1/2$**

Gradient descent (with or without backtracking) has convergence rate $O(1/k)$ on convex functions with Lipschitz gradient
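
For this particular quadratic the effect of the step size can be read off directly from the update rule: $x_i^{k+1} - b_i = (1 - 2ta_i)(x_i^k - b_i)$, so the error on coordinate $i$ shrinks only if $|1 - 2ta_i| < 1$. The sketch below computes the worst factor over all coordinates; `contraction_factor` is a hypothetical helper, not part of the homework code.

```python
def contraction_factor(a, t):
    """Largest |1 - 2*t*a_i|: the per-step error factor of the slowest coordinate."""
    return max(abs(1 - 2 * t * ai) for ai in a)

print(contraction_factor([1, 1, 1], t=0.5))  # 0.0: one step is enough
print(contraction_factor([1, 1, 1], t=1.0))  # 1.0: the error never shrinks
worst = contraction_factor([1, 50, 99], t=0.01)
print(worst)  # close to 0.98, so convergence is slow but guaranteed
```

A factor below 1 means linear convergence on this problem, which is faster than the general $O(1/k)$ bound.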
___

### Gradient Descent 

In [187]:
from math import sqrt
from random import randrange

def norm(K, L=None):
    # return the euclidean norm of K, or the euclidean distance between K and L
    if L is not None:
        X = [k-l for k, l in zip(K, L)]
    else:
        X = K
    return sqrt(sum(x**2 for x in X))

In [341]:
def gradient_descent(X, A, B, epsilon=1e-6, step=0.5):
    # minimize sum(a*(x-b)**2) + 3 with a fixed step size
    assert isinstance(X, list) and isinstance(A, list) and isinstance(B, list)
    assert len(X) == len(A) == len(B)
    
    # gradient of f, and one gradient step taken from X
    f_grad = lambda X: [2*a*(x-b) for x, a, b in zip(X, A, B)]
    update = lambda X: [x-step*xf for x, xf in zip(X, f_grad(X))]
    
    X_k = list(X)
    X_k1 = update(X_k)
    
    # iterate until two successive iterates are closer than epsilon
    steps = 0
    while norm(X_k, X_k1) >= epsilon:
        X_k, X_k1 = X_k1, update(X_k1)
        steps += 1
        
    print("Gradient descent steps:", steps)
    print("[ step size t =", step, ', convergence criterium 1/t <', max(A),']\n')
    return X_k

### Gradient Descent with Backtracking

In [354]:
def gradient_descent_bt(X, A, B, epsilon=1e-6, step=0.5, alpha=0.5, beta=0.5):
    # minimize sum(a*(x-b)**2 + 3) with a backtracking line search
    assert isinstance(X, list) and isinstance(A, list) and isinstance(B, list)
    assert len(X) == len(A) == len(B)
    assert 0 < alpha <= 0.5 and 0 < beta < 1
    
    # objective, gradient, and one gradient step using the current step size
    f = lambda X: sum([(a*(x-b)**2)+3 for x, a, b in zip(X, A, B)])
    f_grad = lambda X: [2*a*(x-b) for x, a, b in zip(X, A, B)]
    update = lambda X: [x-step*xf for x, xf in zip(X, f_grad(X))]
    
    X_k = list(X)
    X_k1 = update(X_k)
    
    steps = 0
    while norm(X_k, X_k1) >= epsilon:
        # shrink the step until the sufficient decrease condition holds
        while f(X_k1) > f(X_k) - alpha * step * norm(f_grad(X_k)) ** 2:
            step *= beta
            X_k1 = update(X_k)
        # accept X_k1 and take the next trial step (the step size is never reset)
        X_k, X_k1 = X_k1, update(X_k1)
        steps += 1
    
    print('Gradient descent with backtracking steps:', steps)
    print('[ step size t =', step, ', alpha =', alpha, ', beta =', beta, ']\n')
    return X_k

___
Assume that:

- $ m = 500 $

- $ \forall i, a_i = 1 $

- $ \forall i, b_i \sim U(0, 100) $

- $ \forall i, x_i = 0 $ for initial x

**As this precise problem is already solved analytically above, I choose step size t = 0.5**

**The stopping condition is $d(x_{k+1}, x_{k}) < \varepsilon$ with default $\varepsilon = 10^{-6}$ and $d(.)$ the Euclidean distance**

**Note: The stopping condition could also be `X == B` or `norm(f_grad(X)) <= epsilon`**
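
A minimal sketch of the gradient-norm variant of the stopping condition mentioned in the note. The function name `gradient_descent_gradnorm`, the `max_iter` safeguard and the two-dimensional data are assumptions made for this illustration.

```python
from math import sqrt

def gradient_descent_gradnorm(x, a, b, step, epsilon=1e-6, max_iter=100_000):
    """Fixed-step gradient descent that stops when ||grad f(x)||_2 <= epsilon."""
    x = list(x)
    for k in range(max_iter):
        grad = [2 * ai * (xi - bi) for xi, ai, bi in zip(x, a, b)]
        if sqrt(sum(g * g for g in grad)) <= epsilon:
            return x, k  # k = number of updates performed so far
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x, max_iter

x_opt, n_updates = gradient_descent_gradnorm([0.0, 0.0], [1, 1], [3.0, 8.0], step=0.5)
print(x_opt, n_updates)  # [3.0, 8.0] 1
```

Unlike the distance between successive iterates, this criterion does not depend on the step size, so a very small step cannot trigger an early stop far from the minimum.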

In [355]:
m = 500
X = [0] * m
A = [1] * m
B = [randrange(0, 100) for i in range(m)]

In [356]:
# step size = 0.5
gd_res = gradient_descent(X, A, B)

# step size = 0.5, alpha = 0.5, beta = 0.5
gd_bt_res = gradient_descent_bt(X, A, B)

Gradient descent steps: 1
[ step size t = 0.5 , convergence criterium 1/t < 1 ]

Gradient descent with backtracking steps: 1
[ step size t = 0.5 , alpha = 0.5 , beta = 0.5 ]



**Both methods converge in the same number of steps with step size t=0.5, since this step size solves the problem in a single update, as shown in the analytical solution above.**

___
Assume that:

- $ m = 500 $

- $ \forall i, a_i \sim U(1, 100) $

- $ \forall i, b_i \sim U(1, 100) $

- $ \forall i, x_i = 0 $ for initial x

**I choose step size t=1/100 since max(A) < 100: each update multiplies the error on coordinate $i$ by $1 - 2ta_i$, and $|1 - 2ta_i| \leq 0.98 < 1$ for $1 \leq a_i \leq 99$**

**The stopping condition is $d(x_{k+1}, x_{k}) < \varepsilon$ with default $\varepsilon = 10^{-6}$ and $d(.)$ the Euclidean distance**

**Note: The stopping condition could also be `norm(f_grad(X)) <= epsilon`**

In [371]:
m = 500
X = [0] * m
A = [randrange(1, 100) for i in range(m)]
B = [randrange(1, 100) for i in range(m)]

In [358]:
# step size = 1/100
gd_res = gradient_descent(X, A, B, step=1/100)

# initial step size = 1/100, alpha = 0.5, beta = 0.5
gd_bt_res = gradient_descent_bt(X, A, B, step=1/100)

Gradient descent steps: 967
[ step size t = 0.01 , convergence criterium 1/t < 99 ]

Gradient descent with backtracking steps: 1375
[ step size t = 0.005 , alpha = 0.5 , beta = 0.5 ]



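**Convergence is slower with backtracking here, since it takes more steps: backtracking halves the step size to t = 0.005, which slows progress on the coordinates with small $a_i$. However, the advantage of backtracking is not faster convergence but robustness: the initial step size does not need to be tuned, since backtracking shrinks it until a sufficient decrease of $f$ is obtained.**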
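
To illustrate the robustness argument, the sketch below starts from a step size that is too large ($t=1$ with $a_i=1$). A fixed step only reflects the iterate around $b$, while one backtracking search recovers $t=0.5$. The helper `backtracking_step` and the small data set are assumptions for this example; it performs a single line search rather than the full loop of `gradient_descent_bt`.

```python
def f(x, a, b):
    return sum(ai * (xi - bi) ** 2 for ai, xi, bi in zip(a, x, b)) + 3

def grad(x, a, b):
    return [2 * ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]

def backtracking_step(x, a, b, t, alpha=0.5, beta=0.5):
    """Shrink t by beta until f decreases by at least alpha * t * ||grad f(x)||^2."""
    g = grad(x, a, b)
    g2 = sum(gi * gi for gi in g)
    while True:
        x_new = [xi - t * gi for xi, gi in zip(x, g)]
        if f(x_new, a, b) <= f(x, a, b) - alpha * t * g2:
            return x_new, t
        t *= beta

a, b, x = [1, 1], [3.0, 8.0], [0.0, 0.0]

# Fixed step t = 1: x -> 2b - x, so the iterate bounces between two points.
x_fixed = [xi - 1.0 * gi for xi, gi in zip(x, grad(x, a, b))]
print(x_fixed)  # [6.0, 16.0]

# Backtracking started from the same t = 1 halves it once and lands on b.
x_bt, t_used = backtracking_step(x, a, b, t=1.0)
print(x_bt, t_used)  # [3.0, 8.0] 0.5
```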

## Problem 2
___
The goal is to minimize $f$ **under constraints**:

$$ f(x_1,x_2,...,x_m) = \sum_{i=1}^{m} a_i\cdot(x_i-b_i)^2 + 3 $$
$$\forall i, x_i \geq 0 $$
$$\sum_{i=1}^{m} x_i \leq 100 $$
<br>
<br>

#### Analytical solution

This problem has no equality constraints. Keeping the bounds $x_i \geq 0$ out of the Lagrangian, it is simply:
<br>$L(x, \lambda) = f(x) - \lambda h(x)$ where $x=(x_1, ... , x_m)$ and $h(x)=(\sum_{i=1}^{m} x_i) -100$
<br>
<br>
KKT conditions:
- primal feasibility: $\forall i, x_i \geq 0$ and $\sum_{i=1}^m x_i -100 \leq 0$
- dual feasibility: $\lambda \leq 0$ (because of the minus sign in front of $\lambda$ in $L$)
- complementary slackness: $x_i(2a_i(x_i-b_i)-\lambda)=0$ and $\lambda (\sum_{i=1}^{m} x_i - 100) = 0$


<br>
We consider $a_i, b_i \geq 1$
<br>
<br>
With complementary slackness, every coordinate with $x_i > 0$ satisfies $x_i=b_i+\frac{\lambda}{2a_i}$
<br>

**The heuristic below uses: $x_i=b_i+\frac{\lambda}{2a_ib_i}$** (the extra factor $b_i$ in the denominator does not follow from the KKT conditions, so the resulting point is not guaranteed to be optimal or to satisfy $x_i \geq 0$)

Then $\lambda (\sum_{i=1}^{m} x_i - 100) = 0$ implies that either $\lambda = 0$ or $\sum_{i=1}^{m} x_i - 100=0$

If $\lambda = 0$ then $x_i=b_i$, which requires $\sum_{i=1}^{m} b_i \leq 100$. This fails here: with $m = 500$ and $b_i \geq 1$, $\sum_{i=1}^{m} b_i \geq 500$.

So the sum constraint must be active: $\sum_{i=1}^{m} x_i - 100=0$

So we have: $\sum_{i=1}^{m} \left(b_i+\frac{\lambda}{2a_ib_i}\right) = 100$

<br>
<br>

**From here the closed form of the heuristic is easy to find:**

$\lambda = \frac{100-\sum_{i=1}^{m}b_i}{\sum_{i=1}^{m}\frac{1}{2a_ib_i}}$
<br>
$\forall i, x_i=b_i+\frac{\lambda}{2a_ib_i}$
<br>
Thus $\forall i, x_i=b_i+\frac{\frac{100-\sum_{i=1}^{m}b_i}{\sum_{i=1}^{m}\frac{1}{2a_ib_i}}}{2a_ib_i}$
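
For comparison, the KKT system can also be solved numerically without assuming that every coordinate is positive. Writing $\mu = -\lambda \geq 0$, stationarity together with $x_i \geq 0$ gives $x_i(\mu) = max(0, b_i - \frac{\mu}{2a_i})$, and $\sum_i x_i(\mu)$ is non-increasing in $\mu$, so $\mu$ can be found by bisection. This is only a sketch under those assumptions: the name `kkt_bisection`, the tolerance and the three-dimensional data are illustrative.

```python
def kkt_bisection(a, b, budget=100.0, tol=1e-10):
    """Minimize sum(a_i*(x_i-b_i)**2) s.t. x_i >= 0, sum(x_i) <= budget (a_i > 0)."""
    def x_of(mu):
        # stationarity plus x_i >= 0: clip the shifted coordinate at zero
        return [max(0.0, bi - mu / (2 * ai)) for ai, bi in zip(a, b)]

    if sum(x_of(0.0)) <= budget:
        return x_of(0.0), 0.0  # sum constraint inactive, mu = 0

    # at mu = max(2*a_i*b_i) every coordinate is clipped to zero
    lo, hi = 0.0, max(2 * ai * bi for ai, bi in zip(a, b))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(x_of(mid)) > budget:
            lo = mid  # still over budget: mu must grow
        else:
            hi = mid
    return x_of(hi), hi

x_star, mu = kkt_bisection(a=[1, 2, 4], b=[80.0, 60.0, 10.0])
print([round(v, 4) for v in x_star], round(mu, 4))
```

In this small example all three coordinates stay positive, so the result agrees with $x_i = b_i - \frac{\mu}{2a_i}$ and $\mu = \frac{\sum_i b_i - 100}{\sum_i \frac{1}{2a_i}}$.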



In [403]:
m = 500
X = [0] * m
A = [randrange(1, 100) for i in range(m)]
B = [randrange(1, 100) for i in range(m)]

In [404]:
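def heuristic(X, A, B):
    # evaluate the closed-form heuristic point derived above
    f = lambda X: sum([((a*(x-b)**2)+3) for x, a, b in zip(X, A, B)])
    # multiplier that makes the heuristic point sum to 100
    lmbda = (100-sum(B))/(sum([1/(2*a*b) for a, b in zip(A, B)]))
    X_k = [b+lmbda/(2*a*b) for a, b in zip(A, B)]
    # sum of the heuristic point (should be 100) and sum of the unconstrained optimum B
    print(sum(X_k), sum(B))
    # True means the starting point X has a lower objective than the heuristic point
    print(f(X) < f(X_k))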

heuristic(X, A, B)
    
def dual_ascent(X, A, B, epsilon=1e-6, step=0.5):
    pass

99.99999999998366 24785
True
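

The `dual_ascent` function above is still a stub. One possible way to fill it in is projected dual ascent on the multiplier of the sum constraint: minimize the Lagrangian over $x \geq 0$ in closed form, then move the multiplier along the constraint violation. The sketch below is an assumption about how this could look, not the intended solution: it uses $\mu = -\lambda \geq 0$, a different signature, and a default step of $1 / \sum_i \frac{1}{2a_i}$ chosen from a bound on the curvature of the dual function.

```python
def dual_ascent_sketch(a, b, budget=100.0, epsilon=1e-9, step=None, max_iter=100_000):
    """Projected dual ascent on the multiplier mu >= 0 of sum(x_i) <= budget."""
    if step is None:
        # assumed default: inverse of a Lipschitz bound on the dual gradient
        step = 1.0 / sum(1.0 / (2 * ai) for ai in a)
    mu = 0.0
    for k in range(max_iter):
        # primal step: minimize the Lagrangian over x >= 0 in closed form
        x = [max(0.0, bi - mu / (2 * ai)) for ai, bi in zip(a, b)]
        # dual step: follow the constraint violation, then project onto mu >= 0
        mu_next = max(0.0, mu + step * (sum(x) - budget))
        if abs(mu_next - mu) <= epsilon:
            return x, mu_next, k
        mu = mu_next
    return x, mu, max_iter

x_da, mu_da, n_iter = dual_ascent_sketch(a=[1, 2, 4], b=[80.0, 60.0, 10.0])
print([round(v, 4) for v in x_da], round(mu_da, 4), n_iter)
```

The stopping test is on the change of the multiplier, which mirrors the distance-based stopping condition used in Problem 1.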


**Dual ascent YouTube tutorial: https://www.youtube.com/watch?v=HOx-fZ01VnY**

**Exercise: https://bdesgraupes.pagesperso-orange.fr/UPX/Master1/MNM1_corr_doc2.pdf**