Note based on [link1](https://drive.google.com/file/d/1xtkxPCx05Xg4Dj7JZoQ-LusBDrtYUqOF/view?usp=sharing), [link2](https://drive.google.com/file/d/16-_PvWNaO0Zc9x04-SRsxCRdn5fxebf2/view?usp=sharing)

# Unconstrained optimization problems

This note considers solving problems of the form:

$$\min f(x)$$

with $f:\mathbb{R}^n \rightarrow \mathbb{R}$ convex and $f \in \mathcal{C}^2(\text{dom}f)$.

It is also assumed that an optimal point $x^*$ exists, so the problem has a solution, and the optimal value is denoted by $p^* = f(x^*) = \inf f(x)$.

Given these assumptions, a **necessary and sufficient condition** for $x^*$ to be optimal is $\nabla f(x^*) = 0$, which **in general** is a set of $n$ **nonlinear equations** in $n$ variables; solving it solves the optimization problem stated at the beginning.


**Examples:**

1)$$\displaystyle \min_{x \in \mathbb{R}^2} x_1^4+2x_1^2x_2+x_2^2$$

Then:

$$
\nabla f(x) = 
\left [
\begin{array}{c}
4x_1^3+4x_1x_2\\
2x_1^2+2x_2
\end{array}
\right ]=0
$$

which is a **nonlinear** system of two equations in two unknowns.

2) $$\displaystyle \min_{x \in \mathbb{R}^2} \frac{1}{2}x^TPx+q^Tx+r$$

with $P=\left [\begin{array}{cc}
5 & 4\\
4 & 5
\end{array}
\right ]$, $q=\left [\begin{array}{c}
-1\\
1
\end{array}
\right]
$, $r=3$. Carrying out the matrix-vector multiplications and dot products, the problem can be rewritten as:



$$\displaystyle \min_{x \in \mathbb{R}^2} \frac{5}{2}x_1^2 + \frac{5}{2}x_2^2+4x_1x_2 -x_1 + x_2+3$$ 

Then:

$$\nabla f(x) = Px +q =\left [ \begin{array}{cc}
5 & 4\\
4 & 5
\end{array}
\right ]
\left [ \begin{array}{c}
x_1\\
x_2
\end{array}
\right ]
+ \left [ \begin{array}{c}
-1\\
1
\end{array}
\right ]=
\left [ \begin{array}{c}
5x_1+4x_2-1\\
4x_1+5x_2+1
\end{array}
\right ]
=0
$$

which is a **linear** system of two equations in two unknowns.
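Both gradients above can be sanity-checked numerically. The following is a minimal sketch, assuming a central-difference helper (`central_diff_gradient`, a hypothetical name not used elsewhere in this note) and an arbitrary test point:

```python
import numpy as np

def central_diff_gradient(f, x, h=1e-6):
    # Hypothetical helper: central-difference approximation of the gradient of f at x.
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Example 1: f(x) = x_1^4 + 2 x_1^2 x_2 + x_2^2
f1 = lambda x: x[0]**4 + 2 * x[0]**2 * x[1] + x[1]**2
grad_f1 = lambda x: np.array([4 * x[0]**3 + 4 * x[0] * x[1],
                              2 * x[0]**2 + 2 * x[1]])

# Example 2: f(x) = 1/2 x^T P x + q^T x + r
P = np.array([[5.0, 4.0], [4.0, 5.0]])
q = np.array([-1.0, 1.0])
r = 3.0
f2 = lambda x: 0.5 * x @ P @ x + q @ x + r
grad_f2 = lambda x: P @ x + q

x_test = np.array([0.7, -1.3])  # arbitrary test point (an assumption)
err1 = np.linalg.norm(grad_f1(x_test) - central_diff_gradient(f1, x_test))
err2 = np.linalg.norm(grad_f2(x_test) - central_diff_gradient(f2, x_test))
print(err1, err2)  # both differences should be small
```

If the analytic expressions are right, both printed differences should be close to zero, limited only by the accuracy of the finite-difference approximation.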

**Remark:** in some special cases the equation $\nabla f(x) = 0$ can be solved for $x$ analytically, in closed form. This is the case for example $2$ above, whose solution is given by $x^* = -P^{-1}q$:

In [1]:
import numpy as np

In [3]:
P=np.array([[5,4],[4,5]])
q=np.array([-1,1])
np.linalg.solve(P,-q)

array([ 1., -1.])

but typically an iterative algorithm is used: compute a sequence of points $x^{(0)}, x^{(1)}, \dots \in \text{dom}f$ with $f(x^{(k)}) \rightarrow p^*$ as $k \rightarrow \infty$. The set of points $x^{(0)}, x^{(1)},\dots$ is called a **minimizing sequence** for the optimization problem. The algorithm terminates when $f(x^{(k)})-p^* \leq \epsilon$, with $\epsilon >0$ a given tolerance.
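The stopping criterion can be illustrated on example 2, where $p^*$ is known. The following is a minimal sketch rather than the method implemented later in this note: it assumes a fixed step size `t = 1/9` (the reciprocal of the largest eigenvalue of $P$, chosen only for illustration), the starting point $x^{(0)} = 0$, and the hypothetical names `eps` and `max_iter`.

```python
import numpy as np

P = np.array([[5.0, 4.0], [4.0, 5.0]])
q = np.array([-1.0, 1.0])
r = 3.0
f = lambda x: 0.5 * x @ P @ x + q @ x + r

x_star = np.linalg.solve(P, -q)   # closed-form minimizer of example 2
p_star = f(x_star)                # optimal value p^*

eps = 1e-8        # tolerance (assumption)
t = 1.0 / 9.0     # fixed step size (assumption): 1 / largest eigenvalue of P
max_iter = 1000   # safeguard against an endless loop (assumption)

x = np.zeros(2)   # starting point x^(0) (assumption)
k = 0
while f(x) - p_star > eps and k < max_iter:
    x = x - t * (P @ x + q)   # move against the gradient
    k += 1

print(k, x, f(x) - p_star)
```

In practice $p^*$ is unknown, which is why implementations usually stop on a computable quantity such as the norm of the gradient.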

## Backtracking line search method

## Descent methods

>Descent algorithm
>> Initial point


## Steepest descent method

## Influence of the condition number
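As a minimal sketch of this effect, consider $f(x) = \frac{1}{2}(x_1^2 + \gamma x_2^2)$, the family used later in this note, whose Hessian $\text{diag}(1,\gamma)$ has condition number $\gamma$ when $\gamma \geq 1$. The sketch swaps backtracking for an exact line search (which has a closed form for quadratics) and assumes the starting point $(\gamma, 1)$ and the hypothetical helper name `count_iterations`. It only counts iterations until the gradient norm drops below a tolerance.

```python
import numpy as np

def count_iterations(gamma, tol=1e-8, max_iter=100000):
    # Gradient descent with exact line search on f(x) = 1/2 (x_1^2 + gamma x_2^2).
    A = np.diag([1.0, gamma])          # Hessian of f
    x = np.array([gamma, 1.0])         # starting point (assumption)
    g = A @ x                          # gradient of f at x
    k = 0
    while np.linalg.norm(g) > tol and k < max_iter:
        t = (g @ g) / (g @ A @ g)      # exact minimizer of f along -g
        x = x - t * g
        g = A @ x
        k += 1
    return k

gammas = [1.0, 10.0, 100.0]
counts = [count_iterations(gamma) for gamma in gammas]
for gamma, k in zip(gammas, counts):
    print(gamma, k)
```

Under these assumptions the iteration count grows as $\gamma$ grows, that is, as the problem becomes worse conditioned.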

## Examples

In [1]:
import numpy as np
import math

In [2]:
def inc_index(vec,index,h):
    '''
    Auxiliary function for gradient and Hessian computation.
    Args:
        vec (array): numpy array
        index (int): index 
        h (float):   quantity that vec[index] will be increased
    Returns:
        vec (array): 
    '''
    vec[index] +=h
    return vec

In [3]:
def dec_index(vec,index,h=1):
    '''
    Auxiliary function for gradient and Hessian computation.
    Args:
        vec (array): numpy array
        index (int): index 
        h (float):   quantity that vec[index] will be decreased
    Returns:
        vec (array): 
    '''
    vec[index] -=h
    return vec

In [4]:
def gradient_approximation(f,x,h=1e-8):
    '''
    Numerical approximation of gradient for function f using forward differences.
    Args:
        f (lambda expression): definition of function f
        x (array): numpy array that holds values where gradient will be computed
        h (float): step size for forward differences, typically h=1e-8
    Returns:
        gf (array):
    '''
    n = x.size
    gf = np.zeros(n)
    f_x = f(x)
    for i in np.arange(n):
        inc_index(x,i,h)
        gf[i] = f(x) - f_x
        dec_index(x,i,h)
    return gf/h
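To see why a step near `1e-8` is a common choice for forward differences in double precision, the following minimal sketch compares the error for several values of `h` on a one-dimensional function with a known derivative. The test function (`math.exp` at `x = 1`) and the name `forward_diff` are assumptions made for illustration only.

```python
import math

def forward_diff(f, x, h):
    # Forward-difference approximation of f'(x).
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.exp(x)   # the derivative of exp is exp itself
errors = {}
for h in [1e-2, 1e-5, 1e-8, 1e-12, 1e-15]:
    errors[h] = abs(forward_diff(math.exp, x, h) - exact)
    print('{:0.0e}    {:0.2e}'.format(h, errors[h]))
```

The truncation error shrinks with `h` while the rounding error grows as `h` shrinks, so the total error is smallest for an intermediate `h`, roughly the square root of machine epsilon.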

In [5]:
def Hessian_approximation(f,x,h=1e-6):
    '''
    Numerical approximation of Hessian for function f using forward differences.
    Args:
        f (lambda expression): definition of function f
        x (array): numpy array that holds values where Hessian will be computed
        h (float): step size for forward differences, typically h=1e-6
    Returns:
        Hf (array):
    '''
    n = x.size
    Hf = np.zeros((n,n))
    f_x = f(x)
    for i in np.arange(n):
        inc_index(x,i,h)
        f_x_inc_in_i = f(x)
        for j in np.arange(i,n):
            inc_index(x,j,h)
            f_x_inc_in_i_j = f(x)
            dec_index(x,i,h)
            f_x_inc_in_j = f(x)
            dif = f_x_inc_in_i_j-f_x_inc_in_i-f_x_inc_in_j+f_x
            Hf[i,j] = dif
            if j != i:
                Hf[j,i] = dif
            dec_index(x,j,h)
            inc_index(x,i,h)
        dec_index(x,i,h)
    return Hf/h**2
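For reference, the loop in `Hessian_approximation` amounts to the following second-order forward-difference formula, written under the assumption that $e_i$ denotes the $i$-th canonical basis vector (notation not used elsewhere in this note):

$$\left[\nabla^2 f(x)\right]_{ij} \approx \frac{f(x+he_i+he_j) - f(x+he_i) - f(x+he_j) + f(x)}{h^2}$$

Since the right-hand side is symmetric in $i$ and $j$, only the entries with $j \geq i$ need to be evaluated, which is what the inner loop does.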

In [7]:
def line_search_by_backtracking(f,dir_desc,x,
                                der_direct, alpha=.15, beta=.5):
    """
    Line search that sufficiently minimizes f restricted to a ray.
    Args:
        alpha (float): parameter in line search with backtracking, typically .15
        beta (float): parameter in line search with backtracking, typically .5
        f (lambda expression): definition of function f
        dir_desc (array): descent direction
        x (array): numpy array that holds values where line search will be performed
        der_direct (float): directional derivative of f
    Returns:
        t (float):
    """
    t=1
    if alpha > 1/2:
        print('alpha must be less than or equal to 1/2')
        t=-1
    if beta>=1: #beta equal to 1 would never shrink t
        print('beta must be less than 1')
        t=-1
    if t!=-1:
        eval1 = f(x+t*dir_desc)
        eval2 = f(x) + alpha*t*der_direct
        while eval1 > eval2:
            t=beta*t
            eval1=f(x+t*dir_desc)
            eval2=f(x)+alpha*t*der_direct
    else:
        t=-1
    return t
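As a minimal sketch of how the backtracking condition plays out on a single step, the block below uses the quadratic with `gamma_cte = 5` that appears later in this note, starting from `(0.5, 0.5)`. It assumes the exact gradient instead of the finite-difference approximation, and `backtracking_step` is a hypothetical helper written only for this illustration.

```python
import numpy as np

def backtracking_step(f, x, grad, alpha=0.15, beta=0.5):
    # Hypothetical helper: shrink t until the sufficient-decrease condition
    # f(x + t*d) <= f(x) + alpha*t*grad^T d holds, with d = -grad.
    d = -grad
    slope = grad @ d              # directional derivative, negative for d = -grad
    f_x = f(x)
    t = 1.0
    trace = [t]                   # every step size that was tried
    while f(x + t * d) > f_x + alpha * t * slope:
        t = beta * t
        trace.append(t)
    return t, trace

gamma_cte = 5
f = lambda x: 0.5 * (x[0]**2 + gamma_cte * x[1]**2)
x = np.array([0.5, 0.5])
grad = np.array([x[0], gamma_cte * x[1]])   # exact gradient (assumption)

t, trace = backtracking_step(f, x, grad)
x_new = x - t * grad
print(trace)      # [1.0, 0.5, 0.25]
print(f(x_new))   # 0.109375
```

Under these assumptions the accepted step is `t = 0.25`, which agrees with the first backtracking value reported in the run recorded for that example.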

### Descenso en gradiente

In [8]:
def compute_error(x_obj,x_approx):
    if np.linalg.norm(x_obj) > np.nextafter(0,1):
        Err=np.linalg.norm(x_obj-x_approx)/np.linalg.norm(x_obj)
    else:
        Err=np.linalg.norm(x_obj-x_approx)
    return Err

In [9]:
def compute_error_with_sign(x_obj,x_approx):
    if np.linalg.norm(x_obj) > np.nextafter(0,1):
        Err=(x_obj-x_approx)/np.linalg.norm(x_obj)
    else:
        Err=x_obj-x_approx
    return Err

In [10]:
def gradient_descent(f, x_0, tol, 
                     tol_backtracking, x_ast=None, p_ast=None, maxiter=30):
    '''
    Gradient descent to numerically approximate solution of min f
    Args:
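        f (lambda expression): definition of function f
        x_0 (array): initial point for the descent method
        tol (float): tolerance for the norm of the gradient, used as stopping criterion
        tol_backtracking (float): lower bound for the step returned by backtracking
        x_ast (array): solution of min f, used only to report the error
        p_ast (float): value of f at x_ast, used only to report the error
        maxiter (int): maximum number of iterations
    Returns:
        (list): [x, iter, Err_plot, x_plot_aux], that is: last iterate, number of
                iterations, nonzero errors |f(x)-p_ast| and iterates stored by column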
    '''
    iter = 0
    x = x_0
    
    feval = f(x)
    #gradient_approximation = lambda x: np.array([2*(x[0]-2),
    #                    -2*(2-x[1]),
    #                    2*x[2],
    #                    4*x[3]**3])
    gfeval = gradient_approximation(f,x)
    #gfeval = gradient_approximation(x)

    normgf = np.linalg.norm(gfeval)
    
    Err_plot_aux = np.zeros(maxiter+1) #one slot for x_0 plus one per iteration
    Err_plot_aux[iter]=math.fabs(feval-p_ast)
    
    Err = compute_error(x_ast,x)
    n = x.size
    x_plot_aux = np.zeros((n,maxiter+1)) #one column for x_0 plus one per iteration
    x_plot_aux[:,0] = x
    
    print('Iter   Normagf   Error_x_ast   Error_p_ast   backtracking_result')
    print('{}    {:0.2e}    {:0.2e}    {:0.2e}    {}'.format(iter,normgf,Err,Err_plot_aux[iter],"---"))
    
    while(normgf>tol and iter < maxiter):
        dir_desc = -gfeval
        der_direct = gfeval.dot(dir_desc)
        t = line_search_by_backtracking(f,dir_desc,x,der_direct)
        x = x + t*dir_desc
        feval = f(x)
        gfeval = gradient_approximation(f,x)
        #gfeval = gradient_approximation(x)
        normgf = np.linalg.norm(gfeval)
        Err = compute_error(x_ast,x)
        iter+=1
        Err_plot_aux[iter] = math.fabs(feval-p_ast)
        x_plot_aux[:,iter] = x #store the iterate without overwriting x_0 in column 0
        print('{}    {:0.2e}    {:0.2e}    {:0.2e}    {:0.2e}'.format(iter,normgf,Err,Err_plot_aux[iter],t))
        if t<tol_backtracking: #backtracking step fell below tol_backtracking, check why...
            iter_salida=iter
            iter = maxiter
        
    print('{} {:0.2e}'.format("Error utilizando valor de x_ast:",Err))
    print('{} {}'.format("Valor aproximado a x_ast:", x))
    cond = Err_plot_aux > np.nextafter(0,1)
    Err_plot = Err_plot_aux[cond]
    aux_diferencia_x_plot_aux = compute_error_with_sign(x_ast,x)
    cond = (np.linalg.norm(aux_diferencia_x_plot_aux,axis=0) > np.nextafter(0,1)) & (np.sum(aux_diferencia_x_plot_aux,axis=0)!=0)
    #if np.sum(cond)!=0:
    #    x_plot[:,1:1+sum(cond)] = x_plot_aux[:,cond]
    if iter == maxiter and t < tol_backtracking:
        print("Backtracking value less than tol_backtracking, check approximation")
        iter=iter_salida
    return [x,iter,Err_plot,x_plot_aux]

In [11]:
f = lambda x: (x[0]-2)**2 + (2-x[1])**2 + x[2]**2 + x[3]**4
x_ast = np.array([2.0,2.0,0.0,0.0])
x_0 = np.array([5.0,5.0,1.0,0.0])
tol=1e-8
tol_backtracking=1e-14
maxiter=50
p_ast=f(x_ast)
[x,iter,Err_plot,x_plot_aux]=gradient_descent(f, x_0, tol, tol_backtracking, x_ast, p_ast, maxiter)

Iter   Normagf   Error_x_ast   Error_p_ast   backtracking_result
0    8.72e+00    1.54e+00    1.90e+01    ---
1    3.53e-07    6.10e-08    2.98e-14    5.00e-01
2    2.14e-15    3.06e-09    7.50e-17    5.00e-01
Error utilizando valor de x_ast: 3.06e-09
Valor aproximado a x_ast: [ 2.e+00  2.e+00 -5.e-09  0.e+00]


In [12]:
gamma_cte=5;
f = lambda x: 1/2*(x[0]**2+gamma_cte*x[1]**2)
x_ast=np.array([0.0,0.0])
x_0 = np.array([.5,.5])
tol=1e-8
tol_backtracking=1e-14
maxiter=50
p_ast=f(x_ast)
[x,iter,Err_plot,x_plot_aux]=gradient_descent(f, x_0, tol, tol_backtracking, x_ast, p_ast, maxiter)

Iter   Normagf   Error_x_ast   Error_p_ast   backtracking_result
0    2.55e+00    7.07e-01    7.50e-01    ---
1    7.29e-01    3.95e-01    1.09e-01    2.50e-01
2    3.22e-01    2.83e-01    4.20e-02    2.50e-01
3    2.73e-01    1.48e-01    1.54e-02    5.00e-01
4    1.21e-01    1.06e-01    5.91e-03    2.50e-01
5    1.02e-01    5.56e-02    2.16e-03    5.00e-01
6    4.52e-02    3.98e-02    8.30e-04    2.50e-01
7    3.84e-02    2.08e-02    3.04e-04    5.00e-01
8    1.70e-02    1.49e-02    1.17e-04    2.50e-01
9    1.44e-02    7.82e-03    4.28e-05    5.00e-01
10    6.36e-03    5.60e-03    1.64e-05    2.50e-01
11    5.41e-03    2.93e-03    6.01e-06    5.00e-01
12    2.39e-03    2.10e-03    2.31e-06    2.50e-01
13    2.03e-03    1.10e-03    8.46e-07    5.00e-01
14    8.95e-04    7.87e-04    3.25e-07    2.50e-01
15    7.60e-04    4.12e-04    1.19e-07    5.00e-01
16    3.36e-04    2.95e-04    4.57e-08    2.50e-01
17    2.85e-04    1.55e-04    1.67e-08    5.00e-01
18    1.26e-04    1.11e-04    6.

**References:**

* S. P. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2009.
