[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io/)

# Optimization Methods

## Convex Optimization - Non Smooth Optimization - Sub Gradient Method

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 29/09/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0012LinearFitL1.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Optimization
import cvxpy as cp

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
from matplotlib.colors import LogNorm, Normalize, PowerNorm
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [2]:
# Configuration
%matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [3]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [4]:
# Course Packages


In [5]:
# Auxiliary Functions


In [6]:
# Parameters

# Data
numRows = 30 #<! Number of functions
numCols = 5  #<! Data Dimensions

# Solver
numIterations = 1_000_000

# Visualization
lLim = [-2, 2]

# Verification
ε = 1e-3 #<! Error threshold

## Minimization of the Maximum of a Set of Functions

The problem is given by:

$$ \arg \min_{\boldsymbol{x}} \max_{i} \; \boldsymbol{a}_{i}^{T} \boldsymbol{x} + {b}_{i} $$

Where the set of affine functions is defined by the matrix $\boldsymbol{A}$, whose $i$ -th row is $\boldsymbol{a}_{i}^{T}$, and the vector $\boldsymbol{b}$, whose $i$ -th element is ${b}_{i}$.

* <font color='brown'>(**#**)</font> The maximum function is non smooth.
* <font color='brown'>(**#**)</font> The function is _piece wise_ linear.



## Generate Data


The data will be generated in a vectorized manner.

In [7]:
# Generate / Load the Data

mA = np.random.randn(numRows, numCols)
vB = np.random.randn(numRows)


## Analysis

This section defines the problem and solves it using the _Sub Gradient Method_.

### Objective Function

The objective function in its vectorized form can be written as:

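$$ \arg \min_{\boldsymbol{x}} \max \left( \boldsymbol{A} \boldsymbol{x} + \boldsymbol{b} \right) $$

Where the maximum is taken over the elements of the vector $\boldsymbol{A} \boldsymbol{x} + \boldsymbol{b}$.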

In [8]:
# Objective Function

#===========================Fill This===========================#
# 1. Implement the objective function. 
#    Given a vector of `vX` it returns the objective.
# 2. The implementation should be using a Lambda Function.
# !! Pay attention to the difference between `np.max()` and `np.maximum()`.

hObjFun = lambda vX: np.max(mA @ vX + vB)
#===============================================================#
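
The remark about `np.max()` and `np.maximum()` can be illustrated with a minimal sketch. The toy values of `mA`, `vB` and `vX` below are hypothetical and are not the notebook's data: `np.max()` reduces an array to its largest element, while `np.maximum()` compares two arrays (or an array and a scalar) element wise.

```python
import numpy as np

# Hypothetical toy data: 3 affine functions of a 2D argument
mA = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
vB = np.array([0.0, 0.5, 1.0])
vX = np.array([1.0, 2.0])

vF = mA @ vX + vB #<! Values of the 3 affine functions at `vX`

objVal = np.max(vF)          #<! Reduction: The largest element (Scalar)
vClip  = np.maximum(vF, 0.0) #<! Element wise maximum against 0 (Vector)

print(float(objVal)) #<! 2.5
print(vClip.shape)   #<! (3,)
```

Only the reduction yields a scalar objective, which is what the minimization requires.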

### Sub Gradient Function

The [_Sub Gradient_](https://en.wikipedia.org/wiki/Subderivative) is a generalization of the _Gradient_.  
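For a _Convex Function_ it plays the role of the _Gradient_ in the stationarity condition: $\boldsymbol{x}$ is a global minimizer if and only if $\boldsymbol{0} \in \partial f \left( \boldsymbol{x} \right)$.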


#### Question 001

Derive the sub gradient of the objective function:

$$ f \left( \boldsymbol{x} \right) = \max \left( \boldsymbol{A} \boldsymbol{x} + \boldsymbol{b} \right) $$

#### Solution 001

For intuition, assume $f \left( x \right) = \max \left\{ {f}_{1} \left( x \right), {f}_{2} \left( x \right) \right\}$ where ${f}_{1}$ and ${f}_{2}$ are convex and differentiable, then:

 - Where ${f}_{1} \left( x \right) > {f}_{2} \left( x \right)$ then $\partial f \left( x \right) = \left\{ \frac{d}{dx} {f}_{1} \left( x \right) \right\}$.
 - Where ${f}_{2} \left( x \right) > {f}_{1} \left( x \right)$ then $\partial f \left( x \right) = \left\{ \frac{d}{dx} {f}_{2} \left( x \right) \right\}$.
 - Where ${f}_{1} \left( x \right) = {f}_{2} \left( x \right)$ then $\partial f \left( x \right)$ is the closed interval between $\frac{d}{dx} {f}_{1} \left( x \right)$ and $\frac{d}{dx} {f}_{2} \left( x \right)$.

In general, the sub differential is the _convex hull_ (The set of all _convex combinations_) of the gradients of the functions which achieve the maximum. Hence any sub gradient $\boldsymbol{g} \in \partial f \left( \boldsymbol{x} \right)$ of $f \left( \boldsymbol{x} \right) = \max \left( \boldsymbol{A} \boldsymbol{x} + \boldsymbol{b} \right)$ can be written as:

$$ \boldsymbol{g} = \sum_{i \in \mathcal{I}} {\lambda}_{i} \boldsymbol{A} \left[ i, : \right]^{T} $$

Where:

 - ${\lambda}_{i} \geq 0, \; \sum_{i \in \mathcal{I}} {\lambda}_{i} = 1$ - Convex Combination.
 - $\mathcal{I} = \left\{ i \mid \boldsymbol{A} \left[ i, : \right] \boldsymbol{x} + \boldsymbol{b} \left[ i \right] = f \left( \boldsymbol{x} \right) \right\}$ - Set of functions which achieve the maximum.


 * <font color='brown'>(**#**)</font> Simple valid choices of the weights are the uniform mean over $\mathcal{I}$ or a single index from $\mathcal{I}$.
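
The derivation can be sanity checked numerically with a minimal sketch. It takes the "single choice" option (one row which achieves the maximum) and tests the defining inequality of a sub gradient, $f \left( \boldsymbol{y} \right) \geq f \left( \boldsymbol{x} \right) + \boldsymbol{g}^{T} \left( \boldsymbol{y} - \boldsymbol{x} \right)$, on random points. The data, the seed and the function names below are hypothetical and are not the notebook's variables.

```python
import numpy as np

oRng = np.random.default_rng(0)      #<! Hypothetical seed
mA   = oRng.standard_normal((6, 3))  #<! Hypothetical toy data
vB   = oRng.standard_normal(6)

def ObjFun( vX: np.ndarray ) -> float:
    return np.max(mA @ vX + vB)

def SubGradSingle( vX: np.ndarray ) -> np.ndarray:
    # Single choice: The row of one function which achieves the maximum
    return mA[np.argmax(mA @ vX + vB)]

vX = oRng.standard_normal(3)
vG = SubGradSingle(vX)

# Sub gradient inequality: f(y) >= f(x) + g^T (y - x) for any y
minGap = np.inf
for _ in range(1000):
    vY     = vX + oRng.standard_normal(3)
    gapVal = ObjFun(vY) - (ObjFun(vX) + vG @ (vY - vX))
    minGap = min(minGap, gapVal)

print(minGap >= -1e-12) #<! The gap is non negative up to round off
```

The check passes for any active row since $f \left( \boldsymbol{y} \right) \geq \boldsymbol{a}_{i}^{T} \boldsymbol{y} + {b}_{i} = f \left( \boldsymbol{x} \right) + \boldsymbol{a}_{i}^{T} \left( \boldsymbol{y} - \boldsymbol{x} \right)$ whenever $i \in \mathcal{I}$.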

---

In [9]:
# Sub Gradient Function

#===========================Fill This===========================#
# 1. Implement the sub gradient function. 
#    Given a vector of `vX` it returns a sub gradient at `vX`.
# 2. The implementation should be using a Lambda Function.
# !! Pay attention to the output dimensions.

hSubGradFun = lambda vX: np.mean(mA[(mA @ vX + vB) == np.max(mA @ vX + vB), :], axis = 0)
#===============================================================#

### Sub Gradient Method

This section implements the _Sub Gradient Method_: $\boldsymbol{x}_{k + 1} = \boldsymbol{x}_{k} - {\mu}_{k} \boldsymbol{g}_{k}$ where $\boldsymbol{g}_{k} \in \partial f \left( \boldsymbol{x}_{k} \right)$ is any sub gradient at $\boldsymbol{x}_{k}$.

 * <font color='brown'>(**#**)</font> The _Sub Gradient Method_ is not a _descent_ method.
 * <font color='brown'>(**#**)</font> The convergence of the best objective value found so far to the optimal value is proven for the cases:
   - Square summable but not summable: $\sum_{k = 1}^{\infty} {\mu}_{k}^{2} < \infty, \sum_{k = 1}^{\infty} {\mu}_{k} = \infty$, such as ${\mu}_{k} = \frac{1}{k}$.
   - Diminishing and not summable: $\lim_{k \to \infty} {\mu}_{k} = 0, \sum_{k = 1}^{\infty} {\mu}_{k} = \infty$, such as ${\mu}_{k} = \frac{1}{\sqrt{k}}$.
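
Both remarks can be seen on a small example. The sketch below is an illustration only, on a hypothetical 1D problem $f \left( x \right) = \left| x - 1 \right|$ written as the maximum of two affine functions. The starting point, the number of iterations and the dictionary names are assumptions. It runs both step size policies, counts the iterations where the objective increased and tracks the best value found.

```python
import math
import numpy as np

# Hypothetical 1D toy problem: f(x) = max(x - 1, 1 - x) = |x - 1|
mA = np.array([[1.0], [-1.0]])
vB = np.array([-1.0, 1.0])

hObjFun     = lambda vX: np.max(mA @ vX + vB)
hSubGradFun = lambda vX: mA[np.argmax(mA @ vX + vB)] #<! Single active row

dStepSize = {'1 / k': lambda kk: 1 / kk, '1 / sqrt(k)': lambda kk: 1 / math.sqrt(kk)}
dBestVal  = {} #<! Best objective value per policy
dNumInc   = {} #<! Number of iterations where the objective increased

for stepName, hStep in dStepSize.items():
    vX      = np.array([5.0]) #<! Hypothetical starting point
    vObjVal = np.empty(2000)
    for kk in range(1, 2001):
        vX = vX - hStep(kk) * hSubGradFun(vX)
        vObjVal[kk - 1] = hObjFun(vX)
    dBestVal[stepName] = float(np.min(vObjVal))
    dNumInc[stepName]  = int(np.sum(np.diff(vObjVal) > 0))
    print(f'{stepName}: Best value = {dBestVal[stepName]:0.5f}, Increases = {dNumInc[stepName]}')
```

Since the objective is not monotonic along the iterations, it is common to keep the best iterate rather than the last one.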

In [10]:
# Sub Gradient Method

vX = np.zeros(numCols)
vG = np.empty(numCols)

lX = [np.copy(vX)]

#===========================Fill This===========================#
# 1. Implement the sub gradient method. 
# !! You may choose the step size policy.

for ii in range(1, numIterations):
    vG  = hSubGradFun(vX) #<! The sub gradient
    μ   = 1 / math.sqrt(ii) #<! The step size
    vX -= μ * vG #<! Optimization step

    lX.append(np.copy(vX))
#===============================================================#


* <font color='red'>(**?**)</font> What's the motivation for using `np.copy()`?
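
As a hint, the following minimal sketch (with hypothetical names, not the notebook's variables) compares storing the array itself with storing a copy when the array is updated in place with `-=`.

```python
import numpy as np

vX = np.zeros(2)

lAlias = [vX]          #<! Stores a reference to the same array
lCopy  = [np.copy(vX)] #<! Stores a snapshot of the current values

for ii in range(3):
    vX -= 1.0 #<! In place update, mutates the array referenced in `lAlias`
    lAlias.append(vX)
    lCopy.append(np.copy(vX))

print([float(vV[0]) for vV in lAlias]) #<! [-3.0, -3.0, -3.0, -3.0]
print([float(vV[0]) for vV in lCopy])  #<! [0.0, -1.0, -2.0, -3.0]
```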

### The DCP Solution

This section solves the problem using a DCP (Disciplined Convex Programming) solver in order to obtain a reference solution.
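
For intuition on why the problem is easy for a convex solver, one standard equivalent reformulation is the epigraph form, which is a Linear Program: minimize $t$ subject to $\boldsymbol{A} \boldsymbol{x} + \boldsymbol{b} \leq t \boldsymbol{1}$. The sketch below solves it with `scipy.optimize.linprog` on a hypothetical 1D toy problem. It is an illustration under these assumptions, not a description of what CVXPY does internally.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy problem: f(x) = max(x, 2 - x), minimized at x = 1 with f = 1
mA = np.array([[1.0], [-1.0]])
vB = np.array([0.0, 2.0])

numRows, numCols = mA.shape

# Decision vector: z = [x; t], the objective is t
vC   = np.r_[np.zeros(numCols), 1.0]
mAub = np.c_[mA, -np.ones(numRows)] #<! A x - t 1 <= -b
vBub = -vB

# All variables are free (The default bounds of `linprog()` are non negative)
oRes = linprog(vC, A_ub = mAub, b_ub = vBub, bounds = [(None, None)] * (numCols + 1))

vXOpt = oRes.x[:numCols]
print(oRes.status) #<! 0 means the solver terminated successfully
print(vXOpt, oRes.fun)
```

The optimal $t$ equals the optimal value of the original objective. The problem is bounded only if no direction decreases all the affine functions simultaneously.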

In [None]:
# DCP Solution

#===========================Fill This===========================#
# 1. Formulate the problem in CVXPY.  
#    Use `vXRef` for the optimal argument.
# !! You may find `cp.max()` useful.

# Model Data
vXRef = cp.Variable(numCols)

# Model Problem
cpObjFun = cp.Minimize(cp.max(mA @ vXRef + vB)) #<! Objective Function
cpConst  = [] #<! Constraints
oCvxPrb  = cp.Problem(cpObjFun, cpConst) #<! Problem

oCvxPrb.solve(solver = cp.CLARABEL)
#===============================================================#

vXRef = vXRef.value

assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
print('Problem is solved.')

assertCond = abs(hObjFun(vXRef) - hObjFun(lX[-1])) <= (ε * max(abs(hObjFun(vXRef)), ε))
assert assertCond, f'The optimization calculation deviation exceeds the threshold {ε}'

print('The implementation is verified')

 * <font color='brown'>(**#**)</font> The _Sub Gradient Method_ is a slow method. Hence the threshold `ε` is rather loose and the number of iterations is rather high.

In [12]:
# Solution Analysis

objValRef   = hObjFun(vXRef)
vObjVal     = np.empty(numIterations)
vArgErr     = np.empty(numIterations)

for ii in range(numIterations):
    vObjVal[ii] = hObjFun(lX[ii])
    vArgErr[ii] = np.linalg.norm(lX[ii] - vXRef)

vObjVal = 20 * np.log10(np.abs(vObjVal - objValRef) / max(np.abs(objValRef), np.sqrt(np.spacing(1.0))))
vArgErr = 20 * np.log10(np.abs(vArgErr) / max(np.linalg.norm(vXRef), np.sqrt(np.spacing(1.0))))

In [None]:
# Display Results

hF, hA = plt.subplots(figsize = (12, 6))
hA.plot(range(numIterations), vObjVal, lw = 2, label = 'Objective Function')
hA.plot(range(numIterations), vArgErr, lw = 2, label = 'Argument Error')
hA.set_xlabel('Iteration Index')
hA.set_ylabel('Relative Error [dB]')
hA.set_title('Sub Gradient Method Convergence')

hA.legend();

* <font color='red'>(**?**)</font> Explain the "noisy" graphs.