[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Optimization Methods

## Convex Optimization - Smooth Optimization - Coordinate Descent

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 28/09/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0008ObjectiveFunction.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
from matplotlib.colors import LogNorm, Normalize, PowerNorm
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
%matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #<! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Course Packages

from AuxFun import StepSizeMode
from AuxFun import ProxGradientDescent

from NumericDiff import DiffMode
from NumericDiff import CalcFunGrad


In [None]:
# Auxiliary Functions



In [None]:
# Parameters

# Data
numRows = 20
numCols = numRows

# Numerical Differentiation
diffMode    = DiffMode.CENTRAL
ε           = 1e-6

# Solver
stepSizeMode    = StepSizeMode.ADAPTIVE
μ               = 0.0005
numIterations   = 10000

## Coordinate Descent

The idea of _Coordinate Descent_ (CD) is to decompose a multivariate problem into a sequence of 1D problems: each step updates a single coordinate while all the others are held fixed.

* <font color='brown'>(**#**)</font> CD with greedy coordinate selection (the coordinate with the largest absolute partial derivative) can be viewed as _Steepest Descent_ with respect to the ${L}_{1}$ norm.
* <font color='brown'>(**#**)</font> For some problems (e.g. LASSO) the CD approach is among the most efficient methods.

### Least Squares by Coordinate Descent

The problem is given by:

$$ \arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} $$
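
As a worked step (a standard derivation, given as a sketch and not taken from the notebook's code), the 1D problem along coordinate $j$ has a closed form solution. The symbols $\boldsymbol{a}_{j}$ (the $j$-th column of $\boldsymbol{A}$, assumed non zero), $\boldsymbol{e}_{j}$ (the $j$-th standard basis vector), $\boldsymbol{r} = \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b}$ (the residual) and $t$ (the step along the coordinate) are introduced here for illustration only:

$$ {\phi}_{j} \left( t \right) = \frac{1}{2} {\left\| \boldsymbol{A} \left( \boldsymbol{x} + t \boldsymbol{e}_{j} \right) - \boldsymbol{b} \right\|}_{2}^{2} = \frac{1}{2} {\left\| \boldsymbol{r} \right\|}_{2}^{2} + t \boldsymbol{a}_{j}^{T} \boldsymbol{r} + \frac{ {t}^{2} }{2} {\left\| \boldsymbol{a}_{j} \right\|}_{2}^{2} $$

$$ {\phi}_{j}^{'} \left( t \right) = 0 \implies {t}^{\star} = - \frac{ \boldsymbol{a}_{j}^{T} \boldsymbol{r} }{ {\left\| \boldsymbol{a}_{j} \right\|}_{2}^{2} } $$

Since $\boldsymbol{a}_{j}^{T} \boldsymbol{r}$ is the partial derivative of the objective with respect to ${x}_{j}$, the exact 1D minimization is a gradient step along coordinate $j$ with the step size $1 / {\left\| \boldsymbol{a}_{j} \right\|}_{2}^{2}$.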



## Generate Data


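The model matrix $\boldsymbol{A}$ is built as a _Symmetric Positive Definite_ (SPD) matrix. Hence it is square and invertible, so the Least Squares problem has a unique solution.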

In [None]:
# Generate / Load the Data

# Symmetric PD Matrix
mA = np.random.randn(numRows, numCols)
mA = mA.T @ mA + (0.95 * np.eye(numRows))
mA = mA + mA.T

vB = np.random.randn(numRows)

# Objective Function
hObjFun = lambda vX: 0.5 * np.sum(np.square(mA @ vX - vB))
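
The SPD claim can be checked numerically. The following is a minimal sketch which rebuilds a matrix with the same recipe using its own local random generator (an assumption, so the values differ from the notebook's `mA`). It tests symmetry and the sign of the smallest eigenvalue. The names `mG`, `isSymmetric`, `vEigVal` and `isPosDef` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(512) #<! Local generator, independent of the notebook's global seed

numRows = 20
mG = rng.standard_normal((numRows, numRows))
mA = mG.T @ mG + (0.95 * np.eye(numRows)) #<! Gram matrix + shift -> SPD
mA = mA + mA.T #<! Already symmetric, so this step only doubles the matrix

isSymmetric = np.allclose(mA, mA.T)
vEigVal     = np.linalg.eigvalsh(mA) #<! Eigenvalues of a symmetric matrix (Ascending order)
isPosDef    = vEigVal[0] > 0

print(f'Symmetric: {isSymmetric}, Positive Definite: {isPosDef}')
print(f'Min Eigenvalue: {vEigVal[0]:0.3f}, Condition Number: {vEigVal[-1] / vEigVal[0]:0.1f}')
```

By construction the smallest eigenvalue is at least `2 * 0.95`. The condition number of $\boldsymbol{A}^{T} \boldsymbol{A}$, which governs the convergence speed of the descent, is the square of the printed value.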


## Analysis

### Gradient Function

The gradient of the objective function is given by:

$$ f \left( \boldsymbol{x} \right) = \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} \implies \nabla f \left( \boldsymbol{x} \right) = \boldsymbol{A}^{T} \left( \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right) $$

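The coordinate gradient is the component of the gradient along a specific axis, namely the partial derivative with respect to a single coordinate: $\frac{\partial f}{\partial {x}_{j}} = \boldsymbol{a}_{j}^{T} \left( \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right)$, where $\boldsymbol{a}_{j}$ is the $j$-th column of $\boldsymbol{A}$.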

In [None]:
# The Gradient Function

#===========================Fill This===========================#
# 1. Implement the coordinate gradient as a Lambda function of `vX` and the index `jj`.
# !! You may pre calculate values for efficient calculation.

mAA = mA.T @ mA
vAb = mA.T @ vB

hGradFun = lambda vX, jj: np.dot(mAA[jj, :], vX) - vAb[jj]

#===============================================================#

In [None]:
# Verify the Gradient Function

vX = np.random.randn(numCols)
vG = CalcFunGrad(vX, hObjFun, diffMode = diffMode, ε = ε)

ii  = np.random.randint(numCols)
vEi = np.zeros(numCols)
vEi[ii] = 1


assertCond = np.abs(np.dot(vG, vEi) - hGradFun(vX, ii)) <= (ε * np.abs(vG[ii])) #<! Can we use the implicit index instead of dot?
assert assertCond, f'The gradient calculation deviation exceeds the threshold {ε}'

print('The gradient function implementation is verified')
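
The verification relies on `CalcFunGrad()` from the course package, whose implementation is not shown here. As a rough sketch of what a central difference gradient check might look like, the hypothetical helper `CentralDiffGrad()` below is an assumption and not the course implementation. The example uses a small random problem of its own and also shows that the dot product with the unit vector matches plain indexing.

```python
import numpy as np

def CentralDiffGrad( hF, vX, ε = 1e-6 ):
    # Hypothetical helper: Central difference approximation of the gradient
    vG = np.zeros_like(vX)
    for ii in range(vX.size):
        vE     = np.zeros_like(vX)
        vE[ii] = ε #<! Perturbation along the ii -th axis
        vG[ii] = (hF(vX + vE) - hF(vX - vE)) / (2 * ε)
    return vG

rng = np.random.default_rng(0)
mA  = rng.standard_normal((6, 4))
vB  = rng.standard_normal(6)
vX  = rng.standard_normal(4)

hObjFun = lambda vX: 0.5 * np.sum(np.square(mA @ vX - vB))

vGNum = CentralDiffGrad(hObjFun, vX) #<! Numerical gradient
vGRef = mA.T @ (mA @ vX - vB)        #<! Analytic gradient

ii      = 2
vEi     = np.zeros(4)
vEi[ii] = 1

maxAbsErr = np.max(np.abs(vGNum - vGRef))
print(f'Max absolute deviation: {maxAbsErr:0.3e}')
print(f'Dot product matches indexing: {np.isclose(np.dot(vGNum, vEi), vGNum[ii])}')
```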

### Coordinate Descent

Implement the coordinate descent algorithm.

* <font color='brown'>(**#**)</font> The CD algorithm makes sense when the directional derivative along a single coordinate can be calculated much more cheaply than the full gradient.

In [None]:
# Coordinate Descent

#===========================Fill This===========================#
# 1. Implement the coordinate gradient descent function.
# !! You may pre calculate values for efficient calculation.

def CoordinateDescent( mX: np.ndarray, hGradFun: Callable, μ: float ) -> np.ndarray:
    """
    Input:
      - mX                -   2D Matrix.
                              The first column is the initialization.
                              Structure: Matrix (dataDim * numIterations).
                              Type: 'Single' / 'Double'.
                              Range: (-inf, inf).
      - hGradFun          -   The Gradient Function.
                              A function to calculate the gradient.
                              Its input is `vX`, `jj` for the location 
                              of the gradient and the component index.
                              Structure: NA.
                              Type: Callable.
                              Range: NA.
      - μ                 -   The Step Size.
                              The descent step size.
                              Structure: Scalar.
                              Type: 'Single' / 'Double'.
                              Range: (0, inf).
    Output:
      - mX                -   2D Matrix.
                              All iterations results.
                              Structure: Matrix (dataDim * numIterations).
                              Type: 'Single' / 'Double'.
                              Range: (-inf, inf).
    """

    numComp         = np.size(mX, 0)
    numIterations   = np.size(mX, 1)

    for ii in range(1, numIterations):
        mX[:, ii] = mX[:, ii - 1]
        for jj in range(numComp):
            valG = hGradFun(mX[:, ii], jj) #<! Directional Derivative
            mX[jj, ii] -= μ * valG
    
    return mX

#===============================================================#
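
A possible variant, sketched under assumptions and not part of the exercise: for the Least Squares objective the step along coordinate `jj` can be set to the reciprocal of the squared norm of the `jj` -th column of `mA`, which minimizes the objective exactly along that coordinate. Keeping the residual up to date makes each coordinate update cost a single column operation. The function name `CoordinateDescentExact()` and the small random test problem are illustrative.

```python
import numpy as np

def CoordinateDescentExact( mA, vB, numSweeps ):
    # Illustrative sketch: Exact minimization along each coordinate (Cyclic order)
    numComp  = mA.shape[1]
    vX       = np.zeros(numComp)
    vR       = mA @ vX - vB                     #<! Residual, kept up to date
    vColNorm = np.sum(np.square(mA), axis = 0)  #<! Squared norm of each column

    for _ in range(numSweeps):
        for jj in range(numComp):
            valG    = np.dot(mA[:, jj], vR)     #<! Partial derivative along coordinate jj
            valStep = valG / vColNorm[jj]       #<! Exact 1D minimizer
            vX[jj] -= valStep
            vR     -= valStep * mA[:, jj]       #<! Residual update using a single column
    return vX

rng = np.random.default_rng(1)
mA  = rng.standard_normal((30, 5)) #<! Small, well conditioned problem (Assumption)
vB  = rng.standard_normal(30)

vX        = CoordinateDescentExact(mA, vB, numSweeps = 200)
vXRef, *_ = np.linalg.lstsq(mA, vB, rcond = None)

print(f'Distance from the reference solution: {np.linalg.norm(vX - vXRef):0.3e}')
```

This variant removes the step size parameter, at the cost of being specific to the quadratic objective.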

In [None]:
# Solve by Coordinate Descent

# Define Data
mX      = np.zeros(shape = (numCols, numIterations))
vObjVal = np.empty(numIterations)
vArgErr = np.empty(numIterations)

# Optimization
mX = CoordinateDescent(mX, hGradFun, μ)

In [None]:
# Validation of Solution

# Reference Solution
vXRef, *_  = np.linalg.lstsq(mA, vB, rcond = None) #<! Equivalent to MATLAB's `\` (`mldivide()`)
objValRef  = hObjFun(vXRef)

for ii in range(numIterations):
    vObjVal[ii] = hObjFun(mX[:, ii])
    vArgErr[ii] = np.linalg.norm(mX[:, ii] - vXRef)

vObjVal = 20 * np.log10(np.abs(vObjVal - objValRef) / max(np.abs(objValRef), np.sqrt(np.spacing(1.0))))
vArgErr = 20 * np.log10(np.abs(vArgErr) / max(np.linalg.norm(vXRef), np.sqrt(np.spacing(1.0))))

In [None]:
# Display Results

hF, hA = plt.subplots(figsize = (12, 6))
hA.plot(range(numIterations), vObjVal, lw = 2, label = 'Objective Function')
hA.plot(range(numIterations), vArgErr, lw = 2, label = 'Argument Error')
hA.set_xlabel('Iteration Index')
hA.set_ylabel('Relative Error [dB]')
hA.set_title('Coordinate Descent Convergence')

hA.legend();

* <font color='red'>(**?**)</font> Check the sensitivity to the step size by trying larger / smaller step sizes.
* <font color='blue'>(**!**)</font> Replace the use of the `CoordinateDescent()` function with the `CoordinateDescent` class (`from AuxFun import CoordinateDescent`).  
Use the adaptive step size mode and check the sensitivity to the step size.
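
One way to reason about the step size, given as a sketch under assumptions: along coordinate `jj` the objective is a 1D quadratic with curvature `mAA[jj, jj]`, so a coordinate step decreases the objective only when the step size is below `2 / mAA[jj, jj]`. The example below uses a small random problem of its own (not the notebook's data) and the illustrative names `RunCoordinateDescent()`, `μSafe` and `μLarge`. It does not use the `CoordinateDescent` class from `AuxFun`.

```python
import numpy as np

def RunCoordinateDescent( mAA, vAb, μ, numSweeps ):
    # Same update rule as the loop above, keeping only the last iterate
    vX = np.zeros(vAb.size)
    for _ in range(numSweeps):
        for jj in range(vX.size):
            vX[jj] -= μ * (np.dot(mAA[jj, :], vX) - vAb[jj])
    return vX

rng = np.random.default_rng(2)
mA  = rng.standard_normal((30, 5))
vB  = rng.standard_normal(30)
mAA = mA.T @ mA
vAb = mA.T @ vB

hObjFun = lambda vX: 0.5 * np.sum(np.square(mA @ vX - vB))

vL     = np.diag(mAA)      #<! Curvature along each coordinate
μSafe  = 1.0 / np.max(vL)  #<! Conservative: No coordinate step overshoots
μLarge = 2.2 / np.min(vL)  #<! Above `2 / vL[jj]` for every coordinate

vXRef, *_ = np.linalg.lstsq(mA, vB, rcond = None)
objValRef = hObjFun(vXRef)

gapInit  = hObjFun(np.zeros(5)) - objValRef
gapSmall = hObjFun(RunCoordinateDescent(mAA, vAb, 0.05 * μSafe, 100)) - objValRef
gapSafe  = hObjFun(RunCoordinateDescent(mAA, vAb, μSafe, 100)) - objValRef
gapLarge = hObjFun(RunCoordinateDescent(mAA, vAb, μLarge, 10)) - objValRef

print(f'Initial gap: {gapInit:0.3e}')
print(f'Small step : {gapSmall:0.3e}')
print(f'Safe step  : {gapSafe:0.3e}')
print(f'Large step : {gapLarge:0.3e}')
```

A step that is too small converges slowly, while a step above the bound of every coordinate makes each update increase the objective.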