[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Optimization Methods

## Convex Optimization - Constrained Optimization - Least Squares with Equality Constraints

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.001 | 04/12/2024 | Royi Avital | Added the $\frac{1}{2}$ factor to the LS formulation               |
| 1.0.000 | 27/09/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0012LinearFitL1.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Optimization
import autograd.numpy as anp
import autograd.scipy as asp
from autograd import grad
import cvxpy as cp

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
%matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Course Packages


In [None]:
# Auxiliary Functions

from AuxFun import ConvMode
from AuxFun import ProxGradientDescent
from AuxFun import GenConvMtx1D


In [None]:
# Parameters

# Data
numCoeff    = 11
numSamples  = 110
noiseStd    = 0.005
convMode    = ConvMode.VALID


# Solver
μ             = 0.001 #<! Step Size
numIterations = 2500

# Verification
ε      = 1e-6 #<! Error threshold

## Least Squares (Linear Fit) with Equality Constraints

The _Linear Fit_ / _Least Squares_ with Equality Constraints is given by:

$$\begin{align}
\arg \min_{\boldsymbol{x}} \quad & \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{C} \boldsymbol{x} & = \boldsymbol{d} \\
\end{aligned}
\end{align}$$

## Least Squares with Sum Constraints

A _High Pass Filter_ (HPF) is a filter which attenuates low frequencies.  
A simple way to enforce high pass behavior is to force the filter to null the DC component.  
This can be done by forcing the sum of its coefficients to be zero.

This notebook deals with estimating a High Pass Filter (HPF) using Projected Gradient Descent.  
The problem is equivalent to a Least Squares problem with a linear equality constraint.

This notebook shows how to solve the problem:

$$\begin{align}
\arg \min_{\boldsymbol{x}} \quad & \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} \\
\text{subject to} \quad & \begin{aligned} 
\sum {x}_{i} & = 0 \\
\end{aligned}
\end{align}$$

The notebook:

 - Calculates a solution using a DCP Solver (Reference).
 - Calculates a solution using _Projected Gradient Descent_.

* <font color='brown'>(**#**)</font> Constraints are useful in the underdetermined case, where the number of solutions is infinite.
* <font color='red'>(**?**)</font> Formulate the constraint as a dot product (Vector).


## Generate Data


The data model is a stream of samples going through an LTI system.  
In our case the LTI system is a High Pass Filter (HPF):

$$ \boldsymbol{y} = \operatorname{conv} \left( \boldsymbol{x}, \boldsymbol{h} \right) + \boldsymbol{n} $$

Where:
 - $\boldsymbol{x}$ - The data samples.
 - $\boldsymbol{h}$ - The filter coefficients.
 - $\boldsymbol{n}$ - The white noise samples (AWGN).

Since the model is linear, it can be written in a matrix form:

$$ \boldsymbol{y} = \boldsymbol{X} \boldsymbol{h} + \boldsymbol{n} $$

Where $\boldsymbol{X}$ is the convolution matrix form of the samples.

* <font color='brown'>(**#**)</font> Since the model is a convolution (LTI) system, the matrix $\boldsymbol{X}$ is a [Toeplitz Matrix](https://en.wikipedia.org/wiki/Toeplitz_matrix).
* <font color='brown'>(**#**)</font> Read on [`numpy.convolve()`](https://numpy.org/doc/stable/reference/generated/numpy.convolve.html) and [`scipy.signal.convolve()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve.html). Pay attention to the `method` option in [`scipy.signal.convolve()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve.html).
* <font color='brown'>(**#**)</font> Unlike MATLAB, [`numpy.convolve()`](https://numpy.org/doc/stable/reference/generated/numpy.convolve.html) is commutative for the `valid` and `same` modes. The reason is that it swaps the signals so the kernel is always the shorter one.
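
The matrix form of a `valid` convolution can be illustrated with a small sketch. The helper `BuildValidConvMtx()` below is hypothetical (it is not the course's `GenConvMtx1D()`), and the names `vS`, `vK`, `mS` are assumptions chosen to avoid clashing with the notebook's variables. It relies on `sliding_window_view()` from NumPy (Version 1.20 and later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def BuildValidConvMtx(vX: np.ndarray, numCoeff: int) -> np.ndarray:
    # Hypothetical helper (not the course's `GenConvMtx1D()`).
    # Row n holds the samples x[n + numCoeff - 1], ..., x[n] so that
    # (mX @ vH)[n] = sum_k vH[k] * vX[n + numCoeff - 1 - k].
    return sliding_window_view(vX, numCoeff)[:, ::-1]

oRng = np.random.default_rng(0)
vS   = oRng.standard_normal(12) #<! Signal samples
vK   = oRng.standard_normal(4)  #<! Filter coefficients
mS   = BuildValidConvMtx(vS, len(vK))

print(mS.shape) #<! (numSamples - numCoeff + 1, numCoeff)
print(np.allclose(mS @ vK, np.convolve(vS, vK, 'valid'))) #<! Matches the `valid` convolution
print(np.allclose(mS[1:, 1:], mS[:-1, :-1])) #<! Constant diagonals (Toeplitz structure)
```

The matrix is a view of reversed sliding windows, hence no samples are copied until it is used in a product.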

In [None]:
# Generate / Load the Data

vX = sp.signal.sawtooth(range(numSamples))
for ii in range(2, 11):
    vX = vX + sp.signal.sawtooth(range(ii, numSamples + ii))

# High Pass Filter (HPF): Zero mean.
vHRef  = np.random.randn(numCoeff)
vHRef -= np.mean(vHRef) #<! Zero Mean -> High Pass Filter (Though the worst one!)

numSamplesY = numSamples - numCoeff + 1 #<! Number of output samples of the `valid` convolution

vN = noiseStd * np.random.randn(numSamplesY)
vY = np.convolve(vX, vHRef, 'valid') + vN



In [None]:
# Data for Analysis

mH = np.zeros(shape = (numCoeff, numIterations)) #<! Initialization is the zero vector
vObjVal = np.zeros(numIterations)

In [None]:
# Display the Data

hF, hA = plt.subplots(figsize = (12, 6))
hA.plot(range(numSamplesY), vY, lw = 2, label = 'Samples')
hA.set_xlabel('Sample Index')
hA.set_ylabel('Sample Value')
hA.set_title('Measured Data')

hA.legend();

## Convolution Matrix

This section transforms the data $\boldsymbol{x}$ (`vX`) into a convolution matrix $\boldsymbol{X}$ (`mX`) such that $\boldsymbol{y} \approx \boldsymbol{X} \boldsymbol{h}$ (`vY ≈ mX @ vHRef`).

* <font color='brown'>(**#**)</font> The problem can be solved without generating the explicit matrices using operators.
* <font color='red'>(**?**)</font> If $\boldsymbol{X}$ stands for the convolution operator, what is the meaning of the adjoint $\boldsymbol{X}^{T}$?


In [None]:
# Convolution Matrix

mX = GenConvMtx1D(vX, numCoeff, convMode = convMode)

assertCond = np.linalg.norm(mX @ vHRef - np.convolve(vX, vHRef, 'valid'), np.inf) <= ε
assert assertCond, f'The matrix convolution deviation exceeds the threshold {ε}'
print('The matrix convolution implementation is verified')

## Objective Function

The objective function:

$$\begin{align}
\arg \min_{\boldsymbol{h}} \quad & \frac{1}{2} {\left\| \boldsymbol{h} \ast \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} \\
\text{subject to} \quad & \begin{aligned} 
\sum {h}_{i} & = 0 \\
\end{aligned}
\end{align}$$


* <font color='red'>(**?**)</font> Write the problem in a matrix form.


In [None]:
# Objective Function

hObjFun = lambda vH: 0.5 * np.sum(np.square(np.convolve(vX, vH, 'valid') - vY))

### Gradient Function

The gradient of the objective can be derived from the matrix form of the problem. 

* <font color='red'>(**?**)</font> Derive the gradient of the objective function.
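
A candidate gradient can be validated numerically before it is used in a solver. The sketch below compares the gradient of the matrix form, $\boldsymbol{X}^{T} \left( \boldsymbol{X} \boldsymbol{h} - \boldsymbol{y} \right)$, with central finite differences. The matrix `mA` and the vector `vB` are random stand-ins (assumptions) for the convolution matrix and the measurements.

```python
import numpy as np

oRng = np.random.default_rng(0)
mA   = oRng.standard_normal((30, 6)) #<! Stand-in for the convolution matrix
vB   = oRng.standard_normal(30)      #<! Stand-in for the measurements

hF = lambda vZ: 0.5 * np.sum(np.square(mA @ vZ - vB)) #<! Objective function
hG = lambda vZ: mA.T @ (mA @ vZ - vB)                 #<! Candidate gradient

vZ       = oRng.standard_normal(6) #<! Evaluation point
diffStep = 1e-6
vGNum    = np.zeros(6)
for ii in range(6):
    vE        = np.zeros(6)
    vE[ii]    = diffStep
    vGNum[ii] = (hF(vZ + vE) - hF(vZ - vE)) / (2 * diffStep) #<! Central difference

relErr = np.linalg.norm(hG(vZ) - vGNum) / np.linalg.norm(vGNum)
print(f'Relative error: {relErr:0.2e}')
```

For a quadratic objective the central difference has no truncation error, so the deviation is governed by floating point rounding only.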

In [None]:
# The Gradient Function

hGradFun   = lambda vH: mX.T @ (mX @ vH - vY)
# AutoGrad only supports SciPy's `convolve()`
hAutoGradF = grad(lambda vH: (0.5 * anp.sum(anp.square(asp.signal.convolve(vX, anp.array(vH), mode = 'valid') - anp.array(vY)))))

In [None]:
vT = np.random.randn(numCoeff)

vG = hAutoGradF(vT)
assertCond = np.linalg.norm(hGradFun(vT) - vG, np.inf) <= (ε * np.linalg.norm(vG))
assert assertCond, f'The gradient calculation deviation exceeds the threshold {ε}'

print('The gradient implementation is verified')

### Projection Function

The projection function should project a vector onto the set $\mathcal{H} = \left\{ \boldsymbol{h} \mid \boldsymbol{h}^{T} \boldsymbol{1} = 0 \right\}$.

The projection function is the solution of the following optimization problem:

$$\begin{align}
\arg \min_{\boldsymbol{x}} \quad & \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{x}^{T} \boldsymbol{1} & = 0 \\
\end{aligned}
\end{align}$$

* <font color='red'>(**?**)</font> Derive the projection function.
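
The defining properties of a projection can be checked numerically. The sketch below assumes the mean removal form (the one used by `hProjFun` in this notebook) and tests feasibility, idempotence, and that no sampled feasible point is closer to the input. The names `vZ`, `vP`, `mF` are assumptions for illustration.

```python
import numpy as np

oRng = np.random.default_rng(0)
vZ   = oRng.standard_normal(7) #<! Arbitrary point
vP   = vZ - np.mean(vZ)        #<! Candidate projection onto the zero sum set

# Feasible points for comparison: random vectors with their mean removed
mF = oRng.standard_normal((1000, 7))
mF = mF - np.mean(mF, axis = 1, keepdims = True)

isFeasible   = np.isclose(np.sum(vP), 0.0)       #<! The output sums to zero
isIdempotent = np.allclose(vP - np.mean(vP), vP) #<! Projecting twice changes nothing
vDist        = np.linalg.norm(mF - vZ, axis = 1) #<! Distances of the feasible samples
isClosest    = np.all(vDist >= np.linalg.norm(vP - vZ))

print(isFeasible, isIdempotent, isClosest)
```

Sampling feasible points is a sanity check and not a proof. The CVXPY based verification in this section is the stronger test.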



In [None]:
# The Projection Function

hProjFun = lambda vY, λ: vY - np.mean(vY)

In [None]:
# Projection Function
# This section verifies the projection function using CVXPY

# Model Data
vXX = cp.Variable(numCoeff)
vYY = np.random.randn(numCoeff)

# Model Problem
cpObjFun = cp.Minimize(0.5 * cp.sum_squares(vXX - vYY)) #<! Objective Function
cpConst = [cp.sum(vXX) == 0] #<! Constraints
oCvxPrb = cp.Problem(cpObjFun, cpConst) #<! Problem

oCvxPrb.solve(solver = cp.CLARABEL)

assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
print('Problem is solved.')

assertCond = np.linalg.norm(vXX.value - hProjFun(vYY, 0.0), np.inf) <= (ε * np.linalg.norm(vXX.value))
assert assertCond, f'The projection calculation deviation exceeds the threshold {ε}'

print('The projection implementation is verified')

## Projected Gradient Descent

The _Projected Gradient Descent_ is a generalization of the _Gradient Descent_ method for the case:

$$\begin{align}
\arg \min_{\boldsymbol{x}} \quad & f \left( \boldsymbol{x} \right) \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{x} & \in \mathcal{C} \\
\end{aligned}
\end{align}$$

Where $f \left( \cdot \right)$ is a _smooth convex_ function and the projection onto $\mathcal{C}$ can be calculated efficiently.
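
Each iteration applies a gradient step followed by a projection: $\boldsymbol{x}^{k + 1} = \operatorname{Proj}_{\mathcal{C}} \left( \boldsymbol{x}^{k} - \mu \nabla f \left( \boldsymbol{x}^{k} \right) \right)$. A minimal sketch of this iteration for the sum constrained Least Squares problem follows. The function `ProjectedGradientDescent()` is a hypothetical helper (it is not the course's `ProxGradientDescent` class), the data is synthetic, and the step size $\frac{1}{L}$, with $L$ the largest eigenvalue of $\boldsymbol{A}^{T} \boldsymbol{A}$, is an assumed choice.

```python
import numpy as np

def ProjectedGradientDescent(vX0, hGrad, hProj, stepSize, numIter):
    # Hypothetical helper (not the course's `ProxGradientDescent`)
    vX = hProj(vX0) #<! Start from a feasible point
    for _ in range(numIter):
        vX = hProj(vX - stepSize * hGrad(vX)) #<! Gradient step, then projection
    return vX

# Synthetic data (assumption, for illustration only)
oRng = np.random.default_rng(0)
mA   = oRng.standard_normal((40, 5))
vB   = oRng.standard_normal(40)

hGrad    = lambda vX: mA.T @ (mA @ vX - vB) #<! Gradient of 0.5 * ||A x - b||^2
hProj    = lambda vX: vX - np.mean(vX)      #<! Projection onto {x | sum(x) = 0}
stepSize = 1.0 / (np.linalg.norm(mA, 2) ** 2) #<! 1 / L, L is the largest eigenvalue of A^T A

vXPgd = ProjectedGradientDescent(np.zeros(5), hGrad, hProj, stepSize, 2000)

print(f'Sum of coefficients: {np.sum(vXPgd):0.2e}')
print(f'Norm of the projected gradient: {np.linalg.norm(hProj(hGrad(vXPgd))):0.2e}')
```

At a constrained minimizer the gradient has no component inside the constraint set, so the norm of the projected gradient serves as a stopping criterion.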

In [None]:
# Run Projected Gradient Descent

vH = np.zeros(numCoeff)
oProjGrad = ProxGradientDescent(vH, hGradFun, μ, 0.0, hProxFun = hProjFun, useAccel = False)
lH = oProjGrad.ApplyIterations(numIterations)


In [None]:
# Solution Analysis

objValRef   = hObjFun(vHRef)
vObjVal     = np.empty(numIterations)
vArgErr     = np.empty(numIterations)

for ii in range(numIterations):
    vObjVal[ii] = hObjFun(lH[ii])
    vArgErr[ii] = np.linalg.norm(lH[ii] - vHRef)

vObjVal = 20 * np.log10(np.abs(vObjVal - objValRef) / max(np.abs(objValRef), np.sqrt(np.spacing(1.0))))
vArgErr = 20 * np.log10(np.abs(vArgErr) / max(np.linalg.norm(vHRef), np.sqrt(np.spacing(1.0))))

In [None]:
# Display Results

hF, hA = plt.subplots(figsize = (12, 6))
hA.plot(range(numIterations), vObjVal, lw = 2, label = 'Objective Function')
hA.plot(range(numIterations), vArgErr, lw = 2, label = 'Argument Error')
hA.set_xlabel('Iteration Index')
hA.set_ylabel('Relative Error [dB]')
hA.set_title('Projected Gradient Convergence')

hA.legend();

* <font color='red'>(**?**)</font> Why does the graph dip and then rise? Think about the reference.
* <font color='red'>(**?**)</font> Derive the closed form solution of the Least Squares with Equality Constraints.  
  Write the Normal Equations of the Lagrangian and the Constraints as a combined linear system.