[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Optimization Methods

## SVD & Linear Least Squares - Sequential Least Squares

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 12/11/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0012LinearFitL1.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

import numba

# Machine Learning

# Optimization

# Image Processing / Computer Vision

# Miscellaneous
import math
from platform import python_version
import random
import time

# Typing
from typing import Callable, List, Optional, Tuple, Union

# Visualization
import matplotlib.pyplot as plt

# Jupyter
from IPython import get_ipython
from ipywidgets import IntSlider, Layout, interact

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 640 # 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Course Packages


In [None]:
# Auxiliary Functions

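def SequentialLeastSquares(vX: np.ndarray, valB: float, vA: np.ndarray, mR: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    # Single Sequential Least Squares step for the new sample (`vA`, `valB`).
    # !! The arrays `vX` and `mR` are updated in place.

    vRA = mR @ vA #<! R_k a_{k + 1}
    mR -= np.outer(vRA, vRA) / (1.0 + vA.T @ vRA) #<! R_{k + 1} (Assumes `mR` is symmetric)
    vK  = mR @ vA #<! The gain k_{k + 1}
    vX += vK * (valB - vA.T @ vX) #<! Correction by the prediction error

    return vX, mR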


In [None]:
# Parameters

# Data
modelOrder      = 3 #<! Polynomial Order / Degree
numSamples      = 25
numSamplesInit  = 5
σ               = 2 #<! Noise STD


## Sequential Least Squares

Sequential Least Squares integrates new data samples into an existing LS solution in a computationally efficient way, without solving the whole problem again.  
Let $\boldsymbol{x}_{k}$ be the LS solution for $k$ data samples:

$$ \boldsymbol{x}_{k} = \arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| \boldsymbol{A}_{k} \boldsymbol{x} - \boldsymbol{b}_{k} \right\|}_{2}^{2}, \; \boldsymbol{b}_{k} \in \mathbb{R}^{k} $$

Given a new sample, ${b}_{k + 1}$, with the matching model row $\boldsymbol{a}_{k + 1}^{T}$, the updated LS solution is given by:

$$ \boldsymbol{x}_{k + 1} = \boldsymbol{x}_{k} + \boldsymbol{k}_{k + 1} \left( {b}_{k + 1} - \boldsymbol{a}_{k + 1}^{T} \boldsymbol{x}_{k} \right)  $$

Where
 - $\left( {b}_{k + 1} - \boldsymbol{a}_{k + 1}^{T} \boldsymbol{x}_{k} \right)$ - The prediction error.
 - $\boldsymbol{k}_{k + 1} = \boldsymbol{R}_{k + 1} \boldsymbol{a}_{k + 1}$ - The error gain.
 - $\boldsymbol{R}_{k + 1} = \boldsymbol{R}_{k} - \frac{ \boldsymbol{R}_{k} \boldsymbol{a}_{k + 1} \boldsymbol{a}_{k + 1}^{T} \boldsymbol{R}_{k} }{ 1 + \boldsymbol{a}_{k + 1}^{T} \boldsymbol{R}_{k} \boldsymbol{a}_{k + 1} }$.

The initialization is done by solving the LS problem on an initial batch of $n$ samples:
 - $\boldsymbol{R}_{n} = {\left( \boldsymbol{A}_{n}^{T} \boldsymbol{A}_{n} \right)}^{-1}$.
 - $\boldsymbol{x}_{n} = \boldsymbol{R}_{n} \boldsymbol{A}_{n}^{T} \boldsymbol{b}_{n}$.
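
The recursion above can be checked numerically. The following is a minimal sketch, assuming synthetic Gaussian data and an arbitrary seed; the names `numInit`, `numParams`, `mR1` and `vX1` are illustrative and not part of the notebook. A single sequential step, starting from the batch initialization, should reproduce the LS solution over all samples.

```python
import numpy as np

rng = np.random.default_rng(0) #<! Arbitrary seed (Assumption, for reproducibility only)

numInit, numParams = 6, 3
mA = rng.standard_normal((numInit + 1, numParams)) #<! Each row is a vector a_i^T
vB = rng.standard_normal(numInit + 1)

# Initialization on the first `numInit` samples
mAn = mA[:numInit]
mR  = np.linalg.inv(mAn.T @ mAn) #<! R_n
vX  = mR @ (mAn.T @ vB[:numInit]) #<! x_n

# A single sequential step with the new sample
vA   = mA[numInit]
valB = vB[numInit]
vRA  = mR @ vA
mR1  = mR - np.outer(vRA, vRA) / (1.0 + vA @ vRA) #<! R_{n + 1}
vK   = mR1 @ vA #<! The gain k_{n + 1}
vX1  = vX + vK * (valB - vA @ vX) #<! x_{n + 1}

# Reference: Batch LS over all samples
vXRef, *_ = np.linalg.lstsq(mA, vB, rcond = None)
mRRef     = np.linalg.inv(mA.T @ mA)

print(f'Max absolute deviation of the solution: {np.max(np.abs(vX1 - vXRef))}')
print(f'Max absolute deviation of the matrix R: {np.max(np.abs(mR1 - mRRef))}')
```

Both deviations should be at the level of floating point round off.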


<br>

* <font color='brown'>(**#**)</font> The method is often called [_Recursive Least Squares_](https://en.wikipedia.org/wiki/Recursive_least_squares_filter). It should not be confused with the Recursive Order Least Squares filter, which is a different method.
* <font color='brown'>(**#**)</font> The method can be extended for the case of updating a batch at once.
* <font color='brown'>(**#**)</font> For the regularized problem, $\arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} + \frac{\lambda}{2} { \left\| \boldsymbol{C} \boldsymbol{x} \right\| }_{2}^{2}$, the only change needed is the initialization: $\boldsymbol{R}_{n} = {\left( \boldsymbol{A}_{n}^{T} \boldsymbol{A}_{n} + \lambda \boldsymbol{C}^{T} \boldsymbol{C} \right)}^{-1}$.
* <font color='brown'>(**#**)</font> For derivation see [Sequential Form of the Least Squares Estimator for Linear Least Squares Model](https://dsp.stackexchange.com/a/56670).
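
The batch extension mentioned above can be sketched with the Woodbury matrix identity. This is a minimal sketch under stated assumptions, not the notebook's implementation: the helper `BatchSequentialLeastSquares()` is hypothetical, the data is synthetic and `mR` is assumed symmetric. For a single new sample it reduces to the rank one update.

```python
import numpy as np

def BatchSequentialLeastSquares(vX, vBNew, mANew, mR):
    # Hypothetical helper (Not part of the notebook): Integrates `m` new samples at once.
    mRA   = mR @ mANew.T #<! R_k A_new^T, shape (d, m)
    mS    = np.eye(mANew.shape[0]) + mANew @ mRA #<! I + A_new R_k A_new^T, shape (m, m)
    mRNew = mR - mRA @ np.linalg.solve(mS, mRA.T) #<! Updated R (Woodbury identity)
    vXNew = vX + mRNew @ (mANew.T @ (vBNew - mANew @ vX)) #<! Gain times the prediction errors

    return vXNew, mRNew

rng = np.random.default_rng(1) #<! Arbitrary seed (Assumption)
mA  = rng.standard_normal((12, 3))
vB  = rng.standard_normal(12)

# Initialization on the first 8 samples
mR = np.linalg.inv(mA[:8].T @ mA[:8])
vX = mR @ (mA[:8].T @ vB[:8])

# Integrate the remaining 4 samples in a single step
vXNew, mRNew = BatchSequentialLeastSquares(vX, vB[8:], mA[8:], mR)

# Reference: Batch LS over all samples
vXRef, *_ = np.linalg.lstsq(mA, vB, rcond = None)
print(f'Max absolute deviation from the batch solution: {np.max(np.abs(vXNew - vXRef))}')
```

The linear system solved per step is of size `m` (the number of new samples), so the approach pays off when `m` is small relative to the amount of data already integrated.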


## Generate Data


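The data model is polynomial: samples of a polynomial on a uniform grid over $\left[ 0, 3 \right]$, contaminated by additive white Gaussian noise.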
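
For a polynomial model, the model matrix is a Vandermonde matrix: each row holds the powers of a single grid point. As a short sketch (the names `mAPow` and `mAVan` are illustrative), the explicit construction by powers matches NumPy's `np.vander()` with `increasing = True`:

```python
import numpy as np

modelOrder = 3  #<! Polynomial degree
numSamples = 25

vA = np.linspace(0, 3, numSamples) #<! Grid points

# Explicit construction: Column j holds the grid points to the power of j
mAPow = np.power(vA[:, None], np.arange(modelOrder + 1)[None, :])
# Built in construction: Vandermonde matrix with increasing powers
mAVan = np.vander(vA, N = modelOrder + 1, increasing = True)

print(mAVan.shape)
print(np.allclose(mAPow, mAVan))
```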

In [None]:
# Generate / Load the Data

vA = np.linspace(0, 3, numSamples)
mA = np.power(vA[:, None], np.arange(modelOrder + 1)[None, :])
vX = 3 * np.random.randn(modelOrder + 1) #<! Parameters (Ground truth)
vN = σ * np.random.randn(numSamples)
vZ = mA @ vX #<! Model Data
vB = vZ + vN #<! Measurements


In [None]:
# Display Data 

hF, hA = plt.subplots(figsize = (10, 6))
hA.plot(vA, vZ, linewidth = 2, label = 'Data Model')
hA.scatter(vA, vB, s = 20, c = 'm', label = 'Data Samples')
hA.set(xlabel = 'Sample Index', ylabel = 'Sample Value', title = 'Model and Noisy Samples')

hA.legend();

* <font color='red'>(**?**)</font> How many parameters does the model have? What is the degree of the polynomial? Explain.

## The Least Squares Solution

This section calculates the solution for the Ordinary Least Squares model.

In [None]:
# Least Squares Solution

#===========================Fill This===========================#
# 1. Calculate the least squares solution.
# 2. The given data: `mA`, `vB`. Name the solution `vXLS`.
# !! You may find `np.linalg.lstsq()` useful.

vXLS, *_ = np.linalg.lstsq(mA, vB, rcond = None)
#===============================================================#

* <font color='red'>(**?**)</font> Will the result be better than the sequential result?

## The Sequential Least Squares Solution

This section applies the _Sequential Least Squares_ method.

In [None]:
# Sequential Least Squares Solution

#===========================Fill This===========================#
# 1. Calculate the sequential least squares solution.
# 2. The given data: `mA`, `vB`.
# 3. The initial solution and the solution of each iteration should be stored in `mXSLS`.
# !! You may find `SequentialLeastSquares()` useful.

mXSLS = np.zeros(shape = (numSamples - numSamplesInit + 1, len(vX)))

# Initialization
mAn   = mA[:numSamplesInit, :]
mR    = np.linalg.pinv(mAn.T @ mAn)
vXSLS = mR @ (mAn.T @ vB[:numSamplesInit])

kk = 0
mXSLS[kk] = np.copy(vXSLS)
for ii in range(numSamplesInit, numSamples):
    kk       += 1
    valB      = vB[ii]
    vAk       = mA[ii]
    vXSLS, mR = SequentialLeastSquares(vXSLS, valB, vAk, mR)
    mXSLS[kk] = np.copy(vXSLS)
#===============================================================#


* <font color='brown'>(**#**)</font> For white noise, the matrix $\boldsymbol{R}$ is the covariance matrix of the LS estimator up to a factor of the noise variance, ${\sigma}^{2}$.
* <font color='brown'>(**#**)</font> The model can be extended for the _Weighted Least Squares_ model with the initialization $\boldsymbol{R}_{n} = {\left( \boldsymbol{A}_{n}^{T} \boldsymbol{W} \boldsymbol{A}_{n} \right)}^{-1}, \; \boldsymbol{x}_{n} = \boldsymbol{R}_{n} \boldsymbol{A}_{n}^{T} \boldsymbol{W} \boldsymbol{b}_{n}$.
* <font color='red'>(**?**)</font> At `kk = 2`, what is the meaning of the solution? What is it equivalent to?
* <font color='blue'>(**!**)</font> Adjust the above to the model of Ridge Regression with regularization term $\frac{\alpha}{2} {\left\| \boldsymbol{x} \right\|}_{2}^{2}$.
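
One possible direction for the Ridge Regression task is sketched below, assuming synthetic Gaussian data; the names `valAlpha`, `numParams` and `vXRef` are illustrative and not part of the notebook. Since $\boldsymbol{A}_{n}^{T} \boldsymbol{A}_{n} + \alpha \boldsymbol{I}$ is invertible for any $\alpha > 0$, the recursion may start with no samples at all, using $\boldsymbol{R}_{0} = \frac{1}{\alpha} \boldsymbol{I}$ and $\boldsymbol{x}_{0} = \boldsymbol{0}$, and the update step itself is unchanged.

```python
import numpy as np

rng = np.random.default_rng(2) #<! Arbitrary seed (Assumption)

numSamples, numParams = 20, 4
valAlpha = 0.5 #<! Regularization factor (Arbitrary value)

mA = rng.standard_normal((numSamples, numParams))
vB = rng.standard_normal(numSamples)

# Initialization with no samples: R_0 = (alpha * I)^{-1}, x_0 = 0
mR = np.eye(numParams) / valAlpha
vX = np.zeros(numParams)

# The same update as in the non regularized case, one sample at a time
for ii in range(numSamples):
    vA  = mA[ii]
    vRA = mR @ vA
    mR  = mR - np.outer(vRA, vRA) / (1.0 + vA @ vRA)
    vX  = vX + (mR @ vA) * (vB[ii] - vA @ vX)

# Reference: Closed form Ridge Regression solution over all samples
vXRef = np.linalg.solve(mA.T @ mA + valAlpha * np.eye(numParams), mA.T @ vB)
print(f'Max absolute deviation from the batch solution: {np.max(np.abs(vX - vXRef))}')
```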

## Display Results

In [None]:
# Display Data Function

def DisplaySLS( dataIdx: int, vA: np.ndarray, mA: np.ndarray, vB: np.ndarray, mXSLS: np.ndarray, vXLS: np.ndarray, vZ: np.ndarray, numSamplesInit: int ) -> plt.Axes:

    hF, hA = plt.subplots(figsize = (8, 6))

    vS = 20 * np.ones_like(vB)
    vS[:(numSamplesInit + dataIdx)] *= 3

    hA.plot(vA, vZ, linewidth = 2, label = 'Data Model')
    hA.scatter(vA, vB, s = vS, c = 'm', label = 'Data Samples')
    hA.plot(vA, mA @ vXLS, linewidth = 1.5, label = 'Least Squares Estimator')
    hA.plot(vA, mA @ mXSLS[dataIdx], linewidth = 1.5, label = 'Sequential Least Squares Estimator')
    
    hA.set(xlabel = 'Sample Index', ylabel = 'Sample Value', title = f'SLS Estimator with {numSamplesInit + dataIdx} Samples');
    
    hA.legend();
    
    return hA

    

In [None]:
# Auxiliary Function

hDisplaySLS = lambda dataIdx: DisplaySLS(dataIdx, vA, mA, vB, mXSLS, vXLS, vZ, numSamplesInit)

In [None]:
# Display Data 

kkSlider = IntSlider(value = 0, min = 0, max = len(mXSLS) - 1, step = 1, description = 'Iteration Index', readout = True, readout_format = 'd', layout = Layout(width = '45%'))
interact(hDisplaySLS, dataIdx = kkSlider);


* <font color='red'>(**?**)</font> Why is the estimator so poor at initialization and for the first few samples?