[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Optimization Methods

## SVD & Linear Least Squares - Regularized Least Squares

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 12/11/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0012LinearFitL1.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

import numba

# Machine Learning

# Optimization

# Image Processing / Computer Vision

# Miscellaneous
import math
from platform import python_version
import random
import time

# Typing
from typing import Callable, List, Optional, Tuple, Union

# Visualization
import matplotlib.pyplot as plt
import plotly.graph_objects as go

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 640 # 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Course Packages


In [None]:
# Auxiliary Functions

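def PartitionTrainTest( numSamples: int, trainRatio: float ) -> Tuple[np.ndarray, np.ndarray]:
    # Split the indices {0, 1, ..., numSamples - 1} into disjoint train and test sets

    numTrainSamples = round(trainRatio * numSamples)
    vTrainIdx       = np.sort(np.random.choice(numSamples, numTrainSamples, replace = False)) #<! Sample without replacement (Unique indices)
    vTestIdx        = np.setdiff1d(np.arange(numSamples), vTrainIdx, assume_unique = True) #<! The remaining indices form the test set

    return vTrainIdx, vTestIdx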
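
Note that `np.random.choice()` samples with replacement unless `replace = False` is passed, in which case repeated train indices are possible. A minimal sketch of a partition that guarantees unique, disjoint indices by taking a random permutation from NumPy's `Generator` API (the function name `PartitionTrainTestRng` is hypothetical, not part of the course packages):

```python
from typing import Tuple

import numpy as np

def PartitionTrainTestRng( numSamples: int, trainRatio: float, oRng: np.random.Generator ) -> Tuple[np.ndarray, np.ndarray]:
    # Hypothetical variant: a permutation contains every index exactly once
    numTrainSamples = round(trainRatio * numSamples)
    vPerm           = oRng.permutation(numSamples)
    vTrainIdx       = np.sort(vPerm[:numTrainSamples]) #<! First part of the permutation
    vTestIdx        = np.sort(vPerm[numTrainSamples:]) #<! The rest of the permutation
    return vTrainIdx, vTestIdx

oRng = np.random.default_rng(640)
vTrainIdx, vTestIdx = PartitionTrainTestRng(100, 0.15, oRng)
print(len(vTrainIdx), len(vTestIdx)) # 15 85
```

Passing an explicit `Generator` keeps the split reproducible without relying on the global `np.random.seed()` state.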


In [None]:
# Parameters

# Data
polyDeg     = 6
σ           = 0.5
numSamples  = 100
gridMinVal  = 0
gridMaxVal  = 1.5

trainDataRatio = 0.15

# Model
vλ          = np.linspace(0, 50, 5000) / numSamples
vPolyDeg    = np.arange(4, 21)


## Regularized Least Squares

In general the [Regularized Least Squares](https://en.wikipedia.org/wiki/Regularized_least_squares) model is given by:

$$ \arg \min_{ \boldsymbol{x} } \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} + \lambda r \left( \boldsymbol{x} \right) $$

Where $\lambda$ is the regularization factor and $r \left( \cdot \right)$ is the regularizer.    

The [Tikhonov Regularization](https://en.wikipedia.org/wiki/Ridge_regression) (also known as _Ridge Regression_) is given by:

$$ \arg \min_{ \boldsymbol{x} } \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} + \frac{\lambda}{2} {\left\| \boldsymbol{x} \right\|}_{2}^{2} $$
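
Since the Tikhonov objective is differentiable and, for $\lambda > 0$, strictly convex, its minimizer is the point where the gradient vanishes:

$$ \boldsymbol{A}^{T} \left( \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right) + \lambda \boldsymbol{x} = \boldsymbol{0} \implies \hat{\boldsymbol{x}} = {\left( \boldsymbol{A}^{T} \boldsymbol{A} + \lambda \boldsymbol{I} \right)}^{-1} \boldsymbol{A}^{T} \boldsymbol{b} $$

Using the thin SVD $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{T}$ with singular values $\sigma_{i}$, left singular vectors $\boldsymbol{u}_{i}$ and right singular vectors $\boldsymbol{v}_{i}$, the same solution reads:

$$ \hat{\boldsymbol{x}} = \sum_{i} \frac{\sigma_{i}}{\sigma_{i}^{2} + \lambda} \left( \boldsymbol{u}_{i}^{T} \boldsymbol{b} \right) \boldsymbol{v}_{i} $$

So $\lambda$ damps the contribution of the directions with small singular values, which are the directions that amplify the noise when $\lambda = 0$.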

One motivation for regularization is to simplify the model so that it does not overfit the training data and generalizes to other realizations of the data.


* <font color='brown'>(**#**)</font> The motivation for the regularization can be interpreted in many ways: Bayesian Prior (Gaussian, Laplace, etc.), Model (Sparse, Shifted), Kernel, etc.
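
The Tikhonov objective can also be written as a single ordinary least squares problem, since $\frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} + \frac{\lambda}{2} {\left\| \boldsymbol{x} \right\|}_{2}^{2} = \frac{1}{2} {\left\| \begin{bmatrix} \boldsymbol{A} \\ \sqrt{\lambda} \boldsymbol{I} \end{bmatrix} \boldsymbol{x} - \begin{bmatrix} \boldsymbol{b} \\ \boldsymbol{0} \end{bmatrix} \right\|}_{2}^{2}$. A minimal sketch on a small random problem (the data and sizes are arbitrary assumptions, not the notebook's data) compares this formulation with solving $\left( \boldsymbol{A}^{T} \boldsymbol{A} + \lambda \boldsymbol{I} \right) \boldsymbol{x} = \boldsymbol{A}^{T} \boldsymbol{b}$ directly:

```python
import numpy as np

# Hypothetical small problem (Not the notebook's data)
oRng = np.random.default_rng(0)
mA   = oRng.standard_normal((15, 8))
vB   = oRng.standard_normal(15)
λ    = 0.5

# Augmented system: stack sqrt(λ) * I under A and zeros under b
mAAug = np.vstack((mA, np.sqrt(λ) * np.eye(mA.shape[1])))
vBAug = np.concatenate((vB, np.zeros(mA.shape[1])))
vXAug, *_ = np.linalg.lstsq(mAAug, vBAug, rcond = None) #<! Ordinary least squares on the augmented system

# Reference: the normal equations of the regularized problem
vXRef = np.linalg.solve(mA.T @ mA + λ * np.eye(mA.shape[1]), mA.T @ vB)

print(np.allclose(vXAug, vXRef)) # True
```

The augmented form avoids explicitly forming $\boldsymbol{A}^{T} \boldsymbol{A}$, whose condition number is the square of that of $\boldsymbol{A}$.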



## Generate Data


The following cell generates the data and splits it into train and test sets.

In [None]:
# Generate / Load the Data

# The whole data
vARef = np.linspace(gridMinVal, gridMaxVal, numSamples)
mARef = np.power(vARef[:, None], np.arange(polyDeg + 1)[None, :])

vX = 1 * np.random.randn(polyDeg + 1)

vZ = mARef @ vX
vN = σ * np.random.randn(numSamples)
vY = vZ + vN

mAModel = np.power(vARef[:, None], np.arange(np.max(vPolyDeg) + 1)[None, :])

vIdxTrain, vIdxTest = PartitionTrainTest(numSamples, trainDataRatio)
mA      = mAModel[vIdxTrain, :]
vB      = vY[vIdxTrain]
mATest  = mAModel[vIdxTest, :]
vBTest  = vY[vIdxTest]
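
The model matrix uses the monomial basis $\left[ 1, a, a^{2}, \ldots, a^{p} \right]$, whose columns become nearly collinear as the degree grows. A short sketch, assuming the same 100-point grid on $\left[ 0, 1.5 \right]$ as in the parameters cell, uses `np.vander()` with `increasing = True` (the same column layout as `mAModel`) and prints the condition number per degree:

```python
import numpy as np

# Assumed grid: 100 points on [0, 1.5], as in the parameters cell
vA = np.linspace(0, 1.5, 100)

vDeg  = np.array([4, 8, 12, 16, 20])
vCond = np.zeros(len(vDeg))
for ii, polyDeg in enumerate(vDeg):
    # Columns: a^0, a^1, ..., a^polyDeg
    mV        = np.vander(vA, polyDeg + 1, increasing = True)
    vCond[ii] = np.linalg.cond(mV) #<! Ratio of the largest to the smallest singular value
    print(f'Degree {polyDeg:2d}: cond(A) = {vCond[ii]:.2e}')
```

A large condition number means the unregularized solution is very sensitive to the noise. Adding $\lambda \boldsymbol{I}$ to $\boldsymbol{A}^{T} \boldsymbol{A}$ shifts every eigenvalue to $\sigma_{i}^{2} + \lambda \geq \lambda$, which bounds that sensitivity.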


In [None]:
# Display Data 

hF, hA = plt.subplots(figsize = (10, 6))
hA.plot(vARef, vZ, linewidth = 2, label = 'Data Model')
hA.scatter(vARef, vY, s = 20, c = 'm', label = 'Data Samples')
hA.set(xlabel = 'Sample Location', ylabel = 'Sample Value', title = f'Model and Noisy Samples, Polynomial Degree: {polyDeg}')

hA.legend();

* <font color='red'>(**?**)</font> Does the support (The sampling locations) have an effect on the estimation performance? Think of the affine model with 2 samples.

In [None]:
# Train and Test

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 6))
vHa = vHa.flat

hA = vHa[0]
hA.plot(vARef, vZ, linewidth = 2, label = 'Data Model')
hA.scatter(vARef[vIdxTrain], vY[vIdxTrain], s = 20, c = 'm', label = 'Train Samples')
hA.set(xlabel = 'Sample Location', ylabel = 'Sample Value', title = f'Train: Model and Noisy Samples, Polynomial Degree: {polyDeg}')

hA.legend();

hA = vHa[1]
hA.plot(vARef, vZ, linewidth = 2, label = 'Data Model')
hA.scatter(vARef[vIdxTest], vY[vIdxTest], s = 20, c = 'm', label = 'Test Samples')
hA.set(xlabel = 'Sample Location', ylabel = 'Sample Value', title = f'Test: Model and Noisy Samples, Polynomial Degree: {polyDeg}')

hA.legend();

* <font color='red'>(**?**)</font> For Least Squares, what will happen when $p \to N$, where $p$ is the polynomial degree and $N$ is the number of (train) samples?

## Polynomial Degree vs. Regularization Factor



In [None]:
# Polynomial Degree vs. Regularization Factor

mZTrain = np.zeros(shape = (len(vPolyDeg), len(vλ)))
mZTest  = np.zeros(shape = (len(vPolyDeg), len(vλ)))

for jj in range(len(vλ)):
    λ = vλ[jj]
    for ii in range(len(vPolyDeg)):
        paramP  = vPolyDeg[ii]
        mAP     = mA[:, :(paramP + 1)]
        if λ == 0.0:
            vXRls, *_ = np.linalg.lstsq(mAP, vB, rcond = None)
        else:
            vXRls = sp.linalg.solve(mAP.T @ mAP + λ * np.eye(paramP + 1), mAP.T @ vB)

        mZTrain[ii, jj] = 10 * math.log10(np.mean(np.square(mAP @ vXRls - vB))) #<! MSE in dB (Equals the RMSE in dB)
        mZTest[ii, jj]  = 10 * math.log10(np.mean(np.square(mATest[:, :(paramP + 1)] @ vXRls - vBTest)))

* <font color='red'>(**?**)</font> Numerically, what is the effect of $\lambda$?
* <font color='red'>(**?**)</font> How can the loop be optimized for faster calculation?
* <font color='blue'>(**!**)</font> Optimize the solution of the system `sp.linalg.solve(mAP.T @ mAP + λ * np.eye(paramP + 1), mAP.T @ vB)` using its properties.
* <font color='brown'>(**#**)</font> SciKit Learn offers a model to solve the problem with [`Ridge`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html).
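
For a fixed degree, the loop above solves a new linear system for every $\lambda$. Since $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{T}$ yields $\hat{\boldsymbol{x}} = \boldsymbol{V} \operatorname{diag} \left( \frac{\sigma_{i}}{\sigma_{i}^{2} + \lambda} \right) \boldsymbol{U}^{T} \boldsymbol{b}$, a single SVD can serve all values of $\lambda$. A minimal sketch of one possible approach, assuming $\lambda > 0$ (the function name `RidgeSvdPath` is hypothetical; for $\lambda = 0$, tiny singular values would need truncation as `np.linalg.lstsq()` does):

```python
import numpy as np

def RidgeSvdPath( mA: np.ndarray, vB: np.ndarray, vλ: np.ndarray ) -> np.ndarray:
    # Solves (A^T A + λ I) x = A^T b for every λ in vλ using one thin SVD of A
    # Returns a matrix whose k-th column is the solution for vλ[k]
    mU, vS, mVh = np.linalg.svd(mA, full_matrices = False)
    vUb         = mU.T @ vB #<! Coefficients of b in the left singular vectors
    mFilt       = vS[:, None] / (np.square(vS[:, None]) + vλ[None, :]) #<! Filter factors σ_i / (σ_i^2 + λ)
    return mVh.T @ (mFilt * vUb[:, None])

# Hypothetical sizes: more unknowns than samples, as with high degrees in the notebook
oRng = np.random.default_rng(0)
mA   = oRng.standard_normal((15, 21))
vB   = oRng.standard_normal(15)
vλ   = np.linspace(0.01, 0.5, 50)

mX    = RidgeSvdPath(mA, vB, vλ)
vXRef = np.linalg.solve(mA.T @ mA + vλ[7] * np.eye(21), mA.T @ vB)
print(np.allclose(mX[:, 7], vXRef)) # True
```

The cost of the SVD is paid once per degree, after which each additional $\lambda$ costs only a matrix product.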

## Display Results

In [None]:
# Display Data 

hF = go.Figure(data = [go.Surface(z = mZTrain, x = vλ, y = vPolyDeg)])
hF.update_layout(
    title = dict(text = 'Estimation RMSE - Train'), autosize = False,
    width = 800, height = 500,
    margin = dict(l = 25, r = 25, b = 25, t = 25),
)
hF.update_scenes(
    xaxis_title_text = 'λ',
    yaxis_title_text = 'Polynomial Degree',
    zaxis_title_text = 'RMSE [dB]',
)
hF.show()

In [None]:
# Display Data 

hF = go.Figure(data = [go.Surface(z = mZTest, x = vλ, y = vPolyDeg)])
hF.update_layout(
    title = dict(text = 'Estimation RMSE - Test'), autosize = False,
    width = 800, height = 500,
    margin = dict(l = 25, r = 25, b = 25, t = 25),
)
hF.update_scenes(
    xaxis_title_text = 'λ',
    yaxis_title_text = 'Polynomial Degree',
    zaxis_title_text = 'RMSE [dB]',
)
hF.show()
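
To read the best setting off the surfaces numerically, `np.argmin()` combined with `np.unravel_index()` returns the grid location of the minimal test error, and `np.argmin()` along the $\lambda$ axis gives the best $\lambda$ per degree. A sketch assuming a grid laid out like `mZTest` (rows: polynomial degree, columns: $\lambda$); a random array stands in for it so the snippet runs on its own, hence its printed values carry no meaning:

```python
import numpy as np

# Hypothetical stand-in for `mZTest` (Rows: polynomial degree, columns: λ)
vPolyDeg = np.arange(4, 21)
vλ       = np.linspace(0, 50, 5000) / 100
oRng     = np.random.default_rng(0)
mZTest   = oRng.standard_normal((len(vPolyDeg), len(vλ)))

# Location of the minimal test error over the whole grid
ii, jj = np.unravel_index(np.argmin(mZTest), mZTest.shape)
print(f'Best degree: {vPolyDeg[ii]}, best λ: {vλ[jj]:.4f}, test error: {mZTest[ii, jj]:.2f} [dB]')

# Best λ for each polynomial degree
vBestλ = vλ[np.argmin(mZTest, axis = 1)]
```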

* <font color='red'>(**?**)</font> Explain the effect of the regularization, specifically its connection with the polynomial degree.