[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io/)

# AI Program

## SVD & Linear Least Squares - Solving Multiple LS with the Same Model

Solving:

$$ \boldsymbol{x}_{i} = \arg \min_{ \boldsymbol{x} } \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b}_{i} \right\|}_{2}^{2}, \; i = 1, 2, \ldots $$

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 10/02/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0015SolveLinearLS.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
from matplotlib.colors import LogNorm, Normalize, PowerNorm
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
%matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Course Packages


In [None]:
# Auxiliary Functions


In [None]:
# Parameters

numRows = 500
numCols = 100
numIn   = 1000 #<! Number of inputs

## Solving Multiple Linear Systems

There are cases where a least squares problem, with the same model matrix $\boldsymbol{A}$, is solved multiple times:

$$ \boldsymbol{x}_{i} = \arg \min_{ \boldsymbol{x} } \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b}_{i} \right\|}_{2}^{2}, \; i = 1, 2, \ldots $$

Most solvers work in two steps:

1. Find the optimal decomposition based on the properties of the model matrix.
2. Solve the system using the decomposition.

This notebook illustrates efficient methods to deal with such cases.

* <font color='brown'>(**#**)</font> The problem above is equivalent to $\arg \min_{\boldsymbol{X}} \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{X} - \boldsymbol{B} \right\|}_{F}^{2}$ where $\boldsymbol{x}_{i}, \, \boldsymbol{b}_{i}$ are the columns of $\boldsymbol{X}, \, \boldsymbol{B}$. The motivation for solving each problem separately is the case where the data is too large to process at once, or where the data arrives over time (Each $\boldsymbol{b}_{i}$ at a different time).
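The equivalence can be checked numerically. A minimal sketch on small random data (the sizes, the seed and the `Demo` suffixed names are arbitrary choices, picked to avoid overwriting the notebook's variables): NumPy's `lstsq()` accepts a 2D right hand side and solves each column independently, so it matches the column by column loop.

```python
import numpy as np

rng    = np.random.default_rng(0)
mADemo = rng.standard_normal((50, 10)) #<! Tall model matrix
mBDemo = rng.standard_normal((50, 7))  #<! Each column is a right hand side b_i

# Column by column: one LS problem per column b_i of B
mXCol = np.column_stack([np.linalg.lstsq(mADemo, mBDemo[:, ii], rcond = None)[0] for ii in range(mBDemo.shape[1])])

# All at once: the Frobenius norm formulation with a 2D right hand side
mXAll = np.linalg.lstsq(mADemo, mBDemo, rcond = None)[0]

print(np.allclose(mXCol, mXAll)) #<! True
```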

## Generate Data

In [None]:
# Generate / Load the Data

mA = np.random.randn(numRows, numCols)
mB = np.random.randn(numRows, numIn)

mX = np.zeros(shape = (numCols, numIn))

In [None]:
# Reference Solution

mXRef = np.linalg.lstsq(mA, mB, rcond = None)[0] #<! `lstsq()` returns a tuple, the solution is its first element

## Naive Solution

In [None]:
%%timeit
# Solving Using LS Solver
# SciPy's / NumPy's solver for least squares problems is `lstsq()`.

for ii in range(numIn):
    mX[:, ii] = np.linalg.lstsq(mA, mB[:, ii], rcond = None)[0]

## Solution by Normal Equations

The normal equations are given by:

$$ \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x} = \boldsymbol{A}^{T} \boldsymbol{b} $$

Hence $\boldsymbol{x}$ is found by solving a linear system whose matrix, $\boldsymbol{A}^{T} \boldsymbol{A}$, is SPSD (_Symmetric Positive Semi Definite_).
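The normal equations follow from setting the gradient of the (convex) objective to zero, and the SPSD property follows from the quadratic form of $\boldsymbol{A}^{T} \boldsymbol{A}$:

$$ f \left( \boldsymbol{x} \right) = \frac{1}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} \implies \nabla f \left( \boldsymbol{x} \right) = \boldsymbol{A}^{T} \left( \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right) = \boldsymbol{0} \implies \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x} = \boldsymbol{A}^{T} \boldsymbol{b} $$

$$ \forall \boldsymbol{v} : \; \boldsymbol{v}^{T} \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{v} = {\left\| \boldsymbol{A} \boldsymbol{v} \right\|}_{2}^{2} \geq 0 $$

Since the objective is convex, any solution of the normal equations is a minimizer.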

* <font color='brown'>(**#**)</font> In case $\boldsymbol{A}$ has full column rank, $\boldsymbol{A}^{T} \boldsymbol{A}$ is SPD (_Symmetric Positive Definite_), which is even faster to decompose (For instance, using the Cholesky decomposition).
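To see the SPD property in practice, a small sketch (random tall matrix, arbitrary sizes and seed) checks that $\boldsymbol{A}^{T} \boldsymbol{A}$ is symmetric with positive eigenvalues, and that solving the normal equations with `assume_a = 'pos'` gives the same solution as `lstsq()`:

```python
import numpy as np
import scipy.linalg

rng    = np.random.default_rng(1)
mADemo = rng.standard_normal((60, 8)) #<! Tall, full column rank (With probability 1)
vBDemo = rng.standard_normal(60)

mCDemo = mADemo.T @ mADemo            #<! Gram matrix (8 x 8)
vEig   = np.linalg.eigvalsh(mCDemo)   #<! Real eigenvalues of a symmetric matrix
isSym  = np.allclose(mCDemo, mCDemo.T)
isPD   = bool(np.all(vEig > 0))

vXNormal = scipy.linalg.solve(mCDemo, mADemo.T @ vBDemo, assume_a = 'pos') #<! Normal equations
vXLs     = np.linalg.lstsq(mADemo, vBDemo, rcond = None)[0]                #<! Reference LS solver

print(isSym, isPD, np.allclose(vXNormal, vXLs)) #<! True True True
```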


In [None]:
%%timeit mC = mA.T @ mA
# Solving Using SPD Decomposition
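# SciPy's / NumPy's `solve()` supports only non singular (Full rank) matrices.
# `assume_a = 'pos'` tells SciPy the matrix is SPD, so a Cholesky based solver is used.
# Hence this code works only for SPD matrices (For SPSD matrices use `lstsq()` or build a manual solver based on `ldl()`).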

for ii in range(numIn):
    mX[:, ii] = sp.linalg.solve(mC, mA.T @ mB[:, ii], assume_a = 'pos')

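* <font color='brown'>(**#**)</font> While the normal equations are efficient (Especially when the number of rows is much larger than the number of columns), their main disadvantage is the increased sensitivity: the condition number is squared, $\kappa \left( \boldsymbol{A}^{T} \boldsymbol{A} \right) = {\kappa \left( \boldsymbol{A} \right)}^{2}$.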
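The squaring of the condition number follows from the singular values of $\boldsymbol{A}^{T} \boldsymbol{A}$ being the squares of those of $\boldsymbol{A}$. A quick numerical check on a random tall matrix (sizes and seed are arbitrary):

```python
import numpy as np

rng    = np.random.default_rng(2)
mADemo = rng.standard_normal((100, 10))

condA = np.linalg.cond(mADemo)               #<! 2 norm condition number: sigma_max / sigma_min
condC = np.linalg.cond(mADemo.T @ mADemo)    #<! Condition number of the normal equations matrix

print(np.isclose(condC, condA ** 2)) #<! True
```

For an ill conditioned $\boldsymbol{A}$ (Say $\kappa \left( \boldsymbol{A} \right) \approx 10^{8}$), the squared value reaches the order of the inverse of the double precision machine epsilon, and the normal equations may lose all accuracy.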
* <font color='blue'>(**!**)</font> Measure the time with `assume_a = 'gen'` (The default) and `assume_a = 'sym'`.

## Solution by Pre Computing the Decomposition

This approach computes the proper decomposition of the matrix once and reuses it to solve each of the problems.

A simple guideline for choosing the decomposition is [MATLAB's `mldivide()` documentation](https://www.mathworks.com/help/matlab/ref/mldivide.html):

![](https://i.imgur.com/adlNcBY.png)
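A heavily simplified sketch of such a dispatcher, assuming only three cases (SPD square, general square, tall with full column rank). `FactorOnce()` is a hypothetical helper, not a SciPy or MATLAB function, and it skips most of the checks in the flowchart above. It factors the matrix once and returns a function which solves for any right hand side (Vector or matrix):

```python
import numpy as np
import scipy.linalg
from typing import Callable

def FactorOnce( mA: np.ndarray ) -> Callable[[np.ndarray], np.ndarray]:
    # Hypothetical helper: factor `mA` once, return a solver reusing the factorization.
    numRows, numCols = mA.shape
    if numRows == numCols:
        # Square: try Cholesky when the matrix looks SPD (Symmetric with positive diagonal)
        if np.allclose(mA, mA.T) and np.all(np.diag(mA) > 0):
            try:
                tuChol = scipy.linalg.cho_factor(mA)
                return lambda mB: scipy.linalg.cho_solve(tuChol, mB)
            except np.linalg.LinAlgError:
                pass #<! Not positive definite, fall back to LU
        tuLu = scipy.linalg.lu_factor(mA) #<! General square matrix: LU with partial pivoting
        return lambda mB: scipy.linalg.lu_solve(tuLu, mB)
    # Non square (Assumed tall with full column rank): economic QR, solve R x = Q^T b
    mQ, mR = scipy.linalg.qr(mA, mode = 'economic')
    return lambda mB: scipy.linalg.solve_triangular(mR, mQ.T @ mB)

# Usage: one factorization, many right hand sides
rng    = np.random.default_rng(4)
mG     = rng.standard_normal((20, 5))
mSpd   = mG.T @ mG + np.eye(5)         #<! SPD: Cholesky path
mSq    = rng.standard_normal((5, 5))   #<! General square: LU path
mBSq   = rng.standard_normal((5, 3))
mBTall = rng.standard_normal((20, 3))

SolveSpd  = FactorOnce(mSpd)
SolveSq   = FactorOnce(mSq)
SolveTall = FactorOnce(mG)             #<! Tall: QR path

print(np.allclose(SolveSpd(mBSq), np.linalg.solve(mSpd, mBSq)))                   #<! True
print(np.allclose(SolveSq(mBSq), np.linalg.solve(mSq, mBSq)))                     #<! True
print(np.allclose(SolveTall(mBTall), np.linalg.lstsq(mG, mBTall, rcond = None)[0])) #<! True
```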

* <font color='brown'>(**#**)</font> This approach also works for the normal equations, by pre computing the LDL (Bunch Kaufman factorization) or the Cholesky decomposition of $\boldsymbol{A}^{T} \boldsymbol{A}$.

In [None]:
%%timeit mQ, mR = sp.linalg.qr(mA, mode = 'economic')
# Solving Using QR Decomposition
# One of the general decompositions for non square matrices is the QR decomposition: A = Q R.

for ii in range(numIn):
    mX[:, ii] = sp.linalg.solve_triangular(mR, mQ.T @ mB[:, ii], check_finite = False)
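Since $\boldsymbol{A} = \boldsymbol{Q} \boldsymbol{R}$ with $\boldsymbol{Q}$ having orthonormal columns, the LS solution (For full column rank $\boldsymbol{A}$) satisfies $\boldsymbol{R} \boldsymbol{x} = \boldsymbol{Q}^{T} \boldsymbol{b}$. SciPy's `solve_triangular()` also accepts a 2D right hand side, so when all $\boldsymbol{b}_{i}$ are available at once the loop can be replaced by a single call. A sketch on small random data (sizes, seed and `Demo` names are arbitrary):

```python
import numpy as np
import scipy.linalg

rng    = np.random.default_rng(3)
mADemo = rng.standard_normal((80, 12))
mBDemo = rng.standard_normal((80, 30))

# Factor once: economic QR, Q is 80 x 12 with orthonormal columns, R is 12 x 12 upper triangular
mQDemo, mRDemo = scipy.linalg.qr(mADemo, mode = 'economic')

# Solve all columns with a single triangular solve: R X = Q^T B
mXQr      = scipy.linalg.solve_triangular(mRDemo, mQDemo.T @ mBDemo)
mXRefDemo = np.linalg.lstsq(mADemo, mBDemo, rcond = None)[0]

print(np.allclose(mXQr, mXRefDemo)) #<! True
```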

* <font color='brown'>(**#**)</font> Usually, when the number of rows is much larger than the number of columns, the normal equations are faster (If using low level functions with minimal overhead).
* <font color='green'>(**@**)</font> Apply the pre computed decomposition trick to `mC`. Use the _Cholesky_ decomposition under the assumption that `mC` is _SPD_.