[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Optimization Methods

## Essential Linear Algebra - Numerical Differentiation

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 03/12/2023 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/OptimizationMethods/2023_12/0001NumericalDiff.ipynb)

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```matlab
someVar    = 2; %<! Notation for a variable
vVector    = rand(4, 1); %<! Notation for 1D array
mMatrix    = rand(4, 3); %<! Notation for 2D array
tTensor    = rand(4, 3, 2, 3); %<! Notation for nD array (Tensor)
cCell      = cell(3, 1); %<! Notation for a cell array
sStructure = struct(); %<! Notation for a structure
taTable    = table(); %<! Notation for a table
hObj       = axes(); %<! Notation for an object / handler / function handler
```

## Configuration

### Configuration Parameters

```matlab
%% Configuration Parameters

subStreamNumberDefault = 79;

run('InitScript.m');

figureIdx           = 0;
figureCounterSpec   = '%04d';

generateFigures = ON;

imatlab_export_fig('print-svg');
```

### Constants

```matlab
%% Constants

DIFF_MODE_FORWARD   = 1;
DIFF_MODE_BACKWARD  = 2;
DIFF_MODE_CENTRAL   = 3;
DIFF_MODE_COMPLEX   = 4;
```


### Parameters

```matlab
%% Parameters
```


## Numerical Differentiation

This notebook explores the use of [_Numerical Differentiation_](https://en.wikipedia.org/wiki/Numerical_differentiation) to calculate the gradient of a function.

The gradient of a multivariate scalar function, $f : \mathbb{R}^{n} \to \mathbb{R}$, is given by:

$$ {{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} = \lim_{t \to 0} \frac{ f \left( \boldsymbol{x} + t \boldsymbol{e}_{i} \right) - f \left( \boldsymbol{x} \right) }{t} $$

Where $\boldsymbol{e}_{i} = {\left[ 0, 0, \ldots, 0, \underbrace{1}_{i \text{-th index}}, 0, \ldots, 0 \right]}^{T}$ is the $i$-th standard basis vector.

This can be approximated by [_Finite Differences_](https://en.wikipedia.org/wiki/Finite_difference) with specific [_Finite Difference Coefficients_](https://en.wikipedia.org/wiki/Finite_difference_coefficient) and a small step size $h > 0$.  
There are 3 common approaches:

 - Forward: ${{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} \approx \frac{ f \left( \boldsymbol{x} + h \boldsymbol{e}_{i} \right) - f \left( \boldsymbol{x} \right) }{h}$.
 - Backward: ${{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} \approx \frac{ f \left( \boldsymbol{x} \right) - f \left( \boldsymbol{x} - h \boldsymbol{e}_{i} \right) }{h}$.
 - Central: ${{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} \approx \frac{ f \left( \boldsymbol{x} + h \boldsymbol{e}_{i} \right) - f \left( \boldsymbol{x} - h \boldsymbol{e}_{i} \right) }{2 h}$.


* <font color='brown'>(**#**)</font> The notebook uses the `CalcFunGrad.m` file for the actual calculation.
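The content of `CalcFunGrad.m` is not shown in this notebook. The following is a minimal sketch (not the actual `CalcFunGrad.m`) of the 3 schemes above, applied coordinate by coordinate. The test function $f \left( \boldsymbol{x} \right) = \sum_{i} {x}_{i}^{3}$, the point and the step size $h = {10}^{-6}$ are arbitrary choices for illustration:

```matlab
% Minimal sketch of coordinate wise finite differences (NOT the actual CalcFunGrad.m)
hF     = @(vX) sum(vX .^ 3); %<! Illustrative test function f(x) = sum_i x_i^3
hGradF = @(vX) 3 * (vX .^ 2); %<! Its analytic gradient

vX = [0.5; -1; 2];
h  = 1e-6; %<! Arbitrary step size for illustration

numElements = numel(vX);
vGForward   = zeros(numElements, 1);
vGBackward  = zeros(numElements, 1);
vGCentral   = zeros(numElements, 1);

for ii = 1:numElements
  vE     = zeros(numElements, 1);
  vE(ii) = 1; %<! The i-th standard basis vector e_i
  vGForward(ii)  = (hF(vX + h * vE) - hF(vX)) / h;
  vGBackward(ii) = (hF(vX) - hF(vX - h * vE)) / h;
  vGCentral(ii)  = (hF(vX + h * vE) - hF(vX - h * vE)) / (2 * h);
end

vG = hGradF(vX);
disp([vG, vGForward, vGBackward, vGCentral]); %<! Columns: analytic, forward, backward, central
```

With this step size the central scheme should be a few orders of magnitude more accurate, in line with its $\mathcal{O}(h^2)$ truncation error versus $\mathcal{O}(h)$ for the one sided schemes.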

### The Gradient of a Composition of a Linear Function and an Element Wise Function

Compute the directional derivative $\nabla f \left( \boldsymbol{x} \right) \left[ \boldsymbol{h} \right]$ and the gradient $\nabla f \left( \boldsymbol{x} \right)$ of:

$$ f \left( \boldsymbol{x} \right) = \boldsymbol{a}^{T} g \left[ \boldsymbol{x} \right] $$

Where $g \left[ \cdot \right]$ is an element wise function $g \left[ \boldsymbol{x} \right] = \begin{bmatrix} g \left( {x}_{1} \right) \\ g \left( {x}_{2} \right) \\ \vdots \\ g \left( {x}_{d} \right) \end{bmatrix} \in \mathbb{R}^{d}$.


* <font color='brown'>(**#**)</font> We'll be using $\left[ \cdot \right]$ as a notation for element wise functions.

The directional derivative of $g \left[ \cdot \right]$ at $\boldsymbol{x}$ along the direction $\boldsymbol{h}$ is given by:

$$ \nabla g \left( \boldsymbol{x} \right) \left[ \boldsymbol{h} \right] = \lim_{t \to 0} \frac{g \left[ \boldsymbol{x} + t \boldsymbol{h} \right] - g \left[ \boldsymbol{x} \right]}{t} = \lim_{t \to 0} \frac{1}{t} \left( \begin{bmatrix} g \left( {x}_{1} + t {h}_{1} \right) \\ g \left( {x}_{2} + t {h}_{2} \right) \\ \vdots \\ g \left( {x}_{d} + t {h}_{d} \right) \end{bmatrix} - \begin{bmatrix} g \left( {x}_{1} \right) \\ g \left( {x}_{2} \right) \\ \vdots \\ g \left( {x}_{d} \right) \end{bmatrix} \right) = \begin{bmatrix} g' \left( {x}_{1} \right) {h}_{1} \\ g' \left( {x}_{2} \right) {h}_{2} \\ \vdots \\ g' \left( {x}_{d} \right) {h}_{d} \end{bmatrix} = g' \left( \boldsymbol{x} \right) \circ \boldsymbol{h} $$

* <font color='brown'>(**#**)</font> Note that $g \left[ \cdot \right] : \mathbb{R}^{d} \to \mathbb{R}^{d}$ is not a scalar function but a vector valued function, hence its directional derivative is a vector.
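The limit above can be checked numerically by replacing $t \to 0$ with a small fixed $t$. A minimal sketch, assuming the illustrative choice $g = \tanh$ (so $g' = 1 - {\tanh}^{2}$) and arbitrary $\boldsymbol{x}$, $\boldsymbol{h}$:

```matlab
% Directional derivative of an element wise function (illustrative g = tanh)
hG  = @(vX) tanh(vX);          %<! g[x]
hDG = @(vX) 1 - tanh(vX) .^ 2; %<! g'[x], the element wise derivative

vX = [-1; 0.25; 2];
vH = [0.3; -2; 1]; %<! Direction h
t  = 1e-6;         %<! Small t replacing the limit

vDirNumeric  = (hG(vX + t * vH) - hG(vX)) / t; %<! (g[x + t h] - g[x]) / t
vDirAnalytic = hDG(vX) .* vH;                  %<! g'(x) .* h (Hadamard product)
disp([vDirAnalytic, vDirNumeric]);
```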

By definition $f \left( \boldsymbol{x} \right) = \boldsymbol{a}^{T} g \left[ \boldsymbol{x} \right] = \left \langle \boldsymbol{a}, g \left[ \boldsymbol{x} \right] \right \rangle$, hence:

$$
\begin{aligned}
\nabla f \left( \boldsymbol{x} \right) \left[ \boldsymbol{h} \right] & = \left \langle \boldsymbol{a}, \nabla g \left( \boldsymbol{x} \right) \left[ \boldsymbol{h} \right] \right \rangle && \text{Linearity of $\left \langle \boldsymbol{a}, \cdot \right \rangle$} \\
& = \left \langle \boldsymbol{a}, g' \left( \boldsymbol{x} \right) \circ \boldsymbol{h} \right \rangle && \text{Directional derivative of $g \left[ \cdot \right]$} \\
& = \left \langle \boldsymbol{a}, \operatorname{Diag} \left( g' \left( \boldsymbol{x} \right) \right) \boldsymbol{h} \right \rangle && \text{Property of Hadamard product: $\boldsymbol{a} \circ \boldsymbol{b} = \operatorname{Diag} \left( \boldsymbol{a} \right) \boldsymbol{b}$} \\
& = \left \langle \operatorname{Diag} \left( g' \left( \boldsymbol{x} \right) \right) \boldsymbol{a}, \boldsymbol{h} \right \rangle && \text{A diagonal matrix is self adjoint} \\
& \Rightarrow \nabla f \left( \boldsymbol{x} \right) = \operatorname{Diag} \left( g' \left( \boldsymbol{x} \right) \right) \boldsymbol{a}
&& \blacksquare
\end{aligned}
$$

* <font color='brown'>(**#**)</font> The function $\operatorname{diag} \left( \cdot \right) : \mathbb{R}^{d \times d} \to \mathbb{R}^{d} $ returns the diagonal of a matrix, that is, $\boldsymbol{b} = \operatorname{diag} \left( \boldsymbol{X} \right) \implies \boldsymbol{b} \left[ i \right] = \left( \boldsymbol{X} \left[ i, i\right] \right)$.
* <font color='brown'>(**#**)</font> The function $\operatorname{Diag} \left( \cdot \right) : \mathbb{R}^{d} \to \mathbb{R}^{d \times d} $ returns a diagonal matrix from a vector, that is, $\boldsymbol{B} = \operatorname{Diag} \left( \boldsymbol{x} \right) \implies \boldsymbol{B} \left[ i, j \right] = \begin{cases}
{x}_{i} & \text{ if } i = j \\ 
0 & \text{ if } i \neq j 
\end{cases}$.
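* <font color='brown'>(**#**)</font> Note that $\left \langle \boldsymbol{a}, \operatorname{diag} \left( \boldsymbol{X} \right) \right \rangle = \left \langle \operatorname{Diag} \left( \boldsymbol{a} \right), \boldsymbol{X} \right \rangle$, where the inner product of matrices is $\left \langle \boldsymbol{A}, \boldsymbol{B} \right \rangle = \sum_{i, j} \boldsymbol{A} \left[ i, j \right] \boldsymbol{B} \left[ i, j \right]$.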
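The result $\nabla f \left( \boldsymbol{x} \right) = \operatorname{Diag} \left( g' \left( \boldsymbol{x} \right) \right) \boldsymbol{a}$ and the $\operatorname{diag}$ / $\operatorname{Diag}$ identity can be sanity checked numerically. A sketch, assuming the illustrative choice $g \left( x \right) = {e}^{-{x}^{2}}$ and arbitrary data. Conveniently, MATLAB's `diag()` plays both roles: given a vector it builds a diagonal matrix ($\operatorname{Diag}$), given a matrix it extracts the diagonal ($\operatorname{diag}$):

```matlab
% Gradient check for f(x) = a^T g[x] using central differences (illustrative data)
numElements = 5;
vA = [1; -2; 0.5; 3; -1];
vX = [0.1; -0.7; 1.2; 0; 2];

hG  = @(vX) exp(-vX .^ 2);            %<! Illustrative element wise g
hDG = @(vX) -2 * vX .* exp(-vX .^ 2); %<! Its derivative g'
hF  = @(vX) vA.' * hG(vX);            %<! f(x) = a^T g[x]

vGradAnalytic = diag(hDG(vX)) * vA; %<! Diag(g'(x)) a, same as hDG(vX) .* vA

h = 1e-5;
vGradNumeric = zeros(numElements, 1);
for ii = 1:numElements
  vE     = zeros(numElements, 1);
  vE(ii) = 1;
  vGradNumeric(ii) = (hF(vX + h * vE) - hF(vX - h * vE)) / (2 * h);
end

% The identity <a, diag(X)> = <Diag(a), X>
mX     = [1, 2, 0; -1, 3, 4; 2, 0, -2];
vB     = [2; -1; 0.5];
valLhs = vB.' * diag(mX);              %<! diag(mX) extracts the diagonal
valRhs = sum(diag(vB) .* mX, 'all');   %<! diag(vB) builds a diagonal matrix
```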

### Step Size Sensitivity Analysis

In this section we'll analyze the sensitivity of the numerical differentiation to the step size, $h$.

We'll use the function:

$$ f \left( \boldsymbol{X} \right) = \left \langle \boldsymbol{A}, \sin \left[ \boldsymbol{X} \right] \right \rangle $$

Where:

 - $\boldsymbol{X} \in \mathbb{R}^{d \times d}$.
 - The function $\sin \left[ \cdot \right]$ is the element wise $\sin$ function: $\boldsymbol{M} = \sin \left[ \boldsymbol{X} \right] \implies \boldsymbol{M} \left[ i, j \right] = \sin \left( \boldsymbol{X} \left[ i, j\right] \right)$.

$$
\begin{aligned}
\nabla f \left( \boldsymbol{X} \right) \left[ \boldsymbol{H} \right] & = \left \langle \boldsymbol{A}, \cos \left[ \boldsymbol{X} \right] \circ \boldsymbol{H} \right \rangle && \text{Since $\frac{d \sin \left( x \right)}{dx} = \cos \left( x \right)$} \\
& = \left \langle \cos \left[ \boldsymbol{X} \right] \circ \boldsymbol{A}, \boldsymbol{H} \right \rangle && \text{Adjoint of the Hadamard product} \\
& \Rightarrow \nabla f \left( \boldsymbol{X} \right) = \cos \left[ \boldsymbol{X} \right] \circ \boldsymbol{A}
&& \blacksquare
\end{aligned}
$$
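Besides comparing every coordinate, a cheap sanity check of a derived gradient uses the directional derivative directly: for a direction $\boldsymbol{H}$, $\left \langle \nabla f \left( \boldsymbol{X} \right), \boldsymbol{H} \right \rangle$ should match a central difference of $f$ along $\boldsymbol{H}$, which costs only 2 function evaluations. A sketch, assuming arbitrary deterministic $4 \times 3$ matrices (a genuine matrix argument rather than a column vector):

```matlab
% Directional derivative check for f(X) = <A, sin[X]> (illustrative deterministic data)
mA = reshape(1:12, 4, 3) / 6 - 1;
mX = reshape(12:-1:1, 4, 3) / 4;
mH = cos(reshape(1:12, 4, 3)); %<! An arbitrary direction H

hF     = @(mX) sum(mA .* sin(mX), 'all'); %<! <A, sin[X]>
hGradF = @(mX) cos(mX) .* mA;             %<! cos[X] .* A (Hadamard product)

t = 1e-5;
valNumeric  = (hF(mX + t * mH) - hF(mX - t * mH)) / (2 * t); %<! Central difference along H
valAnalytic = sum(hGradF(mX) .* mH, 'all');                  %<! <grad f(X), H>
fprintf('Analytic: %.10f, Numeric: %.10f\n', valAnalytic, valNumeric);
```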

```matlab
% Parameters

numSteps = 1000;

numRows = 100;
numCols = 1; %<! Like a vector

vStepSize = logspace(-3, -9, numSteps);

vMethods    = [DIFF_MODE_FORWARD; DIFF_MODE_BACKWARD; DIFF_MODE_CENTRAL];
vMethodName = ["Forward", "Backward", "Central"];

% Data
mA = randn(numRows, numCols);
mX = randn(numRows, numCols);

% Function: <A, sin[X]>, summing over all elements keeps it scalar for any numCols
hF = @(mX) sum(mA .* sin(mX), 'all');

% Analytic Gradient: cos[X] .* A
hGradF = @(mX) cos(mX) .* mA;
```

```matlab
%% Sensitivity Analysis

numMethods = length(vMethods);

vG = hGradF(mX);
mE = zeros(numSteps, numMethods);

for jj = 1:numMethods
  for ii = 1:numSteps
    % Max absolute error (valid for any numCols, unlike the matrix induced Inf norm)
    mE(ii, jj) = 20 * log10(max(abs(vG - CalcFunGrad(mX, hF, vMethods(jj), vStepSize(ii))), [], 'all'));
  end
end

figure();
hA = axes();
set(hA, 'NextPlot', 'add');
for ii = 1:numMethods
  plot(vStepSize, mE(:, ii), 'DisplayName', vMethodName(ii), 'LineWidth', lineWidthNormal);
end
set(get(hA, 'Title'), 'String', {['Numerical Differentiation Error - Max Absolute Error']}, 'FontSize', fontSizeTitle);
set(get(hA, 'XLabel'), 'String', {['Step Size']}, 'FontSize', fontSizeAxis);
set(get(hA, 'YLabel'), 'String', {['Error [dB]']}, 'FontSize', fontSizeAxis);
set(hA, 'XScale', 'log', 'XDir', 'reverse'); %<! The step sizes are log spaced
ClickableLegend();
```




## The Complex Step Trick

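In general, the optimal step size of finite differences depends on both the argument and the function itself.  
A large step increases the truncation error of the approximation, while a small step amplifies the round off error: subtracting two nearly equal values of $f$ in finite precision floating point arithmetic cancels most of the significant digits (_Catastrophic Cancellation_).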
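To make the trade off explicit, a standard back of the envelope error model can be used. It assumes each evaluation of $f$ carries a round off error of about $\varepsilon \left| f \left( x \right) \right|$, with $\varepsilon$ the machine epsilon, and adds it to the leading truncation term of each scheme:

$$ {E}_{\text{forward}} \left( h \right) \approx \frac{h}{2} \left| f'' \left( x \right) \right| + \frac{2 \varepsilon \left| f \left( x \right) \right|}{h} \implies {h}^{\star} = 2 \sqrt{\frac{\varepsilon \left| f \left( x \right) \right|}{\left| f'' \left( x \right) \right|}} \propto \sqrt{\varepsilon} $$

$$ {E}_{\text{central}} \left( h \right) \approx \frac{{h}^{2}}{6} \left| f''' \left( x \right) \right| + \frac{\varepsilon \left| f \left( x \right) \right|}{h} \implies {h}^{\star} = {\left( \frac{3 \varepsilon \left| f \left( x \right) \right|}{\left| f''' \left( x \right) \right|} \right)}^{1 / 3} \propto {\varepsilon}^{1 / 3} $$

The optimal ${h}^{\star}$ is where the derivative of each error model with respect to $h$ vanishes. In double precision ($\varepsilon \approx 2.2 \cdot {10}^{-16}$) this suggests steps around ${10}^{-8}$ for the forward scheme and around ${10}^{-5}$ for the central scheme, below which the error grows again as $h$ shrinks.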

It turns out that for _real analytic functions_ (think of a function which equals its convergent Taylor series) we can use a trick:

$$ f \left( x + ih \right) = f \left( x \right) + f' \left( x \right) i h + \frac{f'' \left( x \right)}{2} {\left(ih \right)}^{2} + \mathcal{O}(h^3) \implies \mathrm{Im} \,\left( \frac{ f \left( x + ih \right)}{h} \right) = f' \left( x \right) + \mathcal{O}(h^2). $$

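Since $f \left( x \right)$ and $f'' \left( x \right)$ are real, they only contribute to the real part. Extracting the imaginary part involves no subtraction, hence no cancellation, so the step size can be made tiny (for example, $h = {10}^{-20}$) without losing accuracy.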
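A minimal sketch of the difference, assuming the illustrative function $f \left( x \right) = {e}^{x} \sin \left( x \right)$ at $x = 0.7$ and the step $h = {10}^{-20}$. At this step the forward difference collapses to 0, since `valX + h` rounds back to `valX`, while the complex step still recovers the derivative:

```matlab
% Complex step derivative vs forward difference with a tiny step
hF  = @(x) exp(x) .* sin(x);            %<! Illustrative real analytic function
hDF = @(x) exp(x) .* (sin(x) + cos(x)); %<! Its analytic derivative

valX = 0.7;
h    = 1e-20; %<! Far below sqrt(eps), only usable since no subtraction is involved

valComplexStep = imag(hF(valX + 1i * h)) / h;   %<! Im(f(x + ih)) / h
valForward     = (hF(valX + h) - hF(valX)) / h; %<! valX + h == valX in double precision, hence 0
valDF          = hDF(valX);

fprintf('Analytic: %.16f\nComplex Step: %.16f\nForward: %.16f\n', valDF, valComplexStep, valForward);
```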

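Yet, operations which are not analytic extensions of their real counterparts must be handled (MATLAB examples):
 - `abs()`: returns the modulus, which drops the imaginary part. Use the definition `abs(x + i y) = sign(x) * (x + i y)` instead.
 - `min()` / `max()`: compare complex values by magnitude. Use versions which compare only the real part.
 - `'`: applies the _Hermitian transpose_ (conjugates the values). Use `.'` to apply the _transpose_ instead.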
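The 3 cases above can be sketched as follows. The helper names `hAbsCs` and `hMaxCs` are hypothetical (they are not MATLAB built in functions), and the test points are arbitrary:

```matlab
% Complex step friendly replacements (hypothetical helper names, not MATLAB built ins)
hAbsCs = @(vZ) sign(real(vZ)) .* vZ; %<! abs(x + iy) = sign(x) * (x + iy), valid away from x = 0
hMaxCs = @(vZ1, vZ2) vZ1 .* (real(vZ1) >= real(vZ2)) + vZ2 .* (real(vZ1) < real(vZ2)); %<! Compares real parts only (assumes finite inputs)

h = 1e-20; %<! Complex step

% d|x| / dx at x = -0.8 is -1
valX          = -0.8;
valAbsBuiltIn = imag(abs(valX + 1i * h)) / h;    %<! abs() returns the real modulus, so this is 0
valAbsCs      = imag(hAbsCs(valX + 1i * h)) / h; %<! -1

% d max(y, 1) / dy at y = -2 is 0 (the constant 1 is the maximum)
valY     = -2;
valMaxCs = imag(hMaxCs(valY + 1i * h, 1)) / h; %<! 0
% By default max() compares complex values by magnitude, so it would pick valY + 1i * h here

% For f(x) = x^T x the gradient is 2 x, so its 1st component at x = [1; 2; 3] is 2
vX = [1; 2; 3];
vE = [1; 0; 0];
vZ = vX + 1i * h * vE;
valTransposeCs = imag(vZ.' * vZ) / h; %<! 2, the transpose keeps the function analytic
valHermitian   = imag(vZ' * vZ) / h;  %<! 0, the Hermitian transpose conjugates and kills the imaginary part
```

The `hMaxCs` sketch multiplies by logical masks so it fits in an anonymous function; as a consequence an `Inf` input produces `NaN` (since `Inf * 0` is `NaN`), which a version based on logical indexing would avoid.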

Resources:
 - [Sebastien Boisgerault - Complex Step Differentiation](https://direns.mines-paristech.fr/Sites/Complex-analysis/Complex-Step%20Differentiation/).
 - [Nick Higham - What Is the Complex Step Approximation](https://nhigham.com/2020/10/06/what-is-the-complex-step-approximation/).
 - [Derek Elkins - Complex Step Differentiation](https://www.hedonisticlearning.com/posts/complex-step-differentiation.html).


### Analysis

To compare the robustness of the methods we'll use:

$$ f \left( x \right) = {e}^{x} $$

At $x = 0$, where $f' \left( 0 \right) = {e}^{0} = 1$. Hence the exact reference is known and the absolute error equals the relative error.

```matlab
% Parameters

numSteps = 1500;

vStepSize = logspace(-3, -15, numSteps);

vMethods    = [DIFF_MODE_FORWARD; DIFF_MODE_BACKWARD; DIFF_MODE_CENTRAL; DIFF_MODE_COMPLEX];
vMethodName = ["Forward", "Backward", "Central", "Complex"];

% Data
valX = 0;

% Function
hF = @(x) exp(x);

% Analytic Gradient
gradF = 1; %<! At x = 0
```

```matlab
%% Sensitivity Analysis

numMethods = length(vMethods);

mE = zeros(numSteps, numMethods);

for jj = 1:numMethods
  for ii = 1:numSteps
    mE(ii, jj) = 20 * log10(abs(gradF - CalcFunGrad(valX, hF, vMethods(jj), vStepSize(ii))));
  end
end

figure();
hA = axes();
set(hA, 'NextPlot', 'add');
for ii = 1:numMethods
  plot(vStepSize, mE(:, ii), 'DisplayName', vMethodName(ii), 'LineWidth', lineWidthNormal);
end
set(get(hA, 'Title'), 'String', {['Numerical Differentiation Error - Relative Error']}, 'FontSize', fontSizeTitle);
set(get(hA, 'XLabel'), 'String', {['Step Size']}, 'FontSize', fontSizeAxis);
set(get(hA, 'YLabel'), 'String', {['Error [dB]']}, 'FontSize', fontSizeAxis);
set(hA, 'XScale', 'log', 'XDir', 'reverse');
ClickableLegend();
```


