[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io/)

# AI Program

## Essential Matrix Calculus - Numerical Differentiation

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 06/02/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0007NumericDiff.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
import autograd.numpy as anp
from autograd import grad
from autograd import elementwise_grad as egrad

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
from matplotlib.colors import LogNorm, Normalize, PowerNorm
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
%matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Course Packages

from NumericDiff import * 

In [None]:
# Auxiliary Functions

In [None]:
# Parameters

## Numerical Differentiation

This notebook explores the use of [_Numerical Differentiation_](https://en.wikipedia.org/wiki/Numerical_differentiation) to calculate the gradient of a function.

The gradient of a multivariate scalar function, $f : \mathbb{R}^{n} \to \mathbb{R}$, is given by:

$$ {{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} = \lim_{t \to 0} \frac{ f \left( \boldsymbol{x} + t \boldsymbol{e}_{i} \right) - f \left( \boldsymbol{x} \right) }{t} $$

Where $\boldsymbol{e}_{i} = \left[ 0, 0, \ldots, 0, \underbrace{1}_{\text{i -th index}}, 0, \ldots, 0 \right]$. 

This can be approximated by [_Finite Difference_](https://en.wikipedia.org/wiki/Finite_difference) with specific [_Finite Difference Coefficient_](https://en.wikipedia.org/wiki/Finite_difference_coefficient).  
There are 3 common approaches:

 - Forward: ${{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} \approx \frac{ f \left( \boldsymbol{x} + h \boldsymbol{e}_{i} \right) - f \left( \boldsymbol{x} \right) }{h}$.
 - Backward: ${{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} \approx \frac{ f \left( \boldsymbol{x} \right) - f \left( \boldsymbol{x} - h \boldsymbol{e}_{i} \right) }{h}$.
 - Central: ${{\nabla}_{x} f \left( \boldsymbol{x} \right)}_{i} \approx \frac{ f \left( \boldsymbol{x} + h \boldsymbol{e}_{i} \right) - f \left( \boldsymbol{x} - h \boldsymbol{e}_{i} \right) }{2 h}$.


* <font color='brown'>(**#**)</font> The notebook uses the `NumericDiff.py` file for the actual calculations.
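
As a minimal sketch of the three schemes above, each component of the gradient can be approximated by perturbing one coordinate at a time. This is not the `NumericDiff.py` implementation: the helper name `finite_diff_grad` and its `mode` strings are hypothetical and used for illustration only.

```python
import numpy as np

def finite_diff_grad(hF, vX, h = 1e-6, mode = 'central'):
    # Hypothetical helper for illustration (Not the `NumericDiff.py` code)
    vX = np.asarray(vX, dtype = float)
    vG = np.zeros_like(vX)
    fX = hF(vX) #<! Reference value for the one sided schemes
    for ii in range(vX.size):
        vE = np.zeros_like(vX)
        vE.flat[ii] = 1.0 #<! The i-th standard basis vector
        if mode == 'forward':
            vG.flat[ii] = (hF(vX + h * vE) - fX) / h
        elif mode == 'backward':
            vG.flat[ii] = (fX - hF(vX - h * vE)) / h
        elif mode == 'central':
            vG.flat[ii] = (hF(vX + h * vE) - hF(vX - h * vE)) / (2 * h)
        else:
            raise ValueError(f'Unknown mode: {mode}')
    return vG

# Example: f(x) = sum(x^2), whose gradient is 2x
vX    = np.array([1.0, -2.0, 0.5])
hSq   = lambda vX: np.sum(vX ** 2)
vGRef = 2 * vX

for mode in ['forward', 'backward', 'central']:
    maxErr = np.max(np.abs(finite_diff_grad(hSq, vX, mode = mode) - vGRef))
    print(f'{mode}: {maxErr}')
```

For this quadratic example the one sided schemes should show an error on the order of $h$, while the central scheme is limited mostly by round off.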

### Step Size Sensitivity Analysis

In this section we'll analyze the sensitivity of the numerical differentiation to the step size, $h$.

We'll use the function:

$$ f \left( \boldsymbol{X} \right) = \left \langle \boldsymbol{A}, \sin \left[ \boldsymbol{X} \right] \right \rangle $$

Where:

 - $\boldsymbol{X} \in \mathbb{R}^{d \times d}$.
 - The function $\sin \left[ \cdot \right]$ is the element wise $\sin$ function: $\boldsymbol{M} = \sin \left[ \boldsymbol{X} \right] \implies \boldsymbol{M} \left[ i, j \right] = \sin \left( \boldsymbol{X} \left[ i, j\right] \right)$.

$$
\begin{aligned}
\nabla f \left( \boldsymbol{X} \right) \left[ \boldsymbol{H} \right] & = \left \langle \boldsymbol{A}, \cos \left[ \boldsymbol{X} \right] \circ \boldsymbol{H} \right \rangle && \text{Since $\frac{d \sin \left( x \right)}{dx} = \cos \left( x \right)$} \\
& = \left \langle \cos \left[ \boldsymbol{X} \right] \circ \boldsymbol{A}, \boldsymbol{H} \right \rangle && \text{Adjoint} \\
& \Rightarrow \nabla f \left( \boldsymbol{X} \right) = \cos \left[ \boldsymbol{X} \right] \circ \boldsymbol{A}
&& \blacksquare
\end{aligned}
$$

* <font color='brown'>(**#**)</font> The function can be evaluated efficiently using element wise multiplication and summation.
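
A minimal sketch of this remark, assuming small random matrices for illustration: the inner product $\left \langle \boldsymbol{A}, \sin \left[ \boldsymbol{X} \right] \right \rangle$ equals $\operatorname{Tr} \left( \boldsymbol{A}^{T} \sin \left[ \boldsymbol{X} \right] \right)$, yet the element wise form avoids the matrix product. The snippet also spot checks the analytic gradient against a central difference on a single entry. The names `hFun`, `valTrace`, `valSum` and `gNum` are made up for this example.

```python
import numpy as np

oRng = np.random.default_rng(0) #<! Arbitrary seed for the illustration
mA   = oRng.standard_normal((4, 4))
mX   = oRng.standard_normal((4, 4))

# Two equivalent evaluations of <A, sin[X]>
valTrace = np.trace(mA.T @ np.sin(mX)) #<! Builds a full matrix product
valSum   = np.sum(mA * np.sin(mX))     #<! Element wise product and sum

# Analytic gradient: cos[X] ∘ A
mG = np.cos(mX) * mA

# Central difference on the entry (1, 2) as a spot check
h        = 1e-6
mE       = np.zeros_like(mX)
mE[1, 2] = 1.0
hFun     = lambda mX: np.sum(mA * np.sin(mX))
gNum     = (hFun(mX + h * mE) - hFun(mX - h * mE)) / (2 * h)

print(np.isclose(valTrace, valSum), abs(gNum - mG[1, 2]))
```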

In [None]:
# Parameters

numSteps = 1000

numRows = 5
numCols = 1; #<! Like a vector

vStepSize = np.logspace(-4, -11, numSteps)

lMethods    = [DiffMode.FORWARD, DiffMode.BACKWARD, DiffMode.CENTRAL] #<! Same order as `lMethodName`
lMethodName = ['Forward', 'Backward', 'Central']

# Data 
mA = np.random.randn(numRows, numCols)
mX = np.random.randn(numRows, numCols)

# Function
hF = lambda mX: np.sum(mA * np.sin(mX))

# Analytic Gradient
hGradF = lambda mX: np.cos(mX) * mA

In [None]:
# Sensitivity Analysis

numMethods = len(lMethods)

vG = hGradF(mX)
mE = np.zeros(shape = (numSteps, numMethods)) #<! Error

for jj in range(numMethods):
  for ii in range(numSteps):
    mE[ii, jj] = 20 * np.log10(np.linalg.norm(vG - CalcFunGrad(mX, hF, diffMode = lMethods[jj], ε = vStepSize[ii]), np.inf))


In [None]:
# Display Results

hFig, hA = plt.subplots(figsize = (16, 8)) #<! Avoid overwriting the function handler `hF`

for ii in range(numMethods):
  hA.plot(vStepSize, mE[:, ii], lw = 2, label = f'{lMethodName[ii]}')

hA.set_title('Numerical Differentiation Error - Max Absolute Error')
hA.set_xlabel('Step Size')
hA.set_ylabel('Error [dB]')
hA.set_xscale('log') #<! The step sizes are logarithmically spaced

hA.legend();

## The Complex Step Trick

In general, the optimal finite differences step size is a function of the argument and the function itself.  
In many cases the method is highly sensitive to the step size: with finite floating point precision, a step which is too small is dominated by round off error, while a step which is too large is dominated by truncation error.
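
A common rule of thumb balances the truncation error against the round off error: roughly $h \approx \sqrt{\epsilon}$ for one sided differences and $h \approx \sqrt[3]{\epsilon}$ for central differences, scaled by the magnitude of the argument, where $\epsilon$ is the machine epsilon. The sketch below is a heuristic under those assumptions, not the choice made in `NumericDiff.py`, and the helper name `suggest_step` is hypothetical.

```python
import numpy as np

def suggest_step(x, mode = 'forward'):
    # Hypothetical helper: heuristic step size.
    # It ignores the scale of the derivatives of the function.
    eps   = np.finfo(float).eps
    scale = max(1.0, abs(x))
    if mode == 'central':
        return np.cbrt(eps) * scale
    return np.sqrt(eps) * scale

x    = 1.0
hFwd = suggest_step(x, 'forward')
hCtr = suggest_step(x, 'central')

# f(x) = exp(x), hence f'(x) = exp(x)
errFwd  = abs((np.exp(x + hFwd) - np.exp(x)) / hFwd - np.exp(x))
errCtr  = abs((np.exp(x + hCtr) - np.exp(x - hCtr)) / (2 * hCtr) - np.exp(x))
errTiny = abs((np.exp(x + 1e-14) - np.exp(x)) / 1e-14 - np.exp(x)) #<! Step is too small

print(f'Forward, h = {hFwd:0.2e}: {errFwd:0.2e}')
print(f'Central, h = {hCtr:0.2e}: {errCtr:0.2e}')
print(f'Forward, h = 1.00e-14: {errTiny:0.2e}')
```

The last line shows how a step far below the heuristic value degrades the forward difference through cancellation.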

It turns out that for _real analytic functions_ (Think of a convergent Taylor Series) we can do a trick:

$$ f \left( x + ih \right) = f \left( x \right) + f' \left( x \right) i h + \frac{f'' \left( x \right)}{2} {\left(ih \right)}^{2} + \mathcal{O}(h^3) \implies \mathrm{Im} \,\left( \frac{ f \left( x + ih \right)}{h} \right) = f' \left( x \right) + \mathcal{O}(h^2). $$

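Since no subtraction is involved, there is no cancellation error, which makes the approximation much more stable: it stays accurate even for extremely small step sizes.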
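
A minimal sketch of the trick for a scalar function. The helper name `complex_step_diff` is hypothetical, and it assumes the function accepts complex input and is real analytic near the point.

```python
import numpy as np

def complex_step_diff(hFun, x, h = 1e-20):
    # Hypothetical helper: f'(x) ≈ Im(f(x + ih)) / h, no subtraction involved
    return np.imag(hFun(x + 1j * h)) / h

hFun = lambda x: np.exp(x) * np.sin(x)
x    = 0.7
refD = np.exp(x) * (np.sin(x) + np.cos(x)) #<! Analytic derivative

errComplex = abs(complex_step_diff(hFun, x) - refD)
errForward = abs((hFun(x + 1e-12) - hFun(x)) / 1e-12 - refD)

print(errComplex, errForward)
```

The complex step uses $h = {10}^{-20}$, far below the machine epsilon, a step size for which any real valued finite difference would return zero.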

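Yet the function must be evaluated with operations which keep the imaginary part intact. Some cases to handle:
 - Use an `abs()` defined as `abs(x + i y) = sign(x) * (x + i y)`, since the complex modulus is real valued and discards the imaginary part.
 - Use `min()` / `max()` which only use the real part for comparison.
 - Use `.'` instead of `'` to apply _transpose_ instead of _hermitian transpose_ (MATLAB notation; in NumPy use `mX.T` rather than `mX.conj().T`).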
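
To illustrate the first item of the list above in NumPy: `np.abs()` returns the real modulus, which discards the imaginary part carrying the derivative, while the sign based definition preserves it. The helper name `abs_cs` is hypothetical.

```python
import numpy as np

def abs_cs(z):
    # Hypothetical helper: complex step friendly abs, sign(Re(z)) * z
    z = np.asarray(z, dtype = complex)
    return np.where(z.real < 0, -z, z)

h = 1e-20
x = -2.0

dNaive = np.imag(np.abs(x + 1j * h)) / h #<! The modulus is real, the derivative is lost
dSafe  = np.imag(abs_cs(x + 1j * h)) / h #<! Keeps the derivative of |x| at x = -2

print(dNaive, dSafe)
```

The derivative of $\left| x \right|$ at $x = -2$ is $-1$, which only the sign based version recovers.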

Resources:
 - [Sebastien Boisgerault - Complex Step Differentiation](https://direns.mines-paristech.fr/Sites/Complex-analysis/Complex-Step%20Differentiation/).
 - [Nick Higham - What Is the Complex Step Approximation](https://nhigham.com/2020/10/06/what-is-the-complex-step-approximation/).
 - [Derek Elkins - Complex Step Differentiation](https://www.hedonisticlearning.com/posts/complex-step-differentiation.html).


### Analysis

In order to verify the robustness of the method we'll use:

$$ f \left( x \right) = {e}^{x} $$

At $x = 0$, where $f' \left( 0 \right) = 1$ exactly. This gives a perfect reference, and since the reference equals 1, the absolute error is also the relative error.

In [None]:
# Parameters

numSteps = 1500

vStepSize = np.logspace(-3, -15, numSteps)

lMethods    = [DiffMode.FORWARD, DiffMode.BACKWARD, DiffMode.CENTRAL, DiffMode.COMPLEX] #<! Same order as `lMethodName`
lMethodName = ['Forward', 'Backward', 'Central', 'Complex']

# Data 
valX = 0.0

# Function
hF = lambda x: np.exp(x)

# Analytic Gradient
gradF = 1; #<! At x = 0

In [None]:
# Sensitivity Analysis

numMethods = len(lMethods)

mE = np.zeros(shape = (numSteps, numMethods)) #<! Error

for jj in range(numMethods):
  for ii in range(numSteps):
    mE[ii, jj] = 20 * np.log10(abs(gradF - CalcFunGrad(valX, hF, diffMode = lMethods[jj], ε = vStepSize[ii])))

In [None]:
# Display Results

hFig, hA = plt.subplots(figsize = (16, 8)) #<! Avoid overwriting the function handler `hF`

for ii in range(numMethods):
  hA.plot(vStepSize, mE[:, ii], lw = 2, label = f'{lMethodName[ii]}')

hA.set_title('Numerical Differentiation Error - Relative Error')
hA.set_xlabel('Step Size')
hA.set_ylabel('Error [dB]')
hA.set_xscale('log')
hA.invert_xaxis()

hA.legend();