# Derivatives of a Function Using Finite Differences (Difference Quotient)
In the lecture, we learned different ways to calculate the derivative of a function:
- finite differences
    - forward difference
    - central difference
- automatic differentiation (AD)
    - forward mode (future exercise)
    - reverse mode (future exercise)

> Note: A good explanation for the error in estimating derivatives and automatic differentiation can be found here: https://book.sciml.ai/notes/08-Forward-Mode_Automatic_Differentiation_(AD)_via_High_Dimensional_Algebras/ 
> Some parts of this exercise are based on these lecture notes.

The derivative of a function $f(x)$ can be approximated by a finite difference:
$$
\frac{d f}{d x} \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}
$$
This is called the forward difference. We can also use the central difference:
$$
\frac{d f}{d x} \approx \frac{f(x + \Delta x) - f(x - \Delta x)}{2 \Delta x}
$$
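The accuracy of both formulas can be read off from a Taylor expansion of $f$ around $x$. The following sketch assumes that $f$ is sufficiently smooth, which is not stated above:
$$
f(x \pm \Delta x) = f(x) \pm \Delta x \, f'(x) + \frac{\Delta x^2}{2} f''(x) \pm \frac{\Delta x^3}{6} f'''(x) + \mathcal{O}(\Delta x^4)
$$
Inserting this expansion into the two difference quotients gives
$$
\frac{f(x + \Delta x) - f(x)}{\Delta x} = f'(x) + \frac{\Delta x}{2} f''(x) + \mathcal{O}(\Delta x^2),
\qquad
\frac{f(x + \Delta x) - f(x - \Delta x)}{2 \Delta x} = f'(x) + \frac{\Delta x^2}{6} f'''(x) + \mathcal{O}(\Delta x^4)
$$
In exact arithmetic, the error of the forward difference therefore shrinks linearly with $\Delta x$, while the error of the central difference shrinks quadratically, because the even-order terms cancel.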

In the lecture we already discussed that the step size $\Delta x$ has a big influence on the error of the approximation. We want to investigate this influence and visualize the error for different step sizes $\Delta x$. What is the smallest number we can use for $\Delta x$? What happens if we use such a small number? What happens if we use a larger number? 

In [None]:
import Pkg
Pkg.activate("finitediff")
Pkg.add("Plots")
Pkg.add("FiniteDifferences")
Pkg.add("Printf")

In [None]:
import Pkg
Pkg.activate("finitediff")
using FiniteDifferences
using Printf
using Plots

First, let's take a look at the forward and central finite difference methods. We start by defining a function we want to differentiate together with its analytical derivative, so that we can estimate the error we are making.

In [None]:
# defining a function and its derivative
function f(x)
    x^2 + sin(x)
end

function df(x)
    2 * x + cos(x)
end

Now let's define the point `x` at which we want to evaluate the derivative and compare the forward finite difference method with the central finite difference method.

In [None]:
# Point where we want to approximate the derivative
x = 2.0

# Test the finite difference methods from FiniteDifferences.jl on f(x) = x^2 + sin(x).
# In central_fdm(2, 1) and forward_fdm(2, 1), 2 is the number of grid points
# and 1 is the order of the derivative.
@show central_fdm(2, 1)(f, x)
@show forward_fdm(2, 1)(f, x)
@show df(x)
error_forward = abs(forward_fdm(2, 1)(f, x) - df(x))
error_central = abs(central_fdm(2, 1)(f, x) - df(x))
@printf("Error for forward difference: %.14f\n", error_forward)
@printf("Error for central difference: %.14f\n", error_central)

## Different Step Sizes

Write a function which takes the function $f(x)$, a value for $x$, and a vector of step sizes $\Delta x$ and returns the forward and central differences of $f(x)$ at $x$ for the different step sizes $\Delta x$.

In [None]:
function FiniteDifferenceMethod(x::Real, our_eps::Vector, f::Function)
    forward = # TODO: implement the forward finite difference method
    central = # TODO: implement the central finite difference method
    forward, central
end

Let's test our implementation for two different step sizes:

In [None]:
eps1 = 1e-2
eps2 = 1e-10
# FiniteDifferenceMethod(x, eps_range, f)
forward1, central1 = FiniteDifferenceMethod(x, [eps1], f)
forward2, central2 = FiniteDifferenceMethod(x, [eps2], f)

@printf("With eps = %.2e:\n", eps1)
@printf("Forward difference approximation: %.6f\n", forward1[1])
@printf("Central difference approximation: %.6f\n", central1[1])
@printf("Error forward difference approximation: %.10f\n", abs(forward1[1] - df(2)))
@printf("Error central difference approximation: %.10f\n", abs(central1[1] - df(2)))
@printf("\nWith eps = %.2e:\n", eps2)
@printf("Forward difference approximation: %.6f\n", forward2[1])
@printf("Central difference approximation: %.6f\n", central2[1])
@printf("Error forward difference approximation: %.10f\n", abs(forward2[1] - df(2)))
@printf("Error central difference approximation: %.10f\n", abs(central2[1] - df(2)))

## Different Step Sizes based on the machine precision $\epsilon$

First, let's check our machine precision $\epsilon$:

In [None]:
prec = eps(Float64)

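What does this number tell us? `eps(Float64)` is the distance between 1.0 and the next larger representable `Float64`. Because floating point numbers store only a fixed number of significant digits, there is a limit to the precision of the numbers we can represent and compute with. This limit is relative to the size of the number: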

In [None]:
@show eps(1.0)
@show eps(0.1)
@show eps(0.01);
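
The meaning of `eps(x)` as the gap to the next representable number can be checked directly with `nextfloat`. The following is a minimal sketch; the variable names `one_gap`, `big_gap`, and `absorbed` are chosen here for illustration only:

```julia
# eps(x) is the gap between x and the next larger Float64.
one_gap = nextfloat(1.0) - 1.0          # gap right above 1.0
big_gap = nextfloat(1000.0) - 1000.0    # gap right above 1000.0

@show one_gap == eps(1.0)
@show big_gap == eps(1000.0)
@show big_gap > one_gap                 # absolute spacing grows with magnitude

# Adding less than half a gap to 1.0 is rounded away entirely.
absorbed = 1.0 + eps(1.0) / 4
@show absorbed == 1.0
```

The relative spacing stays roughly constant, which is why the absolute spacing grows with the size of the number.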

As we have seen in the lecture, this can be a problem if we subtract numbers that are close to each other. Let's check this again:

In [None]:
ϵ = 1e-10rand()
@show ϵ
@show (1+ϵ)
@show ϵ2 = (1+ϵ) - 1
@show (ϵ - ϵ2);

By adding 1 to the small random number, we lose the information in the last digits of the small number. When we subtract 1 again, we don't get the same number back. Hence, we have a loss of precision.
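
The same effect hits a difference quotient when the step is too small. The sketch below uses `sin` as a stand-in function and the hypothetical names `x0` and `h_tiny`; it only illustrates the extreme case where the step is absorbed completely:

```julia
# Hypothetical names x0 and h_tiny, chosen for illustration only.
x0 = 1.0
h_tiny = 1e-17                  # well below eps(1.0) ≈ 2.2e-16

@show x0 + h_tiny == x0         # the step is absorbed by rounding

# Both function values are identical, so the numerator is exactly zero.
quotient = (sin(x0 + h_tiny) - sin(x0)) / h_tiny
@show quotient
@show cos(x0)                   # exact derivative for comparison
```

Once `x0 + h_tiny` rounds back to `x0`, the difference quotient carries no information about the derivative at all.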

Now we want to see how this influences our finite difference approach. We learned that $\sqrt{\epsilon}$ is a good choice for $\Delta x$. In general, for the forward difference we cannot expect an error lower than approximately $\sqrt{\epsilon} \approx 10^{-8}$.

In [None]:
@show sqrt(eps(Float64));
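
Where the $\sqrt{\epsilon}$ rule comes from can be sketched with a rough error model for the forward difference. It assumes that each evaluation of $f$ carries a relative rounding error of about $\epsilon$ and that $f''(x) \neq 0$; both are simplifying assumptions made here for illustration:
$$
E(\Delta x) \approx \underbrace{\frac{\Delta x}{2} \, |f''(x)|}_{\text{truncation}} + \underbrace{\frac{2 \epsilon \, |f(x)|}{\Delta x}}_{\text{rounding}},
\qquad
\frac{d E}{d \Delta x} = 0 \;\Rightarrow\; \Delta x^{*} = 2 \sqrt{\frac{\epsilon \, |f(x)|}{|f''(x)|}} \sim \sqrt{\epsilon}
$$
At this step size both contributions are of size $\sqrt{\epsilon}$, up to factors that depend on $f$. The same argument for the central difference, with truncation error $\frac{\Delta x^2}{6} |f'''(x)|$ and rounding error $\frac{\epsilon \, |f(x)|}{\Delta x}$, gives an optimal step size proportional to $\epsilon^{1/3}$.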

We use $\Delta x = 2.3 \cdot 10^{-16}$, which is roughly the machine precision, as the smallest step size and $\Delta x = 0.1$ as the largest.

Plot the error of the forward and central differences for the function $f(x) = x^2 + \sin(x)$ for step sizes $\Delta x$ from $0.1$ down to $10^{-16}$.

In [None]:
import Plots
using LinearAlgebra
f(x) = x^2+sin(x)
df(x) = 2*x+cos(x) # analytical derivative of f to estimate the error

eps_length = 16
eps_range = 10 .^ -range(1, stop=16, length=eps_length)
eps_range[eps_length] = 2.3e-16 # set the last element to 2.3e-16
x = 1.0
real_dfx = df(x)
df_forw_eps, df_cent_eps = FiniteDifferenceMethod(x, eps_range, f)
# TODO: estimate derivative and error using FiniteDifferenceMethod
error_forward = ...
error_central = ...

p1 = Plots.scatter(eps_range, error_forward, label="err(eps)" , title="forward", xlabel="log eps", ylabel="log error", xscale=:log10, yscale=:log10)
p2= Plots.scatter(eps_range, error_central, label="err(eps)" , title="central", xlabel="log eps", ylabel="log error", xscale=:log10, yscale=:log10)
Plots.plot(p1, p2)

In [None]:
eps_range = 10 .^ -range(4, stop=6, length=1000)
df_fd_eps, df_cd_eps = FiniteDifferenceMethod(x, eps_range, f)

# TODO: estimate the error of the central difference
# (named error_central_zoom to avoid shadowing Base.error)
error_central_zoom = ...

p3 = Plots.scatter(eps_range, error_central_zoom, label="err(eps)", title="central", xlabel="eps", ylabel="error", xscale=:log10, yscale=:log10)

### Derivatives of a function of multiple variables with finite differences
We have learned that the partial derivative of a function $f(x,y)$ with respect to $x$ is estimated by keeping $y$ constant and vice versa. We can use this approach to estimate the partial derivatives using finite differences. The gradient of a function $f(x,y)$ is defined as:
$$
\nabla f = \begin{bmatrix}
\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}
\end{bmatrix}
$$
We can therefore estimate the gradient by estimating the partial derivative for each variable separately.

Let's try this for the function $f(x, y) = x^2 + x y$.

In [None]:
f(x, y) = x^2 + x * y
a, b = 1.0, 1.0 # We want to estimate the partial derivatives of f at (a,b)

f_x(x) = f(x, b)
f_y(y) = f(a, y)

real_df_dx = 2 * a + b
real_df_dy = a

df_dx = # TODO: estimate partial derivative w.r.t. x using central difference
df_dy = # TODO: estimate partial derivative w.r.t. y using central difference
@show grad_f = [df_dx, df_dy]
@show real_grad_f = [real_df_dx, real_df_dy];

If we have a second function $g(x,y) = x^2 + y^2$, we can calculate the derivative of the system of functions $f(x,y)$ and $g(x,y)$ with respect to $x$ and $y$, which is the Jacobian matrix:
$$
\mathbf{J} = \begin{bmatrix}
\nabla f \\
 \nabla g
\end{bmatrix} = \begin{bmatrix}
\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \\
\frac{\partial g}{\partial x} & \frac{\partial g}{\partial y}
\end{bmatrix} 
$$

In [None]:
g(x, y) = x^2 + y^2

g_x(x) = g(x, b)
g_y(y) = g(a, y)

real_dg_dx = 2 * a
real_dg_dy = 2 * b

dg_dx = # TODO: estimate partial derivative w.r.t. x using central difference
dg_dy = # TODO: estimate partial derivative w.r.t. y using central difference
@show grad_g = [dg_dx, dg_dy]
@show real_grad_g = [real_dg_dx, real_dg_dy]
@show jacobian_f = [df_dx df_dy; dg_dx dg_dy];

We used an inefficient way to calculate the Jacobian. We needed to calculate the partial derivative of each function with respect to each variable. There are more efficient ways to calculate the Jacobian matrix like using colored Jacobians or automatic differentiation (next section).
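
To make the cost of this column-by-column approach visible, here is a minimal sketch of a dense central-difference Jacobian. The helper `fd_jacobian`, its keyword `h`, and the name `F` are hypothetical and not part of FiniteDifferences.jl; the default step size is an assumption based on the usual $\epsilon^{1/3}$ heuristic for central differences:

```julia
# Hypothetical helper: dense Jacobian via central differences.
# It needs two evaluations of F per input variable, i.e. 2n evaluations in the loop.
function fd_jacobian(F, x::AbstractVector{<:Real}; h=cbrt(eps(Float64)))
    n = length(x)
    m = length(F(x))               # one extra evaluation to get the output size
    J = zeros(m, n)
    for j in 1:n
        e = zeros(n)
        e[j] = h                   # perturb only the j-th variable
        J[:, j] = (F(x .+ e) .- F(x .- e)) ./ (2h)
    end
    return J
end

# Same system as above, written with a vector argument.
F(x) = [x[1]^2 + x[1] * x[2], x[1]^2 + x[2]^2]

J = fd_jacobian(F, [1.0, 1.0])
J_exact = [3.0 1.0; 2.0 2.0]       # analytical Jacobian at (1, 1)
@show J
@show maximum(abs.(J .- J_exact))
```

Each column requires its own pair of function evaluations, so the cost grows linearly with the number of variables.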

In practice you can use FiniteDifferences.jl to calculate the Jacobian matrix using finite differences. This package can be installed with `Pkg.add("FiniteDifferences")`. 
Let's try this out for the function
$$
\mathbf{f}(\mathbf{x}) = \begin{bmatrix}
x_1^2 + x_1 x_2 \\
x_1^2 + x_2^2
\end{bmatrix}
$$
We can use the function `jacobian` from the package FiniteDifferences.jl to calculate the Jacobian matrix:

In [None]:
import FiniteDifferences
f(x) = [x[1]^2+x[1]*x[2]; 
        x[1]^2+x[2]^2]
x = [1.0, 1.0]

FiniteDifferences.jacobian(FiniteDifferences.central_fdm(2, 1), f, x)[1]

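From the error plots we can see that the truncation error of the central difference approximation is of order $\mathcal{O}(\Delta x^2)$: it grows quadratically with the step size $\Delta x$ and is smaller than the error of the forward difference approximation, which is of order $\mathcal{O}(\Delta x)$.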
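
A quick way to check the convergence order numerically is to halve the step size and compare the errors. The sketch below uses the hypothetical helpers `forward_diff` and `central_diff` and the names `test_f` and `test_df`; the step sizes are chosen large enough that rounding errors are negligible:

```julia
# Hypothetical helpers for illustration; not taken from FiniteDifferences.jl.
forward_diff(f, x, h) = (f(x + h) - f(x)) / h
central_diff(f, x, h) = (f(x + h) - f(x - h)) / (2h)

test_f(x) = x^2 + sin(x)
test_df(x) = 2x + cos(x)           # analytical derivative

x0 = 1.0
h1, h2 = 1e-2, 5e-3                # the second step is half the first

err_fwd(h) = abs(forward_diff(test_f, x0, h) - test_df(x0))
err_cen(h) = abs(central_diff(test_f, x0, h) - test_df(x0))

# Error ratio when halving the step: about 2 for first order, about 4 for second order.
ratio_forward = err_fwd(h1) / err_fwd(h2)
ratio_central = err_cen(h1) / err_cen(h2)
@show ratio_forward
@show ratio_central
```

A ratio near 2 indicates first-order convergence and a ratio near 4 indicates second-order convergence.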