# Numerical Differentiation

We have already mentioned the computer taking a derivative. For example, if you do not supply a gradient to an optimization routine like Quasi-Newton BFGS, the routine will still run. How?

### Option 1: Take a derivative by hand and write a function

This is self-explanatory: derive $f'(x)$ analytically and code it as its own function.




### Option 2:  Finite Differences

$$ f'(x) \approx  \frac{f(x+h) - f(x)}{h} $$

where $h=\max\{ \epsilon \times | x | , \epsilon\}$ and $\epsilon$ is small, like $10^{-6}$.

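We want $h$ to be small relative to $x$, but numerically it cannot be too small: $f(x+h)$ and $f(x)$ then agree in almost all of their digits, so the subtraction loses precision, and dividing by a tiny $h$ magnifies that round-off error.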
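The trade-off in the step size can be seen numerically. The sketch below assumes the test function $x^2$ at $x=1$ (true derivative 2); the names `fwd_diff` and `sq` are hypothetical helpers, not part of any package.

```julia
# Forward-difference error for x^2 at x = 1 (true derivative: 2).
fwd_diff(f, x, h) = (f(x + h) - f(x)) / h

sq(x) = x^2
steps = [1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12, 1e-14]
errs = [abs(fwd_diff(sq, 1.0, h) - 2.0) for h in steps]

# Print the absolute error for each step size.
for (h, e) in zip(steps, errs)
    println("h = ", h, "   abs error = ", e)
end
```

The error should fall at first as $h$ shrinks (less truncation error) and then rise again for the smallest steps, where round-off dominates.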

**Multivariate Version**

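$$ f_i(x_1,\dots,x_n) \approx  \frac{f(x_1,\dots,x_i+h_i,\dots,x_n) - f(\mathbf{x})}{h_i}    $$

where $f_i$ is the partial derivative of $f$ with respect to $x_i$ and only the $i$-th coordinate is perturbed. A full gradient therefore takes $n+1$ evaluations of $f$ (one at $\mathbf{x}$ plus one per coordinate), so finite differences can add substantial computation time compared to explicit derivatives. They are also less accurate.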
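A minimal sketch of the multivariate formula, assuming the same step rule $h_i=\max\{\epsilon |x_i|, \epsilon\}$; the names `fd_gradient` and `quad` and the keyword `epsilon` are hypothetical.

```julia
# Forward-difference gradient: n + 1 evaluations of f for x in R^n.
function fd_gradient(f, x::AbstractVector{<:Real}; epsilon=1e-6)
    fx = f(x)                        # one baseline evaluation
    grad = zeros(length(x))
    for i in eachindex(x)
        h = max(epsilon * abs(x[i]), epsilon)
        xh = float.(x)               # fresh floating-point copy of x
        xh[i] += h                   # perturb only coordinate i
        grad[i] = (f(xh) - fx) / h   # one extra evaluation per coordinate
    end
    return grad
end

quad(x) = x[1]^2 + 3 * x[1] * x[2]   # analytic gradient: (2x1 + 3x2, 3x1)
g = fd_gradient(quad, [1.0, 2.0])
println(g)                            # close to [8.0, 3.0]
```

The loop makes the cost visible: `f` is called once outside the loop and once per coordinate inside it.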

**2-sided Finite Differences**

$$ f'(x) \approx  \frac{f(x+h) - f(x-h)}{2h}   $$ 

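which reduces the truncation error from $O(h)$ to $O(h^2)$, but a gradient now takes $2n$ function evaluations instead of $n+1$, so it costs roughly twice the time.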

In [16]:
f(x) = x.^2

f (generic function with 1 method)

In [2]:
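# Forward finite difference of g at x. Note the default 10e-6 equals 1e-5.
function finDiff(g,x,hh=10e-6)
    h = max(hh*abs(x),hh)           # step scaled to |x|, never smaller than hh
    return (g(x+h) - g(x))/h
end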

finDiff (generic function with 2 methods)

In [17]:
finDiff(f,1.0,1.0)

3.0

In [4]:
finDiff(f,1.0,0.5)

2.5

In [5]:
finDiff(f,1.0,0.1)

2.100000000000002

In [6]:
finDiff(f,1.0)

2.00001000001393
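The two-sided formula can be sketched the same way. The names `central_diff` and `square` are hypothetical, and the default step of `1e-6` is an assumption.

```julia
# Two-sided (central) difference with step h = max(hh*|x|, hh).
function central_diff(g, x, hh=1e-6)
    h = max(hh * abs(x), hh)
    return (g(x + h) - g(x - h)) / (2h)
end

square(x) = x^2
println(central_diff(square, 1.0, 0.1))      # exact for a quadratic, up to rounding
println(central_diff(sin, 1.0) - cos(1.0))   # error against the true derivative
```

For a quadratic the second-order error terms cancel exactly, so even the coarse step `0.1` recovers the derivative, whereas the one-sided version returned about `2.1` for the same step.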

### Option 3: Automatic Differentiation

The computer is evaluating $f(x)$ by evaluating the individual "pieces." The logic of AD is that we can use the "sub-evaluations" the computer is already doing by expressing any derivative in terms of the chain rule. 

**Example**
$$f(x,y,z) = (x^{\alpha} + y^{\alpha} + z^{\alpha})^{\gamma}$$

To find $\nabla f$...

1. To evaluate the original function, the computer must compute $x^{\alpha}$, $y^{\alpha}$, $z^{\alpha}$, $x^{\alpha} + y^{\alpha} + z^{\alpha}$, and $(x^{\alpha} + y^{\alpha} + z^{\alpha})^{\gamma}$;
2. Store these values to use later;
3. $f_x = (x^{\alpha} + y^{\alpha} + z^{\alpha})^{\gamma - 1}\gamma\alpha x^{\alpha - 1}$, so we need only 2 divisions and 3 multiplications using the stored values:
$$ f_x = \frac{(x^{\alpha} + y^{\alpha} + z^{\alpha})^{\gamma}}{x^{\alpha} + y^{\alpha} + z^{\alpha}}\gamma \alpha \frac{x^{\alpha}}{x} $$

The computational burden (in arithmetic operations) is much lower than for finite differences, and the result is the exact derivative up to floating-point rounding.
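The reuse of stored values in the example can be written out by hand. This is only an illustration of the idea, not how an AD package is implemented; the function name `f_and_fx` and the test values are assumptions.

```julia
# Evaluate f(x,y,z) = (x^α + y^α + z^α)^γ and reuse the stored pieces for f_x.
function f_and_fx(x, y, z, α, γ)
    xa = x^α                            # stored intermediate
    s  = xa + y^α + z^α                 # stored intermediate
    fv = s^γ                            # function value
    fx = (fv / s) * γ * α * (xa / x)    # 2 divisions, 3 multiplications
    return fv, fx
end

fv, fx = f_and_fx(1.0, 2.0, 3.0, 2.0, 0.5)
println(fv)   # sqrt(14) when α = 2, γ = 0.5
println(fx)   # x / sqrt(14) = 1 / sqrt(14)
```

With $\alpha = 2$ and $\gamma = 0.5$ the function is the Euclidean norm, so the result is easy to check against $f_x = x/\sqrt{x^2+y^2+z^2}$.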

Computer scientists have been developing robust routines to take advantage of these internal computations and the chain rule. `Julia` has packages to compute derivatives this way, although I am not aware of a `Matlab` package.
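To give a feel for how such routines can work, here is a hand-rolled sketch of forward-mode AD using "dual numbers" that carry a value and a derivative together. The `Dual` type and the `derivative` helper are hypothetical, defined here only for illustration; this is not the API of any package, and it supports just `+`, `*`, and `sin`.

```julia
# Minimal forward-mode AD with dual numbers (illustrative only).
struct Dual
    val::Float64   # function value
    der::Float64   # derivative carried alongside the value
end

import Base: +, *, sin
+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)                  # sum rule
*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)  # product rule
sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)                       # chain rule

# Seed der = 1 to differentiate with respect to x.
derivative(f, x::Real) = f(Dual(x, 1.0)).der

g(x) = sin(x) * x * x
println(derivative(g, 1.5))   # compare with cos(1.5)*1.5^2 + 2*1.5*sin(1.5)
```

Each overloaded operation applies one differentiation rule to its own piece, so the derivative falls out of the same pass that evaluates the function.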

## Integration (Numerical Quadrature)

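In many cases we want the computer to compute a definite integral of $f$ with respect to a weight function $w(x)$ over a region $I \subset \mathbb{R}^n$. Quadrature rules approximate it by a weighted sum of function values at $N$ nodes $x_i$ with weights $w_i$:

$$ \int_I f(x) w(x) dx \approx \sum_{i=1}^N w_i f(x_i)$$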
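As one concrete instance of the weighted-sum form, here is a sketch of the trapezoid rule in one dimension with $w(x)=1$. The trapezoid rule is swapped in only as a simple example; the function name `trapezoid` and the test integrand are assumptions.

```julia
# Trapezoid rule on [a, b] written as a weighted sum of f at N nodes.
function trapezoid(f, a, b, N)
    x = range(a, b; length=N)          # equally spaced nodes x_i
    w = fill((b - a) / (N - 1), N)     # interior weights w_i
    w[1] /= 2; w[end] /= 2             # half weight at the two endpoints
    return sum(w .* f.(x))
end

approx = trapezoid(x -> x^2, 0.0, 1.0, 1001)
println(approx)    # close to the exact value 1/3
```

Different quadrature rules differ only in how they choose the nodes and weights.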

### Monte Carlo Integration

**Example 1: Normal distribution**

In [35]:
import Random
Random.seed!(338734)
using StatsFuns



In [37]:
rand()

0.37486250577104263

In [39]:
rand(Int8)

-75

In [51]:
norminvcdf(0.842)

1.0027116650265493

In [41]:
a = rand()

0.7307354603031468

In [42]:
norminvcdf(a)

0.615038826758967

In [48]:
randn()

-0.1767380843071841

In [71]:
# E[x^2] for x ~ N(0,1) equals the variance of x, so E[x^2] = 1. How close can we get?
f(x) = x.^2

f (generic function with 1 method)

In [77]:
b = randn(1,1);
c = randn(100,1);
d = randn(10000,1);
e = randn(100000,1);

In [78]:
println((1/1).*sum(f(b)))
println((1/100).*sum(f(c)))
println((1/10000).*sum(f(d)))
println((1/100000).*sum(f(e)))

0.2489236503900925
0.8764093022817661
1.005206172232836
1.0001286993607696


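This is only illustrative: any single sample could land close to $E[x^2]=1$ by chance. What the law of large numbers guarantees is that the sample mean converges to 1 as the number of draws $N$ grows, with a typical error that shrinks at rate $1/\sqrt{N}$.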
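To separate luck from sample size, one can repeat the experiment many times and look at the spread of the estimates. This is a sketch under assumptions: the seed, the 500 replications, and the helper names `mc_estimate` and `spread` are all arbitrary choices.

```julia
import Random
using Statistics

Random.seed!(1234)   # arbitrary seed, chosen only for reproducibility

# One Monte Carlo estimate of E[x^2] for x ~ N(0,1) from N draws.
mc_estimate(N) = mean(randn(N) .^ 2)

# Standard deviation of the estimate across R independent replications.
spread(N, R) = std([mc_estimate(N) for _ in 1:R])

s_small = spread(100, 500)
s_large = spread(10_000, 500)
println(s_small)   # roughly sqrt(2/100)
println(s_large)   # roughly sqrt(2/10_000)
```

Since $\mathrm{Var}(x^2)=2$ for a standard normal, the spread should be near $\sqrt{2/N}$, so 100 times more draws buys roughly 10 times less error.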