# Differentiation

Compute derivatives, first by finite differences, then by automatic differentiation.

The finite-difference formulas below approximate the limit in the definition of the derivative by using a small but nonzero $h$:

$$f'(x) = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h}.$$

In [1]:
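forward_diff(f, x0, h) = (f(x0 + h) - f(x0)) / h            # one-sided, uses x0 and x0 + h
backward_diff(f, x0, h) = (f(x0) - f(x0 - h)) / h           # one-sided, uses x0 - h and x0
centred_diff(f, x0, h) = (f(x0 + h) - f(x0 - h)) / (2 * h)  # symmetric about x0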

centred_diff (generic function with 1 method)

In [2]:
@show forward_diff(sin, 0, 0.01)
@show backward_diff(sin, 0, 0.01)
@show centred_diff(sin, 0, 0.01);

forward_diff(sin, 0, 0.01) = 0.9999833334166665
backward_diff(sin, 0, 0.01) = 0.9999833334166665
centred_diff(sin, 0, 0.01) = 0.9999833334166665


In [3]:
@show forward_diff(exp, 2, 0.01)
@show backward_diff(exp, 2, 0.01)
@show centred_diff(exp, 2, 0.01);

forward_diff(exp, 2, 0.01) = 7.426124838854253
backward_diff(exp, 2, 0.01) = 7.352233662108354
centred_diff(exp, 2, 0.01) = 7.389179250481304
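The exp example suggests that the centred formula is the most accurate of the three. (For sin at 0 all three formulas coincide because sin is odd, so that example cannot separate them.) A minimal sketch of the reason, assuming the same definitions as in In [1]: a Taylor expansion gives an error proportional to $h$ for the one-sided formulas and to $h^2$ for the centred one, so dividing $h$ by 10 should divide the errors by roughly 10 and 100. The helper name `fd_errors` is introduced here for illustration only.

```julia
# Finite-difference formulas, repeated so the block runs on its own.
forward_diff(f, x0, h) = (f(x0 + h) - f(x0)) / h
centred_diff(f, x0, h) = (f(x0 + h) - f(x0 - h)) / (2 * h)

# Hypothetical helper: absolute errors of both formulas for exp at x0,
# where the exact derivative is exp(x0).
function fd_errors(x0, h)
    exact = exp(x0)
    fwd = abs(forward_diff(exp, x0, h) - exact)
    ctr = abs(centred_diff(exp, x0, h) - exact)
    return (forward = fwd, centred = ctr)
end

for h in (1e-1, 1e-2, 1e-3)
    e = fd_errors(2.0, h)
    println("h = ", h, "   forward error = ", e.forward, "   centred error = ", e.centred)
end
```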


## Automatic differentiation

Represent each quantity as an ordered pair: its value at $x$ and the value of its derivative at $x$. Use the algebraic differentiation rules to define arithmetic and elementary functions on these ordered pairs ("dual numbers").
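Written out, with each pair denoted $(u, u')$ (a notation chosen here for illustration), the usual sum, product, power, and chain rules become rules for combining pairs:

$$
\begin{aligned}
(u, u') + (v, v') &= (u + v,\; u' + v'),\\
(u, u') \cdot (v, v') &= (u v,\; u' v + u v'),\\
(u, u')^{n} &= (u^{n},\; n\,u^{n-1}\,u') \quad \text{for constant } n,\\
g\big((u, u')\big) &= \big(g(u),\; g'(u)\,u'\big).
\end{aligned}
$$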

See:

* https://www.juliabloggers.com/automatic-differentiation-with-dual-numbers/
* [Video](https://www.youtube.com/watch?v=vAp6nUMrKYg)
* [ForwardDiff documentation](https://juliadiff.org/ForwardDiff.jl/v0.8/dev/how_it_works.html)

In [21]:
struct DN   # "dual number"
    val
    deriv
end

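Overload the arithmetic operators for DN. Only plus, product, and exponentiation (with a constant exponent) are defined here; minus, quotient, etc. follow the same pattern.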

In [22]:
Base.:+(a::DN, b::DN) = DN(a.val + b.val, a.deriv .+ b.deriv)
Base.:*(a::DN, b::DN) = DN(a.val * b.val, b.val .* a.deriv .+ a.val .* b.deriv)
Base.:^(a::DN, b) = DN(a.val^b, b .* a.val .^ (b .- 1) .* a.deriv);
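Minus and quotient can be handled in the same style. The following is a sketch rather than part of the original notebook: it repeats the `DN` type so that it runs on its own, and the variable names `x`, `k`, and `q` are chosen only for the example.

```julia
# Same two-field type as in In [21], repeated so the block is self-contained.
struct DN
    val
    deriv
end

# Difference rule: (u - v)' = u' - v'
Base.:-(a::DN, b::DN) = DN(a.val - b.val, a.deriv .- b.deriv)
# Unary minus: (-u)' = -u'
Base.:-(a::DN) = DN(-a.val, -a.deriv)
# Quotient rule: (u / v)' = (u' v - u v') / v^2
Base.:/(a::DN, b::DN) =
    DN(a.val / b.val, (a.deriv .* b.val .- a.val .* b.deriv) ./ b.val^2)

x = DN(3.0, 1.0)   # the variable: dx/dx = 1
k = DN(2.0, 0.0)   # a constant: dk/dx = 0
q = (x - k) / x    # (x - 2)/x = 1 - 2/x, so the derivative is 2/x^2
```

At $x = 3$ the hand-computed derivative is $2/9$, which is what `q.deriv` should hold up to rounding.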

Compute the derivative of a function of a, b, and c with respect to a. Seed the derivative of a with 1 and the derivatives of b and c with 0.

In [23]:
a = DN(1.0, 1.0)  # da/da = 1
b = DN(0.5, 0.0)  # db/da = 0
c = DN(2.0, 0.0); # dc/da = 0

Evaluate $f(a, b, c) = c(a+2b)^2$.

In [24]:
d = c*(a + DN(2.0, 0.0)*b)^2  # df/da = c(a+2b)*2*1

DN(8.0, 8.0)

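Now get the derivatives with respect to all three variables at once by seeding each variable with a unit vector, so that the deriv field carries the whole gradient.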

In [25]:
a = DN(1.0, [1.0, 0.0, 0.0])
b = DN(0.5, [0.0, 1.0, 0.0])
c = DN(2.0, [0.0, 0.0, 1.0])
d = c*(a + DN(2.0, 0.0)*b)^2  

DN(8.0, [8.0, 16.0, 4.0])
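The constant 2 has to be wrapped as `DN(2.0, 0.0)` because no method combines a plain number with a `DN`. One possible way to remove the wrapper, sketched under the assumption that any plain `Real` should act as a constant with zero derivative; the four extra methods are an illustration, not part of the original notebook.

```julia
# Type and rules from In [21] and In [22], repeated so the block runs on its own.
struct DN
    val
    deriv
end

Base.:+(a::DN, b::DN) = DN(a.val + b.val, a.deriv .+ b.deriv)
Base.:*(a::DN, b::DN) = DN(a.val * b.val, b.val .* a.deriv .+ a.val .* b.deriv)
Base.:^(a::DN, b) = DN(a.val^b, b .* a.val .^ (b .- 1) .* a.deriv)

# Assumed extension: a plain real number is a constant, so it scales or
# shifts the value and only scales (or leaves alone) the derivative.
Base.:*(k::Real, a::DN) = DN(k * a.val, k .* a.deriv)
Base.:*(a::DN, k::Real) = k * a
Base.:+(k::Real, a::DN) = DN(k + a.val, a.deriv)
Base.:+(a::DN, k::Real) = k + a

a = DN(1.0, [1.0, 0.0, 0.0])
b = DN(0.5, [0.0, 1.0, 0.0])
c = DN(2.0, [0.0, 0.0, 1.0])
d = c * (a + 2 * b)^2   # same expression as before, without the DN(2.0, 0.0) wrapper
```

With these methods `d` should carry the same value and gradient as the wrapped version.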

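Check against formulas derived by hand: the value $d(a,b,c)$ (named df below) and the partial derivatives $\partial d/\partial a$, $\partial d/\partial b$, $\partial d/\partial c$ (dda, ddb, ddc).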

In [26]:
df(a,b,c) = c*(a+2*b)^2
dda(a,b,c) = c*(a+2*b)*2
ddb(a,b,c) = c*(a+2*b)*2*2
ddc(a,b,c) = (a+2*b)^2
@show df(1, 0.5, 2)
@show dda(1, 0.5, 2)
@show ddb(1, 0.5, 2)
@show ddc(1, 0.5, 2);

df(1, 0.5, 2) = 8.0
dda(1, 0.5, 2) = 8.0
ddb(1, 0.5, 2) = 16.0
ddc(1, 0.5, 2) = 4.0


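Elementary functions such as sin, cos, exp, log, and abs can be defined in the same way, each using the chain rule.

Until they are defined, expressions such as sin(a), a^a, log(a), and exp(a) cannot be evaluated on DN values.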
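As a rough sketch of what such definitions could look like (an illustration, not the notebook's own code), each method below returns the function value together with the chain-rule derivative $g'(u)\,u'$. Only sin, cos, exp, and log are covered; abs and a^a are left out.

```julia
# Same two-field type as in In [21], repeated so the block runs on its own.
struct DN
    val
    deriv
end

# Chain rule: g(u)' = g'(u) * u'
Base.sin(a::DN) = DN(sin(a.val), cos(a.val) .* a.deriv)
Base.cos(a::DN) = DN(cos(a.val), -sin(a.val) .* a.deriv)
Base.exp(a::DN) = DN(exp(a.val), exp(a.val) .* a.deriv)
Base.log(a::DN) = DN(log(a.val), a.deriv ./ a.val)

x = DN(2.0, 1.0)     # the variable: dx/dx = 1
y = sin(exp(x))      # derivative by hand: cos(exp(x)) * exp(x)
z = log(x)           # derivative by hand: 1/x
```

Nested calls such as `sin(exp(x))` work because each method accepts and returns a `DN`, so the chain rule is applied one layer at a time.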

For a complete implementation of this idea, use the ForwardDiff library.

In [1]:
using ForwardDiff

In [20]:
ForwardDiff.derivative(a -> 2.0*(a + 2*0.5)^2, 1.0)

8.0

In [12]:
f(x) = sin(exp(x) + 4*log(x))
ForwardDiff.derivative(f, pi)

-20.752991086118303

Check: $\frac{d}{dx} \sin(e^x+4\log x) = \cos(e^x + 4\log x)\,(e^x+4/x)$.

In [13]:
df(x) = cos(exp(x)+4*log(x))*(exp(x)+4/x)
@show df(pi)
@show df(pi) - ForwardDiff.derivative(f, pi);

df(pi) = -20.752991086118303
df(pi) - ForwardDiff.derivative(f, pi) = 0.0


In [14]:
@show centred_diff(f, pi, 1e-5)
@show centred_diff(f, pi, 1e-5) - ForwardDiff.derivative(f, pi);

centred_diff(f, pi, 1.0e-5) = -20.75299089493443
centred_diff(f, pi, 1.0e-5) - ForwardDiff.derivative(f, pi) = 1.9118387228900247e-7
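The difference of about $2\times 10^{-7}$ is consistent with the $h^2 f'''(x)/6$ truncation error of the centred formula. A rough sketch of how that error depends on the step size, using the hand-derived formula from In [13] as the reference (renamed `df_exact` here, a name chosen for this example): the error should shrink as $h$ decreases and then grow again once rounding in the subtraction dominates.

```julia
# Function and hand-derived derivative from the cells above,
# repeated so the block runs on its own.
f(x) = sin(exp(x) + 4 * log(x))
df_exact(x) = cos(exp(x) + 4 * log(x)) * (exp(x) + 4 / x)
centred_diff(f, x0, h) = (f(x0 + h) - f(x0 - h)) / (2 * h)

# Hypothetical helper: absolute error of the centred difference at x = pi.
fd_error(h) = abs(centred_diff(f, pi, h) - df_exact(pi))

for h in (1e-2, 1e-3, 1e-5, 1e-7, 1e-10, 1e-13)
    println("h = ", h, "   error = ", fd_error(h))
end
```

Automatic differentiation has no step size to tune, which is why its result agrees with the hand-derived formula.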


Which method requires more computations? Which takes longer?

In [15]:
@time for i=1:100000
        centred_diff(f, pi*i/1000, 0.00001)
end

  0.008932 seconds


In [16]:
@time for i=1:100000
        ForwardDiff.derivative(f, pi*i/1000)
end

  0.004735 seconds


More exotic examples and some unexpected results.

In [17]:
g(x) = x^x
@show ForwardDiff.derivative(g, 2)  # log y = x log x; y' = y (1 + log x)
@show 2^2 * (1 + log(2));

ForwardDiff.derivative(g, 2) = 6.772588722239782
2 ^ 2 * (1 + log(2)) = 6.772588722239782


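Each function below is differentiated at 0, where the function or its derivative is undefined. The results are what the differentiation rules give when evaluated in floating-point arithmetic (for example, the rule $1/x$ for log gives Inf at 0), not true derivatives.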

In [18]:
@show ForwardDiff.derivative(log, 0)
@show ForwardDiff.derivative(abs, 0)
h1(x) = 1/x
@show ForwardDiff.derivative(h1, 0)
h2(x) = sin(x)/x
@show ForwardDiff.derivative(h2, 0);

ForwardDiff.derivative(log, 0) = Inf
ForwardDiff.derivative(abs, 0) = 1
ForwardDiff.derivative(h1, 0) = -Inf
ForwardDiff.derivative(h2, 0) = NaN


## Higher order derivatives

To get a second or higher order derivative, compose the derivative function with itself.

In [27]:
f(x) = 3*x^4 - x^2 + 1
df(x) = ForwardDiff.derivative(f,x)
d2f(x) = ForwardDiff.derivative(df,x)
d3f(x) = ForwardDiff.derivative(d2f,x)
d4f(x) = ForwardDiff.derivative(d3f,x)
@show f(2)
@show df(2)
@show d2f(2)
@show d3f(2)
@show d4f(2)

f(2) = 45
df(2) = 92
d2f(2) = 142
d3f(2) = 144
d4f(2) = 72


72

In [31]:
@time ForwardDiff.derivative(d4f, 2)

  0.000000 seconds


0
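These values can be checked by differentiating the polynomial by hand:

$$
\begin{aligned}
f(x) &= 3x^4 - x^2 + 1, & f(2) &= 45,\\
f'(x) &= 12x^3 - 2x, & f'(2) &= 92,\\
f''(x) &= 36x^2 - 2, & f''(2) &= 142,\\
f'''(x) &= 72x, & f'''(2) &= 144,\\
f^{(4)}(x) &= 72, & f^{(4)}(2) &= 72,\\
f^{(5)}(x) &= 0, & f^{(5)}(2) &= 0.
\end{aligned}
$$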

## Functions of more than one variable

Examples of gradients and Hessians for functions $\mathbb{R}^n \to \mathbb{R}$, and of Jacobians for functions $\mathbb{R}^n \to \mathbb{R}^m$.

In [36]:
f1(x) = x[1] * x[2] ^ x[3] + sin(x[2]) - log(x[1])
ForwardDiff.gradient(f1, [1.0, 2.0, 3.0])

3-element Vector{Float64}:
  7.0
 11.583853163452858
  5.545177444479562
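As a check, the gradient of $f_1(x) = x_1 x_2^{x_3} + \sin x_2 - \log x_1$ can be worked out by hand and evaluated at $(1, 2, 3)$:

$$
\nabla f_1(x) =
\begin{pmatrix}
x_2^{x_3} - 1/x_1 \\
x_1 x_3 x_2^{x_3 - 1} + \cos x_2 \\
x_1 x_2^{x_3} \log x_2
\end{pmatrix},
\qquad
\nabla f_1(1, 2, 3) =
\begin{pmatrix}
7 \\ 12 + \cos 2 \\ 8 \log 2
\end{pmatrix}
\approx
\begin{pmatrix}
7 \\ 11.5839 \\ 5.5452
\end{pmatrix}.
$$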

In [37]:
ForwardDiff.hessian(f1, [1.0, 2.0, 3.0])

3×3 Matrix{Float64}:
  1.0      12.0      5.54518
 12.0      11.0907  12.3178
  5.54518  12.3178   3.84362

In [40]:
f2(x) = [ x[1] * x[2] , x[2] - x[1], sin(x[1])*cos(x[2]) ]
ForwardDiff.jacobian(f2, [1.0, 2.0])

3×2 Matrix{Float64}:
  2.0        1.0
 -1.0        1.0
 -0.224845  -0.765147
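The same kind of hand check works for the Jacobian of $f_2$, whose rows are the gradients of the three components:

$$
J_{f_2}(x) =
\begin{pmatrix}
x_2 & x_1 \\
-1 & 1 \\
\cos x_1 \cos x_2 & -\sin x_1 \sin x_2
\end{pmatrix},
\qquad
J_{f_2}(1, 2) \approx
\begin{pmatrix}
2 & 1 \\
-1 & 1 \\
-0.2248 & -0.7651
\end{pmatrix}.
$$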

In [41]:
# From week 5, nonlinear curves
function func(x)
    [  exp(x[2]-x[1]) - 2,
       x[1]*x[2] + x[3],
       x[2]*x[3] + x[1]^2 - x[2]
    ];
end;
   
function jac(x)
    [ 
      -exp(x[2]-x[1])  exp(x[2]-x[1])   0
       x[2]            x[1]             1
       2*x[1]          x[3]-1           x[2]
    ];
end;

In [42]:
ForwardDiff.jacobian(func, [1, 2, 3])

3×3 Matrix{Float64}:
 -2.71828  2.71828  0.0
  2.0      1.0      1.0
  2.0      2.0      2.0

In [43]:
jac([1,2,3])

3×3 Matrix{Float64}:
 -2.71828  2.71828  0.0
  2.0      1.0      1.0
  2.0      2.0      2.0

In [44]:
# show that the error is 0 for randomly selected values
total_error = 0.0
for i in 1:10000
    t = randn(3)
    total_error += sum(abs.(ForwardDiff.jacobian(func, t) .- jac(t)))
end
total_error

0.0
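For comparison, the centred-difference idea from the first section also extends to Jacobians, one column per input variable. The sketch below is an illustration only: `fd_jacobian` is a hypothetical helper, and `func` and `jac` are repeated from In [41] so that the block runs without ForwardDiff.

```julia
# Function and hand-coded Jacobian from In [41], repeated for self-containment.
function func(x)
    [exp(x[2]-x[1]) - 2,
     x[1]*x[2] + x[3],
     x[2]*x[3] + x[1]^2 - x[2]]
end

function jac(x)
    [
      -exp(x[2]-x[1])  exp(x[2]-x[1])   0
       x[2]            x[1]             1
       2*x[1]          x[3]-1           x[2]
    ]
end

# Hypothetical helper: centred-difference Jacobian of F at x with step h.
function fd_jacobian(F, x, h)
    n = length(x)
    cols = map(1:n) do j
        e = zeros(n)
        e[j] = h                              # step along coordinate j only
        (F(x + e) - F(x - e)) / (2 * h)       # column j of the Jacobian
    end
    return reduce(hcat, cols)
end

x = [1.0, 2.0, 3.0]
max_err = maximum(abs.(fd_jacobian(func, x, 1e-5) .- jac(x)))
```

Unlike the ForwardDiff comparison, `max_err` is small but not expected to be exactly zero, since the finite-difference columns carry truncation and rounding error.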