# Forward Mode Automatic Differentiation

At the heart of modern machine learning, so popular in 2019, is an optimization problem. Optimization means gradients, so suddenly differentiation, especially automatic differentiation (AD), is exciting.

The first time one hears about AD, it is easy to imagine what it is. Surely it is straightforward symbolic differentiation applied to code. One imagines automatically doing what is learned in a calculus class, like
$$ \frac{d}{dx}x^n = n x^{n-1}. $$
**This is not how AD works.**

Ok, then it must be numerical differentiation, which to first order is
$$ \frac{df}{dx} \approx \frac{f(x+h) - f(x)}{h}. $$
**This is also not how AD works.**
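
For contrast, here is a minimal sketch of the forward difference quotient, not part of the AD approach developed in this notebook. The helper name `forward_difference` and the step sizes are our own choices. The error typically shrinks as $h$ decreases and then grows again once floating-point rounding dominates, which is one reason AD is attractive.

```julia
# Forward difference quotient (numerical differentiation), shown only for contrast with AD.
forward_difference(f, x, h) = (f(x + h) - f(x)) / h

exact = 0.5 / sqrt(2.0)   # analytic derivative of sqrt at x = 2

for h in (1e-1, 1e-4, 1e-8, 1e-12)
    approx = forward_difference(sqrt, 2.0, h)
    println("h = ", h, "   error = ", abs(approx - exact))
end
```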

# Babylonian sqrt

> Repeat $t \leftarrow (t + x/t)/2$ until $t$ converges to $\sqrt{x}$.
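
One way to see why this iteration converges to $\sqrt{x}$ (a standard observation, not spelled out in this notebook) is that it is Newton's method applied to $f(t) = t^2 - x$, whose positive root is $\sqrt{x}$:
$$ t - \frac{f(t)}{f'(t)} = t - \frac{t^2 - x}{2t} = \frac{1}{2}\left(t + \frac{x}{t}\right). $$
At a fixed point, $t = (t + x/t)/2$, which gives $t^2 = x$.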

In [None]:
@inline function Babylonian(x; N = 10)
    t = (1+x)/2
    for i = 2:N
        t = (t + x/t)/2
    end
    t
end

In [None]:
α = π

Babylonian(α), √α

In [None]:
Babylonian(2), √2

In [None]:
using PyPlot

xs = 0:0.01:49

for i in 1:5
    plot(xs, [Babylonian(x; N=i) for x in xs], label="Iteration $i")
end

plot(xs, sqrt.(xs), label="sqrt", color="black")

legend()
title("Those Babylonians really knew how to √")

# ... and now the derivative, almost by magic

Ten lines of Julia code! Nowhere do we tell Julia that the derivative of $\sqrt{x}$ is $\frac{1}{2\sqrt{x}}$. `D` stands for "dual number", a concept introduced by Clifford in 1873.

In [None]:
struct D <: Number
    x::Float64 # value
    ϵ::Float64 # derivative
end

import Base: +, /, convert, promote_rule
a::D + b::D = D(a.x + b.x, a.ϵ + b.ϵ) # sum rule
a::D / b::D = D(a.x / b.x, (b.x * a.ϵ - a.x * b.ϵ)/b.x^2) # quotient rule
convert(::Type{D}, x::Real) = D(x, zero(x))
promote_rule(::Type{D}, ::Type{<:Number}) = D
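
The sum and quotient rules can be read off from dual-number arithmetic. As a sketch, write a dual number as $a + a'\epsilon$ with $\epsilon^2 = 0$ (the notation $a'$ for the `ϵ` field is ours, not the notebook's):
$$ (a + a'\epsilon) + (b + b'\epsilon) = (a + b) + (a' + b')\epsilon $$
$$ \frac{a + a'\epsilon}{b + b'\epsilon} = \frac{(a + a'\epsilon)(b - b'\epsilon)}{b^2} = \frac{a}{b} + \frac{b\,a' - a\,b'}{b^2}\,\epsilon $$
Seeding the input with an $\epsilon$-coefficient of 1, as in `D(x, 1)`, makes the $\epsilon$-coefficient of the output the derivative with respect to $x$.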

The same `Babylonian` function, with no rewrite at all, now also computes the derivative correctly, as the checks below show.

In [None]:
Babylonian(D(5, 1)) |> dump

In [None]:
x = 5; Babylonian(D(x, 1)), (√x, 0.5 / √x)

In [None]:
x = π; Babylonian(D(x, 1)), (√x, 0.5 / √x)

### It just works!

In [None]:
code_llvm(Babylonian, (D,); debuginfo=:none)

# Symbolically

The symbolic computation below is mathematically equivalent to what AD produces, but it is not how AD arrives at the result.

In [None]:
using SymPy

In [None]:
x = symbols("x")

display("Iterations as a function of x")
for k = 1:5
    display(simplify(Babylonian(x; N=k)))
end

display("Derivatives as a function of x")
for k = 1:5
    display(simplify(diff(simplify(Babylonian(x; N=k)), x)))
end

How does AD get the answer?

We can rewrite the algorithm by hand, differentiating every operation with respect to $x$ alongside the original computation, and obtain the correct result.
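
Concretely, write $t_k$ for the iterate and $t_k' = dt_k/dx$ for its derivative (the subscripts are our notation). Differentiating one Babylonian step with the sum and quotient rules gives
$$ t_{k+1} = \frac{1}{2}\left(t_k + \frac{x}{t_k}\right) \quad\Longrightarrow\quad t_{k+1}' = \frac{1}{2}\left(t_k' + \frac{t_k - x\,t_k'}{t_k^2}\right), $$
starting from $t_1 = (1+x)/2$ and $t_1' = 1/2$. The derivative update uses the old iterate $t_k$, not $t_{k+1}$.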

In [None]:
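function dBabylonian(x; N=10)
    t = (1+x)/2
    dt = 1/2                            # derivative of (1+x)/2 with respect to x
    for i = 2:N
        # Update dt first: the derivative step needs t from before this iteration's update.
        dt = (dt + (t - x*dt)/t^2)/2    # sum and quotient rules applied to (t + x/t)/2
        t = (t + x/t)/2
    end
    dt
end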

In [None]:
dBabylonian(5), 0.5/√5

So, basically the trick is for the computer system to do the rewrite for you, without any loss of speed or convenience.

Important: the dual-number arithmetic is substituted *before* JIT compilation, so the derivative is computed by efficient compiled code.

# ForwardDiff.jl

In [None]:
# utility function for our small forward AD
derivative(f::Function, x::Number) = f(D(x, one(x))).ϵ

In [None]:
derivative(Babylonian, 2)

In [None]:
derivative(x -> 1/x, 4), -1/4^2
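
The type `D` above only supports `+` and `/`, which is all that `Babylonian` needs. As a rough sketch of how the same idea extends to other operations, the block below defines a hypothetical stand-alone type `Dual` (named differently so it does not clash with `D`) with difference and product rules added. The names `Dual`, `dual_derivative`, and `p` are ours and purely illustrative.

```julia
# Hypothetical stand-alone dual number type; mirrors `D` from this notebook.
struct Dual <: Number
    x::Float64  # value
    ϵ::Float64  # derivative
end

import Base: +, -, *, /, convert, promote_rule

a::Dual + b::Dual = Dual(a.x + b.x, a.ϵ + b.ϵ)                        # sum rule
a::Dual - b::Dual = Dual(a.x - b.x, a.ϵ - b.ϵ)                        # difference rule
a::Dual * b::Dual = Dual(a.x * b.x, a.ϵ * b.x + a.x * b.ϵ)            # product rule
a::Dual / b::Dual = Dual(a.x / b.x, (b.x * a.ϵ - a.x * b.ϵ) / b.x^2)  # quotient rule

# Real constants become duals with zero derivative.
convert(::Type{Dual}, x::Real) = Dual(x, zero(x))
promote_rule(::Type{Dual}, ::Type{<:Real}) = Dual

# Seed the input with derivative 1 and read off the ϵ part of the result.
dual_derivative(f, x::Real) = f(Dual(x, one(x))).ϵ

p(x) = 3 * x * x - 2 * x + 1          # analytically, p'(x) = 6x - 2
dual_derivative(p, 2.0), 6 * 2.0 - 2  # AD result next to the analytic value
```

Only the arithmetic rules changed; `p` is ordinary Julia code with no knowledge of `Dual`.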

Now that we understand how forward AD works, we can use the more feature-complete package [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl).

In [None]:
using ForwardDiff

In [None]:
ForwardDiff.derivative(Babylonian, 2)

In [None]:
@edit ForwardDiff.derivative(Babylonian, 2)

(Note: ForwardDiff.jl takes its derivative rules for elementary functions from [DiffRules.jl](https://github.com/JuliaDiff/DiffRules.jl).)

**Example**: pressure is the negative volume derivative of the free energy, $p = -\frac{\partial F}{\partial V}$.
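
A minimal sketch of this example, assuming a toy model that is not part of the notebook: the volume-dependent part of the Helmholtz free energy of an ideal gas, $F(V) = -N k T \ln V$ plus terms independent of $V$. The names `NkT`, `free_energy`, and `pressure` and the numerical values are hypothetical.

```julia
using ForwardDiff

# Assumed toy model: F(V) = -N*k*T*log(V) + terms independent of V.
const NkT = 2.5                  # hypothetical value of N*k*T, arbitrary units
free_energy(V) = -NkT * log(V)

# Pressure as the negative volume derivative of the free energy.
pressure(V) = -ForwardDiff.derivative(free_energy, V)

V = 4.0
pressure(V), NkT / V             # AD result next to the ideal gas law p = N*k*T/V
```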

# Some nice reads

Blog posts:

* ML in Julia: https://julialang.org/blog/2018/12/ml-language-compiler

* Nice example: https://fluxml.ai/2019/03/05/dp-vs-rl.html

* Nice interactive examples: https://fluxml.ai/experiments/

* Why Julia for ML? https://julialang.org/blog/2017/12/ml&pl

* Neural networks with differential equation layers: https://julialang.org/blog/2019/01/fluxdiffeq

* Implement Your Own Automatic Differentiation with Julia in ONE day: http://blog.rogerluo.me/2018/10/23/write-an-ad-in-one-day/

* Implement Your Own Source To Source AD in ONE day!: http://blog.rogerluo.me/2019/07/27/yassad/

Repositories:

* AD flavors, like forward and reverse mode AD: https://github.com/MikeInnes/diff-zoo (Mike is one of the smartest Julia ML heads)

Talks:

* AD is a compiler problem: https://juliacomputing.com/assets/pdf/CGO_C4ML_talk.pdf