# HW8

1. Given the binomial function $\eta(\tau, \delta) = \frac{(\tau + \delta)!}{\tau!\delta!}$, show that the following statement is true.
$$
\eta(\tau,\delta) = \sum_{k=0}^\delta \eta(\tau-1,k)
$$

*Proof.*

$$
\begin{split}
    & \eta(\tau,\delta) - \sum_{k=0}^\delta \eta(\tau-1,k) \\
    = & \frac{(\tau + \delta)!}{\tau!~\delta!} - \sum_{k=0}^\delta \frac{(\tau - 1 + k)!}{(\tau - 1)!~ k!} \\
    = & \frac{(\tau + \delta)!}{\tau!~\delta!} - \sum_{k=0}^\delta \frac{\tau~(\tau - 1 + k)!}{(\tau)!~ k!} \\
    = & \frac{1}{\tau !} \left(
            \frac{(\tau + \delta)!}{\delta!} - \frac{\tau~(\tau - 1 + \delta)!}{\delta!} - \sum_{k=0}^{\delta - 1} \frac{\tau~(\tau - 1 + k)!}{k!}
        \right)\\
    = & \frac{1}{\tau !} \left(
            \frac{(\tau + \delta - 1)!}{(\delta - 1)!} - \sum_{k=0}^{\delta - 1} \frac{\tau~(\tau - 1 + k)!}{k!}
        \right)\\
     = & \eta(\tau,\delta - 1) - \sum_{k=0}^{\delta - 1} \eta(\tau-1,k)\\
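     = & \cdots \quad \text{(applying the same reduction to } \delta - 1, \delta - 2, \ldots, 1\text{)} \\
     = & \eta(\tau,0) - \eta(\tau-1,0) = 1 - 1 = 0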
\end{split}
$$
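
As a quick sanity check of the identity (not a substitute for the proof), the sketch below evaluates both sides for small arguments. The helper names `eta` and `identity_holds` are made up for this illustration, and it assumes $\tau \ge 1$ and $\delta \ge 0$ so that every factorial is defined.

```julia
# η(τ, δ) = (τ + δ)! / (τ! δ!), written via the built-in binomial coefficient
eta(tau::Integer, delta::Integer) = binomial(tau + delta, delta)

# Compare η(τ, δ) with Σ_{k=0}^{δ} η(τ - 1, k) for one pair (τ, δ)
identity_holds(tau, delta) = eta(tau, delta) == sum(eta(tau - 1, k) for k in 0:delta)

# Check every pair on a small grid; the values stay far below Int64 overflow
check = all(identity_holds(tau, delta) for tau in 1:10, delta in 0:10)
```

If the identity holds on the whole grid, `check` is `true`.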

2. Consider the following program, which computes the $l_2$-norm of a vector $x\in \mathbb{R}^n$.

In the program, the `abs2` and `sqrt` functions can be treated as primitive functions, which means they should not be further decomposed into more elementary functions.

### Tasks
1. Rewrite the program (on paper or with code) to implement the forward mode autodiff, where you can use the notation $\dot y_i \equiv \frac{\partial y}{\partial x_i}$ to denote a derivative.
2. Rewrite the program (on paper or with code) to implement the reverse mode autodiff, where you can use the notation $\overline y \equiv \frac{\partial \mathcal{L}}{\partial y}$ to denote an adjoint, $y \rightarrow T$ to denote pushing a variable to the global stack, and $y \leftarrow T$ to denote popping a variable from the global stack. In your submission, both the forward pass and backward pass should be included.
3. Estimate how many intermediate states are cached in your reverse mode autodiff program.

In [7]:
using ForwardDiff
using Test

function poorman_norm(x::Vector{<:Real})
	nm2 = zero(real(eltype(x)))
	for i=1:length(x)
		nm2 += abs2(x[i])
	end
	ret = sqrt(nm2)
	return ret
end

poorman_norm (generic function with 1 method)

In [16]:
# task 1, function for forward autodiff

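# partial_x must be all ones on entry (one seed per input);
# on exit partial_x[i] holds ∂ret/∂x[i]. This works because each x[i] enters nm2 exactly once.
function forward_diff_norm!(x::Vector{<:Real}, partial_x::Vector{<:Real})
    nm2 = zero(real(eltype(x)))
    for i = 1:length(x)
        nm2 += abs2(x[i])
        partial_x[i] *= 2 * x[i]      # d(abs2(x[i]))/dx[i] = 2x[i] for real x[i]
    end
    ret = sqrt(nm2)
    partial_x .*= 1 / (2 * ret)       # d(sqrt(nm2))/d(nm2) = 1 / (2 * sqrt(nm2))
    return ret, partial_x
end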

forward_diff_norm! (generic function with 1 method)
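
For comparison, here is a minimal sketch of forward mode that carries one tangent direction at a time, closer to the $\dot y$ notation of the task statement. The name `norm_jvp` and its signature are hypothetical, and the sketch assumes real-valued inputs so that the derivative of `abs2(x)` is `2x`.

```julia
# Hypothetical helper (not part of the assignment): propagate a single tangent xdot
function norm_jvp(x::Vector{<:Real}, xdot::Vector{<:Real})
    nm2 = zero(eltype(x))
    nm2dot = zero(eltype(x))
    for i in eachindex(x)
        nm2 += abs2(x[i])
        nm2dot += 2 * x[i] * xdot[i]   # tangent of abs2 for real x[i]
    end
    ret = sqrt(nm2)
    retdot = nm2dot / (2 * ret)        # tangent of sqrt
    return ret, retdot
end

x = [3.0, 4.0]
ret, d1 = norm_jvp(x, [1.0, 0.0])      # seed the first input: d1 ≈ 0.6
_, d2 = norm_jvp(x, [0.0, 1.0])        # seed the second input: d2 ≈ 0.8
```

Recovering the full gradient this way takes one pass per input, which is why the function above instead updates all $n$ partial derivatives in a single pass.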

In [43]:
@testset "Forward mode autodiff test" begin
    for i in 1:100
        N = rand(1:100)
        x = rand(N)
        partial_x = [1.0 for i in 1:N]
        ret, diff_x = forward_diff_norm!(x, partial_x)
        @test ret ≈ poorman_norm(x)
        @test diff_x ≈ ForwardDiff.gradient(poorman_norm, x)        
    end
end

[0m[1mTest Summary:              | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
Forward mode autodiff test | [32m 200  [39m[36m  200  [39m[0m0.0s


Test.DefaultTestSet("Forward mode autodiff test", Any[], 200, false, false, true, 1.681744114704987e9, 1.681744114729793e9)

In [44]:
# task 2

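# this part is the forward process: cache every value the backward pass needs
function reverse_norm_forward!(x::Vector{<:Real}, stack::Vector{<:Real})
    nm2 = zero(real(eltype(x)))
    for i = 1:length(x)
        push!(stack, x[i])    # x[i] → T
        nm2 += abs2(x[i])
    end
    push!(stack, nm2)         # nm2 → T
    ret = sqrt(nm2)
    return ret, stack
end

# this part is the backward process: pop in the reverse order of the pushes.
# partial_x must be all ones on entry (it acts as the seed) and holds ∂ret/∂x on exit.
function reverse_norm_backward!(partial_x::Vector{<:Real}, stack::Vector{<:Real})
    nm2 = pop!(stack)                    # nm2 ← T
    partial_x .*= 1 / (2 * sqrt(nm2))    # adjoint of sqrt
    N = length(partial_x)
    for i in 1:N
        x = pop!(stack)                  # x[N + 1 - i] ← T
        partial_x[N + 1 - i] *= 2 * x    # adjoint of abs2 for real x
    end
    return partial_x, stack
end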

reverse_norm_backward! (generic function with 1 method)
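
Written in the adjoint notation of the task statement, the two passes above correspond roughly to the following rules. This is a sketch that assumes the seed $\overline{\mathrm{ret}} = 1$ (the all-ones `partial_x` plays that role in the code) and writes $s$ for `nm2`.

$$
\begin{split}
    \text{forward:} \quad & x_i \rightarrow T, \quad s \leftarrow s + x_i^2 \quad (i = 1, \ldots, n), \qquad s \rightarrow T, \quad \mathrm{ret} = \sqrt{s} \\
    \text{backward:} \quad & s \leftarrow T, \quad \overline{s} = \frac{\overline{\mathrm{ret}}}{2\sqrt{s}} \\
    & x_i \leftarrow T, \quad \overline{x_i} = 2 x_i \, \overline{s} \quad (i = n, \ldots, 1)
\end{split}
$$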

In [45]:
@testset "Reverse mode autodiff test" begin
    for i in 1:100
        N = rand(1:100)
        x = rand(N)
        partial_x = [1.0 for i in 1:N]
        stack = Float64[]  # concrete element type; still matches Vector{<:Real}
        ret, stack = reverse_norm_forward!(x, stack)
        partial_x, stack = reverse_norm_backward!(partial_x, stack)
        @test ret ≈ poorman_norm(x)
        @test partial_x ≈ ForwardDiff.gradient(poorman_norm, x)        
    end
end

[0m[1mTest Summary:              | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
Reverse mode autodiff test | [32m 200  [39m[36m  200  [39m[0m0.1s


Test.DefaultTestSet("Reverse mode autodiff test", Any[], 200, false, false, true, 1.681744116112714e9, 1.681744116180288e9)

In my reverse-mode program, $N + 1$ values are cached on the stack when $x$ is an $N$-dimensional vector: the $N$ entries $x_i$ and the accumulated sum `nm2`.
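
The count can be checked with a small sketch that repeats only the pushes of the forward pass and reports the stack length. The function name `cached_count` is hypothetical, and the stack is assumed to start empty.

```julia
# Hypothetical restatement of the forward pass, reduced to what it caches
function cached_count(x::Vector{Float64})
    stack = Float64[]
    nm2 = 0.0
    for xi in x
        push!(stack, xi)    # one cached value per input entry
        nm2 += abs2(xi)
    end
    push!(stack, nm2)       # plus the accumulated sum
    return length(stack)
end

n_cached = cached_count(rand(7))   # 7 inputs plus nm2
```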