Automatic differentiation has two modes:
- Forward mode: use when the function is $R \to R^{N}$ (few inputs, many outputs)
- Reverse mode: use when the function is $R^{N} \to R$ (many inputs, one output), as with the scalar loss differentiated below
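
To make the cost argument concrete, here is a minimal sketch of forward mode built on hand-rolled dual numbers. The `Dual` type, the example function `f`, and `forward_gradient` are illustrative names, not part of Zygote or any other package used in this notebook. The point is that forward mode needs one pass per input coordinate, which is why it is a poor fit for a function $R^{N} \to R$ with large $N$.

```julia
# Minimal dual number: a value plus the derivative carried alongside it.
struct Dual
    val::Float64
    der::Float64
end

# Sum rule and product rule, which is all the example function needs.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

# Example function R^3 -> R (hypothetical, chosen only for illustration).
f(x) = x[1] * x[2] + x[2] * x[3]

# One forward pass per input: seed der = 1 for coordinate k, 0 elsewhere.
function forward_gradient(f, x::Vector{Float64})
    map(eachindex(x)) do k
        seeded = [Dual(x[j], j == k ? 1.0 : 0.0) for j in eachindex(x)]
        f(seeded).der
    end
end

g = forward_gradient(f, [1.0, 2.0, 3.0])
```

Here `f` is evaluated three times to obtain three partial derivatives. A reverse-mode tool such as Zygote obtains the whole gradient of a scalar function from a single forward and backward sweep.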

In [143]:
using CSV, DataFrames, RDatasets, DrWatson
using Flux, Optim
using StatsBase
using Zygote
using LossFunctions

In [138]:
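datapath = datadir("exp_raw");
advertising = CSV.File(joinpath(datapath, "Advertising.csv")) |> DataFrame
advertising = advertising[!,Not(:Column1)]  # drop the unnamed first column (read in as Column1)
n,m = size(advertising)
first(advertising,5)
# Design matrix: intercept column plus z-scored predictors, first 10 rows only
X = [ones(n) standardize(ZScoreTransform,Matrix(advertising[:,Not(:Sales)]), dims=1) ][1:10,:]
# Response: z-scored Sales, first 10 observations only
y = standardize(ZScoreTransform,advertising.Sales)[1:10];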

In [191]:
ŷ(X,β) = X * β

ŷ (generic function with 1 method)

In [145]:
loss = L2DistLoss()
# Despite the name, this is the sum (not the mean) of squared errors over all rows of X
mse(β) = sum(value(loss,ŷ(X,β),y))

mse (generic function with 1 method)

In [172]:
# mse_i(β) returns a closure that maps i to the squared error of observation i
mse_i(β) = (i,) -> sum(value(loss,ŷ(X[i,:]',β),y[i]))

mse_i (generic function with 1 method)

In [173]:
mse_i(β)(1)

1.5906124836029452

## Checking the automatic-differentiation gradient against the analytical formula

In [175]:
gradient(mse,β)[1]

4-element Vector{Float64}:
 18.553342921581297
 -5.438873520719471
  9.32915138412945
 24.629807735784837

In [193]:
-2*X'*y .+ 2*X'*ŷ(X,β)

4-element Vector{Float64}:
 18.5533429215813
 -5.438873520719472
  9.32915138412945
 24.629807735784834
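
The closed-form expression evaluated above can be derived in two lines, assuming the loss is the sum of squared residuals (which is what `mse` computes with `L2DistLoss`). Writing $L(\beta)$ for that loss:

$$
L(\beta) = \lVert y - X\beta \rVert^{2} = y^{\top}y - 2\,\beta^{\top}X^{\top}y + \beta^{\top}X^{\top}X\beta
$$

$$
\nabla_{\beta} L(\beta) = -2\,X^{\top}y + 2\,X^{\top}X\beta
$$

The second line is the expression `-2*X'*y .+ 2*X'*ŷ(X,β)` with `ŷ(X,β) = X * β`, so agreement with `gradient(mse,β)[1]` up to floating-point rounding is the expected result.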

----

## Analytical gradient at each data point

In [201]:
# Gradient of the squared error at observation i, returned as a 1×4 row
g_(i) = 2 * X[i,:]' * ŷ(X[i,:]', β) .- 2 * X[i,:]' * y[i]

g_ (generic function with 1 method)
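
For reference, `g_` is the single-observation version of the full gradient. Writing $x_i$ for the $i$-th row of $X$ as a column vector (notation introduced here for the derivation only), the squared error of observation $i$ and its gradient are:

$$
\ell_i(\beta) = \left(x_i^{\top}\beta - y_i\right)^{2}
$$

$$
\nabla_{\beta}\,\ell_i(\beta) = 2\,x_i\left(x_i^{\top}\beta - y_i\right) = 2\,x_i x_i^{\top}\beta - 2\,x_i y_i
$$

Summing over $i$ gives $2X^{\top}X\beta - 2X^{\top}y$, the gradient of the summed loss.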

In [213]:
n = 10  # X and y hold only the first 10 observations

10

In [214]:
G = Matrix{Float64}(undef,n,m);  # m is the column count of `advertising`, which here equals size(X, 2)
for i in 1:n
    G[i,:] = g_(i)
end

In [215]:
G

10×4 Matrix{Float64}:
  2.52239    2.44022     2.46958     4.47596
  1.43215   -1.71053     1.54686     0.956537
  2.80464   -4.24162     4.27605     4.98968
  1.53231    0.0795567   1.86146     1.96624
  4.08799    1.60738    -3.4319      5.22688
  3.79889   -6.12139     6.55954     7.7528
 -0.20828    0.217228   -0.133777    0.067461
 -0.201799   0.063093    0.0498015   0.175627
 -0.231596   0.373455    0.330138    0.31428
  3.01665    1.85374    -4.19862    -1.29566

## Automatic differentiation at each data point

In [218]:
G2 = Matrix{Float64}(undef,n,m);
for i in 1:n
    G2[i,:] = gradient(x->mse_i(x)(i),β)[1]
end

In [219]:
G2

10×4 Matrix{Float64}:
  2.52239    2.44022     2.46958     4.47596
  1.43215   -1.71053     1.54686     0.956537
  2.80464   -4.24162     4.27605     4.98968
  1.53231    0.0795567   1.86146     1.96624
  4.08799    1.60738    -3.4319      5.22688
  3.79889   -6.12139     6.55954     7.7528
 -0.20828    0.217228   -0.133777    0.067461
 -0.201799   0.063093    0.0498015   0.175627
 -0.231596   0.373455    0.330138    0.31428
  3.01665    1.85374    -4.19862    -1.29566
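
The two printed matrices agree entry for entry. One more consistency check is that the per-observation gradients sum to the gradient of the summed loss. The sketch below shows that check on small synthetic data so that it runs on its own. The values of `X`, `y` and `β` are made-up stand-ins for the notebook's variables, and the gradients use the analytical formula rather than Zygote.

```julia
# Synthetic stand-ins for the notebook's X, y and β (hypothetical values).
X = [1.0  0.5 -1.2;
     1.0 -0.3  0.8;
     1.0  1.1  0.4;
     1.0 -0.9 -0.6]
y = [0.7, -0.2, 1.5, -1.0]
β = [0.1, -0.4, 0.9]

# Gradient of the summed squared error over all rows.
full_grad = -2 * X' * y .+ 2 * X' * (X * β)

# Per-observation gradients, one row per data point.
G = similar(X)
for i in axes(X, 1)
    xi = X[i, :]
    G[i, :] = 2 .* xi .* (xi' * β - y[i])
end

# Summing the rows should recover the full gradient up to rounding.
rows_sum_to_full = vec(sum(G, dims=1)) ≈ full_grad
```

In the notebook session itself, the analogous comparisons would be `G ≈ G2` and `vec(sum(G2, dims=1)) ≈ gradient(mse,β)[1]`, using `≈` rather than `==` because the two routes can differ in the last floating-point digits.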