# Intro
Testing my AD workflow on the Rosenbrock function.
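Define rosen and simple hand-written AD code derived from it: rosend returns the function value and one directional derivative, and rosendd additionally returns a second directional derivative.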

In [43]:
import Pkg; Pkg.add("CUTEst")
using CUTEst

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\Struther\.julia\environments\v1.6\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\Struther\.julia\environments\v1.6\Manifest.toml`


In [3]:
function rosen(x)
    fVal = 0.0
    for i in 1:length(x)-1
        fVal += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2
    end
    return fVal
end

# Value of rosen and its directional derivative along dx
function rosend(x, dx)
    fVal = df = 0.0; 
    for i in 1:length(x)-1
       (fVal  += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2;
        df    += 200.0*(x[i+1] - x[i]^2)*(dx[i+1] -2.0*x[i]*dx[i] ) - 2.0*(1.0 - x[i])*dx[i])
    end
    return (fVal,df)
end

# Value, directional derivative along dx1, and second directional derivative dx2' * H(x) * dx1
function rosendd(x, dx1, dx2)
    fVal = df = ddf = 0.0;
    for i in 1:length(x)-1
       (fVal  += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2;
        df    += 200.0*(x[i+1] - x[i]^2)*(dx1[i+1] -2.0*x[i]*dx1[i] ) - 2.0*(1.0 - x[i])*dx1[i];
        ddf   += 
            200.0*(dx2[i+1] - 2.0*x[i]*dx2[i])*(dx1[i+1] -2.0*x[i]*dx1[i]) 
          + 200.0*(x[i+1] - x[i]^2)*(0.0 -2.0*dx2[i]*dx1[i])  + 2.0*dx2[i]*dx1[i] )
    end
    return (fVal,df,ddf)
end;
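For reference, here is what the three functions compute, written out for $x \in \mathbb{R}^n$ and directions $d^{(1)}, d^{(2)}$ (our notation, not the original's; dx, dx1 play $d^{(1)}$ and dx2 plays $d^{(2)}$). rosen evaluates $f$, df is the first directional derivative, and ddf in rosendd is the second directional derivative:

$$f(x) = \sum_{i=1}^{n-1}\Big[100\,(x_{i+1}-x_i^2)^2 + (1-x_i)^2\Big]$$

$$\nabla f(x)^\top d^{(1)} = \sum_{i=1}^{n-1}\Big[200\,(x_{i+1}-x_i^2)\,(d^{(1)}_{i+1}-2x_i d^{(1)}_i) - 2\,(1-x_i)\,d^{(1)}_i\Big]$$

$$d^{(2)\top}\,\nabla^2 f(x)\,d^{(1)} = \sum_{i=1}^{n-1}\Big[200\,(d^{(2)}_{i+1}-2x_i d^{(2)}_i)(d^{(1)}_{i+1}-2x_i d^{(1)}_i) - 400\,(x_{i+1}-x_i^2)\,d^{(2)}_i d^{(1)}_i + 2\,d^{(2)}_i d^{(1)}_i\Big]$$

The third line is the term-by-term derivative of the second along $d^{(2)}$, which is how the ddf line of rosendd is assembled.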

Check the hand-written AD against ForwardDiff.

In [4]:
using ForwardDiff, SparseDiffTools, LinearAlgebra
g = x -> ForwardDiff.gradient(rosen, x)
H = x -> ForwardDiff.hessian(rosen, x)
n=12;
x = rand(n)
dx1 = rand(n)
dx2 = rand(n)
(fVal,df,ddf) = rosendd(x, dx1, dx2)
norm(map(norm,[df-g(x)'*dx1, ddf - dx2'*H(x)*dx1]))

4.0194366942304644e-14
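A package-free cross-check is also possible: compare the hand-coded directional derivative with a central finite difference along the same direction. This is a minimal sketch; the helper central_diff, the fixed step h = 1e-6 and the sample x, d are our own choices for illustration.

```julia
function rosen(x)
    fVal = 0.0
    for i in 1:length(x)-1
        fVal += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2
    end
    return fVal
end

# Hand-coded value and directional derivative along dx (same formula as above)
function rosend(x, dx)
    fVal = df = 0.0
    for i in 1:length(x)-1
        fVal += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2
        df   += 200.0*(x[i+1] - x[i]^2)*(dx[i+1] - 2.0*x[i]*dx[i]) - 2.0*(1.0 - x[i])*dx[i]
    end
    return (fVal, df)
end

# Hypothetical helper: central difference of f along d, (f(x + h d) - f(x - h d)) / (2h)
central_diff(f, x, d; h = 1e-6) = (f(x .+ h .* d) - f(x .- h .* d)) / (2h)

x = [0.3, 0.7, 1.1, 0.5]
d = [1.0, -0.5, 0.25, 2.0]
fVal, df = rosend(x, d)
fd_est = central_diff(rosen, x, d)
println((df, fd_est, abs(df - fd_est)))   # the difference should be tiny
```

Unlike the ForwardDiff comparison, this check only agrees to finite-difference accuracy, so it is a sanity test rather than a precision test.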

Use ForwardDiff (and friends) in a couple of ways to compute the complete gradient and Hessian. These are all functions of x: g constructs a vector, H constructs a matrix, and J constructs a matrix-free operator for the Hessian that can be used just like a matrix. The SparseDiffTools utilities can also detect sparsity patterns and preallocate caches.
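To illustrate what "matrix-free operator that can be used like a matrix" means, here is a small package-free sketch. RosenHessOp is a hypothetical type: it stores only x and applies the Hessian through `*` and `mul!`, using the tridiagonal Hessian of this specific function (derived by hand, not by any library).

```julia
using LinearAlgebra

# Hypothetical matrix-free Hessian operator for the Rosenbrock function:
# it stores x and never builds the n-by-n matrix.
struct RosenHessOp
    x::Vector{Float64}
end

Base.size(H::RosenHessOp) = (length(H.x), length(H.x))

# In-place product y = H(x) * v, accumulating each term's 2x2 Hessian block
function LinearAlgebra.mul!(y::AbstractVector, H::RosenHessOp, v::AbstractVector)
    x = H.x
    fill!(y, 0.0)
    for i in 1:length(x)-1
        # Term i is 100(x[i+1] - x[i]^2)^2 + (1 - x[i])^2
        y[i]   += (1200.0*x[i]^2 - 400.0*x[i+1] + 2.0)*v[i] - 400.0*x[i]*v[i+1]
        y[i+1] += -400.0*x[i]*v[i] + 200.0*v[i+1]
    end
    return y
end

# Allocating product, so the operator reads like a matrix: Hop * v
Base.:*(H::RosenHessOp, v::AbstractVector) = mul!(similar(v, Float64), H, v)

Hop = RosenHessOp([1.0, 1.0])
Hop * [1.0, 0.0]   # first column of the Hessian at the minimizer (1, 1): [802.0, -400.0]
```

The point of the design is memory: only x is stored, and `mul!` lets the caller reuse an output buffer.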

In [6]:
using ForwardDiff, SparseDiffTools
@time g = x -> ForwardDiff.gradient(rosen, x)
@time H = x -> ForwardDiff.hessian(rosen, x)

  0.000019 seconds (25 allocations: 1.602 KiB)
  0.000014 seconds (25 allocations: 1.602 KiB)


#19 (generic function with 1 method)

Test the implementation using ForwardDiff's underlying Dual type. Seeding each input with several partials gives the first derivative along multiple directions in a single pass.
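What `Dual{1}.(x, v1, v2)` does can be mimicked with a toy dual number carrying N directional derivatives at once. MyDual is a hypothetical type supporting only the operations rosen needs; ForwardDiff.Dual is the real, far more complete version.

```julia
# Toy forward-mode dual number: a value plus N directional derivatives.
struct MyDual{N}
    value::Float64
    partials::NTuple{N,Float64}
end

import Base: +, -, *, ^

+(a::MyDual{N}, b::MyDual{N}) where {N} = MyDual(a.value + b.value, a.partials .+ b.partials)
+(a::Real, b::MyDual) = MyDual(a + b.value, b.partials)
-(a::MyDual{N}, b::MyDual{N}) where {N} = MyDual(a.value - b.value, a.partials .- b.partials)
-(a::Real, b::MyDual) = MyDual(a - b.value, map(-, b.partials))
# Product rule, applied to every direction at once
*(a::MyDual{N}, b::MyDual{N}) where {N} =
    MyDual(a.value * b.value, a.value .* b.partials .+ b.value .* a.partials)
*(a::Real, b::MyDual) = MyDual(a * b.value, a .* b.partials)
# Power rule: d(a^n) = n a^(n-1) da
^(a::MyDual, n::Integer) = MyDual(a.value^n, (n * a.value^(n - 1)) .* a.partials)

function rosen(x)
    fVal = 0.0
    for i in 1:length(x)-1
        fVal += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2
    end
    return fVal
end

x  = [0.56, 0.17, 0.46, 0.16]
v1 = [0.75, 0.91, 0.57, 0.36]
v2 = [0.69, 0.54, 0.62, 0.72]
# Seed each input with two tangent directions, analogous to Dual{1}.(x, v1, v2)
xdv = MyDual.(x, tuple.(v1, v2))
fd = rosen(xdv)
(fd.value - rosen(x), fd.partials)   # partials approximate (grad f' * v1, grad f' * v2)
```

One evaluation of rosen propagates both directions, which is the same trade ForwardDiff makes with its chunked partials.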

In [2]:
    ForwardDiff.Dual{1}

ForwardDiff.Dual{1, V, N} where {V, N}

In [7]:
using ForwardDiff: Dual, Partials, value, partials, gradient
DualTag1=1;
n=4;
x= rand(n); v1= rand(n); v2= rand(n);
xdv = Dual{DualTag1}.(x, v1, v2)

4-element Vector{Dual{1, Float64, 2}}:
 Dual{1}(0.5578534170156384,0.7546176671878859,0.6938672002974051)
 Dual{1}(0.16636054712892157,0.9061424012046599,0.5367495896712386)
 Dual{1}(0.45685682941551375,0.5651249560319593,0.615826918163086)
 Dual{1}(0.16132800640113865,0.3634499572348875,0.7202403333924938)

In [8]:
fd = rosen(xdv);
(fd.value-rosen(x), fd.partials, (g(x)'*v1, g(x)'*v2))

(0.0, Partials(19.426474733632293, 40.73728361100226), (19.426474733632293, 40.73728361100227))

In [9]:
typeof( fd.partials)

Partials{2, Float64}

In [18]:
[x , v1, v2]

3-element Vector{Vector{Float64}}:
 [0.5578534170156384, 0.16636054712892157, 0.45685682941551375, 0.16132800640113865]
 [0.7546176671878859, 0.9061424012046599, 0.5651249560319593, 0.3634499572348875]
 [0.6938672002974051, 0.5367495896712386, 0.615826918163086, 0.7202403333924938]

In [16]:
[x  v1 v2]

4×3 Matrix{Float64}:
 0.557853  0.754618  0.693867
 0.166361  0.906142  0.53675
 0.456857  0.565125  0.615827
 0.161328  0.36345   0.72024

In [13]:
[x; v1; v2]

12-element Vector{Float64}:
 0.5578534170156384
 0.16636054712892157
 0.45685682941551375
 0.16132800640113865
 0.7546176671878859
 0.9061424012046599
 0.5651249560319593
 0.3634499572348875
 0.6938672002974051
 0.5367495896712386
 0.615826918163086
 0.7202403333924938

In [11]:
xdv

4-element Vector{Dual{1, Float64, 2}}:
 Dual{1}(0.5578534170156384,0.7546176671878859,0.6938672002974051)
 Dual{1}(0.16636054712892157,0.9061424012046599,0.5367495896712386)
 Dual{1}(0.45685682941551375,0.5651249560319593,0.615826918163086)
 Dual{1}(0.16132800640113865,0.3634499572348875,0.7202403333924938)

Let's try a second derivative by nesting Duals.

In [14]:
using ForwardDiff: Dual, Partials, value, partials, gradient
n=6;
x= rand(n); v1= rand(n); v2= rand(n); v3 = rand(n);
u1= rand(n); u2= rand(n); 
xdv = Dual{2}.(Dual{1}.(x, v1, v2, v3),u1,u2)
fd=rosen(xdv)
(fd.value.value -rosen(x) )

0.0

The outer partials field holds two things: the value of each entry is the gradient dotted with one of the u directions, and its own partials are the Hessian entries u_j' H v_k. This nesting is confusing but resolvable.
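The layout follows from expanding $f$ on the nested seed. Write the inner (tag 1) perturbations as $\varepsilon_k$ along $v_k$ and the outer (tag 2) ones as $\delta_j$ along $u_j$, with $\varepsilon_k\varepsilon_l = 0$ and $\delta_j\delta_l = 0$ but $\varepsilon_k\delta_j \neq 0$. Every term of order three or higher then contains two perturbations of the same kind and vanishes, so the expansion is exact at second order:

$$f\Big(x + \sum_{k}\varepsilon_k v_k + \sum_{j}\delta_j u_j\Big) = f(x) + \sum_{k}\varepsilon_k\,\nabla f(x)^\top v_k + \sum_{j}\delta_j\Big(\nabla f(x)^\top u_j + \sum_{k}\varepsilon_k\, u_j^\top \nabla^2 f(x)\, v_k\Big)$$

Reading off coefficients: the outer value is the inner Dual $(f,\ \nabla f^\top v_1, \dots)$, and the $j$-th outer partial is the inner Dual $(\nabla f^\top u_j,\ u_j^\top \nabla^2 f\, v_1, \dots)$. That is why fdp[j].value is compared with g(x)'*u_j and fdp[j].partials[k] with u_j'*H(x)*v_k.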

In [15]:
using LinearAlgebra
fdp = fd.partials
(norm([fdp[1].value,fdp[2].value]-[g(x)'*u1,g(x)'*u2]),
)

(1.4210854715202004e-14,)

Somehow the Partials type disappears when you put them into an array!
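One likely reason: ForwardDiff's Partials is a subtype of AbstractVector, and concatenating AbstractVectors builds a fresh array of their element type instead of keeping the wrapper. The sketch below shows the same effect with TupleVec, a hypothetical stand-in type (not ForwardDiff code).

```julia
# Hypothetical stand-in for a tuple-backed vector type such as Partials
struct TupleVec{N} <: AbstractVector{Float64}
    data::NTuple{N,Float64}
end
Base.size(::TupleVec{N}) where {N} = (N,)
Base.getindex(v::TupleVec, i::Int) = v.data[i]

a = TupleVec((1.0, 2.0, 3.0))
b = TupleVec((4.0, 5.0, 6.0))
M = [a b]      # hcat copies the elements into a new array
typeof(M)      # Matrix{Float64}: the wrapper type is gone
```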

In [16]:
[fdp[1].partials  fdp[2].partials]

3×2 Matrix{Float64}:
  173.942     93.1415
   47.7124    69.5203
 -130.386   -121.599

In [17]:
using LinearAlgebra
fdp = fd.partials
norm([fdp[1].partials  fdp[2].partials]-[
    u1'*H(x)*v1 u2'*H(x)*v1;
    u1'*H(x)*v2 u2'*H(x)*v2;
    u1'*H(x)*v3 u2'*H(x)*v3
])

7.78360568894479e-14

In [33]:
using ForwardDiff: Dual, Partials, value, partials, gradient
n=465;
x= rand(n); v1= rand(n); v2= rand(n); v3 = rand(n); v4 = rand(n);
u1= rand(n); u2= rand(n); 
@time xdv = Dual{2}.(Dual{1}.(x, v1, v2, v3, v4),u1,u2)
@time fd = rosen(xdv)

  0.000453 seconds (64 allocations: 58.312 KiB)
  0.000013 seconds (1 allocation: 128 bytes)


Dual{2}(Dual{1}(9777.79462146386,7254.3724449877745,7631.306923585963,8558.02922094281,6987.698020081267),Dual{1}(6509.497911330822,-84.37677080974893,-1118.094415212752,-593.8841566220807,1149.925659777402),Dual{1}(8278.866783019565,777.5752239743123,-252.0972441865449,3691.9344578566947,-649.8305653868))

Not sure whether this is for a good reason, but the double-dual array is padded to be sort of square: each outer partial (the u1 entry) is itself a Dual{47} whose own partials are zeros.
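The padding is consistent with a type constraint: ForwardDiff.Dual{T,V,N} stores its value as type V and its partials with that same V, so when the value is a Dual{47}, each outer partial must be a Dual{47} too, and a plain u1 entry gets zero inner partials. The toy model below (Inner, Outer and lift are hypothetical names) reproduces this.

```julia
# Toy model of the constraint: the partials must share the value's type V.
struct Inner{K}
    value::Float64
    partials::NTuple{K,Float64}
end

struct Outer{V,N}
    value::V
    partials::NTuple{N,V}   # same type V as the value
end

# Promote a plain Float64 perturbation to the inner type by adding K zero partials
lift(u::Float64, ::Val{K}) where {K} = Inner{K}(u, ntuple(_ -> 0.0, K))

x, v1, v2, u1 = 0.37, 0.83, 0.77, 0.44
inner = Inner{2}(x, (v1, v2))
outer = Outer(inner, (lift(u1, Val(2)),))
# Outer(inner, (u1,)) would be a MethodError: a Float64 partial cannot sit next to an Inner{2} value.
outer.partials[1]   # Inner{2}(0.44, (0.0, 0.0)): the zero "padding"
```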

In [35]:
using ForwardDiff: Dual, Partials, value, partials, gradient
n=3;
x= rand(n); v1= rand(n); v2= rand(n);
u1 = rand(n); 
@time xdv = Dual{13}.(Dual{47}.(x, v1, v2), u1)

  0.000036 seconds (5 allocations: 416 bytes)


3-element Vector{Dual{13, Dual{47, Float64, 2}, 1}}:
 Dual{13}(Dual{47}(0.37090676720479165,0.8292028429908826,0.7689828971624058),Dual{47}(0.44010535548695673,0.0,0.0))
 Dual{13}(Dual{47}(0.5272239549338538,0.017399310337576912,0.4192021550008953),Dual{47}(0.5596254923765875,0.0,0.0))
 Dual{13}(Dual{47}(0.40437868656116893,0.6479937211448588,0.32023258473282734),Dual{47}(0.6501770808386358,0.0,0.0))

In [36]:
v1

3-element Vector{Float64}:
 0.8292028429908826
 0.017399310337576912
 0.6479937211448588

In [38]:
@time fd = rosen(xdv)

  0.000004 seconds (1 allocation: 64 bytes)


Dual{13}(Dual{47}(17.40019285483623,-31.720706439642186,-16.229370237327856),Dual{47}(18.605555607166945,-76.92771703949657,-71.98073114361759))

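Testing a bit: compare a Hessian-vector product from the full Hessian H(x) with one from a matrix-free operator J1.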

In [39]:
using LinearAlgebra, SparseDiffTools
n=123
x = rand(n)
dx= rand(n)
Hdx=similar(x)
J1dx=similar(x)
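J1 = x -> HesVec(rosen, x)   # assumption: SparseDiffTools' matrix-free Hessian operator (J1 was undefined when this cell ran)
@time JMat1 = J1(x)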
@time HMat=H(x)
@time mul!(Hdx,HMat,dx) # pre-assigned output matrix multiplication gives about 10x speed up
@time J1dx = JMat1*dx   
# @time mul!(J1dx,JMat1,dx) # J1 does not work with pre-assigned output. 
norm(Hdx - J1dx)

LoadError: UndefVarError: J1 not defined

Trying the Hessian-vector product tools in SparseDiffTools. First, a whole bunch of variants:

In [40]:
using SparseDiffTools, LinearAlgebra
n=123
x=rand(n)
dx=rand(n)
@time J2dx = H(x)*dx
@time J3dx = auto_jacvec(g,x,dx)
@time J4dx = autonum_hesvec(rosen,x,dx)
@time J5dx = numauto_hesvec(rosen,x,dx)
@time J6dx = num_hesvec(rosen,x,dx)
@time J7dx = num_hesvecgrad(g,x,dx)
@time J8dx = auto_hesvecgrad(g,x,dx)
(norm(J2dx-J3dx),norm(J2dx-J4dx),norm(J2dx-J5dx),norm(J2dx-J6dx),
norm(J2dx-J2dx),norm(J2dx-J7dx) )./norm(J2dx)

  0.410434 seconds (1.27 M allocations: 67.210 MiB, 3.90% gc time, 98.93% compilation time)
  0.932944 seconds (2.87 M allocations: 164.002 MiB, 5.58% gc time, 99.93% compilation time)
  0.428926 seconds (1.60 M allocations: 96.965 MiB, 6.12% gc time, 99.95% compilation time)
  0.312750 seconds (799.15 k allocations: 46.177 MiB, 5.59% gc time, 99.92% compilation time)
  0.282994 seconds (947.54 k allocations: 59.350 MiB, 10.66% gc time, 99.91% compilation time)
  0.009052 seconds (14.37 k allocations: 1003.525 KiB, 99.06% compilation time)
  0.066195 seconds (132.43 k allocations: 7.504 MiB, 99.61% compilation time)


(1.5812629337405462e-16, 8.625070419843983e-11, 7.160759670891936e-10, 0.0008724719083415524, 0.0, 7.160759670891936e-10)
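In the run above, num_hesvec is several orders of magnitude less accurate than the other variants. That is consistent with the usual finite-difference trade-off: presumably it differences twice using only values of rosen, while the others difference at most once. The sketch below is our own illustration of that effect, not SparseDiffTools' implementation; rosen_grad, rosen_hv, hv_grad_fd and hv_value_fd are hypothetical helpers, and the step sizes cbrt(eps) and eps^(1/4) are conventional choices.

```julia
using LinearAlgebra

function rosen(x)
    fVal = 0.0
    for i in 1:length(x)-1
        fVal += 100.0*(x[i+1] - x[i]^2)^2 + (1.0-x[i])^2
    end
    return fVal
end

# Exact gradient of rosen (hand-derived)
function rosen_grad(x)
    g = zeros(length(x))
    for i in 1:length(x)-1
        a = x[i+1] - x[i]^2
        g[i]   += -400.0*x[i]*a - 2.0*(1.0 - x[i])
        g[i+1] += 200.0*a
    end
    return g
end

# Exact Hessian-vector product from the tridiagonal Hessian
function rosen_hv(x, v)
    y = zeros(length(x))
    for i in 1:length(x)-1
        y[i]   += (1200.0*x[i]^2 - 400.0*x[i+1] + 2.0)*v[i] - 400.0*x[i]*v[i+1]
        y[i+1] += -400.0*x[i]*v[i] + 200.0*v[i+1]
    end
    return y
end

# One level of differencing: central difference of the exact gradient along v
hv_grad_fd(x, v; h = cbrt(eps())) = (rosen_grad(x .+ h .* v) .- rosen_grad(x .- h .* v)) ./ (2h)

# Two levels of differencing: only function values, four-point mixed difference per entry
function hv_value_fd(x, v; h = eps()^(1/4))
    n = length(x)
    y = zeros(n)
    for i in 1:n
        e = zeros(n); e[i] = 1.0
        y[i] = (rosen(x .+ h .* v .+ h .* e) - rosen(x .+ h .* v .- h .* e) -
                rosen(x .- h .* v .+ h .* e) + rosen(x .- h .* v .- h .* e)) / (4h^2)
    end
    return y
end

x = collect(range(0.1, 0.9; length = 8))
v = collect(range(1.0, -1.0; length = 8))
exact = rosen_hv(x, v)
err_grad  = norm(hv_grad_fd(x, v) - exact) / norm(exact)
err_value = norm(hv_value_fd(x, v) - exact) / norm(exact)
(err_grad, err_value)   # differencing twice loses several digits
```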

Trying to do the appropriate caching on the more efficient ones.

In [None]:
using SparseDiffTools, LinearAlgebra, ForwardDiff
n=323
x=rand(n)
dx=rand(n)
H0dx = H(x)*dx;
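HNumCdx = similar(x)    # output buffer for the cached num_hesvec! call
HNumNCdx = similar(x)   # output buffer for the uncached num_hesvec! call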
cache1 = similar(dx)    
cache2 = similar(dx)
cache3 = similar(dx)
println("num hesvec -cache vs no cache ")
@time num_hesvec!(HNumCdx, rosen, x, dx,
    cache1,cache2, cache3)
@time num_hesvec!(HNumNCdx, rosen,x,dx)
println((norm(H0dx-HNumCdx), norm(H0dx-HNumNCdx),norm(HNumCdx-HNumNCdx))./norm(H0dx))
#
println("numauto hesvec -cache vs no cache ")
cache = ForwardDiff.GradientConfig(rosen,dx)
HAutoCdx = similar(dx)
HAutoNCdx = similar(dx)
@time numauto_hesvec!(HAutoCdx,rosen,x,dx,
                 cache, cache2, cache3)
@time numauto_hesvec!(HAutoNCdx, rosen,x,dx)
println((norm(H0dx-HAutoCdx), norm(H0dx-HAutoNCdx),norm(HAutoCdx-HAutoNCdx))./norm(H0dx))
#

using ForwardDiff: Partials, Dual
println("autonum hesvec -cache vs no cache ")
HAutoCdx = similar(dx)
HAutoNCdx = similar(dx)
DVTag = SparseDiffTools.DeivVecTag   # assumed: the tag type autonum_hesvec! uses for its default caches
cache1 = similar(dx)
cache2 = ForwardDiff.Dual{DVTag}.(x, dx)
cache3 = ForwardDiff.Dual{DVTag}.(x, dx)
@time autonum_hesvec!(HAutoCdx,rosen,x,dx,   cache1, cache2, cache3)
@time autonum_hesvec!(HAutoNCdx, rosen,x,dx)
(norm(H0dx-HAutoCdx), norm(H0dx-HAutoNCdx),norm(HAutoCdx-HAutoNCdx))./norm(H0dx)
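The caching pattern these `!` variants follow can be sketched without the package: the caller allocates work buffers once and passes them in, so repeated Hessian-vector products reuse memory. fd_hesvec!, fd_hesvec and rosen_grad! are hypothetical names, and the scheme (central difference of an exact gradient) is chosen for illustration; it is not SparseDiffTools' code.

```julia
# In-place exact gradient of rosen (hand-derived); overwrites g
function rosen_grad!(g, x)
    fill!(g, 0.0)
    for i in 1:length(x)-1
        a = x[i+1] - x[i]^2
        g[i]   += -400.0*x[i]*a - 2.0*(1.0 - x[i])
        g[i+1] += 200.0*a
    end
    return g
end

# Cached Hessian-vector product: xp, gp, gm are caller-owned work buffers
function fd_hesvec!(dy, grad!, x, v, xp, gp, gm; h = cbrt(eps()))
    @. xp = x + h*v
    grad!(gp, xp)
    @. xp = x - h*v
    grad!(gm, xp)
    @. dy = (gp - gm) / (2h)
    return dy
end

# Uncached wrapper: allocates four fresh arrays (dy, xp, gp, gm) on every call
fd_hesvec(grad!, x, v) = fd_hesvec!(similar(x), grad!, x, v, similar(x), similar(x), similar(x))

n = 323
x = collect(range(-1.0, 1.0; length = n))
v = ones(n)
dy = similar(x); xp = similar(x); gp = similar(x); gm = similar(x)
fd_hesvec!(dy, rosen_grad!, x, v, xp, gp, gm)         # warm-up (compilation)
@time fd_hesvec!(dy, rosen_grad!, x, v, xp, gp, gm)   # cached: reuses the buffers
@time fd_hesvec(rosen_grad!, x, v)                    # uncached: new buffers each call
```

Both calls do the same arithmetic, so they return identical results; only the allocation behavior differs.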

In [None]:
@time J2dx = H(x)*dx
@time J3dx = auto_jacvec(g,x,dx)
@time J5dx = numauto_hesvec(rosen,x,dx)
@time J7dx = num_hesvecgrad(g,x,dx)
@time J8dx = auto_hesvecgrad(g,x,dx)
(norm(J2dx-J3dx),norm(J2dx-J5dx),
norm(J2dx-J8dx),norm(J2dx-J7dx) )./norm(J2dx)

In [None]:
# Reference signatures with their default cache arguments (dy, f, x, v are placeholders; not meant to run as-is)
# num_hesvec!(dy,f,x,v,
#              cache1 = similar(v),
#              cache2 = similar(v),
#              cache3 = similar(v))

# num_hesvec(f,x,v)

# numauto_hesvec!(dy,f,x,v,
#                  cache = ForwardDiff.GradientConfig(f,v),
#                  cache1 = similar(v),
#                  cache2 = similar(v))

# numauto_hesvec(f,x,v)

# autonum_hesvec!(dy,f,x,v,
#                  cache1 = similar(v),
#                  cache2 = ForwardDiff.Dual{DeivVecTag}.(x, v),
#                  cache3 = ForwardDiff.Dual{DeivVecTag}.(x, v))
