# Testing auto-differentiation of 'predict' method with Zygote.jl

### Model training

In [1]:
using Flux
using Flux: gradient
using LaplaceRedux
using LinearAlgebra
using Plots
using Statistics
using Zygote

In [2]:
xs, ys = LaplaceRedux.Data.toy_data_non_linear(200)
X = hcat(xs...) # bring into tabular format
data = zip(xs,ys)

zip([[1.9764911120630078, 3.3717046727909303], [1.5332334806460381, 2.2662470458201316], [4.725243619579621, 4.204196347650017], [2.0083142882646943, 2.296103599511355], [1.958251319885264, 1.4851211060165728], [0.5619834594606763, 4.134177719906611], [2.4360879530273456, 2.3849194625184746], [1.8028451988923122, 4.147546828141051], [1.8745491789968307, 1.617996622434847], [3.930201611931094, 2.316269660069935]  …  [-4.6311543853533435, 0.6343874555132953], [-4.274038323559923, 4.59304836638103], [-3.771188128971133, 4.662353946596249], [-3.7280445187317484, 0.9794305041900631], [-1.4746184224882306, 0.9617826016649691], [-4.46504078899859, 4.683960261166934], [-3.3131271526472847, 2.23542332925627], [-4.74821639591856, 3.6909095250509742], [-2.958593637582241, 1.624040141184154], [-1.880016563903081, 0.7108182904283715]], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

In [3]:
n_hidden = 10
D = size(X,1)
nn = Chain(
    Dense(D, n_hidden, σ),
    Dense(n_hidden, 1)
)  
loss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) 

loss (generic function with 1 method)

In [4]:
using Flux.Optimise: update!, Adam
opt = Adam(1e-3)
epochs = 100
# Mean loss over all (x, y) pairs in the data set
avg_loss(data) = mean(map(d -> loss(d[1], d[2]), data))
show_every = epochs ÷ 10  # report progress every 10 epochs
ps = Flux.params(nn)      # trainable parameters of the network

for epoch = 1:epochs
  for d in data
    # Gradient of the loss on a single observation w.r.t. the parameters
    gs = gradient(ps) do
      loss(d...)
    end
    update!(opt, ps, gs)
  end
  if epoch % show_every == 0
    println("Epoch ", epoch)
    @show avg_loss(data)
  end
end

│   The input will be converted, but any earlier layers may be very slow.
│   layer = Dense(2 => 10, σ)
│   summary(x) = 2-element Vector{Float64}
└ @ Flux C:\Users\marka\.julia\packages\Flux\EHgZm\src\layers\stateless.jl:60


Epoch 10
avg_loss(data) = 0.6340207658335566
Epoch 20
avg_loss(data) = 0.5270081588625908
Epoch 30
avg_loss(data) = 0.40069789480417967
Epoch 40
avg_loss(data) = 0.2957423364371061
Epoch 50
avg_loss(data) = 0.22082882829010486
Epoch 60
avg_loss(data) = 0.1701394403912127
Epoch 70
avg_loss(data) = 0.1354304112866521
Epoch 80
avg_loss(data) = 0.1106710666231811
Epoch 90
avg_loss(data) = 0.09226567465811968
Epoch 100
avg_loss(data) = 0.0781312698451802


In [5]:
la = Laplace(nn; likelihood=:classification, subset_of_weights=:all)
fit!(la, data)

200

### Auto-differentiation testing

In [6]:
# Define the function to be differentiated:
f(x) = predict(la, x)
# Differentiate it
J = jacobian(f, X)
println(J)

ErrorException: Mutating arrays is not supported -- called copyto!(SubArray{Float32, 1, Matrix{Float32}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations


Testing showed that the error arises from nested use of Zygote: differentiating `predict` with Zygote's `jacobian` reaches the call to `jacobian` inside `jacobians`. That inner `jacobian` mutates arrays in place (the `copyto!` into a `SubArray` reported in the error above) and therefore cannot be auto-differentiated by Zygote. The [issue](https://github.com/FluxML/Zygote.jl/issues/953) of nested use of Zygote is still open.

Cloning the input does not solve the issue, because the mutation happens inside the function being differentiated. Zygote works by reducing the function to operations it has differentiation rules for, and in-place array writes are not among them.
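
The behaviour can be reproduced without LaplaceRedux. The sketch below is a minimal illustration, not the code path inside `jacobians`: `sum_doubled_inplace`, `sum_doubled_pure` and `zygote_fails` are hypothetical helpers introduced here. The first two compute the same value, one by writing into a preallocated array and one without mutation. Zygote should differentiate the pure version and reject the in-place one, and copying the input first should make no difference.

```julia
using Zygote

# Hypothetical helper: writes into a preallocated array (mutation).
function sum_doubled_inplace(x)
    out = zeros(eltype(x), length(x))
    for i in eachindex(x)
        out[i] = 2 * x[i]   # in-place write via setindex!
    end
    return sum(out)
end

# Same value computed without mutation.
sum_doubled_pure(x) = sum(2 .* x)

# Hypothetical helper: returns true if Zygote throws while differentiating f at x.
function zygote_fails(f, x)
    try
        Zygote.gradient(f, x)
        return false
    catch
        return true
    end
end

x0 = [1.0, 2.0, 3.0]

grad_pure = Zygote.gradient(sum_doubled_pure, x0)[1]
fails_inplace = zygote_fails(sum_doubled_inplace, x0)
fails_with_copy = zygote_fails(x -> sum_doubled_inplace(copy(x)), x0)

@show grad_pure fails_inplace fails_with_copy
```

When the mutating code is your own, rewriting it in the style of `sum_doubled_pure` is the usual fix. Here the in-place write sits inside a library call, so that option is not directly available.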

Potential solutions:
- Use ForwardDiff.jl or Tracker.jl instead of Zygote.jl.
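
As a rough sketch of the ForwardDiff.jl option, the snippet below uses `ForwardDiff.jacobian` to differentiate a function with respect to its input. `toy_predict` is a hypothetical stand-in for `x -> predict(la, x)`: a fixed logistic model with made-up weights `w` and bias `b`. Whether ForwardDiff's dual numbers pass cleanly through `predict` itself has not been verified here.

```julia
using ForwardDiff

# Made-up parameters for a toy logistic model (assumption, not from the notebook).
w = [0.5, -1.0]
b = 0.1

sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical stand-in for `x -> predict(la, x)`: maps a 2-element input
# to a 1-element vector holding the predicted probability.
toy_predict(x) = [sigmoid(w' * x + b)]

x0 = [1.0, 2.0]

# Forward-mode Jacobian of the output w.r.t. the input (1×2 matrix).
J = ForwardDiff.jacobian(toy_predict, x0)

# Analytic Jacobian of the logistic model for comparison: p(1 - p) * w'.
p = toy_predict(x0)[1]
J_exact = p * (1 - p) .* w'

@show J
@show isapprox(J, J_exact; atol=1e-10)
```

Forward mode needs one pass per input dimension, which is cheap for the two-dimensional inputs used in this notebook.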