
gradient (and gradient!) fails on basic example #1340

Closed
biona001 opened this issue Mar 10, 2024 · 3 comments

biona001 commented Mar 10, 2024

In the following least squares objective, it does not seem like calling gradient! (or gradient) does anything.

MWE:

using Enzyme
using LinearAlgebra

# objective = 0.5 || y - X*beta ||^2
function ols(y, X, beta, storage=zeros(size(X, 1)))
    mul!(storage, X, beta)
    storage .= y .- storage
    return 0.5 * sum(abs2, storage)
end
ols(beta::AbstractVector) = ols(y, X, beta, storage)

# simulate data
n = 100
p = 50
X = randn(n, p)
y = randn(n)
beta = randn(p)
storage = zeros(n)
ols(y, X, beta, storage)

# autodiff grad
grad_storage = zeros(length(beta))
Enzyme.gradient!(Reverse, grad_storage, ols, beta)

# analytical grad
true_grad = X' * (y - X*beta);

Compare answers:

julia> [true_grad grad_storage]
50×2 Matrix{Float64}:
  -14.636    0.0
  -15.0858   0.0
 -184.264    0.0
    8.70142  0.0
  -71.4575   0.0
 -165.231    0.0
  -93.2602   0.0
  -49.404    0.0
  -62.5627   0.0
    ⋮        
  136.334    0.0
   81.2353   0.0
  152.26     0.0
 -198.541    0.0
   41.5824   0.0
  -13.3246   0.0
  319.853    0.0
 -180.376    0.0

This is on Julia v1.9.1 with Enzyme v0.11.19

biona001 (Author) commented

After reading the Zygote.jl documentation, I realized that array mutation is hard for AD packages. So I changed the objective slightly to avoid mutation, and Enzyme.jl worked.

using Enzyme
using LinearAlgebra

# objective = 0.5 || y - X*beta ||^2
function ols(y, X, beta)
    storage = X * beta # avoid mul!
    obj = zero(eltype(beta))
    for i in eachindex(y)
        obj += abs2(y[i] - storage[i])
    end
    return 0.5obj
end
ols(beta::AbstractVector) = ols(y, X, beta)

# simulate data
n = 100
p = 50
X = randn(n, p)
y = randn(n)
beta = randn(p)
ols(y, X, beta)

# autodiff grad
grad1 = zeros(length(beta))
Enzyme.gradient!(Reverse, grad1, ols, beta) # method 1
grad2 = Enzyme.gradient(Reverse, ols, beta) # method 2
grad3 = zeros(length(beta))
Enzyme.autodiff(Reverse, ols, Active, Duplicated(beta, grad3)) # method 3

# analytical grad
true_grad = -X' * (y - X*beta)

# compare answers
[true_grad grad1 grad2 grad3]
50×4 Matrix{Float64}:
  223.421      223.421      223.421      223.421
   56.115       56.115       56.115       56.115
  162.285      162.285      162.285      162.285
  -58.6251     -58.6251     -58.6251     -58.6251
  109.732      109.732      109.732      109.732
 -152.867     -152.867     -152.867     -152.867
   51.7608      51.7608      51.7608      51.7608
  -64.8512     -64.8512     -64.8512     -64.8512
   38.8877      38.8877      38.8877      38.8877
  -62.3895     -62.3895     -62.3895     -62.3895
   92.2303      92.2303      92.2303      92.2303
  208.785      208.785      208.785      208.785
   47.6175      47.6175      47.6175      47.6175
    ⋮
  -26.5664     -26.5664     -26.5664     -26.5664
   65.8905      65.8905      65.8905      65.8905
  279.925      279.925      279.925      279.925
  -18.8817     -18.8817     -18.8817     -18.8817
 -169.649     -169.649     -169.649     -169.649
 -134.628     -134.628     -134.628     -134.628
  273.975      273.975      273.975      273.975
    0.220214     0.220214     0.220214     0.220214
 -130.344     -130.344     -130.344     -130.344
   -3.59516     -3.59516     -3.59516     -3.59516
  204.023      204.023      204.023      204.023
 -268.678     -268.678     -268.678     -268.678

But of course I would prefer mul!(storage, X, beta) over storage = X * beta. I'm not sure whether Enzyme.jl supports this operation.

bgroenks96 commented
I'm not an Enzyme expert or anything, but this case is discussed pretty clearly in the documentation.

You need to explicitly tell Enzyme about your storage array, e.g.:

Enzyme.autodiff(Reverse, ols, Active, Const(y), Const(X), Duplicated(beta, grad_storage), Duplicated(storage, zero(storage)))

will produce the correct gradient:

maximum(abs.(true_grad .- grad_storage))
# output: 0.0

It might also be possible to pre-wrap the variables with Duplicated in the closure... but I'm not sure.
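
For reference, here is an untested sketch of that idea, reusing y, X, beta, and storage from the MWE above. Instead of wrapping captured variables in Duplicated, it captures only the read-only data y and X in the closure and keeps the mutable buffers as explicit Duplicated arguments:

using Enzyme
using LinearAlgebra

# closure captures only the read-only data y and X; beta and the mutable
# storage buffer stay explicit so Enzyme can shadow them
ols_yx = (beta, storage) -> begin
    mul!(storage, X, beta)
    storage .= y .- storage
    return 0.5 * sum(abs2, storage)
end

grad_storage = zeros(length(beta))
Enzyme.autodiff(Reverse, ols_yx, Active,
                Duplicated(beta, grad_storage),
                Duplicated(storage, zero(storage)))

If this works, grad_storage should match -X' * (y - X*beta), the same result as passing Const(y) and Const(X) explicitly.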

biona001 (Author) commented

@bgroenks96 thank you very much. I actually saw that example, but its meaning somehow did not register with me when I first saw it. Closing this now.

using Enzyme
using LinearAlgebra

# objective = 0.5 || y - X*beta ||^2
function ols(y, X, beta, storage=zeros(size(X, 1)))
    mul!(storage, X, beta)
    storage .= y .- storage
    return 0.5 * sum(abs2, storage)
end
ols(beta::AbstractVector) = ols(y, X, beta, storage)

# simulate data
n = 100
p = 50
X = randn(n, p)
y = randn(n)
beta = randn(p)
storage = zeros(n)
ols(y, X, beta, storage)

# analytical grad
true_grad = -X' * (y - X*beta)

# Enzyme grad with inplace mul!
grad_storage = zeros(length(beta))
Enzyme.autodiff(Reverse, ols, Active, Const(y), Const(X), Duplicated(beta, grad_storage), Duplicated(storage, zero(storage)))

julia> [true_grad grad_storage]
50×2 Matrix{Float64}:
  191.052      191.052
  117.039      117.039
  -46.2872     -46.2872
  -59.5048     -59.5048
 -162.259     -162.259
 -241.862     -241.862
 -118.645     -118.645
    8.81539      8.81539
   32.7351      32.7351
   -4.78171     -4.78171
    ⋮         
  219.782      219.782
  137.305      137.305
    0.131725     0.131725
  -38.0468     -38.0468
 -107.653     -107.653
  141.702      141.702
  138.302      138.302
  107.221      107.221
   59.7225      59.7225
