Box{AbstractFloat}() calling convention #368
Comments
Could you re-run this on a Linux machine? We get better stacktraces there. Ideally this would also be more minimized, and please provide a Manifest.toml.
On KDE neon (basically Ubuntu LTS), Julia 1.7.3:
The code as is doesn't reproduce locally due to an undefined variable. Moreover, can you reduce this to just a case where calling Enzyme.autodiff causes a crash, rather than having to discover it through the wrappers?
Typo fixed. It should run now. It would take a few weeks to make the reproducer on my end, but it should just be calling autodiff on the dxdt_pred function.
Reduced to remove all of SciML. From a minimal test, I believe this error is due to type instability:

using Enzyme
u0 = [1.0,0.0]
abstract type TensorProductBasis <: Function end
struct LegendreBasis <: TensorProductBasis
n::Int
end
function legendre_poly(x, p::Integer)
a::typeof(x) = one(x)
b::typeof(x) = x
if p <= 0
return a
elseif p == 1
return b
end
for j in 2:p
a, b = b, ((2j-1)*x*b - (j-1)*a) / j
end
b
end
function (basis::LegendreBasis)(x)
f = k -> legendre_poly(x,k-1)
return map(f, 1:basis.n)
end
function TL(model, x, p)
out = 1
W = reshape(p, out, Int(length(p)/out))
tensor_prod = model[1](x[1])
for i in 2:length(model)
tensor_prod = kron(tensor_prod,model[i](x[i]))
end
z = W*tensor_prod
return z
end
struct MyTensorLayer{P<:AbstractArray}
model::Array{TensorProductBasis}
p::P
in::Int
out::Int
function MyTensorLayer(model,out,p=nothing)
number_of_weights = 1
for basis in model
number_of_weights *= basis.n
end
if p === nothing
p = randn(out*number_of_weights)
end
new{typeof(p)}(model,p,length(model),out)
end
end
function fn(layer::MyTensorLayer, x,p)
model,out = layer.model,layer.out
W = reshape(p, out, Int(length(p)/out))
tensor_prod = model[1](x[1])
for i in 2:length(model)
tensor_prod = kron(tensor_prod,model[i](x[i]))
end
z = W*tensor_prod
return z
end
const A = [LegendreBasis(10), LegendreBasis(10)]
const nn = MyTensorLayer(A, 1)
function dxdt_pred(u,p)
fn(nn, u,p[3:end])[1]
end
th = zeros(102)
dp = zero(th)
du0 = zero(u0)
Enzyme.API.printall!(true)
dxdt_pred(u0, th)
Enzyme.autodiff(dxdt_pred, Active{Float64}, Duplicated(u0, du0), Duplicated(th, dp))
using Enzyme
abstract type TensorProductBasis <: Function end
struct LegendreBasis <: TensorProductBasis
n::Int
end
function (basis::LegendreBasis)(x)
return x
end
struct MyTensorLayer
model::Array{TensorProductBasis}
end
function fn(layer::MyTensorLayer, x)
model = layer.model
return model[1](x)
end
const nn = MyTensorLayer([LegendreBasis(10)])
function dxdt_pred(x)
return fn(nn, x)
end
Enzyme.API.printall!(true)
dxdt_pred(1.0)
Enzyme.autodiff(dxdt_pred, Active(1.0))
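For reference, a minimal sketch of a type-stable variant, assuming the instability comes from the abstractly typed field `model::Array{TensorProductBasis}`: parameterizing the struct on a concrete element type lets Julia avoid boxing the basis objects. The names `Basis`, `Legendre`, and `StableTensorLayer` below are hypothetical, chosen to avoid clashing with the reproducer.

```julia
# Hypothetical type-stable rework of the MWE above. The key change is
# the concrete type parameter T in place of the abstract element type,
# so field accesses infer a concrete type and nothing is boxed.
abstract type Basis <: Function end

struct Legendre <: Basis
    n::Int
end
(b::Legendre)(x) = x

# T is a concrete subtype of Basis, known at compile time.
struct StableTensorLayer{T<:Basis}
    model::Vector{T}
end

stable_fn(layer::StableTensorLayer, x) = layer.model[1](x)

const stable_nn = StableTensorLayer([Legendre(10)])
stable_pred(x) = stable_fn(stable_nn, x)
```

Here `eltype(stable_nn.model)` is the concrete `Legendre`, whereas in the reproducer it is the abstract `TensorProductBasis`; `@code_warntype` on the two versions should show the difference.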
There might be some dynamism, yes. The main issue here isn't that it doesn't work (I didn't expect it to); it's mostly that it segfaulted Julia. In the real use case this was done in a try/catch to check Enzyme compatibility, but the way the error is currently raised makes it unrecoverable.
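The compatibility check described above could be sketched like this (the helper name is hypothetical): a thrown Julia exception is recoverable inside `catch`, but a segfault aborts the process before the `catch` block ever runs, which is why the reporter asks for an error to be thrown instead.

```julia
# Hypothetical helper sketching the try/catch compatibility probe.
# The Enzyme.autodiff call is passed as a thunk so that any
# Julia-level exception it raises is caught and reported; a segfault
# in generated code would terminate the process before `catch`.
function enzyme_compatible(thunk::Function)
    try
        thunk()
        return true
    catch err
        @warn "autodiff failed; falling back" exception = err
        return false
    end
end

# In the real use case this would wrap something like:
#   enzyme_compatible(() -> Enzyme.autodiff(dxdt_pred, Active(1.0)))
```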
Having not yet fully reduced (nor investigated), my guess is that the dynamism triggers the use of split mode internally, which has a bad interaction with the garbage collector. If that is indeed the case (I still need to investigate to be sure), then in order for you to check in advance you would need either:
Never mind, seems to be something in the calling convention.
Ok
Both also hit this case where they hit a null pointer, but our tests indicated it was correct.
Box{Int}() calling convention
Box{Int}() calling convention
Box{AbstractFloat}() calling convention
Box is dead, long live sret
MWE:
It might be due to dynamism, but it would be good if this could throw instead of segfaulting so we can catch it (we have this in a try/catch).