Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing on functions involving array operations #24

Closed
NMUrban opened this issue Oct 5, 2018 · 3 comments
Closed

Crashing on functions involving array operations #24

NMUrban opened this issue Oct 5, 2018 · 3 comments

Comments

@NMUrban
Copy link

NMUrban commented Oct 5, 2018

This is probably due to a known unimplemented feature, but I'm having problems getting gradients with respect to functions involving arrays. Sometimes f'(p) works but derivative(f, p) does not. I want to use the latter because I need gradients with respect to parameter vectors, not scalars as in these minimal examples.

EXAMPLE 1:

function f1(p)
    x = [1.0, 1.0]
    x[2] = p * x[1]
    return x[2]
end
julia> f1'(1.5)
0.9999999999991082

julia> derivative(f1, 1.5)
ERROR: $(Expr(:boundscheck))
Stacktrace:
 [1] error(::Expr) at ./error.jl:42
 [2] exprtype(::Core.Compiler.IRCode, ::Expr) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/tools/ir.jl:54
 [3] _broadcast_getindex_evalf at ./broadcast.jl:574 [inlined]
 [4] _broadcast_getindex at ./broadcast.jl:547 [inlined]
 [5] getindex at ./broadcast.jl:507 [inlined]
 [6] copyto_nonleaf!(::Array{DataType,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(Zygote.exprtype),Tuple{Base.RefValue{Core.Compiler.IRCode},Base.Broadcast.Extruded{Array{Any,1},Tuple{Bool},Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) at ./broadcast.jl:923
 [7] copy(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(Zygote.exprtype),Tuple{Base.RefValue{Core.Compiler.IRCode},Array{Any,1}}}) at ./broadcast.jl:786
 [8] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(Zygote.exprtype),Tuple{Base.RefValue{Core.Compiler.IRCode},Array{Any,1}}}) at ./broadcast.jl:748
 [9] record!(::Core.Compiler.IRCode) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/reverse.jl:142
 [10] #grad_ir#50(::Nothing, ::Function, ::Core.Compiler.IRCode) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/reverse.jl:319
 [11] (::getfield(Zygote, Symbol("#kw##grad_ir")))(::NamedTuple{(:varargs,),Tuple{Nothing}}, ::typeof(Zygote.grad_ir), ::Core.Compiler.IRCode) at ./none:0
 [12] _lookup_grad(::Type) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/emit.jl:118
 [13] #s19#533 at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/interface2.jl:18 [inlined]
 [14] #s19#533(::Any, ::Any, ::Any) at ./none:0
 [15] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:506
 [16] f1 at ./REPL[9]:3 [inlined]
 [17] (::Zygote.J{Tuple{typeof(f1),Float64},Any})(::Int64) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/interface2.jl:0
 [18] (::getfield(Zygote, Symbol("##51#52")){Zygote.J{Tuple{typeof(f1),Float64},Any}})(::Int64) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/interface.jl:28
 [19] gradient(::Function, ::Float64) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/interface.jl:34
 [20] derivative(::Function, ::Float64) at /Users/nurban/.julia/packages/Zygote/g8WMA/src/compiler/interface.jl:37
 [21] top-level scope at none:0

The next example with a 'for' loop actually segfaults.

EXAMPLE 2:

function f2(p)
    x = [1.0, 1.0]
    for i in 1:1
        x[i+1] = p * x[i]
    end
    return x[2]
end
julia> f2'(1.5)
0.9999999999991082

julia> derivative(f2, 1.5)

signal (11): Segmentation fault: 11
in expression starting at no file:0

Clearly f'(p) and derivative(f, p) work differently, although I would have expected the former to be syntactic sugar for the latter. This is true even for simple working examples: f(x) = x^2 gives f'(1.5) == 2.9999999999973244 while derivative(f, 1.5) == 3.0.

Also, these problems seem to have to do with array operations, because it works if I write out the array operations by hand as variables:

function g2(p)
    x1 = 1.0
    x2 = 1.0
    x2 = p * x1
    return x2
end
julia> derivative(g2, 1.5)
1.0

Am I misunderstanding how derivative() is to be used?

@MikeInnes
Copy link
Member

No misunderstanding, these are definitely bugs on Zygote's end, at least insofar as they should give more helpful errors.

Your first example does, now:

julia> f1'(1)
ERROR: Mutating arrays is not supported

You'll have to write that differently until we add support for mutation.

The reason the loop example fails in a worse way is that although it has the same behaviour, in nonetheless emits more complex code (containing a loop), and Zygote has to deal with that, because it runs before any other optimisations are applied.

@MikeInnes
Copy link
Member

Just a heads up, this example now works on Zygote#mutate.

julia> f1'(1)
1.0

@MikeInnes
Copy link
Member

Closing for now as I think this is giving the right error (we can track mutation support in #61).

Keno added a commit that referenced this issue Feb 24, 2019
Right now Zygote inserts stacks whenever it needs to use an ssa value
not defined in the first basic block. This is of course unnecessary.
The condition for needing stacks is that the basic block that defines
it is self-reachable (i.e. in a loop). Otherwise, we can simply insert
phi nodes to thread the desired SSA value through to the exit block
(we don't need to do anything in the adjoint, since the reversal of
the CFG ensures dominance). Removing stacks allows for both more
efficient code generation and enables higher order auto-diff (since
we use control flow in Zygote, but can't handle differentiating code
that contains stacks). The headline example is something like the following:

```
function foo(b, x)
    if b
        sin(x)
    else
        cos(x)
    end
end
```

Then looking at `@code_typed derivative(x->foo(true, x), 1.0)`, we get:

Before:
```
CodeInfo(
1 ── %1  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int8,1}, svec(Any, Int64), :(:ccall), 2, Array{Int8,1}, 0, 0))::Array{Int8,1}
│    %2  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 0, 0))::Array{Any,1}
│    %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 0, 0))::Array{Any,1}
│    %4  = Base.sin::typeof(sin)
│          invoke %4(_3::Float64)::Float64
│    %6  = %new(##334#335{Float64}, x)::##334#335{Float64}
│    %7  = %new(##758#back#336{##334#335{Float64}}, %6)::##758#back#336{##334#335{Float64}}
[snip]
23 ─ %52 = invoke %47(1::Int8)::Tuple{Nothing,Nothing,Any}
│    %53 = Base.getfield(%52, 3, true)::Any
└───       goto #24
24 ─       return %53
) => Any
```

After:
```
CodeInfo(
1 ─ %1 = Base.sin::typeof(sin)
│        invoke %1(_3::Float64)::Float64
│   %3 = Core.Intrinsics.not_int(true)::Bool
└──      goto #3 if not %3
2 ─      invoke Zygote.notnothing(nothing::Nothing)::Union{}
└──      $(Expr(:unreachable))::Union{}
3 ┄ %7 = invoke Zygote.cos(_3::Float64)::Float64
│   %8 = Base.mul_float(1.0, %7)::Float64
└──      goto #4
4 ─      goto #5
5 ─      goto #6
6 ─      goto #7
7 ─      return %8
) => Float64
```

Which is essentially perfect (there's a bit of junk left over, but LLVM
can take care of that. The only thing that doesn't get removed is the
useless invocation of `sin`, but that's a separate and known issue).
Keno added a commit that referenced this issue Mar 6, 2019
Right now Zygote inserts stacks whenever it needs to use an ssa value
not defined in the first basic block. This is of course unnecessary.
The condition for needing stacks is that the basic block that defines
it is self-reachable (i.e. in a loop). Otherwise, we can simply insert
phi nodes to thread the desired SSA value through to the exit block
(we don't need to do anything in the adjoint, since the reversal of
the CFG ensures dominance). Removing stacks allows for both more
efficient code generation and enables higher order auto-diff (since
we use control flow in Zygote, but can't handle differentiating code
that contains stacks). The headline example is something like the following:

```
function foo(b, x)
    if b
        sin(x)
    else
        cos(x)
    end
end
```

Then looking at `@code_typed derivative(x->foo(true, x), 1.0)`, we get:

Before:
```
CodeInfo(
1 ── %1  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int8,1}, svec(Any, Int64), :(:ccall), 2, Array{Int8,1}, 0, 0))::Array{Int8,1}
│    %2  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 0, 0))::Array{Any,1}
│    %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 0, 0))::Array{Any,1}
│    %4  = Base.sin::typeof(sin)
│          invoke %4(_3::Float64)::Float64
│    %6  = %new(##334#335{Float64}, x)::##334#335{Float64}
│    %7  = %new(##758#back#336{##334#335{Float64}}, %6)::##758#back#336{##334#335{Float64}}
[snip]
23 ─ %52 = invoke %47(1::Int8)::Tuple{Nothing,Nothing,Any}
│    %53 = Base.getfield(%52, 3, true)::Any
└───       goto #24
24 ─       return %53
) => Any
```

After:
```
CodeInfo(
1 ─ %1 = Base.sin::typeof(sin)
│        invoke %1(_3::Float64)::Float64
│   %3 = Core.Intrinsics.not_int(true)::Bool
└──      goto #3 if not %3
2 ─      invoke Zygote.notnothing(nothing::Nothing)::Union{}
└──      $(Expr(:unreachable))::Union{}
3 ┄ %7 = invoke Zygote.cos(_3::Float64)::Float64
│   %8 = Base.mul_float(1.0, %7)::Float64
└──      goto #4
4 ─      goto #5
5 ─      goto #6
6 ─      goto #7
7 ─      return %8
) => Float64
```

Which is essentially perfect (there's a bit of junk left over, but LLVM
can take care of that. The only thing that doesn't get removed is the
useless invocation of `sin`, but that's a separate and known issue).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants