Add @autosize #2078

Merged
merged 10 commits into from Oct 10, 2022
Changes from 4 commits
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
# Flux Release Notes

## v0.13.7
* Added [`@autosize` macro](https://github.com/FluxML/Flux.jl/pull/2078)

## v0.13.4
* Added [`PairwiseFusion` layer](https://github.com/FluxML/Flux.jl/pull/1983)

1 change: 1 addition & 0 deletions src/Flux.jl
@@ -55,6 +55,7 @@ include("layers/show.jl")
include("loading.jl")

include("outputsize.jl")
export @autosize

include("data/Data.jl")
using .Data
170 changes: 168 additions & 2 deletions src/outputsize.jl
@@ -147,8 +147,12 @@ outputsize(m::AbstractVector, input::Tuple...; padbatch=false) = outputsize(Chai

## bypass statistics in normalization layers

for layer in (:LayerNorm, :BatchNorm, :InstanceNorm, :GroupNorm)
@eval (l::$layer)(x::AbstractArray{Nil}) = x
for layer in (:BatchNorm, :InstanceNorm, :GroupNorm) # LayerNorm works fine
@eval function (l::$layer)(x::AbstractArray{Nil})
l.chs == size(x, ndims(x)-1) || throw(DimensionMismatch(
string($layer, " expected ", l.chs, " channels, but got size(x) == ", size(x))))
x
end
end

## fixes for layers that don't work out of the box
@@ -168,3 +172,165 @@ for (fn, Dims) in ((:conv, DenseConvDims),)
end
end
end


"""
@autosize (size...,) Chain(Layer(_ => 2), Layer(_), ...)

Returns the specified model, with each `_` replaced by an inferred number,
for input of the given `size`.

The unknown sizes are usually the second-last dimension of that layer's input,
which Flux regards as the channel dimension.
(A few layers, `Dense` & [`LayerNorm`](@ref), instead always use the first dimension.)
The underscore may appear as an argument of a layer, or inside a `=>`.
It may be used in further calculations, such as `Dense(_ => _÷4)`.

# Examples
```
julia> @autosize (3, 1) Chain(Dense(_ => 2, sigmoid), BatchNorm(_, affine=false))
Chain(
Dense(3 => 2, σ), # 8 parameters
BatchNorm(2, affine=false),
)

julia> img = [28, 28];

julia> @autosize (img..., 1, 32) Chain( # size is only needed at runtime
Chain(c = Conv((3,3), _ => 5; stride=2, pad=SamePad()),
p = MeanPool((3,3)),
b = BatchNorm(_),
f = Flux.flatten),
Dense(_ => _÷4, relu, init=Flux.rand32), # can calculate output size _÷4
SkipConnection(Dense(_ => _, relu), +),
Dense(_ => 10),
) |> gpu # moves to GPU after initialisation
Chain(
Chain(
c = Conv((3, 3), 1 => 5, pad=1, stride=2), # 50 parameters
p = MeanPool((3, 3)),
b = BatchNorm(5), # 10 parameters, plus 10
f = Flux.flatten,
),
Dense(80 => 20, relu), # 1_620 parameters
SkipConnection(
Dense(20 => 20, relu), # 420 parameters
+,
),
Dense(20 => 10), # 210 parameters
) # Total: 10 trainable arrays, 2_310 parameters,
# plus 2 non-trainable, 10 parameters, summarysize 10.469 KiB.

julia> outputsize(ans, (28, 28, 1, 32))
(10, 32)
```

Limitations:
* While `@autosize (5, 32) Flux.Bilinear(_ => 7)` is OK, something like `Bilinear((_, _) => 7)` will fail.
* While `Scale(_)` and `LayerNorm(_)` are fine (and use the first dimension), `Scale(_,_)` and `LayerNorm(_,_)`
will fail if `size(x,1) != size(x,2)`.
* RNNs won't work: `@autosize (7, 11) LSTM(_ => 5)` fails, because `outputsize(RNN(3=>7), (3,))` also fails, a known issue.
"""
macro autosize(size, model)
Meta.isexpr(size, :tuple) || error("@autosize's first argument must be a tuple, the size of the input")
Meta.isexpr(model, :call) || error("@autosize's second argument must be something like Chain(layers...)")
ex = _makelazy(model)
@gensym m
quote
$m = $ex
$outputsize($m, $size)
$striplazy($m)
end |> esc
end

function _makelazy(ex::Expr)
n = _underscoredepth(ex)
n == 0 && return ex
n == 1 && error("@autosize doesn't expect an underscore here: $ex")
n == 2 && return :($LazyLayer($(string(ex)), $(_makefun(ex)), nothing))
n > 2 && return Expr(ex.head, ex.args[1], map(_makelazy, ex.args[2:end])...)
end
_makelazy(x) = x

function _underscoredepth(ex::Expr)
# Meta.isexpr(ex, :tuple) && :_ in ex.args && return 10
ex.head in (:call, :kw, :(->), :block) || return 0
ex.args[1] == :(=>) && ex.args[2] == :_ && return 1
m = maximum(_underscoredepth, ex.args)
m == 0 ? 0 : m+1
end
_underscoredepth(ex) = Int(ex == :_)

function _makefun(ex)
T = Meta.isexpr(ex, :call) ? ex.args[1] : Type
@gensym x s
Expr(:(->), x, Expr(:block, :($s = $autosizefor($T, $x)), _replaceunderscore(ex, s)))
end

"""
autosizefor(::Type, x)

If an `_` in your layer's constructor, used within `@autosize`, should
*not* mean the 2nd-last dimension, then you can overload this.

For instance `autosizefor(::Type{<:Dense}, x::AbstractArray) = size(x, 1)`
is needed to make `@autosize (2,3,4) Dense(_ => 5)` return
`Dense(2 => 5)` rather than `Dense(3 => 5)`.
"""
autosizefor(::Type, x::AbstractArray) = size(x, max(1, ndims(x)-1))
autosizefor(::Type{<:Dense}, x::AbstractArray) = size(x, 1)
autosizefor(::Type{<:LayerNorm}, x::AbstractArray) = size(x, 1)

_replaceunderscore(e, s) = e == :_ ? s : e
_replaceunderscore(ex::Expr, s) = Expr(ex.head, map(a -> _replaceunderscore(a, s), ex.args)...)

mutable struct LazyLayer
str::String
make::Function
layer
end

function (l::LazyLayer)(x::AbstractArray)
l.layer == nothing || return l.layer(x)
lay = l.make(x)
y = lay(x)
l.layer = lay # mutate after we know that call worked
return y
end

#=

Flux.outputsize(Chain(Dense(2=>3)), (4,)) # nice error
Flux.outputsize(Dense(2=>3), (4,)) # no nice error
@autosize (4,) Dense(2=>3) # no nice error

@autosize (3,) Dense(2 => _) # shouldn't work, weird error


@autosize (3,5,6) LayerNorm(_,_) # no complaint, but
ans(rand(3,5,6)) # this fails

=#

@functor LazyLayer

function striplazy(x)
fs, re = functor(x)
re(map(striplazy, fs))
end
striplazy(l::LazyLayer) = l.layer == nothing ? error("should be initialised!") : l.layer

# Could make LazyLayer usable outside of @autosize, for instance allow Chain(@lazy Dense(_ => 2))?
# But then it will survive to produce weird structural gradients etc.
Member:

Could we force users to call recursive_striplazy(model, input_size) or something before using an incrementally constructed network like this? Maybe define a rrule which throws an error?

Member Author:

striplazy should be fully recursive. We could make a function that calls this after outputsize & returns the model. And indeed an rrule would be one way to ensure the model is stripped before it's used for real.

I suppose the other policy would just be to allow these things to survive in the model. As long as you never change it, and don't care about the cost of the if & type instability, it should work?

But any use outside of @autosize probably needs another macro... writing Flux.LazyLayer("", x -> Dense(size(x,1) => 10), nothing) seems sufficiently obscure that perhaps it's OK to say that's obviously at your own risk, for now? @autosize can be the only API until we decide if we want more.
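As a concrete sketch of the rrule idea discussed above (not part of this PR; it assumes ChainRulesCore is loaded, and the error message is made up), an error-throwing rule could look like:

```julia
using ChainRulesCore

# Hypothetical: refuse to differentiate through a LazyLayer that hasn't been
# stripped, so a model must pass through @autosize / striplazy before training.
function ChainRulesCore.rrule(l::LazyLayer, x::AbstractArray)
    error("LazyLayer is a placeholder used by @autosize; call striplazy on the model before taking gradients")
end
```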


function Base.show(io::IO, l::LazyLayer)
printstyled(io, "LazyLayer(", color=:light_black)
if l.layer == nothing
printstyled(io, l.str, color=:red)
else
printstyled(io, l.layer, color=:green)
end
printstyled(io, ")", color=:light_black)
end

_big_show(io::IO, l::LazyLayer, indent::Int=0, name=nothing) = _layer_show(io, l, indent, name)
65 changes: 65 additions & 0 deletions test/outputsize.jl
@@ -142,16 +142,81 @@ end
m = LayerNorm(32)
@test outputsize(m, (32, 32, 3, 16)) == (32, 32, 3, 16)
@test outputsize(m, (32, 32, 3); padbatch=true) == (32, 32, 3, 1)
m2 = LayerNorm(3, 2)
@test outputsize(m2, (3, 2)) == (3, 2) == size(m2(randn(3, 2)))
@test outputsize(m2, (3,)) == (3, 2) == size(m2(randn(3, 2)))

m = BatchNorm(3)
@test outputsize(m, (32, 32, 3, 16)) == (32, 32, 3, 16)
@test outputsize(m, (32, 32, 3); padbatch=true) == (32, 32, 3, 1)
@test_throws Exception m(randn(Float32, 32, 32, 5, 1))
@test_throws DimensionMismatch outputsize(m, (32, 32, 5, 1))

m = InstanceNorm(3)
@test outputsize(m, (32, 32, 3, 16)) == (32, 32, 3, 16)
@test outputsize(m, (32, 32, 3); padbatch=true) == (32, 32, 3, 1)
@test_throws Exception m(randn(Float32, 32, 32, 5, 1))
@test_throws DimensionMismatch outputsize(m, (32, 32, 5, 1))

m = GroupNorm(16, 4)
@test outputsize(m, (32, 32, 16, 16)) == (32, 32, 16, 16)
@test outputsize(m, (32, 32, 16); padbatch=true) == (32, 32, 16, 1)
@test_throws Exception m(randn(Float32, 32, 32, 15, 4))
@test_throws DimensionMismatch outputsize(m, (32, 32, 15, 4))
end

@testset "autosize macro" begin
m = @autosize (3,) Dense(_ => 4)
@test randn(3) |> m |> size == (4,)

m = @autosize (3, 1) Chain(Dense(_ => 4), Dense(4 => 10), softmax)
@test randn(3, 5) |> m |> size == (10, 5)

m = @autosize (2, 3, 4, 5) Dense(_ => 10) # goes by first dim, not 2nd-last
@test randn(2, 3, 4, 5) |> m |> size == (10, 3, 4, 5)

m = @autosize (9,) Dense(_ => div(_,2))
@test randn(9) |> m |> size == (4,)

m = @autosize (3,) Chain(one = Dense(_ => 4), two = softmax) # needs kw
@test randn(3) |> m |> size == (4,)

m = @autosize (3, 45) Maxout(() -> Dense(_ => 6, tanh), 2) # needs ->, block
@test randn(3, 45) |> m |> size == (6, 45)

# here Parallel gets two inputs, no problem:
m = @autosize (3,) Chain(SkipConnection(Dense(_ => 4), Parallel(vcat, Dense(_ => 5), Dense(_ => 6))), Flux.Scale(_))
@test randn(3) |> m |> size == (11,)

# like Dense, LayerNorm goes by the first dimension:
m = @autosize (3, 4, 5) LayerNorm(_)
@test rand(3, 6, 7) |> m |> size == (3, 6, 7)

m = @autosize (3, 3, 10) LayerNorm(_, _) # does not check that sizes match
@test rand(3, 3, 10) |> m |> size == (3, 3, 10)

m = @autosize (3,) Flux.Bilinear(_ => 10)
@test randn(3) |> m |> size == (10,)

m = @autosize (3, 1) Flux.Bilinear(_ => 10)
@test randn(3, 4) |> m |> size == (10, 4)

@test_throws Exception @eval @autosize (3,) Flux.Bilinear((_,3) => 10)

# first docstring example
m = @autosize (3, 1) Chain(Dense(_ => 2, sigmoid), BatchNorm(_, affine=false))
@test randn(3, 4) |> m |> size == (2, 4)

# evil docstring example
img = [28, 28];
m = @autosize (img..., 1, 32) Chain( # size is only needed at runtime
Chain(c = Conv((3,3), _ => 5; stride=2, pad=SamePad()),
p = MeanPool((3,3)),
b = BatchNorm(_),
f = Flux.flatten),
Dense(_ => _÷4, relu, init=Flux.rand32), # can calculate output size _÷4
SkipConnection(Dense(_ => _, relu), +),
Dense(_ => 10),
) |> gpu # moves to GPU after initialisation
Member:

@mcabbott I missed this in the review, but GPU tests are failing because the |> here is binding more tightly than @autosize, and thus the model isn't actually moved onto the GPU. Is there anything we can do about that other than adding more parens? Whatever changes are made would affect the @autosize docstring as well, since it shares this example.

Member Author:

Oh no, I thought I checked this, sorry. It's not working, but the binding is what I expected:

julia> (@autosize (1,2) Dense(_, 3) |> f64).bias
3-element Vector{Float32}:
 0.0
 0.0
 0.0

julia> :(@autosize (1,2) Dense(_, 3) |> f64) |> dump
Expr
  head: Symbol macrocall
  args: Array{Any}((4,))
    1: Symbol @autosize
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[16]
    3: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Int64 1
        2: Int64 2
    4: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol |>
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol Dense
            2: Symbol _
            3: Int64 3
        3: Symbol f64

It seems the gpu walk is taking place too early. Maybe I never implemented my scheme to delay it.

Member Author:

My scheme was, I think, aimed at adapt. There you can grab the function being mapped, and replace the layer's maker function with one composed with it. But for Functors.jl I think that's impossible.

So we should just make it an error. And remove this use from the docs. Call gpu once it's done, i.e. with brackets.
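For reference, a minimal sketch of the bracketed form proposed here (the model itself is only an illustration, not taken from the PR): the parentheses close the macro call before `|>`, so `gpu` is applied to the finished, stripped model rather than being captured inside `@autosize`.

```julia
using Flux

# Parentheses end the macro call, so |> gpu sees the already-built model.
m = (@autosize (28, 28, 1, 32) Chain(Conv((3, 3), _ => 5, relu), Flux.flatten, Dense(_ => 10))) |> gpu
```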

Member:

Sounds good to me.

@test randn(Float32, img..., 1, 32) |> gpu |> m |> size == (10, 32)
end