Add @autosize #2078
Conversation
striplazy(l::LazyLayer) = l.layer == nothing ? error("should be initialised!") : l.layer

# Could make LazyLayer usable outside of @autosize, for instance allow Chain(@lazy Dense(_ => 2))?
# But then it will survive to produce weird structural gradients etc.
Could we force users to call recursive_striplazy(model, input_size) or something before using an incrementally constructed network like this? Maybe define an rrule which throws an error?
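A minimal sketch of that second idea (hypothetical, not code from this PR), assuming ChainRulesCore is loaded and the LazyLayer type from this PR is in scope:

using ChainRulesCore

# Hypothetical rrule: refuse to differentiate through an un-stripped LazyLayer.
function ChainRulesCore.rrule(l::LazyLayer, x)
    error("LazyLayer encountered; strip the model (e.g. via @autosize / striplazy) before taking gradients")
end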
striplazy should be fully recursive. We could make a function that calls this after outputsize and returns the model. And indeed an rrule would be one way to stop you from using the model for real before stripping it.

I suppose the other policy would just be to allow these things to survive in the model. As long as you never change it, and don't care about the cost of the if and the type instability, it should work?

But any use outside of @autosize probably needs another macro... writing Flux.LazyLayer("", x -> Dense(size(x,1) => 10), nothing) seems sufficiently obscure that perhaps it's OK to say that's obviously at your own risk, for now? @autosize can be the only API until we decide whether we want more.
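A rough sketch of what such a fully recursive strip might look like (illustrative only; the function name and exclude predicate are assumptions, built on Functors.fmap):

using Functors

# Walk the whole model and replace every LazyLayer with its materialised layer.
recursive_striplazy(model) = fmap(striplazy, model; exclude = x -> x isa LazyLayer)

A wrapper could then run the model through outputsize and return recursive_striplazy(model), as described above.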
8cf6d0e to fcfb195
  Dense(_ => _÷4, relu, init=Flux.rand32),  # can calculate output size _÷4
  SkipConnection(Dense(_ => _, relu), +),
  Dense(_ => 10),
) |> gpu  # moves to GPU after initialisation
@mcabbott I missed this in the review, but GPU tests are failing because the |> here is binding more tightly than @autosize, and thus the model isn't actually moved onto the GPU. Is there anything we can do about that other than adding more parens? Whatever changes are made would affect the @autosize docstring as well, since it shares this example.
Oh no, I thought I checked this, sorry. It's not working, but the binding is what I expected:
julia> (@autosize (1,2) Dense(_, 3) |> f64).bias
3-element Vector{Float32}:
 0.0
 0.0
 0.0

julia> :(@autosize (1,2) Dense(_, 3) |> f64) |> dump
Expr
  head: Symbol macrocall
  args: Array{Any}((4,))
    1: Symbol @autosize
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[16]
    3: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Int64 1
        2: Int64 2
    4: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol |>
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol Dense
            2: Symbol _
            3: Int64 3
        3: Symbol f64
It seems the gpu walk is taking place too early. Maybe I never implemented my scheme to delay it.
My scheme was, I think, aimed at adapt. There you can grab the function being mapped, and replace the layer's maker function with one composed with this. But for Functors.jl I think that's impossible.

So we should just make it an error, and remove this use from the docs. Call gpu once it's done, i.e. with brackets.
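For example (assuming the macro's final API), brackets make the macro build and initialise the model first, and only then does |> gpu move it:

model = (@autosize (1, 2) Dense(_ => 3, relu)) |> gpu   # gpu sees the finished Dense(1 => 3, relu)

Without brackets the trailing |> gpu is swallowed into the macro call, which this thread suggests should simply raise an error.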
Sounds good to me.
The macro replaces _ with an inferred input size, for example:
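(An illustrative sketch of the intended behaviour, not copied from the PR:)

model = @autosize (3,) Chain(Dense(_ => 4, relu), Dense(_ => 2))
# behaves like Chain(Dense(3 => 4, relu), Dense(4 => 2)): the first _ becomes 3
# (from the declared input size), the second becomes 4 (the previous layer's output).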
PR Checklist