Add @autosize #2078

Merged: 10 commits merged into FluxML:master from the autosize branch on Oct 10, 2022
Conversation

mcabbott (Member) commented on Oct 5, 2022

The macro replaces _ with an inferred input size:

julia> @autosize (2,3,4) Chain(Dense(_ => 5, tanh), Flux.flatten, Maxout(() -> Dense(_ => 3), 2))
Chain(
  Dense(2 => 5, tanh),                  # 15 parameters
  Flux.flatten,
  Maxout(
    Dense(15 => 3),                     # 48 parameters
    Dense(15 => 3),                     # 48 parameters
  ),
)                   # Total: 6 arrays, 111 parameters, 892 bytes.

julia> ans(rand(2,3,4)) |> size
(3, 4)
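As a hedged aside, the same shape inference is available functionally via the existing Flux.outputsize, which sits alongside this PR's code in src/outputsize.jl; for the first two layers above it gives the 15 that the Maxout's Dense layers receive:

julia> Flux.outputsize(Chain(Dense(2 => 5, tanh), Flux.flatten), (2, 3, 4))
(15, 4)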

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable

striplazy(l::LazyLayer) = l.layer == nothing ? error("should be initialised!") : l.layer

# Could make LazyLayer usable outside of @autosize, for instance allow Chain(@lazy Dense(_ => 2))?
# But then it will survive to produce weird structural gradients etc.
Member

Could we force users to call recursive_striplazy(model, input_size) or something before using an incrementally constructed network like this? Maybe define a rrule which throws an error?
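For illustration only, a minimal sketch of that rrule idea, assuming ChainRulesCore is loaded and reusing the LazyLayer type from the striplazy line quoted above; recursive_striplazy is just the hypothetical name suggested here, not an existing function:

using ChainRulesCore

# Sketch: refuse to differentiate through any LazyLayer, so an incrementally
# constructed model must be stripped before it is trained.
function ChainRulesCore.rrule(l::LazyLayer, x)
    error("found a LazyLayer inside the model; strip it (e.g. recursive_striplazy) before taking gradients")
end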

Member Author

striplazy should be fully recursive. We could make a function that calls this after outputsize & returns the model. And indeed an rrule would be one way to stop you from using the model for real before stripping it.

I suppose the other policy would just be to allow these things to survive in the model. As long as you never change it, and don't care about the cost of the if & type instability, it should work?

But any use outside of @autosize probably needs another macro... writing Flux.LazyLayer("", x -> Dense(size(x,1) => 10), nothing) seems sufficiently obscure that perhaps it's OK to say that's obviously at your own risk, for now? @autosize can be the only API until we decide if we want more.
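For illustration, a minimal sketch of the fully recursive strip mentioned above, assuming Functors.fmap (re-exported by Flux) and the striplazy method quoted earlier; recursive_striplazy is the hypothetical name from the previous comment:

# Sketch: walk the whole model and replace every LazyLayer with its initialised
# inner layer; striplazy itself errors if a layer was never initialised.
recursive_striplazy(model) = fmap(striplazy, model; exclude = l -> l isa LazyLayer)

Usage would then be roughly: run the model (or outputsize) once on a sample input to initialise the LazyLayers, then model = recursive_striplazy(model).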

@mcabbott force-pushed the autosize branch 2 times, most recently from 8cf6d0e to fcfb195 on October 7, 2022 03:34
@mcabbott marked this pull request as ready for review on October 7, 2022 04:18
docs/src/outputsize.md (review thread resolved)
src/outputsize.jl (review thread resolved)
@mcabbott merged commit 1ec32c2 into FluxML:master on Oct 10, 2022
@mcabbott deleted the autosize branch on October 10, 2022 18:34
Dense(_ => _÷4, relu, init=Flux.rand32), # can calculate output size _÷4
SkipConnection(Dense(_ => _, relu), +),
Dense(_ => 10),
) |> gpu # moves to GPU after initialisation
Member
@mcabbott I missed this in the review, but GPU tests are failing because the |> here is binding more tightly than @autosize, and thus the model isn't actually moved onto the GPU. Is there anything we can do about that other than adding more parens? Whatever changes are made would affect the @autosize docstring as well, since it shares this example.

Member Author

Oh no, I thought I checked this, sorry. It's not working, but the binding is what I expected:

julia> (@autosize (1,2) Dense(_, 3) |> f64).bias
3-element Vector{Float32}:
 0.0
 0.0
 0.0

julia> :(@autosize (1,2) Dense(_, 3) |> f64) |> dump
Expr
  head: Symbol macrocall
  args: Array{Any}((4,))
    1: Symbol @autosize
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[16]
    3: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Int64 1
        2: Int64 2
    4: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol |>
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol Dense
            2: Symbol _
            3: Int64 3
        3: Symbol f64

It seems the gpu walk is taking place too early. Maybe I never implemented my scheme to delay it.

Member Author

My scheme was, I think, aimed at adapt. There you can grab the function being mapped and replace the layer's maker function with one composed with it. But for Functors.jl I think that's impossible.
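Roughly, that adapt scheme might have looked like the sketch below; the field names str, make and layer are assumptions based on the Flux.LazyLayer("", x -> Dense(size(x,1) => 10), nothing) call quoted earlier, not confirmed API:

# Sketch: when an adapt-style function f meets an uninitialised LazyLayer,
# fold f into the maker so the adaptation happens at construction time;
# if the inner layer already exists, adapt it directly.
delayed_adapt(f, l::LazyLayer) =
    l.layer === nothing ? LazyLayer(l.str, f ∘ l.make, nothing) :
                          LazyLayer(l.str, l.make, f(l.layer))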

So we should just make it an error. And remove this use from the docs. Call gpu once it's done, i.e. with brackets.
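For reference, a hedged sketch of the bracketed form: the extra parentheses end the macro call, so f64 (or gpu) is applied to the finished model rather than being captured as part of the macro's arguments:

julia> ((@autosize (1, 2) Dense(_, 3)) |> f64).bias
3-element Vector{Float64}:
 0.0
 0.0
 0.0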

Member
Sounds good to me.
