
Add testmode! back for normalization layers #1044

Merged
merged 11 commits into from
Mar 1, 2020

Conversation

darsnack
Member

Fixes #909

I added testmode!(m, mode) back to Flux as per v0.9. Now the mode can be false, true, or :auto/nothing with the default being :auto for newly constructed layers. In :auto mode, the istraining() functions added in v0.10 are used to determine whether we are evaluating within an AD trace or not.

I also plan to add a docs section in an additional commit.
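As a rough sketch of the semantics described above (not the actual Flux implementation; `MyDropout`, `_isactive`, and the stand-in `istraining` are hypothetical names used only for illustration):

```julia
# Hypothetical sketch of the testmode! semantics described above; MyDropout,
# _isactive, and this stand-in istraining are illustrative, not Flux's code.
istraining() = false   # Flux's real version checks for an active AD trace

mutable struct MyDropout
    p::Float64
    active::Union{Bool,Nothing}   # nothing means :auto
end
MyDropout(p) = MyDropout(p, nothing)   # newly constructed layers default to :auto

# The layer trains when active == true; in :auto mode it defers to istraining()
_isactive(m) = m.active === nothing ? istraining() : m.active

function testmode!(m::MyDropout, mode = true)
    # mode = true forces test mode, false forces train mode,
    # :auto / nothing restores the automatic behavior
    m.active = (mode === :auto || mode === nothing) ? nothing : !mode
    return m
end
```

With `active === nothing` the layer defers to `istraining()`, so ordinary use is unaffected until someone explicitly forces a mode.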

@CarloLucibello
Member

CarloLucibello commented Feb 21, 2020

I'm dubious about defaulting to :auto in testmode!(m, mode=:auto) because it would be more natural
to interpret the statement testmode!(m) as "set the model in test mode regardless of anything else".
Viable alternatives for the signature are

testmode!(m, mode=true)   # set to test mode when called with 1 arg
testmode!(m, mode)        # force the caller to supply the mode

@CarloLucibello
Member

btw, rebase has gone wrong here

@darsnack
Member Author

I'm dubious about defaulting to :auto in testmode!(m, mode=:auto) because it would be more natural
to interpret the statement testmode!(m) as "set the model in test mode regardless of anything else".
Viable alternatives for the signature are

testmode!(m, mode=true)   # set to test mode when called with 1 arg
testmode!(m, mode)        # force the caller to supply the mode

Yeah I agree. I'll change that.

btw, rebase has gone wrong here

Any suggestions on how to resolve? I see two options — wait until the other PR I have outstanding has merged into master, or roll back the commits above then try to rebase from upstream. I haven't done the latter before, but I can figure it out. The weird commit history is why I set this PR as a draft.

@CarloLucibello
Member

CarloLucibello commented Feb 21, 2020

your other PR is orthogonal to this, so you could have just branched from current master without any fear about the merging order.

I don't know what would be the best way to fix this. What I would do, since the PR is small, would be something a little bit dirty:

# save the changes from this PR in some untracked file
git checkout master
git branch -D mybranch     # delete the branch locally
git checkout -b mybranch
# apply and commit the changes
git push origin +mybranch  # force-push the changes here

@darsnack
Member Author

I ended up rebasing from before the other PR. The commit history should be clean now.

@darsnack darsnack marked this pull request as ready for review February 25, 2020 19:55
@darsnack darsnack closed this Feb 26, 2020
@darsnack darsnack reopened this Feb 26, 2020
@CarloLucibello
Member

@dhairyagandhi96 @MikeInnes @xukai92 any comments on this? This is non-breaking, but once it gets in, reverting would be breaking.

@CarloLucibello
Member

@darsnack should we also add trainmode! for convenience?

@darsnack
Member Author

@darsnack should we also add trainmode! for convenience?

Are you thinking of something like trainmode!(m) = testmode!(m, false)? Because I think having both trainmode!(m, mode) and testmode!(m, mode) might be redundant/confusing (i.e. two functions with similar signatures that serve the same purpose except the boolean argument is reversed).

@darsnack
Member Author

I did update testmode! to return the model, removing the need for trainmode in the tests.

@CarloLucibello
Member

CarloLucibello commented Mar 1, 2020

Are you thinking of something like trainmode!(m) = testmode!(m, false)? Because I think having both trainmode!(m, mode) and testmode!(m, mode) might be redundant/confusing (i.e. two functions with similar signatures that serve the same purpose except the boolean argument is reversed).

yeah, I was thinking

trainmode!(m, mode=true) = mode isa Bool ?  testmode!(m, !mode) : testmode!(m, mode) 

I know it's redundant, but it seems more natural not to arbitrarily break the symmetry with testmode! in the interface. This way, one would do testmode!(m) while testing and trainmode!(m) while training. I don't know, it feels a bit more natural to me, but I don't have a strong opinion; we could just go with testmode! if you think that's better.

@CarloLucibello
Member

Also, we should warn in the docstring to set the model back to train mode once the test phase is over, or something along those lines.

[Review thread on src/functor.jl (resolved)]
@CarloLucibello
Member

ok, looks good, we can merge whenever you are ready. If you also bump Flux's minor version in Project.toml, we can proceed with tagging a new release

@darsnack
Member Author

darsnack commented Mar 1, 2020

I think this is ready to merge

@CarloLucibello
Member

wait, it looks like the version was already bumped, sorry, could you change it back?

@darsnack
Member Author

darsnack commented Mar 1, 2020

Oops, I should probably have double-checked that after merging. Should be good to go again.

@CarloLucibello
Member

nice, thanks!

bors r+

@bors
Contributor

bors bot commented Mar 1, 2020

Build succeeded

@bors bors bot merged commit 069d228 into FluxML:master Mar 1, 2020
@darsnack darsnack deleted the feature/istraining branch March 1, 2020 20:01
@anoojpatel

This is awesome! Could this be tagged? I would love using this as soon as possible! :)

@darsnack
Member Author

darsnack commented Mar 3, 2020

This turns out to be breaking for cases where people instantiate a layer by manually specifying the fields instead of using the outer constructor. (e.g. ObjectDetector.jl)

@DhairyaLGandhi
Member

I would've thought that simply defining the istraining function would be sufficient?

@CarloLucibello
Member

This turns out to be breaking for cases where people instantiate a layer by manually specifying the fields instead of using the outer constructor. (e.g. ObjectDetector.jl)

ouch

@IanButterworth
Contributor

@darsnack
Member Author

darsnack commented Mar 3, 2020

I would've thought that simply defining the istraining function would be sufficient?

Sufficient for the functionality added in this PR? I don't follow how that allows users to force train/test mode on a per layer basis.

@DhairyaLGandhi
Member

That would prevent the breakages too, I think

@IanButterworth
Contributor

If the usage in ObjectDetector is unusual, I'd happily take a PR to correct it

@DhairyaLGandhi
Member

for freezing layers, the recommended way would be via the parameters anyway, so I think it's an orthogonal concern, unless that's specifically desired

@darsnack
Member Author

darsnack commented Mar 3, 2020

If the usage in ObjectDetector is unusual, I'd happily take a PR to correct it

Even if it is unusual, adding fields to a type is a breaking change unless the field defaults to a value. @ianshmean your PR should have been part of this one — that was my bad.

for freezing layers, the recommended way would be via the parameters anyway, so I think it's an orthogonal concern, unless that's specifically desired

The normalization layers update some fields on the forward pass when part of an AD trace. I don't think freezing by excluding the parameters will stop this update. This is standard, but v0.10 made the change of automatically deciding whether to update these fields or not. There are use cases outside standard training where the fields should not be updated even though a gradient is being computed.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Mar 3, 2020

The parameters that we collect are the ones we update, I believe

@darsnack
Member Author

darsnack commented Mar 3, 2020

I think I am missing something. If you could explain how to address #909 with the istraining and parameter freezing, then I am happy to work on a PR for a downstream major version change.

@MikeInnes
Member

I suggest reverting this and working on it some more. I have a bunch of concerns about the API, and the code is pretty strange (e.g. why are there a bunch of additions to functor.jl when this has nothing to do with the functor impl? If anything this should use functor because then it'd work with custom layers etc.)

@darsnack
Member Author

I added the generic functions that work on any layer to functor.jl. Seemed like it had parallels organizationally to trainable, but I could have put it in layers/basic.jl.

If anything this should use functor because then it'd work with custom layers etc.

By this do you mean freezing via trainable like @dhairyagandhi96 suggested? I'm sorry but I still don't understand this route. (I'm just going to talk myself through an example). If we take BatchNorm, we have the following line (L171):

trainable(bn::BatchNorm) = (bn.β, bn.γ)

This affects what gets updated during the gradient step. I don't see how that affects L189-190:

μ = mean(x, dims = axes)
σ² = sum((x .- μ) .^ 2, dims = axes) ./ m

These are recomputed every pass that is contained within an AD trace regardless of what trainable(bn::BatchNorm) returns. In contrast, when the layer is in test mode, L183-184 don't recompute the mean and variance:

μ = reshape(BN.μ, affine_shape...)
σ² = reshape(BN.σ², affine_shape...)

In my mind, latching onto functor/trainable means rewriting the normalization layers so that all updates are part of the gradient computation (i.e. custom adjoints).
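To make this concrete, here is a deliberately simplified sketch (the `SimpleBN` type is hypothetical, not Flux's `BatchNorm`): the running-statistics mutation happens inside the forward pass and is gated only by the mode flag, so whatever `trainable` returns never touches it.

```julia
# Simplified sketch (not Flux's BatchNorm) showing why trainable does not
# control the running-statistics update: the update happens in the forward
# pass, gated only by the train/test mode flag.
mutable struct SimpleBN
    μ::Vector{Float64}    # running mean
    σ²::Vector{Float64}   # running variance
    momentum::Float64
    active::Bool          # true = training mode
end

function (bn::SimpleBN)(x::AbstractMatrix)   # features × batch
    if bn.active
        # batch statistics, recomputed on every training-mode forward pass
        μ  = vec(sum(x, dims = 2)) ./ size(x, 2)
        σ² = vec(sum((x .- μ) .^ 2, dims = 2)) ./ size(x, 2)
        # mutation of non-trainable fields, independent of any gradient step
        bn.μ  .= (1 - bn.momentum) .* bn.μ  .+ bn.momentum .* μ
        bn.σ² .= (1 - bn.momentum) .* bn.σ² .+ bn.momentum .* σ²
    else
        # test mode: reuse the stored running statistics, no mutation at all
        μ, σ² = bn.μ, bn.σ²
    end
    return (x .- μ) ./ sqrt.(σ² .+ 1e-5)
end
```

Excluding `bn.μ` and `bn.σ²` from `trainable` keeps the optimizer away from them, but the `bn.μ .= …` lines above still run on every training-mode forward pass.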

@MikeInnes
Member

The fmap would not be for the normalisation layers themselves but for larger models that have normalisation inside them – like Chain. You've made Chain a special case, but if you use fmap instead you could apply testmode! to any model, including user-defined structs, without defining a method for each one specifically.
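A sketch of that idea, with a tiny stand-in for Functors-style recursion (the `Norm`, `Model`, and `children` definitions here are illustrative, not Flux's API):

```julia
# Stand-ins illustrating the structural recursion described above; none of
# this is Flux's actual code. children mimics Functors.jl's notion of a
# node's sub-layers.
mutable struct Norm
    active::Union{Bool,Nothing}
end

struct Model              # a user-defined container with no special support
    norm::Norm
    layers::Tuple
end

# A tiny Functors-like children: leaves have none, containers expose fields.
children(x) = ()
children(t::Tuple) = t
children(m::Model) = (m.norm, m.layers)

# Generic fallback recurses structurally; only Norm actually flips a flag.
testmode!(m, mode = true) = (foreach(c -> testmode!(c, mode), children(m)); m)
testmode!(m::Norm, mode = true) = (m.active = mode === :auto ? nothing : !mode; m)
```

With this pattern a user-defined struct only needs to participate in `children` (in Flux's case, `@functor`); it never has to define its own `testmode!` method.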

bors bot added a commit that referenced this pull request Dec 24, 2020
1432: Generalize train/testmode! to all Functors r=CarloLucibello a=ToucheSir

Addresses #1044 (comment). See also https://discourse.julialang.org/t/do-i-have-to-implement-flux-testmode-for-my-own-models/52038.

### PR Checklist

- [x] Tests are added
- [ ] Entry in NEWS.md
- [ ] Final review from `@dhairyagandhi96` (for API changes).


Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
Successfully merging this pull request may close these issues.

Limitation of Flux.istraining()
6 participants