freezingdocs #2397
base: master
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@ Coverage Diff @@
## master #2397 +/- ##
===========================================
+ Coverage 43.04% 73.94% +30.89%
===========================================
Files 32 32
Lines 1856 1911 +55
===========================================
+ Hits 799 1413 +614
+ Misses 1057 498 -559

☔ View full report in Codecov by Sentry.
## Static freezing per model definition
Sometimes parts of a model ([`Flux.@layer`](@ref)) need not be trained at all, but their parameters
must still reside on the GPU because they are used in the forward
and/or backward pass.
```julia
struct MaskedLayer{T}
  chain::Chain
  mask::T
end
Flux.@layer MaskedLayer trainable=(chain,)
# the `mask` field will not be updated in the training loop

function (m::MaskedLayer)(x)
  # the `mask` field will still move to the GPU for efficient operations:
  return m.chain(x) + x + m.mask
end

model = MaskedLayer(...) # this model will not have the `mask` field trained
```
Note that this method permanently excludes the chosen fields from
training; they cannot be toggled on the fly.

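As a quick check of the behaviour described above, the following sketch runs one optimisation step on the CPU and compares the `mask` before and after. The layer sizes, input values, and the `Adam(0.1)` learning rate are illustrative assumptions, not part of the original docs, and the snippet assumes a Flux release that provides `Flux.@layer` and `Flux.setup`.

```julia
using Flux

struct MaskedLayer{T}
  chain::Chain
  mask::T
end
Flux.@layer MaskedLayer trainable=(chain,)

(m::MaskedLayer)(x) = m.chain(x) + x + m.mask

# Hypothetical sizes and values, chosen only for illustration.
model = MaskedLayer(Chain(Dense(2 => 2)), Float32[1, 0])
x = Float32[0.5, -0.5]

# Only `chain` is reported as trainable.
trainable_fields = keys(Flux.trainable(model))

# One optimisation step: a gradient for `mask` is computed,
# but the optimiser never applies it.
mask_before = copy(model.mask)
weight_before = copy(model.chain[1].weight)
opt_state = Flux.setup(Adam(0.1), model)
grads = Flux.gradient(m -> sum(m(x)), model)
Flux.update!(opt_state, model, grads[1])

mask_unchanged = model.mask == mask_before
weights_changed = model.chain[1].weight != weight_before
```

If the layer is declared as shown, `mask_unchanged` and `weights_changed` should both come out `true`, which is what "static" freezing means here.
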
## Excluding from model definition
Sometimes parameters are not merely "not trainable": they should not even be
transferred to the GPU (or be part of the functor). Scalar fields behave this way
by default, so values such as learning-rate multipliers are neither trained nor
transferred to the GPU.
```julia
using Statistics: mean

struct CustomLayer{F}
  chain::Chain
  activation_results::Vector{F}
  lr_multiplier::Float32
end
Flux.@functor CustomLayer (chain, ) # Explicitly leaving out `activation_results`

function (m::CustomLayer)(x)
  result = m.chain(x) + x

  # `activation_results` is not part of the GPU loop, hence we can do
  # things like `push!`
  push!(m.activation_results, mean(result))
  return result
end
```
See more about this in [`Flux.@functor`](@ref).
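
The effect of leaving a field out of the functor can be observed on the CPU as well. The sketch below uses `f64` as a stand-in for `gpu`, on the assumption that both traverse the model through the same functor machinery; the layer sizes and values are hypothetical, and the snippet assumes a Flux release in which `Flux.@functor` accepts a field tuple (newer releases may deprecate it in favour of `Flux.@layer`).

```julia
using Flux
using Statistics: mean

struct CustomLayer{F}
  chain::Chain
  activation_results::Vector{F}
  lr_multiplier::Float32
end
Flux.@functor CustomLayer (chain,)  # only `chain` is visible to the functor

function (m::CustomLayer)(x)
  result = m.chain(x) + x
  push!(m.activation_results, mean(result))  # plain CPU-side bookkeeping
  return result
end

# Hypothetical sizes and values, chosen only for illustration.
layer = CustomLayer(Chain(Dense(2 => 2)), Float32[], 0.5f0)
layer(Float32[1, 2])
n_recorded = length(layer.activation_results)

# Convert the model to Float64: only fields inside the functor are visited.
layer64 = f64(layer)
chain_eltype = eltype(layer64.chain[1].weight)
results_eltype = eltype(layer64.activation_results)
```

Under these assumptions the weights inside `chain` are converted while `activation_results` keeps its original element type, mirroring how an excluded field would stay behind when the model is moved to the GPU.
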
I think this is great. I also wonder if it'd belong better in `advanced.md`. This docs page can then have a big, prominent link to the relevant sections in that file with a one- or two-sentence explanation to look there for methods of including and excluding parts of the model at definition time.
To elaborate, I still think there is utility in separating out concepts and tools which apply at definition time vs those which are dynamic and apply at "runtime". That's not to say they can't be mentioned on the same page, but they should be introduced separately. One way to bridge the gap might be to retain reduced code snippets on this page alongside the link(s) I mentioned earlier. e.g.
# struct CustomLayer chain; activation_results; end
Flux.@functor CustomLayer (chain, ) # Explicitly leaving out `activation_results`
Rewrite of the previous pull request #2385, god I hate git
@ToucheSir