freezingdocs #2397

Open · wants to merge 1 commit into master
Conversation

@isentropic commented Mar 13, 2024

Rewrite of the previous pull #2385, god I hate git.
@ToucheSir

codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.94%. Comparing base (5f84b68) to head (d3b800b).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2397       +/-   ##
===========================================
+ Coverage   43.04%   73.94%   +30.89%     
===========================================
  Files          32       32               
  Lines        1856     1911       +55     
===========================================
+ Hits          799     1413      +614     
+ Misses       1057      498      -559     


Comment on lines +45 to +89
## Static freezing in the model definition
Sometimes parts of a model ([`Flux.@layer`](@ref)) need not be trained at all, but their
parameters should still reside on the GPU, because they are still used in the forward
and/or backward pass.
```julia
struct MaskedLayer{T}
  chain::Chain
  mask::T
end
Flux.@layer MaskedLayer trainable=(chain,)
# the `mask` field will not be updated in the training loop

function (m::MaskedLayer)(x)
  # the `mask` field is still moved to the GPU for efficient operations:
  return m.chain(x) + x + m.mask
end

model = MaskedLayer(...) # this model will not have the `mask` field trained
```
Note that this approach permanently excludes the listed fields from training at model
definition time; it cannot be toggled on the fly.
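
To see the effect, here is a minimal sketch (assuming the `MaskedLayer` definition above;
the concrete layer sizes are made up for illustration): only `chain` is reported as
trainable, so the optimiser state built by `Flux.setup` never touches `mask`.
```julia
using Flux

m = MaskedLayer(Chain(Dense(3 => 3)), rand(Float32, 3))

# Only `chain` appears among the trainable children; `mask` is excluded:
Flux.trainable(m)   # (chain = Chain(Dense(3 => 3)),)

# Optimiser state is therefore created for `chain` only, so subsequent
# `Flux.update!` calls leave `mask` untouched:
opt_state = Flux.setup(Adam(), m)
```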

## Excluding fields from the model definition
Sometimes parameters are not merely "not trainable": they should not be transferred to
the GPU (or be part of the functor) at all. All scalar fields behave like this by
default, so things like learning-rate multipliers are neither trained nor moved to the
GPU.
```julia
using Statistics: mean

struct CustomLayer{F}
  chain::Chain
  activation_results::Vector{F}
  lr_multiplier::Float32
end
Flux.@functor CustomLayer (chain,) # explicitly leave out `activation_results`

function (m::CustomLayer)(x)
  result = m.chain(x) + x

  # `activation_results` is not part of the functor, so it stays on the CPU
  # and we can freely mutate it, e.g. with `push!`:
  push!(m.activation_results, mean(result))
  return result
end
```
See [`Flux.@functor`](@ref) for more details.
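
As a quick illustration (a hedged sketch assuming the `CustomLayer` definition above and
a working GPU setup; the field values are made up), `gpu` recurses only into the declared
functor children, so `activation_results` and `lr_multiplier` stay on the CPU:
```julia
using Flux

layer = CustomLayer(Chain(Dense(4 => 4)), Float32[], 0.1f0)

# `gpu` maps over the declared functor children only, i.e. `chain`;
# `activation_results` remains an ordinary CPU `Vector{Float32}` and
# `lr_multiplier` remains a plain scalar:
gpu_layer = gpu(layer)
```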
Member

I think this is great. I also wonder if it'd belong better in advanced.md. This docs page can then have a big, prominent link to the relevant sections in that file with a one or two-sentence explanation to look there for methods of including and excluding parts of the model at definition time.

To elaborate, I still think there is utility in separating out concepts and tools which apply at definition time vs those which are dynamic and apply at "runtime". That's not to say they can't be mentioned on the same page, but they should be introduced separately. One way to bridge the gap might be to retain reduced code snippets on this page alongside the link(s) I mentioned earlier. e.g.

```julia
# struct CustomLayer chain; activation_results; end
Flux.@functor CustomLayer (chain,) # Explicitly leaving out `activation_results`
```
