freezingdocs #2397

isentropic · 2024-03-13T05:07:48Z

Rewrite of the previous PR #2385, god I hate git
@ToucheSir

codecov · 2024-03-13T05:26:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.94%. Comparing base (5f84b68) to head (d3b800b).

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #2397       +/-   ##
===========================================
+ Coverage   43.04%   73.94%   +30.89%     
===========================================
  Files          32       32               
  Lines        1856     1911       +55     
===========================================
+ Hits          799     1413      +614     
+ Misses       1057      498      -559


ToucheSir · 2024-03-15T03:23:44Z

docs/src/tutorials/misc-model-tweaking.md

+## Static freezing per model definition
+Sometimes parts of a model (see [`Flux.@layer`](@ref)) need not be trained at all, but their parameters
+still need to reside on the GPU because they are used in the forward
+and/or backward pass.
+```julia
+using Flux
+
+struct MaskedLayer{T}
+    chain::Chain
+    mask::T
+end
+Flux.@layer MaskedLayer trainable=(chain,)
+# the `mask` field will not be updated in the training loop
+
+function (m::MaskedLayer)(x)
+    # the `mask` field still moves to the GPU for efficient operations:
+    return m.chain(x) + x + m.mask
+end
+
+model = MaskedLayer(...) # this model will not have the `mask` field trained
+```
+Note that this method excludes these fields from training permanently; the
+exclusion cannot be toggled on the fly.
+
+## Excluding fields from the model definition
+Sometimes parameters aren't just "not trainable": they shouldn't even be
+transferred to the GPU (or be part of the functor) at all. Scalar fields already
+behave like this by default: things like learning-rate multipliers are neither
+trainable nor transferred to the GPU.
+```julia
+using Flux, Statistics  # `mean` comes from Statistics
+
+struct CustomLayer{F}
+    chain::Chain
+    activation_results::Vector{F}
+    lr_multiplier::Float32
+end
+Flux.@functor CustomLayer (chain,) # explicitly leaving out `activation_results` (and `lr_multiplier`)
+
+function (m::CustomLayer)(x)
+    result = m.chain(x) + x
+
+    # `activation_results` is not part of the functor, so it stays on the CPU
+    # and can be mutated with things like `push!`
+    push!(m.activation_results, mean(result))
+    return result
+end
+```
+See more about this in [`Flux.@functor`](@ref).


I think this is great. I also wonder if it would fit better in advanced.md. This docs page could then have a big, prominent link to the relevant sections in that file, with a one- or two-sentence explanation pointing readers there for ways to include and exclude parts of the model at definition time.

To elaborate, I still think there is utility in separating out concepts and tools which apply at definition time vs those which are dynamic and apply at "runtime". That's not to say they can't be mentioned on the same page, but they should be introduced separately. One way to bridge the gap might be to retain reduced code snippets on this page alongside the link(s) I mentioned earlier. e.g.

```julia
# struct CustomLayer chain; activation_results; end
Flux.@functor CustomLayer (chain, ) # Explicitly leaving out `activation_results`
```
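A minimal sketch of how the two definition-time mechanisms from the diff behave, runnable on the CPU. It assumes a Flux version that provides both `Flux.@layer` (with the `trainable=` option) and `Flux.@functor` (0.14.x at the time of this PR). `f64` stands in for `gpu` here because both walk the model through its functor definition; the layer sizes and values are arbitrary illustrations.

```julia
using Flux, Statistics

# Mechanism 1: `@layer ... trainable=(chain,)` keeps `mask` in the functor
# (so it still moves/converts with the model) but hides it from the optimiser.
struct MaskedLayer{T}
    chain::Chain
    mask::T
end
Flux.@layer MaskedLayer trainable=(chain,)

(m::MaskedLayer)(x) = m.chain(x) .+ x .+ m.mask

masked = MaskedLayer(Chain(Dense(3 => 3)), ones(Float32, 3))

# Only `chain` is reported as trainable, so the optimiser never updates `mask`.
@show keys(Flux.trainable(masked))

# `mask` is still part of the functor, so functor-based conversion reaches it.
masked64 = f64(masked)
@show eltype(masked64.mask)

# Mechanism 2: `@functor ... (chain,)` drops the other fields from the functor
# entirely, so they are neither trained nor converted/moved.
struct CustomLayer{F}
    chain::Chain
    activation_results::Vector{F}
    lr_multiplier::Float32
end
Flux.@functor CustomLayer (chain,)

function (m::CustomLayer)(x)
    result = m.chain(x) .+ x
    push!(m.activation_results, mean(result))  # plain CPU-side logging
    return result
end

custom = CustomLayer(Chain(Dense(3 => 3)), Float32[], 1f0)
custom(rand(Float32, 3))

custom64 = f64(custom)
@show eltype(custom64.chain[1].weight)     # converted: part of the functor
@show eltype(custom64.activation_results)  # untouched: excluded from the functor
```

The contrast is the point: a field hidden via `trainable=` is still traversed (and would still be moved by `gpu`), while a field left out of `@functor` is never visited at all.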

ToucheSir added the documentation label Mar 15, 2024