
How to apply L2 regularization to a subset of parameters? #1284

Closed · jonathan-laurent opened this issue Jul 18, 2020 · 5 comments · Fixed by #1444

@jonathan-laurent commented Jul 18, 2020

When training a neural network with L2 regularization, it is often advised not to regularize the bias parameters (in contrast with the weight parameters).

I implemented this as follows in AlphaZero.jl:

import Flux
import Functors

# Per-layer list of arrays to regularize: weight matrices/kernels only, never biases.
regularized_params_(l) = []
regularized_params_(l::Flux.Dense) = [l.W]
regularized_params_(l::Flux.Conv) = [l.weight]

# Apply `f` to every non-leaf node (i.e. every layer) of the model, visiting each node once.
function foreach_flux_node(f::Function, x, seen = IdDict())
  Functors.isleaf(x) && return
  haskey(seen, x) && return
  seen[x] = true
  f(x)
  for child in Flux.trainable(x)
    foreach_flux_node(f, child, seen)
  end
end

# Gather the regularized arrays of all layers into a Params collection, without duplicates.
function regularized_params(net::FluxNetwork)
  ps = Flux.Params()
  foreach_flux_node(net) do p
    for r in regularized_params_(p)
      any(x -> x === r, ps) || push!(ps, r)
    end
  end
  return ps
end

regularization_term(nn) = sum(sum(w .* w) for w in regularized_params(nn))
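
For reference, here is a hedged sketch of how this term is meant to enter a training loss; the λ value, the loss function, and calling nn directly on a batch are illustrative assumptions, not AlphaZero.jl's actual training code:

# Hedged usage sketch: λ, logitcrossentropy, and nn(x) are illustrative
# assumptions, not AlphaZero.jl's actual training code.
λ = 1e-4
loss(x, y) = Flux.logitcrossentropy(nn(x), y) + λ * regularization_term(nn)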

This feels a bit hackish, though, and because it relies on Flux internals it tends to break with every new release.
Do you see a better way? Shouldn't we make this easier?

@CarloLucibello (Member)

Maybe you can simplify this slightly by using delete! instead:

nonregularized_params_(net) = ....

function regularized_params(net::FluxNetwork)
  ps = Flux.params(net)
  for p in nonregularized_params_(net)
      delete!(ps, p)
  end
  return ps
end

Whether you push! or delete!, you can skip the presence check; it is done internally.
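
For illustration, a minimal sketch of that deduplication behaviour, assuming Zygote's Params compares entries by identity as described above:

# Minimal sketch: Params behaves like an identity-keyed set, so pushing the
# same array twice stores it only once (assumption: Zygote.Params semantics).
ps = Flux.Params()
w = randn(3, 3)
push!(ps, w)
push!(ps, w)          # already present, so this is a no-op
@assert length(ps) == 1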

@CarloLucibello (Member)

Maybe we could implement something similar to foreach_flux_node in Flux, like a modules function. What do you think?
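
For illustration, a hedged sketch of how such a function could shorten the original snippet; modules here is hypothetical (assumed to return every submodule of the model, layers included), and the regularized_params_ helpers are the ones defined above:

# Hedged sketch: `modules` is hypothetical (assumed to return every submodule
# of the model); it would replace the hand-written traversal above.
function regularized_params(net)
  ps = Flux.Params()
  for m in modules(net), w in regularized_params_(m)
    push!(ps, w)   # push! skips arrays that are already present
  end
  return ps
end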

@jonathan-laurent (Author)

I don't see how the delete! solution would work. Flux.params returns a flat collection of arrays, and there is no way to tell bias parameters from weight parameters among those arrays, right?

@CarloLucibello (Member)

Sorry, forget what I said, I was being stupid. Yes, I don't see how to simplify this, besides making a modules function part of Flux.
Another option for you would be to filter out AbstractVector parameters and keep only higher-dimensional arrays, but that would not play well with BatchNorm layers.
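
A hedged sketch of that filtering idea, keeping only parameters with more than one dimension; as noted above, this also drops BatchNorm's 1-D scale and shift parameters, which you may want to treat differently:

# Hedged sketch of the dimension-based filter: 1-D parameters (biases, but also
# BatchNorm's scale/shift vectors) are excluded from the penalty.
regularized_params(m) = [p for p in Flux.params(m) if ndims(p) > 1]
regularization_term(m) = sum(sum(abs2, p) for p in regularized_params(m))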

@DhairyaLGandhi (Member)

Currently there isn't a simple way to filter out biases specifically, and I can see this becoming a real need for bigger models. It will be a bit of a manual process with the current infrastructure, since we don't distinguish between weights and biases (both are simply assumed to be parameters), but defining a functor that splits these out would do the trick.
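
One way to read that suggestion is a per-layer dispatch that separates the two groups. A hedged sketch, where weight_bias_split is a hypothetical helper (not an existing Flux API) and the field names match the Flux version used in the snippet above:

# Hedged sketch: a hand-written split between weights and biases per layer type.
# `weight_bias_split` is a hypothetical helper, not part of Flux.
weight_bias_split(l::Flux.Dense) = (weights = (l.W,),      biases = (l.b,))
weight_bias_split(l::Flux.Conv)  = (weights = (l.weight,), biases = (l.bias,))
weight_bias_split(l)             = (weights = (),          biases = ())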
