Handle tied weights in `update!` #42

mcabbott · 2022-01-30T00:54:29Z

This changes `setup` to record the "address" of any tied weights, and changes `update!` to do three things: first add the gradient of the second weight of each tied pair to the first, then update as normal, and finally re-create the tie in the updated model.
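A minimal sketch of those three steps, for illustration only. It assumes a flat `NamedTuple` model, uses plain gradient descent as the rule, and writes ties as `second => first` pairs of field names. The function name `tied_descent` is hypothetical, and this is not the PR's code.

```julia
# Hypothetical sketch, not the PR's implementation: flat NamedTuple model,
# plain gradient descent as the rule, ties given as `second => first`.
function tied_descent(model::NamedTuple, grad::NamedTuple, ties, η)
    g = Dict{Symbol,Any}(pairs(grad))
    # 1. add the gradient of the second of each pair to the first
    for (second, first_) in ties
        g[first_] = g[first_] .+ g[second]
    end
    # 2. update every parameter as normal
    new = Dict{Symbol,Any}(k => v .- η .* g[k] for (k, v) in pairs(model))
    # 3. re-create the tie: the second becomes the (updated) first
    for (second, first_) in ties
        new[second] = new[first_]
    end
    return NamedTuple{keys(model)}(Tuple(new[k] for k in keys(model)))
end

model = (a = [1.0, 2.0], b = [1.0, 2.0], c = [0.5])
grad  = (a = [0.1, 0.1], b = [0.3, 0.3], c = [1.0])
out = tied_descent(model, grad, [:b => :a], 0.1)
```

After the step, `out.b` is the very same array as `out.a`, so the tie survives into the updated model.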

I've tried to match the existing `update!` as closely as possible. The "address" is stored simply as a tuple of property names. The function to "pick out" a gradient component using this address is easy. The one to "place back" the modified component is trickier, as it needs to re-create missing branches, and not just the minimal branch but all the other empty fields too. So it walks the model in parallel with the gradient, much like the existing `update!`. This isn't something that e.g. Setfield.jl thinks about.
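A rough sketch of the pick / place pair. It assumes addresses are tuples of property names, that gradients use Zygote's convention of `nothing` for a missing branch, and that the model is built from `NamedTuple`s only (tuple models would need integer-key handling). The names follow the description above, but these signatures are hypothetical.

```julia
# pick: follow the address through the gradient; `nothing` means "no gradient here".
function pick(g, addr::Tuple)
    isempty(addr) && return g
    g === nothing && return nothing
    return pick(getproperty(g, addr[1]), Base.tail(addr))
end

# place: return a new gradient with `val` at `addr`. Walks the model in parallel,
# so a missing (`nothing`) branch is rebuilt with `nothing` in every sibling field.
function place(model, g, addr::Tuple, val)
    isempty(addr) && return val
    k = addr[1]
    names = propertynames(model)
    children = map(names) do n
        old = g === nothing ? nothing : getproperty(g, n)
        n == k ? place(getproperty(model, n), old, Base.tail(addr), val) : old
    end
    return NamedTuple{names}(children)
end

model = (layer1 = (W = [1.0 2.0], b = [0.0]), layer2 = (W = [3.0 4.0], b = [0.0]))
grad  = (layer1 = (W = [1.0 1.0], b = [1.0]), layer2 = nothing)
g2 = place(model, grad, (:layer2, :W), [5.0 5.0])
```

Here `g2.layer2` becomes `(W = [5.0 5.0], b = nothing)`: the whole branch is re-created, not just the single field.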

Maybe this could all be abstracted away somehow and moved up into Functors? We seem to need quite a few patterns which aren't like `fmap`.

Surely the "address" could be stored in a more compile-away-able format, à la Setfield.jl. None of this is type-stable, and it takes a few μs. I think that's true of everything in Functors.jl too. And that might even be desirable, for startup speed with deeply nested models.

~~This is on top of #41.~~ Uses FluxML/Functors.jl#33, which is copied in here for now.

ToucheSir · 2022-01-30T03:47:55Z

Following on from our discussion on design, how does composition of states work with this approach? Since ties, `pick` and `place` work on absolute addresses from the tree root, it's not clear to me how to keep things in sync when optimizing a subset (e.g. gradual unfreezing) or a superset (e.g. adding a classifier head and fine-tuning) of the original state.

mcabbott · 2022-01-30T04:33:42Z

Good question. These addresses are all relative to the root. You can't accidentally take a branch of the state and model and lose them, since the outermost `Tree` type means the types won't line up. If instead you pasted the model and state, whole, into some larger model's pair, I bet they would just work? Unless doing that created further ties which neither half saw before.

Explicitly messing with the state tree by hand is obviously at your own risk. Better tools for explicit freezing etc. would, I guess, need to interact with this, perhaps by re-running the check for ties which `setup` does here?

mcabbott · 2022-01-30T04:43:52Z

src/interface.jl

+function setup(rule, x; ties = Pair[], cache = IdDict())
+  tree = _setup(rule, x, (); ties, cache)
+  isempty(ties) ? tree : Tree(ties, tree)
+end


Mostly by accident, this method also lets you specify ties explicitly, in cases where they wouldn't be inferred:

julia> using StaticArrays

julia> st = Optimisers.setup(Descent(-0.01), (SA[1,2], SA[3,4]), ties = [(2,) => (1,)])
Optimisers.Tree([(2,) => (1,)], (Leaf(Descent{Float64}(-0.01), nothing), Leaf(Descent{Float64}(-0.01), nothing)))

julia> Optimisers.update!(st, (SA[1,2], SA[3,4]), (SA[5,6], SA[7,8]))
(Optimisers.Tree([(2,) => (1,)], (Leaf(Descent{Float64}(-0.01), nothing), Leaf(Descent{Float64}(-0.01), nothing))), ([1.12, 2.14], [1.12, 2.14]))
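The numbers in that output can be checked by hand. This assumes `Descent(η)` steps as `x .- η .* dx`, so the negative rate `-0.01` moves up the gradient. The tie adds the second gradient to the first, and the second array is then replaced by the updated first, so its original value `[3, 4]` plays no role:

```julia
η = -0.01
x1, x2 = [1, 2], [3, 4]
dx1, dx2 = [5, 6], [7, 8]
# tie (2,) => (1,): the gradient of the second is added to the first
dx = dx1 .+ dx2
# Descent-style step on the first array only
new1 = x1 .- η .* dx
# re-tie: the second simply becomes the updated first
new2 = new1
```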

darsnack · 2022-01-31T22:07:10Z

Had time to go through this and FluxML/Functors.jl#33. I'm happy with both PRs. I had one bikeshed comment, but after reading the discussion, I think it's relevant too.

Should `Tied` be something broader, like `InitializedTree`? Something to indicate that this is a tree plus auxiliary information. In the future we might consider adding different kinds of auxiliary information. Going over the two cases above:

- Doing a superset like backbone + classifier is probably simple enough: just splat the addresses correctly.
- More complicated is gradual (un)freezing, where branches come in and out. In this case, I think it's easier to keep the trees complete and include auxiliary information that says "this branch is frozen, don't recurse or update." This also opens this piece of the design up so that other packages can write iterative freezing algorithms, and all they have to do is use utilities like `fsimilar`, etc.
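One reading of "splat the addresses" for a backbone + classifier superset: if a backbone's state is grafted in as field `:backbone` of a larger model, every root-relative address can be prefixed with that key. `reroot` is a hypothetical helper, and it cannot discover any new ties between the two halves, which is the caveat raised earlier in the thread.

```julia
# Ties recorded for the backbone alone, written as second => first.
backbone_ties = [(:decoder, :W) => (:encoder, :W)]

# Hypothetical helper: make every address relative to the new, larger root.
reroot(ties, prefix::Tuple) = [(prefix..., a...) => (prefix..., b...) for (a, b) in ties]

full_ties = reroot(backbone_ties, (:backbone,))
```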

ToucheSir · 2022-02-01T06:08:09Z

I have to admit the implementation is a tad brain-warping, though all that complexity is necessary so there's not much more to say. I agree with Kyle that we should give this a go. If some unforeseen case rears its ugly head then hey, Optimisers.jl isn't anywhere near stable yet and the tied logic is sufficiently decoupled from the core Optimisers functionality.

mcabbott · 2022-02-01T13:16:08Z

Re interaction with other things:

- Storing this information in some struct at the base of the state tree alters the tree's shape, and means that anything else walking the tree needs to know about this storage struct, which has no parallel in the model. Perhaps that's an argument against this way of doing it. But storing tie information (say) at the first leaf of each pair instead seems tricky too: would the address stored there still be relative to the root? That seems more dangerous, as grafting branches would make it wrong.
- To freeze some weights, there's a similar question of where to store the information. It could also be stored at the base (either by expanding `Tied` to have two jobs, or by composing two such things, each with their own `update!` dispatch), but such base storage would need updating on any graft / backbone + classifier operation. It could be stored at the leaf, e.g. by adding a "freeze" flag to `Leaf`. Or it could be stored by inserting a struct at the base of the sub-tree which is entirely frozen. My guess is that per-leaf storage is in fact the simplest. Unlike ties, you aren't forced to think about the "address" of particular leaves.

Edit: #49 is a sketch of how freezing at the leaf would look.
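A minimal sketch of per-leaf freezing, assuming the flag lives on each leaf and the walk simply skips frozen leaves. `SketchLeaf`, `SketchDescent` and `tree_update` are hypothetical stand-ins, not Optimisers.jl types, and only `NamedTuple` trees are handled:

```julia
# Hypothetical rule: plain gradient descent with learning rate `eta`.
struct SketchDescent
    eta::Float64
end

# Hypothetical leaf: rule, optimiser state, and a per-leaf freeze flag.
struct SketchLeaf{R,S}
    rule::R
    state::S
    frozen::Bool
end

# At a leaf: frozen leaves return the parameter untouched.
function tree_update(leaf::SketchLeaf, x, dx)
    leaf.frozen && return (leaf, x)
    return (leaf, x .- leaf.rule.eta .* dx)
end

# At a branch: walk state tree, model and gradient in parallel.
function tree_update(tree::NamedTuple, x::NamedTuple, dx::NamedTuple)
    ks = keys(tree)
    res = map(k -> tree_update(tree[k], x[k], dx[k]), ks)
    return NamedTuple{ks}(map(first, res)), NamedTuple{ks}(map(last, res))
end

rule = SketchDescent(0.1)
tree = (enc = SketchLeaf(rule, nothing, true), head = SketchLeaf(rule, nothing, false))
x  = (enc = [1.0], head = [1.0])
dx = (enc = [1.0], head = [1.0])
newtree, newx = tree_update(tree, x, dx)
```

Because the flag sits on the leaf, no root-relative address is involved, which is the advantage noted above.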


ToucheSir · 2022-02-09T06:14:46Z

Anything you want to add here before we consider letting users have at it?

mcabbott · 2022-02-09T12:26:44Z

~~It still copies in most of FluxML/Functors.jl#33, here: https://github.com/FluxML/Optimisers.jl/pull/42/files#diff-b000beedbfab2d74910823607528b8c42d0ce41a9955e02a3e572e5d40a9762bR14-R36~~ (Now resolved.)

mcabbott · 2022-02-09T12:33:38Z

src/interface.jl

+function setup(rule, x; ties = Pair[], cache = IdDict())
+  tree = _setup(rule, x, (); ties, cache)
+  isempty(ties) ? tree : Tied(ties, tree)
+end
+
+function _setup(rule, x, addr; ties, cache)
+  usecache = !isbits(x) && cache !== false


At present this does not use the cache for `isbits` values, and `cache = false` disables it completely. Ideally this would be harmonised with Functors.
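A likely reason for skipping the cache for `isbits` values, offered as an interpretation: an `IdDict` compares keys with `===`, and two equal `isbits` values are `===` even when they are independent parameters, so identity-based tie detection would report a spurious tie. The helper `find_ties` is hypothetical and only walks a flat tuple:

```julia
# Hypothetical tie detection over a flat tuple, remembering leaves by identity.
function find_ties(x::Tuple; skip_isbits::Bool = true)
    seen = IdDict{Any,Int}()
    ties = Pair{Int,Int}[]
    for (i, xi) in enumerate(x)
        usecache = !(skip_isbits && isbits(xi))
        if usecache && haskey(seen, xi)
            push!(ties, i => seen[xi])   # second => first
        elseif usecache
            seen[xi] = i
        end
    end
    return ties
end

W = [1.0 2.0]
real_tie   = find_ties((W, W))                                        # same mutable array
no_tie     = find_ties(([1.0 2.0], [1.0 2.0]))                        # equal but distinct arrays
bits_skip  = find_ties(((1.0, 2.0), (1.0, 2.0)))                      # isbits values skipped
bits_wrong = find_ties(((1.0, 2.0), (1.0, 2.0)); skip_isbits = false) # spurious tie
```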

ToucheSir · 2022-06-24T05:01:40Z

A thought I had today was whether we couldn't obviate the need for Tied by making Leaf a mutable struct. Then tying two or more params could be represented by sharing the same leaf instance in the tree. This would not solve the composition problem, but it could reduce quite a bit of internal complexity and would work well with #49.
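A sketch of this suggestion, assuming a hypothetical mutable `MLeaf` in place of `Leaf`: a setup-like walk with an `IdDict` cache returns the same leaf object for every occurrence of the same array, so a tie is represented purely by sharing:

```julia
# Hypothetical mutable leaf holding only optimiser state.
mutable struct MLeaf
    state::Any
end

# Setup-like walk: the cache maps each array (by identity) to one shared leaf.
function sketch_setup(x::Tuple, cache = IdDict{Any,MLeaf}())
    return map(xi -> get!(() -> MLeaf(nothing), cache, xi), x)
end

W = [1.0 2.0]
tree = sketch_setup((W, W, [3.0]))
# Mutating the state through one occurrence is visible through the other.
tree[1].state = :momentum
```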

mcabbott · 2022-06-29T03:48:54Z

> A thought I had today was whether we couldn't obviate the need for Tied by making Leaf a mutable struct. Then tying two or more params could be represented by sharing the same leaf instance in the tree.

I'm not sure I see how this would work. Certainly `setup` could produce such a thing, using a cache à la Functors. But on the walk for `update!`, you need to know when you reach the first of a tied pair, so that you can accumulate the two gradients and apply the rule once. To do this, I think `update!` would have to build something much like `Tied` as a first pass, before proceeding much as this PR does.

ToucheSir · 2022-07-04T06:06:52Z

The idea is that `update!` on a non-leaf runs two passes: first accumulate gradients into a cache, then do the usual traversal with a visited set of leaves to avoid running rules twice. I tried writing some pseudocode for this, but it quickly metastasized into a full-blown branch. Will try to put up a PR showing off the approach this week.
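A minimal sketch of that two-pass idea, assuming a hypothetical mutable `SharedLeaf` that holds only a learning rate, and a flat tuple of parameters. Pass one sums gradients per leaf identity; pass two applies each leaf's rule once and reuses the result for any repeat occurrence. This is an illustration, not code from the follow-up PR:

```julia
# Hypothetical mutable leaf; a learning rate stands in for rule + state.
mutable struct SharedLeaf
    eta::Float64
end

function two_pass_update(tree::Tuple, x::Tuple, dx::Tuple)
    # Pass 1: accumulate gradients per leaf, keyed by identity.
    acc = IdDict{SharedLeaf,Any}()
    for (leaf, g) in zip(tree, dx)
        g === nothing && continue
        acc[leaf] = haskey(acc, leaf) ? acc[leaf] .+ g : g
    end
    # Pass 2: visited leaves map to their updated parameter, so rules run once.
    done = IdDict{SharedLeaf,Any}()
    return map(tree, x) do leaf, xi
        haskey(done, leaf) && return done[leaf]   # tied copy: reuse the result
        haskey(acc, leaf) || return xi            # no gradient at all
        done[leaf] = xi .- leaf.eta .* acc[leaf]
    end
end

shared = SharedLeaf(0.1)
tree = (shared, shared, SharedLeaf(0.1))
x  = ([1.0], [1.0], [1.0])
dx = ([1.0], [2.0], nothing)
newx = two_pass_update(tree, x, dx)
```

The two tied entries receive one step with the summed gradient and end up as the same array; the untouched third parameter is returned as-is.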

mcabbott force-pushed the duplicated branch 3 times, most recently from c19e259 to d9f4d80, January 30, 2022 01:53

mcabbott force-pushed the duplicated branch from d9f4d80 to 2ff2022, January 30, 2022 04:23

ToucheSir mentioned this pull request Jan 30, 2022: Extract common functionality into fold, FluxML/Functors.jl#32 (Open)

This was referenced Jan 31, 2022:
- fvec, FluxML/Functors.jl#31 (Closed)
- Add more Base types, FluxML/Functors.jl#28 (Closed)

mcabbott force-pushed the duplicated branch from 013c743 to 1d7e153, January 31, 2022 19:49

mcabbott mentioned this pull request Feb 1, 2022: Per-leaf freezing #49 (Closed)

mcabbott force-pushed the duplicated branch from a40225a to cb2500e, February 7, 2022 17:30

ToucheSir mentioned this pull request Feb 8, 2022: Register 0.2 #52 (Closed)

ToucheSir reviewed Feb 8, 2022 (src/interface.jl, resolved)

mcabbott force-pushed the duplicated branch from cb2500e to f13feb0, February 20, 2022 14:55

ToucheSir mentioned this pull request Apr 29, 2022: Use Optimisers.jl, FluxML/FluxTraining.jl#112 (Closed)

mcabbott added the enhancement (New feature or request) label Jun 7, 2022

mcabbott mentioned this pull request Jun 24, 2022: Functor Transpose et. al. FluxML/Functors.jl#33 (Merged)

mcabbott force-pushed the duplicated branch from f13feb0 to 4230fd8, June 24, 2022 19:04

mcabbott added 2 commits June 28, 2022 21:20

- handle tied weights via pick + place (a9978f8)
- renaming (891e670)

mcabbott added 9 commits June 28, 2022 21:20

- extend to handle tied transposed matrices (009d101)
- tweak (0fa0890)
- doc section tied weights (a5fd7fa)
- doc section trainable (af154c1)
- other doc updates (858e31d)
- add some headings, more logical order, no real changes (30c5ea0)
- rm usecache to simplify (269b26d)
- rm contents of FluxML/Functors.jl#33 (c57f1fc)
- fix tests (e56c67a)

mcabbott force-pushed the duplicated branch from c4b38aa to e56c67a, June 29, 2022 03:20

mcabbott mentioned this pull request Jun 29, 2022: use Functors 0.3 in Flux, FluxML/Flux.jl#2007 (Merged)

mcabbott mentioned this pull request Jul 7, 2022: "Optimisers.jl does not at present handle tied weights, sorry." #97 (Closed)

ToucheSir mentioned this pull request Jul 10, 2022: Transparent handling of tied weights #100 (Closed)

mcabbott mentioned this pull request Oct 12, 2022: Allow shared parameters, take III #106 (Merged)

mcabbott closed this in #106 Oct 13, 2022
