
Omnibus PR, including switch to explicit style differentiation #251

Open

wants to merge 17 commits into base: dev

Conversation

@ablaom (Collaborator) commented Apr 30, 2024

This PR combines a number of changes, which for technical reasons could not be easily split up. The most important change, anticipating a Flux 0.15 breakage, is the switch to explicit differentiation; so this PR replaces #230. Shout out to @pat-alt for outlining a solution there.

Closes #221.
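
For readers less familiar with the distinction, here is a minimal sketch of the two styles (illustrative only, not MLJFlux's actual code; chain, loss, X, y, and opt are assumed to be defined):

    using Flux, Optimisers

    # Implicit style (deprecated): gradients are keyed by a Params object.
    ps = Flux.params(chain)
    gs = Flux.gradient(() -> loss(chain(X), y), ps)
    Flux.Optimise.update!(opt, ps, gs)

    # Explicit style: gradients mirror the structure of the model itself.
    loss_val, gs = Flux.withgradient(m -> loss(m(X), y), chain)
    state = Optimisers.setup(Optimisers.Adam(), chain)
    state, chain = Optimisers.update(state, chain, gs[1])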

To do:

  • Replace implicit style parameter updates with explicit style parameter updates, in
    line with planned Zygote/Flux deprecations.

  • Refactor code to use optimisers from Optimisers.jl, with the setup/update pattern in
    place of the update! pattern; see the first sketch after this list. Also, rename the
    private methods train! -> train_epoch and fit! -> train to reflect the new
    non-mutating behaviour. This possibly breaks some "custom" models that have chosen
    to overload these technically private methods.

  • (RNG changes.) Change the default value of the model field rng from
    Random.GLOBAL_RNG to Random.default_rng(). Change the seeded RNG, obtained by
    specifying an integer value for rng, from MersenneTwister to Xoshiro.

  • Update the Short builder so that the rng argument of build(::Short, rng, ...) is
    passed on to the Dropout layer, as these layers now support this on a GPU, at least
    for rng=Random.default_rng(); see the second sketch after this list.

  • Change the implementation of L1/L2 regularization from explicit loss penalization to
    weight/sign decay, internally chained with the user-specified optimiser; see the
    third sketch after this list. Breaking: the losses reported in the history will no
    longer be penalized, because the penalty is no longer explicitly computed.

  • Update documentation to reflect the use of Optimisers.jl optimisers instead of
    Flux.jl native optimisers, and the changes to the rng defaults. Waiting on
    🚀 Instate documentation for MLJFlux #252.
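
First sketch: the setup/update training pattern mentioned above, in a toy non-mutating training loop (all names here are illustrative assumptions, not MLJFlux internals):

    using Flux, Optimisers

    # Optimiser state is created once with `setup` and threaded through
    # `update`, which returns a new state and a new model.
    function toy_train(chain, X, y; epochs = 10)
        state = Optimisers.setup(Optimisers.Adam(0.001), chain)
        for _ in 1:epochs
            _, gs = Flux.withgradient(m -> Flux.mse(m(X), y), chain)
            state, chain = Optimisers.update(state, chain, gs[1])
        end
        return chain
    end

    X = rand(Float32, 4, 16); y = rand(Float32, 1, 16)
    chain = Chain(Dense(4 => 8, relu), Dense(8 => 1))
    chain = toy_train(chain, X, y)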
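
Second sketch: passing an RNG to a Dropout layer, as the updated Short builder now does internally (values are placeholders):

    using Flux, Random

    rng = Random.default_rng()
    layer = Dropout(0.5; rng)   # Flux's Dropout accepts an rng keyword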
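
Third sketch: how L1/L2 penalties translate into decay terms chained with the base optimiser, using the Optimisers.jl mechanism (the lambda values are placeholders; this is not MLJFlux's actual regularized_optimiser):

    using Optimisers

    lambda1, lambda2 = 1e-4, 1e-4
    opt = OptimiserChain(
        SignDecay(lambda1),    # adds lambda1 * sign(w) to each gradient (L1 decay)
        WeightDecay(lambda2),  # adds lambda2 * w to each gradient (L2 decay)
        Adam(0.001),
    )
    # state = Optimisers.setup(opt, chain), then train as usual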

@ablaom ablaom marked this pull request as draft April 30, 2024 02:12
@ablaom ablaom marked this pull request as ready for review May 3, 2024 03:59
@ablaom (Collaborator, Author) commented May 3, 2024

This PR is above average in complexity. This means a review is particularly important, but it's also going to be more work than usual. @pat-alt Do you have any interest and time to review over the next 3 weeks, say?

My apologies in advance for slightly messy commits. I temporarily lost access to a GPU for local testing and was shooting blind for a while.

for i in 1:n_batches
    batch_loss, gs = Flux.withgradient(chain) do m
        yhat = m(X[i])
        loss(yhat, y[i]) + sum(penalty, Optimisers.trainables(m))/n_batches
    end
    # ...
end
@ablaom (Collaborator, Author):

@pat-alt Just FYI: Optimisers.jl added trainables, which offers this solution to refactor along your original lines. It superficially resembles Flux.params, but doesn't seem to suffer from the same problems we were seeing (if you can remember back that far 😉).
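
For illustration, a minimal sketch of the trainables usage (penalty here is just an example function):

    using Flux, Optimisers

    m = Chain(Dense(2 => 3, relu), Dense(3 => 1))
    penalty(A) = sum(abs2, A)                       # e.g. an L2-style penalty
    total = sum(penalty, Optimisers.trainables(m))  # sums over all trainable arrays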

@pat-alt (Collaborator):

Thanks!

So from a user-perspective, I'm currently unclear about a few things (perhaps I've just not looked carefully enough!):

  1. Is the regularized_optimiser method user-facing? Where/when is it used here in this specific unit test?
  2. Would it make sense to ship the Penalizer?

@ablaom (Collaborator, Author) commented May 22, 2024

@pat-alt Sorry to re-ping, but I'm not sure who else to ask for a review here.
@tiemvanderdeure Would you consider reviewing?

If possible, I'm hoping for a merge in the next 3 weeks. Even a cursory look would be much appreciated!

@pat-alt (Collaborator) commented May 23, 2024

Hi! I'll try to have a look as soon as I can (probably on the weekend or next week).

@tiemvanderdeure commented:

If it is helpful, I can also give reviewing this PR a go. I probably won't have time in the next few days, but early next week should be feasible.

@ablaom (Collaborator, Author) commented May 23, 2024

Let's see if @pat-alt is able to find some time.

@pat-alt pat-alt self-requested a review May 29, 2024 09:08
@pat-alt (Collaborator) left a comment:

I mostly have a few comments about 1) deprecation and 2) clarification of how regularization is actually implemented on the user end.

Otherwise, this looks good to me!

@@ -14,13 +14,12 @@
abstract type Builder <: MLJModelInterface.MLJType end

"""
Linear(; σ=Flux.relu, rng=Random.GLOBAL_RNG)
@pat-alt (Collaborator):

Is it worth deprecating this?

end


"""
fit!(model::MLJFlux.MLJFluxModel, penalty, chain, optimiser, epochs, verbosity, X, y)
@pat-alt (Collaborator):

I think this, too, might be worth deprecating. If I understand this correctly, existing extensions that overload MLJFlux.fit! like here won't work anymore? As in, they should now be overloading the train function?

@@ -43,7 +49,26 @@ end
const ERR_BUILDER =
"Builder does not appear to build an architecture compatible with supplied data. "

- true_rng(model) = model.rng isa Integer ? MersenneTwister(model.rng) : model.rng
+ true_rng(model) = model.rng isa Integer ? Random.Xoshiro(model.rng) : model.rng
@pat-alt (Collaborator):

Just out of curiosity, why this change?

# ensure penalization over an epoch does not scale with the choice of batch size; see
# https://github.com/FluxML/MLJFlux.jl/issues/213.

function regularized_optimiser(model, nbatches)
@pat-alt (Collaborator):

So if I read this correctly, it is not possible to specify either L1 or L2? The regularized optimiser will always apply both?
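
For context: with the chained-decay approach, including only the relevant decay term applies only that penalty (a sketch of the underlying Optimisers.jl mechanism, not necessarily how MLJFlux exposes the choice):

    using Optimisers

    opt_l2_only = OptimiserChain(WeightDecay(1e-4), Adam())  # L2 decay only
    opt_l1_only = OptimiserChain(SignDecay(1e-4), Adam())    # L1 decay only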

@ablaom (Collaborator, Author) commented May 29, 2024

Thanks @pat-alt for your review. Much appreciated. I've made a few tweaks in response to the comments.


Successfully merging this pull request may close these issues.

Stop using implicit style differentiating