Cleanup for v0.14 release (#2283)
* prepare release

* cleanup

* compat warning

* 0.13 -> 0.14 deprecations

* deprecate default_rng_value

* fix quickstart and readme

* tweak gpu support section

* import default_rng

* address review comments

* fix GroupNorm tests

* fix train!
CarloLucibello committed Jul 12, 2023
1 parent 45258e5 commit 5d3eaa9
Showing 23 changed files with 134 additions and 348 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -18,7 +18,7 @@

Flux is an elegant approach to machine learning. It's a 100% pure-Julia stack, and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable.

Works best with [Julia 1.8](https://julialang.org/downloads/) or later. Here's a very short example to try it out:
Works best with [Julia 1.9](https://julialang.org/downloads/) or later. Here's a very short example to try it out:
```julia
using Flux, Plots
data = [([x], 2x-x^3) for x in -2:0.1f0:2]
13 changes: 12 additions & 1 deletion docs/src/gpu.md
@@ -1,11 +1,22 @@
# GPU Support

NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) readme.
Starting with v0.14, Flux no longer forces a specific GPU backend (and its package dependencies) on users.
Thanks to the [package extension mechanism](
https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)) introduced in Julia v1.9, Flux conditionally loads GPU-specific code once a GPU package is made available (e.g. through `using CUDA`).

NVIDIA GPU support requires the packages `CUDA.jl` and `cuDNN.jl` to be installed in the environment. In the Julia REPL, type `] add CUDA, cuDNN` to install them. For more details see the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) readme.

AMD GPU support is available since Julia 1.9 on systems with ROCm and MIOpen installed. For more details refer to the [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) repository.

Metal GPU acceleration is available on Apple Silicon hardware. For more details refer to the [Metal.jl](https://github.com/JuliaGPU/Metal.jl) repository. Metal support in Flux is experimental and many features are not yet available.

To trigger GPU support in Flux, you need to call `using CUDA`, `using AMDGPU` or `using Metal`
in your code. Note that for CUDA, explicitly loading `cuDNN` is not required, but the package has to be installed in the environment.


!!! compat "Flux ≤ 0.13"
Old versions of Flux automatically installed CUDA.jl to provide GPU support. Starting from Flux v0.14, CUDA.jl is not a dependency anymore and has to be installed manually.

## Checking GPU Availability

By default, Flux will run the checks on your system to see if it can support GPU functionality. You can check if Flux identified a valid GPU setup by typing the following:
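To make the workflow described in this hunk concrete, here is a minimal sketch (the layer sizes and data are hypothetical; package names follow the docs above):

```julia
using Flux, CUDA  # loading CUDA.jl activates Flux's CUDA extension; cuDNN must also be in the environment

model = Dense(2 => 3) |> gpu          # move parameters to the GPU (a no-op if no GPU is found)
x = rand(Float32, 2, 8) |> gpu        # move the data the same way
y = model(x)                          # runs on the GPU
y_cpu = y |> cpu                      # copy results back to the CPU
```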
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -8,7 +8,8 @@ Flux is a library for machine learning. It comes "batteries-included" with many

### Installation

Download [Julia 1.9](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt. This will automatically install several other packages, including [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) for Nvidia GPU support.
Download [Julia 1.9](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt.
For Nvidia GPU support, you will also need to install the `CUDA` and the `cuDNN` packages. For AMD GPU support, install the `AMDGPU` package. For acceleration on Apple Silicon, install the `Metal` package.
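
For illustration, the same installation can be done with the Pkg API (a sketch; add only the backend package matching your hardware):

```julia
using Pkg
Pkg.add("Flux")                 # the core library
Pkg.add(["CUDA", "cuDNN"])      # Nvidia GPUs
# Pkg.add("AMDGPU")             # AMD GPUs
# Pkg.add("Metal")              # Apple Silicon
```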

### Learning Flux

4 changes: 2 additions & 2 deletions docs/src/models/advanced.md
@@ -69,9 +69,9 @@ However, doing this requires the `struct` to have a corresponding constructor th

When it is desired not to include all the model parameters (e.g. for transfer learning), we can simply not pass those layers into our call to `params`.

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
The mechanism described here is for Flux's old "implicit" training style.
When upgrading for Flux 0.14, it should be replaced by [`freeze!`](@ref Flux.freeze!) and `thaw!`.
When upgrading for Flux 0.15, it should be replaced by [`freeze!`](@ref Flux.freeze!) and `thaw!`.

Consider a simple multi-layer perceptron model where we want to avoid optimising the first two `Dense` layers. We can obtain
this using the slicing features `Chain` provides:
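A minimal sketch of the `freeze!`/`thaw!` approach mentioned in the compat box above (the three-layer model and optimiser here are hypothetical):

```julia
using Flux

model = Chain(Dense(3 => 4, relu), Dense(4 => 4, relu), Dense(4 => 2))
opt_state = Flux.setup(Adam(), model)

Flux.freeze!(opt_state.layers[1])   # exclude the first Dense layer from updates
Flux.freeze!(opt_state.layers[2])   # ... and the second
# ... train with `opt_state` ...
Flux.thaw!(opt_state)               # re-enable updates for all layers
```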
2 changes: 1 addition & 1 deletion docs/src/models/layers.md
@@ -29,7 +29,7 @@ Perhaps `Scale` isn't quite fully connected, but it may be thought of as `Dense(

!!! compat "Flux ≤ 0.12"
Old versions of Flux accepted only `Dense(in, out, act)` and not `Dense(in => out, act)`.
This notation makes a `Pair` object. If you get an error like `MethodError: no method matching Dense(::Pair{Int64,Int64})`, this means that you should upgrade to Flux 0.13.
This notation makes a `Pair` object. If you get an error like `MethodError: no method matching Dense(::Pair{Int64,Int64})`, this means that you should upgrade to newer Flux versions.


## Convolution Models
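For illustration, both notations build the same layer on recent Flux versions (the sizes here are arbitrary):

```julia
using Flux

d1 = Dense(10 => 5, relu)   # current `Pair` notation
d2 = Dense(10, 5, relu)     # older positional form, still accepted but softly deprecated
```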
8 changes: 4 additions & 4 deletions docs/src/models/quickstart.md
@@ -5,8 +5,8 @@ If you have used neural networks before, then this simple example might be helpf
If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# With Julia 1.7+, this will prompt if necessary to install everything, including CUDA:
using Flux, Statistics, ProgressMeter
# This will prompt if necessary to install everything, including CUDA:
using Flux, CUDA, Statistics, ProgressMeter

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
@@ -102,7 +102,7 @@ for epoch in 1:1_000
end
```

!!! compat "Implicit-style training, Flux ≤ 0.13"
!!! compat "Implicit-style training, Flux ≤ 0.14"
Until recently Flux's training worked a bit differently.
Any code which looks like
```
@@ -113,5 +113,5 @@ end
train!((x,y) -> loss(model, x, y), Flux.params(model), loader, opt)
```
(with `Flux.params`) is in the old "implicit" style.
This still works on Flux 0.13, but will be removed from Flux 0.14.
This still works on Flux 0.14, but will be removed from Flux 0.15.
See the [training section](@ref man-training) for more details.
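
For comparison, a sketch of the corresponding explicit-style call (assuming the `model`, `loss` and `loader` names used in the snippet above; the learning rate is illustrative):

```julia
# explicit style: set up the optimiser state once, and let `loss` receive the model itself
opt_state = Flux.setup(Adam(0.01), model)
Flux.train!(loss, model, loader, opt_state)   # calls loss(model, x, y) for each (x, y) in loader
```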
10 changes: 4 additions & 6 deletions docs/src/training/callbacks.md
@@ -2,8 +2,6 @@

```@docs
Flux.throttle
Flux.stop
Flux.skip
```
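
As a quick reminder of what `throttle` does (a sketch; the five-second interval and the loop are illustrative):

```julia
using Flux: throttle

log_cb = throttle(() -> println("still training..."), 5)   # calls the function at most once every 5 s
for step in 1:10_000
    # ... one training step ...
    log_cb()
end
```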

## Patience Helpers
@@ -26,7 +24,7 @@ end
es = early_stopping(loss, 2; init_score = 9)

# this will stop at the 6th (4 decreasing + 2 increasing calls) epoch
@epochs 10 begin
for epoch in 1:10
es() && break
end
```
@@ -43,7 +41,7 @@ end
es = early_stopping(acc, 3; delta = (best_score, score) -> score - best_score)

# this will iterate until the 10th epoch
@epochs 10 begin
for epoch in 1:10
es() && break
end
```
@@ -60,12 +58,12 @@ Both `predicate` in `patience` and `f` in `early_stopping` / `plateau` can accep
trigger = patience((a; b) -> a > b, 3)

# this will iterate until the 10th epoch
@epochs 10 begin
for epoch in 1:10
trigger(1; b = 2) && break
end

# this will stop at the 3rd epoch
@epochs 10 begin
for epoch in 1:10
trigger(3; b = 2) && break
end
```
37 changes: 6 additions & 31 deletions docs/src/training/reference.md
@@ -10,7 +10,7 @@ Because of this:
* Flux defines its own version of `setup` which checks this assumption.
(Using instead `Optimisers.setup` will also work; they return the same thing.)

The new implementation of rules such as Adam in the Optimisers is quite different from the old one in `Flux.Optimise`. In Flux 0.13, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The new implementation of rules such as Adam in the Optimisers.jl package is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The available rules are listed on the [optimisation rules](@ref man-optimisers) page;
see the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.
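
A short sketch of how `setup` and the explicit `update!` fit together (the model, data and learning rate are hypothetical):

```julia
using Flux

model = Dense(2 => 1)
opt_state = Flux.setup(Adam(0.01), model)   # accepts the legacy Flux.Adam and translates it

x, y = rand(Float32, 2, 16), rand(Float32, 1, 16)
grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)
Flux.update!(opt_state, model, grads[1])    # mutates both the optimiser state and the model
```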

@@ -37,11 +37,11 @@ Optimisers.freeze!
Optimisers.thaw!
```

## Implicit style (Flux ≤ 0.13)
## Implicit style (Flux ≤ 0.14)

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both; Flux 0.14 will remove the old.
Flux 0.13 and 0.14 are the transitional versions which support both; Flux 0.15 will remove the old.

!!! compat "How to upgrade"
The blue-green boxes in the [training section](@ref man-training) describe
@@ -62,26 +62,6 @@ Flux.Optimise.update!(opt::Flux.Optimise.AbstractOptimiser, xs::AbstractArray, g
Flux.Optimise.train!(loss, ps::Flux.Params, data, opt::Flux.Optimise.AbstractOptimiser; cb)
```

Note that, by default, `train!` only loops over the data once (a single "epoch").
A convenient way to run multiple epochs from the REPL is provided by `@epochs`.

```julia
julia> using Flux: @epochs

julia> @epochs 2 println("hello")
[ Info: Epoch 1
hello
[ Info: Epoch 2
hello

julia> @epochs 2 Flux.train!(...)
# Train for two epochs
```
```@docs
Flux.@epochs
```
## Callbacks

Implicit `train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
@@ -98,14 +78,9 @@ A more typical callback might look like this:
test_x, test_y = # ... create single batch of test data ...
evalcb() = @show(loss(test_x, test_y))
throttled_cb = throttle(evalcb, 5)
Flux.@epochs 20 Flux.train!(objective, ps, data, opt, cb = throttled_cb)
```
Calling `Flux.stop()` in a callback will exit the training loop early.
```julia
cb = function ()
accuracy() > 0.9 && Flux.stop()
for epoch in 1:20
@info "Epoch $epoch"
Flux.train!(objective, ps, data, opt, cb = throttled_cb)
end
```

14 changes: 7 additions & 7 deletions docs/src/training/training.md
@@ -65,14 +65,14 @@ It is also important that every `update!` step receives a newly computed gradient
as this will change whenever the model's parameters are changed, and for each new data point.

!!! compat "Implicit gradients"
Flux ≤ 0.13 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
Flux ≤ 0.14 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
It looks like this:
```
pars = Flux.params(model)
grad = gradient(() -> loss(model(input), label), pars)
```
Here `pars::Params` and `grad::Grads` are two dictionary-like structures.
Support for this will be removed from Flux 0.14, and these blue (teal?) boxes
Support for this will be removed from Flux 0.15, and these blue (teal?) boxes
explain what needs to change.

## Loss Functions
Expand All @@ -90,7 +90,7 @@ like [`mse`](@ref Flux.Losses.mse) for mean-squared error or [`crossentropy`](@r
are available from the [`Flux.Losses`](../models/losses.md) module.

!!! compat "Implicit-style loss functions"
Flux ≤ 0.13 needed a loss function which closed over a reference to the model,
Flux ≤ 0.14 needed a loss function which closed over a reference to the model,
instead of being a pure function. Thus in old code you may see something like
```
loss(x, y) = sum((model(x) .- y).^2)
@@ -211,7 +211,7 @@ Or explicitly writing the anonymous function which this `do` block creates,
!!! compat "Implicit-style `train!`"
This is a new method of `train!`, which takes the result of `setup` as its 4th argument.
The 1st argument is a function which accepts the model itself.
Flux versions ≤ 0.13 provided a method of `train!` for "implicit" parameters,
Flux versions ≤ 0.14 provided a method of `train!` for "implicit" parameters,
which works like this:
```
train!((x,y) -> loss(model(x), y), Flux.params(model), train_set, Adam())
@@ -342,7 +342,7 @@ for epoch in 1:1000
end
```

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
With the old "implicit" optimiser, `opt = Adam(0.1)`, the equivalent was to
directly mutate the `Adam` struct, `opt.eta = 0.001`.

@@ -374,7 +374,7 @@ train!(loss, bimodel, data, opt_state)
Flux.thaw!(opt_state)
```

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
The earlier "implicit" equivalent was to pass to `gradient` an object referencing only
part of the model, such as `Flux.params(bimodel.layers.enc)`.

@@ -383,7 +383,7 @@

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both.
Flux 0.13 and 0.14 are the transitional versions which support both.

The blue-green boxes above describe the changes.
For more details on training in the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
4 changes: 2 additions & 2 deletions docs/src/training/zygote.md
@@ -18,10 +18,10 @@ Zygote.hessian_reverse
Zygote.diaghessian
```

## Implicit style (Flux ≤ 0.13)
## Implicit style (Flux ≤ 0.14)

Flux used to use what Zygote calls "implicit" gradients, [described here](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1) in its documentation.
However, support for this will be removed from Flux 0.14.
However, support for this will be removed from Flux 0.15.

!!! compat "Training"
The blue-green boxes in the [training section](@ref man-training) describe
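For reference, a sketch contrasting the two gradient styles (the model and data here are hypothetical):

```julia
using Flux

model = Dense(3 => 2)
x, y = rand(Float32, 3, 5), rand(Float32, 2, 5)

# explicit (current) style: differentiate with respect to the model itself
g_explicit = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)[1]

# implicit style, to be removed in Flux 0.15: differentiate with respect to Params
ps = Flux.params(model)
g_implicit = Flux.gradient(() -> Flux.Losses.mse(model(x), y), ps)
```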
11 changes: 7 additions & 4 deletions docs/src/tutorials/2021-01-26-mlp.md
@@ -7,7 +7,7 @@ To run this example, we need the following packages:
```julia
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, @epochs, params
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, params
using Base.Iterators: repeated
using CUDA
using MLDatasets
@@ -138,8 +138,11 @@ function train(; kws...)
## Training
evalcb = () -> @show(loss_all(train_data, m))
opt = Adam(args.rate)

@epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)

for epoch in 1:args.epochs
@info "Epoch $epoch"
Flux.train!(loss, params(m), train_data, opt, cb = evalcb)
end

@show accuracy(train_data, m)

@@ -153,7 +156,7 @@ end
* **Initializes the model parameters:** Creates the `args` object that contains the default values for training our model.
* **Loads the train and test data:** Calls the function `getdata` we defined above.
* **Constructs the model:** Builds the model and loads the train and test data sets, and our model onto the GPU (if available).
* **Trains the model:** Defines the *callback* function `evalcb` to show the value of the `loss_all` function during the training process. Then, it sets [Adam](@ref Flux.Optimise.Adam) as the optimiser for training our model. Finally, it runs the training process with the macro `@epochs` for `10` epochs (as defined in the `args` object) and shows the `accuracy` value for the train and test data.
* **Trains the model:** Defines the *callback* function `evalcb` to show the value of the `loss_all` function during the training process. Then, it sets [Adam](@ref Flux.Optimise.Adam) as the optimiser for training our model. Finally, it runs the training process for `10` epochs (as defined in the `args` object) and shows the `accuracy` value for the train and test data.


To see the full version of this example, see [Simple multi-layer perceptron - model-zoo](https://github.com/FluxML/model-zoo/blob/master/vision/mlp_mnist/mlp_mnist.jl).
1 change: 0 additions & 1 deletion docs/src/utilities.md
@@ -49,7 +49,6 @@ These functions call:

```@docs
Flux.rng_from_array
Flux.default_rng_value
Flux.nfan
```

4 changes: 1 addition & 3 deletions src/Flux.jl
@@ -10,7 +10,7 @@ using MacroTools: @forward
using MLUtils
import Optimisers: Optimisers, trainable, destructure # before v0.13, Flux owned these functions
using Optimisers: freeze!, thaw!, adjust!

using Random: default_rng
using Zygote, ChainRulesCore
using Zygote: Params, @adjoint, gradient, pullback
using Zygote.ForwardDiff: value
@@ -32,8 +32,6 @@ export Chain, Dense, Embedding, Maxout, SkipConnection, Parallel, PairwiseFusion

include("optimise/Optimise.jl")
using .Optimise
using .Optimise: @epochs
using .Optimise: skip
export Descent, Adam, Momentum, Nesterov, RMSProp,
AdaGrad, AdaMax, AdaDelta, AMSGrad, NAdam, OAdam,
AdamW, RAdam, AdaBelief, InvDecay, ExpDecay,
28 changes: 8 additions & 20 deletions src/deprecations.jl
@@ -1,19 +1,3 @@
# v0.12 deprecations

function ones(dims...)
Base.depwarn("Flux.ones(size...) is deprecated, please use Flux.ones32(size...) or Base.ones(Float32, size...)", :ones, force=true)
Base.ones(Float32, dims...)
end
ones(T::Type, dims...) = Base.ones(T, dims...)

function zeros(dims...)
Base.depwarn("Flux.zeros(size...) is deprecated, please use Flux.zeros32(size...) or Base.zeros(Float32, size...)", :zeros, force=true)
Base.zeros(Float32, dims...)
end
zeros(T::Type, dims...) = Base.zeros(T, dims...)

ones32(::Type, dims...) = throw(ArgumentError("Flux.ones32 is always Float32, use Base.ones to specify the element type"))
zeros32(::Type, dims...) = throw(ArgumentError("Flux.zeros32 is always Float32, use Base.zeros to specify the element type"))

# v0.13 deprecations

@@ -59,7 +43,7 @@ function loadparams!(m, xs)
end

# Channel notation: Changed to match Conv, but very softly deprecated!
# Perhaps change to @deprecate for v0.14, but there is no plan to remove these.
# Perhaps change to @deprecate for v0.15, but there is no plan to remove these.
Dense(in::Integer, out::Integer, σ = identity; kw...) =
Dense(in => out, σ; kw...)
Bilinear(in1::Integer, in2::Integer, out::Integer, σ = identity; kw...) =
@@ -86,7 +70,7 @@ Base.@deprecate_binding Data Flux false "Sub-module Flux.Data has been removed.

@deprecate paramtype(T,m) _paramtype(T,m) false # internal method, renamed to make this clear

@deprecate rng_from_array() default_rng_value()
@deprecate rng_from_array() Random.default_rng()

function istraining()
Base.depwarn("Flux.istraining() is deprecated, use NNlib.within_gradient(x) instead", :istraining)
@@ -216,13 +200,17 @@ ChainRulesCore.@non_differentiable _greek_ascii_depwarn(::Any...)


# v0.14 deprecations
@deprecate default_rng_value() Random.default_rng()


# v0.15 deprecations

# Enable these when 0.14 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Enable these when 0.15 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.14, `train!` no longer accepts implicit `Zygote.Params`.
# """On Flux 0.15, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
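A usage sketch of the `default_rng_value` deprecation above (the layer and initialiser here are illustrative, not part of the diff):

```julia
using Flux, Random

rng = Random.default_rng()                        # replaces Flux.default_rng_value()
layer = Dense(2 => 3; init = Flux.glorot_uniform(rng))
```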