Cleanup for v0.14 release (#2283)
* prepare release

* cleanup

* compat warning

* 0.13 -> 0.14 deprecations

* deprecate default_rng_value

* fix quickstart and readme

* tweak gpu support section

* import default_rng

* address review comments

* fix GroupNorm tests

* fix train!
CarloLucibello committed Jul 12, 2023
1 parent 45258e5 commit 5d3eaa9
Showing 23 changed files with 134 additions and 348 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -18,7 +18,7 @@

Flux is an elegant approach to machine learning. It's a 100% pure-Julia stack, and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable.

Works best with [Julia 1.8](https://julialang.org/downloads/) or later. Here's a very short example to try it out:
Works best with [Julia 1.9](https://julialang.org/downloads/) or later. Here's a very short example to try it out:
```julia
using Flux, Plots
data = [([x], 2x-x^3) for x in -2:0.1f0:2]
13 changes: 12 additions & 1 deletion docs/src/gpu.md
@@ -1,11 +1,22 @@
# GPU Support

NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) readme.
Starting with v0.14, Flux no longer forces a specific GPU backend (and its package dependencies) on users.
Thanks to the [package extension mechanism](
https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)) introduced in Julia v1.9, Flux conditionally loads GPU-specific code once a GPU package is made available (e.g. through `using CUDA`).

NVIDIA GPU support requires the packages `CUDA.jl` and `cuDNN.jl` to be installed in the environment. In the Julia REPL, type `] add CUDA, cuDNN` to install them. For more details see the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) readme.

AMD GPU support is available since Julia 1.9 on systems with ROCm and MIOpen installed. For more details refer to the [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) repository.

Metal GPU acceleration is available on Apple Silicon hardware. For more details refer to the [Metal.jl](https://github.com/JuliaGPU/Metal.jl) repository. Metal support in Flux is experimental and many features are not yet available.

To trigger GPU support in Flux, you need to call `using CUDA`, `using AMDGPU` or `using Metal`
in your code. Note that for CUDA, explicitly loading `cuDNN` is not required, but the package has to be installed in the environment.


!!! compat "Flux ≤ 0.13"
Old versions of Flux automatically installed CUDA.jl to provide GPU support. Starting from Flux v0.14, CUDA.jl is not a dependency anymore and has to be installed manually.

## Checking GPU Availability

By default, Flux will run the checks on your system to see if it can support GPU functionality. You can check if Flux identified a valid GPU setup by typing the following:
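To make the workflow described in this hunk concrete, here is a minimal sketch (the layer sizes and data are hypothetical; package names follow the docs above):

```julia
using Flux, CUDA  # loading CUDA.jl activates Flux's CUDA extension; cuDNN must also be in the environment

model = Dense(2 => 3) |> gpu          # move parameters to the GPU (a no-op if no GPU is found)
x = rand(Float32, 2, 8) |> gpu        # move the data the same way
y = model(x)                          # runs on the GPU
y_cpu = y |> cpu                      # copy results back to the CPU
```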
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -8,7 +8,8 @@ Flux is a library for machine learning. It comes "batteries-included" with many

### Installation

Download [Julia 1.9](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt. This will automatically install several other packages, including [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) for Nvidia GPU support.
Download [Julia 1.9](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt.
For Nvidia GPU support, you will also need to install the `CUDA` and the `cuDNN` packages. For AMD GPU support, install the `AMDGPU` package. For acceleration on Apple Silicon, install the `Metal` package.
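
For illustration, the same installation can be done with the Pkg API (a sketch; add only the backend package matching your hardware):

```julia
using Pkg
Pkg.add("Flux")                 # the core library
Pkg.add(["CUDA", "cuDNN"])      # Nvidia GPUs
# Pkg.add("AMDGPU")             # AMD GPUs
# Pkg.add("Metal")              # Apple Silicon
```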

### Learning Flux

4 changes: 2 additions & 2 deletions docs/src/models/advanced.md
@@ -69,9 +69,9 @@ However, doing this requires the `struct` to have a corresponding constructor th

When it is desired not to include all the model parameters (e.g. for transfer learning), we can simply not pass those layers into our call to `params`.

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
The mechanism described here is for Flux's old "implicit" training style.
When upgrading for Flux 0.14, it should be replaced by [`freeze!`](@ref Flux.freeze!) and `thaw!`.
When upgrading for Flux 0.15, it should be replaced by [`freeze!`](@ref Flux.freeze!) and `thaw!`.

Consider a simple multi-layer perceptron model where we want to avoid optimising the first two `Dense` layers. We can obtain
this using the slicing features `Chain` provides:
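A minimal sketch of the `freeze!`/`thaw!` approach mentioned in the compat box above (the three-layer model and optimiser here are hypothetical):

```julia
using Flux

model = Chain(Dense(3 => 4, relu), Dense(4 => 4, relu), Dense(4 => 2))
opt_state = Flux.setup(Adam(), model)

Flux.freeze!(opt_state.layers[1])   # exclude the first Dense layer from updates
Flux.freeze!(opt_state.layers[2])   # ... and the second
# ... train with `opt_state` ...
Flux.thaw!(opt_state)               # re-enable updates for all layers
```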
2 changes: 1 addition & 1 deletion docs/src/models/layers.md
@@ -29,7 +29,7 @@ Perhaps `Scale` isn't quite fully connected, but it may be thought of as `Dense(

!!! compat "Flux ≤ 0.12"
Old versions of Flux accepted only `Dense(in, out, act)` and not `Dense(in => out, act)`.
This notation makes a `Pair` object. If you get an error like `MethodError: no method matching Dense(::Pair{Int64,Int64})`, this means that you should upgrade to Flux 0.13.
This notation makes a `Pair` object. If you get an error like `MethodError: no method matching Dense(::Pair{Int64,Int64})`, this means that you should upgrade to newer Flux versions.


## Convolution Models
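For illustration, both notations build the same layer on recent Flux versions (the sizes here are arbitrary):

```julia
using Flux

d1 = Dense(10 => 5, relu)   # current `Pair` notation
d2 = Dense(10, 5, relu)     # older positional form, still accepted but softly deprecated
```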
8 changes: 4 additions & 4 deletions docs/src/models/quickstart.md
@@ -5,8 +5,8 @@ If you have used neural networks before, then this simple example might be helpf
If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# With Julia 1.7+, this will prompt if necessary to install everything, including CUDA:
using Flux, Statistics, ProgressMeter
# This will prompt if necessary to install everything, including CUDA:
using Flux, CUDA, Statistics, ProgressMeter

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
@@ -102,7 +102,7 @@ for epoch in 1:1_000
end
```

!!! compat "Implicit-style training, Flux ≤ 0.13"
!!! compat "Implicit-style training, Flux ≤ 0.14"
Until recently Flux's training worked a bit differently.
Any code which looks like
```
@@ -113,5 +113,5 @@ end
train!((x,y) -> loss(model, x, y), Flux.params(model), loader, opt)
```
(with `Flux.params`) is in the old "implicit" style.
This still works on Flux 0.13, but will be removed from Flux 0.14.
This still works on Flux 0.14, but will be removed from Flux 0.15.
See the [training section](@ref man-training) for more details.
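
For comparison, a sketch of the corresponding explicit-style call (assuming the `model`, `loss` and `loader` names used in the snippet above; the learning rate is illustrative):

```julia
# explicit style: set up the optimiser state once, and let `loss` receive the model itself
opt_state = Flux.setup(Adam(0.01), model)
Flux.train!(loss, model, loader, opt_state)   # calls loss(model, x, y) for each (x, y) in loader
```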
10 changes: 4 additions & 6 deletions docs/src/training/callbacks.md
@@ -2,8 +2,6 @@

```@docs
Flux.throttle
Flux.stop
Flux.skip
```
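
As a quick reminder of what `throttle` does (a sketch; the five-second interval and the loop are illustrative):

```julia
using Flux: throttle

log_cb = throttle(() -> println("still training..."), 5)   # calls the function at most once every 5 s
for step in 1:10_000
    # ... one training step ...
    log_cb()
end
```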

## Patience Helpers
@@ -26,7 +24,7 @@ end
es = early_stopping(loss, 2; init_score = 9)

# this will stop at the 6th (4 decreasing + 2 increasing calls) epoch
@epochs 10 begin
for epoch in 1:10
es() && break
end
```
@@ -43,7 +41,7 @@ end
es = early_stopping(acc, 3; delta = (best_score, score) -> score - best_score)

# this will iterate until the 10th epoch
@epochs 10 begin
for epoch in 1:10
es() && break
end
```
@@ -60,12 +58,12 @@ Both `predicate` in `patience` and `f` in `early_stopping` / `plateau` can accep
trigger = patience((a; b) -> a > b, 3)

# this will iterate until the 10th epoch
@epochs 10 begin
for epoch in 1:10
trigger(1; b = 2) && break
end

# this will stop at the 3rd epoch
@epochs 10 begin
for epoch in 1:10
trigger(3; b = 2) && break
end
```
37 changes: 6 additions & 31 deletions docs/src/training/reference.md
@@ -10,7 +10,7 @@ Because of this:
* Flux defines its own version of `setup` which checks this assumption.
(Using instead `Optimisers.setup` will also work; they return the same thing.)

The new implementation of rules such as Adam in the Optimisers is quite different from the old one in `Flux.Optimise`. In Flux 0.13, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The new implementation of rules such as Adam in the Optimisers.jl package is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The available rules are listed on the [optimisation rules](@ref man-optimisers) page;
see the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.
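
A short sketch of how `setup` and the explicit `update!` fit together (the model, data and learning rate are hypothetical):

```julia
using Flux

model = Dense(2 => 1)
opt_state = Flux.setup(Adam(0.01), model)   # accepts the legacy Flux.Adam and translates it

x, y = rand(Float32, 2, 16), rand(Float32, 1, 16)
grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)
Flux.update!(opt_state, model, grads[1])    # mutates both the optimiser state and the model
```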

@@ -37,11 +37,11 @@ Optimisers.freeze!
Optimisers.thaw!
```

## Implicit style (Flux ≤ 0.13)
## Implicit style (Flux ≤ 0.14)

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both; Flux 0.14 will remove the old.
Flux 0.13 and 0.14 are the transitional versions which support both; Flux 0.15 will remove the old.

!!! compat "How to upgrade"
The blue-green boxes in the [training section](@ref man-training) describe
@@ -62,26 +62,6 @@ Flux.Optimise.update!(opt::Flux.Optimise.AbstractOptimiser, xs::AbstractArray, g
Flux.Optimise.train!(loss, ps::Flux.Params, data, opt::Flux.Optimise.AbstractOptimiser; cb)
```

Note that, by default, `train!` only loops over the data once (a single "epoch").
A convenient way to run multiple epochs from the REPL is provided by `@epochs`.

```julia
julia> using Flux: @epochs

julia> @epochs 2 println("hello")
[ Info: Epoch 1
hello
[ Info: Epoch 2
hello

julia> @epochs 2 Flux.train!(...)
# Train for two epochs
```
```@docs
Flux.@epochs
```
## Callbacks

Implicit `train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
@@ -98,14 +78,9 @@ A more typical callback might look like this:
test_x, test_y = # ... create single batch of test data ...
evalcb() = @show(loss(test_x, test_y))
throttled_cb = throttle(evalcb, 5)
Flux.@epochs 20 Flux.train!(objective, ps, data, opt, cb = throttled_cb)
```
Calling `Flux.stop()` in a callback will exit the training loop early.
```julia
cb = function ()
accuracy() > 0.9 && Flux.stop()
for epoch in 1:20
@info "Epoch $epoch"
Flux.train!(objective, ps, data, opt, cb = throttled_cb)
end
```

14 changes: 7 additions & 7 deletions docs/src/training/training.md
@@ -65,14 +65,14 @@ It is also important that every `update!` step receives a newly computed gradient
as this will change whenever the model's parameters are changed, and for each new data point.

!!! compat "Implicit gradients"
Flux ≤ 0.13 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
Flux ≤ 0.14 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
It looks like this:
```
pars = Flux.params(model)
grad = gradient(() -> loss(model(input), label), pars)
```
Here `pars::Params` and `grad::Grads` are two dictionary-like structures.
Support for this will be removed from Flux 0.14, and these blue (teal?) boxes
Support for this will be removed from Flux 0.15, and these blue (teal?) boxes
explain what needs to change.

## Loss Functions
Expand All @@ -90,7 +90,7 @@ like [`mse`](@ref Flux.Losses.mse) for mean-squared error or [`crossentropy`](@r
are available from the [`Flux.Losses`](../models/losses.md) module.

!!! compat "Implicit-style loss functions"
Flux ≤ 0.13 needed a loss function which closed over a reference to the model,
Flux ≤ 0.14 needed a loss function which closed over a reference to the model,
instead of being a pure function. Thus in old code you may see something like
```
loss(x, y) = sum((model(x) .- y).^2)
@@ -211,7 +211,7 @@ Or explicitly writing the anonymous function which this `do` block creates,
!!! compat "Implicit-style `train!`"
This is a new method of `train!`, which takes the result of `setup` as its 4th argument.
The 1st argument is a function which accepts the model itself.
Flux versions ≤ 0.13 provided a method of `train!` for "implicit" parameters,
Flux versions ≤ 0.14 provided a method of `train!` for "implicit" parameters,
which works like this:
```
train!((x,y) -> loss(model(x), y), Flux.params(model), train_set, Adam())
@@ -342,7 +342,7 @@ for epoch in 1:1000
end
```

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
With the old "implicit" optimiser, `opt = Adam(0.1)`, the equivalent was to
directly mutate the `Adam` struct, `opt.eta = 0.001`.

@@ -374,7 +374,7 @@ train!(loss, bimodel, data, opt_state)
Flux.thaw!(opt_state)
```

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
The earlier "implicit" equivalent was to pass to `gradient` an object referencing only
part of the model, such as `Flux.params(bimodel.layers.enc)`.

@@ -383,7 +383,7 @@

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both.
Flux 0.13 and 0.14 are the transitional versions which support both.

The blue-green boxes above describe the changes.
For more details on training in the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
4 changes: 2 additions & 2 deletions docs/src/training/zygote.md
@@ -18,10 +18,10 @@ Zygote.hessian_reverse
Zygote.diaghessian
```

## Implicit style (Flux ≤ 0.13)
## Implicit style (Flux ≤ 0.14)

Flux used to use what Zygote calls "implicit" gradients, [described here](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1) in its documentation.
However, support for this will be removed from Flux 0.14.
However, support for this will be removed from Flux 0.15.

!!! compat "Training"
The blue-green boxes in the [training section](@ref man-training) describe
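For reference, a sketch contrasting the two gradient styles (the model and data here are hypothetical):

```julia
using Flux

model = Dense(3 => 2)
x, y = rand(Float32, 3, 5), rand(Float32, 2, 5)

# explicit (current) style: differentiate with respect to the model itself
g_explicit = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)[1]

# implicit style, to be removed in Flux 0.15: differentiate with respect to Params
ps = Flux.params(model)
g_implicit = Flux.gradient(() -> Flux.Losses.mse(model(x), y), ps)
```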
11 changes: 7 additions & 4 deletions docs/src/tutorials/2021-01-26-mlp.md
@@ -7,7 +7,7 @@ To run this example, we need the following packages:
```julia
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, @epochs, params
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, params
using Base.Iterators: repeated
using CUDA
using MLDatasets
@@ -138,8 +138,11 @@ function train(; kws...)
## Training
evalcb = () -> @show(loss_all(train_data, m))
opt = Adam(args.rate)

@epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)

for epoch in 1:args.epochs
@info "Epoch $epoch"
Flux.train!(loss, params(m), train_data, opt, cb = evalcb)
end

@show accuracy(train_data, m)

@@ -153,7 +156,7 @@ end
* **Initializes the model parameters:** Creates the `args` object that contains the default values for training our model.
* **Loads the train and test data:** Calls the function `getdata` we defined above.
* **Constructs the model:** Builds the model and loads the train and test data sets, and our model onto the GPU (if available).
* **Trains the model:** Defines the *callback* function `evalcb` to show the value of the `loss_all` function during the training process. Then, it sets [Adam](@ref Flux.Optimise.Adam) as the optimiser for training our model. Finally, it runs the training process with the macro `@epochs` for `10` epochs (as defined in the `args` object) and shows the `accuracy` value for the train and test data.
* **Trains the model:** Defines the *callback* function `evalcb` to show the value of the `loss_all` function during the training process. Then, it sets [Adam](@ref Flux.Optimise.Adam) as the optimiser for training our model. Finally, it runs the training process for `10` epochs (as defined in the `args` object) and shows the `accuracy` value for the train and test data.


To see the full version of this example, see [Simple multi-layer perceptron - model-zoo](https://github.com/FluxML/model-zoo/blob/master/vision/mlp_mnist/mlp_mnist.jl).
1 change: 0 additions & 1 deletion docs/src/utilities.md
@@ -49,7 +49,6 @@ These functions call:

```@docs
Flux.rng_from_array
Flux.default_rng_value
Flux.nfan
```

4 changes: 1 addition & 3 deletions src/Flux.jl
@@ -10,7 +10,7 @@ using MacroTools: @forward
using MLUtils
import Optimisers: Optimisers, trainable, destructure # before v0.13, Flux owned these functions
using Optimisers: freeze!, thaw!, adjust!

using Random: default_rng
using Zygote, ChainRulesCore
using Zygote: Params, @adjoint, gradient, pullback
using Zygote.ForwardDiff: value
@@ -32,8 +32,6 @@ export Chain, Dense, Embedding, Maxout, SkipConnection, Parallel, PairwiseFusion

include("optimise/Optimise.jl")
using .Optimise
using .Optimise: @epochs
using .Optimise: skip
export Descent, Adam, Momentum, Nesterov, RMSProp,
AdaGrad, AdaMax, AdaDelta, AMSGrad, NAdam, OAdam,
AdamW, RAdam, AdaBelief, InvDecay, ExpDecay,
28 changes: 8 additions & 20 deletions src/deprecations.jl
@@ -1,19 +1,3 @@
# v0.12 deprecations

function ones(dims...)
Base.depwarn("Flux.ones(size...) is deprecated, please use Flux.ones32(size...) or Base.ones(Float32, size...)", :ones, force=true)
Base.ones(Float32, dims...)
end
ones(T::Type, dims...) = Base.ones(T, dims...)

function zeros(dims...)
Base.depwarn("Flux.zeros(size...) is deprecated, please use Flux.zeros32(size...) or Base.zeros(Float32, size...)", :zeros, force=true)
Base.zeros(Float32, dims...)
end
zeros(T::Type, dims...) = Base.zeros(T, dims...)

ones32(::Type, dims...) = throw(ArgumentError("Flux.ones32 is always Float32, use Base.ones to specify the element type"))
zeros32(::Type, dims...) = throw(ArgumentError("Flux.zeros32 is always Float32, use Base.zeros to specify the element type"))

# v0.13 deprecations

@@ -59,7 +43,7 @@ function loadparams!(m, xs)
end

# Channel notation: Changed to match Conv, but very softly deprecated!
# Perhaps change to @deprecate for v0.14, but there is no plan to remove these.
# Perhaps change to @deprecate for v0.15, but there is no plan to remove these.
Dense(in::Integer, out::Integer, σ = identity; kw...) =
Dense(in => out, σ; kw...)
Bilinear(in1::Integer, in2::Integer, out::Integer, σ = identity; kw...) =
@@ -86,7 +70,7 @@ Base.@deprecate_binding Data Flux false "Sub-module Flux.Data has been removed.

@deprecate paramtype(T,m) _paramtype(T,m) false # internal method, renamed to make this clear

@deprecate rng_from_array() default_rng_value()
@deprecate rng_from_array() Random.default_rng()

function istraining()
Base.depwarn("Flux.istraining() is deprecated, use NNlib.within_gradient(x) instead", :istraining)
@@ -216,13 +200,17 @@ ChainRulesCore.@non_differentiable _greek_ascii_depwarn(::Any...)


# v0.14 deprecations
@deprecate default_rng_value() Random.default_rng()


# v0.15 deprecations

# Enable these when 0.14 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Enable these when 0.15 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.14, `train!` no longer accepts implicit `Zygote.Params`.
# """On Flux 0.15, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
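A usage sketch of the `default_rng_value` deprecation above (the layer and initialiser here are illustrative, not part of the diff):

```julia
using Flux, Random

rng = Random.default_rng()                        # replaces Flux.default_rng_value()
layer = Dense(2 => 3; init = Flux.glorot_uniform(rng))
```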