Update readme (#38)
* update readme

* add forgotten figure

* use RDatasets instead of local iris
ablaom committed Jun 8, 2020
1 parent 7e163b3 commit 6b19067
Showing 4 changed files with 192 additions and 68 deletions.
257 changes: 191 additions & 66 deletions README.md
@@ -1,97 +1,150 @@
# MLJFlux

An interface to Flux deep learning models for the [MLJ](https://github.com/alan-turing-institute/MLJ.jl) machine learning framework

[![Build Status](https://travis-ci.com/alan-turing-institute/MLJFlux.jl.svg?branch=master)](https://travis-ci.com/alan-turing-institute/MLJFlux.jl) [![Coverage Status](https://coveralls.io/repos/github/alan-turing-institute/MLJFlux.jl/badge.svg?branch=master)](https://coveralls.io/github/alan-turing-institute/MLJFlux.jl?branch=master)

MLJFlux makes it possible to apply the machine learning
meta-algorithms provided by MLJ - such as out-of-sample performance
evaluation and hyper-parameter optimization - to some classes of
supervised deep learning models. It does this by providing an
interface to the [Flux](https://fluxml.ai/Flux.jl/stable/)
framework.

This package is a work-in-progress and does not have a stable
API. Presently, the user should be familiar with building a Flux
chain.

### Basic idea

Each MLJFlux model has a *builder* hyperparameter, an object encoding
instructions for creating a neural network given the data that the
model eventually sees (e.g., the number of classes in a classification
problem). While each MLJFlux model has a simple default builder, users
will generally need to define their own builders to get good results,
and this will require familiarity with the [Flux
API](https://fluxml.ai/Flux.jl/stable/) for defining a neural network
chain.

In the future MLJFlux may provide an assortment of more sophisticated
canned builders.


### Example

Following is an introductory example using a default builder and no
standardization of input features.

#### Loading some data and instantiating a model

```julia
using MLJ
import RDatasets
iris = RDatasets.dataset("datasets", "iris");
y, X = unpack(iris, ==(:Species), colname -> true, rng=123);
@load NeuralNetworkClassifier

julia> clf = NeuralNetworkClassifier()
NeuralNetworkClassifier(
    builder = Short(
            n_hidden = 0,
            dropout = 0.5,
            σ = NNlib.σ),
    finaliser = NNlib.softmax,
    optimiser = ADAM(0.001, (0.9, 0.999), IdDict{Any,Any}()),
    loss = Flux.crossentropy,
    epochs = 10,
    batch_size = 1,
    lambda = 0.0,
    alpha = 0.0,
    optimiser_changes_trigger_retraining = false) @ 160
```
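
At this point you could, for example, estimate out-of-sample performance
using an MLJ meta-algorithm. The following is only a sketch; the holdout
fraction and `rng` are illustrative choices:

```julia
evaluate(clf, X, y,
         resampling=Holdout(fraction_train=0.7, rng=123),
         measure=cross_entropy)
```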

#### Inspecting evolution of out-of-sample performance

```julia
r = range(clf, :epochs, lower=1, upper=200, scale=:log10)
curve = learning_curve(clf, X, y,
                       range=r,
                       resampling=Holdout(fraction_train=0.7),
                       measure=cross_entropy)
using Plots
plot(curve.parameter_values,
     curve.measurements,
     xlab=curve.parameter_name,
     xscale=curve.parameter_scale,
     ylab = "Cross Entropy")
```

![learning_curve.png](learning_curve.png)

### Models

In MLJ a *model* is a mutable struct storing hyperparameters for some
learning algorithm indicated by the model name, and that's all. In
particular, an MLJ model does not store learned parameters.

*Warning:* In Flux the term "model" has another meaning. However, as all
Flux "models" used in MLJFlux are `Flux.Chain` objects, we call them
*chains*, and restrict use of "model" to models in the MLJ sense.

MLJFlux provides four model types, for use with input features `X` and
targets `y` of the [scientific
type](https://alan-turing-institute.github.io/MLJScientificTypes.jl/dev/)
indicated in the table below. The parameters `n_in` and `n_out`
refer to information passed to the builder, as described under
[Defining a new builder](#defining-a-new-builder) below.

model type | prediction type | `scitype(X) <: _` | `scitype(y) <: _`
-----------|-----------------|---------------|----------------------------
`NeuralNetworkRegressor` | `Deterministic` | `Table(Continuous)` with `n_in` columns | `AbstractVector{<:Continuous}` (`n_out = 1`)
`MultivariateNeuralNetworkRegressor` | `Deterministic` | `Table(Continuous)` with `n_in` columns | `Table(Continuous)` with `n_out` columns
`NeuralNetworkClassifier` | `Probabilistic` | `Table(Continuous)` with `n_in` columns | `AbstractVector{<:Finite}` with `n_out` classes
`ImageClassifier` | `Probabilistic` | `AbstractVector{<:Image{W,H}}` with `n_in = (W, H)` | `AbstractVector{<:Finite}` with `n_out` classes

> Table 1. Input and output types for MLJFlux models

#### Matrix input

Any `AbstractMatrix{<:AbstractFloat}` object `Xmat` can be forced to
have scitype `Table(Continuous)` by replacing it with `X =
MLJ.table(Xmat)`. Furthermore, this wrapping, and subsequent
unwrapping under the hood, will compile to a no-op. Sparse matrix
data is also accepted, but the implementation has not been optimized
for sparse data and should be used with caution.
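
For example, a minimal sketch (the matrix here is synthetic, for
illustration only):

```julia
using MLJ

Xmat = rand(Float32, 100, 4)      # a plain feature matrix
X = MLJ.table(Xmat)               # wrap as a table, without copying the data
scitype(X) <: Table(Continuous)   # true
```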

Instructions for coercing common image formats into some
`AbstractVector{<:Image}` are
[here](https://alan-turing-institute.github.io/MLJScientificTypes.jl/dev/#Type-coercion-for-image-data-1).


#### Built-in builders

MLJFlux provides two simple builders out of the box:

- `MLJFlux.Linear(σ=...)` builds a fully connected two layer
  network with `n_in` inputs and `n_out` outputs, with activation
  function `σ`, defaulting to `Flux.relu`.

- `MLJFlux.Short(n_hidden=..., dropout=..., σ=...)` builds a
  fully-connected two-layer network with `n_in` inputs and `n_out`
  outputs, using `n_hidden` nodes in the hidden layer and the specified
  `dropout` (defaulting to 0.5). An activation function `σ` is applied
  between the hidden and final layers. If `n_hidden=0` (the default)
  then the number of hidden nodes is taken to be the geometric mean of
  the number of input and output nodes.

See Table 1 above for how `n_in` and `n_out` relate to the data.
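
For example, here is a sketch of specifying the `Short` builder
explicitly when constructing a classifier (the hyperparameter values
shown are illustrative only):

```julia
using MLJ
import MLJFlux
@load NeuralNetworkClassifier

clf = NeuralNetworkClassifier(builder=MLJFlux.Short(n_hidden=32, dropout=0.2),
                              epochs=20)
```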

### Model hyperparameters

All models share the following hyper-parameters:

1. `builder`: Default = `MLJFlux.Linear(σ=Flux.relu)` (regressors) or
   `MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ)` (classifiers)

2. `optimiser`: The optimiser to use for training. Default =
`Flux.ADAM()`

@@ -103,11 +156,17 @@

6. `lambda`: The regularization strength. Default = 0. Range = [0, ∞)

7. `alpha`: The L2/L1 mix of regularization. Default = 0. Range = [0, 1]

8. `optimiser_changes_trigger_retraining`: True if fitting an
   associated machine should trigger retraining from scratch whenever
   the optimiser changes. Default = `false`

The classifiers have an additional hyperparameter `finaliser` (default
= `Flux.softmax`), the operation applied to the unnormalized output of
the final layer to obtain probabilities (outputs summing to one). It
should return a vector of the same length as its input.

<!-- 9. `embedding_choice`: The embedding to use for handling categorical features. Options = :onehot, :entity_embedding. Default = :onehot. -->

@@ -117,3 +176,69 @@
<!-- number of levels in the pool of the categorical feature. If the -->
<!-- value is <= 0, this means that the dimension will be equal to (the -->
<!-- number of unique values of the feature) / 2. Default = -1 -->


### Defining a new builder

Following is an example defining a new builder for creating a simple
fully-connected neural network with two hidden layers, with `n1` nodes
in the first hidden layer, and `n2` nodes in the second, for use in
any of the first three models in Table 1. The definition includes one
mutable struct and one method:

```julia
using Flux
import MLJFlux

mutable struct MyNetwork <: MLJFlux.Builder
    n1 :: Int
    n2 :: Int
end

function MLJFlux.fit(nn::MyNetwork, n_in, n_out)
    return Chain(Dense(n_in, nn.n1), Dense(nn.n1, nn.n2), Dense(nn.n2, n_out))
end
```

Note here that `n_in` and `n_out` depend on the size of the data (see
Table 1).
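
Such a builder is passed to a model on construction, as in this sketch
(the hyperparameter values are illustrative):

```julia
using MLJ
import Flux
@load NeuralNetworkRegressor

nn_regressor = NeuralNetworkRegressor(builder=MyNetwork(32, 16),
                                      loss=Flux.mse, epochs=5)
```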

More generally, defining a new builder means defining a new struct,
`MyNetwork` say (sub-typed from `MLJFlux.Builder` to get pretty
printing), and defining a new `MLJFlux.fit` method with one of these
signatures:

```julia
MLJFlux.fit(builder::MyNetwork, n_in, n_out)
MLJFlux.fit(builder::MyNetwork, n_in, n_out, n_channels) # for use with `ImageClassifier`
```

This method must return a `Flux.Chain` instance, `chain`, subject to the
following conditions:

- `chain(x)` must make sense:

  - for any `x::Vector{<:AbstractFloat}` of length `n_in` (for use
    with one of the first three model types); or

  - for any `x::Array{<:Float32, 3}` of size
    `(W, H, n_channels)`, where `n_in = (W, H)` and `n_channels` is
    1 or 3 (for use with `ImageClassifier`)

- The object returned by `chain(MLJFlux.reformat(X))` must be an
`AbstractFloat` vector of length `n_out`
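
As an illustration only, here is a rough sketch of a builder for use
with `ImageClassifier`; the struct name, hidden-layer size and
flattening strategy are hypothetical, and the chain is written so that
it also accepts batched 4-d `W×H×channels×N` arrays:

```julia
using Flux
import MLJFlux

mutable struct MyImageNet <: MLJFlux.Builder
    n_hidden :: Int
end

function MLJFlux.fit(b::MyImageNet, n_in, n_out, n_channels)
    W, H = n_in
    flat = W * H * n_channels
    return Chain(x -> reshape(x, flat, :),           # flatten each image
                 Dense(flat, b.n_hidden, Flux.relu),
                 Dense(b.n_hidden, n_out))
end
```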


### Loss functions

Currently, the loss function specified by `loss=...` is applied
internally by Flux and needs to conform to the Flux API. You cannot,
for example, supply one of MLJ's probabilistic loss functions, such as
`MLJ.cross_entropy`, to one of the classifier constructors, although
you *should* use MLJ loss functions in MLJ meta-algorithms.
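
For example, a sketch combining a Flux loss for training with an MLJ
measure for evaluation (reusing `X`, `y` from the example above; the
resampling choice is illustrative):

```julia
using MLJ
import Flux
@load NeuralNetworkClassifier

clf = NeuralNetworkClassifier(loss=Flux.crossentropy, epochs=20)

# MLJ meta-algorithms still take MLJ measures:
evaluate(clf, X, y, resampling=CV(nfolds=5), measure=cross_entropy)
```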

<!-- Unless, you are familiar with this API, it is recommended you use one -->
<!-- of the [loss functions provided by -->
<!-- Flux](https://github.com/FluxML/Flux.jl/blob/v0.8.3/src/layers/stateless.jl) -->
<!-- or leave `loss` unspecified to invoke the default. For a binary -->
<!-- classification problem you might also consider -->
<!-- `Flux.binarycrossentropy`, while for a classification problem with -->
<!-- more than two classes (most image problems) consider -->
<!-- `Flux.logitbinarycrossentropy`, as these have better numerical -->
<!-- stability than vanilla `Flux.crossentropy`. -->
Binary file added learning_curve.png
2 changes: 1 addition & 1 deletion src/core.jl
@@ -140,7 +140,7 @@ fit(builder::Linear, n::Integer, m::Integer) =

# baby example 2:
mutable struct Short <: Builder
    n_hidden::Int # if zero use geometric mean of input/output
    dropout::Float64
    σ
end
1 change: 0 additions & 1 deletion src/image.jl
@@ -62,7 +62,6 @@ function MLJModelInterface.predict(model::ImageClassifier, fitresult, Xnew)
return MLJModelInterface.UnivariateFinite(levels, probs)
end


function MLJModelInterface.update(model::ImageClassifier,
verbosity::Int,
old_fitresult,
