# Machine Learning in Julia

An introduction to the
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/stable/)
toolbox.

### Set-up

Inspect Julia version:

In [1]:
VERSION

v"1.6.3"

The following instantiates a package environment.

The package environment has been created using **Julia 1.6** and may not
instantiate properly for other Julia versions.

In [2]:
using Pkg
Pkg.activate("env")
Pkg.instantiate()

  Activating environment at `~/GoogleDrive/Julia/MLJ/MLJTutorial/notebooks/05_composition/env/Project.toml`


## General resources

- [MLJ Cheatsheet](https://alan-turing-institute.github.io/MLJ.jl/dev/mlj_cheatsheet/)
- [Common MLJ Workflows](https://alan-turing-institute.github.io/MLJ.jl/dev/common_mlj_workflows/)
- [MLJ manual](https://alan-turing-institute.github.io/MLJ.jl/dev/)
- [Data Science Tutorials in Julia](https://juliaai.github.io/DataScienceTutorials.jl/)

## Part 5 - Advanced Model Composition

> **Goals:**
> 1. Learn how to build a prototype of a composite model, called a *learning network*
> 2. Learn how to use the `@from_network` macro to export a learning network as a new stand-alone model type

`@pipeline` is great for composing models in an unbranching
sequence. Another built-in type of model composition is a model
*stack*; see
[here](https://alan-turing-institute.github.io/MLJ.jl/dev/model_stacking/#Model-Stacking)
for details. For other complicated model compositions you'll want to
use MLJ's generic model composition syntax. There are two main
steps:

- **Prototype** the composite model by building a *learning
  network*, which can be tested on some (dummy) data as you build
  it.

- **Export** the learning network as a new stand-alone model type.

Like pipeline models, instances of the exported model type behave
like any other model (and are not bound to any data, until you wrap
them in a machine).

### Building a pipeline using the generic composition syntax

To warm up, we'll do the equivalent of

In [3]:
using MLJ
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels
pipe = @pipeline Standardizer LogisticClassifier;

[ Info: For silent loading, specify `verbosity=0`. 
import MLJLinearModels ✔


using the generic syntax.

Here's some dummy data we'll be using to test our learning network:

In [4]:
X, y = make_blobs(5, 3)
pretty(X)

┌────────────┬────────────┬────────────┐
│ x1         │ x2         │ x3         │
│ Float64    │ Float64    │ Float64    │
│ Continuous │ Continuous │ Continuous │
├────────────┼────────────┼────────────┤
│ 2.53013    │ -1.89957   │ 5.54883    │
│ 0.291037   │ -1.67608   │ 3.07317    │
│ -3.91176   │ -1.47851   │ 11.7301    │
│ -1.01752   │ 8.28512    │ 7.01315    │
│ 2.8618     │ -2.03601   │ 3.03607    │
└────────────┴────────────┴────────────┘


**Step 0** - Proceed as if you were combining the models "by hand",
using all the data available for training, transforming and
prediction:

In [5]:
stand = Standardizer();
linear = LogisticClassifier();

mach1 = machine(stand, X);
fit!(mach1);
Xstand = transform(mach1, X);

mach2 = machine(linear, Xstand, y);
fit!(mach2);
yhat = predict(mach2, Xstand)

[ Info: Training Machine{Standardizer,…}.
[ Info: Training Machine{LogisticClassifier,…}.


5-element MLJBase.UnivariateFiniteVector{ScientificTypesBase.Multiclass{3}, Int64, UInt32, Float64}:
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.0671, 2=>0.0688, 3=>0.864)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.0781, 2=>0.0733, 3=>0.849)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.128, 2=>0.714, 3=>0.158)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.658, 2=>0.136, 3=>0.206)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.0449, 2=>0.0317, 3=>0.923)

**Step 1** - Edit your code as follows:

- pre-wrap the data in `Source` nodes

- delete the `fit!` calls

In [6]:
X = source(X)  # or X = source() if not testing
y = source(y)  # or y = source()

stand = Standardizer();
linear = LogisticClassifier();

mach1 = machine(stand, X);
Xstand = transform(mach1, X);

mach2 = machine(linear, Xstand, y);
yhat = predict(mach2, Xstand)

Node{Machine{LogisticClassifier,…}}
  args:
    1:	Node{Machine{Standardizer,…}}
  formula:
    predict(
        Machine{LogisticClassifier,…},
        transform(
            Machine{Standardizer,…},
            Source @873))

Now `X`, `y`, `Xstand` and `yhat` are *nodes* ("variables" or
"dynamic data") instead of data. All training, predicting and
transforming is now executed lazily, whenever we `fit!` one of these
nodes. We *call* a node to retrieve the data it represents in the
original manual workflow.
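To make the idea of lazy evaluation concrete, here is a toy sketch in plain Julia. It is an illustration only and not how MLJ implements nodes: the types `ToySource` and `ToyNode` are hypothetical, and no training takes place. The sketch shows that building nodes computes nothing, and that calling a node walks back through its arguments to the source.

```julia
# Toy illustration only: hypothetical types, not MLJ's implementation.
struct ToySource
    data::Vector{Float64}
end

struct ToyNode
    f::Function   # operation applied when the node is called
    args::Tuple   # upstream nodes or sources
end

# Calling a source returns its stored data, or the substitute if one is given.
(s::ToySource)() = s.data
(s::ToySource)(xnew) = xnew

# Calling a node evaluates its arguments first, then applies `f`.
(n::ToyNode)() = n.f((arg() for arg in n.args)...)
(n::ToyNode)(xnew) = n.f((arg(xnew) for arg in n.args)...)

x = ToySource([1.0, 2.0, 3.0])
centered = ToyNode(v -> v .- 2.0, (x,))       # nothing is computed yet
doubled  = ToyNode(v -> 2 .* v, (centered,))  # still nothing computed

doubled()               # evaluates the whole chain on the stored data
doubled([10.0, 20.0])   # same chain, with the source data swapped out
```

In this toy, `doubled()` returns `[-2.0, 0.0, 2.0]` and `doubled([10.0, 20.0])` returns `[16.0, 36.0]`. Real MLJ nodes additionally carry machines, whose learned parameters are only available after `fit!`.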

In [7]:
fit!(Xstand)
Xstand() |> pretty

[ Info: Training Machine{Standardizer,…}.
┌────────────┬────────────┬────────────┐
│ x1         │ x2         │ x3         │
│ Float64    │ Float64    │ Float64    │
│ Continuous │ Continuous │ Continuous │
├────────────┼────────────┼────────────┤
│ 0.856089   │ -0.474923  │ -0.148276  │
│ 0.0504786  │ -0.425292  │ -0.839008  │
│ -1.46166   │ -0.381415  │ 1.57636    │
│ -0.420332  │ 1.78685    │ 0.260282   │
│ 0.975421   │ -0.505224  │ -0.84936   │
└────────────┴────────────┴────────────┘


In [8]:
fit!(yhat);
yhat()

[ Info: Not retraining Machine{Standardizer,…}. Use `force=true` to force.
[ Info: Training Machine{LogisticClassifier,…}.


5-element MLJBase.UnivariateFiniteVector{ScientificTypesBase.Multiclass{3}, Int64, UInt32, Float64}:
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.0671, 2=>0.0688, 3=>0.864)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.0781, 2=>0.0733, 3=>0.849)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.128, 2=>0.714, 3=>0.158)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.658, 2=>0.136, 3=>0.206)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.0449, 2=>0.0317, 3=>0.923)

The node `yhat` is the "descendant" (in an associated DAG we have
defined) of two source nodes, one wrapping the input `X` and one
wrapping the target `y`:

In [9]:
sources(yhat)

2-element Vector{Any}:
 Source @873 ⏎ `ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}`
 Source @119 ⏎ `AbstractVector{ScientificTypesBase.Multiclass{3}}`

The data at the input source node is replaced by `Xnew` to obtain a
new prediction when we call `yhat` like this:

In [10]:
Xnew, _ = make_blobs(2, 3);
yhat(Xnew)

2-element MLJBase.UnivariateFiniteVector{ScientificTypesBase.Multiclass{3}, Int64, UInt32, Float64}:
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.057, 2=>0.000505, 3=>0.942)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.00281, 2=>1.31e-5, 3=>0.997)
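As a quick check of this behaviour, the following sketch uses only the `Standardizer` built into MLJ, so no extra model package is assumed. The variable names `Xtrain`, `Xtest`, `Xs`, `stand_mach` and `Wnode` are hypothetical and chosen so as not to clobber the notebook's own variables. The final comparison is expected to hold because calling a node on new data applies the already-trained machine to that data.

```julia
using MLJ

Xtrain, _ = make_blobs(5, 3)
Xtest, _ = make_blobs(2, 3)

Xs = source(Xtrain)
stand_mach = machine(Standardizer(), Xs)
Wnode = transform(stand_mach, Xs)   # a node; nothing is trained yet
fit!(Wnode, verbosity=0)            # trains `stand_mach` on `Xtrain`

# Calling the node on new data should agree with transforming by hand:
MLJ.matrix(Wnode(Xtest)) ≈ MLJ.matrix(transform(stand_mach, Xtest))
```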

**Step 2** - Export the learning network as a new stand-alone model type

Now, somewhat paradoxically, we can wrap the whole network in a
special machine - called a *learning network machine* - before we have
defined the new model type. Indeed, doing so is a necessary step in
the export process, for this machine will tell the export macro:

- what kind of model the composite will be (`Deterministic`,
  `Probabilistic` or `Unsupervised`)

- which source nodes are input nodes and which are for the target

- which nodes correspond to each operation (`predict`, `transform`,
  etc) that we might want to define

In [11]:
surrogate = Probabilistic()     # a model with no fields!
mach = machine(surrogate, X, y; predict=yhat)

Machine{ProbabilisticSurrogate,…} trained 0 times; does not cache data
  args: 
    1:	Source @873 ⏎ `ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}`
    2:	Source @119 ⏎ `AbstractVector{ScientificTypesBase.Multiclass{3}}`


Although we have no real need to use it, this machine behaves like
you'd expect it to:

In [12]:
Xnew, _ = make_blobs(2, 3)
fit!(mach)
predict(mach, Xnew)

[ Info: Not retraining Machine{Standardizer,…}. Use `force=true` to force.
[ Info: Not retraining Machine{LogisticClassifier,…}. Use `force=true` to force.


2-element MLJBase.UnivariateFiniteVector{ScientificTypesBase.Multiclass{3}, Int64, UInt32, Float64}:
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.00935, 2=>0.169, 3=>0.822)
 UnivariateFinite{ScientificTypesBase.Multiclass{3}}(1=>0.00551, 2=>0.0161, 3=>0.978)

Now we create a new model type using a Julia `struct` definition
appropriately decorated:

In [13]:
@from_network mach begin
    mutable struct YourPipe
        standardizer = stand
        classifier = linear::Probabilistic
    end
end

Instantiating and evaluating on some new data:

In [14]:
pipe = YourPipe()
X, y = @load_iris;   # built-in data set
mach = machine(pipe, X, y)
evaluate!(mach, measure=misclassification_rate, operation=predict_mode)



PerformanceEvaluation object with these fields:
  measure, measurement, operation, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_pairs
Extract:
┌─────────────────────────┬─────────────┬──────────────┬────────────────────────
│ measure                 │ measurement │ operation    │ per_fold              ⋯
├─────────────────────────┼─────────────┼──────────────┼────────────────────────
│ MisclassificationRate() │ 0.08        │ predict_mode │ [0.0, 0.04, 0.08, 0.0 ⋯
└─────────────────────────┴─────────────┴──────────────┴────────────────────────
                                                                1 column omitted


### A composite model to average two regressor predictors

The following is a condensed version of
[this](https://github.com/alan-turing-institute/MLJ.jl/blob/master/binder/MLJ_demo.ipynb)
tutorial. We will define a composite model that:

- standardizes the input data

- learns and applies a Box-Cox transformation to the target variable

- blends the predictions of two supervised learning models - a ridge
 regressor and a random forest regressor; we'll blend using a simple
 average (for a more sophisticated stacking example, see
 [here](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/stacking/))

- applies the *inverse* Box-Cox transformation to this blended prediction
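Before building the network, it may help to see the intended data flow in plain Julia. The sketch below is only an illustration under simplifying assumptions: the Box-Cox exponent `λ` is fixed by hand (MLJ's transformer learns its parameters from the training target), and `predict_a` and `predict_b` are hypothetical stand-ins for the two fitted regressors.

```julia
using Statistics

# Box-Cox transform and its inverse for a fixed exponent λ (assumed known here).
boxcox(y, λ) = λ == 0 ? log(y) : (y^λ - 1) / λ
inv_boxcox(z, λ) = λ == 0 ? exp(z) : (λ * z + 1)^(1 / λ)

# Hypothetical stand-ins for two fitted regressors, predicting on the Box-Cox scale.
predict_a(w) = 1.0 + 0.5w
predict_b(w) = 1.2 + 0.4w

x = [2.0, 4.0, 6.0]
w = (x .- mean(x)) ./ std(x)   # standardized input

λ = 0.5
ẑ = 0.5 .* predict_a.(w) .+ 0.5 .* predict_b.(w)   # blended prediction, Box-Cox scale
ŷ = inv_boxcox.(ẑ, λ)                              # back on the original target scale
```

The averaging happens on the transformed scale, which is why the inverse transformation is applied last. Applying `boxcox.(ŷ, λ)` recovers `ẑ`.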

In [15]:
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels

[ Info: For silent loading, specify `verbosity=0`. 
import MLJDecisionTreeInterface ✔
[ Info: For silent loading, specify `verbosity=0`. 
import MLJLinearModels ✔


MLJLinearModels.RidgeRegressor

**Input layer**

In [16]:
X = source()
y = source()

Source @764 ⏎ `Nothing`

**First layer and target transformation**

In [17]:
std_model = Standardizer()
stand = machine(std_model, X)
W = MLJ.transform(stand, X)

box_model = UnivariateBoxCoxTransformer()
box = machine(box_model, y)
z = MLJ.transform(box, y)

Node{Machine{UnivariateBoxCoxTransformer,…}}
  args:
    1:	Source @764
  formula:
    transform(
        Machine{UnivariateBoxCoxTransformer,…},
        Source @764)

**Second layer**

In [18]:
ridge_model = RidgeRegressor(lambda=0.1)
ridge = machine(ridge_model, W, z)

forest_model = RandomForestRegressor(n_trees=50)
forest = machine(forest_model, W, z)

ẑ = 0.5*predict(ridge, W) + 0.5*predict(forest, W)

Node{Nothing}
  args:
    1:	Node{Nothing}
    2:	Node{Nothing}
  formula:
    +(
        #134(
            predict(
                Machine{RidgeRegressor,…},
                transform(
                    Machine{Standardizer,…},
                    Source @518))),
        #134(
            predict(
                Machine{RandomForestRegressor,…},
                transform(
                    Machine{Standardizer,…},
                    Source @518))))

**Output**

In [19]:
ŷ = inverse_transform(box, ẑ)

Node{Machine{UnivariateBoxCoxTransformer,…}}
  args:
    1:	Node{Nothing}
  formula:
    inverse_transform(
        Machine{UnivariateBoxCoxTransformer,…},
        +(
            #134(
                predict(
                    Machine{RidgeRegressor,…},
                    transform(
                        Machine{Standardizer,…},
                        Source @518))),
            #134(
                predict(
                    Machine{RandomForestRegressor,…},
                    transform(
                        Machine{Standardizer,…},
                        Source @518)))))

With the learning network defined, we're ready to export:

In [20]:
@from_network machine(Deterministic(), X, y, predict=ŷ) begin
    mutable struct CompositeModel
        rgs1 = ridge_model
        rgs2 = forest_model
    end
end

Let's instantiate the new model type and try it out on some data:

In [21]:
composite = CompositeModel()

CompositeModel(
    rgs1 = RidgeRegressor(
            lambda = 0.1,
            fit_intercept = true,
            penalize_intercept = false,
            solver = nothing),
    rgs2 = RandomForestRegressor(
            max_depth = -1,
            min_samples_leaf = 1,
            min_samples_split = 2,
            min_purity_increase = 0.0,
            n_subfeatures = -1,
            n_trees = 50,
            sampling_fraction = 0.7,
            pdf_smoothing = 0.0,
            rng = Random._GLOBAL_RNG()))

In [22]:
X, y = @load_boston;
mach = machine(composite, X, y);
evaluate!(mach,
          resampling=CV(nfolds=6, shuffle=true),
          measures=[rms, mae])



PerformanceEvaluation object with these fields:
  measure, measurement, operation, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_pairs
Extract:
┌────────────────────────┬─────────────┬───────────┬────────────────────────────
│ measure                │ measurement │ operation │ per_fold                  ⋯
├────────────────────────┼─────────────┼───────────┼────────────────────────────
│ RootMeanSquaredError() │ 3.76        │ predict   │ [2.9, 4.98, 3.72, 3.71, 2 ⋯
│ MeanAbsoluteError()    │ 2.42        │ predict   │ [2.1, 2.93, 2.37, 2.47, 2 ⋯
└────────────────────────┴─────────────┴───────────┴────────────────────────────
                                                                1 column omitted


### Resources for Part 5

- From the MLJ manual:
    - [Learning Networks](https://alan-turing-institute.github.io/MLJ.jl/stable/composing_models/#Learning-Networks-1)
    - [Finer Control](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Method-II:-Finer-control-(advanced)-1):
      exporting learning networks without a macro for finer control
- From Data Science Tutorials:
    - [Learning Networks](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/learning-networks/)
    - [Learning Networks 2](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/learning-networks-2/)
    - [Stacking](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/stacking/): an advanced example of model composition

<a id='solutions-to-exercises'></a>

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*