## Building a simple learning network in MLJ

Boot up MLJ and load some demonstration data (the Boston dataset):

In [1]:
using MLJ
using DataFrames
using Statistics

Xraw, yraw = datanow();
train, test = partition(eachindex(yraw), 0.7); # 70:30 split
Xtrain, ytrain = Xraw[train,:], yraw[train];
Xtest, ytest = Xraw[test,:], yraw[test];

Every network needs source nodes where data enters the network. The data to be used for training is placed at the source node (at its `data` field, which is mutable):

In [2]:
@constant X = node(Xtrain)

X

In [3]:
@constant y = node(ytrain)

y

(Using the `@constant` macro is equivalent to making a `const` declaration but with the name of the bound variable registered for REPL display.)

We want to fit a K-nearest neighbor model to our data, but with the input data standardized. That is, we want to rescale all inputs so that the columns in the training data have zero mean and unit standard deviation.
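To make the rescaling concrete, here is a minimal sketch in plain Julia that uses only the `Statistics` standard library. The helper names `fit_standardizer` and `standardize` are hypothetical and are not part of MLJ; the sketch only illustrates the arithmetic, not how `Standardizer` is implemented.

```julia
using Statistics

# Hypothetical helper (not part of MLJ): learn per-column means and standard
# deviations from a training matrix whose columns are features.
function fit_standardizer(A::AbstractMatrix{<:Real})
    mu = mean(A, dims=1)      # 1×p row of column means
    sigma = std(A, dims=1)    # 1×p row of column standard deviations
    return mu, sigma
end

# Rescale any matrix with the same columns using the learned statistics.
standardize(A::AbstractMatrix{<:Real}, mu, sigma) = (A .- mu) ./ sigma

Atrain = [1.0 10.0; 2.0 20.0; 3.0 60.0]
mu, sigma = fit_standardizer(Atrain)
Astand = standardize(Atrain, mu, sigma)

# Each standardized training column has zero mean and unit standard deviation.
@assert all(isapprox.(mean(Astand, dims=1), 0.0, atol=1e-12))
@assert all(std(Astand, dims=1) .≈ 1.0)
```

The statistics are learned from the training rows only and then reused for any new data. A constant column would give a zero standard deviation, which this sketch does not guard against.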

The standardization of a `DataFrame` is described by a single hyperparameter, namely a list of features (column names) to be standardized. The default is an empty list, indicating that all numerical features should be included:

In [4]:
stand_model = Standardizer()

# Standardizer @ 5…72:
features                =>   0-element Array{Symbol,1}



Since our rescaling transformation will look to `X` for its training data, we write:

In [5]:
@constant stand = trainable(stand_model, X)

trainable(Standardizer @ 5…72, X)

This code creates a `TrainableModel` object, `stand`, which will store the means and standard deviations of the training data columns. Transformed data can be fetched from a new node we will call `Xstand`:

In [6]:
@constant Xstand = transform(stand, X)

transform(stand, X)

Like all nodes, `Xstand` is a callable object. To see the outcome of our rescaling on the training data, we call `Xstand` with no arguments:

In [7]:
Xstand() 

ErrorException: TrainableModel @ stand with model Standardizer @ 5…72 is not trained and so cannot transform.

Oops! To fetch this data we will need to train the node first:

In [8]:
fit!(Xstand)
@assert std(Xstand()[:Age]) ≈ 1
Xstand() |> head


┌ Info: Training TrainableModel @ stand whose model is Standardizer @ 5…72.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88


|   | Crim | Zn | Indus | NOx | Rm | Age | Dis | Rad | Tax | PTRatio | Black | LStat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | -0.613361 | 0.0849045 | -0.996041 | 0.274966 | 0.253795 | 0.163317 | -0.188693 | -2.1573 | -0.214089 | -1.10018 | 0.406865 | -0.91672 |
| 2 | -0.580553 | -0.606926 | -0.214888 | -0.399953 | 0.0254423 | 0.645338 | 0.23768 | -1.53592 | -1.00783 | 0.0415407 | 0.406865 | -0.209159 |
| 3 | -0.580584 | -0.606926 | -0.214888 | -0.399953 | 1.15831 | 0.019063 | 0.23768 | -1.53592 | -1.00783 | 0.0415407 | 0.305854 | -1.0783 |
| 4 | -0.572644 | -0.606926 | -1.01738 | -0.507548 | 0.881022 | -0.519252 | 0.770027 | -0.914528 | -1.30181 | 0.452562 | 0.350527 | -1.2637 |
| 5 | -0.515311 | -0.606926 | -1.01738 | -0.507548 | 1.10196 | -0.223706 | 0.770027 | -0.914528 | -1.30181 | 0.452562 | 0.406865 | -0.85719 |
| 6 | -0.576583 | -0.606926 | -1.01738 | -0.507548 | 0.0387876 | -0.0653786 | 0.770027 | -0.914528 | -1.30181 | 0.452562 | 0.33787 | -0.8776 |


Notice that fitting the node triggered training of the `TrainableModel` object `stand`.
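The behavior seen so far (a node is callable, it raises an error until trained, and fitting it trains the object it depends on) can be mimicked in a few lines of plain Julia. The sketch below is a toy under stated assumptions: `ToySource`, `ToyCenterer`, `ToyNode` and `toyfit!` are hypothetical names, the "model" merely subtracts the training mean, and none of this is MLJ's actual implementation.

```julia
using Statistics

# Toy stand-ins, NOT MLJ's types: a source holding data, a trainable wrapper
# for a centering transform, and a node that transforms data.
mutable struct ToySource
    data::Vector{Float64}
end

mutable struct ToyCenterer
    source::ToySource
    mean::Union{Nothing,Float64}   # `nothing` until trained
end
ToyCenterer(source::ToySource) = ToyCenterer(source, nothing)

struct ToyNode
    trainable::ToyCenterer
end

# Fitting a node trains the object it depends on, using the source data.
function toyfit!(node::ToyNode)
    t = node.trainable
    t.mean = mean(t.source.data)
    return node
end

# Calling with an argument transforms new data ...
function (node::ToyNode)(xnew::Vector{Float64})
    m = node.trainable.mean
    m === nothing && error("not trained and so cannot transform")
    return xnew .- m
end

# ... and calling with no arguments transforms the training data.
(node::ToyNode)() = node(node.trainable.source.data)

src = ToySource([1.0, 2.0, 6.0])
centered = ToyNode(ToyCenterer(src))

# Before training, calling the node throws.
failed_before_training = try
    centered()
    false
catch
    true
end

toyfit!(centered)
centered()             # training data, centered
centered([3.0, 4.0])   # new data, centered with the *training* mean
```

Keeping the learned state in the trainable wrapper rather than in the node is what allows one trained object to serve both the training data and new data.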

Unfortunately, the K-nearest neighbor model expects `Array` type training data, so we create a new node `Xarray`:

In [9]:
@constant Xarray = array(Xstand)

array(transform(stand, X))

Note that implicit in the definition of `Xarray` is the entire network, beginning at the source `X`, which we can see from the above REPL output.

Now our K-nearest neighbor model will look to `Xarray` and `y` to fetch its training data, so we write:

In [10]:
@constant knn = trainable(KNNRegressor(K=4), Xarray, y)

trainable(KNNRegressor @ 1…54, Xarray, y)

We create a new node where predictions of the KNN model may be fetched:

In [11]:
yhat = predict(knn, Xarray)

predict(knn, array(transform(stand, X)))

By default, fitting the new node trains every `TrainableModel` object that the node depends on:

In [12]:
fit!(yhat)

┌ Info: Training TrainableModel @ stand whose model is Standardizer @ 5…72.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88
┌ Info: Training TrainableModel @ knn whose model is KNNRegressor @ 1…54.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88


predict(knn, array(transform(stand, X)))

To fetch an actual prediction *on new data* we call our node with the new data as argument:

In [13]:
rms(ytest, yhat(Xtest))

7.675912148491223
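For readers unfamiliar with the score reported above, the following sketch assumes that `rms` denotes the usual root-mean-square error between targets and predictions. The function name `rms_sketch` is hypothetical; MLJ supplies its own `rms`, whose implementation may differ in detail.

```julia
# Hypothetical re-implementation for illustration only.
# Root-mean-square error: square the residuals, average them, take the root.
rms_sketch(y, yhat) = sqrt(sum(abs2, y .- yhat) / length(y))

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Residuals are 1, 0 and -2, so the result is sqrt((1 + 0 + 4) / 3).
err = rms_sketch(y_true, y_pred)
```

The score is in the same units as the target, so lower values indicate predictions closer to the held-out targets.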

However, if we wish to retrain `knn` with a new hyperparameter without bothering to refit our scaling transformer, we can "freeze" the scaler:

In [14]:
freeze!(stand)
yhat

predict(knn, array(transform(stand, X)))

In a color terminal, `stand` now appears in red, instead of green, to indicate that it is frozen.

In [15]:
knn.model.K = 7
fit!(yhat)
rms(ytest, yhat(Xtest))

└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:84
┌ Info: Training TrainableModel @ knn whose model is KNNRegressor @ 1…54.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88


7.32782969364479
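The effect of freezing can be illustrated with a toy sketch in plain Julia. Everything here is hypothetical (`ToyTrainable`, `toyfreeze!`, `train_all!`): the components only count how often they are trained, which is enough to show that a frozen component is skipped while the others are retrained. It is not MLJ's implementation of `freeze!`.

```julia
# Toy stand-in, NOT MLJ's `TrainableModel`: it counts how often it is trained.
mutable struct ToyTrainable
    frozen::Bool
    ntrained::Int
end
ToyTrainable() = ToyTrainable(false, 0)

toyfreeze!(t::ToyTrainable) = (t.frozen = true; t)
toythaw!(t::ToyTrainable) = (t.frozen = false; t)

# Train every component in order, skipping the frozen ones.
function train_all!(components::Vector{ToyTrainable})
    for t in components
        t.frozen && continue
        t.ntrained += 1
    end
    return components
end

scaler, learner = ToyTrainable(), ToyTrainable()
network = [scaler, learner]

train_all!(network)     # both components trained once
toyfreeze!(scaler)
train_all!(network)     # only `learner` is retrained
```

After the second call the scaler has been trained once and the learner twice, mirroring the log output above in which only `knn` is retrained.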

### Advanced: Wrapping the network as a new model

`Standardizer` and `KNNRegressor` are examples of MLJ *model* types. While a model is just a container for the hyperparameters of some learning algorithm, every such model has low-level `fit` and `update` methods, which implement the kind of training observed above. (The complete specification for these methods is given [here](https://github.com/alan-turing-institute/MLJ.jl/blob/master/doc/adding_new_models.md).) Additionally, there is a low-level `transform` method dispatching on `Standardizer` objects, and a low-level `predict` method dispatching on `KNNRegressor` objects.
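The split between a model (hyperparameters only) and its low-level methods (which do the learning) can be sketched without MLJ. In the toy below, `ToyShrunkenMean`, `toyfit` and `toypredict` are hypothetical names. The three return values follow the `(fitresult, cache, report)` convention used by the `fit` method written out later in this document; the linked specification remains the authority on the real interface.

```julia
using Statistics

# Hypothetical model type, not from MLJ: a container for one hyperparameter.
mutable struct ToyShrunkenMean
    shrinkage::Float64   # 0.0 predicts the training mean; 1.0 predicts zero
end

# Low-level fit: learn from the data and return (fitresult, cache, report).
function toyfit(model::ToyShrunkenMean, verbosity::Int, X, y)
    fitresult = (1 - model.shrinkage) * mean(y)   # the only learned parameter
    cache = nothing                               # nothing to reuse on update
    report = Dict(:ntrain => length(y))
    verbosity > 0 && @info "Trained on $(length(y)) observations."
    return fitresult, cache, report
end

# Low-level predict: dispatches on the model type and uses only the fitresult.
toypredict(model::ToyShrunkenMean, fitresult, Xnew) =
    fill(fitresult, size(Xnew, 1))

model = ToyShrunkenMean(0.5)
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [2.0, 4.0, 6.0]

fitresult, cache, report = toyfit(model, 0, X, y)
yhat = toypredict(model, fitresult, X[1:2, :])
```

Because the model struct holds no learned state, the same `model` object can be refitted on different data without side effects.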

To bundle the learning network defined above as a stand-alone "composite" model, we define a new model type and implement corresponding `fit`, `update` and `predict` methods. We can give our composite model hyperparameters to control exactly how retraining should look; for example, fit the transformer only once, or arrange for retraining only when relevant hyperparameters change. The example below implements the latter design.

First, we define a new model, the container for the composite model's hyperparameters, which in this case are other models:

In [16]:
import MLJ: MLJType, Supervised, LearningNode, fit, update, predict

mutable struct WrappedKNN <: Supervised{LearningNode}
    stand_model::Standardizer
    knn_model::KNNRegressor
end

Next we define a struct to remember details of the learning network we will construct in calls to `fit`. This is needed for the `update` method, which contains the retraining logic:

In [17]:
mutable struct Cache <: MLJType
    stand
    knn
    stand_model
    knn_model
end

The `fit` method simply wraps the code we already wrote above into a function. Additionally it outputs to `cache` the two `TrainableModel`s, so that they can be frozen or reactivated in `update` (which is called with `cache` as an argument) according to whether or not the component models have changed. So that `update` can detect the change, `cache` also contains the model values used in the initial `fit`.

In [18]:
function fit(composite::WrappedKNN, verbosity, Xtrain, ytrain)

    stand_model = composite.stand_model
    knn_model = composite.knn_model
    
    X = node(Xtrain) # instantiates a source node
    y = node(ytrain)
    
    stand = trainable(stand_model, X)
    
    Xstand = transform(stand, X)
    Xarray = array(Xstand)
    
    knn = trainable(knn_model, Xarray, y)
    
    yhat = predict(knn, Xarray)

    fit!(yhat, verbosity)
    
    fitresult = yhat
    report = knn.report
    cache = Cache(stand, knn, deepcopy(stand_model), deepcopy(knn_model))

    return fitresult, cache, report

end

fit (generic function with 7 methods)

The `predict` method just calls the last node of our network on the new data:

In [19]:
predict(composite::WrappedKNN, fitresult, Xnew) = fitresult(Xnew)

predict (generic function with 5 methods)

In [20]:
function update(composite::WrappedKNN, verbosity, fitresult, cache, X, y)

    stand, knn = cache.stand, cache.knn
    stand_model, knn_model = cache.stand_model, cache.knn_model

    case1 = (composite.stand_model == stand_model) # true if `stand_model` has not changed
    case2 = (composite.knn_model == knn_model) # true if `knn_model` has not changed

    # we initially activate all trainable models, but leave them in the
    # state needed for this call to update (for post-train inspection):
    thaw!(stand); thaw!(knn)
    
    if case1
        freeze!(stand)
    end
    if case1 && case2 
        freeze!(knn)
    end

    fit!(fitresult, verbosity)

    cache.stand_model = deepcopy(composite.stand_model)
    cache.knn_model = deepcopy(composite.knn_model)

    return fitresult, cache, knn.report

end

update (generic function with 2 methods)
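The change detection in `update` compares each current component model with the deep copy stored in `cache`. The sketch below shows why this works, under one assumption not stated in the text: that models compare by field values under `==`. By default Julia compares mutable structs by identity, so the toy type defines `==` explicitly. `ToyKNNModel` is a hypothetical stand-in, not an MLJ type.

```julia
# Hypothetical hyperparameter container standing in for a model such as
# `KNNRegressor`; it is NOT an MLJ type.
mutable struct ToyKNNModel
    K::Int
end

# Mutable structs compare by identity unless `==` is defined, so a
# field-wise `==` is needed for a deep copy to count as "unchanged".
Base.:(==)(a::ToyKNNModel, b::ToyKNNModel) = a.K == b.K

model = ToyKNNModel(4)
snapshot = deepcopy(model)               # what `fit` would store in the cache

unchanged_before = (model == snapshot)   # same field values as the snapshot
model.K = 7                              # the user mutates a hyperparameter
unchanged_after = (model == snapshot)    # the snapshot still holds K = 4
```

Storing a `deepcopy` rather than the model itself matters: a cached reference to the same object would be mutated along with it, and the comparison could never detect a change.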

We're now ready to build a new simplified network with just one non-source node:

In [21]:
composite_model = WrappedKNN(Standardizer(), KNNRegressor(K=4))

# WrappedKNN @ 5…77:
stand_model             =>   Standardizer @ 1…70
knn_model               =>   KNNRegressor @ 1…99

## Standardizer @ 1…70:
features                =>   0-element Array{Symbol,1}

## KNNRegressor @ 1…99:
K                       =>   4
metric                  =>   euclidean (generic function with 1 method)
kernel                  =>   reciprocal (generic function with 1 method)



In [22]:
@constant composite = trainable(composite_model, X, y)

trainable(WrappedKNN @ 5…77, X, y)

In [23]:
zhat = predict(composite, X)
fit!(zhat)

┌ Info: Training TrainableModel @ composite whose model is WrappedKNN @ 5…77.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88
┌ Info: Training TrainableModel @ 1…51 whose model is Standardizer @ 1…70.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88
┌ Info: Training TrainableModel @ 5…90 whose model is KNNRegressor @ 1…99.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88


predict(composite, X)

Changing the standardization hyperparameter triggers retraining of all components:

In [24]:
composite.model.stand_model.features = [:Age, :Crim, :Zn]
fit!(zhat)

┌ Info: Training TrainableModel @ composite whose model is WrappedKNN @ 5…77.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88
┌ Info: Training TrainableModel @ 1…51 whose model is Standardizer @ 1…70.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88
┌ Info: Training TrainableModel @ 5…90 whose model is KNNRegressor @ 1…99.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88


predict(composite, X)

However, changing only a KNN hyperparameter does not trigger retraining of the standardizer:

In [25]:
composite.model.knn_model.K = 3
fit!(zhat)

┌ Info: Training TrainableModel @ composite whose model is WrappedKNN @ 5…77.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:84
┌ Info: Training TrainableModel @ 5…90 whose model is KNNRegressor @ 1…99.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/networks.jl:88


predict(composite, X)