# End to end examples

## AMES

> Baby steps <br>
> Dummy model <br>
> KNN-Ridge blend <br>
> Using the expanded syntax <br>
> Tuning the model <br>

In [1]:
using Pkg ; Pkg.activate("D:/JULIA/6_ML_with_Julia/EX-AMES"); Pkg.instantiate()

[32m[1m  Activating[22m[39m project at `D:\JULIA\6_ML_with_Julia\EX-AMES`


### Baby steps
---

Let's load a reduced version of the well-known Ames House Price data set, containing twelve of the more important features: six continuous ones and six discrete ones (three categorical and three integer-valued). As with "iris", the dataset is so common that you can load it directly with ```@load_ames```, and the reduced version via ```@load_reduced_ames```:

In [2]:
using MLJ
using PrettyPrinting
import DataFrames: DataFrame
import Statistics

X, y = @load_reduced_ames
X = DataFrame(X)
@show size(X)
first(X, 3) |> pretty

size(X) = (1456, 12)

| OverallQual | GrLivArea | Neighborhood | x1stFlrSF | TotalBsmtSF | BsmtFinSF1 | LotArea | GarageCars | MSSubClass | GarageArea | YearRemodAdd | YearBuilt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CategoricalValue{Int64, UInt32} | Float64 | CategoricalValue{String, UInt32} | Float64 | Float64 | Float64 | Float64 | Int64 | CategoricalValue{String, UInt32} | Float64 | Int64 | Int64 |
| OrderedFactor{10} | Continuous | Multiclass{25} | Continuous | Continuous | Continuous | Continuous | Count | Multiclass{15} | Continuous | Count | Count |

and the target is a continuous vector:

In [3]:
@show y[1:3]
scitype(y)

y[1:3] = [138000.0, 369900.0, 180000.0]


AbstractVector{Continuous} (alias for AbstractArray{Continuous, 1})

so this is a standard regression problem with a mix of categorical and numerical inputs.

### Dummy model
---

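Remember that a model is just a container for hyperparameters. Let's take a particularly simple one: the constant regressor, which ignores the inputs and predicts the same distribution, fitted to the training target, for every observation.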

In [4]:
creg = ConstantRegressor()

ConstantRegressor(
    distribution_type = Distributions.Normal)

Binding the model to data creates a machine, which will store the training outcomes (fit-results):

In [5]:
cmach = machine(creg, X, y)

Machine{ConstantRegressor,…} trained 0 times; caches data
  model: ConstantRegressor{Distributions.Normal}
  args: 
    1:	Source @997 ⏎ `Table{Union{AbstractVector{Continuous}, AbstractVector{Count}, AbstractVector{Multiclass{15}}, AbstractVector{Multiclass{25}}, AbstractVector{OrderedFactor{10}}}}`
    2:	Source @199 ⏎ `AbstractVector{Continuous}`


You can now train the machine, specifying the rows it should be trained on (if unspecified, all the data are used). Here the rows are first split at random into 70% for training and 30% for testing:

In [6]:
train, test = partition(collect(eachindex(y)), 0.70, shuffle = true);
fit!(cmach, rows = train)
ŷ = predict(cmach,rows = test)
ŷ[1:3] |> pprint

[Distributions.Normal{Float64}(μ=180856.8047105005, σ=76444.94463815287),
 Distributions.Normal{Float64}(μ=180856.8047105005, σ=76444.94463815287),
 Distributions.Normal{Float64}(μ=180856.8047105005, σ=76444.94463815287)]

┌ Info: Training Machine{ConstantRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


In [7]:
fitted_params(cmach)

(target_distribution = Distributions.Normal{Float64}(μ=180856.8047105005, σ=76444.94463815287),)
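To make the model/machine distinction concrete, here is a minimal plain-Julia sketch using only the standard library. The types ```ToyConstantRegressor``` and ```ToyMachine``` and the functions ```toy_fit!``` and ```toy_predict``` are hypothetical: they only mimic the idea and are not how MLJ implements machines. The (μ, σ) pair stands in for the fitted normal distribution; we use the uncorrected standard deviation, and the exact estimator used by ```ConstantRegressor``` may differ.

```julia
using Statistics

# The "model": a container for hyperparameters (a constant regressor has none).
struct ToyConstantRegressor end

# The "machine": binds a model to data and stores the fit-result once trained.
mutable struct ToyMachine{M}
    model::M
    X::Any
    y::Vector{Float64}
    fitresult::Any
end
ToyMachine(model, X, y) = ToyMachine(model, X, y, nothing)

# Training ignores X and summarises the training targets by a mean and an
# (uncorrected) standard deviation, stored in the machine.
function toy_fit!(mach::ToyMachine{ToyConstantRegressor}; rows = eachindex(mach.y))
    yt = mach.y[rows]
    mach.fitresult = (μ = mean(yt), σ = std(yt; corrected = false))
    return mach
end

# Every requested row gets the same (μ, σ) "distribution", whatever its inputs.
toy_predict(mach::ToyMachine{ToyConstantRegressor}, rows) = fill(mach.fitresult, length(rows))

toy_mach = ToyMachine(ToyConstantRegressor(), nothing, [1.0, 2.0, 3.0, 4.0])
toy_fit!(toy_mach; rows = 1:2)
toy_predict(toy_mach, 3:4)  # two identical (μ = 1.5, σ = 0.5) predictions
```

The model value itself never changes during training; only the machine gains a fit-result, which is what ```fitted_params``` reports above.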

Observe that the output is probabilistic: each element is a univariate normal distribution (all with the same mean and variance, since this is a constant model).

You can recover deterministic output either by computing the mean of the predictions or by using ```predict_mean``` directly (the ```mean``` function can be applied to any distribution from ```Distributions.jl```):

In [8]:
ŷ = predict_mean(cmach, rows = test);
ŷ[1:3]

3-element Vector{Float64}:
 180856.8047105005
 180856.8047105005
 180856.8047105005
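That ```mean``` works on any distribution is plain multiple dispatch: each distribution type provides its own method. The sketch below defines a hypothetical ```ToyNormal``` type (a stand-in for ```Distributions.Normal```, to avoid the dependency) and a hypothetical ```toy_predict_mean``` that broadcasts ```mean``` over probabilistic predictions, which is the idea behind ```predict_mean```. The numbers are rounded from the output above.

```julia
using Statistics

# Hypothetical stand-in for a univariate normal distribution.
struct ToyNormal
    μ::Float64
    σ::Float64
end

# Adding a method to Statistics.mean for our type: the mean of a normal is μ.
Statistics.mean(d::ToyNormal) = d.μ

# Collapse probabilistic predictions to point predictions by broadcasting mean.
toy_predict_mean(dists::AbstractVector{ToyNormal}) = mean.(dists)

ŷ_prob = [ToyNormal(180856.8, 76444.9), ToyNormal(180856.8, 76444.9)]
toy_predict_mean(ŷ_prob)  # [180856.8, 180856.8]
```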

You can then call one of the loss functions, here the root mean squared log error ```rmsl```, to assess the quality of the model by comparing its predictions with the true targets on the test set:

In [9]:
rmsl(ŷ, y[test])

0.4152254392992977
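For reference, here is a hand-written version of the score, assuming MLJ's ```rmsl``` is the root mean squared difference of the logs, sqrt(mean((log ŷᵢ − log yᵢ)²)), with no +1 offset (check the measures documentation for your MLJ version). ```my_rmsl``` is a hypothetical name. Two consequences are visible: errors are relative (missing by a factor e costs 1 in log space at any price level), and the score is symmetric in its two arguments, so swapping them does not change the value.

```julia
using Statistics

# Hand-written root mean squared log error (hypothetical name), assuming
# the definition sqrt(mean((log ŷᵢ - log yᵢ)²)) with no +1 offset.
my_rmsl(ŷ, y) = sqrt(mean(abs2, log.(ŷ) .- log.(y)))

# Relative errors: a factor-e miss on one of two points gives sqrt(1/2).
my_rmsl([1.0, exp(1.0)], [1.0, 1.0])                                 # ≈ 0.7071
# Squaring makes the score symmetric in its two arguments.
my_rmsl([2.0, 3.0], [4.0, 5.0]) == my_rmsl([4.0, 5.0], [2.0, 3.0])  # true
```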

### KNN-Ridge blend
---

Let's try something a bit fancier than a constant regressor.

* one-hot encode the categorical inputs
* log-transform the target
* fit both a KNN regression and a ridge regression on the data
* compute a weighted average of the individual models' predictions
* inverse-transform (exponentiate) the blended prediction
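One consequence of blending on the log scale and then exponentiating is worth spelling out: the final prediction is a weighted geometric mean of the two models' price predictions, since exp(w·log a + (1 − w)·log b) = a^w · b^(1 − w). The sketch below checks this numerically with made-up prices and a made-up weight; ```blend_log``` and ```geometric_blend``` are hypothetical helpers.

```julia
# Blend two price predictions on the log scale and map back with exp,
# as in the steps listed above.
blend_log(a, b, w) = exp(w * log(a) + (1 - w) * log(b))

# The same quantity written as a weighted geometric mean.
geometric_blend(a, b, w) = a^w * b^(1 - w)

# Made-up predictions from two models and a made-up weight.
pa, pb, wt = 150_000.0, 200_000.0, 0.3
blend_log(pa, pb, wt) ≈ geometric_blend(pa, pb, wt)   # true
blend_log(pa, pb, wt) < wt * pa + (1 - wt) * pb       # true (weighted AM-GM)
```

By the weighted AM-GM inequality, such a blend never exceeds the arithmetic average of the two price predictions with the same weights, and equals it only when they agree.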

You will first define a fixed learning network in which all hyperparameters are specified or left at their defaults. Then you will see how to wrap that network in a model whose hyperparameters can be tuned.

In [10]:
RidgeRegressor = @load RidgeRegressor pkg = "MultivariateStats"
KNNRegressor = @load KNNRegressor

┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\loading.jl:168


import MLJMultivariateStatsInterface ✔


┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\loading.jl:168


import NearestNeighborModels ✔


NearestNeighborModels.KNNRegressor

### Using the expanded syntax

Let's start by defining the source nodes:

In [11]:
Xs = source(X)
ys = source(y)

Source @618 ⏎ `AbstractVector{Continuous}`

On the "first layer" there is a one-hot encoder and a log transform, which lead respectively to the nodes ```W``` and ```z```:

In [12]:
hot = machine(OneHotEncoder(), Xs)

W = transform(hot, Xs)
z = log(ys);
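What the one-hot encoder does to each categorical column can be sketched in a few lines of plain Julia. ```one_hot``` is a hypothetical helper that handles a single column and takes its levels from the values present (the neighbourhood codes are just example values). MLJ's ```OneHotEncoder``` appears to use all levels of the categorical pool instead, which is why it reports 25 sub-features for ```:Neighborhood``` in the training log further down.

```julia
# Hypothetical one-column one-hot encoder (not MLJ's OneHotEncoder): one
# 0/1 column per level, with levels taken from the values present.
function one_hot(column::AbstractVector)
    lvls = sort(unique(column))
    # Entry (i, j) is 1.0 when row i has level j, and 0.0 otherwise.
    M = [Float64(x == l) for x in column, l in lvls]
    return lvls, M
end

oh_levels, oh_matrix = one_hot(["NAmes", "CollgCr", "NAmes"])
# oh_levels == ["CollgCr", "NAmes"]; oh_matrix == [0.0 1.0; 1.0 0.0; 0.0 1.0]
```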

On the "second layer" there is a ridge regressor and a KNN regressor, whose predictions give the nodes ```ẑ₁``` and ```ẑ₂``` respectively:

In [13]:
knn = machine(KNNRegressor(K = 5), W, z)
ridge = machine(RidgeRegressor(lambda = 2.5), W, z)

ẑ₁ = predict(ridge, W)  # ridge predictions (log scale)
ẑ₂ = predict(knn, W)    # KNN predictions (log scale)

Node{Machine{KNNRegressor,…}}
  args:
    1:	Node{Machine{OneHotEncoder,…}}
  formula:
    predict(
        Machine{KNNRegressor,…}, 
        transform(
            Machine{OneHotEncoder,…}, 
            Source @520))
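The two second-layer learners are simple enough to write down directly. The sketch below uses only the standard library; ```ridge_fit```, ```ridge_predict``` and ```knn_predict``` are hypothetical helpers. It uses the closed-form ridge solution with an unpenalised intercept obtained by centring, and brute-force KNN with uniform weights and Euclidean distance. MultivariateStats and NearestNeighborModels may differ in details, for example in how the intercept is handled, or by using a k-d tree, which speeds up the neighbour search but finds the same neighbours up to ties.

```julia
using LinearAlgebra, Statistics

# Closed-form ridge regression on centred data: β = (XcᵀXc + λI)⁻¹ Xcᵀ(y - ȳ);
# centring keeps the intercept b out of the penalty.
function ridge_fit(X::AbstractMatrix, y::AbstractVector, λ::Real)
    μX, μy = mean(X; dims = 1), mean(y)
    Xc = X .- μX
    β = (Xc' * Xc + λ * I) \ (Xc' * (y .- μy))
    b = μy - (μX * β)[1]
    return β, b
end
ridge_predict((β, b), X) = X * β .+ b

# Brute-force K-nearest-neighbours regression: average the targets of the
# K training rows closest (Euclidean distance) to each query row.
function knn_predict(Xtrain, ytrain, Xquery, K::Int)
    map(eachrow(Xquery)) do q
        d = [norm(q - x) for x in eachrow(Xtrain)]
        nearest = partialsortperm(d, 1:K)
        mean(ytrain[nearest])
    end
end

# Tiny example: one feature, target y = 2x + 1.
Xtr = reshape([0.0, 1.0, 2.0, 3.0], :, 1); ytr = 2 .* vec(Xtr) .+ 1
ridge_predict(ridge_fit(Xtr, ytr, 0.0), reshape([4.0], 1, 1))  # ≈ [9.0]
knn_predict(Xtr, ytr, reshape([1.2], 1, 1), 2)                 # [4.0]
```

Increasing λ shrinks the slope towards zero (with λ = 5 it halves from 2 to 1 here), which is what the ```lambda``` hyperparameter tuned later controls.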

On the "third layer" the two regressors' predictions are combined with fixed weights, 0.3 for ridge (```ẑ₁```) and 0.7 for KNN (```ẑ₂```):

In [14]:
ẑ = 0.3ẑ₁ + 0.7ẑ₂;

And finally we need to invert the initial transformation of the target (which was a log):

In [15]:
ŷ = exp(ẑ);

You've now defined a full learning network which you can fit and use for prediction:

In [16]:
fit!(ŷ, rows = train)
ypreds = ŷ(rows = test)

┌ Info: Training Machine{OneHotEncoder,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Spawning 10 sub-features to one-hot encode feature :OverallQual.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 25 sub-features to one-hot encode feature :Neighborhood.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 15 sub-features to one-hot encode feature :MSSubClass.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Training Machine{RidgeRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Training Machine{KNNRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


437-element Vector{Float64}:
 121862.38366142452
 186841.80458583724
 100810.70009926378
 168624.32388729427
 172550.32042394014
 146109.3311793182
 188050.05635913624
  92979.51603158483
 170746.84585578096
 110336.95731936999
 116326.9186322523
 147413.62683260464
 128182.68194001126
      ⋮
 195239.1420157948
 181510.0019841731
 282227.7389996184
 224404.60316860853
 137396.97436574553
 220227.1659955515
 277933.4351366653
 122914.13837022873
 204638.5512641041
 125487.87840655232
 179234.11773191555
 152891.56011207687

In [17]:
rmsl(ypreds, y[test])

0.1831145124401889
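Nodes such as ```W```, ```ẑ``` and ```ŷ``` are lazy: they only record operations until they are called on some rows. A toy version of that idea is sketched below. ```ToyNode``` and ```toy_source``` are hypothetical, take ```rows``` positionally instead of MLJ's keyword, and leave out the training step entirely.

```julia
# Hypothetical toy node, far simpler than MLJ's `Node`: it stores an
# operation and its parent nodes, and does nothing until it is called.
struct ToyNode
    op::Function
    parents::Vector{Any}
end

# A toy source node just returns the requested rows of the data it wraps.
toy_source(data) = ToyNode(rows -> data[rows], Any[])

# Calling a node evaluates its parents on the same rows, then applies `op`.
function (n::ToyNode)(rows)
    isempty(n.parents) && return n.op(rows)
    return n.op((p(rows) for p in n.parents)...)
end

# A miniature network: log-transform a target, blend two copies, exponentiate.
toy_ys = toy_source([100.0, 200.0, 400.0])
toy_z  = ToyNode(v -> log.(v), Any[toy_ys])
toy_ẑ  = ToyNode((a, b) -> 0.3 .* a .+ 0.7 .* b, Any[toy_z, toy_z])
toy_ŷ  = ToyNode(v -> exp.(v), Any[toy_ẑ])
toy_ŷ(2:3)  # ≈ [200.0, 400.0]; nothing is computed before this call
```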

### Tuning the model

So far the hyperparameters were given explicitly, but it makes more sense to tune them. To do so, we define a model that wraps the learning network and can then be trained and tuned like any other model:

In [18]:
mutable struct KNNRidgeBlend <: DeterministicComposite
    knn_model::KNNRegressor
    ridge_model::RidgeRegressor
    knn_weight::Float64
end

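We must specify how such a model should be fit. This is essentially the learning network defined above, except that the hyperparameters are now read from the struct. One difference: here ```knn_weight``` multiplies the KNN predictions, so ```knn_weight = 0.3``` gives KNN a weight of 0.3, whereas the network above gave the KNN predictions (```ẑ₂```) a weight of 0.7. The two blends are therefore not identical, and their test scores need not match.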

In [19]:
methods(MLJ.fit)  # 42 methods for generic function fit (before the definition below)

In [20]:
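function MLJ.fit(model::KNNRidgeBlend, verbosity::Int, X, y)
    # source nodes wrapping the training data
    Xs = source(X)
    ys = source(y)
    # first layer: one-hot encode the inputs, log-transform the target
    hot = machine(OneHotEncoder(), Xs)
    W = transform(hot, Xs)
    z = log(ys)
    # second layer: the component models now come from the struct's fields
    ridge_model = model.ridge_model
    knn_model = model.knn_model
    ridge = machine(ridge_model, W, z)
    knn = machine(knn_model, W, z)
    # third layer: knn_weight applies to KNN, the remainder to ridge;
    # then undo the log transform
    ẑ = model.knn_weight * predict(knn, W) + (1.0 - model.knn_weight) * predict(ridge, W)
    ŷ = exp(ẑ)

    # wrap the network in a machine; return! trains it and hands the results to MLJ
    mach = machine(Deterministic(), Xs, ys; predict = ŷ)
    return!(mach, model, verbosity)
end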

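Note: the ```verbosity``` argument controls how much the component machines log when the network is trained (see the output below). During tuning the model is refit many times, so you really want ```verbosity=0``` there, otherwise you will get a lot of verbose output!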

You can now instantiate and fit such a model:

In [21]:
krb = KNNRidgeBlend(KNNRegressor(K = 5), RidgeRegressor(lambda = 2.5), 0.3)
mach = machine(krb, X, y)
fit!(mach, rows = train)

preds = predict(mach, rows = test)
rmsl(preds, y[test])

┌ Info: Training Machine{KNNRidgeBlend,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Training Machine{OneHotEncoder,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Spawning 10 sub-features to one-hot encode feature :OverallQual.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 25 sub-features to one-hot encode feature :Neighborhood.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 15 sub-features to one-hot encode feature :MSSubClass.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Training Machine{KNNRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Training Machine{RidgeRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


0.14631783029520562

In [22]:
methods(MLJ.fit)  # 43 methods for generic function fit: one more, added by the definition above

But more interestingly, the hyperparameters of the model can be tuned.

Before we start, note that the hyperparameters of the model are nested: ```knn_weight``` belongs to the blend itself, while ```K``` and ```lambda``` belong to its component models. This becomes explicit when accessing them:

In [23]:
@show krb.knn_weight
@show krb.knn_model.K
@show krb.ridge_model.lambda

krb.knn_weight = 0.3
krb.knn_model.K = 5
krb.ridge_model.lambda = 2.5


2.5
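The nested names used for tuning, such as ```:(knn_model.K)```, are ordinary Julia expressions: ```:(a.b)``` parses as ```Expr(:., :a, QuoteNode(:b))```. The hypothetical helpers below show how such an expression can be resolved against nested mutable structs. They illustrate the idea only and are not MLJ's API; ```ToyKNN``` and ```ToyBlend``` are toy stand-ins for the real models.

```julia
# Hypothetical helpers (not MLJ's API) resolving a dotted name such as
# :(knn_model.K) against nested structs, one field at a time.
get_nested(obj, name::Symbol) = getproperty(obj, name)
function get_nested(obj, ex::Expr)
    # :(a.b) parses as Expr(:., :a, QuoteNode(:b))
    ex.head === :. || throw(ArgumentError("expected a dotted name, got $ex"))
    return getproperty(get_nested(obj, ex.args[1]), ex.args[2].value)
end

set_nested!(obj, name::Symbol, v) = setproperty!(obj, name, v)
function set_nested!(obj, ex::Expr, v)
    ex.head === :. || throw(ArgumentError("expected a dotted name, got $ex"))
    # walk to the parent struct, then set the last field on it
    return setproperty!(get_nested(obj, ex.args[1]), ex.args[2].value, v)
end

# Toy stand-ins for the nested models.
mutable struct ToyKNN
    K::Int
end
mutable struct ToyBlend
    knn_model::ToyKNN
    knn_weight::Float64
end

toy_blend = ToyBlend(ToyKNN(5), 0.3)
get_nested(toy_blend, :(knn_model.K))      # 5
set_nested!(toy_blend, :(knn_model.K), 2)
toy_blend.knn_model.K                      # 2
```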

You can also list all the hyperparameters with the ```params``` function:

In [25]:
params(krb) |> pprint

(knn_model = (K = 5,
              algorithm = :kdtree,
              metric = Distances.Euclidean(0.0),
              leafsize = 10,
              reorder = true,
              weights = NearestNeighborModels.Uniform()),
 ridge_model = (lambda = 2.5, bias = true),
 knn_weight = 0.3)

The ranges of values to tune over must follow the nesting structure reflected by ```params```, with nested hyperparameters referred to by quoted expressions such as ```:(knn_model.K)```:

In [26]:
k_range = range(krb, :(knn_model.K), lower = 2, upper = 100, scale=:log10)
l_range = range(krb, :(ridge_model.lambda), lower = 1e-4, upper = 10, scale=:log10)
w_range = range(krb, :(knn_weight), lower = 0.1, upper=0.9)

ranges = [k_range, l_range, w_range]

3-element Vector{MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T}:
 NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0) on log10 scale
 NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.00005, unit=4.99995) on log10 scale
 NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)

> https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Base.range
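With ```scale = :log10``` the grid points are, as we understand it, equally spaced in log10 space and then mapped back (and rounded for the integer-valued ```K```). The sketch below assumes exactly that, with a hypothetical ```log_grid``` helper. With 3 points per range this gives λ ∈ {1e-4, 10^(-1.5) ≈ 0.0316, 10}, K ∈ {2, 14, 100} and the linear grid {0.1, 0.5, 0.9} for the weight, which is consistent with the best model found below (K = 2, λ ≈ 0.0316, knn_weight = 0.1).

```julia
# Hypothetical helper: n points equally spaced in log10 space between
# `lower` and `upper`, mapped back to the original scale (our assumption
# about what `scale = :log10` means for a grid).
log_grid(lower, upper, n) = 10 .^ range(log10(lower), log10(upper); length = n)

log_grid(1e-4, 10, 3)                 # ≈ [0.0001, 0.0316, 10.0] for lambda
round.(Int, log_grid(2, 100, 3))      # [2, 14, 100] for the integer K
collect(range(0.1, 0.9; length = 3))  # ≈ [0.1, 0.5, 0.9], linear scale for knn_weight
```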

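It now remains to define how the tuning should be done. Let's specify a very coarse grid search (```resolution = 3```, i.e. three values per hyperparameter, hence 3 × 3 × 3 = 27 models) evaluated with 6-fold cross-validation, and instantiate a tuned model: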

In [27]:
tuning = Grid(resolution = 3)
resampling = CV(nfolds = 6)

tm = TunedModel(model = krb, tuning = tuning, resampling = resampling, ranges = ranges, measure = rmsl)

DeterministicTunedModel(
    model = KNNRidgeBlend(
            knn_model = KNNRegressor,
            ridge_model = RidgeRegressor,
            knn_weight = 0.3),
    tuning = Grid(
            goal = nothing,
            resolution = 3,
            shuffle = true,
            rng = Random._GLOBAL_RNG()),
    resampling = CV(
            nfolds = 6,
            shuffle = false,
            rng = Random._GLOBAL_RNG()),
    measure = RootMeanSquaredLogError(),
    weights = nothing,
    operation = nothing,
    range = MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T[NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0) on log10 scale, NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.00005, unit=4.99995) on log10 scale, NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)],
    selection_heuristic = MLJTuning.NaiveSelection(nothing),
    train_best = true,
    repeats = 1,
    n = nothing,
    acceleration = CPU1{Nothing}(nothing),
    acceleration_resampling =

which we can now finally fit...

In [28]:
mtm = machine(tm, X, y)
fit!(mtm, rows = train);

┌ Info: Training Machine{DeterministicTunedModel{Grid,…},…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Attempting to evaluate 27 models.
└ @ MLJTuning C:\Users\jeffr\.julia\packages\MLJTuning\Al9yX\src\tuned_models.jl:680
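The grid search itself boils down to a loop over the Cartesian product of the three grids: 3 × 3 × 3 = 27 candidates, matching the log message above, each scored by 6-fold cross-validation (162 fits), after which, since ```train_best = true```, the best candidate is retrained on all the training rows. The plain-Julia sketch below uses hypothetical names (```kfold```, ```grid_search```, and a made-up ```toy_score``` standing in for "fit the blend on the fold's training rows and return its rmsl on the held-out rows"). It uses contiguous folds, as with ```shuffle = false```; MLJ's exact fold boundaries may differ.

```julia
using Statistics

# Cartesian product of three 3-point grids: 27 candidate settings.
Ks      = [2, 14, 100]
lambdas = [1e-4, 10^-1.5, 10.0]
ws      = [0.1, 0.5, 0.9]
toy_grid = vec(collect(Iterators.product(Ks, lambdas, ws)))

# Contiguous (unshuffled) k-fold split of row indices into (train, test) pairs.
function kfold(rows::Vector{Int}, nfolds::Int)
    bounds = round.(Int, range(0, length(rows); length = nfolds + 1))
    folds = Tuple{Vector{Int},Vector{Int}}[]
    for i in 1:nfolds
        test = rows[bounds[i]+1:bounds[i+1]]
        push!(folds, (setdiff(rows, test), test))
    end
    return folds
end

# Score every candidate by its mean score over the folds; lower is better.
function grid_search(score, grid, rows; nfolds = 6)
    folds = kfold(rows, nfolds)
    cv_scores = [mean(score(p, tr, te) for (tr, te) in folds) for p in grid]
    return grid[argmin(cv_scores)], cv_scores
end

# Made-up score that ignores the data, just to exercise the loop.
toy_score((K, λ, w), train, test) = abs(w - 0.1) + abs(log10(λ) + 1.5) + K / 100
best, scores = grid_search(toy_score, toy_grid, collect(1:12))
best  # (2, 10^-1.5, 0.1)
```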


To retrieve the best model, you can use:

In [29]:
krb_best = fitted_params(mtm).best_model

KNNRidgeBlend(
    knn_model = KNNRegressor(
            K = 2,
            algorithm = :kdtree,
            metric = Distances.Euclidean(0.0),
            leafsize = 10,
            reorder = true,
            weights = NearestNeighborModels.Uniform()),
    ridge_model = RidgeRegressor(
            lambda = 0.03162277660168379,
            bias = true),
    knn_weight = 0.1)

In [30]:
@show krb_best.knn_model.K
@show krb_best.ridge_model.lambda
@show krb_best.knn_weight

krb_best.knn_model.K = 2
krb_best.ridge_model.lambda = 0.03162277660168379
krb_best.knn_weight = 0.1


0.1

You can also use ```mtm``` to make predictions; since ```train_best = true```, they are made with the best model, retrained on all the training rows:

In [31]:
preds = predict(mtm, rows = test)
rmsl(preds, y[test])

0.14066627522142472