### Basic training and testing

In [1]:
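using Pkg
Pkg.activate(@__DIR__)  # use the project environment in this notebook's folder
Pkg.instantiate()       # download and install the dependencies that environment records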

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l[2K[?25h

In [None]:
using MLJ
using DataFrames

task = load_boston()  # Boston housing data as a supervised learning task
X, y = task()         # input features and target

train, test = partition(eachindex(y), 0.7); # 70:30 split
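
Assuming no shuffling, a 70:30 split of row indices keeps the first 70% for training and the rest for testing. The plain-Julia sketch below shows that idea; `partition_indices` is a hypothetical helper (not MLJ's `partition`), and rounding the training size to the nearest integer is an assumption of the sketch.

```julia
# Hypothetical helper, not MLJ's `partition`: split the indices 1:n
# sequentially (no shuffling) into a training part and a test part.
function partition_indices(n::Integer, fraction_train::Real)
    ntrain = round(Int, fraction_train * n)  # number of training rows
    return 1:ntrain, (ntrain + 1):n
end

train_idx, test_idx = partition_indices(10, 0.7)
println(train_idx, " ", test_idx)  # prints 1:7 8:10
```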

A *model* is a container for hyperparameters:

In [None]:
knn_model = KNNRegressor(K=10)
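
Since a model only holds hyperparameters, it can be mimicked by a small mutable struct with no data or learned state. The sketch uses a hypothetical `KNNRegressorSketch` type, not MLJ's `KNNRegressor`; making it mutable is what permits an in-place change such as `knn_model.K = 20` further down.

```julia
# Hypothetical stand-in for a model type: it only stores hyperparameters.
Base.@kwdef mutable struct KNNRegressorSketch
    K::Int = 5   # number of neighbours (default chosen arbitrarily here)
end

model = KNNRegressorSketch(K=10)
model.K = 20     # mutable, so a hyperparameter can be changed in place
```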

Binding the model to data creates a *machine*, which will store the outcomes of training (called *fit-results*):

In [None]:
knn = machine(knn_model, X, y)
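
The split between model and machine can be illustrated with a hypothetical `MachineSketch` type holding the model, the data and an initially empty `fitresult` slot. For K-nearest neighbours the fit-result is simply the stored training rows, and a prediction averages the targets of the K closest rows (Euclidean distance is assumed). This is a sketch of the idea, not MLJ's implementation.

```julia
using LinearAlgebra: norm
using Statistics: mean

Base.@kwdef mutable struct KNNSketch
    K::Int = 5
end

# A machine binds a model to data and, once fitted, holds a fit-result.
mutable struct MachineSketch
    model::KNNSketch
    X::Matrix{Float64}   # rows are observations
    y::Vector{Float64}
    fitresult::Any       # `nothing` until fitted
end
MachineSketch(model, X, y) = MachineSketch(model, X, y, nothing)

# For KNN, "training" just stores the selected training rows.
function fit_sketch!(mach::MachineSketch; rows=axes(mach.X, 1))
    mach.fitresult = (mach.X[rows, :], mach.y[rows])
    return mach
end

# Predict each new row as the mean target of its K nearest training rows.
function predict_sketch(mach::MachineSketch, Xnew::AbstractMatrix)
    Xtrain, ytrain = mach.fitresult
    K = mach.model.K
    map(eachrow(Xnew)) do x
        dists = [norm(x - xt) for xt in eachrow(Xtrain)]
        nearest = partialsortperm(dists, 1:K)  # indices of the K closest rows
        mean(ytrain[nearest])
    end
end

X = reshape(collect(1.0:10.0), 10, 1)
y = 2 .* X[:, 1]
mach = MachineSketch(KNNSketch(K=2), X, y)
fit_sketch!(mach; rows=1:7)
yhat = predict_sketch(mach, X[8:10, :])
```

Each test point here is nearest to training rows 7 and 6, so every prediction is the mean of 14 and 12.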

Training on the training rows and evaluating on the test rows:

In [None]:
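fit!(knn, rows=train)            # train on the training rows only
yhat = predict(knn, X[test,:])   # predictions for the test rows
rms(y[test], yhat)               # root-mean-square error on the test rows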
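
The score `rms` is the root-mean-square error: the square root of the mean squared difference between predictions and targets. Because it is symmetric in its two arguments, their order does not change the value. A one-line Julia version, under the hypothetical name `rms_sketch`:

```julia
using Statistics: mean

# Root-mean-square error between predictions ŷ and targets y.
rms_sketch(ŷ, y) = sqrt(mean((ŷ .- y) .^ 2))

rms_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3) ≈ 1.1547
```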

The same kind of holdout estimate, in one line:

In [None]:
evaluate!(knn, resampling=Holdout(fraction_train=0.7))
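
`evaluate!` with `Holdout` bundles the split, fit, predict and score steps shown earlier into a single call. The hypothetical helper `holdout_rms` sketches that composition, assuming an unshuffled split; `fit_predict` stands for any function that trains on its first two arguments and predicts for the third.

```julia
using Statistics: mean

# Hypothetical helper: train on the first `fraction_train` of the rows
# (no shuffling), then score the remaining rows with RMS error.
function holdout_rms(fit_predict, X::AbstractMatrix, y::AbstractVector;
                     fraction_train=0.7)
    n = length(y)
    ntrain = round(Int, fraction_train * n)
    train, test = 1:ntrain, (ntrain + 1):n
    ŷ = fit_predict(X[train, :], y[train], X[test, :])
    return sqrt(mean((ŷ .- y[test]) .^ 2))
end

# Toy "model" for illustration: always predict the mean training target.
mean_model(Xtr, ytr, Xte) = fill(mean(ytr), size(Xte, 1))

X = reshape(collect(1.0:10.0), 10, 1)
y = collect(1.0:10.0)
score = holdout_rms(mean_model, X, y)
```

Here the training mean is 4, so the test errors are 4, 5 and 6 and the score is sqrt(77/3).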

Changing a hyperparameter and re-evaluating:

In [None]:
knn_model.K = 20
evaluate!(knn, resampling=Holdout(fraction_train=0.7))

### Systematic tuning as a model wrapper

A simple example of a composite model is a homogeneous ensemble. Here is a bagged ensemble of 20 K-nearest neighbour regressors:

In [None]:
ensemble_model = EnsembleModel(atom=knn_model, n=20) 
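
Bagging fits each atom on a random subset of the rows and averages the atoms' predictions. The sketch below uses hypothetical names; drawing the rows without replacement and fixing the random seed are assumptions of the sketch, not documented MLJ behaviour.

```julia
using Random
using Statistics: mean

# Hypothetical helper: bagged ensemble of `n` atoms. Each atom is fitted on
# round(bagging_fraction * nrows) rows drawn at random without replacement,
# and the ensemble prediction is the elementwise mean of the atoms' predictions.
function bagged_predict(fit_predict, X::AbstractMatrix, y::AbstractVector,
                        Xnew::AbstractMatrix; n=20, bagging_fraction=0.8,
                        rng=MersenneTwister(0))
    nrows = length(y)
    m = round(Int, bagging_fraction * nrows)
    preds = map(1:n) do i
        rows = randperm(rng, nrows)[1:m]        # this atom's random subsample
        fit_predict(X[rows, :], y[rows], Xnew)  # fit the atom, predict on Xnew
    end
    return mean(preds)  # average the n prediction vectors
end

# Toy atom for illustration: predicts the mean training target everywhere.
mean_model(Xtr, ytr, Xte) = fill(mean(ytr), size(Xte, 1))

X = reshape(collect(1.0:10.0), 10, 1)
y = collect(1.0:10.0)
ŷ = bagged_predict(mean_model, X, y, X[1:2, :]; n=20, bagging_fraction=0.7)
```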

Let's tune the ensemble's `bagging_fraction` and the K-nearest neighbour hyperparameter `K` simultaneously. Since the KNN model is itself a field (`atom`) of the ensemble model, the hyperparameters are nested:

In [None]:
params(ensemble_model) # a named tuple (nested)
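
Nested hyperparameters are just named tuples inside named tuples, so the inner model's `K` is reached with two field accesses. The values below are illustrative only and are not the actual output of `params`, which has more fields.

```julia
# Illustrative nested named tuple with the same shape as the ensemble's
# hyperparameters (values and field selection chosen for this sketch).
p = (atom = (K = 20,), bagging_fraction = 0.8, n = 20)

p.atom.K            # 20: the inner model's hyperparameter
p.bagging_fraction  # 0.8: an ensemble-level hyperparameter
```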

To define a tuning grid, we construct a range for each of the two parameters and collate these ranges into a named tuple with the same nested structure as above (omitting parameters that are not tuned):

In [None]:
B_range = range(ensemble_model, :bagging_fraction, lower=0.5, upper=1.0, scale=:linear)
K_range = range(knn_model, :K, lower=1, upper=100, scale=:log10)
nested_ranges = (atom = (K = K_range,), bagging_fraction = B_range)
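
With the grid resolution of 8 used below, each range contributes a set of candidate values. Assuming 8 evenly spaced values on each range's scale, with the log10-spaced `K` values rounded to integers, the candidates can be computed directly. Both best values reported at the end of this section (0.7142857142857143 and 52) appear in these lists.

```julia
resolution = 8

# Linear scale: 8 evenly spaced values from 0.5 to 1.0.
B_values = collect(range(0.5, 1.0, length=resolution))

# log10 scale: exponents evenly spaced from log10(1) = 0 to log10(100) = 2,
# then rounded to integers (the rounding rule is an assumption here).
K_values = round.(Int, 10 .^ range(0.0, 2.0, length=resolution))
```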

Now we choose a tuning strategy and a resampling strategy (for estimating performance), and wrap both around our ensemble model to obtain a new model:

In [None]:
tuning = Grid(resolution=8)
resampling = CV(nfolds=6)

tuned_ensemble_model = TunedModel(model=ensemble_model, 
    tuning=tuning, resampling=resampling, nested_ranges=nested_ranges)
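
`CV(nfolds=6)` estimates performance by splitting the rows into 6 folds and validating on each fold in turn. The sketch below builds contiguous, unshuffled folds; the helper name and the rounding of fold boundaries are assumptions. If each range yields 8 values, tuning evaluates 8 * 8 = 64 ensembles, each fitted 6 times, for 384 ensemble fits before the final retraining.

```julia
# Hypothetical helper: split row indices 1:n into `nfolds` contiguous folds
# (no shuffling); each fold serves once as the validation set.
function cv_folds(n::Integer, nfolds::Integer)
    bounds = round.(Int, range(0, n, length=nfolds + 1))  # fold boundaries
    return [(bounds[i] + 1):bounds[i + 1] for i in 1:nfolds]
end

folds = cv_folds(12, 6)      # six folds of two rows each
n_ensemble_fits = 8 * 8 * 6  # grid points times folds
```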

Fitting the corresponding machine tunes the underlying model (in this case an ensemble) and retrains on all supplied data:

In [None]:
tuned_ensemble = machine(tuned_ensemble_model, X[train,:], y[train])
fit!(tuned_ensemble);
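
Conceptually, fitting the tuned machine runs an exhaustive search over the grid and keeps the pair with the lowest estimated loss. The hypothetical `grid_search` below shows only that search loop; `toy_loss` is a stand-in for the cross-validated error, and the final retraining on all supplied data is not shown.

```julia
# Hypothetical helper: exhaustive search over all (bagging_fraction, K)
# pairs, returning the pair with the smallest loss.
function grid_search(loss, B_values, K_values)
    best = (loss = Inf, bagging_fraction = NaN, K = 0)
    for b in B_values, k in K_values
        l = loss(b, k)  # in practice: the cross-validated error for (b, k)
        if l < best.loss
            best = (loss = l, bagging_fraction = b, K = k)
        end
    end
    return best
end

# Hypothetical stand-in for the cross-validated error.
toy_loss(b, k) = (b - 0.7)^2 + (k - 50)^2

B_values = collect(range(0.5, 1.0, length=8))
K_values = [1, 2, 4, 7, 14, 27, 52, 100]
best = grid_search(toy_loss, B_values, K_values)
```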

In [25]:
fp = fitted_params(tuned_ensemble)

(best_model = [34mDeterministicEnsembleModel @ 7…60[39m,)

In [26]:
@show fp.best_model.bagging_fraction
@show fp.best_model.atom.K;

(fp.best_model).bagging_fraction = 0.7142857142857143
((fp.best_model).atom).K = 52
