**setup demo:**

In [1]:
using Pkg # Pkg must be loaded before Pkg.activate can be called
Pkg.activate("working", shared=true) # or whatever you call the environment that has MLJ installed
using MLJ



### Basic train and test

Load data and define train and test rows:

In [2]:
using MLJ

X, y = datanow(); # load the Boston dataset
train, test = partition(eachindex(y), 0.7); # 70:30 split
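
The 70:30 split above can be illustrated in plain Julia. This is only a sketch: `split_rows` is a hypothetical helper, not MLJ's `partition`, and it assumes the rows are cut into two consecutive blocks without shuffling.

```julia
# Hypothetical helper, not MLJ's `partition`: cut the indices into two
# consecutive blocks, the first holding roughly `fraction` of them.
function split_rows(rows, fraction)
    0 < fraction < 1 || throw(ArgumentError("fraction must lie strictly between 0 and 1"))
    idx = collect(rows)
    n_first = round(Int, fraction * length(idx))
    return idx[1:n_first], idx[n_first+1:end]
end

train_rows, test_rows = split_rows(1:10, 0.7)
# train_rows holds indices 1 to 7, test_rows holds indices 8 to 10
```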

A *model* is a container for hyperparameters:

In [3]:
knn_model=KNNRegressor(K=10)

# KNNRegressor @ 1…57:
K                       =>   10
metric                  =>   euclidean (generic function with 1 method)
kernel                  =>   reciprocal (generic function with 1 method)



Wrapping the model in data creates a *machine* which will store training outcomes ("fit-results"):

In [4]:
knn = machine(knn_model, X, y)

# Machine{KNNRegressor} @ 1…17:
model                   =>   KNNRegressor @ 1…57
fitresult               =>   (undefined)
cache                   =>   (undefined)
args                    =>   (omitted Tuple{DataFrames.DataFrame,Array{Float64,1}} of length 2)
report                  =>   empty Dict{Symbol,Any}
rows                    =>   (undefined)



Training on the training rows, predicting on test rows, and evaluating:

In [5]:
fit!(knn, rows=train);
yhat = predict(knn, X[test,:])
rms(y[test], yhat)

┌ Info: Training Machine{KNNRegressor} @ 1…17.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/machines.jl:98


5.114498666132261
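
For reference, the root-mean-square error used for evaluation can be written out by hand. The function below is a sketch with the hypothetical name `rms_error`; it assumes the standard definition (square root of the mean squared difference) and is not MLJ's `rms` implementation.

```julia
# Root-mean-square error: square root of the mean squared difference.
function rms_error(y, yhat)
    length(y) == length(yhat) || throw(DimensionMismatch("y and yhat must have equal length"))
    return sqrt(sum(abs2, y .- yhat) / length(y))
end

err = rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3)
```

Under this definition the measure is symmetric in its two arguments, so swapping them does not change the result.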

Changing a hyper-parameter and re-evaluating:

In [6]:
knn_model.K = 20
fit!(knn)
yhat = predict(knn, X[test,:])
rms(y[test], yhat)

┌ Info: Training Machine{KNNRegressor} @ 1…17.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/machines.jl:98


4.884523266056419
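
Mutating the model and then re-fitting the machine works because the machine refers to the model rather than copying it. The sketch below mimics this with hypothetical types `ToyModel` and `ToyMachine`; they are illustrative stand-ins, not MLJ's definitions.

```julia
# Hypothetical types for illustration only; these are not MLJ's definitions.
mutable struct ToyModel
    K::Int                 # the only hyperparameter
end

mutable struct ToyMachine
    model::ToyModel        # a reference to the model, not a copy
    X::Vector{Float64}
    y::Vector{Float64}
    fitresult::Any         # filled in by training; `nothing` until then
end

# Convenience constructor: a freshly built machine is untrained.
ToyMachine(model, X, y) = ToyMachine(model, X, y, nothing)

toy_model = ToyModel(10)
toy_machine = ToyMachine(toy_model, [1.0, 2.0], [3.0, 4.0])

toy_model.K = 20           # mutate the hyperparameter in place
toy_machine.model.K        # the machine sees the new value, 20
```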

### Systematic tuning

Define a numeric range for the hyperparameter K, on a log scale:

In [7]:
K_range = range(knn_model, :K, lower=1, upper=100, scale=:log10)

# NumericRange @ 1…87:
lower                   =>   1
upper                   =>   100
scale                   =>   :log10



Specify a resolution and a numeric range becomes an iterator:

In [8]:
iterator(K_range, 8)

8-element Array{Int64,1}:
   1
   2
   4
   7
  14
  27
  52
 100
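
The values above are consistent with spacing the points evenly in log10 and rounding to the nearest integer. The sketch below reproduces them in plain Julia; `log10_grid` is a hypothetical name, and this is a guess at the idea rather than MLJ's `iterator` code.

```julia
# Hypothetical helper: n integers spaced evenly on a log10 scale
# between `lower` and `upper`, with any duplicates removed.
function log10_grid(lower, upper, n)
    exponents = range(log10(lower), stop=log10(upper), length=n)
    return unique(round.(Int, 10 .^ exponents))
end

grid = log10_grid(1, 100, 8)
# grid == [1, 2, 4, 7, 14, 27, 52, 100]
```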

Choosing a tuning strategy:

In [9]:
tuning = Grid(resolution=8)

# Grid @ 8…32:
resolution              =>   8



Choose a resampling strategy:

In [10]:
resampling = Holdout(fraction_train=0.8)

# Holdout @ 5…09:
fraction_train          =>   0.8



Define a new model which wraps these strategies around an existing model:

In [11]:
tuned_knn_model = TunedModel(model=knn_model, 
    tuning=tuning, resampling=resampling, param_ranges=Params(:K => K_range))

# TunedModel @ 1…84:
model                   =>   KNNRegressor @ 1…57
tuning                  =>   Grid @ 8…32
resampling              =>   Holdout @ 5…09
measure                 =>   rms (generic function with 5 methods)
param_ranges            =>   (omitted Params)
report_measurements     =>   true



Fitting the corresponding machine tunes the underlying model and retrains on all supplied data:

In [12]:
tuned_knn = machine(tuned_knn_model, X[train,:], y[train])
fit!(tuned_knn);

┌ Info: Training Machine{TunedModel{Grid,KNNRegre…} @ 1…21.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/machines.jl:98


Searching for best model...
model number: 1	 measurement: 2.0303940504246962    
model number: 2	 measurement: 1.9828439251201737    
model number: 3	 measurement: 2.6425280736693972    
model number: 4	 measurement: 2.973368220376769    
model number: 5	 measurement: 3.1908319369192526    
model number: 6	 measurement: 4.175863415495205    
model number: 7	 measurement: 4.731343943808259    
model number: 8	 measurement: 4.731343943808259    

Training best model on all supplied data...


┌ Info: Training Machine{KNNRegressor} @ 1…87.
└ @ MLJ /Users/anthony/Dropbox/Julia7/MLJ/src/machines.jl:98
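
The search logged above (score each candidate on a holdout split, keep the one with the lowest measurement) can be sketched in plain Julia. Everything here is an assumption made for illustration: `knn_predict`, `rms_error` and `holdout_grid_search` are hypothetical names, the regressor is an unweighted one-dimensional K-nearest-neighbour average, and the holdout split is taken without shuffling. It is not MLJ's `TunedModel` implementation, and the final retraining on all supplied data is left out.

```julia
# Toy 1-D K-nearest-neighbour regression: unweighted mean of the K closest targets.
function knn_predict(Xtrain, ytrain, x, K)
    nearest = sortperm(abs.(Xtrain .- x))[1:min(K, length(Xtrain))]
    return sum(ytrain[nearest]) / length(nearest)
end

rms_error(y, yhat) = sqrt(sum(abs2, y .- yhat) / length(y))

# Score every candidate K on a holdout split; return the best K and all scores.
function holdout_grid_search(X, y, candidates; fraction_train=0.8)
    n_train = round(Int, fraction_train * length(y))
    fit_rows, val_rows = 1:n_train, n_train+1:length(y)
    scores = map(candidates) do K
        yhat = [knn_predict(X[fit_rows], y[fit_rows], x, K) for x in X[val_rows]]
        rms_error(y[val_rows], yhat)
    end
    return candidates[argmin(scores)], scores
end

X_toy = collect(1.0:20.0)   # toy inputs 1, 2, ..., 20
y_toy = 2 .* X_toy          # toy targets on a straight line
best_K, scores = holdout_grid_search(X_toy, y_toy, [1, 2, 4, 8])
# best_K == 1: on this toy data, averaging more neighbours only pulls
# the prediction further from the held-out targets.
```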


We can inspect the best model:

In [13]:
best(tuned_knn)

# KNNRegressor @ 1…35:
K                       =>   2
metric                  =>   euclidean (generic function with 1 method)
kernel                  =>   reciprocal (generic function with 1 method)



And evaluate the tuned model:

In [14]:
yhat = predict(tuned_knn, X[test,:])
rms(yhat, y[test])

7.506195536032624