# Tuning

Models are wrapped into tuning strategies, and the wrapped model is then trained as a machine with data.
Training initiates a search for optimal model hyperparameters.


In [1]:
using MLJ
import RDatasets: dataset
import DataFrames: DataFrame, select

In [2]:
X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mFor silent loading, specify `verbosity=0`. 


import MLJDecisionTreeInterface ✔


MLJDecisionTreeInterface.DecisionTreeClassifier

The key is to use the `range` function to specify which values of a hyperparameter the search may try.

In [3]:
dtc = DecisionTreeClassifier()
r   = range(dtc, :max_depth, lower=1, upper=5) # max_depth is the hyperparameter of interest

NumericRange(1 ≤ max_depth ≤ 5; origin=3.0, unit=2.0)
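Ranges are not limited to integer bounds. As a minimal sketch, assuming the `scale` and `values` keywords of MLJ's `range` (the names `r_log` and `r_vals` are illustrative and not part of the original notebook), a numeric range can be searched on a log scale, and a non-numeric hyperparameter can be given an explicit list of values:

```julia
using MLJ

DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
dtc = DecisionTreeClassifier()

# Numeric range on a log scale: small values are explored more densely.
r_log = range(dtc, :min_samples_split, lower=2, upper=32, scale=:log)

# Explicit list of candidate values for a Boolean hyperparameter.
r_vals = range(dtc, :post_prune, values=[true, false])

println(r_log)
println(r_vals)
```

Several ranges can then be passed together as a vector, for example `range=[r_log, r_vals]`, when the tuned model is constructed.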

Then the model is wrapped in a tuning strategy with `TunedModel`:

In [4]:
tm = TunedModel(model=dtc,ranges=[r,],measure=cross_entropy)

ProbabilisticTunedModel(
  model = DecisionTreeClassifier(
        max_depth = -1, 
        min_samples_leaf = 1, 
        min_samples_split = 2, 
        min_purity_increase = 0.0, 
        n_subfeatures = 0, 
        post_prune = false, 
        merge_purity_threshold = 1.0, 
        display_depth = 5, 
        feature_importance = :impurity, 
        rng = Random._GLOBAL_RNG()), 
  tuning = RandomSearch(
        bounded = Distributions.Uniform, 
        positive_unbounded = Distributions.Gamma, 
        other = Distributions.Normal, 
        rng = Random._GLOBAL_RNG()), 
  resampling = Holdout(
        fraction_train = 0.7, 
        shuffle = false, 
        rng = Random._GLOBAL_RNG()), 
  measure = LogLoss(
        tol = 2.220446049250313e-16), 
  weights = nothing, 
  class_weights = nothing, 
  operation = nothing, 
  range = MLJBase.NumericRange{Int64, MLJBase.Bounded, Symbol}[NumericRange(1 ≤ max_depth ≤ 5; origin=3.0, unit=2.0)], 
  selection_heuristic = MLJTuning.NaiveSelecti

The tuning strategy is controlled by:
1. the model
2. the ranges of the parameters
3. the measure of error
4. the tuning algorithm (e.g. Grid, RandomSearch)
5. the resampling strategy (e.g. Holdout, CV)
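The printed model above shows that `RandomSearch` and `Holdout` are used when nothing else is specified. The following is a minimal sketch of setting all five components explicitly. The grid resolution, the fold count, the seed, and the name `tm_grid` are illustrative choices, not taken from the original notebook:

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)

tm_grid = TunedModel(
    model=dtc,                                       # 1. the model
    range=r,                                         # 2. the range(s) to search
    measure=log_loss,                                # 3. the measure of error
    tuning=Grid(resolution=5),                       # 4. the tuning algorithm
    resampling=CV(nfolds=3, shuffle=true, rng=123),  # 5. the resampling strategy
)

m_grid = machine(tm_grid, X, y)
fit!(m_grid, verbosity=0)

best = fitted_params(m_grid).best_model
println("best max_depth: ", best.max_depth)
```

A grid search evaluates a fixed set of candidate values, whereas the default random search samples them, so a grid is easier to reproduce when the range is small.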

In [5]:
m = machine(tm, X, y)
fit!(m)

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mTraining machine(ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mAttempting to evaluate 10 models.


trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args: 
    1:	Source @448 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @874 ⏎ AbstractVector{Multiclass{3}}


To find the best model, call fitted_params on the machine and read the
best_model field.

In [6]:
fitted_params(m).best_model

DecisionTreeClassifier(
  max_depth = 2, 
  min_samples_leaf = 1, 
  min_samples_split = 2, 
  min_purity_increase = 0.0, 
  n_subfeatures = 0, 
  post_prune = false, 
  merge_purity_threshold = 1.0, 
  display_depth = 5, 
  feature_importance = :impurity, 
  rng = Random._GLOBAL_RNG())
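The trained machine can also be used for prediction directly. The sketch below assumes the default behaviour of `TunedModel`, in which the best model is retrained on all the supplied data once the search has finished. The names `yhat` and `labels` are illustrative:

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)
tm = TunedModel(model=dtc, range=r, measure=cross_entropy)

m = machine(tm, X, y)
fit!(m, verbosity=0)

# Probabilistic predictions: one distribution over the classes per row.
yhat = predict(m, X)

# Point predictions: the most probable class for each row.
labels = predict_mode(m, X)

println(labels[1:3])
```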

To optimise the error rate on the predicted class instead, pass operation=predict_mode and measure=misclassification_rate:

In [7]:
tm = TunedModel(model=dtc, ranges=r, operation=predict_mode,
                measure=misclassification_rate)
m = machine(tm, X, y)
fit!(m)
fitted_params(m).best_model

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mTraining machine(ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mAttempting to evaluate 10 models.


DecisionTreeClassifier(
  max_depth = 2, 
  min_samples_leaf = 1, 
  min_samples_split = 2, 
  min_purity_increase = 0.0, 
  n_subfeatures = 0, 
  post_prune = false, 
  merge_purity_threshold = 1.0, 
  display_depth = 5, 
  feature_importance = :impurity, 
  rng = Random._GLOBAL_RNG())

In [8]:
report(m).best_history_entry

(model = DecisionTreeClassifier(max_depth = 2, …),
 measure = [MisclassificationRate()],
 measurement = [0.1111111111111111],
 per_fold = [[0.1111111111111111]],)
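Besides the best entry, the report keeps one entry per evaluated model. As a rough sketch, assuming the `history` field of the report holds entries with the same `model` and `measurement` fields shown above, the whole search can be listed and sorted. The seed passed to `RandomSearch` is an illustrative addition for reproducibility:

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)
tm = TunedModel(model=dtc, range=r, tuning=RandomSearch(rng=123),
                operation=predict_mode, measure=misclassification_rate)

m = machine(tm, X, y)
fit!(m, verbosity=0)

# One entry per evaluated model, sorted from lowest to highest error.
history = sort(report(m).history, by = entry -> entry.measurement[1])

for entry in history
    println("max_depth = ", entry.model.max_depth,
            "  misclassification rate = ", entry.measurement[1])
end
```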

For nested hyperparameters, refer to them in `range` with a quoted expression of the form `:(model.nested_parameter)`.
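As a minimal sketch of the nested case, a decision tree can be wrapped in MLJ's `EnsembleModel` and the depth of the inner tree tuned. This assumes a recent MLJ release in which the wrapped model is stored in the field `model` (older releases are believed to have called this field `atom`). The ensemble size, the seed, and the name `forest` are illustrative:

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

# An ensemble whose `model` field holds the tree to be tuned.
forest = EnsembleModel(model=DecisionTreeClassifier(), n=20)

# The nested hyperparameter is addressed with a quoted expression.
r_nested = range(forest, :(model.max_depth), lower=1, upper=5)

tm_forest = TunedModel(model=forest, range=r_nested,
                       tuning=Grid(resolution=5),
                       resampling=CV(nfolds=3, shuffle=true, rng=123),
                       measure=log_loss)

m_forest = machine(tm_forest, X, y)
fit!(m_forest, verbosity=0)

best_depth = fitted_params(m_forest).best_model.model.max_depth
println("best nested max_depth: ", best_depth)
```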

In [9]:
meh = evaluate!(m,resampling=CV(nfolds=3),measure=misclassification_rate)

meh



PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌─────────────────────────┬──────────────┬─────────────┬─────────┬──────────────
│ measure                 │ operation    │ measurement │ 1.96*SE │ per_fold    ⋯
├─────────────────────────┼──────────────┼─────────────┼─────────┼──────────────
│ MisclassificationRate() │ predict_mode │ 1.0         │ 0.0     │ [1.0, 1.0,  ⋯
└─────────────────────────┴──────────────┴─────────────┴─────────┴──────────────
                                                                1 column omitted
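A misclassification rate of 1.0 in every fold deserves a second look. A likely explanation, inferred from the output rather than stated in the notebook, is that the iris rows are ordered by species: with three unshuffled folds, each test fold then consists of a species that never appears in the corresponding training folds. The sketch below repeats the evaluation with shuffled folds. The seeds and the name `e_shuffled` are illustrative:

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)
tm = TunedModel(model=dtc, range=r, tuning=RandomSearch(rng=123),
                operation=predict_mode, measure=misclassification_rate)
m = machine(tm, X, y)

# Shuffle the rows before splitting so every fold contains all three species.
e_shuffled = evaluate!(m,
                       resampling=CV(nfolds=3, shuffle=true, rng=123),
                       measure=misclassification_rate,
                       verbosity=0)

println(e_shuffled.measurement[1])
```

Because the machine wraps a tuned model, each outer fold reruns the hyperparameter search on its own training rows, so the estimate reflects the tuning procedure as a whole rather than one fixed tree.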
