# Intro to Stats Learning

## Lab 8 - Tree-based models

> https://juliaai.github.io/DataScienceTutorials.jl/isl/lab-8/
>
> (project folder) https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/ISL-lab-8.tar.gz

In [1]:
using Pkg; Pkg.activate("D:/JULIA/6_ML_with_Julia/ISL-lab-8"); Pkg.instantiate()

[32m[1m  Activating[22m[39m project at `D:\JULIA\6_ML_with_Julia\ISL-lab-8`


Getting started

1. Decision Tree Classifier
2. Tuning a DTC
3. Decision Tree Regressor

Random Forest

Gradient Boosting Machine

### Getting started 

---

In [2]:
using MLJ
import RDatasets: dataset
using PrettyPrinting
import DataFrames: DataFrame, select, Not

DTC = @load DecisionTreeClassifier pkg = DecisionTree

carseats = dataset("ISLR", "Carseats")

first(carseats, 3) |> pretty

┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\loading.jl:168


import MLJDecisionTreeInterface ✔
┌────────────┬────────────┬────────────┬─────────────┬────────────┬────────────┬─────────────────────────────────┬────────────┬────────────┬─────────────────────────────────┬─────────────────────────────────┐
│ Sales      │ CompPrice  │ Income     │ Advertising │ Population │ Price      │ ShelveLoc                       │ Age        │ Education  │ Urban                           │ US                              │
│ Float64    │ Float64    │ Float64    │ Float64     │ Float64    │ Float64    │ CategoricalValue{String, UInt8} │ Float64    │ Float64    │ CategoricalValue{String, UInt8} │ CategoricalValue{String, UInt8} │
│ Continuous │ Continuous │ Continuous │ Continuous  │ Continuous │ Continuous │ Multiclass{3}                   │ Continuous │ Continuous │ Multiclass{2}                   │ Multiclass{2}                   │
⋮

We encode a new binary variable ```High```, equal to "Yes" when ```Sales``` is greater than 8 and "No" otherwise, and add that column to the dataframe:

In [3]:
names(carseats)

11-element Vector{String}:
 "Sales"
 "CompPrice"
 "Income"
 "Advertising"
 "Population"
 "Price"
 "ShelveLoc"
 "Age"
 "Education"
 "Urban"
 "US"

In [4]:
High = ifelse.(carseats.Sales .<= 8, "No", "Yes") |> categorical;
carseats[!, :High] = High;

Let's now train a basic decision tree classifier for ```High``` given the other features, after one-hot-encoding the categorical features. We also drop ```Sales```, since ```High``` is derived directly from it:

In [5]:
X = select(carseats, Not([:Sales, :High]))
y = carseats.High;

### Decision Tree Classifier

In [6]:
HotTreeClf = OneHotEncoder() |> DTC()

mdl = HotTreeClf
mach = machine(mdl, X, y)
fit!(mach);

┌ Info: Training Machine{ProbabilisticPipeline{NamedTuple{,…},…},…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Training Machine{OneHotEncoder,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Spawning 3 sub-features to one-hot encode feature :ShelveLoc.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 2 sub-features to one-hot encode feature :Urban.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 2 sub-features to one-hot encode feature :US.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Training Machine{DecisionTreeClassifier,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


Note that ```|>``` is syntactic sugar for creating a ```Pipeline``` model from component model instances or model types. Note also that the machine ```mach``` is trained on the whole dataset, so evaluating it on ```X``` only measures the training error.
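In this pipeline, the ```OneHotEncoder``` stage replaces each categorical column by one 0/1 column per level, which is why the log reports 3 sub-features for ```:ShelveLoc``` and 2 each for ```:Urban``` and ```:US```. Below is a minimal stdlib-only sketch of the idea, assuming plain string labels; ```onehot``` is a hypothetical helper, not MLJ's implementation.

```julia
# Hypothetical helper: one-hot encode a vector of string labels.
# Returns the sorted levels and an n × k 0/1 matrix with a single 1 per row.
function onehot(v::AbstractVector{<:AbstractString})
    lvls = sort(unique(v))                   # one output column per level
    M = zeros(Int, length(v), length(lvls))
    for (i, x) in enumerate(v)
        M[i, findfirst(==(x), lvls)] = 1     # mark the column of this row's level
    end
    return lvls, M
end

shelveloc = ["Bad", "Good", "Medium", "Good"]
lvls, M = onehot(shelveloc)
println(lvls)  # ["Bad", "Good", "Medium"]
```

Each row gets exactly one 1, in the column of its level, and the tree then treats these 0/1 columns like any other numeric feature.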

In [7]:
ypred = predict_mode(mach, X)
misclassification_rate(ypred, y)

0.0

That's right... it classifies the training data perfectly. This is classic behaviour for a decision tree with no limit on its depth: it keeps splitting until it has overfitted the data it's trained on. Let's see if it generalises:

In [8]:
train, test = partition(eachindex(y), 0.5, shuffle = true, rng = 333)
fit!(mach, rows = train)
ypred = predict_mode(mach, rows = test)
misclassification_rate(ypred, y[test])

┌ Info: Training Machine{ProbabilisticPipeline{NamedTuple{,…},…},…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Training Machine{OneHotEncoder,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Spawning 3 sub-features to one-hot encode feature :ShelveLoc.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 2 sub-features to one-hot encode feature :Urban.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Spawning 2 sub-features to one-hot encode feature :US.
└ @ MLJModels C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\builtins\Transformers.jl:1142
┌ Info: Training Machine{DecisionTreeClassifier,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


0.29

Not really...
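Both numbers come from ```misclassification_rate```, the fraction of predictions that disagree with the true labels. It is 0.0 on the rows the tree was trained on and 0.29 on the held-out half. Here is a stdlib-only sketch of the measure, where ```misclassification``` is a hypothetical stand-in for MLJ's built-in:

```julia
using Statistics: mean

# Fraction of positions where the predicted label differs from the true label.
misclassification(yhat, y) = mean(yhat .!= y)

y_true = ["No", "Yes", "Yes", "No"]
y_pred = ["No", "Yes", "No",  "No"]
println(misclassification(y_pred, y_true))  # 0.25
```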

### Tuning a DTC

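Let's try to do a bit of tuning, over the maximum depth of the tree (```max_depth```) and the minimum number of samples per leaf (```min_samples_leaf```):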

In [9]:
r_mpi = range(mdl, :(decision_tree_classifier.max_depth), lower = 1, upper = 10)
r_msi = range(mdl, :(decision_tree_classifier.min_samples_leaf), lower = 1, upper = 50)

tm = TunedModel(model = mdl, 
                ranges = [r_mpi, r_msi],
                tuning = Grid(resolution = 8),
                resampling = CV(nfolds = 5, rng = 112),
                operation = predict_mode,
                measure = misclassification_rate)

mtm = machine(tm, X, y)
fit!(mtm, rows = train)

┌ Info: Training Machine{ProbabilisticTunedModel{Grid,…},…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Attempting to evaluate 64 models.
└ @ MLJTuning C:\Users\jeffr\.julia\packages\MLJTuning\Al9yX\src\tuned_models.jl:680


Machine{ProbabilisticTunedModel{Grid,…},…} trained 1 time; caches data
  model: MLJTuning.ProbabilisticTunedModel{Grid, MLJBase.ProbabilisticPipeline{NamedTuple{(:one_hot_encoder, :decision_tree_classifier), Tuple{Unsupervised, Probabilistic}}, MLJModelInterface.predict}}
  args: 
    1:	Source @616 ⏎ `Table{Union{AbstractVector{Continuous}, AbstractVector{Multiclass{3}}, AbstractVector{Multiclass{2}}}}`
    2:	Source @505 ⏎ `AbstractVector{Multiclass{2}}`
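The log reports 64 models because ```Grid(resolution = 8)``` takes 8 values from each of the two ranges and tries every combination (8 × 8 = 64). The sketch below builds such a grid. It assumes evenly spaced points rounded to integers with duplicates dropped, although MLJ's exact rounding of integer ranges may differ. ```int_grid``` is a hypothetical helper.

```julia
# Hypothetical helper: evenly spaced integer grid over [lower, upper], duplicates removed.
int_grid(lower, upper, resolution) =
    unique(round.(Int, range(lower, upper; length = resolution)))

depths = int_grid(1, 10, 8)   # candidate max_depth values
leaves = int_grid(1, 50, 8)   # candidate min_samples_leaf values

# Every (max_depth, min_samples_leaf) pair is one model to evaluate.
grid = vec(collect(Iterators.product(depths, leaves)))
println(length(grid))  # 64
```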


In [10]:
ypred = predict_mode(mtm, rows = test)
misclassification_rate(ypred, y[test])

0.305

On this split, tuning doesn't help: 0.305 vs 0.29, a difference of just 3 of the 200 test rows. We can inspect the parameters of the best model:

In [11]:
fitted_params(mtm).best_model.decision_tree_classifier # in this run: max_depth = 5

DecisionTreeClassifier(
    max_depth = 5,
    min_samples_leaf = 1,
    min_samples_split = 2,
    min_purity_increase = 0.0,
    n_subfeatures = 0,
    post_prune = false,
    merge_purity_threshold = 1.0,
    pdf_smoothing = 0.0,
    display_depth = 5,
    rng = Random._GLOBAL_RNG())

In [12]:
fitted_params(mtm).best_model

ProbabilisticPipeline(
    one_hot_encoder = OneHotEncoder(
            features = Symbol[],
            drop_last = false,
            ordered_factor = true,
            ignore = false),
    decision_tree_classifier = DecisionTreeClassifier(
            max_depth = 5,
            min_samples_leaf = 1,
            min_samples_split = 2,
            min_purity_increase = 0.0,
            n_subfeatures = 0,
            post_prune = false,
            merge_purity_threshold = 1.0,
            pdf_smoothing = 0.0,
            display_depth = 5,
            rng = Random._GLOBAL_RNG()),
    cache = true)

In [13]:
fitted_params(mtm)

(best_model = ProbabilisticPipeline{NamedTuple{,…},…},
 best_fitted_params = (decision_tree_classifier = (tree = Decision Tree
Leaves: 15
Depth:  5,
                                                   encoding = Dict{CategoricalArrays.CategoricalValue{String, UInt32}, UInt32}("Yes" => 0x00000002, "No" => 0x00000001),),
                       one_hot_encoder = (fitresult = OneHotEncoderResult,),
                       machines = Machine[Machine{OneHotEncoder,…}, Machine{DecisionTreeClassifier,…}],
                       fitted_params_given_machine = OrderedCollections.LittleDict{Any, Any, Vector{Any}, Vector{Any}}(Machine{OneHotEncoder,…} => (fitresult = OneHotEncoderResult,), Machine{DecisionTreeClassifier,…} => (tree = Decision Tree
Leaves: 15
Depth:  5, encoding = Dict{CategoricalArrays.CategoricalValue{String, UInt32}, UInt32}("Yes" => 0x00000002, "No" => 0x00000001))),),)
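The tuning in this section scored each of the 64 candidates with ```CV(nfolds = 5, rng = 112)``` on the training rows only. Because ```train_best = true``` by default, it then retrained the winner on all training rows before that model was used on ```test```. The sketch below shows how 5 folds can be formed from the 200 training indices. ```kfold_indices``` is a hypothetical helper that mirrors the idea but not MLJ's exact fold assignment.

```julia
using Random

# Hypothetical helper: split 1:n into k folds of near-equal size after a seeded shuffle.
# Each fold serves once as validation data while the other k-1 folds are used for training.
function kfold_indices(n::Integer, k::Integer; rng = MersenneTwister(112))
    idx = randperm(rng, n)
    bounds = round.(Int, range(0, n; length = k + 1))   # fold boundaries
    folds = [idx[bounds[i]+1:bounds[i+1]] for i in 1:k]
    return [(setdiff(idx, f), f) for f in folds]         # (train, validation) pairs
end

cv_pairs = kfold_indices(200, 5)
```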

### Decision Tree Regressor

In [14]:
DTR = @load DecisionTreeRegressor pkg = DecisionTree

boston = dataset("MASS", "Boston")

y, X = unpack(boston, ==(:MedV))

train, test = partition(eachindex(y), 0.5, shuffle = true, rng = 551)

scitype(X)

import MLJDecisionTreeInterface ✔


┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\loading.jl:168


Table{Union{AbstractVector{Continuous}, AbstractVector{Count}}}

Let's recode the ```Count``` features as ```Continuous``` and then fit a DTR. First, let's see which features are ```Count```:

In [15]:
schema(X)

┌─────────┬────────────┬─────────┐
│ names   │ scitypes   │ types   │
├─────────┼────────────┼─────────┤
│ Crim    │ Continuous │ Float64 │
│ Zn      │ Continuous │ Float64 │
│ Indus   │ Continuous │ Float64 │
│ Chas    │ Count      │ Int64   │
│ NOx     │ Continuous │ Float64 │
│ Rm      │ Continuous │ Float64 │
│ Age     │ Continuous │ Float64 │
│ Dis     │ Continuous │ Float64 │
│ Rad     │ Count      │ Int64   │
│ Tax     │ Count      │ Int64   │
│ PTRatio │ Continuous │ Float64 │
│ Black   │ Continuous │ Float64 │
│ LStat   │ Continuous │ Float64 │
└─────────┴────────────┴─────────┘
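Under MLJ's default scitype convention, ```Int64``` columns such as ```Chas```, ```Rad``` and ```Tax``` have scitype ```Count```. The ```:discrete_to_continuous``` rule makes ```autotype``` suggest ```Continuous``` for them, and ```coerce``` then converts them to floating point. The sketch below shows the resulting conversion on a ```NamedTuple``` of columns, using only the standard library. ```ints_to_floats``` is hypothetical and the values are toy numbers, not rows of the Boston data.

```julia
# Hypothetical helper: convert every integer-valued column to Float64,
# leaving the other columns untouched (a NamedTuple of vectors stands in for a table).
ints_to_floats(cols::NamedTuple) =
    map(c -> eltype(c) <: Integer ? Float64.(c) : c, cols)

# Toy columns mimicking the schema above (illustrative values only).
toy = (Chas = [0, 1, 0], Rm = [6.5, 6.4, 7.2], Tax = [300, 240, 240])
toy_c = ints_to_floats(toy)
println(eltype(toy_c.Tax))  # Float64
```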


In [16]:
X = coerce(X, autotype(X, rules = (:discrete_to_continuous, )))

dtr_model = DTR()
dtr = machine(dtr_model, X, y)

fit!(dtr, rows = train)

┌ Info: Training Machine{DecisionTreeRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


Machine{DecisionTreeRegressor,…} trained 1 time; caches data
  model: MLJDecisionTreeInterface.DecisionTreeRegressor
  args: 
    1:	Source @967 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @792 ⏎ `AbstractVector{Continuous}`


In [17]:
ypred = MLJ.predict(dtr, rows = test)
round(rms(ypred, y[test]), sigdigits = 3)

4.98
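```rms``` is the root-mean-squared error, the square root of the mean of the squared differences between predictions and targets. It is therefore in the units of ```MedV``` (median home value, in thousands of dollars), so 4.98 corresponds to a typical error of roughly 5,000 dollars. Here is a stdlib-only sketch, where ```rmse``` is a hypothetical name chosen so as not to clash with MLJ's ```rms```:

```julia
using Statistics: mean

# Root-mean-squared error between predictions and targets.
rmse(yhat, y) = sqrt(mean((yhat .- y) .^ 2))

println(rmse([3.0, 5.0], [1.0, 5.0]))  # 1.4142135623730951
```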

Again, we can try tuning this a bit. Since the idea is the same as before, let's just adjust the depth of the tree:

In [18]:
r_depth = range(dtr_model, :max_depth, lower = 2, upper = 20)

NumericRange(2 ≤ max_depth ≤ 20; origin=11.0, unit=9.0)
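For this bounded range, the printed ```origin``` and ```unit``` coincide with the midpoint and half-width of the interval. Assuming the default linear scale, ```Grid(resolution = 10)``` over 2 to 20 yields the 10 evenly spaced depths 2, 4, 6, ..., 20, which matches the 10 models evaluated below. A quick check of that arithmetic:

```julia
lower, upper = 2, 20
origin = (lower + upper) / 2   # midpoint of the interval
unit = (upper - lower) / 2     # half-width of the interval
depth_grid = collect(range(lower, upper; length = 10))  # 10 evenly spaced candidates
println((origin, unit))  # (11.0, 9.0)
```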

In [19]:
tm = TunedModel(model = dtr_model, 
                ranges = [r_depth],
                tuning = Grid(resolution = 10),
                resampling = CV(nfolds = 5, rng = 1254),
                measure = rms)

DeterministicTunedModel(
    model = DecisionTreeRegressor(
            max_depth = -1,
            min_samples_leaf = 5,
            min_samples_split = 2,
            min_purity_increase = 0.0,
            n_subfeatures = 0,
            post_prune = false,
            merge_purity_threshold = 1.0,
            rng = Random._GLOBAL_RNG()),
    tuning = Grid(
            goal = nothing,
            resolution = 10,
            shuffle = true,
            rng = Random._GLOBAL_RNG()),
    resampling = CV(
            nfolds = 5,
            shuffle = true,
            rng = Random.MersenneTwister(1254)),
    measure = RootMeanSquaredError(),
    weights = nothing,
    operation = nothing,
    range = MLJBase.NumericRange{Int64, MLJBase.Bounded, Symbol}[NumericRange(2 ≤ max_depth ≤ 20; origin=11.0, unit=9.0)],
    selection_heuristic = MLJTuning.NaiveSelection(nothing),
    train_best = true,
    repeats = 1,
    n = nothing,
    acceleration = CPU1{Nothing}(nothing),
    acceleration_resa

In [20]:
mtm = machine(tm, X, y)

Machine{DeterministicTunedModel{Grid,…},…} trained 0 times; caches data
  model: MLJTuning.DeterministicTunedModel{Grid, MLJDecisionTreeInterface.DecisionTreeRegressor}
  args: 
    1:	Source @419 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @085 ⏎ `AbstractVector{Continuous}`


In [21]:
fit!(mtm, rows = train)

┌ Info: Training Machine{DeterministicTunedModel{Grid,…},…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
┌ Info: Attempting to evaluate 10 models.
└ @ MLJTuning C:\Users\jeffr\.julia\packages\MLJTuning\Al9yX\src\tuned_models.jl:680


Machine{DeterministicTunedModel{Grid,…},…} trained 1 time; caches data
  model: MLJTuning.DeterministicTunedModel{Grid, MLJDecisionTreeInterface.DecisionTreeRegressor}
  args: 
    1:	Source @419 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @085 ⏎ `AbstractVector{Continuous}`


In [22]:
ypred = MLJ.predict(mtm, rows = test)
round(rms(ypred, y[test]), sigdigits = 3)

5.05

In [23]:
fitted_params(mtm).best_model

DecisionTreeRegressor(
    max_depth = 6,
    min_samples_leaf = 5,
    min_samples_split = 2,
    min_purity_increase = 0.0,
    n_subfeatures = 0,
    post_prune = false,
    merge_purity_threshold = 1.0,
    rng = Random._GLOBAL_RNG())
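The tuned regressor uses ```max_depth = 6```, which allows at most six successive splits along any root-to-leaf path. At each node, a regression tree typically picks the feature and threshold that most reduce the sum of squared errors (SSE) of the two children around their means. The sketch below shows that search for a single numeric feature. ```best_split``` is a hypothetical helper, and DecisionTree.jl's implementation differs in detail.

```julia
using Statistics: mean

# Sum of squared deviations from the mean (0 for an empty vector).
sse(v) = isempty(v) ? 0.0 : sum(abs2, v .- mean(v))

# Best threshold t for the rule x <= t, minimizing SSE(left) + SSE(right).
function best_split(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    best_t, best_loss = NaN, Inf
    xs = sort(unique(x))
    for i in 1:length(xs)-1
        t = (xs[i] + xs[i+1]) / 2          # midpoint between adjacent distinct values
        left = x .<= t
        loss = sse(y[left]) + sse(y[.!left])
        if loss < best_loss
            best_t, best_loss = t, loss
        end
    end
    return best_t, best_loss
end

x_toy = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_toy = [5.0, 5.5, 4.5, 20.0, 21.0, 19.0]
println(best_split(x_toy, y_toy))  # (6.5, 2.5)
```

A depth-limited tree simply applies this search recursively to each child until ```max_depth``` is reached or a node cannot be split further.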

### Random Forest

---

**Note**: the package DecisionTree.jl also has a random forest model, but at the time of writing it was not interfaced with MLJ, so here we use the scikit-learn implementation instead.

In [24]:
RFR = @load RandomForestRegressor pkg=ScikitLearn

┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\loading.jl:168

import MLJScikitLearnInterface ✔


MLJScikitLearnInterface.RandomForestRegressor

In [25]:
rf_mdl = RFR()
rf = machine(rf_mdl, X, y)
fit!(rf, rows = train)

┌ Info: Training Machine{RandomForestRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464


Machine{RandomForestRegressor,…} trained 1 time; caches data
  model: MLJScikitLearnInterface.RandomForestRegressor
  args: 
    1:	Source @511 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @867 ⏎ `AbstractVector{Continuous}`


In [26]:
ypred = MLJ.predict(rf, rows = test)
round(rms(ypred, y[test]), sigdigits = 3)

3.83
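The random forest does noticeably better here (3.83, against 4.98 for the single tree and 5.05 for the tuned one). A random forest averages many trees, each fitted to a bootstrap resample of the training rows. In scikit-learn's implementation, each split can optionally be restricted to a random subset of features as well. The sketch below shows only the bagging part, averaging depth-1 regression stumps on one feature. All names are hypothetical and this is not scikit-learn's algorithm.

```julia
using Random, Statistics

# Fit a depth-1 regression stump: threshold t and the mean target on each side.
function fit_stump(x, y)
    best = (t = x[1], lo = mean(y), hi = mean(y), loss = Inf)
    for t in unique(x)
        left = x .<= t
        (all(left) || !any(left)) && continue      # skip splits with an empty side
        lo, hi = mean(y[left]), mean(y[.!left])
        loss = sum(abs2, y[left] .- lo) + sum(abs2, y[.!left] .- hi)
        if loss < best.loss
            best = (t = t, lo = lo, hi = hi, loss = loss)
        end
    end
    return best
end

predict_stump(s, z) = z <= s.t ? s.lo : s.hi

# Bagging: fit each stump on a bootstrap sample of the rows.
function bagged_stumps(x, y; ntrees = 50, rng = MersenneTwister(1))
    n = length(x)
    stumps = []
    for _ in 1:ntrees
        idx = rand(rng, 1:n, n)                    # n row indices drawn with replacement
        push!(stumps, fit_stump(x[idx], y[idx]))
    end
    return stumps
end

# The ensemble prediction is the average of the individual stump predictions.
predict_bagged(stumps, z) = mean(predict_stump(s, z) for s in stumps)

x_toy = collect(range(0, 1; length = 100))
y_toy = [xi > 0.5 ? 1.0 : 0.0 for xi in x_toy]
forest = bagged_stumps(x_toy, y_toy)
println(predict_bagged(forest, 0.1), " ", predict_bagged(forest, 0.9))
```

Averaging over bootstrap resamples mainly reduces the variance of individual trees, which is one plausible reason the forest beats the single tree on this split.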

### Gradient Boosting Machine
---

In [27]:
XGBR = @load XGBoostRegressor

xgb_mdl = XGBR(num_round = 10, max_depth = 10)
xgb = machine(xgb_mdl, X, y)
fit!(xgb, rows = train)

┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main C:\Users\jeffr\.julia\packages\MLJModels\tMgLW\src\loading.jl:168


import MLJXGBoostInterface ✔


┌ Info: Training Machine{XGBoostRegressor,…}.
└ @ MLJBase C:\Users\jeffr\.julia\packages\MLJBase\MuLnJ\src\machines.jl:464
[1]	train-rmse:17.455181
[2]	train-rmse:12.624966
[3]	train-rmse:9.235101
[4]	train-rmse:6.827576
[5]	train-rmse:5.102226
[6]	train-rmse:3.865698
[7]	train-rmse:2.960039
[8]	train-rmse:2.282763
[9]	train-rmse:1.794703
[10]	train-rmse:1.432438


Machine{XGBoostRegressor,…} trained 1 time; caches data
  model: MLJXGBoostInterface.XGBoostRegressor
  args: 
    1:	Source @555 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @481 ⏎ `AbstractVector{Continuous}`


In [28]:
ypred = MLJ.predict(xgb, rows = test)
round(rms(ypred, y[test]), sigdigits = 3)

3.96

Again, we could tune this model, for instance over ```num_round``` and ```max_depth```, as we did for the single trees.