# A composite model example with column splits and merges

The following composite model description comes from a
discussion at [MLJ issue #166](https://github.com/alan-turing-institute/MLJ.jl/issues/166#issuecomment-533934909):

> Regress y from x, and classify c from a and b. Then classify w
> from y and c.

Below we show how to use MLJ to define a new supervised model type
`MyComposite` with input (a, b, x) to learn a target (c, y, w)
according to this prescription. The fields (hyperparameters) of the
new composite model will be the two classifiers and the regressor.

The new model type is obtained by "prototyping" the composite model
using a learning network, and then exporting the network as a
stand-alone model type.

For more on this general procedure, see the [manual
entry](https://alan-turing-institute.github.io/MLJ.jl/stable/composing_models/),
selecting the relevant MLJ version.

To run without issues, this notebook/script should lie in a copy of
[this
directory](https://github.com/alan-turing-institute/MLJ.jl/tree/master/examples/composite_example),
in some tagged release of the [MLJ
package](https://github.com/alan-turing-institute/MLJ.jl).

In [1]:
using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()

  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`

In [2]:
using MLJ
using Random
Random.seed!(12);

### ASSUMPTIONS

We will make the following assumptions regarding the scientific types
of the data:

- `a`, `b`, `x` have scitype `AbstractMatrix{Continuous}`

- `c` and `w` have scitype `AbstractVector{<:Finite}`

- `y` has scitype `AbstractVector{Continuous}`

All data share the same number of rows (corresponding to
observations).

For example,

In [3]:
N = 2
a = fill(1.0, (3N, 3)) + rand(3N, 3)
b = fill(2.0, (3N, 2)) + rand(3N, 2);
x = fill(3.0, (3N, 2)) + rand(3N, 2);
c = categorical(rand("pqr", 3N)); levels!(c, ['p', 'q', 'r'])
w = categorical(rand("PQ", 3N)); levels!(w, ['P', 'Q'])
y = fill(4.0, 3N) + rand(3N);

I'll suppose the input to our supervised composite model is to be
presented as a matrix of the form `X = hcat(a, b, x)` where `a`,
`b`, `x` are of the form above.  For example,

In [4]:
X = hcat(a, b, x);

scitype(X)

AbstractArray{Continuous,2}

Since the three target variables `c`, `y`, `w` for the composite
have different types, I'll suppose that these are presented as the
three columns of a table, with names `:c`, `:y`, and `:w`. For example,

In [5]:
Y = (c=c, y=y, w=w);

scitype(Y)

ScientificTypes.Table{Union{AbstractArray{Continuous,1}, AbstractArray{Multiclass{3},1}, AbstractArray{Multiclass{2},1}}}

We are assuming the learners are:

- A probabilistic classifier for learning c from a, b
- A deterministic regressor for learning y from x
- A deterministic classifier for learning w from c and y

Here "classifier" means `AbstractVector{<:Finite}` target scitype,
and "regressor" means `AbstractVector{<:Continuous}` target scitype.

We restrict to component models that have `Table(Continuous)` input
scitype, and so will need to one-hot encode the table (c, y) before
learning w.

### PROTOTYPING THE COMPOSITE MODEL

Now we define a learning network using component models that will
become default values for the fields (hyperparameters) of our final
composite model type.

We first define ordinary functions to do splitting and merging. The
functions return a table or vector depending on what the component
models will be requiring (in this case, tables for inputs, vectors
for targets):

Splits:

In [6]:
get_ab(X) = MLJ.table(X[:,1:5], names=[:a1, :a2, :a3, :b1, :b2])
get_x(X)  = MLJ.table(X[:,6:7], names=[:x1, :x2])
get_c(Y)  = Y.c
get_y(Y)  = Y.y
get_w(Y)  = Y.w;
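
The column ranges `1:5` and `6:7` used in the splits follow from the widths assumed for `a`, `b` and `x` (3, 2 and 2 columns). As a minimal sketch in plain Julia, using constant stand-in blocks rather than the random data above, we can check that slicing `hcat(a, b, x)` recovers the original blocks. The names `ab_cols` and `x_cols` are introduced here for illustration only.

```julia
# Stand-in blocks with the same column counts as `a`, `b` and `x` above
a = fill(1.0, 6, 3)
b = fill(2.0, 6, 2)
x = fill(3.0, 6, 2)
X = hcat(a, b, x)

# Column ranges derived from the block widths (illustrative names)
ab_cols = 1:(size(a, 2) + size(b, 2))        # columns holding a and b
x_cols  = (last(ab_cols) + 1):size(X, 2)     # remaining columns hold x

@assert X[:, ab_cols] == hcat(a, b)
@assert X[:, x_cols] == x
```

If the widths of the input blocks change, the ranges hard-coded in `get_ab` and `get_x` would need to change with them.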

Merges:

In [7]:
put_cy(c, y) = (c=c, y=y)
put_cyw(c, y, w) = (c=c, y=y, w=w);

In [8]:
get_ab(X) |> pretty
get_x(X) |> pretty
put_cy(c, y) |> pretty
put_cyw(c, y, w) |> pretty

┌────────────────────┬────────────────────┬────────────────────┬────────────────────┬────────────────────┐
│ a1                 │ a2                 │ a3                 │ b1                 │ b2                 │
│ Float64            │ Float64            │ Float64            │ Float64            │ Float64            │
│ Continuous         │ Continuous         │ Continuous         │ Continuous         │ Continuous         │
├────────────────────┼────────────────────┼────────────────────┼────────────────────┼────────────────────┤
│ 1.258509611589049  │ 1.966360870193827  │ 1.3183374212706567 │ 2.869935590858474  │ 2.879294255581148  │
│ 1.9692536801431555 │ 1.5372649650697023 │ 1.369250972215818  │ 2.8462559286657476 │ 2.9075956827073743 │
│ 1.4741774214537995 │ 1.1047972018029866 │ 1.9648259583536951 │ 2.9603863080461643 │ 2.1638314679870945 │
│ 1.4345063919322494 │ 1.1772158529795567 │ 1.493953333184327  │ 2.298923829587551  │ 2.6935049602253294 │
│ 1.965789950130105  │ 1.739884810554

We now define source nodes. These nodes could simply wrap `nothing`
instead of concrete data, and the network could still be exported.
However, to enable testing of the learning network as we build it,
we will wrap the data defined above. (The author discovered several
errors in earlier attempts this way.)

In [9]:
X_ = source(X)
Y_ = source(Y, kind=:target)

Source{:target} @ 1…96


Now for the rest of the network.

Initial splits:

In [10]:
ab_ = node(get_ab, X_)
x_ = node(get_x, X_)
c_ = node(get_c, Y_)
y_ = node(get_y, Y_)
w_ = node(get_w, Y_)

Node @ 4…73 = get_w(1…96)
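
A node can be thought of as a deferred computation: it records a function and its upstream nodes, and only produces data when called. The following toy sketch in plain Julia illustrates that idea. The types `ToySource` and `ToyNode` and the helper `toynode` are hypothetical and are not how MLJ implements `source` and `node`; in particular, they ignore machines and training entirely.

```julia
# Toy illustration only: hypothetical types, NOT MLJ's implementation.
struct ToySource
    data            # a matrix wrapped at the origin of the network
end

struct ToyNode
    f::Function     # operation applied to the upstream results
    args::Tuple     # upstream ToySource / ToyNode objects
end

toynode(f, args...) = ToyNode(f, args)

# Calling a source returns the requested rows of the wrapped matrix
(s::ToySource)(; rows=Colon()) = s.data[rows, :]

# Calling a node first calls its upstream nodes, then applies `f`
(n::ToyNode)(; rows=Colon()) = n.f((arg(rows=rows) for arg in n.args)...)

M = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 9.0]
src_ = ToySource(M)
left_ = toynode(A -> A[:, 1:2], src_)           # split off two columns
total_ = toynode(A -> sum(A, dims=2), left_)    # row sums of that split

total_(rows=2:3)   # evaluates the whole chain on rows 2 and 3
```

Nothing is computed when `left_` and `total_` are defined; the chain runs only when the final node is called, in the same spirit as calling a node of the learning network with `rows=...`.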

Node to predict c:

In [11]:
clf1 = @load DecisionTreeClassifier # a model instance
m1 = machine(clf1, ab_, c_)
ĉ_ = predict_mode(m1, ab_)

Node @ 8…65 = predict_mode(1…96, get_ab(1…58))

Node to predict y:

In [12]:
rgs = @load RidgeRegressor pkg=MultivariateStats
rgs.lambda = 0.1
m = machine(rgs, x_, y_)
ŷ_ = predict(m, x_)

Node @ 5…02 = predict(1…23, get_x(1…58))

Merge c and y:

In [13]:
cy_ = node(put_cy, ĉ_, ŷ_)

Node @ 7…68 = put_cy(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58)))

Node to do the one-hot-encoding:

In [14]:
hot = OneHotEncoder(drop_last=true)
cy__ = transform(machine(hot, cy_), cy_)

Node @ 1…99 = transform(6…33, put_cy(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58))))
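
With `drop_last=true`, a categorical feature with three levels, such as `c`, is replaced by two indicator columns, the indicator for the last level being dropped. The plain-Julia sketch below mimics this on a small vector of labels. The helper `onehot_drop_last` is hypothetical, written for illustration only, and is not MLJ's `OneHotEncoder` implementation.

```julia
# Hypothetical helper, not part of MLJ: one-hot encode `labels`,
# omitting the indicator column for the last entry of `levels`.
function onehot_drop_last(labels::AbstractVector, levels::AbstractVector)
    kept = levels[1:end-1]
    # rows index observations, columns index the retained levels
    return [Float64(label == level) for label in labels, level in kept]
end

labels = ['p', 'r', 'q', 'p']
encoded = onehot_drop_last(labels, ['p', 'q', 'r'])
```

Here `encoded` has one row per label and two columns; a row of zeros stands for the dropped level `'r'`.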

Node to predict w:

In [15]:
clf2 = @load SVC
m2 = machine(clf2, cy__, w_)
ŵ_ = predict(m2, cy__)

Node @ 8…93 = predict(1…47, transform(6…33, put_cy(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58)))))

Final merge:

In [16]:
Ŷ_ = node(put_cyw, ĉ_, ŷ_, ŵ_)

Node @ 6…95 = put_cyw(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58)), predict(1…47, transform(6…33, put_cy(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58))))))

As a test of functionality, we can fit the final node, which trains
the whole network...

In [17]:
fit!(Ŷ_, rows=1:2N)

┌ Info: Training NodalMachine{DecisionTreeClassifier} @ 1…96.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{RidgeRegressor} @ 1…23.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{OneHotEncoder} @ 6…33.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Spawning 2 sub-features to one-hot encode feature :c.
└ @ MLJModels.Transformers /Users/anthony/.julia/packages/MLJModels/ijYFi/src/builtins/Transformers.jl:510
┌ Info: Training NodalMachine{SVC} @ 1…47.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141

Node @ 6…95 = put_cyw(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58)), predict(1…47, transform(6…33, put_cy(predict_mode(1…96, get_ab(1…58)), predict(1…23, get_x(1…58))))))

... and make a prediction:

In [18]:
Ŷ_(rows=(2N-1):3N)

(c = CategoricalArrays.CategoricalValue{Char,UInt32}['r', 'r', 'q', 'q'],
 y = [4.56728, 4.7523, 4.63817, 4.67546],
 w = CategoricalArrays.CategoricalValue{Char,UInt32}['P', 'P', 'Q', 'Q'],)

### EXPORT THE LEARNING NETWORK AS STAND-ALONE MODEL

The following code simultaneously creates a new model type
`MyComposite` and defines `comp` as an instance, using deep copies of
the specified learning network component models as default field
values:

In [19]:
comp = @from_network MyComposite(classifier1=clf1,
                                 classifier2=clf2,
                                 regressor=rgs) <= Ŷ_

Main.##365.MyComposite(classifier1 = DecisionTreeClassifier(pruning_purity = 1.0,
                                                            max_depth = -1,
                                                            min_samples_leaf = 1,
                                                            min_samples_split = 2,
                                                            min_purity_increase = 0.0,
                                                            n_subfeatures = 0,
                                                            display_depth = 5,
                                                            post_prune = false,
                                                            merge_purity_threshold = 0.9,
                                                            pdf_smoothing = 0.05,),
                       classifier2 = SVC(kernel = RadialBasis::KERNEL = 2,
                                         gamma = -1.0,
                                         weights

As a model, this object has no data attached to it. We fit it to
data, as we do any other model:

In [20]:
X = rand(100, 7);
Y = (c=categorical(rand("abc", 100)),
     y=rand(100),
     w=categorical(rand("AB", 100)));

m = machine(comp, X, Y)
fit!(m, rows=1:80)
Ŷ = predict(m, rows=81:100)
error = sum(Ŷ.w .!= Y.w[81:100])/20

┌ Info: Training Machine{MyComposite} @ 1…10.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{DecisionTreeClassifier} @ 7…57.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{RidgeRegressor} @ 1…47.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{OneHotEncoder} @ 1…76.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Spawning 2 sub-features to one-hot encode feature :c.
└ @ MLJModels.Transformers /Users/anthony/.julia/packages/MLJModels/ijYFi/src/builtins/Transformers.jl:510
┌ Info: Training NodalMachine{SVC} @ 1…12.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141


0.65
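
The quantity `error` computed above is the misclassification rate for `w`: the fraction of held-out rows whose predicted label differs from the observed one. A minimal stand-alone sketch of the same calculation follows, using made-up labels (not the notebook's data) and the hypothetical names `predicted`, `observed` and `misclassification`.

```julia
using Statistics

# Made-up labels for illustration only
predicted = ['A', 'B', 'A', 'A']
observed  = ['A', 'A', 'A', 'B']

# Fraction of positions where prediction and observation disagree
misclassification = mean(predicted .!= observed)
```

Taking the `mean` of the element-wise comparison is equivalent to summing the mismatches and dividing by the number of rows, as done in the cell above.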

We can select new component models, for example ...

In [21]:
comp.classifier1 = @load KNNClassifier

KNNClassifier(K = 5,
              algorithm = :kdtree,
              metric = Distances.Euclidean(0.0),
              leafsize = 10,
              reorder = true,
              weights = :uniform,) @ 1…79

... and retrain:

In [22]:
fit!(m, rows=1:80)
Ŷ = predict(m, rows=81:100)
error = sum(Ŷ.w .!= Y.w[81:100])/20

┌ Info: Updating Machine{MyComposite} @ 1…10.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:154
┌ Info: Training NodalMachine{KNNClassifier} @ 1…18.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{RidgeRegressor} @ 1…40.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Training NodalMachine{OneHotEncoder} @ 4…57.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141
┌ Info: Spawning 2 sub-features to one-hot encode feature :c.
└ @ MLJModels.Transformers /Users/anthony/.julia/packages/MLJModels/ijYFi/src/builtins/Transformers.jl:510
┌ Info: Training NodalMachine{SVC} @ 1…13.
└ @ MLJ /Users/anthony/.julia/packages/MLJ/z2Z6L/src/machines.jl:141


0.45

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*