# Model Composition with MLJFlux

This workflow example shows how MLJFlux models can be composed with other MLJ models. We assume a
class-imbalance setting and combine an oversampler with a deep learning model from MLJFlux.

**Julia version** is assumed to be 1.10.*

### Basic Imports

In [1]:
using MLJ               # Has MLJFlux models
using Flux              # For more flexibility
import RDatasets        # Dataset source
import Random           # To create imbalance
import Imbalance        # To solve the imbalance

### Loading and Splitting the Data

In [2]:
iris = RDatasets.dataset("datasets", "iris");
y, X = unpack(iris, ==(:Species), colname -> true, rng=123);
X = Float32.(X);      # To be compatible with the type of the network parameters

To simulate an imbalanced dataset, we will take a random sample:

In [3]:
Random.seed!(803429)
subset_indices = rand(1:size(X, 1), 100)
X, y = X[subset_indices, :], y[subset_indices]
Imbalance.checkbalance(y)

versicolor: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 28 (65.1%) 
virginica:  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 29 (67.4%) 
setosa:     ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 43 (100.0%) 
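The percentages in this output appear to be each class's size relative to the largest class, not its share of the whole dataset (for example, 28/43 ≈ 65.1%). A minimal sketch that reproduces this arithmetic in plain Julia from the counts shown above; the variable names are illustrative, and this is not meant to mirror how `Imbalance.checkbalance` is implemented:

```julia
# Class counts copied from the `checkbalance` output above.
class_counts = Dict("versicolor" => 28, "virginica" => 29, "setosa" => 43)

# Size of the largest (majority) class.
majority = maximum(values(class_counts))

# Size of each class relative to the majority class, as a percentage.
relative_pct = Dict(
    label => round(100 * n / majority; digits=1) for (label, n) in class_counts
)

# Print the classes from smallest to largest.
for (label, pct) in sort(collect(relative_pct); by=last)
    println(rpad(label, 12), pct, "%")
end
```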


### Instantiating the model

Let's load `BorderlineSMOTE1` to oversample the data and `Standardizer` to standardize it.

In [4]:
BorderlineSMOTE1 = @load BorderlineSMOTE1 pkg=Imbalance verbosity=0
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
# We don't need to load Standardizer because it is a local model for MLJ (see `localmodels()`)

clf = NeuralNetworkClassifier(
    builder=MLJFlux.MLP(; hidden=(5,4), σ=Flux.relu),
    optimiser=Flux.Adam(0.01),
    batch_size=8,
    epochs=50,
    rng=42
    )

[ Info: For silent loading, specify `verbosity=0`. 
import MLJFlux ✔


NeuralNetworkClassifier(
  builder = MLP(
        hidden = (5, 4), 
        σ = NNlib.relu), 
  finaliser = NNlib.softmax, 
  optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8, IdDict{Any, Any}()), 
  loss = Flux.Losses.crossentropy, 
  epochs = 50, 
  batch_size = 8, 
  lambda = 0.0, 
  alpha = 0.0, 
  rng = 42, 
  optimiser_changes_trigger_retraining = false, 
  acceleration = CPU1{Nothing}(nothing))

First, we wrap the neural network together with the oversampler via the `BalancedModel` construct. This comes from `MLJBalancing`
and allows combining resampling methods with MLJ models in a sequential pipeline.
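To get a feel for what a float `ratios` value asks of the oversampler, here is a rough sketch of the target arithmetic. It assumes that `ratios=1.0` means "oversample every class up to 1.0 times the size of the majority class", which is how the Imbalance.jl documentation describes a float ratio as far as we can tell. The counts are those of the full sample shown earlier; during cross-validation the oversampler only sees each training fold, and the number of points `BorderlineSMOTE1` actually generates may differ from this target.

```julia
# Class counts of the full sample, copied from the `checkbalance` output.
class_counts = Dict("versicolor" => 28, "virginica" => 29, "setosa" => 43)

# Assumed meaning of a float ratio: target size = ratio * majority class size.
ratio = 1.0
majority = maximum(values(class_counts))
target = round(Int, ratio * majority)

# Number of synthetic points each class would need to reach the target
# (zero for classes that are already at or above it).
n_synthetic = Dict(label => max(target - n, 0) for (label, n) in class_counts)

for (label, n) in n_synthetic
    println(rpad(label, 12), "needs ", n, " synthetic points")
end
```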

In [5]:
oversampler = BorderlineSMOTE1(k=5, ratios=1.0, rng=42)
balanced_model = BalancedModel(model=clf, balancer1=oversampler)
standardizer = Standardizer()

Standardizer(
  features = Symbol[], 
  ignore = false, 
  ordered_factor = false, 
  count = false)

Now let's compose the balanced model with a standardizer.

In [6]:
pipeline = standardizer |> balanced_model

ProbabilisticPipeline(
  standardizer = Standardizer(
        features = Symbol[], 
        ignore = false, 
        ordered_factor = false, 
        count = false), 
  balanced_model_probabilistic = BalancedModelProbabilistic(
        model = NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), 
        balancer1 = BorderlineSMOTE1(m = 5, …)), 
  cache = true)

With this pipeline, any training data is first standardized, then oversampled, and then passed to the model.
At inference time, the standardizer automatically reuses the mean and standard deviation learned from the
training set, and the oversampler is skipped, so new data passes through it unchanged.
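The fit-then-reuse behaviour of the standardizer can be sketched in plain Julia. This is only an illustration of the idea on made-up one-feature data, with a hypothetical helper `standardize`; it is not how MLJ's `Standardizer` is implemented:

```julia
using Statistics

# Hypothetical one-feature data, for illustration only.
x_train = [4.0, 6.0, 8.0]
x_new = [10.0]

# "Fit": learn the statistics from the training data only.
μ, σ = mean(x_train), std(x_train)

# "Transform": reuse the stored training statistics for any input,
# including data that was never seen during training.
standardize(x) = (x .- μ) ./ σ

z_train = standardize(x_train)   # centred and scaled training data
z_new = standardize(x_new)       # new data scaled with the *training* μ and σ

println("train: ", z_train, "  new: ", z_new)
```

Because `μ` and `σ` come from the training data alone, no information about the new observations leaks into the transformation.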

### Training the Composed Model

Training the composed model is no different from training a single model.

In [7]:
mach = machine(pipeline, X, y)
fit!(mach)
cv = CV(nfolds=5)
evaluate!(mach, resampling=cv, measure=accuracy)

[ Info: Training machine(ProbabilisticPipeline(standardizer = Standardizer(features = Symbol[], …), …), …).
[ Info: Training machine(:standardizer, …).
[ Info: Training machine(:balanced_model_probabilistic, …).
[ Info: Training machine(BorderlineSMOTE1(m = 5, …), …).
[ Info: Training machine(:model, …).
[ Info: After filtering, the mapping from each class to number of borderline points is ("virginica" => 1, "versicolor" => 2).
[ Info: After filtering, the mapping from each class to number of borderline points is ("virginica" => 1, "versicolor" => 2).
┌ Warning: Layer with Float32 parameters got Float64 input.
│   The input will be converted, but any earlier layers may be very slow.
│   layer = Dense(4 => 5, relu)  # 25 parameters
│   summary(x) = "4×8 Matrix{Float64}"
└ @ Flux ~/.julia/packages/Flux/Wz6D4/src/layers/stateless.jl:60
[ Info: After filtering, the mapping from each class to number of borderline points is ("virginica" => 3, "versicolor" => 1).
[ Info: After filtering, the mapping from each class to number of borderline points is ("

PerformanceEvaluation object with these fields:
  model, measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows, resampling, repeats
Extract:
┌────────────┬──────────────┬─────────────┬─────────┬───────────────────────────
│ measure    │ operation    │ measurement │ 1.96*SE │ per_fold                 ⋯
├────────────┼──────────────┼─────────────┼─────────┼───────────────────────────
│ Accuracy() │ predict_mode │ 0.98        │ 0.0268  │ [1.0, 1.0, 0.95, 0.95, 1 ⋯
└────────────┴──────────────┴─────────────┴─────────┴───────────────────────────
                                                                1 column omitted

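The `measurement` and `1.96*SE` columns can be related to the per-fold values with a short calculation. The sketch below rests on two assumptions that are not confirmed by the output itself: the fifth fold value, truncated in the display, is taken to be 1.0 (the value implied by the reported mean of 0.98), and the half-width is taken to be `1.96 * std / sqrt(n - 1)`, a formula inferred from the numbers rather than from MLJ's source.

```julia
using Statistics

# Per-fold accuracies from the table above. The fifth value is truncated in the
# display; 1.0 is an assumption that reproduces the reported mean of 0.98.
per_fold = [1.0, 1.0, 0.95, 0.95, 1.0]

n = length(per_fold)

# With equally sized folds, the aggregate accuracy is the mean of the folds.
measurement = mean(per_fold)

# Assumed formula for the reported half-width: 1.96 * std / sqrt(n - 1).
half_width = 1.96 * std(per_fold) / sqrt(n - 1)

println("accuracy ≈ ", round(measurement; digits=2), " ± ", round(half_width; digits=4))
```

Under these assumptions the calculation gives 0.98 ± 0.0268, which agrees with the table. With only five folds, the interval is a rough indication of variability rather than a precise confidence statement.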

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*