Linear Pipelines

In MLJ, a pipeline is a composite model in which the component models are chained together linearly, without branching. For other arrangements, including custom architectures via learning networks, see Composing Models.

For purposes of illustration, consider a supervised learning problem with the following toy data:

using MLJ
X = (age    = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']));
y = [67.0, 81.5, 55.6, 90.0, 61.1]

We would like to train using a K-nearest neighbor model, but the model type KNNRegressor assumes the features are all Continuous. This can be fixed by first:

  • coercing the :age feature to have Continuous type by replacing X with coerce(X, :age=>Continuous)
  • one-hot encoding the Multiclass features using the ContinuousEncoder model
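For comparison, the manual route might look like the following minimal sketch. It repeats the toy features so that it runs on its own; the variable names `Xc`, `encoder_mach` and `Xt` are illustrative only and not part of MLJ.

```julia
using MLJ

X = (age    = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']))

# Step 1: coercion is a static operation, so nothing needs to be trained.
Xc = coerce(X, :age => Continuous)

# Step 2: the encoder has to see the data (to learn the levels of `gender`)
# before it can transform, so it is wrapped in a machine and fitted.
encoder_mach = machine(ContinuousEncoder(), Xc)
fit!(encoder_mach, verbosity=0)
Xt = transform(encoder_mach, Xc)

# Inspect the element scitypes of the transformed table.
schema(Xt)
```

The transformed table `Xt` could then be bound to a KNNRegressor in a second machine, which is the bookkeeping a pipeline takes care of.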

However, we can avoid applying these preprocessing steps separately (some of which require their own fit! steps) by combining them with the supervised KNNRegressor model in a new pipeline model, using Julia's |> syntax:

KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
pipe = (X -> coerce(X, :age=>Continuous)) |> ContinuousEncoder() |> KNNRegressor(K=2)

Here pipe is a model whose hyperparameters are themselves other models or a function. (The names of these hyperparameters are automatically generated. To specify your own names, use the explicit Pipeline constructor instead.)
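As a sketch of the explicit constructor, the same pipeline could be written with keyword arguments. The component names `coercer`, `encoder` and `knn` below are arbitrary choices made for this illustration, and the example assumes the NearestNeighborModels package is installed.

```julia
using MLJ

KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels verbosity=0

# Each keyword becomes the name of a hyperparameter of the pipeline.
pipe = Pipeline(
    coercer = X -> coerce(X, :age => Continuous),
    encoder = ContinuousEncoder(),
    knn     = KNNRegressor(K=2),
)

# Nested hyperparameters are reached through the chosen names.
pipe.knn.K = 3

# List the pipeline's hyperparameter names.
propertynames(pipe)
```

Choosing the names yourself can make nested hyperparameters easier to refer to later, for example when tuning.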

The |> syntax can also be used to extend an existing pipeline or concatenate two existing pipelines. So, we could instead have defined:

pipe_transformer = (X -> coerce(X, :age=>Continuous)) |> ContinuousEncoder()
pipe = pipe_transformer |> KNNRegressor(K=2)

A pipeline is just a model like any other. For example, we can evaluate its performance on the data above:

evaluate(pipe, X, y, resampling=CV(nfolds=3), measure=mae)
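The usual machine workflow applies as well. The following self-contained sketch rebuilds the toy data and pipeline, then trains on all five observations and predicts on the same rows, purely for illustration; the names `mach` and `yhat` are not part of MLJ.

```julia
using MLJ

KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels verbosity=0

X = (age    = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']))
y = [67.0, 81.5, 55.6, 90.0, 61.1]

pipe = (X -> coerce(X, :age => Continuous)) |> ContinuousEncoder() |> KNNRegressor(K=2)

# Bind the pipeline to data and train every component in one call.
mach = machine(pipe, X, y)
fit!(mach, verbosity=0)

# Raw features go in; coercion and encoding happen inside the pipeline.
yhat = predict(mach, X)
```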

To include target transformations in a pipeline, wrap the supervised component using TransformedTargetModel.
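A minimal sketch of such a wrapper follows, assuming the target should be standardized before training; the choice of Standardizer and the name `knn_std` are illustrative assumptions rather than anything prescribed above.

```julia
using MLJ

KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels verbosity=0

X = (age    = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']))
y = [67.0, 81.5, 55.6, 90.0, 61.1]

# Wrap the supervised component so it trains on a standardized target.
knn_std = TransformedTargetModel(KNNRegressor(K=2), transformer=Standardizer())

# The wrapped model takes the place of the plain regressor in the chain.
pipe = (X -> coerce(X, :age => Continuous)) |> ContinuousEncoder() |> knn_std

mach = machine(pipe, X, y)
fit!(mach, verbosity=0)

# The wrapper inverts the target transformation when predicting.
yhat = predict(mach, X)
```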

Pipeline