
More arrows #307

Closed
tlienart opened this issue Oct 30, 2019 · 2 comments
Assignees: tlienart
Labels: arrow syntax, enhancement (New feature or request)

Comments

@tlienart (Collaborator)

This is to keep track of things that can be added to the arrow syntax.

stick source where needed

this

pca = X |> Standardizer() |> PCA(maxoutdim=2)

works fine. For the supervised case it'd be nice to do something like

pipe = (X |> Standardizer(), y) |> MultinomialClassifier()

That doesn't work, because y is not correctly recognised as the target; this, however,

(Xtrain |> OneHotEncoder(), source(ytrain, kind=:target)) |> DecisionTreeClassifier()

is fine, but of course ugly. That should be an easy fix in the arrow syntax definition: when passed a tuple, wrap any element that is not already a node in a source node.
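A sketch of what that fix might look like (illustrative only, not MLJ's actual source; assumes the two-element (input, target) case and MLJBase's source, AbstractNode, machine and predict):

function Base.:|>(t::Tuple, model::Supervised)
    # wrap raw data in source nodes; pass existing nodes through unchanged
    X = t[1] isa AbstractNode ? t[1] : source(t[1])
    y = t[2] isa AbstractNode ? t[2] : source(t[2], kind=:target)
    mach = machine(model, X, y)
    return predict(mach, X)
end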

arrow on hcat

In stacking, it can be nice to hcat the output of nodes and then feed the result into a later layer. Currently this doesn't fully work, because the hcat does not produce a table. The easy way out is to apply MLJBase.table whenever the data received is a matrix rather than a table.
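For example, something along these lines (illustrative only; y1 and y2 stand for prediction nodes from a first layer, and ys for a target source):

# wrap hcat so the resulting matrix is converted back into a table
stack(a, b) = MLJBase.table(hcat(a, b))
Xstack = node(stack, y1, y2)
yhat = (Xstack, ys) |> RidgeRegressor()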

tlienart self-assigned this Oct 30, 2019
tlienart added the arrow syntax and enhancement labels Oct 30, 2019
tlienart mentioned this issue Nov 1, 2019
@vollmersj (Collaborator)

I thought this could work with insertcols! (however, one needs to copy the data, otherwise one ends up with multiple columns with the same name):

f(x) = insertcols!(copy(x), 1, p = x[:, :x1] .* x[:, :x2])
W = X |> MLJBase.StaticTransformer(f=f) |> UnivariateStandardizer()

Minimal example that doesn't work:


using MLJ, DataFrames, Random
MLJ.color_off() # hide
@load RidgeRegressor pkg=MultivariateStats
Random.seed!(5) # for reproducibility
x1 = rand(300)
x2 = rand(300)
x3 = rand(300)
y = exp.(x1 - x2 - 2x3 + 0.1 * rand(300))
X = DataFrame(x1=x1, x2=x2, x3=x3)

using MLJBase
# Let's start with `W` and `z` (the "first layer"):
f(x) = insertcols!(copy(x), 1, p = x[:, :x1] .* x[:, :x2])
W = X |> MLJBase.StaticTransformer(f=f) |> UnivariateStandardizer()

z = y |> UnivariateBoxCoxTransformer()

ẑ = (W, z) |> RidgeRegressor(lambda=0.1);

@ablaom (Member)

ablaom commented Mar 4, 2020

@vollmersj

Here are examples of inserting static transformers into composite models. Due to a bug I discovered while preparing this, you will need MLJBase 0.11.10, just released.

It would be awesome if you could use part of these to improve the documentation and tutorials regarding static transformers (#393).

Note: For now, at least, I recommend that you always explicitly do Xs = source(X) and ys = source(y, kind=:target) before using arrow syntax, to avoid possible confusion about which data is input and which is target.
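In other words, start every arrow pipeline with something like this (pattern only; the model choices are illustrative, and the full listing below does exactly this):

Xs = source(X)
ys = source(y, kind=:target)
zhat = (Xs |> Standardizer(), ys) |> RidgeRegressor()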

# March 4th

using MLJ, DataFrames, Random
MLJ.color_off() # hide
@load RidgeRegressor pkg=MultivariateStats
Random.seed!(5) # for reproducibility
x1 = rand(300)
x2 = rand(300)
x3 = rand(300)
y = exp.(x1 - x2 - 2x3 + 0.1 * rand(300))
X = DataFrame(x1=x1, x2=x2, x3=x3)

using MLJBase

f(X)=insertcols!(copy(X), 1, p=X[:,:x1] .* X[:,:x2])

# Solution 1 (simplest): Just use @pipeline

comp = @pipeline Comp(f, std=Standardizer(),
                      rgs=RidgeRegressor(),
                      target=UnivariateBoxCoxTransformer())
e = evaluate(comp, X, y, measure=mae, resampling=CV())

# What if your transformer has parameters? Then you need a static
# transformer:

mutable struct MyTransformer <: Static
    ftr::Symbol
end

MLJBase.transform(transf::MyTransformer, verbosity, X) =
    insertcols!(copy(X), 1, p=X[:,transf.ftr] .* X[:,:x2])

comp2 = @pipeline Comp2(transf=MyTransformer(:x3), std=Standardizer(),
                        rgs=RidgeRegressor(),
                        target=UnivariateBoxCoxTransformer())

comp2.transf.ftr = :x1 # change the parameter

e2 = evaluate(comp2, X, y, measure=mae, resampling=CV())
@assert e2.measurement[1] ≈ e.measurement[1]

# Solution 2: Using learning network:

Xs = source(X)
ys = source(y, kind=:target)

ridge = RidgeRegressor()

# overload your function for nodes:
f(X::AbstractNode) = node(f, X)

W = Xs |> f |> Standardizer()
z = ys |> UnivariateBoxCoxTransformer()
zhat = (W, z) |> ridge
yhat = zhat |> inverse_transform(z)

# # or, without arrow syntax:
# X2 = f(Xs)
# W = transform(machine(Standardizer(), X2), X2)

# box_mach = machine(UnivariateBoxCoxTransformer(), ys)
# z = transform(box_mach, ys)

# ridge_mach = machine(ridge, W, z)
# zhat = predict(ridge_mach, W)
# yhat = inverse_transform(box_mach, zhat)

comp3 = @from_network Comp3(rgs=ridge) <= yhat

e3 = evaluate(comp3, X, y, measure=mae, resampling=CV())
@assert e3.measurement[1] ≈ e.measurement[1]

# Or if you need parameters for your static transformer:

inserter = MyTransformer(:x3)

W = Xs |> inserter |> Standardizer()
z = ys |> UnivariateBoxCoxTransformer()
zhat = (W, z) |> ridge
yhat = zhat |> inverse_transform(z)

# # or, without arrow syntax:
# inserter_mach = machine(inserter)
# X2 = transform(inserter_mach, Xs)
# W = transform(machine(Standardizer(), X2), X2)

# box_mach = machine(UnivariateBoxCoxTransformer(), ys)
# z = transform(box_mach, ys)

# ridge_mach = machine(ridge, W, z)
# zhat = predict(ridge_mach, W)
# yhat = inverse_transform(box_mach, zhat)

comp4 = @from_network Comp4(transf=inserter, rgs=ridge) <= yhat

comp4.transf.ftr = :x1 # change the parameter

e4 = evaluate(comp4, X, y, measure=mae, resampling=CV())
@assert e4.measurement[1] ≈ e.measurement[1]
