
Implement MLJ interface for linear models #35

Closed
2 of 7 tasks
ablaom opened this issue Dec 15, 2018 · 6 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

ablaom (Member) commented Dec 15, 2018

Would someone like to implement the new MLJ interface for the linear models for which Julia code already exists? Candidate packages:

  • GLM.jl, for many of these
  • Lasso.jl, which needs upgrading from Julia 0.6
  • MultivariateStats.jl

GLM

Lasso.jl

  • lasso regression
  • fused lasso
  • trend filter
  • gamma lasso

MultivariateStats.jl

  • OLS
  • Ridge
Relevant:

darenasc added the help wanted label Dec 15, 2018
ablaom added the enhancement label Dec 15, 2018
ablaom (Member, Author) commented Dec 16, 2018

In response to an offer of help from @tlienart. Some details:

How about you put your implementation of the MLJ "model interface" for
GLM.jl in a module that lives in 'src/builtins/GLM.jl' (where we
currently have the toy "KNN.jl"), although your code will probably more
closely resemble the MultivariateStats.jl stub where I put the RidgeRegressor
model. (I think we will move away from lazily loaded interface
implementations; if your code does not stay in builtins, it might become a
separate package, or we might try to get GLM.jl to include the
interface in their code.)

I expect you will generally be predicting probabilities
rather than actual target values (this will probably be done in the
RidgeRegressor as well, but isn't at present). There has been some very
recent discussion about exactly what predict should return in these
cases; see

issue 34

and

issue 33

We will go with @fkiraly's recommendations, which are not reflected in the adding_new_models.md document just yet. In particular:

  • if an algorithm predicts probabilities, there is no need to implement
    a second predict method that predicts values (i.e., means, or values obtained
    by applying a threshold, etc.). So only one predict method per model. (We will
    dump predict_proba.)

  • the predict method will predict a vector of distribution objects,
    one for each input pattern; see the sketch after this list. (To get the
    probability of a specific outcome for the target, one will need to call the
    object on the outcome of interest, as Franz explains in the first thread
    above. However, your interface isn't concerned with this.) I admit I
    haven't thought too much about the details of this yet, but hopefully
    we can just use Distributions.jl for this purpose. I will be turning
    to this question first thing when I return from holiday in the new year.
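
To make this concrete, here is a minimal sketch of what such a prediction might look like, assuming Distributions.jl ends up supplying the distribution objects (the values and the exact querying calls shown are illustrative assumptions, not settled interface):

using Distributions

# what predict might return for two input patterns (illustrative values):
yhat = [Normal(0.5, 0.1), Normal(0.7, 0.1)]

# querying the first prediction at an outcome of interest:
pdf(yhat[1], 0.55)   # density at the outcome 0.55
cdf(yhat[1], 0.55)   # probability of an outcome ≤ 0.55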

Do keep in mind that, in the case of nominal target data, the target
y will arrive at your model as a CategoricalArray whose pool includes
levels that may or may not actually be realized in
the data, but which need to be incorporated in the distribution object
(with zero probability if they do not occur); see also the
adding_new_models.md doc, and the illustration below.
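
For illustration (the data here are made up), using CategoricalArrays.jl:

using CategoricalArrays

# "maybe" is in the pool but never occurs in the data:
y = categorical(["yes", "no", "yes"], levels=["yes", "no", "maybe"])
levels(y)   # ["yes", "no", "maybe"]

# a fitted classifier must still assign a probability (zero) to "maybe"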

Note that you will need a separate model for each kind of target data
/ response type, because each model SomeModel can only have one value for
metadata(SomeModel)[:outputs_are]. (To the possible values "nominal", "ordinal", "multiclass"
and "multivariate" we will now add "probabilistic", meaning
probabilities are to be predicted.) So you might have these models:

GLMProbabilisticRegressor
GLMProbabilisticClassifier
GLMProbabilisticMulticlassClassifier

and limit the allowed values of the "family" and "link" options
accordingly; see the sketch below. Perhaps don't worry about models for multivariate targets
just now.
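
For instance, a binary classifier wrapping GLM.jl would presumably restrict itself to a Bernoulli family with a suitable link. A rough illustration, calling GLM.jl directly rather than through the hypothetical wrappers above (the data are made up):

using GLM, Distributions

X = hcat(ones(10), rand(10))           # design matrix with an intercept column
y = rand(Bool, 10)                     # binary target, illustrative only
glm(X, y, Bernoulli(), LogitLink())    # the kind of call a GLMProbabilisticClassifier might make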

No need for an R-style "formula": your model already gets a separate input
X and target y, and you fit to all input features (columns) of
X. Feature selection will be external to the model interface. A schematic sketch follows.
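
Schematically (the model name follows the suggestion above; the signature and the use of GLM.lm are assumptions for illustration, not the actual interface specification):

using GLM

# hypothetical model struct
mutable struct GLMProbabilisticRegressor
    fit_intercept::Bool
end

# X arrives as a matrix of inputs, y as the target vector;
# every column of X is used, so no formula is needed
function fit(model::GLMProbabilisticRegressor, X::AbstractMatrix, y::AbstractVector)
    Xmat = model.fit_intercept ? hcat(ones(eltype(X), size(X, 1)), X) : X
    return GLM.lm(Xmat, y)
end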

ablaom closed this as completed Dec 16, 2018
ablaom reopened this Dec 16, 2018
ablaom (Member, Author) commented Dec 16, 2018

Oops. Closed by accident. :-)

tlienart (Collaborator) commented

OK, I'll start on this and will probably open a NO-MERGE PR for guidance while I get familiar with the interface and more comfortable with the goal.

This was referenced Jan 22, 2019
tlienart self-assigned this Jul 12, 2019
xiaodaigh (Contributor) commented Aug 16, 2019

Is there an example of how to use linear models with MLJ.jl? Can anyone please show me a simple example of fitting y = ax + b, where a and b are coefficients? E.g., in GLM it would be

using GLM, DataFrames
x = rand(100)
y = rand(100)
data = DataFrame(x=x, y=y)
lm(@formula(y ~ x), data)

tlienart (Collaborator) commented Aug 17, 2019

Hello @xiaodaigh, there's an ongoing PR to interface with the GLM models, which I would think should be merged next week.

ablaom (Member, Author) commented Aug 18, 2019

For now you can use OLS (ordinary least squares regressor) or RidgeRegressor. For example:

julia> using MLJ
julia> X = (x1=rand(100), x2=rand(100));  # input must be a Tables.jl compatible table
julia> y = rand(100);
julia> @load OLSRegressor          # load code from external packages

julia> model = OLSRegressor()      # instantiate model
OLSRegressor(fit_intercept = true,) @ 470

julia> mach = machine(model, X, y)  # bind model to train/evaluation data
Machine{OLSRegressor} @ 197

julia> fit!(mach, rows=1:95)    # fit on selected rows
[ Info: Training Machine{OLSRegressor} @ 197.
Machine{OLSRegressor} @ 197

julia> predict(mach, rows=96:100)  # get (probabilistic) predictions on some other rows
5-element Array{Distributions.Normal{Float64},1}:
 Distributions.Normal{Float64}(μ=0.5573871503802207, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5910371492542903, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.4871839625605999, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.6031116815100634, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5461718402936951, σ=0.2789162731813959)

julia> predict_mean(mach, rows=1:5)  # get point predictions
5-element Array{Float64,1}:
 0.5573871503802207
 0.5910371492542903
 0.4871839625605999
 0.6031116815100634
 0.5461718402936951

julia> predict_mean(mach, (x1=rand(4), x2=rand(4)))  # get point predictions on new input data
4-element Array{Float64,1}:
 0.5483367654825207
 0.5948051723537034
 0.4847273704563324
 0.5892571004039957
