Storing intermediate results of a Composite Model #841

Open
olivierlabayle opened this issue Sep 16, 2021 · 16 comments

Labels: design discussion (Discussing design issues)
@olivierlabayle

Hi!

Is your feature request related to a problem? Please describe.
I am trying to use the learning network API and would like to store additional results in the fitresult of my composite model. Could you provide some guidance on how to do this properly?

Describe the solution you'd like
Ideally I'd like to be able to store the value of any node that was computed at training time.

Describe alternatives you've considered
It seems that only the submodels' fitresults are natively stored, so one way to do it, I guess, would be to define some kind of ResultModel as a submodel for whatever value I would like to keep, and to compute the result in the fit! function of that model.
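
For concreteness, a minimal sketch of what I mean (ResultModel and its field f are hypothetical names, not existing MLJ API; the point is just that an ordinary submodel can capture a value at training time):

import MLJModelInterface
const MMI = MLJModelInterface

# Hypothetical wrapper: evaluates a function of its training input at fit
# time and stores the result, so it survives in the composite's fitresult.
mutable struct ResultModel <: MMI.Unsupervised
    f   # the function whose value on the training data we want to keep
end

function MMI.fit(model::ResultModel, verbosity, X)
    value = model.f(X)        # computed once, at training time
    fitresult = value         # retrievable later via fitted_params
    cache = nothing
    report = (value=value,)   # also exposed through the machine's report
    return fitresult, cache, report
end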

Additional context

I must add that the learning network I am trying to build is not regular, in that it will never be used for prediction; however, I feel that what I'm trying to do may be of general use in MLJ.

For instance, the following works fine, except that I can't retrieve the value of the final node because of the anonymization in return!. Moreover, I don't think this approach is appropriate anyway, as I guess all computations (except fitting) would be performed again each time I call the node, right?

using MLJ
using Statistics   # for `mean`

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0


mutable struct MyModel <: MLJ.DeterministicComposite
    model
end


function MLJ.fit(m::MyModel, verbosity, X, y)
    Xs = source(X)
    ys = source(y)
    mach = machine(m.model, Xs, ys)
    ypred = MLJ.predict(mach, Xs)
    # summary statistics of the training predictions
    μpred = node(mean, ypred)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred, μpred)
    mach = machine(Deterministic(), Xs, ys; predict=σpred)
    fitresult, cache, report = return!(mach, m, verbosity)
    # hack: graft the σpred node onto the fitresult so it can be called later
    mach.fitresult = (σpred=σpred, fitresult...)
    return mach.fitresult, cache, report
end

X, y = make_regression(500, 5)
mach = machine(MyModel(LinearRegressor()), X, y)
fit!(mach)
fitted_params(mach)
mach.fitresult.σpred()
@ablaom
Member

ablaom commented Sep 17, 2021

@olivierlabayle Thanks for raising this interesting question about creating new interface points for composite models.

Of course, if you are not interested in the ordinary predict output, you could just define the predict node to be σpred, with return!(machine(Deterministic(), Xs, ys; predict=σpred), model, verbosity). But I don't think that is what you are getting at, right? You are looking to add ways of accessing information beyond what can be extracted from predict and transform (you can already define both, incidentally).

But I am interested in clarifying exactly what you want here. I see two possible objectives. Do you want the output of σpred on the training data to be recorded somehow in the report or fitted params, or are you effectively seeking to add a new operation that can be called on new data, like a predict operation? That is, are we trying to record extra data as a by-product of training, or do we want to add extra functions that dispatch on both new data and the outcomes of training?

@olivierlabayle
Author

olivierlabayle commented Sep 17, 2021

@ablaom Thanks for getting back to me so quickly.

I am working in causality, which means that my scenario differs from the traditional MLJ framework in the following ways:

  • I don't have data as (X, y) but rather (X, W, y).
  • I don't really have a predict time. In MLJ the learning algorithm outputs a prediction function, while I am only interested in outputting a real number (or vector).

The reason I am so interested in the learning network API is that I think it provides a nice caching and scheduling mechanism. For instance, again in my use case, I might want to change one hyperparameter of model3 (see below), and the whole procedure will then not refit model1 and model2, because their upstream has not changed; a sketch of this follows the code below.

To cut it short, I think using the predict node (or, more reasonably, defining a new operation node) might work for me (as in the following), but I don't want the computations to happen twice. Moreover, this currently doesn't work, because predict expects the data to be (X, y). The other solution would be to record some state information at fit time, as you mention; that seems both more appropriate for my use case and still useful for general MLJ users (for instance, I initially wanted to report the scores of the learners in the Stack). For general MLJ users it would be in addition to the predict function, and for me it would be all I require.

Hope this helps!

using MLJ
using Statistics   # for `mean`

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0


mutable struct MyModel <: MLJ.DeterministicComposite
    model1
    model2
    model3
end


function MLJ.fit(m::MyModel, verbosity, X, W, y)
    Xs = source(X)
    Ws = source(W)
    ys = source(y)

    mach1 = machine(m.model1, Xs, ys)
    mach2 = machine(m.model2, Ws, ys)

    ypred1 = MLJ.predict(mach1, Xs)
    ypred2 = MLJ.predict(mach2, Ws)

    Y = hcat(ypred1, ypred2)

    mach3 = machine(m.model3, Y, ys)

    ypred3 = MLJ.predict(mach3, Y)

    # summary statistics of the predictions on the training data
    μpred = node(mean, ypred3)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred3, μpred)

    # the final estimate: a (mean, variance) pair
    estimate = node((μ, σ2) -> (μ, σ2), μpred, σpred)

    mach = machine(Deterministic(), Xs, ys; predict=estimate)

    return!(mach, m, verbosity)

end

X, y = make_regression(500, 5)
model = MyModel(LinearRegressor(), LinearRegressor(), LinearRegressor())
mach = machine(model, X, X, y)
fit!(mach)
estimate = MLJ.predict(mach)
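
And here is the caching behaviour I mean, assuming fit_intercept is a hyperparameter of model3 (it is, for MLJLinearModels' LinearRegressor); the retraining behaviour described in the comments is my understanding of how exported learning networks work:

model.model3.fit_intercept = false  # change a hyperparameter of model3 only
fit!(mach)                          # only the machine for model3 is retrained;
                                    # model1 and model2 are untouched, since
                                    # their models and upstream data are unchanged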

@ablaom
Member

ablaom commented Sep 23, 2021

@olivierlabayle I've played around with this a bit today and will ask for your feedback on one experiment in the next day or so.

@ablaom ablaom added the design discussion Discussing design issues label Sep 23, 2021
@olivierlabayle
Author

@ablaom That's great, very happy to hear that, thanks a lot!

@ablaom
Member

ablaom commented Sep 24, 2021

@olivierlabayle Please have a look at JuliaAI/MLJBase.jl#644 which addresses the original suggestion and give me your feedback.

I think in the immediate term causal inference with targeted learning is out-of-scope. My focus for the next few months will be moving towards version 1.0.

Perhaps you can hack around the other obstacles for now, e.g. by exporting a predict node that you have no intention of using.

You might also want to conceptualise your model as a transformer with a single tuple (X, W, y) as input, which you split up.
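
A rough sketch of that transformer idea (hedged: TupleSplitter is a made-up name, the placeholder output node stands in for your real network, and I'm assuming the Unsupervised surrogate and transform behave as for the other surrogates):

using MLJ

# Hypothetical composite taking the single tuple (X, W, y) as input:
mutable struct TupleSplitter <: MLJ.UnsupervisedComposite end

function MLJ.fit(m::TupleSplitter, verbosity, data)
    ds = source(data)             # data is the tuple (X, W, y)
    Xs = node(d -> d[1], ds)      # split the tuple with ordinary nodes
    Ws = node(d -> d[2], ds)
    ys = node(d -> d[3], ds)
    # ... build the rest of the network on Xs, Ws and ys as before ...
    out = node((x, w, y) -> (x, w, y), Xs, Ws, ys)  # placeholder output
    mach = machine(Unsupervised(), ds; transform=out)
    return!(mach, m, verbosity)
end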

@olivierlabayle
Author

Yes, I understand, and I wasn't planning on having a dedicated MLJ structure for this. As you say, I will be hacking a bit; for now it's a model with an unused predict node, but I like the transformer idea. I think with this pull request I should be good to go and can benefit from the learning network machinery.

@ablaom ablaom self-assigned this Sep 28, 2021
@davnn
Collaborator

davnn commented Oct 5, 2021

Wouldn't it be more intuitive/self-explanatory to add a report kwarg to the surrogate machine call that takes a named tuple as input? It would also allow for fitted_params, if that becomes necessary at some point in the future.

mach = machine(Deterministic(), Xs, ys; predict=ypred3, μpred=μpred, σpred=σpred)

would become

mach = machine(Deterministic(), Xs, ys; predict=ypred3, report=(μpred=μpred, σpred=σpred))

@ablaom I also stumbled over this issue while implementing composite detectors, which should store training scores in the report for the composite model.

@ablaom
Member

ablaom commented Oct 5, 2021

@davnn Thanks for chiming in here.

I also thought of this, but it seemed a bit more complicated. But yes, as you say, it may be "more intuitive/self-explanatory". I should be happy to make that change.

which should store training scores in the report for the composite model.

Ah, yes, I can imagine that could be so. Does this mean we need to expedite this somewhat? Currently this is low on my priorities as I am swamped with other stuff.

@davnn
Collaborator

davnn commented Oct 6, 2021

Ah, yes, I can imagine that could be so. Does this mean we need to expedite this somewhat? Currently this is low on my priorities as I am swamped with other stuff.

Nope, consider it low priority as well; I'm just using a custom return! for now.

@olivierlabayle
Author

@ablaom Thank you for managing to implement this feature!

@davnn
Collaborator

davnn commented Aug 18, 2022

I'm having a difficult time converting my custom return! to the new MLJ API (added in JuliaAI/MLJBase.jl#644). Previously, I could just use

function return_with_scores!(network_mach, model, verbosity, scores_train, X)
    fitresult, cache, report = MLJ.return!(network_mach, model, verbosity)
    # append the training scores to the composite model's report
    report = merge(report, (scores=scores_train(X),))
    return fitresult, cache, report
end

instead of return!, to add a scores field to the report named tuple. Using the same function with the new MLJ API results in report = (..., additions = (scores = [1, 2, 3], ...)), which means there is no longer a unified API (between composite and individual models) for accessing the training scores. I would now have to check everywhere whether the model is a composite and, if so, use report.additions.scores. Or is there a better solution?
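
One possible workaround is a small helper that hides the asymmetry (a sketch only; training_scores is a hypothetical name, not MLJ API, and it assumes the scores sit either at the top level of the report or under additions):

function training_scores(mach)
    rep = report(mach)
    # individual models store scores at the top level; composites nest
    # fit-time extras under `additions`
    return haskey(rep, :scores) ? rep.scores : rep.additions.scores
end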

@ablaom
Member

ablaom commented Aug 19, 2022

@davnn Good point. I suggest we add a raw_training_scores accessor function as suggested in the tracking issue cross-referenced above.

What do you think?

@davnn
Collaborator

davnn commented Aug 21, 2022

Thank you for your detailed thoughts on how we could go forward. I need some more time to think about it. I'm a bit afraid of feature creep in MLJ, but maybe that's not a big problem.

@ablaom
Member

ablaom commented Aug 23, 2022

Alternatively, we could introduce more generic accessor functions, training_predictions(model, fitresult, report) and training_transformations(model, fitresult, report), which, when implemented, are syntactically equivalent to predict(model, fitresult, Xtrain) and transform(model, fitresult, Xtrain) but more efficient, because they just extract data pre-computed at fit time (and available in fitresult or report). Mmm, might that be a bit abstract for users?

In your use case, you would overload training_transformations to return the raw training scores for all detectors: for regular detectors, this is report.scores (or whatever - I forget what you call them), and for composite models it's report.additions.scores.
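
Sketched out, that might look like this (all type names here are hypothetical stand-ins, and the accessor itself is only a proposal at this stage, not existing MLJ API):

struct MyDetector end            # stand-in for a regular detector type
struct MyCompositeDetector end   # stand-in for a composite detector type

# scores recorded directly in the report at fit time:
training_transformations(model::MyDetector, fitresult, report) =
    report.scores

# composite models nest fit-time extras under `additions`:
training_transformations(model::MyCompositeDetector, fitresult, report) =
    report.additions.scores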

@davnn
Collaborator

davnn commented Aug 23, 2022

I would prefer to keep the API simple, with a report that can flexibly accommodate predictions, transformations, or whatever else the algorithm produces. Strangely enough, predict(model, fitresult, Xtrain) would NOT reproduce the training scores observed during fit for neighbor-based methods, because predict would compare the points in Xtrain to Xtrain, whereas fit ignores the first (trivial) neighbor.

It might make sense to follow the uniform access principle for things like a model's report, i.e. to discourage or even disallow direct access to model internals such as model.report, and to encourage report(model), which could easily be customized on a per-model basis to return any custom format.

@ablaom
Member

ablaom commented Aug 29, 2022

Thanks for these points. I have some ideas about how to do this properly (and also how to greatly simplify the learning networks "export" process) but it's going to take a little time. I will keep you posted, and I appreciate your patience.
