Extract y and yhat for each test fold from results of evaluate! #575

Open
CameronBieganek opened this issue Jun 18, 2020 · 3 comments

@CameronBieganek

Sometimes one wants to look at the actual and predicted y values for each test fold in a cross-validation. For example, one might want to make a plot of the residuals versus the predicted values. As far as I can tell, there's not an easy way to do that right now.

This is mentioned in #89, but I thought it would be good to have a more specific issue.

The scikit-learn equivalent of this feature request is cross_val_predict().
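For concreteness, the kind of workflow I have in mind would look something like this (just a sketch: predictions_per_fold is a made-up name for the missing functionality, and the plot uses Plots.jl):

using MLJ, Plots

# hypothetical: returns (ŷ_per_fold, y_per_fold) for the test folds
ŷ_folds, y_folds = predictions_per_fold(model, X, y, CV(nfolds=5))

# residuals vs. predicted values, pooled over the test folds
scatter(vcat(ŷ_folds...), vcat(y_folds...) .- vcat(ŷ_folds...),
        xlabel="predicted", ylabel="residual", legend=false)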

@ablaom
Member

ablaom commented Jun 25, 2020

Well, you can give evaluate/evaluate! a custom measure (see https://alan-turing-institute.github.io/MLJ.jl/dev/performance_measures/#Traits-and-custom-measures-1), and a "measure" can be just about any function of the data (this issue notwithstanding: JuliaAI/MLJBase.jl#352). So, how about this?

using MLJ

X = (x = rand(12),)   # for illustration: any table with 12 rows works here
y = float.(1:12)

model = ConstantRegressor()

predicted_target(yhat, y) = yhat
target(yhat, y) = y
MLJ.reports_each_observation(::typeof(predicted_target)) = true
MLJ.reports_each_observation(::typeof(target)) = true

e = evaluate(model, X, y,
             measures=[predicted_target, target],
             resampling=CV(nfolds=3),
             operation=predict_mean)

julia> e.per_observation
2-element Array{Array{Array{Float64,1},1},1}:
 [[8.5, 8.5, 8.5, 8.5], [6.5, 6.5, 6.5, 6.5], [4.5, 4.5, 4.5, 4.5]]   
 [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
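The per-fold vectors can then be read straight off e.per_observation, in the same order as the measures above:

ŷ_folds = e.per_observation[1]    # one vector of predictions per test fold
y_folds = e.per_observation[2]    # one vector of targets per test fold
residuals_per_fold = [y .- ŷ for (ŷ, y) in zip(ŷ_folds, y_folds)]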

@CameronBieganek
Author

Thanks! That's a pretty good solution. However, I think it would be worth adding something to the MLJ API that does this out of the box. Something like the following could be quite useful.

struct CVPredictions{T}
    ŷ::Vector{<:AbstractVector{T}}
    y::Vector{<:AbstractVector{T}}
end

cv_predict(model, X, y)    # returns a CVPredictions object
cv_predict!(mach)          # returns a CVPredictions object

function evaluate(cvp; measure)
    # evaluate measures on each fold
    # return model evaluation
end

function evaluate(model, X, y; measure)
    cvp = cv_predict(model, X, y)
    evaluate(cvp; measure)
end

function evaluate!(mach; measure)
    cvp = cv_predict!(mach)
    evaluate(cvp; measure)
end

export cv_predict, cv_predict!, evaluate, evaluate!

Motivating example

Here's one example where it would be nice to have the separate cv_predict function. Suppose I have a classification task. Suppose that running evaluate is expensive because I have a lot of data and/or I'm doing a grid search. I would like to be able to do my cross-validation just once and save the predictions to disk. That way if I want to evaluate other metrics that I didn't evaluate the first time, I can simply run evaluate(cvp; measure=newmeasures), which should be fast.
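To make that concrete, the workflow I have in mind is something like the following (a sketch only, using the hypothetical cv_predict/CVPredictions from above; JLD2 is just one way to persist the object):

using JLD2

cvp = cv_predict(model, X, y)       # the expensive resampling step, run once
@save "cv_predictions.jld2" cvp     # persist the per-fold predictions and targets

# later, possibly in a fresh session:
@load "cv_predictions.jld2" cvp
evaluate(cvp; measure=[accuracy, auc])   # cheap: no models are refit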

To extend the example, suppose I define a measure like this:

cost = let
    cost_matrix = [0   10;
                   100  0]
    
    function cost(ŷ, y)
        confusion = confusion_matrix(ŷ, y)
        sum(confusion .* cost_matrix)
    end
end

Then if I later decide that I want to change the cost matrix, it would be nice if I could just run evaluate(cvp; measure=cost) instead of having to re-run the cross-validation.
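For instance, swapping in a revised cost matrix would just be (again assuming the hypothetical cvp object from above, and the same confusion-matrix pattern as before):

new_cost_matrix = [0    5;
                   200  0]

new_cost(ŷ, y) = sum(confusion_matrix(ŷ, y) .* new_cost_matrix)

evaluate(cvp; measure=new_cost)   # fast: no re-fitting or re-predicting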

@ablaom
Member

ablaom commented Aug 5, 2020

So basically you just want to insert a new interface point. Sounds like a good idea. A few comments:

  • some measures require weights w to be evaluated, and some (usually custom) measures also require X. If X is large, this could be a problem. Do you have a suggestion for handling this case?

  • we'd probably want to have acceleration options for cv_predict and evaluate

  • not too crazy about the name, as resampling includes things like Holdout, which is not CV. Maybe resample_predict and ResampledPredictions?
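On the first point, one purely illustrative possibility is for the predictions object to also carry per-fold weights and the test-row indices, so that measures needing w (or slices of X, if X is still at hand) could be evaluated later:

struct ResampledPredictions{T}
    ŷ::Vector{<:AbstractVector{T}}
    y::Vector{<:AbstractVector{T}}
    w::Union{Nothing,Vector{<:AbstractVector{<:Real}}}   # per-fold weights, if supplied
    test_rows::Vector{Vector{Int}}                       # row indices of each test fold
end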
