
Document the Explicit tuning strategy #836

Merged
merged 4 commits into dev from document-explicit on Aug 23, 2021

Conversation

rikhuijzer
Contributor

In response to #822.

@ablaom, in this PR I've tried to add an example for nested cross-validation, as discussed before. You said this should now be possible thanks to JuliaAI/MLJTuning.jl#142. However, I cannot figure out how to do it. I would expect this PR to work, but instead I get:

c.value = ArgumentError: You have specified an explicit iterator `models` of MLJModels and so cannot specify any `tuning` strategy except `Explicit`. Either omit the `tuning=...` specification, or specify a *single* model using `model=...` instead.
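
For context, here is a minimal sketch of the two call patterns the error message points to (the Grid strategy and the max_depth range below are illustrative assumptions, not taken from this thread):

using MLJ
tree = (@load DecisionTreeClassifier pkg=DecisionTree verbosity=0)()
knn = (@load KNNClassifier pkg=NearestNeighborModels verbosity=0)()

# 1. Omit `tuning=...`: passing `models=...` implies the Explicit strategy:
tuned = TunedModel(models=[tree, knn],
                   resampling=CV(nfolds=3),
                   measure=log_loss)

# 2. Or tune a *single* model via `model=...` plus an explicit strategy
#    (illustrative: a Grid search over a `max_depth` range):
r = range(tree, :max_depth, lower=1, upper=5)
tuned = TunedModel(model=tree, tuning=Grid(), range=r,
                   resampling=CV(nfolds=3),
                   measure=log_loss)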

@ablaom
Collaborator

ablaom commented Aug 18, 2021

@rikhuijzer Thanks for returning to this! Have you looked at the example in the tests?

https://github.com/JuliaAI/MLJTuning.jl/blob/dev/test/strategies/explicit.jl

@rikhuijzer
Contributor Author

rikhuijzer commented Aug 19, 2021

> Have you looked at the example in the tests?

Yes, I did. But what I unfortunately don't understand is how to do nested cross-validation with that. Nested cross-validation means, essentially: for each of several holdouts (the outer loop), run a cross-validation over the models (the inner loop) and, once the inner loop has finished, test the best-performing model against the holdout.

As far as I can see, the Explicit strategy only lets me run cross-validation over multiple models, but maybe I'm missing something.
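
Sketched in plain Julia, the procedure described above is roughly the following (a conceptual sketch only; the scoring helpers are hypothetical stand-ins, not MLJ API):

# Conceptual sketch of nested cross-validation, with hypothetical
# stand-ins for real model training and evaluation:
cv_score(model, train) = rand()          # stand-in: inner k-fold CV score
holdout_score(model, holdout) = rand()   # stand-in: score on held-out data

models = ["tree", "knn"]
outer_folds = [(1:50, 51:100), (51:100, 1:50)]    # (train, holdout) pairs

for (train, holdout) in outer_folds                # outer loop
    scores = [cv_score(m, train) for m in models]  # inner loop: CV per model
    best = models[argmin(scores)]                  # pick the inner winner
    println("best = $best, holdout score = ", holdout_score(best, holdout))
end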

@ablaom
Collaborator

ablaom commented Aug 20, 2021

Right. Here's an example:

using MLJ

# models:
tree = (@load DecisionTreeClassifier pkg=DecisionTree)()
knn = (@load KNNClassifier pkg=NearestNeighborModels)()


# data:
X, y = make_blobs();

# model equivalent to best in `models` using 3-fold CV:
blended = TunedModel(models=[tree, knn],
                     resampling=CV(nfolds=3),
                     measure=log_loss)

# evaluating `blended` implies nested cross-validation (each model
# gets evaluated 2 x 3 times):
e = evaluate(blended, X, y,
             resampling=CV(nfolds=2),
             measure=log_loss,
             verbosity=6)

# measurements at any level are accessible, eg:
e.report_per_fold[1].history[1].per_fold

Does that help?

BTW, I found that closed PR. It is here.

@codecov-commenter

codecov-commenter commented Aug 22, 2021

Codecov Report

Merging #836 (7052f76) into dev (5d1a6c6) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##              dev     #836   +/-   ##
=======================================
  Coverage   28.84%   28.84%           
=======================================
  Files           2        2           
  Lines          52       52           
=======================================
  Hits           15       15           
  Misses         37       37           


@rikhuijzer
Contributor Author

rikhuijzer commented Aug 22, 2021

Thanks a lot! After commit 7052f76, the tutorial looks as follows:

Tuning multiple models

We can also compare multiple models via the tuning strategy. This can, for example, be used to run nested cross-validation. In ordinary (k-fold) cross-validation, the data is split into k folds and one or more models are evaluated on them. Nested cross-validation first splits the data into multiple train and test sets (the outer loop) and then runs a cross-validation over the models for each split (the inner loop), which reduces the bias of the final performance estimate. To do this in MLJ, we use a TunedModel:

using MLJ

tree = (@load DecisionTreeClassifier pkg=DecisionTree verbosity=0)()
knn = (@load KNNClassifier pkg=NearestNeighborModels verbosity=0)()

This model is equivalent to the best of the models in `models`, as selected by 3-fold cross-validation:

blended = TunedModel(models=[tree, knn],
                     resampling=CV(nfolds=3),
                     measure=log_loss,
                     check_measure=false)

Evaluating blended implies nested cross-validation (each model gets evaluated 2 x 3 times):

X, y = make_blobs()

e = evaluate(blended, X, y,
             resampling=CV(nfolds=2),
             measure=log_loss,
             verbosity=6)
┌───────────────────────┬───────────────┬──────────────┐
│ _.measure             │ _.measurement │ _.per_fold   │
├───────────────────────┼───────────────┼──────────────┤
│ LogLoss{Float64} @113 │ 1.15          │ [0.106, 2.2] │
└───────────────────────┴───────────────┴──────────────┘
_.per_observation = [[[2.22e-16, 0.223, ..., 2.22e-16], [2.22e-16, 0.223, ..., 2.22e-16]]]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
_.train_test_rows = [ … ]

Now, for example, we can get the best model for the first of the two outer folds:

e.report_per_fold[1].best_model
KNNClassifier(
    K = 5,
    algorithm = :kdtree,
    metric = Distances.Euclidean(0.0),
    leafsize = 10,
    reorder = true,
    weights = NearestNeighborModels.Uniform()) @473

And the losses in the outer loop (these still have to be matched to the best-performing model for each fold):

e.per_fold
1-element Vector{Vector{Float64}}:
 [0.10583412615667337, 2.200087272512401]
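
As a side note, here is a minimal sketch (not part of the original tutorial) that pairs each outer-fold loss with the model the inner search selected for that fold:

# Assumes the `e` returned by `evaluate` above; `e.per_fold[1]` holds the
# per-fold log losses for the single measure used:
for (i, loss) in enumerate(e.per_fold[1])
    best = e.report_per_fold[i].best_model
    println("fold $i: ", typeof(best), " with log loss ", loss)
end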

It is also possible to get the results for the nested evaluations. For example, for the second fold of the outer loop and the first model:

e.report_per_fold[2].history[1]
(model = DecisionTreeClassifier @726,
 measure = LogLoss{Float64}[LogLoss{Float64} @113],
 measurement = [2.20855719296061],
 per_fold = [[2.2204460492503136e-16, 2.1202149052421855, 4.505456673639644]],)

@rikhuijzer
Contributor Author

Feel free to update the text or ask for more changes 👍

@ablaom (Collaborator) left a comment

@rikhuijzer Thanks for this. Much appreciated.

It may take a few days to get this successfully deployed as there is a holdup here.

@ablaom ablaom merged commit d6a0d56 into alan-turing-institute:dev Aug 23, 2021
@rikhuijzer
Contributor Author

> It may take a few days to get this successfully deployed as there is a holdup here.

That one also seems to be blocking me in two places (TuringTutorials.jl and my blog).

> Thanks for this. Much appreciated.

Anyway, thanks to you! ❤️ I'm very happily gonna use this for my next paper.

@rikhuijzer rikhuijzer deleted the document-explicit branch September 23, 2021 08:50