
Introduce cross-validation earlier in module 1 and related changes #415

Closed
2 of 4 tasks
lesteve opened this issue Jul 23, 2021 · 4 comments
lesteve commented Jul 23, 2021

1. Add a new notebook between notebooks 1 and 2 to introduce model evaluation. It should discuss `train_test_split` and introduce `cross_validate`. This would be a notebook called something like "Evaluate your first model".
2. Move some content from https://inria.github.io/scikit-learn-mooc/python_scripts/02_numerical_pipeline_scaling.html#model-evaluation-using-cross-validation, although some discussions can be shortened since we no longer use a `Pipeline`.
3. If necessary, add a small section about using a `Pipeline` inside `cross_validate` (I guess when this first happens later, in the "Preprocessing for numerical features" notebook).
4. "Exercise M1.03": check whether we should add an additional exercise that uses cross-validation. Consensus: let's not add an exercise that may be too simple and distract from the main ideas.
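The flow proposed in points 1–3 could be sketched roughly as follows. This is only an illustration: the dataset and model here are placeholders (the MOOC notebooks use the adult census data), not the actual notebook content.

```python
# Sketch of the proposed "Evaluate your first model" flow:
# a single train-test split first, then cross_validate.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_validate

X, y = make_classification(n_samples=1_000, random_state=0)

# Single train-test split: one score, sensitive to how the data was split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)
print(f"Single-split test accuracy: {model.score(X_test, y_test):.3f}")

# Cross-validation: several scores, one per internal train-test split.
cv_results = cross_validate(model, X, y, cv=5)
scores = cv_results["test_score"]
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

For point 3, the same `cross_validate` call would simply receive a `Pipeline` as its first argument instead of a bare estimator.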
ArturoAmorQ commented Aug 6, 2021

I would add:

5. Add discussion/plots on score distributions.
6. Add a figure similar to this one illustrating cross-validation. Good enough for now.
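Point 5 could look roughly like this sketch, which plots the distribution of test scores over many shuffled splits. The dataset, model, and number of splits are placeholders, not a decided design.

```python
# Sketch for point 5: distribution of cross-validation scores
# across many shuffled train-test splits.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

X, y = make_classification(n_samples=1_000, random_state=0)
cv = ShuffleSplit(n_splits=40, test_size=0.25, random_state=0)
scores = cross_validate(LogisticRegression(), X, y, cv=cv)["test_score"]

plt.hist(scores, bins=10, edgecolor="black")
plt.xlabel("Test accuracy")
plt.ylabel("Number of splits")
plt.title("Distribution of cross-validation scores")
plt.savefig("cv_score_distribution.png")
```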


lesteve commented Aug 6, 2021

> Add discussion/plots on score distributions

If it is concise enough, why not. I would have thought that module 2 was the better place for a longer discussion (see #416). I guess we can quickly show here that the score depends on the train-test split, which motivates introducing cross-validation.
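That motivating point can be shown in a few lines: the same model gets different test scores depending on how the data happened to be split. Again a sketch on placeholder data, not notebook content.

```python
# The test score of a single train_test_split depends on which
# split you happened to get: vary random_state and compare.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
scores = []
for seed in range(3):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=seed
    )
    score = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)
    scores.append(score)
    print(f"random_state={seed}: test accuracy = {score:.3f}")
```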

> Add this figure illustrating cross-validation

The figure you mention includes hyperparameter tuning, so we probably don't want to use that exact figure. We already have a cross-validation figure: https://inria.github.io/scikit-learn-mooc/python_scripts/02_numerical_pipeline_scaling.html#model-evaluation-using-cross-validation. If you think it can be improved (which is quite likely), you can look at the script generating the figure: https://github.com/INRIA/scikit-learn-mooc/blob/master/figures/plot_cross_validation_diagram.py

ArturoAmorQ commented

My idea/proposal for points 5 and 6. Notice that the distribution plot matches the example.

[figures: cross-validation diagram and matching score-distribution plot]


lesteve commented Nov 23, 2022

I think most of this has been tackled.

lesteve closed this as completed Nov 23, 2022