
Introduce cross-validation earlier in module 1 and related changes #415

Closed
2 of 4 tasks
lesteve opened this issue Jul 23, 2021 · 4 comments
lesteve commented Jul 23, 2021

1. Add a new notebook between notebooks 1 and 2 to introduce model evaluation. It should discuss `train_test_split` and introduce `cross_validate`. This would be a notebook called something like "Evaluate your first model".
2. Move some content from https://inria.github.io/scikit-learn-mooc/python_scripts/02_numerical_pipeline_scaling.html#model-evaluation-using-cross-validation, although some discussions can be shortened since we no longer use a `Pipeline`.
3. If necessary, add a small section about using a `Pipeline` inside `cross_validate` (I guess when this first happens later, in the "Preprocessing for numerical features" notebook).
4. "Exercise M1.03": check whether we should add an additional exercise that uses cross-validation. Consensus: let's not add an exercise that may be too simple and distract from the main ideas.
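The flow proposed in points 1–3 could be sketched roughly as follows. This is only an illustration: the dataset and model here are placeholders (the MOOC notebooks use the adult census data), not the actual notebook content.

```python
# Sketch of the proposed "Evaluate your first model" flow:
# a single train-test split first, then cross_validate.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_validate

X, y = make_classification(n_samples=1_000, random_state=0)

# Single train-test split: one score, sensitive to how the data was split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)
print(f"Single-split test accuracy: {model.score(X_test, y_test):.3f}")

# Cross-validation: several scores, one per internal train-test split.
cv_results = cross_validate(model, X, y, cv=5)
scores = cv_results["test_score"]
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

For point 3, the same `cross_validate` call would simply receive a `Pipeline` as its first argument instead of a bare estimator.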
ArturoAmorQ commented Aug 6, 2021

I would add:

5. Add discussion/plots on score distributions.
6. Add a figure similar to this one illustrating cross-validation. Good enough for now.
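Point 5 could look roughly like this sketch, which plots the distribution of test scores over many shuffled splits. The dataset, model, and number of splits are placeholders, not a decided design.

```python
# Sketch for point 5: distribution of cross-validation scores
# across many shuffled train-test splits.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

X, y = make_classification(n_samples=1_000, random_state=0)
cv = ShuffleSplit(n_splits=40, test_size=0.25, random_state=0)
scores = cross_validate(LogisticRegression(), X, y, cv=cv)["test_score"]

plt.hist(scores, bins=10, edgecolor="black")
plt.xlabel("Test accuracy")
plt.ylabel("Number of splits")
plt.title("Distribution of cross-validation scores")
plt.savefig("cv_score_distribution.png")
```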


lesteve commented Aug 6, 2021

> Add discussion/plots on score distributions

If it is concise enough, why not. I would have thought that module 2 was the better place for a longer discussion (see #416). I guess we can quickly show here that the score depends on the train-test split, which motivates introducing cross-validation.
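That motivating point can be shown in a few lines: the same model gets different test scores depending on how the data happened to be split. Again a sketch on placeholder data, not notebook content.

```python
# The test score of a single train_test_split depends on which
# split you happened to get: vary random_state and compare.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
scores = []
for seed in range(3):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=seed
    )
    score = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)
    scores.append(score)
    print(f"random_state={seed}: test accuracy = {score:.3f}")
```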

> Add this figure illustrating cross-validation

The figure you mention includes hyperparameter tuning, so we probably don't want to use that exact figure. We already have a cross-validation figure: https://inria.github.io/scikit-learn-mooc/python_scripts/02_numerical_pipeline_scaling.html#model-evaluation-using-cross-validation. If you think it can be improved (which is quite likely), you can look at the script generating the figure: https://github.com/INRIA/scikit-learn-mooc/blob/master/figures/plot_cross_validation_diagram.py

ArturoAmorQ commented

My idea/proposal for points 5 and 6. Notice that the distribution plot matches the example.

[figures: cross-validation diagram and matching score-distribution plot]


lesteve commented Nov 23, 2022

I think most of this has been tackled.

lesteve closed this as completed Nov 23, 2022