Add wrap-up table for ensemble methods #448

Closed
ArturoAmorQ opened this issue Aug 31, 2021 · 5 comments · Fixed by #706
Labels
good first issue
Milestone

Comments

@ArturoAmorQ
Collaborator

Adding a wrap-up table summarizing the differences and similarities between bagging and boosting methods (how the individual learners are trained, how their predictions are combined, and the resulting computation times) may help consolidate the ideas and improve the success rate of Quiz M6.3 Q1 (which is currently below 70%).

It could go at the end of the "Ensemble based on boosting" lectures, i.e., inside the `ensemble_hist_gradient_boosting` notebook.

What do you think?
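For reference, here is a minimal sketch of the contrast such a table would condense, assuming the scikit-learn estimators already used in the MOOC (the synthetic dataset below is only illustrative, not taken from the course material):

```python
# Illustrative sketch only (not MOOC material): contrast how a bagging and a
# boosting ensemble are trained and combined on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1_000, random_state=0)

# Bagging: trees are fit independently on bootstrap samples, so fitting can be
# parallelized (n_jobs); predictions are combined by averaging/majority vote.
bagging = BaggingClassifier(n_estimators=100, n_jobs=-1, random_state=0)

# Boosting: trees are fit sequentially, each one correcting the errors of the
# current ensemble; predictions are combined as an additive (weighted) sum.
boosting = HistGradientBoostingClassifier(max_iter=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The table could then condense exactly these differences into rows such as "how the learners are trained", "how predictions are combined" and "typical fit time".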

ogrisel added a commit to scikit-learn-inria-fondation/follow-up that referenced this issue Sep 7, 2021
## August 31st, 2021

### Gael

* TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig

### Olivier

- input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links to the sub-PRs
  - remaining (needs review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`)
- reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444)
- next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release)
- next: assist Adrin with the release process
- next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432)
- next: get back to Intel to write a technical roadmap for a possible collaboration

### Julien

 - Was on holiday
 - Planned week @ Nexedi, Lille, from September 13th to 17th
 - Reviewed PRs
     - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module
     - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE)
     - Other PRs prior to holidays
 - [`#20254`](scikit-learn/scikit-learn#20254)
     - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex
     - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs
     - Next: comparing against scipy's implementation
     - Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help
- Next: I need to block time to study Cython code.

### Mathis
- `sklearn_benchmarks`
  - Adapting benchmark script to run on Margaret
  - Fixed an issue with profiling files being too big to deploy on GitHub Pages
  - Ensure deterministic benchmark results
  - Working on declarative pipeline specification
  - Next: run long HPO benchmarks on Margaret

### Arturo

- Finished MOOC!
- Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432))
    - started addressing easy-to-fix questions, resulting in gitlab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22)
    - currently working on expanding the notes up to 70%
- Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448)

### Jérémie
- back from holidays, catching up
- Mathis' benchmarks
- trying to find what's going on with ASV benchmarks
  (asv should display the versions of all build and runtime dependencies for each run)

### Guillaume

- back from holidays
- Next:
    - release with Adrin
    - check the PR and issue trackers

### TODO / Next

- Expand Loïc’s notes up to 70% (Arturo)
- Create a presentation to discuss his experience doing the MOOC (Arturo)
- Help with the scikit-learn release (Olivier, Guillaume)
- HR: Jeremy's renewal, Chiara's replacement (Gael)
- Mathis's consulting gig (Olivier, Gael, Mathis)
@lesteve
Collaborator

lesteve commented Jan 6, 2022

There is this table comparing RandomForest vs. Bagging (https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_random_forest.html), but we don't have a table comparing bagging and boosting.

@lesteve lesteve added this to the MOOC 3.0 milestone Jan 6, 2022
@ogrisel
Collaborator

ogrisel commented Feb 16, 2022

I think it's a good idea.

@ArturoAmorQ ArturoAmorQ added the good first issue label Oct 10, 2022
@lesteve lesteve modified the milestones: MOOC 3.0, MOOC 4.0 Oct 18, 2022
@ogrisel
Collaborator

ogrisel commented Aug 30, 2023

We could have a similar table at the end of this notebook:

https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_hyperparameters.html

or in a new notebook (without any code) right after this one.

@ArturoAmorQ
Collaborator Author

ArturoAmorQ commented Aug 30, 2023

We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in #471.

Should we still add it in a notebook?

@ogrisel
Collaborator

ogrisel commented Aug 31, 2023

> We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in #471.
> Should we still add it in a notebook?

Only if we expand it a bit, for instance by including extra info about the influence of important hyper-parameters (see the sketch below), e.g.:

- Too many trees can cause overfitting in gradient boosting but not in random forests.
- Gradient boosting requires tuning a learning rate parameter, while random forests do not have such a parameter.
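
A minimal sketch of the kind of experiment that could back both points, assuming synthetic data and scikit-learn's `HistGradientBoostingRegressor` (the parameter ranges below are purely illustrative):

```python
# Illustrative sketch only: the number of trees plays a very different role in
# random forests and in gradient boosting.
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import validation_curve

X, y = make_regression(n_samples=500, n_features=10, noise=20, random_state=0)

# Random forest: more trees only means averaging more randomized predictors,
# so the validation score tends to plateau rather than degrade.
_, forest_test = validation_curve(
    RandomForestRegressor(random_state=0), X, y,
    param_name="n_estimators", param_range=[10, 100, 500], cv=5,
)

# Gradient boosting: each iteration keeps fitting the residuals, so with a
# fixed (here deliberately large) learning_rate and no early stopping, too
# many iterations can end up overfitting; learning_rate and max_iter are
# usually tuned jointly.
_, boosting_test = validation_curve(
    HistGradientBoostingRegressor(learning_rate=0.5, early_stopping=False,
                                  random_state=0), X, y,
    param_name="max_iter", param_range=[10, 100, 500], cv=5,
)

print("random forest    :", forest_test.mean(axis=1).round(3))
print("gradient boosting:", boosting_test.mean(axis=1).round(3))
```

With this setup the forest scores typically level off as `n_estimators` grows, whereas the boosting scores can start to drop once `max_iter` gets large for the chosen `learning_rate`.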
