Add wrap-up table for ensemble methods #448

Closed
ArturoAmorQ opened this issue Aug 31, 2021 · 5 comments · Fixed by #706
Labels
good first issue
Milestone

Comments

@ArturoAmorQ
Collaborator

Adding a wrap-up table summarizing the differences and similarities between bagging and boosting methods (how the individual learners are trained, how their predictions are combined, and the resulting computation times) may help consolidate the ideas and improve the success rate of Quiz M6.3 Q1 (which is currently below 70%).

It could go at the end of the "Ensemble based on boosting" lectures, i.e., inside the `ensemble_hist_gradient_boosting` notebook.

What do you think?
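For reference, here is a minimal sketch of the contrast such a table would condense, assuming the scikit-learn estimators already used in the MOOC (the synthetic dataset below is only illustrative, not taken from the course material):

```python
# Illustrative sketch only (not MOOC material): contrast how a bagging and a
# boosting ensemble are trained and combined on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1_000, random_state=0)

# Bagging: trees are fit independently on bootstrap samples, so fitting can be
# parallelized (n_jobs); predictions are combined by averaging/majority vote.
bagging = BaggingClassifier(n_estimators=100, n_jobs=-1, random_state=0)

# Boosting: trees are fit sequentially, each one correcting the errors of the
# current ensemble; predictions are combined as an additive (weighted) sum.
boosting = HistGradientBoostingClassifier(max_iter=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The table could then condense exactly these differences into rows such as "how the learners are trained", "how predictions are combined" and "typical fit time".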

ogrisel added a commit to scikit-learn-inria-fondation/follow-up that referenced this issue Sep 7, 2021
## August 31st, 2021

### Gael

* TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig

### Olivier

- input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links to the sub-PRs
  - remaining (needs review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`)
- reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444)
- next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release)
- next: assist Adrin with the release process
- next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432)
- next: get back to Intel to write a technical roadmap for a possible collaboration

### Julien

 - Was on holiday
 - Planned week @ Nexedi, Lille, from September 13th to 17th
 - Reviewed PRs
     - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module
     - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE)
     - Other PRs prior to holidays
 - [`#20254`](scikit-learn/scikit-learn#20254)
     - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex
     - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs
     - Next: comparing against scipy's implementation
     - Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help
- Next: I need to block time to study Cython code.

### Mathis
- `sklearn_benchmarks`
  - Adapting benchmark script to run on Margaret
  - Fixed an issue with profiling files being too big to deploy on GitHub Pages
  - Ensure deterministic benchmark results
  - Working on declarative pipeline specification
  - Next: run long HPO benchmarks on Margaret

### Arturo

- Finished MOOC!
- Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432))
    - started addressing easy-to-fix questions, resulting in gitlab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22)
    - currently working on expanding the notes up to 70%
- Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448)

### Jérémie
- back from holidays, catching up
- Mathis' benchmarks
- trying to find what's going on with ASV benchmarks
  (asv should display the versions of all build and runtime dependencies for each run)

### Guillaume

- back from holidays
- Next:
    - release with Adrin
    - check the PR and issue trackers

### TODO / Next

- Expand Loïc’s notes up to 70% (Arturo)
- Create a presentation to discuss his experience doing the MOOC (Arturo)
- Help with the scikit-learn release (Olivier, Guillaume)
- HR: Jeremy's renewal, Chiara's replacement (Gael)
- Mathis's consulting gig (Olivier, Gael, Mathis)
@lesteve
Collaborator

lesteve commented Jan 6, 2022

There is this table comparing RandomForest vs. Bagging (https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_random_forest.html), but we don't have a table comparing bagging and boosting.

@lesteve lesteve added this to the MOOC 3.0 milestone Jan 6, 2022
@ogrisel
Collaborator

ogrisel commented Feb 16, 2022

I think it's a good idea.

@ArturoAmorQ ArturoAmorQ added the good first issue label Oct 10, 2022
@lesteve lesteve modified the milestones: MOOC 3.0, MOOC 4.0 Oct 18, 2022
@ogrisel
Collaborator

ogrisel commented Aug 30, 2023

We could have a similar table at the end of this notebook:

https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_hyperparameters.html

or in a new notebook (without any code) right after this one.

@ArturoAmorQ
Collaborator Author

ArturoAmorQ commented Aug 30, 2023

We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in #471.

Should we still add it in a notebook?

@ogrisel
Collaborator

ogrisel commented Aug 31, 2023

> We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in #471.
> Should we still add it in a notebook?

Only if we expand it a bit, for instance by including extra info about the influence of important hyper-parameters (see the sketch below), e.g.:

- Too many trees can cause overfitting in gradient boosting but not in random forests.
- Gradient boosting requires tuning a learning rate parameter, while random forests do not have such a parameter.
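
A minimal sketch of the kind of experiment that could back both points, assuming synthetic data and scikit-learn's `HistGradientBoostingRegressor` (the parameter ranges below are purely illustrative):

```python
# Illustrative sketch only: the number of trees plays a very different role in
# random forests and in gradient boosting.
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import validation_curve

X, y = make_regression(n_samples=500, n_features=10, noise=20, random_state=0)

# Random forest: more trees only means averaging more randomized predictors,
# so the validation score tends to plateau rather than degrade.
_, forest_test = validation_curve(
    RandomForestRegressor(random_state=0), X, y,
    param_name="n_estimators", param_range=[10, 100, 500], cv=5,
)

# Gradient boosting: each iteration keeps fitting the residuals, so with a
# fixed (here deliberately large) learning_rate and no early stopping, too
# many iterations can end up overfitting; learning_rate and max_iter are
# usually tuned jointly.
_, boosting_test = validation_curve(
    HistGradientBoostingRegressor(learning_rate=0.5, early_stopping=False,
                                  random_state=0), X, y,
    param_name="max_iter", param_range=[10, 100, 500], cv=5,
)

print("random forest    :", forest_test.mean(axis=1).round(3))
print("gradient boosting:", boosting_test.mean(axis=1).round(3))
```

With this setup the forest scores typically level off as `n_estimators` grows, whereas the boosting scores can start to drop once `max_iter` gets large for the chosen `learning_rate`.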
