Split ensemble slides into bagging and boosting (#471)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ArturoAmorQ and ogrisel committed Jan 11, 2022
1 parent 5071d11 commit ac011b5
Showing 10 changed files with 467 additions and 83 deletions.
14 changes: 6 additions & 8 deletions jupyter-book/_toc.yml
Original file line number Diff line number Diff line change
@@ -149,34 +149,32 @@ parts:
- caption: Ensemble of models
chapters:
- file: ensemble/ensemble_module_intro
- file: ensemble/ensemble_intuitions_index
sections:
- file: ensemble/slides
- file: ensemble/ensemble_quiz_m6_01
- file: python_scripts/ensemble_introduction
- file: ensemble/ensemble_bootstrap_index
sections:
- file: ensemble/bagging_slides
- file: python_scripts/ensemble_introduction
- file: python_scripts/ensemble_bagging
- file: python_scripts/ensemble_ex_01
- file: python_scripts/ensemble_sol_01
- file: python_scripts/ensemble_random_forest
- file: python_scripts/ensemble_ex_02
- file: python_scripts/ensemble_sol_02
- file: ensemble/ensemble_quiz_m6_02
- file: ensemble/ensemble_quiz_m6_01
- file: ensemble/ensemble_boosting_index
sections:
- file: ensemble/boosting_slides
- file: python_scripts/ensemble_adaboost
- file: python_scripts/ensemble_gradient_boosting
- file: python_scripts/ensemble_ex_03
- file: python_scripts/ensemble_sol_03
- file: python_scripts/ensemble_hist_gradient_boosting
- file: ensemble/ensemble_quiz_m6_03
- file: ensemble/ensemble_quiz_m6_02
- file: ensemble/ensemble_hyperparameters_index
sections:
- file: python_scripts/ensemble_hyperparameters
- file: python_scripts/ensemble_ex_05
- file: python_scripts/ensemble_sol_05
- file: ensemble/ensemble_quiz_m6_04
- file: ensemble/ensemble_quiz_m6_03
- file: ensemble/ensemble_wrap_up_quiz
- file: ensemble/ensemble_module_take_away
- caption: Evaluating model performance
11 changes: 11 additions & 0 deletions jupyter-book/ensemble/bagging_slides.md
@@ -0,0 +1,11 @@
# 🎥 Intuitions on ensemble models: bagging

TODO: insert video player here once ready

<iframe class="slides"
src="../slides/index.html?file=../slides/bagging.md"></iframe>

To navigate in the slides, **first click on the slides**, then:
- press the **arrow keys** to go to the next/previous slide;
- press **"P"** to toggle presenter mode to see the notes;
- press **"F"** to toggle full-screen mode.
@@ -1,11 +1,9 @@
# 🎥 Intuitions on ensemble of tree-based models
# 🎥 Intuitions on ensemble models: boosting

<iframe class="video" width="640px" height="480px"
src="https://www.youtube.com/embed/Gv1tPH08ciA?rel=0"
allowfullscreen></iframe>
TODO: insert video player here once ready

<iframe class="slides"
src="../slides/index.html?file=../slides/ensemble.md"></iframe>
src="../slides/index.html?file=../slides/boosting.md"></iframe>

To navigate in the slides, **first click on the slides**, then:
- press the **arrow keys** to go to the next/previous slide;
43 changes: 38 additions & 5 deletions jupyter-book/ensemble/ensemble_quiz_m6_01.md
@@ -1,12 +1,45 @@
# ✅ Quiz M6.01

```{admonition} Question
Select the correct answers:
By default, a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html)
draws:
- a) Both bagging and boosting are combining predictors
- b) Both bagging and boosting are only working with decision trees
- c) Boosting combines predictors sequentially
- d) Bagging combines predictors simultaneously
- a) random samples with replacement over training points
- b) random samples with replacement over features
- c) random samples without replacement over training points
- d) random samples without replacement over features
Hint: you can access the documentation of these classes by clicking on the
links in their names.
```
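The default resampling behavior can be checked directly on a fitted estimator. A minimal sketch, assuming scikit-learn is installed and using a synthetic dataset for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=100, random_state=0)
bagging = BaggingClassifier(n_estimators=5, random_state=0).fit(X, y)

# By default, training points are drawn with replacement (bootstrap) ...
print(bagging.bootstrap)           # True
# ... while features are all kept, without resampling.
print(bagging.bootstrap_features)  # False
```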

+++

```{admonition} Question
In a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html),
the parameter `base_estimator` can be:
- a) any predictor
- b) a decision tree predictor
- c) a linear model predictor
```
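To illustrate that the base model is not restricted to decision trees, here is a hedged sketch that bags k-nearest neighbors regressors on synthetic data. The base model is passed positionally because the keyword is named `base_estimator` in scikit-learn releases from this era and was renamed `estimator` later:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data for illustration only.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Any predictor can serve as the base model, not only decision trees.
bagging_knn = BaggingRegressor(KNeighborsRegressor(), n_estimators=10,
                               random_state=0)
bagging_knn.fit(X, y)
print(bagging_knn.score(X, y))  # R^2 on the training set
```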

+++

```{admonition} Question
In the context of a classification problem, what are the differences between a
bagging classifier and a random forest classifier:
- a) in a random forest, the base model is always a decision tree
- b) in a random forest, the split threshold values are decided completely at
random
- c) in a random forest, a random resampling is performed both over features
as well as over samples
_Select several answers_
```
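The differences can be verified on a fitted forest. A minimal sketch, assuming scikit-learn and a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# The base model of a random forest is always a decision tree ...
print(isinstance(forest.estimators_[0], DecisionTreeClassifier))  # True
# ... and a random subset of features is considered at each split
# ("sqrt" in recent scikit-learn versions, "auto" in older ones),
# on top of the bootstrap resampling of the training points.
print(forest.max_features)
```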
54 changes: 27 additions & 27 deletions jupyter-book/ensemble/ensemble_quiz_m6_02.md
@@ -1,43 +1,43 @@
# ✅ Quiz M6.02

```{admonition} Question
By default, a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html)
draw:
- a) random samples with replacement over training points
- b) random samples with replacement over features
- c) random samples without replacement over training points
- d) random samples without replacement over features
Hint: it is possible to access the documentation for those classes by
clicking on the links on their names.
Select the correct statements:
- a) Both bagging and boosting combine several predictors
- b) Both bagging and boosting are based on decision trees
- c) Boosting combines predictors sequentially
- d) Bagging combines predictors simultaneously
_Select several answers_
```

+++

```{admonition} Question
In a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html),
the parameter `base_estimator` is:
- a) any predictor
- b) a decision tree predictor
- c) a linear model predictor
Boosting algorithms learn their predictor:
- a) by training predictors in parallel on slightly different datasets
- b) by sequentially training predictors that correct previous prediction errors
- c) by taking a linear combination of weak predictors
_Select several answers_
```
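The sequential construction can be observed with `staged_predict`, which replays the predictions of the growing ensemble. A hedged sketch on synthetic data, assuming scikit-learn is installed:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, noise=5, random_state=0)
boosting = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# staged_predict yields the predictions after 1, 2, ..., 50 boosting stages:
# each stage adds a tree fit on the errors of the current ensemble, so the
# training error shrinks as predictors accumulate.
train_errors = [mean_squared_error(y, y_pred)
                for y_pred in boosting.staged_predict(X)]
print(train_errors[0] > train_errors[-1])  # True: later stages fit better
```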

+++

```{admonition} Question
Histogram gradient boosting is an accelerated gradient boosting algorithm that:
In the context of a classification problem, what are the differences between a
bagging classifier and a random forest classifier:
- a) takes a subsample of the original samples
- b) bins the numerical features
- c) takes a subsample of the original features
```

+++

```{admonition} Question
Boosting tends to overfit when increasing the number of predictors:
- a) in a random forest, the base model is always a decision tree
- b) in a random forest, the split threshold values are decided completely at
random
- c) in a random forest, a random resampling is performed both over features
as well as over samples
- a) true
- b) false
```
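A hedged illustration of the overfitting tendency on synthetic data: pushing the number of boosting iterations drives the training error towards zero, while the held-out score lags behind. Exact scores depend on the dataset and random seed:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With many boosting iterations, the training error is driven near zero ...
gbr = GradientBoostingRegressor(n_estimators=500, random_state=0)
gbr.fit(X_train, y_train)
print(gbr.score(X_train, y_train))  # close to 1.0
# ... while held-out performance stops improving: the signature of overfitting.
print(gbr.score(X_test, y_test))
```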
32 changes: 24 additions & 8 deletions jupyter-book/ensemble/ensemble_quiz_m6_03.md
@@ -1,19 +1,35 @@
# ✅ Quiz M6.03

```{admonition} Question
Boosting algorithms build a predictor:
When compared to random forests, gradient boosting is usually trained using:
- a) by training predictors in parallel on slightly different datasets
- b) by training predictors sequentially which will correct errors successively
- c) by taking a linear combination of weak predictors
- a) shallower trees
- b) deeper trees
- c) a subset of features
- d) all features
_Select several answers_
```
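The depth contrast shows up directly in the scikit-learn defaults. A quick sketch (defaults could differ in other versions):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Gradient boosting defaults to shallow trees that underfit individually ...
print(GradientBoostingRegressor().max_depth)  # 3
# ... while random forest trees are grown deep (no depth limit by default).
print(RandomForestRegressor().max_depth)      # None
```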

+++

```{admonition} Question
Histogram gradient boosting is an accelerated gradient boosting algorithm that:
Which hyperparameter(s) do not exist in random forests but do exist in gradient boosting:
- a) number of estimators
- b) maximum depth
- c) learning rate
```
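A quick check against each estimator's parameter list (a sketch, not an exhaustive comparison):

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# learning_rate shrinks the contribution of each new tree; it only makes
# sense for sequential (boosting) ensembles, not for random forests.
print("learning_rate" in GradientBoostingClassifier().get_params())  # True
print("learning_rate" in RandomForestClassifier().get_params())      # False
```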

+++

```{admonition} Question
Which of the following are benefits of ensemble models?
- a) Better generalization performance
- b) Reduced sensitivity to hyperparameter tuning of individual predictors
- c) Better interpretability
- a) takes a subsample of the original samples
- b) bins the original dataset
- c) takes a subsample of the original features
_Select several answers_
```
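A hedged illustration of the generalization benefit on synthetic data. The ensemble typically outperforms its overfitting base model on a held-out split, though exact numbers depend on the dataset and random seed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Compare held-out accuracy of a single tree versus an ensemble of trees.
print(tree.score(X_test, y_test))
print(forest.score(X_test, y_test))
```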
30 changes: 0 additions & 30 deletions jupyter-book/ensemble/ensemble_quiz_m6_04.md

This file was deleted.
