Split ensemble slides into bagging and boosting (#471)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ArturoAmorQ and ogrisel committed Jan 11, 2022
1 parent 5071d11 commit ac011b5
Showing 10 changed files with 467 additions and 83 deletions.
14 changes: 6 additions & 8 deletions jupyter-book/_toc.yml
Original file line number Diff line number Diff line change
@@ -149,34 +149,32 @@ parts:
- caption: Ensemble of models
chapters:
- file: ensemble/ensemble_module_intro
- file: ensemble/ensemble_intuitions_index
sections:
- file: ensemble/slides
- file: ensemble/ensemble_quiz_m6_01
- file: python_scripts/ensemble_introduction
- file: ensemble/ensemble_bootstrap_index
sections:
- file: ensemble/bagging_slides
- file: python_scripts/ensemble_introduction
- file: python_scripts/ensemble_bagging
- file: python_scripts/ensemble_ex_01
- file: python_scripts/ensemble_sol_01
- file: python_scripts/ensemble_random_forest
- file: python_scripts/ensemble_ex_02
- file: python_scripts/ensemble_sol_02
- file: ensemble/ensemble_quiz_m6_02
- file: ensemble/ensemble_quiz_m6_01
- file: ensemble/ensemble_boosting_index
sections:
- file: ensemble/boosting_slides
- file: python_scripts/ensemble_adaboost
- file: python_scripts/ensemble_gradient_boosting
- file: python_scripts/ensemble_ex_03
- file: python_scripts/ensemble_sol_03
- file: python_scripts/ensemble_hist_gradient_boosting
- file: ensemble/ensemble_quiz_m6_03
- file: ensemble/ensemble_quiz_m6_02
- file: ensemble/ensemble_hyperparameters_index
sections:
- file: python_scripts/ensemble_hyperparameters
- file: python_scripts/ensemble_ex_05
- file: python_scripts/ensemble_sol_05
- file: ensemble/ensemble_quiz_m6_04
- file: ensemble/ensemble_quiz_m6_03
- file: ensemble/ensemble_wrap_up_quiz
- file: ensemble/ensemble_module_take_away
- caption: Evaluating model performance
11 changes: 11 additions & 0 deletions jupyter-book/ensemble/bagging_slides.md
@@ -0,0 +1,11 @@
# 🎥 Intuitions on ensemble models: bagging

TODO: insert video player here once ready

<iframe class="slides"
src="../slides/index.html?file=../slides/bagging.md"></iframe>

To navigate in the slides, **first click on the slides**, then:
- press the **arrow keys** to go to the next/previous slide;
- press **"P"** to toggle presenter mode to see the notes;
- press **"F"** to toggle full-screen mode.
@@ -1,11 +1,9 @@
# 🎥 Intuitions on ensemble of tree-based models
# 🎥 Intuitions on ensemble models: boosting

<iframe class="video" width="640px" height="480px"
src="https://www.youtube.com/embed/Gv1tPH08ciA?rel=0"
allowfullscreen></iframe>
TODO: insert video player here once ready

<iframe class="slides"
src="../slides/index.html?file=../slides/ensemble.md"></iframe>
src="../slides/index.html?file=../slides/boosting.md"></iframe>

To navigate in the slides, **first click on the slides**, then:
- press the **arrow keys** to go to the next/previous slide;
43 changes: 38 additions & 5 deletions jupyter-book/ensemble/ensemble_quiz_m6_01.md
@@ -1,12 +1,45 @@
# ✅ Quiz M6.01

```{admonition} Question
Select the correct answers:
By default, a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html)
draws:
- a) Both bagging and boosting are combining predictors
- b) Both bagging and boosting are only working with decision trees
- c) Boosting combines predictors sequentially
- d) Bagging combines predictors simultaneously
- a) random samples with replacement over training points
- b) random samples with replacement over features
- c) random samples without replacement over training points
- d) random samples without replacement over features
Hint: you can access the documentation of these classes by clicking on the
links in their names.
```
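The default resampling behavior can be checked directly on a fitted estimator. A minimal sketch, assuming scikit-learn is installed and using a synthetic dataset for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=100, random_state=0)
bagging = BaggingClassifier(n_estimators=5, random_state=0).fit(X, y)

# By default, training points are drawn with replacement (bootstrap) ...
print(bagging.bootstrap)           # True
# ... while features are all kept, without resampling.
print(bagging.bootstrap_features)  # False
```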

+++

```{admonition} Question
In a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html),
the parameter `base_estimator` can be:
- a) any predictor
- b) a decision tree predictor
- c) a linear model predictor
```
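To illustrate that the base model is not restricted to decision trees, here is a hedged sketch that bags k-nearest neighbors regressors on synthetic data. The base model is passed positionally because the keyword is named `base_estimator` in scikit-learn releases from this era and was renamed `estimator` later:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data for illustration only.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Any predictor can serve as the base model, not only decision trees.
bagging_knn = BaggingRegressor(KNeighborsRegressor(), n_estimators=10,
                               random_state=0)
bagging_knn.fit(X, y)
print(bagging_knn.score(X, y))  # R^2 on the training set
```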

+++

```{admonition} Question
In the context of a classification problem, what are the differences between a
bagging classifier and a random forest classifier:
- a) in a random forest, the base model is always a decision tree
- b) in a random forest, the split threshold values are decided completely at
random
- c) in a random forest, a random resampling is performed both over features
as well as over samples
_Select several answers_
```
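The differences can be verified on a fitted forest. A minimal sketch, assuming scikit-learn and a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# The base model of a random forest is always a decision tree ...
print(isinstance(forest.estimators_[0], DecisionTreeClassifier))  # True
# ... and a random subset of features is considered at each split
# ("sqrt" in recent scikit-learn versions, "auto" in older ones),
# on top of the bootstrap resampling of the training points.
print(forest.max_features)
```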
54 changes: 27 additions & 27 deletions jupyter-book/ensemble/ensemble_quiz_m6_02.md
@@ -1,43 +1,43 @@
# ✅ Quiz M6.02

```{admonition} Question
By default, a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html)
draw:
- a) random samples with replacement over training points
- b) random samples with replacement over features
- c) random samples without replacement over training points
- d) random samples without replacement over features
Hint: it is possible to access the documentation for those classes by
clicking on the links on their names.
Select the correct statements:
- a) Both bagging and boosting combine several predictors
- b) Both bagging and boosting are based on decision trees
- c) Boosting combines predictors sequentially
- d) Bagging combines predictors simultaneously
_Select several answers_
```

+++

```{admonition} Question
In a
[`BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)
or [`BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html),
the parameter `base_estimator` is:
- a) any predictor
- b) a decision tree predictor
- c) a linear model predictor
Boosting algorithms learn their predictor:
- a) by training predictors in parallel on slightly different datasets
- b) by sequentially training predictors that correct previous prediction errors
- c) by taking a linear combination of weak predictors
_Select several answers_
```
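The sequential construction can be observed with `staged_predict`, which replays the predictions of the growing ensemble. A hedged sketch on synthetic data, assuming scikit-learn is installed:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, noise=5, random_state=0)
boosting = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# staged_predict yields the predictions after 1, 2, ..., 50 boosting stages:
# each stage adds a tree fit on the errors of the current ensemble, so the
# training error shrinks as predictors accumulate.
train_errors = [mean_squared_error(y, y_pred)
                for y_pred in boosting.staged_predict(X)]
print(train_errors[0] > train_errors[-1])  # True: later stages fit better
```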

+++

```{admonition} Question
Histogram gradient boosting is an accelerated gradient boosting algorithm that:
In the context of a classification problem, what are the differences between a
bagging classifier and a random forest classifier:
- a) takes a subsample of the original samples
- b) bins the numerical features
- c) takes a subsample of the original features
```

+++

```{admonition} Question
Boosting tends to overfit when increasing the number of predictors:
- a) in a random forest, the base model is always a decision tree
- b) in a random forest, the split threshold values are decided completely at
random
- c) in a random forest, a random resampling is performed both over features
as well as over samples
- a) true
- b) false
```
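A hedged illustration of the overfitting tendency on synthetic data: pushing the number of boosting iterations drives the training error towards zero, while the held-out score lags behind. Exact scores depend on the dataset and random seed:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With many boosting iterations, the training error is driven near zero ...
gbr = GradientBoostingRegressor(n_estimators=500, random_state=0)
gbr.fit(X_train, y_train)
print(gbr.score(X_train, y_train))  # close to 1.0
# ... while held-out performance stops improving: the signature of overfitting.
print(gbr.score(X_test, y_test))
```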
32 changes: 24 additions & 8 deletions jupyter-book/ensemble/ensemble_quiz_m6_03.md
@@ -1,19 +1,35 @@
# ✅ Quiz M6.03

```{admonition} Question
Boosting algorithms build a predictor:
When compared to random forests, gradient boosting is usually trained using:
- a) by training predictors in parallel on slightly different datasets
- b) by training predictors sequentially which will correct errors successively
- c) by taking a linear combination of weak predictors
- a) shallower trees
- b) deeper trees
- c) a subset of features
- d) all features
_Select several answers_
```
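The depth contrast shows up directly in the scikit-learn defaults. A quick sketch (defaults could differ in other versions):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Gradient boosting defaults to shallow trees that underfit individually ...
print(GradientBoostingRegressor().max_depth)  # 3
# ... while random forest trees are grown deep (no depth limit by default).
print(RandomForestRegressor().max_depth)      # None
```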

+++

```{admonition} Question
Histogram gradient boosting is an accelerated gradient boosting algorithm that:
Which hyperparameter(s) do not exist in random forests but do exist in gradient boosting:
- a) number of estimators
- b) maximum depth
- c) learning rate
```
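A quick check against each estimator's parameter list (a sketch, not an exhaustive comparison):

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# learning_rate shrinks the contribution of each new tree; it only makes
# sense for sequential (boosting) ensembles, not for random forests.
print("learning_rate" in GradientBoostingClassifier().get_params())  # True
print("learning_rate" in RandomForestClassifier().get_params())      # False
```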

+++

```{admonition} Question
Which of the following are benefits of ensemble models?
- a) Better generalization performance
- b) Reduced sensitivity to hyperparameter tuning of individual predictors
- c) Better interpretability
- a) takes a subsample of the original samples
- b) bins the original dataset
- c) takes a subsample of the original features
_Select several answers_
```
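A hedged illustration of the generalization benefit on synthetic data. The ensemble typically outperforms its overfitting base model on a held-out split, though exact numbers depend on the dataset and random seed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Compare held-out accuracy of a single tree versus an ensemble of trees.
print(tree.score(X_test, y_test))
print(forest.score(X_test, y_test))
```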
30 changes: 0 additions & 30 deletions jupyter-book/ensemble/ensemble_quiz_m6_04.md

This file was deleted.
