Commit 15c6172

[doc] Improve the model introduction. (dmlc#10822)
1 parent 96bbf80 commit 15c6172


doc/tutorials/model.rst

Lines changed: 7 additions & 4 deletions
@@ -3,7 +3,7 @@ Introduction to Boosted Trees
 #############################
 XGBoost stands for "Extreme Gradient Boosting", where the term "Gradient Boosting" originates from the paper *Greedy Function Approximation: A Gradient Boosting Machine*, by Friedman.
 
-The **gradient boosted trees** has been around for a while, and there are a lot of materials on the topic.
+The term **gradient boosted trees** has been around for a while, and there are a lot of materials on the topic.
 This tutorial will explain boosted trees in a self-contained and principled way using the elements of supervised learning.
 We think this explanation is cleaner, more formal, and motivates the model formulation used in XGBoost.
 
@@ -119,13 +119,16 @@ Let the following be the objective function (remember it always needs to contain
 
 .. math::
 
-  \text{obj} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^t\omega(f_i)
+  \text{obj} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{k=1}^t\omega(f_k)
+
+in which :math:`t` is the number of trees in our ensemble.
+(Each training step will add one new tree, so that at step :math:`t` the ensemble contains :math:`K=t` trees).
 
 Additive Training
 =================
 
 The first question we want to ask: what are the **parameters** of trees?
-You can find that what we need to learn are those functions :math:`f_i`, each containing the structure
+You can find that what we need to learn are those functions :math:`f_k`, each containing the structure
 of the tree and the leaf scores. Learning tree structure is much harder than traditional optimization problem where you can simply take the gradient.
 It is intractable to learn all the trees at once.
 Instead, we use an additive strategy: fix what we have learned, and add one new tree at a time.
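
The additive strategy described in the hunk above can be made concrete with a small sketch. This is not XGBoost's implementation; it is a minimal illustration of stage-wise fitting under squared error, ignoring the regularization term :math:`\omega`, with scikit-learn's ``DecisionTreeRegressor`` standing in for each :math:`f_t` and names such as ``n_rounds`` chosen only for illustration.

.. code-block:: python

   # Additive training sketch: keep the trees learned so far fixed and add one
   # new tree per round. Under squared error each new tree is fit to the current
   # residuals; regularization is ignored for brevity.
   import numpy as np
   from sklearn.tree import DecisionTreeRegressor

   rng = np.random.default_rng(0)
   X = rng.uniform(-3.0, 3.0, size=(200, 1))
   y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

   n_rounds = 5                    # illustrative name, not an XGBoost parameter
   prediction = np.zeros_like(y)   # \hat{y}^{(0)} = 0
   trees = []
   for t in range(n_rounds):
       residual = y - prediction                       # what the next tree should fit
       f_t = DecisionTreeRegressor(max_depth=2).fit(X, residual)
       prediction = prediction + f_t.predict(X)        # \hat{y}^{(t)} = \hat{y}^{(t-1)} + f_t(x)
       trees.append(f_t)

Each round optimizes only the newest tree, which is exactly the :math:`f_t` that the objective in the next hunk is rewritten in terms of.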
@@ -150,7 +153,7 @@ If we consider using mean squared error (MSE) as our loss function, the objectiv
 
 .. math::
 
-  \text{obj}^{(t)} & = \sum_{i=1}^n (y_i - (\hat{y}_i^{(t-1)} + f_t(x_i)))^2 + \sum_{i=1}^t\omega(f_i) \\
+  \text{obj}^{(t)} & = \sum_{i=1}^n (y_i - (\hat{y}_i^{(t-1)} + f_t(x_i)))^2 + \sum_{k=1}^t\omega(f_k) \\
   & = \sum_{i=1}^n [2(\hat{y}_i^{(t-1)} - y_i)f_t(x_i) + f_t(x_i)^2] + \omega(f_t) + \mathrm{constant}
 
 The form of MSE is friendly, with a first order term (usually called the residual) and a quadratic term.
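
One intermediate step is left implicit in the hunk above: how the squared term turns into the first-order and quadratic terms plus a constant. A short worked version of that step, using the same notation as the tutorial:

.. math::

   (y_i - \hat{y}_i^{(t-1)} - f_t(x_i))^2
   = (y_i - \hat{y}_i^{(t-1)})^2 + 2(\hat{y}_i^{(t-1)} - y_i)f_t(x_i) + f_t(x_i)^2

Summing over :math:`i`, the term :math:`\sum_{i=1}^n (y_i - \hat{y}_i^{(t-1)})^2` and the regularization of the already-fixed trees :math:`\sum_{k=1}^{t-1}\omega(f_k)` do not depend on :math:`f_t`, so they are absorbed into :math:`\mathrm{constant}`, leaving exactly the second line of the objective above.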
