Make docs regarding Random Forest and Ensembles more clear #945
I appreciate the comment, thanks @KronosTheLate. Now you have me a little confused. 😕 What exactly do you mean by "random selection of features"? Do you mean at the level of nodes? In that case, if the individual trees provide this option (for example, you have the `n_subfeatures` hyperparameter in the case of DecisionTreeClassifier), then wrapping such a tree in an ensemble already gives you this. Or do you mean feature bagging (one subsample of features for each tree, further subsampled at nodes as above)? The latter could be an option for EnsembleModel, but isn't. On the other hand, I'm not sure common random forest implementations include feature bagging in this latter sense (although tree boosting algorithms do).
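For concreteness, a minimal sketch of the node-level reading, assuming a recent MLJ with MLJDecisionTreeInterface installed (in older MLJ versions the EnsembleModel keyword was `atom` rather than `model`):

```julia
using MLJ  # assumes MLJ.jl and MLJDecisionTreeInterface.jl are installed

Tree = @load DecisionTreeClassifier pkg=DecisionTree

# Node-level feature subsampling is a hyperparameter of the atomic tree,
# not of the ensemble wrapper:
tree = Tree(n_subfeatures=3)  # consider 3 randomly chosen features at each split

# EnsembleModel adds observation bagging on top:
forest = EnsembleModel(model=tree, n=100, bagging_fraction=0.7)
```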
I do mean feature bagging. I did very much like your original suggestion:
"The goal is simply to avoid having people think that an ensemble of trees is equivalent to a random forest."
Mmm, yes, but I deleted my original suggestion because the standard implementations (e.g., ScikitLearn) only subsample features at the level of nodes, and not additionally at the level of trees. So they are not "feature baggers" in the sense of model-generic feature bagging. Or am I still missing something?
That is, I would say that bagging trees which already subsample features at each node is precisely what the standard implementations call a random forest. No?
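For comparison, the canned random forest available through the DecisionTree.jl interface exposes exactly this node-level control; a sketch, with hyperparameter names taken from that package:

```julia
using MLJ

# The canned model: observation bagging plus node-level feature subsampling.
Forest = @load RandomForestClassifier pkg=DecisionTree

# n_subfeatures = -1 selects floor(sqrt(number of features)) per split,
# the conventional random forest default:
forest = Forest(n_trees=100, n_subfeatures=-1)
```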
I have to admit, I am also not sure. This is my first encounter with decision trees and random forests, in the 5 ECTS elective I am taking. My understanding was that each tree only gets a subset of features that it is allowed to use for classification. I am not sure if that corresponds to fiddling with `n_subfeatures`.
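To pin down the distinction being discussed: per-tree feature bagging would look roughly like the sketch below, where each tree is restricted to one fixed column subset for its whole lifetime. This is not something EnsembleModel does, and `fit_tree` is a hypothetical stand-in for any single-tree fitting routine:

```julia
using Random

# Hypothetical sketch of per-tree feature bagging.
# `fit_tree` is a placeholder for any single-tree fitting routine.
function feature_bagged_trees(X, y; n_trees=100, n_features_per_tree=3)
    p = size(X, 2)
    models = []
    for _ in 1:n_trees
        cols = randperm(p)[1:n_features_per_tree]  # features this tree may ever see
        push!(models, (cols=cols, tree=fit_tree(X[:, cols], y)))
    end
    return models  # predict by aggregating votes over the (cols, tree) pairs
end
```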
From https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/:
"Homogeneous Ensembles - for blending the predictions of multiple supervised models all of the same type, but which receive different views of the training data to reduce overall variance. The technique is known as observation bagging. Bagging decision trees, like a DecisionTreeClassifier, gives what is known as a random forest, although MLJ also provides several canned random forest models."
The quote suggests (to me at least) that an Ensemble of trees is equivalent to a random forest. However, from https://en.wikipedia.org/wiki/Random_forest:
"An extension of the algorithm was developed by Leo Breiman[9] and Adele Cutler,[10] who registered[11] "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.).[12] The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho[1] and later independently by Amit and Geman[13] in order to construct a collection of decision trees with controlled variance."
Of particular interest is "and random selection of features", which is not an option for Ensembles as far as I can tell. This difference between a random forest and an ensemble of trees should be made clearer in the docs, or alternatively the option to train each tree on only a random subset of features should be added to ensembles.
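If the latter route were taken, the API might look something like the following. This is purely a hypothetical sketch: the `feature_bagging_fraction` keyword does not exist in the current EnsembleModel.

```julia
# Hypothetical extension; NOT part of the current EnsembleModel API:
forest = EnsembleModel(model=tree, n=100,
                       bagging_fraction=0.7,          # existing: observation bagging
                       feature_bagging_fraction=0.5)  # hypothetical: per-tree feature subsets
```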