
ed-dash comments #26

Closed

alanocallaghan opened this issue Mar 27, 2024 · 5 comments

@alanocallaghan

base_estimator has been renamed to estimator in recent versions of scikit-learn.
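For example, assuming the affected code uses AdaBoostClassifier (the same rename applies to the other ensemble estimators that took a base_estimator argument):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Deprecated in scikit-learn 1.2 and removed in 1.4:
# ada = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1))

# Current API uses `estimator` instead:
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1))
```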

Also, in the random forest page we specify max_features=1, but the decision boundaries are all bivariate. This makes for a very confusing introduction to random forests:
https://carpentries-incubator.github.io/machine-learning-trees-python/06-random-forest/index.html

@tompollard
Collaborator

I've found this lesson to work well with just two features, but I do play around with some of the parameters to demonstrate what is happening. These demonstrations should be captured in the materials, so I'll try to make some updates to explain things more clearly.

@alanocallaghan
Author

What I mean is that if we're fitting a random forest to two variables, then I'd expect the feature subsampling to produce trees that each use one feature; otherwise it's just a regular tree ensemble.

@tompollard
Collaborator

tompollard commented Mar 27, 2024

> What I mean is that if we're fitting a random forest to two variables, then I'd expect the feature subsampling to produce trees that each use one feature; otherwise it's just a regular tree ensemble.

One of the nice things about dealing with only two variables is that we can demonstrate that this expectation is not true for random forests (at least for this particular implementation).

If it were true that setting max_features=1 led to trees built on a single variable, we would not see the following trees (which all make decisions based on both variables).

[image: individual trees from the forest, each making decisions based on both variables]

The explanation is that features are being limited at each split, not at the model level:

[screenshot: scikit-learn documentation for the max_features parameter]
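You can check this directly by inspecting which features each tree in the forest actually splits on (a minimal sketch on synthetic two-feature data, not the lesson's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy two-feature problem standing in for the lesson's data
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

rf = RandomForestClassifier(n_estimators=5, max_features=1, random_state=0)
rf.fit(X, y)

# tree_.feature marks leaf nodes with -2; keep only internal split nodes
for i, tree in enumerate(rf.estimators_):
    used = sorted({int(f) for f in tree.tree_.feature if f >= 0})
    print(f"tree {i} splits on features: {used}")
# Most trees print [0, 1]: one feature is sampled per split, not per tree
```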

@alanocallaghan
Author

Ah, in that case it'd be good to explain that in the lesson.

tompollard added a commit that referenced this issue Mar 27, 2024
@alanocallaghan points out that max_features is confusing for Random Forests. Why does a Random Forest with max_features=1 still result in sub-trees that make decisions based on >1 feature? The explanation is that the max_features argument is applied at the split level, not the tree level.
@tompollard
Collaborator

@alanocallaghan Please could you take a look at #27 and let me know if this resolves the issue?

tompollard added a commit that referenced this issue Mar 27, 2024
Explain the purpose of max_features for Random Forests. Closes #26.