Random forests are a way to make decision trees more powerful and more reliable by combining many of them and adding extra randomness.



### What a random forest is

- A **random forest** is an **ensemble** of many decision trees. [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)
- It uses **bagging** (bootstrap aggregation): each tree is trained on a different bootstrap sample of the rows. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)
- It can be used for **classification** (predicting categories) and **regression** (predicting numbers). [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)
- At prediction time, all trees make a prediction and the forest **combines** them:
  - Classification: usually majority vote over tree predictions; scikit-learn instead averages the trees' predicted class probabilities (a "soft" vote). [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)
  - Regression: usually average of tree predictions. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)

So instead of trusting a single tree, you ask many trees and aggregate their opinions.
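To make the aggregation rules concrete, here is a small NumPy sketch. The per-tree predictions are made-up numbers, not output from a real model, and `majority_vote` is a hypothetical helper written for this example. The last part compares a hard majority vote with soft voting (averaging class probabilities, which is what scikit-learn's `RandomForestClassifier` does); the two rules can disagree.

```python
import numpy as np

# Made-up class predictions: 5 trees (rows) x 3 samples (columns).
class_preds = np.array([
    [0, 1, 2],
    [0, 1, 1],
    [1, 1, 2],
    [0, 0, 2],
    [0, 1, 1],
])

def majority_vote(preds: np.ndarray) -> np.ndarray:
    """Hard vote: most frequent class per sample (ties go to the smaller label)."""
    n_classes = preds.max() + 1
    votes = np.array([np.bincount(col, minlength=n_classes) for col in preds.T])
    return votes.argmax(axis=1)

# Made-up regression predictions: 3 trees x 2 samples; the forest averages them.
reg_preds = np.array([
    [2.0, 10.0],
    [3.0, 12.0],
    [4.0, 11.0],
])

# Made-up class probabilities from 3 trees for one sample (columns: class 0, class 1).
tree_proba = np.array([
    [0.90, 0.10],
    [0.40, 0.60],
    [0.45, 0.55],
])
hard = majority_vote(tree_proba.argmax(axis=1)[:, None])[0]  # each tree votes for its top class
soft = tree_proba.mean(axis=0).argmax()                      # average probabilities, then pick

print(majority_vote(class_preds))  # [0 1 2]
print(reg_preds.mean(axis=0))      # [ 3. 11.]
print(hard, soft)                  # 1 0
```

In the last case two of three trees lean toward class 1, but the one confident tree pulls the averaged probability toward class 0.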



### Why we need extra randomness (correlation problem)

Bagging already makes trees less similar by changing which rows they see, but all trees:

- Are trained on bootstrap samples from the **same original dataset**,  
- Use the **same full set of features** when deciding splits (if we do plain bagging). [stackoverflow](https://stackoverflow.com/questions/23939750/understanding-max-features-parameter-in-randomforestregressor)

This means:

- Bootstrap samples heavily **overlap**, so many trees repeatedly see the same influential examples.  
- If some features are much more informative than others, **every tree** will tend to use the same top features at the top splits. [stackoverflow](https://stackoverflow.com/questions/23939750/understanding-max-features-parameter-in-randomforestregressor)

As a result, the trees are still **correlated** (they tend to make similar mistakes), which limits how much variance reduction we get from averaging them. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)
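The limit that correlation puts on averaging can be stated with a standard identity: if each of B tree predictions has variance σ² and every pair has correlation ρ, the variance of their average is ρσ² + (1 − ρ)σ²/B. Adding trees shrinks only the second term, so ρσ² is a floor that only lower correlation can reduce. The Python sketch below checks the identity by simulating correlated "tree predictions" as a shared noise component plus an independent one; that generative model is an assumption chosen for illustration, not how real trees produce errors.

```python
import numpy as np

def variance_of_average(rho: float, sigma2: float, n_trees: int) -> float:
    """Variance of the mean of n_trees predictions with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees

def simulated_variance(rho: float, sigma2: float, n_trees: int,
                       n_trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: each 'tree' = sqrt(rho) * shared noise + sqrt(1 - rho) * its own noise."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(sigma2)
    shared = rng.normal(0.0, sigma, size=(n_trials, 1))         # error common to all trees
    own = rng.normal(0.0, sigma, size=(n_trials, n_trees))      # error specific to each tree
    preds = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own    # variance sigma2, correlation rho
    return float(preds.mean(axis=1).var())

for rho in (0.0, 0.3, 0.8):
    print(f"rho={rho}: formula={variance_of_average(rho, 1.0, 50):.4f}, "
          f"simulated={simulated_variance(rho, 1.0, 50):.4f}")
```

With ρ = 0.8, no number of trees can push the variance of the average below 0.8σ², which is why random forests work on lowering ρ rather than only raising B.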

Random forests directly attack this by adding **feature randomness**.



### Key idea: random feature selection at each split

In a normal decision tree:

- At each node, the algorithm looks at **all features**, checks all possible splits, and chooses the feature + threshold that gives the best impurity reduction. [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)

In a **random forest**:

- At each node, the algorithm only looks at a **random subset of the features**, not all of them. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
- It then finds the best split **within that subset**. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

This is controlled by the parameter often called **`max_features`**:  

- `max_features` = how many features are randomly chosen and considered at **each split**. [stackoverflow](https://stackoverflow.com/questions/23939750/understanding-max-features-parameter-in-randomforestregressor)
- Example: if there are 64 features and `max_features=10`, then at each node the algorithm randomly picks 10 features and chooses the best split among those 10. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
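The split search with a random feature subset can be sketched in a few lines of Python. This is a simplified illustration, not scikit-learn's implementation: `gini` and `best_split` are hypothetical helpers, candidate thresholds are midpoints between distinct values, and the toy data is made up so that column 0 separates the classes perfectly while columns 1 and 2 are only partly informative.

```python
import numpy as np

def gini(y: np.ndarray) -> float:
    """Gini impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def best_split(X: np.ndarray, y: np.ndarray, max_features: int, rng: np.random.Generator):
    """Best (feature, threshold, weighted child impurity) among max_features random columns.

    Returns feature=None if no candidate split lowers impurity below the parent's.
    """
    n_samples, n_features = X.shape
    # The random-forest twist: only a random subset of columns is eligible at this node.
    candidates = rng.choice(n_features, size=max_features, replace=False)
    best = (None, None, gini(y))
    for j in candidates:
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:      # midpoints between distinct values
            left = X[:, j] <= t
            n_left = int(left.sum())
            score = (n_left * gini(y[left]) + (n_samples - n_left) * gini(y[~left])) / n_samples
            if score < best[2]:
                best = (int(j), float(t), score)
    return best

# Toy node: column 0 separates the classes perfectly, columns 1 and 2 only partly.
X = np.array([
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 2.0],
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 3.0],
    [1.0, 0.0, 4.0],
])
y = np.array([0, 0, 0, 1, 1, 1])
rng = np.random.default_rng(0)

print(best_split(X, y, max_features=3, rng=rng))  # (0, 0.5, 0.0): all columns visible, column 0 wins
chosen = [best_split(X, y, max_features=1, rng=rng)[0] for _ in range(20)]
print(chosen)  # typically a mix of 0, 1 and 2: the best column is often hidden
```

With `max_features=1`, each call can only pick the column it happened to draw, so weaker columns end up driving many splits.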

Effects:

- Some trees are forced to use features that a standard tree would almost **never** pick, because strong features may be temporarily hidden from view at a node. [stackoverflow](https://stackoverflow.com/questions/23939750/understanding-max-features-parameter-in-randomforestregressor)
- This makes individual trees **weaker** (less optimized) but the **ensemble more diverse**. [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)
- Higher diversity → lower correlation between trees → better variance reduction when we aggregate. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)

This is the core “random” part of random forests: randomness in **rows** (bootstrap) and **columns** (features per split).



### Important hyperparameters in random forests

A random forest implementation (like `RandomForestClassifier` in scikit‑learn) exposes two kinds of settings: [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)

1. **Tree‑level parameters** (how each tree is built):
   - `max_depth`: maximum depth of each tree. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)
   - `criterion`: how to measure impurity (e.g., `gini`, `entropy`). [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
   - `max_leaf_nodes`, `min_samples_split`, etc., controlling complexity. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)

2. **Forest‑level parameters** (how the ensemble behaves):
   - `n_estimators`: number of trees in the forest. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)
   - `max_features`: number of features considered at each split. [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)
   - `oob_score`: whether to compute out‑of‑bag evaluation. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/oob-errors-for-random-forests-in-scikit-learn/)

You train and predict with familiar methods:

- `fit(X, y)` to train. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
- `predict(X)` to get predictions. [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
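A minimal end-to-end sketch with scikit-learn, assuming a synthetic dataset from `make_classification` in place of real data. The hyperparameter values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,      # forest-level: number of trees
    max_features="sqrt",   # forest-level: features considered at each split
    oob_score=True,        # forest-level: compute out-of-bag accuracy during fit
    criterion="gini",      # tree-level: impurity measure
    max_depth=None,        # tree-level: no depth cap; grow until leaves are pure...
    min_samples_split=2,   # tree-level: ...or hold fewer than 2 samples
    random_state=0,
    n_jobs=-1,
)
forest.fit(X_train, y_train)      # train
y_pred = forest.predict(X_test)   # predict class labels

print("OOB accuracy: ", round(forest.oob_score_, 3))
print("Test accuracy:", round(forest.score(X_test, y_test), 3))
```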



### How the number of trees affects performance

When you increase **`n_estimators`** (the number of trees): [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)

- At first, performance improves quickly, because averaging more trees reduces variance. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)
- After some point, the curve **levels off**: adding more trees brings almost no gain but costs more time and memory. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)

So in practice:

- You pick `n_estimators` large enough that performance is stable (for example, where a validation curve or OOB score flattens out), but not so large that computation becomes a problem. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)
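One way to find where the curve flattens is to grow a single forest incrementally with scikit-learn's `warm_start=True` and record `oob_score_` after each step. The dataset and the list of tree counts below are arbitrary choices for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# warm_start=True keeps the trees already fitted and only adds new ones on each fit() call.
forest = RandomForestClassifier(
    warm_start=True, oob_score=True, max_features="sqrt", random_state=0, n_jobs=-1
)

oob_curve = {}
for n_trees in (15, 30, 60, 120, 240, 480):
    forest.set_params(n_estimators=n_trees)
    with warnings.catch_warnings():
        # With few trees, a handful of samples may never be out-of-bag and sklearn warns.
        warnings.simplefilter("ignore", UserWarning)
        forest.fit(X, y)
    oob_curve[n_trees] = forest.oob_score_

for n_trees, score in oob_curve.items():
    print(f"{n_trees:4d} trees: OOB accuracy = {score:.3f}")
```

Reading the printed values left to right, the point where successive scores stop changing meaningfully is a reasonable place to stop adding trees.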



### Out‑of‑bag (OOB) score in random forests

Because each tree uses a **bootstrap sample**, on average about 63% of training points are in‑bag for that tree and about 37% are **out‑of‑bag**. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)
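The 63% figure follows from a short calculation: a bootstrap sample draws n rows with replacement, so a particular row is missed by all n draws with probability (1 − 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The sketch below checks this numerically; `in_bag_fraction` is a hypothetical helper for this example.

```python
import numpy as np

def in_bag_fraction(n: int, rng: np.random.Generator) -> float:
    """Fraction of distinct rows that appear in one bootstrap sample of size n."""
    sample = rng.integers(0, n, size=n)   # draw n row indices with replacement
    return np.unique(sample).size / n

rng = np.random.default_rng(0)
n = 10_000
simulated = float(np.mean([in_bag_fraction(n, rng) for _ in range(200)]))

# Probability a given row is drawn at least once: 1 - (1 - 1/n)^n, which tends to 1 - 1/e.
exact = 1 - (1 - 1 / n) ** n
print(f"simulated in-bag: {simulated:.4f}, exact: {exact:.4f}, limit 1 - 1/e: {1 - np.exp(-1):.4f}")
```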

Random forests can use these out‑of‑bag points as a built‑in validation set:

- For each training sample, consider only the trees for which that sample was **not** used in training. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/oob-errors-for-random-forests-in-scikit-learn/)
- Aggregate their predictions and compare to the true label.  
- This gives an **OOB score** (or OOB error) that estimates performance on unseen data. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/oob-errors-for-random-forests-in-scikit-learn/)
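The OOB procedure can be written out by hand to make it concrete. This is a from-scratch sketch, not scikit-learn's internal code: it bags `DecisionTreeClassifier` instances with `max_features="sqrt"` on synthetic data and, for each sample, sums predicted probabilities only from trees whose bootstrap sample left that sample out. Setting `oob_score=True` on `RandomForestClassifier` does equivalent bookkeeping for you; the two numbers printed at the end come from different random draws, so they should be close but not identical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)
n_samples, n_classes, n_trees = len(y), 2, 100
rng = np.random.default_rng(0)

# Running sums of class probabilities from trees that did NOT train on each sample.
oob_proba = np.zeros((n_samples, n_classes))
oob_count = np.zeros(n_samples, dtype=int)

for _ in range(n_trees):
    boot = rng.integers(0, n_samples, size=n_samples)  # bootstrap: row indices drawn with replacement
    oob = np.ones(n_samples, dtype=bool)
    oob[boot] = False                                   # rows never drawn are out-of-bag for this tree
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(2**31)))
    tree.fit(X[boot], y[boot])
    # Add this tree's probabilities only for its OOB rows (classes_ maps columns to labels 0/1).
    oob_proba[np.ix_(oob, tree.classes_)] += tree.predict_proba(X[oob])
    oob_count[oob] += 1

has_oob = oob_count > 0            # with 100 trees, virtually every sample has OOB votes
oob_pred = oob_proba[has_oob].argmax(axis=1)
manual_oob_accuracy = float(np.mean(oob_pred == y[has_oob]))

# scikit-learn's built-in equivalent on the same data.
forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=0).fit(X, y)
print(f"manual OOB accuracy: {manual_oob_accuracy:.3f}")
print(f"sklearn oob_score_:  {forest.oob_score_:.3f}")
```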

Advantages:

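- You can use **all data for training** and still get a nearly unbiased performance estimate (it can be slightly pessimistic, because each sample's OOB prediction comes only from the roughly one third of trees that did not see it). [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/oob-errors-for-random-forests-in-scikit-learn/)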
- You don’t need a separate validation set just to tune basic forest parameters. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/oob-errors-for-random-forests-in-scikit-learn/)



### Effect of `max_features` and diversity vs strength

`max_features` controls the diversity of trees: [stackoverflow](https://stackoverflow.com/questions/23939750/understanding-max-features-parameter-in-randomforestregressor)

- **Using all features** (large `max_features`):
  - Each split is as strong as possible, because the tree always picks the best feature among all.  
  - Trees resemble each other more, so they are **more correlated**.  

- **Using fewer features**:
  - Each tree is a bit weaker (it sometimes misses the globally best feature).  
  - Trees differ more from one another, so the forest becomes **more diverse**.  
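One way to see this trade-off is to measure diversity directly: fit forests with different `max_features` values and compute the average pairwise correlation between individual trees' predictions on held-out data. The correlation metric, the synthetic regression data, and the `max_features` values are choices made for this illustration. Expect the correlation to fall as `max_features` shrinks; whether accuracy rises or falls with it depends on the data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, n_informative=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def mean_tree_correlation(forest, X):
    """Average pairwise correlation between individual trees' predictions on X."""
    per_tree = np.array([tree.predict(X) for tree in forest.estimators_])  # (n_trees, n_samples)
    corr = np.corrcoef(per_tree)                                           # (n_trees, n_trees)
    off_diagonal = corr[~np.eye(len(corr), dtype=bool)]
    return float(off_diagonal.mean())

results = {}
for max_features in (1.0, 0.5, 0.2):   # fraction of the 20 features tried at each split
    forest = RandomForestRegressor(
        n_estimators=100, max_features=max_features, oob_score=True, random_state=0, n_jobs=-1
    )
    forest.fit(X_train, y_train)
    results[max_features] = (
        mean_tree_correlation(forest, X_test),
        forest.oob_score_,                 # OOB R^2
        forest.score(X_test, y_test),      # test R^2
    )

for mf, (rho, oob_r2, test_r2) in results.items():
    print(f"max_features={mf}: tree correlation={rho:.3f}, OOB R^2={oob_r2:.3f}, test R^2={test_r2:.3f}")
```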

Empirically:

- With **very few trees**, using all features often works better because you want each tree as strong as possible. [geeksforgeeks](https://www.geeksforgeeks.org/machine-learning/random-forest-hyperparameter-tuning-in-python/)
- With **many trees**, using a smaller `max_features` can give **higher final accuracy**, because the boost from diversity outweighs the loss in single‑tree strength. [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)

Common defaults:

- For **classification**, `max_features="sqrt"` (square root of number of features). [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)
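- For **regression**, current scikit-learn's `RandomForestRegressor` defaults to `max_features=1.0` (all features); another common choice, used for example as the default in R's `randomForest` package, is about one third of the features. [stackoverflow](https://stackoverflow.com/questions/23939750/understanding-max-features-parameter-in-randomforestregressor)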



### Feature importance in random forests

Random forests can estimate how **important** each feature is for making predictions. [stackoverflow](https://stackoverflow.com/questions/33837125/interpreting-feature-importance-values-from-a-randomforestclassifier)

A common built‑in measure (scikit‑learn’s default):

- For every split in every tree, record how much that split **reduces impurity** (e.g., Gini or entropy), weighted by how many training samples pass through that node. [scikit-learn](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)
- For each feature, **sum up** these weighted impurity reductions over all splits that use that feature. scikit-learn does this tree by tree, normalizes each tree's totals, and then averages them across the forest. [scikit-learn](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)
- Normalize the averaged values so that feature importances add up to 1. [scikit-learn](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)
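To check this description against scikit-learn, the sketch below recomputes impurity importances by hand from each fitted tree's low-level `tree_` arrays (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`) and compares them with `feature_importances_`. It assumes scikit-learn's current aggregation (normalize each tree, average across trees, renormalize); the final comparison is the check on that assumption. `tree_impurity_importance` is a hypothetical helper for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def tree_impurity_importance(tree, n_features):
    """Sum of weighted impurity decreases per feature for one fitted tree, normalized to 1."""
    t = tree.tree_
    imp = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, no contribution
            continue
        decrease = (t.weighted_n_node_samples[node] * t.impurity[node]
                    - t.weighted_n_node_samples[left] * t.impurity[left]
                    - t.weighted_n_node_samples[right] * t.impurity[right])
        imp[t.feature[node]] += decrease
    return imp / imp.sum()

# Average the per-tree importances, then renormalize so they sum to 1.
manual = np.mean([tree_impurity_importance(est, X.shape[1]) for est in forest.estimators_], axis=0)
manual /= manual.sum()

print("manual:  ", np.round(manual, 3))
print("sklearn: ", np.round(forest.feature_importances_, 3))
```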

Interpretation:

- A feature with **high importance** contributes a large share of the impurity reduction across the forest; the model relies on it heavily. [stackoverflow](https://stackoverflow.com/questions/33837125/interpreting-feature-importance-values-from-a-randomforestclassifier)
- A feature with **very low importance** either:
  - Is not useful, or  
  - Is redundant with other, more dominant features (highly correlated). [explained](https://explained.ai/rf-importance/)

This lets you:

- See which inputs matter most.  
- Potentially drop very unimportant features to simplify the model. [towardsdatascience](https://towardsdatascience.com/interpreting-random-forests-638bca8b49ea/)
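For the second point, scikit-learn's `SelectFromModel` can drop features whose forest importance falls below a threshold. A minimal sketch on synthetic data generated with `shuffle=False`, so the 4 informative features are the first four columns and the rest are noise; the `"median"` threshold (keep roughly the top half) is an arbitrary choice for illustration, and the reduced model should still be validated before trusting the cut.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# shuffle=False: columns 0-3 are informative, columns 4-19 are random noise.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)

# Keep only features whose impurity-based importance is at least the median importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0), threshold="median")
X_reduced = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print("kept feature indices:", kept)
print("shape before/after:", X.shape, X_reduced.shape)
```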



### Putting it all together

- Random forests = many **decision trees** + **bootstrap sampling of rows** + **random selection of features at each split**. [digitalocean](https://www.digitalocean.com/community/tutorials/random-forest-in-machine-learning)
- They reduce **variance** through averaging and improve robustness by making trees **less correlated**. [mbrenndoerfer](https://mbrenndoerfer.com/writing/random-forest-ensemble-learning-bootstrap-sampling-feature-selection-classification-regression-guide)
- They provide handy tools like **OOB scoring** (built‑in validation) and **feature importance** (which features matter most). [scikit-learn](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)
