# Feature Importance Calculation in Random Forests

In a **Random Forest**, impurity-based feature importance (also called mean decrease in impurity) measures how much each feature contributes to reducing node impurity **across all trees** in the forest.

Each individual decision tree computes its own feature importances from the impurity reduction of its splits (measured by Gini impurity, entropy, or MSE).  
The Random Forest then **averages these importances** over all trees to obtain the final score.

## Impurity Measures Used
- **Classification Random Forest:** uses **Gini impurity** or **entropy**  
- **Regression Random Forest:** uses **Mean Squared Error (MSE)**  

The importance of a feature reflects **how much it decreases impurity on average** across the ensemble.
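
As a minimal sketch of these impurity measures, each one can be computed directly from the labels (or targets) that fall into a node. The function names `gini`, `entropy`, and `mse` are illustrative helpers, not scikit-learn API.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mse(targets):
    """MSE impurity for regression: variance of the targets around their mean."""
    targets = np.asarray(targets, dtype=float)
    return np.mean((targets - targets.mean()) ** 2)

labels = [0, 0, 1, 1]        # perfectly mixed binary node
print(gini(labels))          # 0.5
print(entropy(labels))       # 1.0
print(mse([1.0, 3.0]))       # 1.0
```

A pure node (all labels identical) gives an impurity of 0 under all three measures.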

## Formula (General Form)

For every node $j$ in each tree $t$ of the forest:

$$
\text{Importance of node } j^t = w_j^t \times \Delta I_j^t
$$

where:

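- $w_{j}^t$ = number of samples reaching node $j$ in tree $t$ (scikit-learn uses the fraction of training samples reaching the node instead, which differs only by a constant factor that cancels on normalization)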
- $\Delta I_{j}^t = I_{\text{parent}}^t - (p_L^t I_L^t + p_R^t I_R^t)$
  - $I_{\text{parent}}^t$ = impurity before split  
  - $I_L^t, I_R^t$ = impurity of left and right child nodes  
  - $p_L^t, p_R^t$ = proportion of samples going left/right
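
The node-level formula above can be sketched for a single hypothetical split. The label arrays and the helper `node_importance` are made up for illustration, and Gini impurity is assumed as the impurity measure.

```python
import numpy as np

def gini(labels):
    """Gini impurity of the labels in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def node_importance(parent, left, right, impurity=gini):
    """Return w_j * delta_I_j for one split, with w_j = samples at the parent."""
    w = len(parent)
    p_left = len(left) / w
    p_right = len(right) / w
    delta = impurity(parent) - (p_left * impurity(left) + p_right * impurity(right))
    return w * delta

# Hypothetical split: 10 samples (5 per class) go into a pure left child
# with 4 samples and a right child with 6 samples (1 of class 0, 5 of class 1).
parent = np.array([0] * 5 + [1] * 5)
left = np.array([0] * 4)
right = np.array([0] + [1] * 5)

print(node_importance(parent, left, right))  # approximately 3.33
```

Here the parent impurity is 0.5, the weighted child impurity is 0.6 × 10/36 ≈ 0.167, so the impurity decrease is ≈ 0.333 and the node importance is 10 × 0.333 ≈ 3.33.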

Then, for each **feature $f$** in a single tree $t$:

$$
\text{Feature importance for feature } f \text{ in tree } t =
\sum_{j \, \text{where split uses } f}
w_j^t \times \Delta I_j^t
$$
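
One way to check the per-tree sum is to walk the nodes of a fitted scikit-learn tree and accumulate $w_j \times \Delta I_j$ per feature. This is a sketch under assumptions: the dataset sizes, `max_depth=3`, and `random_state=0` are arbitrary choices for illustration. It relies on the identity $w_j \Delta I_j = n_j I_j - n_L I_L - n_R I_R$, which follows from $p_L = n_L / n_j$ and $p_R = n_R / n_j$.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset and tree (sizes and seeds are arbitrary assumptions)
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_
n = tree.weighted_n_node_samples  # samples reaching each node

raw = np.zeros(X.shape[1])
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, no contribution
        continue
    # w_j * delta_I_j written as n_j*I_j - n_L*I_L - n_R*I_R
    decrease = (n[node] * tree.impurity[node]
                - n[left] * tree.impurity[left]
                - n[right] * tree.impurity[right])
    raw[tree.feature[node]] += decrease

manual = raw / raw.sum()  # normalize so the importances sum to 1
print(manual)
print(clf.feature_importances_)
```

The two printed arrays should agree, because `feature_importances_` on a single tree is normalized to sum to 1.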

## Aggregating Across All Trees

The Random Forest computes the **mean importance** of each feature across all trees:

$$
\text{Feature importance}(f) =
\frac{1}{T} \sum_{t=1}^{T}
\left(
\sum_{j \, \text{where split uses } f}
w_j^t \times \Delta I_j^t
\right)
$$

where $T$ is the total number of trees in the forest.

Finally, normalize so that all importances sum to 1 (scikit-learn additionally normalizes each tree's importances to sum to 1 before averaging them):

$$
\text{Normalized importance}(f) =
\frac{\text{Feature importance}(f)}
{\sum_k \text{Feature importance}(k)}
$$
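
The averaging and normalization steps can be sketched with NumPy on a made-up matrix of raw per-tree importances (rows are trees, columns are features; the numbers are hypothetical). The second computation shows the variant in which each tree is normalized before averaging, which is what scikit-learn does.

```python
import numpy as np

# Hypothetical raw importances: 3 trees x 3 features
per_tree = np.array([
    [4.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
])

# Formula above: mean over trees, then normalize to sum to 1
mean_importance = per_tree.mean(axis=0)
normalized = mean_importance / mean_importance.sum()
print(normalized)  # approximately [0.6, 0.2, 0.2]

# Variant: normalize each tree first, then average
per_tree_normalized = per_tree / per_tree.sum(axis=1, keepdims=True)
tree_first = per_tree_normalized.mean(axis=0)
tree_first = tree_first / tree_first.sum()
print(tree_first)
```

The two results differ whenever the trees' total raw importances differ, because normalizing per tree gives every tree equal weight in the average.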

In [60]:
import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

In [61]:
# Tiny toy dataset: 5 samples, 2 informative features, 2 classes
X, y = make_classification(n_samples=5, n_classes=2,
                           n_features=2, n_informative=2, n_redundant=0,
                           random_state=0)

In [62]:
# No random_state is set, so the fitted trees (and the importances below) can vary between runs
rf = RandomForestClassifier(n_estimators=3)
rf.fit(X, y)

In [63]:
rf.feature_importances_

array([0.66666667, 0.33333333])

In [64]:
rf.estimators_

[DecisionTreeClassifier(max_features='sqrt', random_state=2142954168),
 DecisionTreeClassifier(max_features='sqrt', random_state=1132220420),
 DecisionTreeClassifier(max_features='sqrt', random_state=204339100)]

In [65]:
# Feature importance of 1st Decision Tree
rf.estimators_[0].feature_importances_

array([0., 1.])

In [66]:
# Feature importance of 2nd Decision Tree
rf.estimators_[1].feature_importances_

array([1., 0.])

In [67]:
# Feature importance of 3rd Decision Tree
rf.estimators_[2].feature_importances_

array([1., 0.])

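In [None]:
# Feature 0: mean of the per-tree importances, (0 + 1 + 1) / 3
print(2/3)

0.6666666666666666

In [70]:
# Feature 1: mean of the per-tree importances, (1 + 0 + 0) / 3
print(1/3)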

0.3333333333333333
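
The manual check above can also be done programmatically. The sketch below refits a forest on a larger sample with a fixed seed (`n_samples=200` and `random_state=0` are assumptions made for reproducibility, not the settings used in the cells above) and compares the mean of the per-tree importances with the forest's `feature_importances_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumed settings for a reproducible check
X, y = make_classification(n_samples=200, n_classes=2,
                           n_features=2, n_informative=2, n_redundant=0,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

# Per-tree importances (already normalized to sum to 1 within each tree);
# trees consisting of a single node carry no split information and are skipped
per_tree = np.array([t.feature_importances_ for t in rf.estimators_
                     if t.tree_.node_count > 1])

averaged = per_tree.mean(axis=0)
averaged = averaged / averaged.sum()

print(averaged)
print(rf.feature_importances_)
```

Both arrays should match, confirming that the forest-level importance is the mean of the per-tree importances.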
