# Gini Index in Decision Trees

The **Gini Index** (also called Gini impurity) is a metric used in decision tree algorithms such as CART (Classification and Regression Trees) to measure the **impurity of a node**, that is, how mixed the class labels are among the samples that reach the node. A lower value means the node is closer to containing a single class.

---

## Formula for Gini Index
The Gini Index for a node \( t \) is given by:

$$
G(t) = 1 - \sum_{i=1}^C p_i^2
$$

Where:
- \( C \): Number of classes in the dataset.
- \( p_i \): Proportion of samples in the node belonging to class \( i \).
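
As a minimal sketch of the formula, the following Python function computes the Gini Index from the class labels of the samples in a node. The function name `gini_index` and the choice to treat an empty node as pure are illustrative assumptions, not part of any particular library.

```python
from collections import Counter


def gini_index(labels):
    """Gini Index of a node, given the class labels of its samples."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    # 1 minus the sum of squared class proportions p_i
    return 1.0 - sum((count / n) ** 2 for count in counts.values())
```

For instance, `gini_index(["A", "A", "B", "B"])` evaluates to `0.5`, and a list containing a single class evaluates to `0.0`.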

---

## Examples
1. **Impure Node**:  
   If a node contains data points with the following distribution:
   - 50% belong to class A (\( p_A = 0.5 \))
   - 50% belong to class B (\( p_B = 0.5 \))

   Then, the Gini Index is:
   $$
   G(t) = 1 - (0.5^2 + 0.5^2) = 1 - (0.25 + 0.25) = 1 - 0.5 = 0.5
   $$

2. **Pure Node**:  
   If the node is pure (e.g., all data belongs to class A), then:
   $$
   G(t) = 1 - (1^2 + 0^2) = 1 - 1 = 0
   $$

---

## Weighted Gini Index for Splits
When a node splits into two child nodes, the Gini Index for the split is calculated as a **weighted average** of the Gini Indices of the child nodes:

$$
G_{\text{split}} = \frac{N_1}{N} G(t_1) + \frac{N_2}{N} G(t_2)
$$

Where:
- \( N \): Total number of samples in the parent node.
- \( N_1 \), \( N_2 \): Numbers of samples in the child nodes.
- \( G(t_1) \), \( G(t_2) \): Gini Indices of the child nodes.
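
The weighted average can be sketched in Python as follows. The helper names `gini_index` and `gini_split` are hypothetical and chosen for this example only; each child is represented simply as a list of class labels.

```python
from collections import Counter


def gini_index(labels):
    """Gini Index of a single node (empty nodes are assumed pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(left_labels, right_labels):
    """Weighted Gini Index of a binary split into two child nodes."""
    n1, n2 = len(left_labels), len(right_labels)
    n = n1 + n2  # samples in the parent node
    if n == 0:
        return 0.0
    # Each child's impurity is weighted by its share of the parent's samples
    return (n1 / n) * gini_index(left_labels) + (n2 / n) * gini_index(right_labels)
```

With a left child `["A", "A", "A", "B"]` (Gini Index 0.375) and a pure right child `["B", "B"]`, the weighted value is (4/6) × 0.375 + (2/6) × 0, which is approximately 0.25.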

---

## Decision Tree Splitting Criterion
During the construction of a decision tree:
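1. The algorithm evaluates the candidate splits at the current node (for example, each feature combined with each possible threshold).
2. The split with the lowest weighted Gini Index \( G_{\text{split}} \) is selected.
3. The procedure is repeated on each child node. Because the choice is greedy, it makes the child nodes as pure as possible at each step but does not guarantee a globally optimal tree.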
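
The steps above can be sketched for the simplest case: one numeric feature and a binary split of the form `value <= threshold`. This is an illustrative brute-force search under stated assumptions (midpoints between consecutive distinct values as candidate thresholds, hypothetical function names), not the optimized procedure a production library would use.

```python
from collections import Counter


def gini_index(labels):
    """Gini Index of a single node (empty nodes are assumed pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split `value <= threshold`.

    Returns (None, inf) when no split is possible, e.g. all values are equal.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal values cannot be separated by a threshold
        threshold = (low + high) / 2  # assumption: midpoint as candidate
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)
        if score < best[1]:
            best = (threshold, score)
    return best
```

For values `[1.0, 2.0, 3.0, 4.0]` with labels `["A", "A", "B", "B"]`, the search returns the threshold `2.5` with a weighted Gini Index of `0.0`, since that split separates the two classes completely.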

---

## Key Properties
1. **Range**:  
   - The Gini Index ranges from \( 0 \) (pure node) to \( 1 - \frac{1}{C} \) (maximum impurity, reached when all \( C \) classes are equally represented in the node).

2. **Interpretation**:  
   - \( G(t) = 0 \): All samples in the node belong to a single class (pure node).
   - Higher \( G(t) \): Indicates a more diverse distribution of classes.
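
To see where the upper bound comes from, consider a node in which all \( C \) classes are equally represented, so that \( p_i = \frac{1}{C} \) for every class. Substituting into the formula gives:

$$
G(t) = 1 - \sum_{i=1}^C \left( \frac{1}{C} \right)^2 = 1 - C \cdot \frac{1}{C^2} = 1 - \frac{1}{C}
$$

For two classes this is \( 0.5 \), which matches the impure-node example, and the bound approaches \( 1 \) as \( C \) grows.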

---

## Comparison with Entropy
| Metric         | Gini Index                          | Entropy                              |
|----------------|-------------------------------------|--------------------------------------|
| **Formula**    | \( 1 - \sum_i p_i^2 \)              | \( -\sum_i p_i \log_2 p_i \)         |
| **Complexity** | Computationally simpler (no logs)  | More computationally expensive (logs) |
| **Sensitivity**| Tends to isolate the most frequent class in its own branch | Tends to produce slightly more balanced trees; in practice the two criteria often select similar splits |
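
To make the comparison concrete, the short sketch below evaluates both measures for a two-class node at a few class proportions. The function names `gini` and `entropy` are illustrative; zero-probability classes are skipped in the entropy sum, following the usual convention that they contribute nothing.

```python
import math


def gini(probs):
    """Gini Index from a list of class proportions."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Entropy in bits; classes with p == 0 are skipped by convention."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Two-class node: p is the proportion of class A, 1 - p that of class B
for p in (0.5, 0.7, 0.9, 1.0):
    probs = [p, 1.0 - p]
    print(f"p_A={p:.1f}  gini={gini(probs):.3f}  entropy={entropy(probs):.3f}")
```

Both measures are largest at the 50/50 mix (Gini Index 0.5, entropy 1 bit) and fall to 0 for a pure node; they differ in scale and in the shape of the curve in between.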

---

## Conclusion
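The **Gini Index** is a widely used metric in decision trees to measure node impurity. It is simple and cheap to compute, and choosing the split with the lowest weighted Gini Index at each step steers the tree toward purer child nodes, although this greedy choice does not guarantee a globally optimal tree.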
