### Solution

#### 1. Derive the Splitting Criterion for a Decision Tree

##### a. Gini Impurity Definition

The Gini Impurity is a metric used to evaluate the impurity of a node in a decision tree. For a node with $k$ classes $C_1, C_2, \dots, C_k$, the Gini Impurity is defined as:
$$
\text{Gini}(p) = 1 - \sum_{i=1}^k p_i^2
$$
where:
- $p_i$ is the proportion of observations in class $C_i$ at the node.

The Gini Impurity ranges between $0$ (perfect purity) and $1 - \frac{1}{k}$ (maximum impurity, when all classes are equally represented).
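As a quick check on the definition, the formula can be sketched in a few lines of plain Python. The function name `gini_impurity` and the convention that an empty node counts as pure are assumptions made for this illustration only.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the share of class i in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# A node holding a single class is perfectly pure.
pure = gini_impurity(["a", "a", "a", "a"])
# Two equally represented classes reach the maximum 1 - 1/k with k = 2.
balanced = gini_impurity(["a", "b", "a", "b"])
print(pure, balanced)
```

With $k = 2$ equally represented classes, the result matches the upper bound $1 - \frac{1}{2}$ stated above.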

##### b. Gini Impurity for Splitting

When splitting a node into two child nodes $N_L$ and $N_R$, the total Gini Impurity after the split is a weighted average:
$$
\text{Gini}_{\text{split}} = \frac{n_L}{n} \text{Gini}(N_L) + \frac{n_R}{n} \text{Gini}(N_R)
$$
where:
- $n$: Total number of observations at the parent node.
- $n_L, n_R$: Number of observations in the left and right child nodes.
- $\text{Gini}(N_L)$: Gini Impurity of the left child node.
- $\text{Gini}(N_R)$: Gini Impurity of the right child node.
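The weighted average can be illustrated with a short sketch. The helper names `gini_impurity` and `gini_split`, and the example labels, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node; an empty node is treated as pure (assumption)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(left_labels, right_labels):
    """Weighted average of the child impurities, weighted by child size."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    if n == 0:
        raise ValueError("cannot score a split of an empty node")
    return (n_left / n) * gini_impurity(left_labels) + (
        n_right / n
    ) * gini_impurity(right_labels)


# Parent node: three observations of class 0 and three of class 1.
left = [0, 0, 0, 1]   # Gini = 1 - (9/16 + 1/16) = 0.375
right = [1, 1]        # Gini = 0 (pure)
parent_score = gini_impurity(left + right)
split_score = gini_split(left, right)  # (4/6) * 0.375 + (2/6) * 0
print(parent_score, split_score)
```

In this example the split lowers the impurity from the parent's value to a smaller weighted value, which is the behaviour a good split should show.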

##### c. Splitting Criterion

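To select the best split, we calculate $\text{Gini}_{\text{split}}$ for each candidate feature and threshold, and choose the split that minimizes $\text{Gini}_{\text{split}}$. Because the parent's impurity is fixed, this is equivalent to maximizing the impurity decrease $\text{Gini}(N) - \text{Gini}_{\text{split}}$, where $N$ is the parent node.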
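The search over features and thresholds might look like the following exhaustive sketch. It assumes numeric features, the rule "go left if value <= threshold", and candidate thresholds at midpoints between consecutive distinct values; these are common conventions rather than anything fixed by the text above, and `best_split` is a hypothetical name.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node; an empty node is treated as pure (assumption)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature_index, threshold, gini_split) minimizing the split impurity.

    X is a list of rows (lists of numbers), y the matching class labels.
    Returns None if no feature has at least two distinct values.
    """
    n = len(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Assumed convention: thresholds are midpoints between distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            score = (len(left) / n) * gini_impurity(left) + (
                len(right) / n
            ) * gini_impurity(right)
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
result = best_split(X, y)
print(result)
```

Here feature 0 separates the two classes perfectly at a threshold between 2.0 and 3.0, so it is selected over feature 1. Practical implementations usually sort each feature once and update class counts incrementally instead of rescanning the data for every threshold.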


#### 2. How Random Forest Improves Over a Single Decision Tree

##### a. Bootstrapping

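Random Forests use **bootstrapping** to train each Decision Tree on a different sample of the data. In bootstrapping:
- Each tree is trained on a bootstrap sample: observations drawn at random, with replacement, from the training data (typically as many draws as there are training observations, so some observations repeat and others are left out).
- This introduces diversity among trees and reduces the risk of overfitting, as each tree sees a slightly different dataset.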
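The sampling step can be sketched with the standard library alone. The function name `bootstrap_sample`, the fixed seed, and the choice to draw exactly as many rows as the dataset contains are assumptions for this illustration.

```python
import random


def bootstrap_sample(X, y, rng):
    """Draw len(X) row indices with replacement and return the resampled data."""
    n = len(X)
    indices = [rng.randrange(n) for _ in range(n)]
    X_boot = [X[i] for i in indices]
    y_boot = [y[i] for i in indices]
    return X_boot, y_boot, indices


rng = random.Random(0)  # seeded only to make the sketch repeatable
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

X_boot, y_boot, indices = bootstrap_sample(X, y, rng)

# Rows never drawn for this tree are its "out-of-bag" rows.
out_of_bag = sorted(set(range(len(X))) - set(indices))
print(indices, out_of_bag)
```

Because draws are made with replacement, some rows usually appear more than once while others are not drawn at all; the rows left out are what out-of-bag error estimates are computed on.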

##### b. Feature Bagging

Random Forests also use **feature bagging** (random feature selection):
- At each split, a random subset of features is considered for splitting rather than using all features.
- This ensures that individual trees do not over-rely on dominant features, enhancing diversity.
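A minimal sketch of the per-split feature sampling follows. The name `sample_features` and the default subset size of roughly the square root of the feature count are assumptions; the square-root rule is a common default for classification, not something the text above prescribes.

```python
import math
import random


def sample_features(n_features, rng, max_features=None):
    """Pick a random subset of feature indices to consider at one split."""
    if max_features is None:
        # Assumed default: about sqrt(n_features), at least one feature.
        max_features = max(1, int(math.sqrt(n_features)))
    # Sampling without replacement, so no feature is listed twice.
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(42)  # seeded only to make the sketch repeatable
candidates_split_1 = sample_features(9, rng)
candidates_split_2 = sample_features(9, rng)
print(candidates_split_1, candidates_split_2)
```

Each call stands for one split: only the returned feature indices would be searched for the best threshold, so a dominant feature is unavailable at many splits and other features get a chance to be used.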

##### c. Advantages of Random Forest

- **Reduced Variance**: Combining predictions from multiple trees (via averaging for regression or majority voting for classification) reduces variability, leading to more stable predictions.
- **Reduced Overfitting**: Decision Trees can overfit to training data, especially when deep. By averaging over multiple trees, Random Forests mitigate this risk.
- **Better Generalization**: The combination of bootstrapping and feature bagging helps Random Forests generalize better to unseen data.
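The aggregation step mentioned above (majority voting for classification, averaging for regression) can be sketched as follows. The function names and the per-tree predictions are made up for illustration; ties in the vote are broken in favour of the label seen first, which is an arbitrary choice.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine class predictions: one inner list per tree, one entry per sample."""
    return [
        Counter(votes).most_common(1)[0][0]  # ties go to the first label seen
        for votes in zip(*tree_predictions)
    ]


def average_predictions(tree_predictions):
    """Combine numeric predictions by averaging across trees."""
    return [sum(values) / len(values) for values in zip(*tree_predictions)]


# Hypothetical predictions from three trees on three samples.
class_preds = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
voted = majority_vote(class_preds)

# Hypothetical regression outputs from two trees on two samples.
averaged = average_predictions([[1.0, 2.0], [3.0, 4.0]])
print(voted, averaged)
```

A single tree's mistake (for example, the third tree on the first sample) is outvoted by the others, which is the mechanism behind the reduced variance.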

---

#### Comparison Summary

| **Aspect**               | **Decision Tree**                          | **Random Forest**                              |
|--------------------------|--------------------------------------------|------------------------------------------------|
| **Model Type**           | Single tree                                | Ensemble of trees                              |
| **Splitting Criterion**  | Gini Impurity or Entropy                   | Same as Decision Tree                          |
| **Training Data**        | Full dataset                               | One bootstrap sample per tree                  |
| **Features for Splits**  | All features considered at each split      | Random subset of features at each split        |
| **Overfitting**          | High risk for deep trees                   | Reduced due to ensembling                      |
| **Variance**             | High (model is sensitive to data changes)  | Lower (due to averaging predictions)           |
| **Performance**          | Often weaker on unseen data                | Generally better due to ensembling             |

##### By combining multiple trees, a Random Forest typically achieves higher accuracy and robustness than a single Decision Tree.

