# Random Forest
---
## 🔍 1. What is a Random Forest?

A **Random Forest** is an **ensemble learning method** that builds **multiple decision trees** and combines their outputs to improve performance and reduce overfitting.

* For **classification**, it takes the **majority vote**.
* For **regression**, it takes the **average** of predictions.

Think of it as the "wisdom of the crowd": many individually noisy trees, each trained on slightly different data, combine into a more stable and accurate prediction.

---

## 🌲 2. How Random Forest Works (Both Classifier & Regressor)

### Training Phase:

1. Draw **random bootstrap samples** from training data (sampling with replacement).
2. For each sample, grow a **Decision Tree**:

   * At each node, a **random subset of features** is considered (not all features).
   * Each tree is grown to full depth or until a stopping condition (such as `max_depth` or `min_samples_split`) is met.
3. Repeat for **N trees**.

### Prediction Phase:

* **Classification**: Each tree votes for a class → final prediction = majority vote.
* **Regression**: Each tree gives a number → final prediction = average of all outputs.
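
The training and prediction phases above can be sketched by hand for the classification case. This is a minimal illustration, not how any library implements it. It assumes scikit-learn's `DecisionTreeClassifier` as the base learner, integer class labels `0..K-1`, and the helper names `fit_forest` and `predict_forest`, which are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (same size as the data, drawn with replacement)
        idx = rng.integers(0, n_samples, size=n_samples)
        # Step 2: grow a tree that considers a random feature subset at each split
        tree = DecisionTreeClassifier(
            max_features="sqrt",
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Majority vote over the trees (assumes integer labels 0..K-1)."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)  # (n_trees, n_samples)
    n_classes = int(votes.max()) + 1
    # Count the votes per class for every sample: result has shape (n_classes, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)


# Synthetic data, used only to exercise the sketch
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
trees = fit_forest(X, y)
preds = predict_forest(trees, X)
train_accuracy = (preds == y).mean()
print(f"training accuracy: {train_accuracy:.3f}")
```

For regression the same loop applies with a regression tree as the base learner and `mean` in place of the vote count.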

---

## 🤖 3. Random Forest Classifier

### Outcome:

* Predicted class labels
* Class probabilities (the fraction of trees voting for each class; scikit-learn instead averages each tree's predicted class probabilities)

### Example Output:

```python
model.predict(X)       # e.g., ['dog', 'cat', 'dog']
model.predict_proba(X) # e.g., [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], columns ordered as ['cat', 'dog']
```
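
A runnable version of the calls above might look like the following sketch. The toy data and the `cat`/`dog` labels are invented for illustration. The point is that the columns of `predict_proba` follow `model.classes_`, and that `predict` returns the class with the highest probability.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data (made up for illustration): two features, two string labels
X = np.array([[1.0, 1.2], [0.9, 1.0], [1.1, 0.8],
              [3.0, 3.1], [3.2, 2.9], [2.9, 3.3]])
y = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

labels = model.predict(X)        # one class label per row of X
proba = model.predict_proba(X)   # shape (n_samples, n_classes), rows sum to 1

# Columns of proba are ordered like model.classes_ (sorted labels)
print(model.classes_)
print(labels)
print(proba.round(2))
```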

---

## 🧮 4. Random Forest Regression — Math Intuition & Formulas

Let’s say we want to predict a continuous value $\hat{y}$ for input $\mathbf{x}$.

You grow **$T$** decision trees, and tree $t$ produces its own prediction:

$$
\hat{y}_t = f_t(\mathbf{x}) \quad \text{for } t = 1 \text{ to } T
$$

### Final Prediction:

$$
\hat{y} = \frac{1}{T} \sum_{t=1}^T f_t(\mathbf{x})
$$

This is just the **average** of all predictions.
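
The averaging formula can be checked against scikit-learn directly. The sketch below assumes a fitted `RandomForestRegressor` on synthetic data, collects each tree's prediction $f_t(\mathbf{x})$ from `estimators_`, and compares the manual mean with the forest's own output.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only for this check
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)

# f_t(x): one row of predictions per tree, shape (T, n_samples)
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# y_hat = (1/T) * sum_t f_t(x)
manual_average = per_tree.mean(axis=0)

print(np.allclose(manual_average, forest.predict(X)))
```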

---

## 🧠 5. Random Forest — Key Ideas & Properties

### 🎲 Randomness from:

* **Bootstrap samples** (bagging)
* **Random feature selection** at each split

### ✅ Benefits:

* Handles high-dimensional data
* Reduces overfitting compared to single trees
* Works well for both classification and regression
* Can cope with missing data (via imputation or, in some implementations, native support) without a large loss of accuracy

### ❗ Drawbacks:

* Less interpretable than single trees
* Many trees mean more computation and memory for training and prediction
* Cannot extrapolate in regression: predictions stay within the range of the training targets
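
The extrapolation drawback is easy to see on a toy problem. In this sketch (the linear target $y = 2x$ is an assumption chosen for clarity), the forest is trained on $x \in [0, 10]$ and queried at $x = 20$. Each tree predicts an average of training targets, so the forest cannot exceed the largest target it has seen.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Training data: y = 2x on the interval [0, 10]
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Query far outside the training range; the true value would be 40
X_new = np.array([[20.0]])
prediction = forest.predict(X_new)[0]

print(f"prediction at x=20: {prediction:.2f}")
print(f"largest training target: {y_train.max():.2f}")
```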

---

## 📐 6. Important Hyperparameters

| Parameter           | Effect                                       |
| ------------------- | -------------------------------------------- |
| `n_estimators`      | Number of trees                              |
| `max_depth`         | Max depth of each tree                       |
| `max_features`      | Number of features to consider at each split |
| `min_samples_split` | Minimum samples to split an internal node    |
| `bootstrap`         | Whether to use bootstrap samples             |
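
As a rough illustration, the parameters in the table map onto scikit-learn's constructor as shown below. The specific values are arbitrary choices for the example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to fit the example model
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,      # number of trees
    max_depth=5,           # cap on the depth of each tree (None means no cap)
    max_features="sqrt",   # features considered at each split
    min_samples_split=4,   # a node needs at least 4 samples to be split
    bootstrap=True,        # train each tree on a bootstrap sample
    random_state=0,        # fixed seed so the run is repeatable
).fit(X, y)

# Inspect the fitted ensemble: tree count and the deepest tree actually grown
deepest = max(tree.get_depth() for tree in model.estimators_)
print(len(model.estimators_), deepest)
```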

---

## 📊 7. Metrics for Output Evaluation

### For Classifier:

* Accuracy
* Precision, Recall, F1-Score
* Confusion Matrix
* ROC-AUC

### For Regressor:

* Mean Squared Error (MSE):

  $$
  \text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2
  $$

* Mean Absolute Error (MAE)

* R² Score:

  $$
  R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
  $$
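
The regression formulas can be worked through on a tiny example. The four target and prediction values below are hypothetical, picked so the arithmetic is easy to verify by hand.

```python
import numpy as np

# Hypothetical targets and predictions, chosen so the arithmetic is easy to follow
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_pred                      # [1, 0, -1, -2]

mse = np.mean(residuals ** 2)                    # (1 + 0 + 1 + 4) / 4 = 1.5
mae = np.mean(np.abs(residuals))                 # (1 + 0 + 1 + 2) / 4 = 1.0
ss_res = np.sum(residuals ** 2)                  # 6
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # mean is 6, so 9 + 1 + 1 + 9 = 20
r2 = 1.0 - ss_res / ss_tot                       # 1 - 6/20 = 0.7

print(mse, mae, r2)
```

In practice, `mean_squared_error`, `mean_absolute_error`, and `r2_score` from `sklearn.metrics` compute the same quantities.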

---


| Concept                         | Math Intuition / Formula                                                  |
| ------------------------------- | ------------------------------------------------------------------------- |
| **Prediction (regression)**     | $\hat{y} = \frac{1}{T} \sum f_t(x)$                                       |
| **Prediction (classification)** | $\hat{y} = \text{mode}(f_1(x), f_2(x), \dots)$                            |
| **Variance reduction**          | $\text{Var}[\hat{y}] = \rho \sigma^2 + \frac{1 - \rho}{T} \sigma^2$       |
| **Bias-Variance Tradeoff**      | Slightly higher bias than a single tree, but much lower variance          |
| **Tree training**               | Greedy splits on bootstrapped data using random feature subsets           |
| **Why it generalizes well**     | Randomness + averaging → low correlation between trees → less overfitting |


![image.png](attachment:c21912cf-9195-4bca-9b83-20f0b69a7449.png)