# Random Forest

- Random forest is an ensemble learning method that combines many decision trees to make a final prediction.

- Each decision tree is trained on a bootstrap sample of the data and considers only a random subset of features at each split, which increases diversity among the trees and reduces overfitting.

- For classification, the final output is chosen by majority voting among all the trees.

- For regression, the final prediction is the average of all tree predictions.

- Random forest is usually more accurate and stable than a single decision tree because averaging many decorrelated trees reduces variance.

## 1. Important Terms in Random Forest 

1. **Root Node**
- The starting point of the tree, which receives the tree's entire training sample. The first split happens at this node.

2. **Splitting**
- The process of dividing data into smaller groups using conditions on feature values. Criteria such as Gini impurity and entropy (information gain) are used to select the best split.

3. **Decision Nodes**
- Intermediate nodes created by earlier splits. Each one tests a further condition on a feature and routes samples toward the leaf nodes.

4. **Leaf Nodes (Terminal Nodes)**
- These are the endpoints of the tree where no further splitting occurs. Final predictions are made here.

5. **Random Forest Context**
- In a random forest, many such trees are created and each tree uses these nodes. The final prediction is based on the combined decision of multiple trees.
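
The terms above can be made concrete with a small hand-built tree. This is a minimal sketch, not how any particular library stores its trees: the class names `DecisionNode` and `Leaf`, the feature names, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf (terminal) node: holds the final prediction, no further split."""
    prediction: str


@dataclass
class DecisionNode:
    """Root or decision node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: Union["DecisionNode", Leaf]   # taken when value <= threshold
    right: Union["DecisionNode", Leaf]  # taken when value > threshold


def predict(node, sample):
    """Walk from the root through decision nodes until a leaf is reached."""
    while isinstance(node, DecisionNode):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical tree: the root splits first, one decision node splits again.
root = DecisionNode(
    feature="flipper_length_mm",
    threshold=206.0,
    left=DecisionNode(
        feature="bill_length_mm",
        threshold=43.0,
        left=Leaf("Adelie"),
        right=Leaf("Chinstrap"),
    ),
    right=Leaf("Chinstrap"),
)

print(predict(root, {"flipper_length_mm": 190.0, "bill_length_mm": 39.0}))  # Adelie
```

A random forest would hold many such trees, each built from different data, and combine their individual predictions.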

## 2. Working of Random Forest

1. **Random Sampling with Replacement**
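- Each tree in the random forest is trained on a bootstrap sample: rows are drawn at random from the dataset with replacement, typically until the sample is as large as the original dataset, so some rows repeat and others are left out.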

2. **Feature Selection for Splits**
- At each split in a tree, only a random subset of features is considered. This reduces correlation between trees and improves accuracy.

3. **Best Split Selection**
- Each decision tree chooses the best split using measures like Gini Impurity or Information Gain to separate classes effectively.

4. **Bootstrap Aggregation (Bagging)**
- Random forest is an ensemble technique that uses bagging. Each tree makes a prediction and the final output is based on majority vote or averaging.

5. **Improved Stability and Accuracy**
- Combining multiple trees reduces variance and makes the model more stable compared to a single decision tree.
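
Step 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the helper name `bootstrap_sample` is hypothetical, the ten integer "rows" stand in for real training data, and the fixed seed is there only to make the run repeatable.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) indices with replacement.

    Returns the in-bag indices (duplicates allowed) and the indices
    that were never drawn, often called out-of-bag.
    """
    n = len(rows)
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed only so the sketch is repeatable
rows = list(range(10))   # stand-in for 10 training rows
in_bag, out_of_bag = bootstrap_sample(rows, rng)

print("in-bag indices:    ", in_bag)
print("out-of-bag indices:", out_of_bag)
```

Because sampling is with replacement, each tree sees some rows more than once and misses others entirely (roughly a third of the rows on average for large datasets), which is one source of the diversity between trees.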

### 2.1 Feature Selection in Random Forest

**Classification Problems**
- For classification, the usual default is to consider roughly the square root of the total number of features at each split.
For example, if there are 16 features, only sqrt(16) = 4 features are considered at each split.

**Regression Problems**
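- For regression, a common default (used, for example, by R's randomForest package) is one third of the total number of features. Defaults vary by library: recent versions of scikit-learn consider all features for regression unless `max_features` is set.
For example, if there are 15 features, only 15 / 3 = 5 features are considered at each split.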

**Purpose of Random Feature Selection**
- This randomness reduces correlations between trees and makes the ensemble more robust and accurate.
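
The two rules of thumb above can be written out directly. This is a minimal sketch: the function names `n_features_per_split` and `candidate_features` are hypothetical, and rounding down is an assumption for feature counts that are not perfect squares or multiples of three.

```python
import math
import random


def n_features_per_split(n_features, task):
    """Rule-of-thumb subset size: sqrt(p) for classification, p/3 for regression."""
    if task == "classification":
        return max(1, math.isqrt(n_features))  # floor of the square root
    if task == "regression":
        return max(1, n_features // 3)
    raise ValueError(f"unknown task: {task!r}")


def candidate_features(feature_names, task, rng):
    """Pick the random feature subset that one split is allowed to consider."""
    k = n_features_per_split(len(feature_names), task)
    return rng.sample(feature_names, k)


print(n_features_per_split(16, "classification"))  # 4
print(n_features_per_split(15, "regression"))      # 5

names = [f"f{i}" for i in range(16)]
print(candidate_features(names, "classification", random.Random(42)))
```

A fresh subset is drawn at every split, not once per tree, so different nodes of the same tree can consider different features.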

### 2.2 Bootstrap Aggregation (Bagging)

1. **Multiple Decision Trees**
- Random forest builds many decision trees. Each tree is trained on a different bootstrap sample, which means a random sample of the dataset taken with replacement.

2. **Different Splits and Paths**
- Because each tree gets different data, the splits chosen by the trees are different. This creates diversity in the model.

3. **Prediction from Each Tree**
- For a new input, every tree in the forest predicts a class label.
For example, one tree may predict Chinstrap, another may predict Adelie, and the remaining trees may vote either way.

4. **Majority Voting**
- The final prediction is based on majority vote among all the trees.
Example: If 2 trees predict Chinstrap and 1 predicts Adelie, the output becomes Chinstrap.

5. **Why Bagging Works**
- It reduces variance and limits overfitting by combining the predictions of multiple diverse trees, so the errors of individual trees tend to cancel out.
The ensemble is typically more accurate and stable than any single decision tree.
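
The aggregation step can be sketched as follows, using the vote from the example above. The function names are hypothetical. Plain majority voting is shown because that is what this section describes; some libraries instead average the class probabilities predicted by each tree.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Classification: return the label predicted by the most trees.

    On a tie, the label that appears first in the list wins.
    """
    return Counter(labels).most_common(1)[0][0]


def average_prediction(values):
    """Regression: return the mean of the individual tree predictions."""
    return mean(values)


tree_votes = ["Chinstrap", "Adelie", "Chinstrap"]
print(majority_vote(tree_votes))            # Chinstrap

tree_outputs = [3.0, 4.0, 5.0]              # made-up regression outputs
print(average_prediction(tree_outputs))     # 4.0
```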

### 2.3 Splitting Methods

**Gini Impurity**

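- Measures how often a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution.

- Value ranges from 0 (perfectly pure) up to 1 - 1/K for K equally mixed classes, so the maximum is 0.5 for two classes and approaches 1 only as the number of classes grows.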

- Lower Gini is preferred for splits.

**Information Gain**

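- Measures how much a split reduces uncertainty about the class, so the split with the highest gain is the most informative.

- Calculated as the entropy of the parent node minus the weighted average entropy of the child nodes.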

- Higher information gain means a better split.

**Entropy**

- Measures randomness or uncertainty in the data.

- High entropy means mixed classes; low entropy means purer nodes.

- Used in calculating information gain.
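
The three measures above can be computed directly from the class labels in a node. This is a small sketch using the standard definitions (Gini = 1 - sum of squared class proportions, entropy = -sum of p * log2(p)); the function names and the toy labels are made up for illustration.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of samples belonging to each class present in the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["Adelie"] * 4 + ["Chinstrap"] * 4   # evenly mixed node
left = ["Adelie"] * 4                          # pure child
right = ["Chinstrap"] * 4                      # pure child

print(gini(parent))                            # 0.5
print(entropy(parent))                         # 1.0
print(information_gain(parent, left, right))   # 1.0
```

The evenly mixed two-class node has the maximum two-class Gini impurity of 0.5 and entropy of 1 bit, and a split into two pure children recovers all of that entropy as information gain.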

**How Splitting Works**

- At each node, the algorithm evaluates candidate thresholds for each feature in the random subset chosen for that node, scoring them with Gini or Information Gain.

- The feature and threshold that give the largest purity improvement are selected for the split.
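
The search described above can be sketched as an exhaustive scan over features and thresholds. This is an illustrative sketch, not an optimized implementation: the name `best_split` is hypothetical, thresholds are assumed to be midpoints between consecutive distinct values, and the toy data is made up so that only feature 0 separates the classes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_indices):
    """Return (feature, threshold, weighted_gini) for the purest split found.

    Only the features listed in feature_indices are examined. Returns None
    if no feature has at least two distinct values.
    """
    best = None
    n = len(rows)
    for f in feature_indices:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            # Size-weighted impurity of the two children; lower is better.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Made-up toy data: feature 0 separates the classes, feature 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (10.0, 2.0), (11.0, 5.0), (12.0, 1.0)]
labels = ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Chinstrap"]

print(best_split(rows, labels, feature_indices=[0, 1]))  # (0, 6.5, 0.0)
```

In a random forest, `feature_indices` would be the random feature subset drawn for that node, so a strong feature such as feature 0 here is sometimes unavailable and the tree has to settle for the best split among the remaining features.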