# Random Forest | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2042%20Random%20Forest)

Random Forest is an ensemble learning method used for classification and regression. It builds multiple decision trees and combines their outputs to make a more accurate and stable prediction.

## Key Concepts

1. **Ensemble Method**: Random Forest is an ensemble method because it combines predictions from multiple models (the decision trees) to create a stronger model.

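2. **Bootstrapping**: Each decision tree is trained on a bootstrap sample, which is drawn from the training data with replacement and is usually the same size as the original set. Some rows appear several times and others (on average about a third) are left out, so every tree sees a slightly different version of the data.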

3. **Random Feature Selection**: At each node, only a random subset of the features is considered when choosing the split. This makes the trees less similar to one another (decorrelates them), so combining them reduces variance and helps limit overfitting.

4. **Voting or Averaging**:
   - **For Classification**: The final prediction is based on the majority vote from all the trees. 
   - **For Regression**: The final prediction is the average of the predictions from all the trees.
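The two sources of randomness can be sketched with NumPy. This is a minimal illustration, assuming a dataset of 150 rows and 4 features (the shape of the Iris data used later). Using the square root of the number of features as the subset size is a common default for classification, not a fixed rule, and real implementations draw a new feature subset at every split.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Assumed dataset shape for illustration (matches Iris: 150 rows, 4 features)
n_samples, n_features = 150, 4

# Bootstrapping: draw n_samples row indices *with replacement*
boot_idx = rng.integers(0, n_samples, size=n_samples)

# Rows that were never drawn are "out-of-bag" for this tree
oob_mask = np.ones(n_samples, dtype=bool)
oob_mask[boot_idx] = False

# Random feature selection for one split: pick sqrt(n_features) features
# without replacement (a common default, assumed here)
k = max(1, int(np.sqrt(n_features)))
split_features = rng.choice(n_features, size=k, replace=False)

print("Unique rows in bootstrap sample:", len(np.unique(boot_idx)))
print("Out-of-bag rows:", oob_mask.sum())
print("Features considered at this split:", split_features)
```

Roughly 63% of the rows end up in each bootstrap sample, which is where the "about a third left out" figure comes from.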

## Formula for Random Forest

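For **classification**, where $h_b(x)$ is the class predicted by tree $b$ for input $x$ and $B$ is the number of trees:
```math
\hat{y}(x) = \operatorname{mode}\{h_1(x), h_2(x), \dots, h_B(x)\}
```

For **regression**, where $h_b(x)$ is the numeric prediction of tree $b$:
```math
\hat{y}(x) = \frac{1}{B} \sum_{b=1}^{B} h_b(x)
```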
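A minimal NumPy sketch of both aggregation rules, using made-up predictions from hypothetical trees rather than trained models. Note that scikit-learn's `RandomForestClassifier` does not use hard majority voting: it averages the trees' predicted class probabilities and picks the class with the highest mean probability.

```python
import numpy as np

# Made-up class predictions from 5 hypothetical trees (rows) for 3 samples (columns)
class_preds = np.array([
    [0, 1, 2],
    [0, 1, 1],
    [1, 1, 2],
    [0, 2, 2],
    [0, 1, 2],
])

# Classification: majority vote over the trees for each sample (column)
majority = np.array([np.bincount(col).argmax() for col in class_preds.T])
print("Majority vote:", majority)  # → [0 1 2]

# Made-up numeric predictions from 4 hypothetical trees for 2 samples
reg_preds = np.array([
    [2.0, 3.5],
    [2.5, 3.0],
    [1.5, 4.0],
    [2.0, 3.5],
])

# Regression: average over the trees for each sample
average = reg_preds.mean(axis=0)
print("Average:", average)  # → [2.  3.5]
```

Ties in the vote are broken here in favor of the smallest class label, because `argmax` returns the first maximum; other implementations may break ties differently.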

## Random Forest Algorithm Steps

1. **Create Bootstrapped Data**: Create multiple subsets of the original dataset using bootstrapping.
2. **Build Decision Trees**: Train a decision tree on each bootstrapped dataset, considering only a random subset of the features at each split.
3. **Combine Predictions**:
   - For classification, use majority voting from all trees.
   - For regression, use the average of all trees' predictions.
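The steps above can be sketched from scratch by bagging scikit-learn's `DecisionTreeClassifier`. This is an illustrative sketch, not how `RandomForestClassifier` is implemented internally; the helper names `fit_forest` and `predict_forest` are hypothetical, and `max_features="sqrt"` is what gives each split a random feature subset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train n_trees trees on bootstrap samples."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample, same size as X, drawn with replacement
        idx = rng.integers(0, len(X), size=len(X))
        # Step 2: grow a tree that considers a random subset of features per split
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hypothetical helper: Step 3, majority vote across the trees."""
    all_preds = np.array([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

trees = fit_forest(X_train, y_train)
y_pred = predict_forest(trees, X_test)
accuracy = (y_pred == y_test).mean()
print(f"Accuracy of hand-rolled forest: {accuracy:.2%}")
```

The vote uses `np.bincount`, so this sketch assumes the class labels are non-negative integers, as they are for Iris.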

## Python Code Example

Here is an example of how to implement a Random Forest classifier in Python using scikit-learn (`sklearn`):

```python
# Importing necessary libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Loading the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initializing and training the Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Making predictions
y_pred = rf_classifier.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Random Forest Classifier: {accuracy * 100:.2f}%")
```
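Because each tree is trained on a bootstrap sample, the rows it never saw can serve as a built-in validation set. A sketch using scikit-learn's `oob_score` option on the same Iris split as above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# oob_score=True scores each training row using only the trees
# whose bootstrap sample did not contain that row
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf.fit(X_train, y_train)

print(f"Out-of-bag accuracy: {rf.oob_score_:.2%}")
print(f"Test accuracy:       {rf.score(X_test, y_test):.2%}")
```

The out-of-bag estimate is computed from the training data alone, so it is handy when a separate test set is expensive to hold out; it will not match the test accuracy exactly.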

## Advantages of Random Forest
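- **Robust**: Random Forest is much less prone to overfitting than a single decision tree, because averaging many decorrelated trees reduces variance. Adding more trees does not by itself cause overfitting, although individual trees can still fit noise in the data.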
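- **Missing Values**: Some implementations can work with missing data directly (Breiman's original method imputes values using tree proximities, and scikit-learn 1.4 and later can split on features containing `NaN`). Otherwise, missing values must be imputed before training.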
- **Versatile**: It can be used for both classification and regression problems.
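For regression, scikit-learn provides `RandomForestRegressor`. A minimal sketch on the bundled diabetes dataset (chosen here only for illustration), which also checks that the forest's prediction equals the average over its fitted trees in `estimators_`:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Small regression dataset bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print(f"R^2 on the test set: {r2_score(y_test, y_pred):.3f}")

# The forest's prediction is the mean of the individual trees' predictions
tree_mean = np.mean([tree.predict(X_test) for tree in reg.estimators_], axis=0)
print("Matches the average of the trees:", np.allclose(tree_mean, y_pred))
```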

## Disadvantages of Random Forest
- **Complexity**: With many trees, the model is much harder to interpret than a single decision tree, whose decision path can be read directly.
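- **Computationally Expensive**: Training time, prediction time, and memory all grow with the number of trees, which matters for large datasets. Because the trees are built independently, training can be parallelized (for example with `n_jobs` in scikit-learn).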
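A rough way to see the cost is to count the nodes the model stores as the number of trees grows. This sketch uses the Iris data and scikit-learn's `n_jobs=-1` to train the independent trees in parallel; the exact node counts depend on the data and the random seed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

node_counts = {}
for n_trees in (10, 100):
    # n_jobs=-1 builds the trees in parallel on all available CPU cores,
    # which works because each tree is trained independently
    rf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1, random_state=0)
    rf.fit(X, y)
    # Total number of nodes stored across all fitted trees
    node_counts[n_trees] = sum(tree.tree_.node_count for tree in rf.estimators_)
    print(f"{n_trees:>3} trees -> {node_counts[n_trees]} nodes stored")
```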

## Hyperparameters to Tune
- `n_estimators`: Number of trees in the forest.
- `max_depth`: Maximum depth of each tree.
- `min_samples_split`: Minimum number of samples required to split an internal node.
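- `min_samples_leaf`: Minimum number of samples required to be at a leaf node.
- `max_features`: Number of features considered when looking for the best split (the random feature subset at each node).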
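One common way to tune these hyperparameters is an exhaustive grid search with cross-validation. A small sketch with scikit-learn's `GridSearchCV` on the Iris data; the candidate values in `param_grid` are arbitrary illustrations, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative candidate values for each hyperparameter
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Every combination is scored with 5-fold cross-validation on the training set
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.2%}")
print(f"Test accuracy: {search.score(X_test, y_test):.2%}")
```

The grid size multiplies quickly (here 2 × 2 × 2 × 2 = 16 combinations, each fitted 5 times), so `RandomizedSearchCV` is often preferred for larger search spaces.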