# Random Forest
#### Random Forest is an ensemble learning algorithm that combines multiple decision trees using bagging and random feature selection.

#### It is used for:
<ul>
    <li>Classification</li>
    <li>Regression</li>
</ul>

---

## How Random Forest Works
#### Random Forest builds many decision trees in three steps:
#### Step 1: Bootstrap Sampling (Bagging)
<ul>
    <li>Each tree is trained on a random sample of the training data, drawn with replacement</li>
</ul>

#### Step 2: Random Feature Selection
<ul>
    <li>At each split, only a random subset of features is considered</li>
</ul>

#### Step 3: Aggregation
<ul>
    <li>Classification → Majority voting</li>
    <li>Regression → Averaging</li>
</ul>
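
#### The three steps above can be sketched by hand with single decision trees.
#### This is a minimal illustration, not how sklearn implements its forest: the synthetic data, the tree count and the names `trees` and `forest_pred` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, used only for illustration
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
trees = []
for i in range(n_trees):
    # Step 1: bootstrap sample (same size as the training set, drawn with replacement)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: max_features="sqrt" makes the tree consider a random feature subset at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Step 3: majority vote across trees (labels are 0/1 here)
all_preds = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
forest_pred = (all_preds.mean(axis=0) > 0.5).astype(int)
accuracy = (forest_pred == y_test).mean()
print("Hand-built forest accuracy:", accuracy)
```

#### Note: sklearn's RandomForestClassifier averages the trees' predicted class probabilities instead of counting hard votes, so its predictions can differ slightly from this sketch.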

---

## Differences Between Bagging and Random Forest
<ul>
    <li>Random Forest uses only decision trees as the base model, whereas bagging can use any base estimator.</li>
    <li>Random Forest samples features (columns) at the node level, for every split, whereas bagging samples features at the tree level, once per base model.</li>
</ul>
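
#### The feature sampling difference can be observed in sklearn.
#### A minimal sketch, assuming synthetic data with 10 features and `max_features=5` in both models (both values chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10 columns, used only for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging: each tree receives one fixed random subset of 5 columns (tree level)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        max_features=5, random_state=0).fit(X, y)

# Random Forest: each tree receives all 10 columns and draws a new
# random subset of 5 candidates at every split (node level)
rf = RandomForestClassifier(n_estimators=10, max_features=5, random_state=0).fit(X, y)

print("Columns given to the first bagged tree:", bag.estimators_features_[0])
print("Features seen by a bagged tree:", bag.estimators_[0].n_features_in_)
print("Features seen by a forest tree:", rf.estimators_[0].n_features_in_)
```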


---

## Hyperparameter Tuning in Random Forest using GridSearchCV and RandomizedSearchCV

### Why Is Hyperparameter Tuning Needed?
#### Random Forest has many hyperparameters like:
<ul>
    <li>n_estimators</li>
    <li>max_depth</li>
    <li>max_features</li>
    <li>min_samples_split</li>
    <li>min_samples_leaf</li>
</ul>

#### Default values often work well, but proper tuning can improve performance and reduce overfitting.

---

## Hyperparameter Tuning with GridSearchCV
#### GridSearchCV tries ALL possible combinations of the given hyperparameter values.
#### It:
<ul>
    <li>Runs cross-validation for every combination</li>
    <li>Selects the combination with the best mean cross-validation score</li>
</ul>

### How It Works
#### If:
```
n_estimators = [100, 200]
max_depth = [5, 10]
```
#### Total combinations:
```
2 × 2 = 4 models
```
#### Each combination is evaluated with cross-validation.
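
#### The count can be checked by enumerating the grid with the standard library.
#### A small sketch, assuming 5-fold cross-validation (`cv = 5` is a value picked for the example):

```python
from itertools import product

param_grid = {"n_estimators": [100, 200], "max_depth": [5, 10]}
cv = 5  # assumed number of cross-validation folds

# Cartesian product of all value lists: one dict per candidate model
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]
for combo in combinations:
    print(combo)

# Every combination is fitted once per fold
total_fits = len(combinations) * cv
print(len(combinations), "combinations,", total_fits, "cross-validation fits")
```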

### Code Example (GridSearchCV)
```
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rf = RandomForestClassifier(random_state=42)

param_grid = {
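    'n_estimators': [100, 200, 300],  # number of trees in the forest
    'max_depth': [None, 5, 10],  # maximum tree depth (None = no depth limit)
    'max_features': ['sqrt', 'log2'],  # number of features considered at each split
    'min_samples_split': [2, 5]  # minimum samples required to split a node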
}

grid = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,  # 5-fold cross validation
    scoring='accuracy',
    n_jobs=-1
)

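# X_train and y_train are assumed to be defined already (training features and labels)
grid.fit(X_train, y_train)

# Best combination and its mean cross-validated accuracy
print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)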
```
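
#### The example above expects `X_train` and `y_train` to exist.
#### A self-contained sketch, assuming the iris dataset and a deliberately small grid so that it runs quickly (both are choices made for illustration), which also checks the tuned model on held-out data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Iris data, used only for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Small grid: 2 x 2 = 4 combinations
small_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=3,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# best_estimator_ is refit on the full training set (refit=True is the default)
test_accuracy = grid.best_estimator_.score(X_test, y_test)

print("Best Parameters:", grid.best_params_)
print("Best CV Score:", grid.best_score_)
print("Test Accuracy:", test_accuracy)
```

#### The test set plays no part in the search, so its accuracy is a fairer estimate than `best_score_`.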

---

## RandomizedSearchCV
#### Instead of trying ALL combinations, RandomizedSearchCV randomly samples a fixed number of parameter combinations (set by n_iter).

### Why Use It?
#### It is a good fit when an exhaustive grid search would be too expensive, for example with:
<ul>
    <li>Many hyperparameters</li>
    <li>Large dataset</li>
    <li>Large parameter ranges</li>
</ul>
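
#### A rough cost comparison shows why sampling helps.
#### The sketch below assumes the integer ranges from the RandomizedSearchCV code example in this section, with `cv = 5` and `n_iter = 20`, and counts how many fits an exhaustive grid over the same values would need:

```python
from math import prod

# Number of distinct values per hyperparameter
# (randint upper bounds are exclusive, so range() gives the same counts)
n_values = {
    "n_estimators": len(range(100, 500)),    # 400 values
    "max_depth": len(range(3, 20)),          # 17 values
    "min_samples_split": len(range(2, 10)),  # 8 values
    "min_samples_leaf": len(range(1, 5)),    # 4 values
    "max_features": 3,                       # 'sqrt', 'log2', None
}
cv = 5
n_iter = 20

grid_fits = prod(n_values.values()) * cv  # exhaustive search
random_fits = n_iter * cv                 # randomized search

print("Exhaustive grid fits:", grid_fits)
print("Randomized search fits:", random_fits)
```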

### Code Example (RandomizedSearchCV)
```
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

rf = RandomForestClassifier(random_state=42)

param_dist = {
    # Number of trees in the forest (random integer from 100 to 499; the upper bound is exclusive)
    'n_estimators': randint(100, 500),

    # Maximum tree depth, 3 to 19 (shallower trees reduce overfitting)
    'max_depth': randint(3, 20),

    # Minimum samples required to split a node, 2 to 9
    'min_samples_split': randint(2, 10),

    # Minimum samples required at a leaf node, 1 to 4
    'min_samples_leaf': randint(1, 5),

    # Number of features considered at each split
    # 'sqrt' = sqrt(total features)
    # 'log2' = log2(total features)
    # None = use all features
    'max_features': ['sqrt', 'log2', None]
}

# Create RandomizedSearchCV object
random_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_dist,
    n_iter=20,  # Number of random parameter combinations to try
    cv=5,  # 5-fold cross-validation
    scoring='accuracy',  # Metric used to evaluate performance
    random_state=42,  # Reproducibility
    n_jobs=-1  # Use all CPU cores for faster computation
)

# Perform hyperparameter tuning (X_train and y_train are assumed to be defined already)
random_search.fit(X_train, y_train)

# Print best parameter combination found
print("Best Parameters:", random_search.best_params_)

# Print best cross-validation accuracy score
print("Best Score:", random_search.best_score_)

```
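
#### One detail about `scipy.stats.randint(low, high)` is easy to miss: `high` itself is never drawn.
#### A quick check, assuming 10,000 draws is enough to see the range in practice:

```python
from scipy.stats import randint

# Same distribution as the n_estimators entry in the example above
dist = randint(100, 500)
samples = dist.rvs(size=10_000, random_state=0)

# Values fall in [100, 499]; 500 is excluded
print("Smallest draw:", samples.min())
print("Largest draw:", samples.max())
```

#### To include 500 as a possible value, the upper bound would have to be 501.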

---

## OOB (Out-Of-Bag)
#### OOB score is an internal validation method used in:
<ul>
    <li>Bagging</li>
    <li>Random Forest</li>
    <li>Extra Trees (only with bootstrap=True, which is not the sklearn default)</li>
</ul>

#### It estimates model performance on unseen data without a separate validation set: each training sample is predicted using only the trees whose bootstrap sample did not include it.
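
#### OOB validation is possible because a bootstrap sample leaves out part of the training data.
#### A small simulation, assuming 10,000 training samples (an arbitrary size), estimates how large that part is:

```python
import random

random.seed(0)
n_samples = 10_000

# One bootstrap sample: n_samples indices drawn with replacement
in_bag = {random.randrange(n_samples) for _ in range(n_samples)}

# Samples that were never drawn are "out-of-bag" for this tree
oob_fraction = 1 - len(in_bag) / n_samples
expected = (1 - 1 / n_samples) ** n_samples  # approaches 1/e, about 0.368

print(f"Simulated out-of-bag fraction: {oob_fraction:.3f}")
print(f"Expected out-of-bag fraction:  {expected:.3f}")
```

#### So each tree leaves roughly a third of the training samples unused, and those samples act as its validation data.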
#### How to Use OOB in sklearn
```
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,  # OOB needs bootstrap sampling (this is the default)
    oob_score=True,  # compute the OOB score during fit
    random_state=42
)

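# X_train and y_train are assumed to be defined already
rf.fit(X_train, y_train)

# For a classifier, this is the accuracy on the out-of-bag samples
print("OOB Score:", rf.oob_score_)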

```
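
#### To see how the OOB score relates to a held-out estimate, both can be computed side by side.
#### A minimal sketch, assuming synthetic data and a 75/25 split (chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf.fit(X_train, y_train)

# OOB accuracy uses only training data; test accuracy uses the held-out split
test_accuracy = rf.score(X_test, y_test)
print("OOB Score:", rf.oob_score_)
print("Test Accuracy:", test_accuracy)
```

#### The two numbers are usually close but not identical, since they are computed on different samples.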