# Out-of-Bag (OOB) Evaluation 📊

## Core Concept 🎯

**Out-of-Bag (OOB) evaluation** is a clever validation technique that comes "free" with bootstrap-based ensemble methods like Random Forests. It uses the samples **not selected** during bootstrap sampling as a natural validation set.

> *"Getting validation for free - no need to split your data!"*

---

## Mathematical Foundation 📐

### Bootstrap Sampling Probability

For a dataset with $n$ samples, the probability that a specific sample is **NOT selected** in one bootstrap draw:

$$P(\text{not selected once}) = 1 - \frac{1}{n}$$

A bootstrap sample consists of $n$ independent draws with replacement, so:

$$P(\text{not selected in bootstrap}) = \left(1 - \frac{1}{n}\right)^n$$

### Asymptotic Behavior

As $n \to \infty$:

$$\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = \frac{1}{e} \approx 0.368$$

**Key Insight**: on average ~36.8% of the original samples are **out-of-bag** for each bootstrap sample (the other ~63.2% appear in-bag at least once)!
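The following NumPy sketch checks both the limit and the ~36.8% figure numerically by simulating bootstrap samples. The sample size, number of bootstrap repetitions, and seed are arbitrary choices.

```python
import numpy as np

# (1 - 1/n)^n approaches 1/e as n grows
for n in (10, 100, 1000, 100_000):
    print(f"n={n:>7}: (1 - 1/n)^n = {(1 - 1 / n) ** n:.4f}")
print(f"1/e         = {1 / np.e:.4f}")

# Monte Carlo check: draw many bootstrap samples and measure the OOB fraction
rng = np.random.default_rng(0)
n, n_boot = 1000, 200
oob_fractions = []
for _ in range(n_boot):
    draws = rng.integers(0, n, size=n)        # n draws with replacement
    n_unique = np.unique(draws).size          # distinct samples that were drawn
    oob_fractions.append(1 - n_unique / n)    # fraction never drawn (out-of-bag)
mean_oob_fraction = float(np.mean(oob_fractions))
print(f"simulated OOB fraction = {mean_oob_fraction:.3f}")
```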

---

## OOB Sets Definition 🎲

### For Each Bootstrap Sample $b$:

**In-Bag Set**: $\mathcal{I}_b = \{i : \text{sample } i \text{ is drawn at least once in bootstrap } b\}$

**Out-of-Bag Set**: $\mathcal{O}_b = \{1, \dots, n\} \setminus \mathcal{I}_b$

### For Each Original Sample $i$:

**OOB Trees for sample $i$**: $\mathcal{T}_i^{\text{OOB}} = \{b : i \notin \mathcal{I}_b\}$

Sample $i$ is **out-of-bag** for tree $b$ if $i \notin \mathcal{I}_b$
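The sketch below builds these sets for a tiny example. It represents samples by the indices $0, \dots, n-1$ (Python's zero-based convention rather than $1, \dots, n$). The sizes, the seed, and the names `in_bag_sets`, `oob_sets` and `oob_trees` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n, B = 8, 5  # tiny example: 8 samples, 5 bootstrap samples (one per tree)

in_bag_sets = []   # I_b: indices drawn at least once for tree b
oob_sets = []      # O_b: indices never drawn for tree b
for b in range(B):
    draws = rng.integers(0, n, size=n)          # n draws with replacement
    in_bag = set(draws.tolist())
    in_bag_sets.append(in_bag)
    oob_sets.append(set(range(n)) - in_bag)

# T_i^OOB: the trees for which sample i is out-of-bag
oob_trees = {i: [b for b in range(B) if i not in in_bag_sets[b]] for i in range(n)}

for b in range(B):
    print(f"tree {b}: in-bag={sorted(in_bag_sets[b])}, OOB={sorted(oob_sets[b])}")
for i, trees in oob_trees.items():
    print(f"sample {i}: OOB for trees {trees}")
```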

---

## OOB Prediction Process 🔄

### Step-by-Step Algorithm

1. **Train Forest**: Build $B$ trees using bootstrap samples
2. **Identify OOB Trees**: For each sample $i$, find trees that didn't see it during training
3. **Make OOB Predictions**: Use only OOB trees to predict sample $i$
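4. **Aggregate**: Combine each sample's OOB-tree outputs (vote or average) into its OOB prediction, then compare these predictions with the true labels to get the OOB error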

### Mathematical Formulation

For sample $i$, the **OOB prediction** is:

**Classification** (Majority Vote):
$$\hat{y}_i^{\text{OOB}} = \text{mode}\{h_b(x_i) : b \in \mathcal{T}_i^{\text{OOB}}\}$$

**Regression** (Average):
$$\hat{y}_i^{\text{OOB}} = \frac{1}{|\mathcal{T}_i^{\text{OOB}}|} \sum_{b \in \mathcal{T}_i^{\text{OOB}}} h_b(x_i)$$

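**Probabilistic Classification** (soft voting; scikit-learn's `RandomForestClassifier` averages these probabilities and predicts the class with the highest average):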
$$P(y_i = c)^{\text{OOB}} = \frac{1}{|\mathcal{T}_i^{\text{OOB}}|} \sum_{b \in \mathcal{T}_i^{\text{OOB}}} P_b(y_i = c | x_i)$$
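The four steps can be sketched from scratch by bagging scikit-learn `DecisionTreeClassifier`s by hand. This is an illustration only: a real `RandomForestClassifier` does this internally. The sketch computes both the majority-vote and the averaged-probability OOB predictions. The synthetic dataset, `B = 100`, and `max_features="sqrt"` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
n, B = len(y), 100
n_classes = len(np.unique(y))      # labels are 0..n_classes-1 here
rng = np.random.default_rng(0)

# Step 1: train B trees, each on its own bootstrap sample
trees, in_bag = [], []
for b in range(B):
    idx = rng.integers(0, n, size=n)                  # n draws with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
    trees.append(tree.fit(X[idx], y[idx]))
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True                                  # True = in-bag for tree b
    in_bag.append(mask)

# Steps 2-3: each tree predicts only the samples that are out-of-bag for it
votes = np.zeros((n, n_classes))        # hard votes per class
proba_sum = np.zeros((n, n_classes))    # summed class probabilities
n_oob_trees = np.zeros(n, dtype=int)    # |T_i^OOB| for each sample
for tree, mask in zip(trees, in_bag):
    rows = np.flatnonzero(~mask)
    if rows.size == 0:
        continue
    votes[rows, tree.predict(X[rows])] += 1
    proba_sum[np.ix_(rows, tree.classes_)] += tree.predict_proba(X[rows])
    n_oob_trees[rows] += 1

# Step 4: aggregate, skipping any sample that was never out-of-bag
covered = n_oob_trees > 0
oob_hard = votes[covered].argmax(axis=1)                        # majority vote
oob_proba = proba_sum[covered] / n_oob_trees[covered][:, None]  # averaged probabilities
oob_soft = oob_proba.argmax(axis=1)                             # soft vote
acc_hard = float(np.mean(oob_hard == y[covered]))
acc_soft = float(np.mean(oob_soft == y[covered]))
print(f"OOB accuracy, majority vote:   {acc_hard:.3f}")
print(f"OOB accuracy, averaged probas: {acc_soft:.3f}")
```

Keeping the in-bag masks explicitly is what makes $\mathcal{T}_i^{\text{OOB}}$ available here. Scikit-learn does this bookkeeping internally.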

---

## OOB Error Calculation 📏

### Classification Error

**OOB Accuracy**:
$$\text{OOB Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}[y_i = \hat{y}_i^{\text{OOB}}]$$

**OOB Error Rate**:
$$\text{OOB Error} = 1 - \text{OOB Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}[y_i \neq \hat{y}_i^{\text{OOB}}]$$
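Here is a minimal sketch of the classification formulas. The arrays are made-up toy values and `oob_accuracy_and_error` is a hypothetical helper. The sketch drops samples that were never out-of-bag, because their OOB prediction is undefined. That choice is an assumption; scikit-learn instead warns when some inputs have no OOB score.

```python
import numpy as np

def oob_accuracy_and_error(y_true, oob_pred, n_oob_trees):
    """OOB accuracy and error rate over samples with at least one OOB tree."""
    y_true = np.asarray(y_true)
    oob_pred = np.asarray(oob_pred)
    covered = np.asarray(n_oob_trees) > 0          # samples with a defined OOB prediction
    accuracy = float(np.mean(y_true[covered] == oob_pred[covered]))
    return accuracy, 1.0 - accuracy                # error = 1 - accuracy

y_true = np.array([1, 0, 1, 1, 0])
oob_pred = np.array([1, 0, 0, 1, 0])
n_oob = np.array([40, 35, 38, 0, 37])   # sample 3 was never out-of-bag
acc, err = oob_accuracy_and_error(y_true, oob_pred, n_oob)
print(acc, err)  # 0.75 0.25
```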

### Regression Error

**OOB Mean Squared Error**:
$$\text{OOB MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i^{\text{OOB}})^2$$

**OOB R² Score**:
$$\text{OOB } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i^{\text{OOB}})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
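The regression formulas can be checked against scikit-learn. In `RandomForestRegressor`, `oob_prediction_` holds $\hat{y}_i^{\text{OOB}}$, and with `oob_score=True` the fitted `oob_score_` is the OOB $R^2$. This sketch uses synthetic data from `make_regression`; the sizes and noise level are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

y_oob = rf.oob_prediction_                      # per-sample OOB predictions
oob_mse = np.mean((y - y_oob) ** 2)             # OOB MSE, by the formula above
oob_r2 = 1 - np.sum((y - y_oob) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"OOB MSE: {oob_mse:.2f}")
print(f"OOB R^2 (manual): {oob_r2:.4f}   rf.oob_score_: {rf.oob_score_:.4f}")
```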

---

## Implementation Details 💻

### Scikit-learn Usage

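```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data so the snippet runs on its own
X, y = make_classification(n_samples=500, random_state=42)

# Enable OOB scoring
rf = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,      # Required for OOB (already the default)
    oob_score=True,      # Enable OOB calculation
    random_state=42
)

rf.fit(X, y)

# Access OOB results
oob_score = rf.oob_score_                # OOB accuracy (classification)
oob_proba = rf.oob_decision_function_    # Per-sample OOB class probabilities

# For RandomForestRegressor, oob_score_ is the OOB R^2 and
# oob_prediction_ holds the per-sample OOB predictions.
```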

### Requirements

| Parameter | Value | Why Required |
|-----------|-------|--------------|
| `bootstrap` | `True` | Need bootstrap sampling for OOB concept |
| `oob_score` | `True` | Enable OOB calculation |
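| `n_estimators` | Large enough | Every sample needs at least one OOB tree; a sample is never OOB with probability $\approx 0.632^B$, and scikit-learn warns when some inputs have no OOB score |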
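The following check of the `bootstrap` requirement uses synthetic data. Scikit-learn refuses to compute an OOB score when bootstrapping is disabled and raises a `ValueError` at fit time. The exact message text may differ between versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Turning bootstrapping off leaves no out-of-bag samples to score
rf = RandomForestClassifier(bootstrap=False, oob_score=True, random_state=0)
try:
    rf.fit(X, y)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```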

---

## OOB vs Traditional Validation 🆚

### Comparison Table

| Aspect | OOB Evaluation | Traditional Validation |
|--------|----------------|----------------------|
| **Data Usage** | Uses full dataset | Splits data (e.g., 80/20) |
| **Sample Size** | $n$ for training | Reduced training set |
| **Validation Size** | ~37% per tree | Fixed validation set |
| **Computational Cost** | No retraining; only extra OOB predictions | Requires separate evaluation |
| **Variance** | Higher (different OOB sets) | Lower (fixed validation) |
| **Bias** | Lower (more data) | Higher (less training data) |

### Mathematical Relationship

**Expected number of OOB trees per sample**:
$$\mathbb{E}\left[|\mathcal{T}_i^{\text{OOB}}|\right] = B\left(1 - \frac{1}{n}\right)^n \approx \frac{B}{e} \approx 0.368\,B$$
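As a worked example, take the heart-disease notebook at the end of this page. Assuming the standard 303-row `heart.csv`, the 80/20 split leaves $n = 242$ training rows, and `RandomForestClassifier` defaults to $B = 100$ trees:

$$\mathbb{E}\left[|\mathcal{T}_i^{\text{OOB}}|\right] = 100\left(1 - \frac{1}{242}\right)^{242} \approx 100 \times 0.367 \approx 36.7$$

So each training row is scored by roughly 37 trees.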

---

## Advantages ✅

### 🚀 **Efficiency**
- No need to split training data
- Validation comes "for free" during training
- Maximizes use of available data

### 📊 **Statistical Properties**
- Approximately unbiased estimate of generalization error
- Uses more data for training than traditional validation
- Provides sample-specific uncertainty estimates

### 🔧 **Practical Benefits**
- Built-in model validation
- Useful for hyperparameter tuning
- Lets you track OOB error as trees are added (OOB learning curves)

---

## Limitations ❌

### 📉 **Statistical Issues**
- Higher variance than fixed validation set
- Dependent on bootstrap sampling randomness
- May be optimistic for some model types

### 🎯 **Technical Constraints**
- **Requires `bootstrap=True`** (not every ensemble bootstraps; e.g., `ExtraTreesClassifier` defaults to `bootstrap=False`)
- **Not available for all algorithms** (specific to bagging-based methods)
- **Sample size dependent**: Less reliable for small datasets

### ⚠️ **Interpretation Challenges**
- Different effective validation set sizes per sample
- May not reflect true test performance perfectly
- Correlation between training and validation sets

---

## Advanced Applications 🚀

### 1. **Feature Importance via OOB**

**Permutation Importance using OOB**:
$$\text{Importance}_j = \text{OOB Score}_{\text{original}} - \text{OOB Score}_{\text{permuted}_j}$$
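Below is a from-scratch sketch of OOB permutation importance. It uses a common per-tree variant: the accuracy drop is measured on each tree's own OOB samples and averaged over trees, rather than recomputing one forest-level OOB score. The synthetic dataset is built so that only feature 0 matters. Scikit-learn's `feature_importances_` is impurity-based, and `sklearn.inspection.permutation_importance` scores whatever data you pass it, so neither is OOB-based out of the box.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, p, B = 400, 4, 50
X = rng.normal(size=(n, p))
y = (X[:, 0] > 0).astype(int)            # only feature 0 carries signal

importance_sum = np.zeros(p)
n_used = 0
for b in range(B):
    idx = rng.integers(0, n, size=n)                      # bootstrap sample for tree b
    oob = np.setdiff1d(np.arange(n), idx)                 # indices never drawn
    if oob.size == 0:
        continue
    tree = DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx])
    base_acc = np.mean(tree.predict(X[oob]) == y[oob])    # OOB score, original data
    for j in range(p):
        X_perm = X[oob].copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])      # break feature j's link to y
        perm_acc = np.mean(tree.predict(X_perm) == y[oob])
        importance_sum[j] += base_acc - perm_acc
    n_used += 1

importance = importance_sum / n_used     # average drop in OOB accuracy per feature
for j, imp in enumerate(importance):
    print(f"feature {j}: {imp:.3f}")
```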

### 2. **Hyperparameter Tuning**

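```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# Use OOB score for quick hyperparameter evaluation
def tune_with_oob(X, y, param_grid, random_state=42):
    """Return the parameter combination with the highest OOB accuracy.

    param_grid uses the same dict-of-lists format as GridSearchCV.
    """
    best_score = -float("inf")
    best_params = None

    for params in ParameterGrid(param_grid):
        rf = RandomForestClassifier(oob_score=True, bootstrap=True,
                                    random_state=random_state, **params)
        rf.fit(X, y)

        if rf.oob_score_ > best_score:
            best_score = rf.oob_score_
            best_params = params

    return best_params, best_score
```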

### 3. **Learning Curves**

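Monitor OOB error as trees are added, where $\hat{y}_i^{\text{OOB},k}$ is the OOB prediction for sample $i$ built only from its OOB trees among the first $k$:
$$\text{OOB Error}_k = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\left[y_i \neq \hat{y}_i^{\text{OOB},k}\right]$$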
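Here is a sketch of an OOB learning curve in scikit-learn. It uses `warm_start=True`, so each `fit` call only adds trees to the existing forest, which is the pattern in scikit-learn's "OOB Errors for Random Forests" gallery example. The dataset, step size, and tree counts are arbitrary. With very few trees, scikit-learn may warn that some samples have no OOB score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# warm_start=True keeps the already-fitted trees and only trains the new ones
rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

oob_curve = []
for k in range(25, 201, 25):
    rf.set_params(n_estimators=k)
    rf.fit(X, y)                                  # grows the forest to k trees
    oob_curve.append((k, 1 - rf.oob_score_))      # OOB error of the first k trees

for k, err in oob_curve:
    print(f"{k:>3} trees: OOB error = {err:.3f}")
```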

### 4. **Uncertainty Quantification**

**Sample-wise OOB coverage**:
$$\text{Coverage}_i = \frac{|\mathcal{T}_i^{\text{OOB}}|}{B}$$

Samples with lower coverage have OOB predictions built from fewer trees, so those predictions are noisier and carry more uncertainty.
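The NumPy sketch below computes the coverage ratio for every sample. It draws the bootstrap samples by hand so that it stays self-contained. `B = 30` is deliberately small so that low-coverage samples show up, and the `min_votes` threshold is an arbitrary, hypothetical cutoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 50, 30                                 # deliberately few trees to expose gaps

n_oob_trees = np.zeros(n, dtype=int)
for _ in range(B):
    drawn = np.zeros(n, dtype=bool)
    drawn[rng.integers(0, n, size=n)] = True  # mark in-bag indices for this tree
    n_oob_trees += ~drawn                     # count the trees each sample is OOB for

coverage = n_oob_trees / B                    # |T_i^OOB| / B for every sample i
print(f"mean coverage: {coverage.mean():.3f}  (theory: {(1 - 1 / n) ** n:.3f})")

# Flag samples whose OOB prediction rests on few votes (threshold is arbitrary)
min_votes = 5
low = np.where(n_oob_trees < min_votes)[0]
print(f"samples with fewer than {min_votes} OOB trees: {low.tolist()}")
```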

---

## Best Practices 📋

### ✅ **When to Use OOB**
- **Quick model evaluation** without data splitting
- **Hyperparameter tuning** with limited data
- **Feature importance** calculation
- **Ensemble size selection** (monitoring OOB curve)

### ⚠️ **When to Avoid OOB**
- **Final model evaluation** (use separate test set)
- **Small datasets** (< 100 samples)
- **High-stakes decisions** (use robust cross-validation)
- **Model comparison** (use consistent validation strategy)

### 🎯 **Implementation Tips**

1. **Always set random_state** for reproducible OOB scores
2. **Monitor OOB learning curves** to detect overfitting
3. **Use OOB for quick iteration**, proper validation for final evaluation
4. **Consider ensemble size**: More trees → more stable OOB estimates
5. **Check OOB sample coverage**: Ensure all samples have sufficient OOB predictions

---

## Theoretical Insights 🧠

### **Generalization Theory**

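OOB error is an **approximately unbiased estimate** of the generalization error, because each sample is scored only by trees that never saw it:

$$\mathbb{E}[\text{OOB Error}] \approx \mathbb{E}[\text{Test Error}]$$

The approximation tends to be slightly pessimistic for small $B$, since each OOB prediction comes from only about $B/e$ trees rather than the full forest.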

### **Convergence Properties**

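As $B \to \infty$ with the training set held fixed, the randomness contributed by individual trees averages out and the OOB error settles to the OOB error of the infinite forest:
$$\text{OOB Error}_B \xrightarrow{p} \text{OOB Error}_\infty$$
That limit is still computed from $n$ finite samples, so it keeps the sampling error of any finite-data estimate; adding trees alone does not make it equal the true generalization error.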

### **Relationship to Cross-Validation**

OOB evaluation is similar to **"built-in cross-validation"**:
- Each sample validated on ~37% of trees
- Different validation sets for each sample
- Averages across all samples for final estimate
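The following sketch compares, on synthetic data, the OOB accuracy from a single fit with 5-fold cross-validated accuracy. On data like this the two estimates tend to land close together but are not identical. The dataset and forest settings are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# OOB estimate: one fit, no held-out data
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
oob_acc = rf.oob_score_

# 5-fold cross-validation: five separate fits, each on 80% of the data
cv_acc = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
).mean()

print(f"OOB accuracy:       {oob_acc:.3f}")
print(f"5-fold CV accuracy: {cv_acc:.3f}")
```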

---

## Hands-On Example 🧪

```python
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score
```

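```python
df = pd.read_csv('heart.csv')
df.head()
```

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |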

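```python
# Features: every column except the last; target: the last column
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# OOB scoring on; no random_state is set, so the scores below can vary between runs
rf = RandomForestClassifier(oob_score=True, n_jobs=-1)

rf.fit(X_train, y_train)
```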

```python
rf.oob_score_ * 100  # OOB accuracy on the training set, in percent
```

```
80.57851239669421
```

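```python
y_pred = rf.predict(X_test)
accuracy_score(y_test, y_pred) * 100  # held-out test accuracy, in percent
```

```
85.24590163934425
```

The OOB estimate (about 80.6%) is a few points below the test-set accuracy (about 85.2%). Both are noisy on a dataset this small. Because the forest has no `random_state`, rerunning the notebook will also shift both numbers slightly.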