# Random Forest Hyperparameters 🎛️

## Core Tree Parameters 🌳

### **n_estimators** 
**Number of decision trees in the forest**

| Parameter | Default | Range | Impact |
|-----------|---------|-------|--------|
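| **Value** | 100 | 10-2000+ | More trees lower the variance of the ensemble, at a linear cost in training time, prediction time, and memory |

$$\operatorname{Var}(\hat{f}) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2$$

Here $B$ is `n_estimators`, $\sigma^2$ is the variance of a single tree, and $\rho$ is the average correlation between trees. Only the second term shrinks as $B$ grows, so the gains flatten out.

- **Low values (10-50)**: Fast, but predictions are noisy (high variance)
- **High values (500-1000+)**: More stable predictions, with diminishing returns
- **Sweet spot**: 100-500 for most problems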
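One way to see the plateau is to grow a single forest in stages and watch the out-of-bag accuracy. The sketch below assumes scikit-learn and a synthetic dataset from `make_classification`; the dataset and the tree counts are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# warm_start=True keeps the trees already fitted and only adds new ones
# each time n_estimators is raised and fit() is called again.
forest = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

oob_by_size = {}
for n_trees in [50, 100, 200]:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)
    oob_by_size[n_trees] = forest.oob_score_

for n_trees, score in oob_by_size.items():
    print(f"{n_trees:>4} trees: OOB accuracy = {score:.3f}")
```

When the score stops moving between stages, adding trees mostly adds cost.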

---

### **max_features** 
**Number of features considered for each split**

| Option | Formula | Best For | Mathematical Expression |
|--------|---------|----------|------------------------|
| `'sqrt'` | $\sqrt{p}$ | **Classification** | $m = \max(1, \lfloor \sqrt{p} \rfloor)$ |
| `'log2'` | $\log_2(p)$ | High-dimensional data | $m = \max(1, \lfloor \log_2(p) \rfloor)$ |
| `None` | $p$ | Small datasets | $m = p$ |
| `int` | Custom | Fine-tuning | $m = \text{specified}$ |
| `float` | $p \times \text{fraction}$ | Proportional control | $m = \max(1, \lfloor p \times f \rfloor)$ |

**Default**: 
- **Classifier**: `'sqrt'` 
- **Regressor**: `1.0` (uses all features, equivalent to `None`)
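To make the options concrete, the sketch below turns each `max_features` setting into a feature count for a given $p$. `resolve_max_features` is a hypothetical helper written for this page; it mirrors the rules in the table and is not a scikit-learn function.

```python
import math

def resolve_max_features(max_features, n_features):
    """Hypothetical helper: how many features are examined at each split."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # Otherwise treat the value as a fraction of the features, at least one.
    return max(1, int(max_features * n_features))

for option in ["sqrt", "log2", None, 5, 0.5]:
    print(f"{option!r:>7} -> {resolve_max_features(option, 64)} of 64 features")
```

With 64 features this gives 8 for `'sqrt'`, 6 for `'log2'`, 64 for `None`, 5 for the integer, and 32 for the fraction `0.5`.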

---

### **max_depth**
**Maximum depth of individual trees**

$$\text{Depth} = \max_{\text{leaf } L} \{\text{number of edges on the path from the root to } L\}$$

| Setting | Value | Effect | Use Case |
|---------|-------|--------|----------|
| **None** | ∞ | Trees grow until leaves are pure or hold fewer than `min_samples_split` samples | Default, good starting point |
| **Shallow** | 3-10 | Prevents overfitting | Small datasets, noisy data |
| **Deep** | 15-30 | More complex patterns | Large, clean datasets |

**Relationship**: with the root at depth 0, $\text{Max leaves} \leq 2^{\text{max\_depth}}$ and $\text{Max nodes} \leq 2^{\text{max\_depth}+1} - 1$
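The depth limit can be checked on a fitted forest. This sketch assumes scikit-learn, where the root sits at depth 0, so a tree of depth $d$ has at most $2^{d+1} - 1$ nodes; the dataset and the value `max_depth = 4` are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

max_depth = 4
forest = RandomForestClassifier(n_estimators=20, max_depth=max_depth, random_state=0).fit(X, y)

# A full binary tree of depth d (root at depth 0) has 2**(d + 1) - 1 nodes.
node_bound = 2 ** (max_depth + 1) - 1
depths = [tree.get_depth() for tree in forest.estimators_]
node_counts = [tree.tree_.node_count for tree in forest.estimators_]

print(f"deepest tree: {max(depths)} (limit {max_depth})")
print(f"largest tree: {max(node_counts)} nodes (bound {node_bound})")
```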

---

## Node Splitting Controls 🎯

### **min_samples_split**
**Minimum samples required to split an internal node**

$$\text{Split occurs only if } |S_{\text{node}}| \geq \text{min\_samples\_split}$$

| Value | Type | Effect | Best For |
|-------|------|--------|----------|
| **2** | Default | Maximum splitting | Large datasets |
| **5-20** | Conservative | Reduces overfitting | Medium datasets |
| **0.01-0.1** | Fraction | Threshold is $\lceil f \times n \rceil$ samples | Variable dataset sizes |

---

### **min_samples_leaf**
**Minimum samples required in each leaf node**

$$\forall \text{ leaf } L: |L| \geq \text{min\_samples\_leaf}$$

| Value | Effect | Trade-off |
|-------|--------|-----------|
| **1** | Maximum granularity | May overfit |
| **5-50** | Smoother decision boundaries | Better generalization |
| **Fraction** | Minimum leaf size is $\lceil f \times n \rceil$ samples | Adaptive to data size |
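The effect of the leaf-size constraint shows up directly in how many leaves each tree grows. The sketch below assumes scikit-learn and a noisy synthetic dataset (`flip_y=0.1`); `mean_leaves` is a hypothetical helper, and the values 1 and 20 are only there to make the contrast visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# flip_y adds label noise, which unconstrained trees will try to memorize.
X, y = make_classification(n_samples=600, n_features=10, flip_y=0.1, random_state=0)

def mean_leaves(min_samples_leaf):
    """Hypothetical helper: average leaf count per tree for a given setting."""
    forest = RandomForestClassifier(
        n_estimators=20, min_samples_leaf=min_samples_leaf, random_state=0
    ).fit(X, y)
    return sum(tree.get_n_leaves() for tree in forest.estimators_) / len(forest.estimators_)

leaves_1 = mean_leaves(1)
leaves_20 = mean_leaves(20)
print(f"min_samples_leaf=1:  {leaves_1:.1f} leaves per tree")
print(f"min_samples_leaf=20: {leaves_20:.1f} leaves per tree")
```

Fewer, larger leaves average over more samples, which is what smooths the predictions.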

---

### **min_weight_fraction_leaf**
**Minimum weighted fraction of samples in leaf**

$$\sum_{i \in \text{leaf}} w_i \geq \text{min\_weight\_fraction\_leaf} \times \sum_{i=1}^{n} w_i$$

- **Range**: [0.0, 0.5]
- **Use**: When samples have different importance weights

---

## Tree Structure Controls 🏗️

### **max_leaf_nodes**
**Maximum number of leaf nodes per tree**

$$|\{L : L \text{ is leaf}\}| \leq \text{max\_leaf\_nodes}$$

- **None**: No limit
- **Integer**: Explicit constraint
- **Effect**: Caps tree complexity directly; trees are grown best-first, so the splits with the largest impurity reduction are kept

---

### **min_impurity_decrease**
**Minimum impurity decrease required for split**

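$$\Delta I = \frac{|S|}{n}\left(I_{\text{parent}} - \frac{|S_L|}{|S|} I_L - \frac{|S_R|}{|S|} I_R\right) \geq \text{min\_impurity\_decrease}$$

Where:
- $I$ = impurity measure (Gini or entropy for classification, squared error and similar for regression)
- $S$ = samples at the node being split, $n$ = total number of training samples
- $S_L, S_R$ = left and right child samples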
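A worked example helps with the arithmetic. The sketch below uses Gini impurity and also scales the decrease by the fraction of all training samples that reach the node, which is how scikit-learn's documentation defines the quantity compared against `min_impurity_decrease`. The class counts are made up, and both function names are hypothetical.

```python
def gini_from_counts(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of a split, scaled by the node's share of all samples."""
    n_node = sum(parent)
    n_left, n_right = sum(left), sum(right)
    decrease = (
        gini_from_counts(parent)
        - (n_left / n_node) * gini_from_counts(left)
        - (n_right / n_node) * gini_from_counts(right)
    )
    return (n_node / n_total) * decrease

# A node holding 40 of 100 training samples, split into two fairly pure children.
delta = weighted_impurity_decrease(parent=[20, 20], left=[18, 2], right=[2, 18], n_total=100)
print(f"weighted impurity decrease = {delta:.3f}")  # 0.4 * (0.5 - 0.18) = 0.128
```

With `min_impurity_decrease=0.1` this split would be kept; with `0.2` it would be rejected.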

---

## Randomness & Sampling 🎲

### **bootstrap**
**Whether to use bootstrap sampling**

| Setting | Sampling Method | Effect |
|---------|----------------|--------|
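| **True** | Each tree is fit on a sample drawn with replacement | Standard Random Forest |
| **False** | No sampling | Each tree sees the whole dataset; only `max_features` adds randomness |

$$\text{Bootstrap sample size} = n \text{ (with replacement, by default; } \texttt{max\_samples} \text{ can lower it)}$$

**OOB Error** is only available when `bootstrap=True`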
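Drawing $n$ samples with replacement leaves some rows out of every tree's training set, which is what makes out-of-bag evaluation possible. The NumPy sketch below simulates one bootstrap draw and compares the share of distinct rows with the limit $1 - 1/e \approx 0.632$; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# One bootstrap sample: n row indices drawn uniformly with replacement.
sample = rng.integers(0, n, size=n)

in_bag_fraction = np.unique(sample).size / n
out_of_bag_fraction = 1.0 - in_bag_fraction

print(f"in bag:     {in_bag_fraction:.3f} (limit {1 - np.exp(-1):.3f})")
print(f"out of bag: {out_of_bag_fraction:.3f}")
```

Roughly a third of the rows are unseen by any given tree and can serve as its private test set.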

---

### **random_state**
**Controls randomness for reproducibility**

- **None**: Different results each run
- **Integer**: Fixed seed for reproducible results
- **Critical for**: Model comparison, debugging, production
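A quick reproducibility check, assuming scikit-learn and NumPy: fit the same configuration twice with the same seed and compare the predicted probabilities. `fit_proba` is a hypothetical helper and the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def fit_proba(seed):
    """Hypothetical helper: fit a forest with the given seed, return its probabilities."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    return model.predict_proba(X)

# Same data, same parameters, same seed: the two forests should be identical.
same_seed_match = np.array_equal(fit_proba(42), fit_proba(42))
print("identical with the same seed:", same_seed_match)
```

Keeping the seed fixed while tuning means score differences come from the hyperparameters rather than from resampling noise.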

---

### **oob_score**
**Whether to calculate Out-of-Bag score**

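$$\text{OOB Score} = 1 - \frac{1}{n}\sum_{i=1}^{n} \mathbb{I}[y_i \neq \hat{y}_i^{\text{OOB}}]$$

Here $\hat{y}_i^{\text{OOB}}$ is the prediction for sample $i$ made only by the trees that did not see it during training. This is the accuracy reported by the classifier; the regressor reports $R^2$ on the same out-of-bag predictions.

- **Requires**: `bootstrap=True`
- **Benefit**: A validation estimate without holding out data
- **Access via**: `model.oob_score_`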
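A minimal usage sketch, assuming scikit-learn and synthetic data: the out-of-bag accuracy is read from the fitted model and printed next to a held-out test score for comparison. The split ratio and forest size are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# oob_score=True asks the forest to score each training row using only
# the trees whose bootstrap sample did not contain that row.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_train, y_train)

oob_accuracy = forest.oob_score_
test_accuracy = forest.score(X_test, y_test)
print(f"OOB accuracy:  {oob_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

The two numbers are usually close, though the OOB estimate tends to be slightly pessimistic because each row is scored by only part of the forest.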

---

## Performance & Efficiency ⚡

### **n_jobs**
**Number of parallel jobs**

| Value | Meaning | Performance |
|-------|---------|-------------|
| **None/1** | Sequential | Slower but less memory |
| **-1** | All CPUs | Maximum speed |
| **Integer** | Specific CPU count | Balanced approach |

**Parallelization**: Trees are fit independently, so training parallelizes well across cores; memory use grows with the number of workers

---

### **verbose**
**Verbosity level during fitting**

- **0**: Silent
- **1**: Progress for each tree (if `n_jobs=1`)
- **2+**: More detailed output

---

## Classification-Specific 📊

### **criterion**
**Function to measure split quality**

| Criterion | Formula | Use Case |
|-----------|---------|----------|
| **'gini'** | $1 - \sum_{i=1}^{c} p_i^2$ | Default, slightly cheaper to compute |
| **'entropy'** | $-\sum_{i=1}^{c} p_i \log_2(p_i)$ | Information gain |
| **'log_loss'** | Same as `'entropy'` | Alias for the Shannon information gain criterion |
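The two impurity formulas are easy to compute by hand. The sketch below evaluates them on a few class distributions; `gini` and `entropy` are plain-Python helpers written for this page, not library calls.

```python
import math

def gini(p):
    """Gini impurity of a class-probability vector."""
    return 1.0 - sum(q * q for q in p)

def entropy(p):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)

for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(f"{dist}: gini = {gini(dist):.3f}, entropy = {entropy(dist):.3f}")
```

Both measures are 0 for a pure node and peak at the uniform distribution (0.5 and 1.0 bit for two classes), which is why they usually pick similar splits.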

---

### **class_weight**
**Weights for imbalanced classes**

| Option | Effect | Formula |
|--------|--------|---------|
| **None** | Equal weights | $w_c = 1$ |
| **'balanced'** | Inverse frequency | $w_c = \frac{n}{n_{\text{classes}} \times n_c}$, where $n_c$ is the number of samples in class $c$ |
| **Dict** | Custom weights | User-defined |
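The `'balanced'` formula can be reproduced in a couple of lines. The sketch below assumes NumPy and scikit-learn's `compute_class_weight` utility, and uses a made-up 90/10 label vector.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced toy labels: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

# n / (n_classes * n_c), computed by hand.
manual = len(y) / (len(classes) * np.bincount(y))

# The same weights from scikit-learn's helper.
from_sklearn = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print("manual:      ", manual)
print("scikit-learn:", from_sklearn)
```

The minority class ends up weighted 9 times more heavily than the majority class (5.0 versus about 0.556), matching the 90:10 ratio.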

---

## Regression-Specific 📈

### **criterion**
**Function to measure split quality**

| Criterion | Formula | Characteristics |
|-----------|---------|----------------|
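| **'squared_error'** | $\frac{1}{n}\sum (y - \bar{y})^2$ | Standard MSE; leaves predict the mean |
| **'absolute_error'** | $\frac{1}{n}\sum \lvert y - \text{median}(y) \rvert$ | Robust to outliers; leaves predict the median; noticeably slower to fit |
| **'friedman_mse'** | MSE with Friedman's improvement score | Ranks candidate splits by the gap between child means |
| **'poisson'** | Poisson deviance | Count data (non-negative targets) |

$$\text{Friedman improvement} = \frac{n_L\, n_R}{n_L + n_R}\left(\bar{y}_L - \bar{y}_R\right)^2$$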
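To illustrate why the absolute-error criterion is called robust, the sketch below computes both node impurities as per-sample averages, with and without a single outlier. The helper names and the toy target values are assumptions made for this example.

```python
import statistics

def squared_error(y):
    """Mean squared deviation from the node mean."""
    mean = statistics.fmean(y)
    return sum((v - mean) ** 2 for v in y) / len(y)

def absolute_error(y):
    """Mean absolute deviation from the node median."""
    med = statistics.median(y)
    return sum(abs(v - med) for v in y) / len(y)

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 50.0]

for name, y in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name:>12}: squared = {squared_error(y):7.1f}, absolute = {absolute_error(y):5.1f}")
```

One extreme value raises the squared error from 2.0 to 362.0 but the absolute error only from 1.2 to 10.2, so it dominates split selection far less under `'absolute_error'`.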

---

## Hyperparameter Tuning Strategy 🎯

### **Priority Order** (High to Low Impact)

1. **n_estimators** → Start with 100, increase until performance plateaus
2. **max_features** → Try `'sqrt'`, `'log2'`, `None`
3. **max_depth** → Try `None`, then 10, 20, 30
4. **min_samples_split** → Try 2, 5, 10, 20
5. **min_samples_leaf** → Try 1, 2, 5, 10

### **Grid Search Template**

```python
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_features': ['sqrt', 'log2', None],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5]
}
```
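A grid like this is typically passed to scikit-learn's `GridSearchCV`. The sketch below is one way to wire it up, using a deliberately smaller grid (`small_grid`, a name introduced here) and synthetic data so it finishes quickly; the full template above expands to 324 combinations, each fitted once per fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)

# Reduced grid for illustration: 2 * 2 * 2 = 8 combinations.
small_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", None],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=3,                # 3-fold cross-validation per combination
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` from the same module samples a fixed number of combinations instead of trying them all.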

### **Quick Tuning Rules** 📝

| If your model is... | Adjust these parameters |
|-------------------|------------------------|
| **Overfitting** | ↓ `max_depth`, ↑ `min_samples_split/leaf` |
| **Underfitting** | ↑ `max_depth` (or `None`), ↓ `min_samples_split/leaf`, ↑ `max_features` |
| **Too slow** | ↓ `n_estimators`, ↓ `max_depth`, ↓ `max_features`, set `n_jobs=-1` |
| **Memory issues** | ↓ `n_estimators`, ↓ `max_depth`, ↑ `min_samples_leaf` |

---

## Advanced Tips 🚀

### **Feature Engineering Integration**
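- Use **`max_features='sqrt'`** with many correlated features to decorrelate the trees; raise it when most features are irrelevant, so each split is likely to see a useful one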
- Use **`max_features=None`** with pre-selected features
- Consider **feature importance** for recursive feature elimination

### **Validation Strategy**
- Use **OOB score** for quick validation (`oob_score=True`)
- Combine with **cross-validation** for robust estimates
- Monitor **learning curves** to detect overfitting

### **Production Considerations**
- Set **`random_state`** for reproducible models
- Use **`n_jobs=-1`** for training; the same setting applies to `predict`, where `n_jobs=1` often gives lower latency on small batches
- Consider **model size** vs **prediction speed** trade-offs