## Overview

Ensemble learning is a technique that combines multiple base models to produce a stronger overall model. The main idea is that by combining the predictions of several models, the ensemble model can achieve better performance and generalization than any individual model.
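To make this intuition concrete, the sketch below computes the accuracy of a majority vote over `T` classifiers under an idealized assumption: every classifier is correct independently with the same probability `p`. The function name `majority_vote_accuracy` and the independence assumption are illustrative and not part of this notebook. Real base models are correlated, so actual gains are usually smaller.

```python
from math import comb

def majority_vote_accuracy(p: float, T: int) -> float:
    """Probability that more than half of T independent voters are correct.

    Assumes T is odd (no ties) and every voter is correct with probability p.
    """
    if T % 2 == 0:
        raise ValueError("T must be odd so that ties cannot occur")
    # Sum the binomial probabilities of k correct votes for every k > T / 2.
    return sum(comb(T, k) * p**k * (1 - p) ** (T - k) for k in range(T // 2 + 1, T + 1))

for T in (1, 5, 25, 101):
    print(T, round(majority_vote_accuracy(0.6, T), 3))
```

Under these assumptions the vote becomes more accurate as `T` grows whenever `p > 0.5`, which is the motivation for de-correlating the base models in the methods that follow.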

## Import libraries

In [22]:
import numpy as np
from scipy.stats import mode

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.datasets import make_classification, make_regression

# Random Forest for regression and classification (bagging)

## Random Forest

Random Forest is an ensemble learning method that operates by constructing a multitude of decision trees during training time and outputting either the **mode** of the classes (classification) or the **mean** of the predictions (regression) of the individual trees.

It combines the concepts of **bagging** and **feature randomness** to build a collection of **de-correlated decision trees**, whose predictions are then aggregated.

---

### Key Concepts of Random Forest

1. **Bootstrap Aggregation (Bagging)**:
   - Each decision tree is trained on a different subset of the training data, sampled **with replacement**.
   - This technique reduces **variance** and helps prevent **overfitting**.

2. **Feature Subsampling**:
   - At each split in a tree (or once per tree, as in the implementation below), only a **random subset of features** is considered.
   - For classification, it is common to use:  
     $$ \text{feature\_size} = \lfloor \sqrt{d} \rfloor $$
   - For regression, a common choice is:  
     $$ \text{feature\_size} = \left\lfloor \frac{d}{3} \right\rfloor $$
   - Here, $d$ is the number of features in the dataset.
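The two sampling ideas above can be sketched with NumPy alone. The helper name `feature_subset_size`, the dataset dimensions, and the seed are assumptions made for illustration. The lower bound of 1 is an added safeguard for very small $d$.

```python
import numpy as np

def feature_subset_size(d: int, task: str) -> int:
    """Rule-of-thumb subset size: floor(sqrt(d)) or floor(d / 3), but at least 1."""
    if task == "classification":
        return max(1, int(np.floor(np.sqrt(d))))
    if task == "regression":
        return max(1, d // 3)
    raise ValueError("task must be 'classification' or 'regression'")

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_samples, d = 1000, 20

# Bootstrap sample: draw n_samples row indices with replacement.
rows = rng.choice(n_samples, size=n_samples, replace=True)
# Feature subset: draw column indices without replacement.
cols = rng.choice(d, size=feature_subset_size(d, "classification"), replace=False)

unique_fraction = np.unique(rows).size / n_samples
print(f"features used: {np.sort(cols)}")
print(f"fraction of distinct rows in the bootstrap sample: {unique_fraction:.3f}")
```

For large datasets a bootstrap sample contains about $1 - 1/e \approx 63\%$ of the distinct rows in expectation, so each tree sees a noticeably different view of the data.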

---

### Random Forest Algorithm

#### Training (fit)

Given a dataset (X, y):

For each of the $T$ decision trees in the ensemble (`n_ensembles` in the code):

1. **Random Sampling**:
   - Draw a bootstrap sample of the same size as the dataset by selecting rows of (X, y) at random, with replacement.
   - Randomly select a subset of features (without replacement).

2. **Training**:
   - Train a decision tree (classification or regression) on the sampled data and features.

The result is a collection of models:
$$ \{ (h_1, F_1), (h_2, F_2), \dots, (h_T, F_T) \} $$
Where:
- $h_i$ is the trained tree,
- $F_i \subset \{1, \dots, d\}$ is the subset of features used by that tree.

---

#### Prediction (predict)

To make a prediction for a new data point $\mathbf{x}$:

- For each trained tree $h_i$, predict using only the features in $F_i$.
- For classification:
  $$ \hat{y} = \text{mode}(\{ h_1(\mathbf{x}_{F_1}), \dots, h_T(\mathbf{x}_{F_T}) \}) $$
- For regression:
  $$ \hat{y} = \frac{1}{T} \sum_{i=1}^{T} h_i(\mathbf{x}_{F_i}) $$
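As a small worked example of the two aggregation rules, the sketch below uses hand-written prediction matrices in place of real trees. The helper `majority_vote` is hypothetical and assumes non-negative integer class labels.

```python
import numpy as np

# Rows are trees, columns are samples: class_preds[i, j] = h_i(x_j).
class_preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 1, 2, 0],
])

def majority_vote(preds: np.ndarray) -> np.ndarray:
    """Column-wise mode for non-negative integer labels (ties go to the smallest label)."""
    n_classes = preds.max() + 1
    # counts[c, j] = number of trees that voted for class c on sample j.
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

# Regression: each row holds one tree's predictions for two samples.
reg_preds = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
])

print(majority_vote(class_preds))  # [0 1 2 1]
print(reg_preds.mean(axis=0))      # [2. 4.]
```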

---

### Advantages of Random Forest

- Handles both classification and regression tasks.
- Works with both numerical and categorical data, although the scikit-learn trees used here require categorical features to be encoded numerically first.
- Reduces overfitting compared to individual decision trees.
- Scales to large datasets because the trees are independent of each other and can be trained in parallel.

### Limitations

- Less interpretable compared to a single decision tree.
- More computationally expensive (especially with many trees or high-dimensional data).
- Can be biased towards dominant classes in imbalanced classification tasks unless sampling is adjusted.

---


In [None]:
class RandomForest:
    """
    A RandomForest implementation for either classification or regression tasks.
    
    Attributes:
        n_ensembles (int): The number of decision trees in the forest.
        weak_learner (class): The type of weak learner, which is a decision tree.
        learner_type (str): Type of model ('classification' or 'regression').
        feature_size (int): Number of features to consider when building each tree.
        models (list): List of trained decision trees.
    """
    
    def __init__(self, n_ensembles=100, learner_type='classification'):
        """
        Initializes the RandomForest with the specified number of ensembles and learner type.
        
        Parameters:
            n_ensembles (int): The number of decision trees in the forest. Default is 100.
            learner_type (str): Type of model ('classification' or 'regression'). Default is 'classification'.
        """
        self.n_ensembles = n_ensembles
        self.weak_learner = None
        self.learner_type = learner_type
        self.feature_size = 0
        self.models = []
        self.set_weak_learner()

    def set_weak_learner(self):
        """
        Sets the weak learner to be used for training the forest.
        If the learner type is 'classification', uses DecisionTreeClassifier.
        If the learner type is 'regression', uses DecisionTreeRegressor.
        
        Raises:
            ValueError: If the learner type is neither 'classification' nor 'regression'.
        """
        if self.learner_type == 'classification':
            self.weak_learner = DecisionTreeClassifier
        elif self.learner_type == 'regression':
            self.weak_learner = DecisionTreeRegressor
        else:
            raise ValueError('Invalid learner type, use "classification" or "regression"')

    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Trains the RandomForest using bootstrapping and feature sampling for each ensemble.
        
        Parameters:
            X (np.ndarray): The input feature matrix.
            y (np.ndarray): The target values.
        
        The method trains `n_ensembles` number of weak learners (decision trees), each trained on
        a bootstrapped sample of the data with a random subset of features.
        """
        n_samples, n_features = X.shape
        self.models = []  # start from an empty forest so refitting does not accumulate trees
        if self.learner_type == 'classification':
            self.feature_size = np.floor(np.sqrt(n_features)).astype(int)
        elif self.learner_type == 'regression':
            self.feature_size = max(1, n_features // 3)

        for _ in range(self.n_ensembles):
            random_features = np.random.choice(n_features, size=self.feature_size, replace=False)
            bootstrapped_indices = np.random.choice(n_samples, size=n_samples, replace=True)
            X_data, y_data = X[bootstrapped_indices, :][:, random_features], y[bootstrapped_indices]

            model = self.weak_learner()
            model.fit(X_data, y_data)
            self.models.append((model, random_features))

    def predict(self, X: np.ndarray):
        """
        Predicts the output by aggregating the predictions of all weak learners (decision trees).
        
        Parameters:
            X (np.ndarray): The input feature matrix.
        
        Returns:
            np.ndarray: The predicted values for each sample.
            
        The method aggregates the predictions from all trees in the forest. For classification, it
        returns the majority vote. For regression, it returns the mean of the predictions.
        """
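        weak_predicts = []
        for model, features in self.models:
            weak_predicts.append(model.predict(X[:, features]))

        predictions = np.array(weak_predicts)  # shape: (n_ensembles, n_samples)
        if self.learner_type == 'classification':
            # Depending on the SciPy version, mode() returns either a (1, n_samples)
            # or an (n_samples,) array; np.ravel yields one label per sample in both cases.
            return np.ravel(mode(predictions, axis=0).mode)
        return np.mean(predictions, axis=0)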
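As a point of comparison, the sketch below runs scikit-learn's own `RandomForestClassifier` next to a single decision tree on synthetic data. It is a reference check under assumed settings (dataset size, split ratio, and seeds are arbitrary), not a benchmark of the class above. Note that scikit-learn samples features at every split (`max_features`), whereas the implementation above samples them once per tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (all sizes chosen arbitrarily).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A single fully grown tree versus a forest of 100 trees.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print("single tree accuracy:", tree_acc)
print("forest accuracy:     ", forest_acc)
```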

# Boosting

## AdaBoost (Adaptive Boosting)

AdaBoost, short for **Adaptive Boosting**, is an ensemble learning technique that combines multiple **weak learners** to create a **strong classifier**. It works by training weak models sequentially, each one correcting the errors of the previous one through **reweighting** the training samples.

---

### Intuition Behind AdaBoost

- Emphasizes difficult samples that previous models misclassified.
- Reduces weight of samples correctly classified.
- Combines learners using weighted majority vote (for classification).

---

### Workflow of AdaBoost

#### Step-by-Step

Given a training set (X, y), where $y \in \{-1, +1\}$:

1. **Initialize sample weights**:  
   Each data point gets an equal weight:  
   $$ w_i^{(1)} = \frac{1}{N} $$

2. **Iterate for each weak learner** (total of $M$ estimators):

   - Train a weak classifier $h_m(x)$ using the current sample weights $w_i^{(m)}$.
   - Compute weighted error:
     $$
     \epsilon_m = \sum_{i=1}^{N} w_i^{(m)} \cdot \mathbb{1}(h_m(x_i) \ne y_i)
     $$
   - Compute learner weight (confidence):
     $$
     \alpha_m = \frac{1}{2} \ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)
     $$
   - Update sample weights:
     $$
     w_i^{(m+1)} = w_i^{(m)} \cdot \exp(-\alpha_m y_i h_m(x_i))
     $$
   - Normalize weights so they sum to 1.

3. **Final prediction** is a **weighted majority vote**:
   $$
   H(x) = \text{sign}\left( \sum_{m=1}^{M} \alpha_m h_m(x) \right)
   $$
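One boosting round can be traced by hand. The sketch below uses five made-up samples and a hypothetical weak-learner output `y_hat` that misclassifies exactly one of them. Only the arithmetic of the formulas above is shown; no model is trained.

```python
import numpy as np

y     = np.array([ 1,  1, -1, -1,  1])   # true labels in {-1, +1}
y_hat = np.array([ 1, -1, -1, -1,  1])   # hypothetical weak-learner predictions
w = np.full(y.size, 1 / y.size)          # step 1: uniform weights 1/N

# Weighted error: total weight of the misclassified samples.
eps = np.sum(w * (y_hat != y))
# Learner weight: positive whenever eps < 0.5.
alpha = 0.5 * np.log((1 - eps) / eps)
# Reweight: y * y_hat is +1 for correct and -1 for wrong predictions.
w_new = w * np.exp(-alpha * y * y_hat)
w_new /= w_new.sum()                     # normalize so the weights sum to 1

print(f"eps = {eps:.2f}, alpha = {alpha:.4f}")
print("new weights:", np.round(w_new, 3))
```

Here $\epsilon_m = 0.2$ and $\alpha_m = \tfrac{1}{2}\ln 4 = \ln 2 \approx 0.693$. After normalization the single misclassified sample carries weight 0.5 and each correctly classified sample carries 0.125, so the next learner is pushed toward the sample that was missed.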

---

### Key Components in Code

- `sample_weight`: Gives importance to each sample.
- `error_m`: Weighted error rate of a weak model on the training set (lower is better).
- `alpha`: Influence of each model in the final prediction.
- `np.sign`: Final decision using majority vote weighted by alpha.

---

### Advantages of AdaBoost

- Often improves accuracy of weak learners.
- Often resists overfitting in practice on clean data, even as more learners are added.
- Works well even when weak learners are very simple (e.g., decision stumps).

### Limitations

- Sensitive to noisy data and outliers (due to exponential weighting).
- Slower than bagging-based models (due to sequential nature).
- Requires base learner that supports weighted samples (like `DecisionTreeClassifier` with `sample_weight`).
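The last point can be illustrated with a single decision stump. The sketch below assumes synthetic data from `make_classification`, whose labels are 0 and 1, so they are first mapped to the $\{-1, +1\}$ encoding that the update rule expects. Dataset size and seed are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=200, n_features=5, random_state=0)
y = 2 * y01 - 1                      # map labels {0, 1} -> {-1, +1}

w = np.full(y.size, 1 / y.size)      # uniform sample weights
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X, y, sample_weight=w)     # the weights influence which split is chosen

y_hat = stump.predict(X)
eps = np.sum(w * (y_hat != y))       # weighted training error of the stump
print(f"weighted error of the stump: {eps:.3f}")
```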

---

### Visual Overview

1. Initialize equal weights for all data points.  
2. Train a weak learner.  
3. Increase weights for misclassified samples.  
4. Add weak learner to ensemble with weight $\alpha$.  
5. Final prediction is sign of weighted sum of predictions.

---


In [None]:
class AdaBoost:
    """
    AdaBoost (Adaptive Boosting) implementation for classification tasks.
    
    AdaBoost is an ensemble method that combines multiple weak learners (typically decision trees) 
    to create a strong classifier by focusing on the mistakes made by previous models.
    
    Attributes:
        model_type (str): The type of base model, default is 'tree' (DecisionTreeClassifier).
        model (class): The weak learner model class, either DecisionTreeClassifier or another specified model.
        n_estimators (int): The number of base models to be trained, default is 100.
        weights (np.ndarray): The sample weights used for boosting.
        models (list): List of weak learners (trained models).
        alphas (list): The model coefficients for each weak learner.
    """
    
    def __init__(self, model_type='tree', n_estimators=100):
        """
        Initializes the AdaBoost ensemble model with the specified base model and number of estimators.
        
        Parameters:
            model_type (str): The type of model to use for weak learners. Defaults to 'tree' (DecisionTreeClassifier).
            n_estimators (int): The number of base learners to train. Defaults to 100.
        """
        self.model_type = model_type
        self.model = None
        self.n_estimators = n_estimators
        self.weights = None
        self.models = []
        self.alphas = []
        self.set_model()

    def set_model(self):
        """
        Sets the weak learner (base model) for boosting based on the specified model type.
        
        If `model_type` is 'tree', sets the model as `DecisionTreeClassifier`.
        
        Raises:
            ValueError: If `model_type` is not supported.
        """
        if self.model_type == 'tree':
            self.model = DecisionTreeClassifier
        else:
            raise ValueError('Invalid model type, use "tree"')

    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Trains the AdaBoost model using the specified data and labels.
        
        The method iteratively trains weak learners and adjusts the sample weights 
        to emphasize the misclassified samples, boosting the model's performance.
        
        Parameters:
            X (np.ndarray): The input feature matrix.
            y (np.ndarray): The target labels, encoded as -1 or +1.
        
        The method updates the model weights and stores the weak learners and their associated alphas.
        """
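        n_samples = X.shape[0]
        self.weights = np.ones(n_samples) / n_samples
        # Reset state so that repeated calls to fit do not accumulate learners
        self.models, self.alphas = [], []

        for _ in range(self.n_estimators):
            # A depth-1 tree (decision stump) keeps the learner weak; a fully grown tree
            # would typically fit the training set perfectly and leave nothing to boost
            model = self.model(max_depth=1)
            model.fit(X, y, sample_weight=self.weights)

            y_hat = model.predict(X)

            # Weighted error: total weight of the misclassified samples
            error_m = np.sum(self.weights * (y_hat != y))
            # Clip away from 0 and 1 so the logarithm below stays finite
            error_m = np.clip(error_m, 1e-10, 1 - 1e-10)

            # Calculate model weight (alpha)
            alpha = 0.5 * np.log((1 - error_m) / error_m)

            # Update the sample weights (requires y and y_hat in {-1, +1})
            self.weights = self.weights * np.exp(-alpha * y_hat * y)
            self.weights /= np.sum(self.weights)

            # Store the model and its corresponding alpha value
            self.models.append(model)
            self.alphas.append(alpha)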

    def predict(self, X: np.ndarray):
        """
        Makes predictions using the trained AdaBoost model by combining the predictions 
        of all weak learners weighted by their corresponding alphas.
        
        Parameters:
            X (np.ndarray): The input feature matrix.
        
        Returns:
            np.ndarray: The final predicted labels, obtained by taking the weighted majority vote.
        """
        predicts = np.array([model.predict(X) for model in self.models])
        return np.sign(np.array(self.alphas) @ predicts)