# **AdaBoostRegressor Model Theory**


## Theory
AdaBoostRegressor (Adaptive Boosting Regressor) is a machine learning model that combines multiple weak regressors (typically decision trees) into a strong ensemble. It trains the regressors iteratively, adjusting the weights of the training samples after each iteration according to their prediction errors. Unlike gradient boosting, which fits each new model to the residuals of the current ensemble, AdaBoost reweights the training set so that poorly predicted samples get higher weights in subsequent iterations.

The model function for AdaBoostRegressor is the weighted median of the weak regressors' predictions:
$$ f(x) = \inf\left\{ y : \sum_{t:\, h_t(x) \le y} w_t \ge \frac{1}{2} \sum_{t=1}^{T} w_t \right\} $$
Where:
- $f(x)$ is the final prediction
- $T$ is the number of weak regressors
- $w_t$ is the weight assigned to regressor $t$
- $h_t(x)$ is the prediction of regressor $t$ for input $x$

## Model Training
### Forward Pass
During training, AdaBoostRegressor builds weak regressors one at a time. Each new regressor is fitted with sample weights that emphasize the samples the previous regressors predicted poorly, so it concentrates on correcting those errors.

### Loss Functions
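AdaBoostRegressor supports three loss functions. Each one maps the absolute error of sample $i$, scaled by the largest absolute error $M_t = \max_j |y_j - h_t(x_j)|$, to a loss $E_t(i) \in [0, 1]$:
1. **Linear**: $E_t(i) = |y_i - h_t(x_i)| / M_t$
2. **Square**: $E_t(i) = \left(|y_i - h_t(x_i)| / M_t\right)^2$
3. **Exponential**: $E_t(i) = 1 - \exp\left(-|y_i - h_t(x_i)| / M_t\right)$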
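
As a rough illustration of how the three options differ, the sketch below computes per-sample losses in plain Python, following the AdaBoost.R2 convention of scaling each absolute error by the largest absolute error. The function name `normalized_losses` and the toy values are illustrative assumptions, not part of scikit-learn.

```python
import math

def normalized_losses(y_true, y_pred, loss="linear"):
    """Per-sample AdaBoost.R2-style losses in [0, 1] for one weak regressor."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(abs_err)
    if max_err == 0:
        return [0.0] * len(abs_err)  # perfect fit: every loss is zero
    scaled = [e / max_err for e in abs_err]  # largest error maps to 1
    if loss == "linear":
        return scaled
    if loss == "square":
        return [s ** 2 for s in scaled]
    if loss == "exponential":
        return [1.0 - math.exp(-s) for s in scaled]
    raise ValueError(f"unknown loss: {loss!r}")

# Toy values (made up for illustration)
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.5, 5.0]
for name in ("linear", "square", "exponential"):
    print(name, normalized_losses(y_true, y_pred, name))
```

Relative to the linear loss, the square loss shrinks the loss of small and medium errors, so the worst-predicted samples stand out more.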

### AdaBoost.R2 Algorithm
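AdaBoostRegressor implements the AdaBoost.R2 algorithm, which follows these steps:
1. Initialize sample weights equally: $D_1(i) = 1/N$
2. For each iteration $t=1,\dots,T$:
   - Train weak regressor $h_t$ using weighted samples
   - Calculate the loss $E_t(i) \in [0, 1]$ of each sample with the chosen loss function
   - Compute weighted error: $\epsilon_t = \sum_{i=1}^{N} D_t(i)E_t(i)$
   - Stop boosting if $\epsilon_t \ge 0.5$
   - Calculate regressor weight: $w_t = \log((1-\epsilon_t)/\epsilon_t)$
   - Update sample weights for the next iteration: $D_{t+1}(i) = D_t(i)\,\beta_t^{1-E_t(i)} / Z_t$, where $\beta_t = \epsilon_t/(1-\epsilon_t)$ and $Z_t$ normalizes the weights so that they sum to 1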
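
The weight bookkeeping in one iteration can be sketched as follows. This is a minimal plain-Python illustration that assumes the per-sample losses $E_t(i)$ are already computed and lie in $[0, 1]$. It uses the sample-weight update from the original AdaBoost.R2 formulation, $D_{t+1}(i) \propto D_t(i)\,\beta_t^{1-E_t(i)}$ with $\beta_t = \epsilon_t/(1-\epsilon_t)$. `boosting_round` is a hypothetical helper name, and scikit-learn's implementation additionally scales these quantities by `learning_rate`.

```python
import math

def boosting_round(sample_weights, losses):
    """Return (epsilon, regressor_weight, new_sample_weights) for one round."""
    # Weighted average loss of the current weak regressor
    epsilon = sum(d * e for d, e in zip(sample_weights, losses))
    if not 0.0 < epsilon < 0.5:
        raise ValueError("this sketch only handles 0 < epsilon < 0.5")
    beta = epsilon / (1.0 - epsilon)         # beta < 1 because epsilon < 0.5
    regressor_weight = math.log(1.0 / beta)  # same as log((1 - eps) / eps)
    # Low-loss samples are multiplied by a factor close to beta (shrunk most);
    # high-loss samples by a factor close to 1, so they gain relative weight.
    scaled = [d * beta ** (1.0 - e) for d, e in zip(sample_weights, losses)]
    total = sum(scaled)
    return epsilon, regressor_weight, [s / total for s in scaled]

# Toy values (made up for illustration): four samples with equal weights
weights = [0.25, 0.25, 0.25, 0.25]
losses = [0.0, 0.25, 0.5, 1.0]
epsilon, w, new_weights = boosting_round(weights, losses)
print(epsilon, w, new_weights)
```

After the update, the sample with the largest loss carries more than its original weight of 0.25 and the sample with zero loss carries less.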

## Training Process
The training process involves these main steps:
1. **Initialize**: Start with equal weights for all training samples
2. **Iterate**:
   - Train a weak regressor on weighted training data
   - Calculate prediction errors and regressor weight
   - Update sample weights (increase weights of poorly predicted samples)
   - Add weighted regressor to ensemble
3. **Combine**: The final prediction is the weighted median of the individual regressors' predictions, using the regressor weights
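
The combine step can be illustrated with a small weighted-median helper. This is a sketch under the assumption that the weighted median is taken as the smallest prediction at which the cumulative regressor weight reaches half of the total; `weighted_median` and the numbers below are illustrative, not library code.

```python
def weighted_median(predictions, weights):
    """Smallest prediction whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(predictions, weights))  # sort by predicted value
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]  # fallback for floating-point round-off

# Outputs of three weak regressors for one input, with their regressor weights
predictions = [2.0, 2.2, 9.0]
weights = [0.6, 0.7, 0.1]
print(weighted_median(predictions, weights))
```

With these values the result is 2.2: the low-weight regressor that predicts 9.0 does not pull the ensemble output toward it, which is one practical difference from a weighted average.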

### Key Parameters
- `n_estimators`: Controls the number of weak regressors (default=50)
- `learning_rate`: Shrinks the contribution of each regressor (default=1.0)
- `loss`: Type of loss function to use ('linear', 'square', 'exponential')
- `estimator`: Base regressor to use (default=DecisionTreeRegressor(max_depth=3))

### Differences from Gradient Boosting
1. **Weight Updates**: AdaBoost updates sample weights, while gradient boosting fits to residuals
2. **Base Estimators**: AdaBoost accepts any regressor as its base estimator (a shallow decision tree by default), while gradient boosting in scikit-learn is built specifically on regression trees
3. **Error Correction**: AdaBoost focuses on hard examples through weight updates, while gradient boosting directly optimizes the loss function
4. **Prediction**: AdaBoost uses weighted median for final prediction, while gradient boosting uses sum of predictions

## **Model Evaluation**

### 1. Mean Squared Error (MSE)

**Formula:**
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2
$$

**Description:**
- **Mean Squared Error (MSE)** is a widely used metric for evaluating the accuracy of regression models.
- It measures the average squared difference between the predicted values ($y_{\text{pred}}$) and the actual target values ($y_{\text{true}}$).
- The squared differences are averaged across all data points in the dataset.

**Interpretation:**
- A lower MSE indicates a better fit of the model to the data, as it means the model's predictions are closer to the actual values.
- MSE is sensitive to outliers because the squared differences magnify the impact of large errors.
- **Limitations:**
  - MSE can be hard to interpret because it is in squared units of the target variable.
  - It disproportionately penalizes larger errors due to the squaring process.
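
A small worked example makes the effect of squaring concrete. The helper `mse` and the four toy values are assumptions made for illustration.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]   # errors: 1, 0, -1, -4

print(mse(y_true, y_pred))
```

The squared errors are 1, 0, 1 and 16, so the MSE is 18 / 4 = 4.5, and the single error of 4 accounts for 16 of the 18 units of total squared error.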

---

### 2. Root Mean Squared Error (RMSE)

**Formula:**
$$
\text{RMSE} = \sqrt{\text{MSE}}
$$

**Description:**
- **Root Mean Squared Error (RMSE)** is a variant of MSE that provides the square root of the average squared difference between predicted and actual values.
- It is often preferred because it is in the same unit as the target variable, making it more interpretable.

**Interpretation:**
- Like MSE, a lower RMSE indicates a better fit of the model to the data.
- RMSE is also sensitive to outliers because the errors are squared before they are averaged; taking the square root afterwards does not remove that effect.
- **Advantages over MSE:**
  - RMSE provides a more intuitive interpretation since it is in the same scale as the target variable.
  - It can be more directly compared to the values of the actual data.
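
The relationship to MSE is easy to check numerically. The sketch below uses a hypothetical `rmse` helper and made-up toy values.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    mean_sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]   # mean squared error is 4.5

print(rmse(y_true, y_pred))
```

The result is about 2.12 in the units of the target, whereas the MSE of 4.5 is in squared units.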

---

### 3. R-squared ($R^2$)

**Formula:**
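$$
R^2 = 1 - \frac{\text{SSR}}{\text{SST}}
$$
where $\text{SSR} = \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2$ is the sum of squared residuals and $\text{SST} = \sum_{i=1}^{n} (y_{\text{true}_i} - \bar{y}_{\text{true}})^2$ is the total sum of squares around the mean $\bar{y}_{\text{true}}$ of the actual values.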

**Description:**
- **R-squared ($R^2$)**, also known as the coefficient of determination, measures the proportion of the variance in the dependent variable ($y_{\text{true}}$) that is explained by the model's predictions ($y_{\text{pred}}$), which are computed from the independent variable(s).
- It is at most 1, which indicates a perfect fit. A value of 0 means the model does no better than always predicting the mean of $y_{\text{true}}$, and the value can be negative when the model does worse than that, for example on held-out data.

**Interpretation:**
- A higher $R^2$ value suggests that the model explains a larger proportion of the variance in the target variable.
- However, $R^2$ does not provide information about the goodness of individual predictions or whether the model is overfitting or underfitting.
- **Limitations:**
  - $R^2$ can be misleading in cases of overfitting, especially with polynomial regression models. Even if $R^2$ is high, the model may not generalize well to unseen data.
  - It doesn’t penalize for adding irrelevant predictors, so adjusted $R^2$ is often preferred for models with multiple predictors.
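
The following sketch evaluates the formula directly, taking SSR as the sum of squared residuals and SST as the total sum of squares around the mean of the actual values. The helper name `r_squared` and the toy predictions are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """1 - SSR/SST, with SSR the squared residuals and SST the spread around the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # must be non-zero
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
close_pred = [2.5, 5.0, 7.5, 9.0]     # small errors
reversed_pred = [9.0, 7.0, 5.0, 3.0]  # worse than predicting the mean

print(r_squared(y_true, close_pred))
print(r_squared(y_true, reversed_pred))
```

The first call gives roughly 0.975, while the second gives -3.0, which shows that $R^2$ becomes negative once the predictions are worse than the constant mean predictor.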

---

### 4. Adjusted R-squared

**Formula:**
$$
\text{Adjusted } R^2 = 1 - \left(1 - R^2\right) \frac{n-1}{n-p-1}
$$
where $n$ is the number of data points and $p$ is the number of predictors.

**Description:**
- **Adjusted R-squared** adjusts the R-squared value to account for the number of predictors in the model, helping to prevent overfitting when adding more terms to the model.
- Unlike $R^2$, it can decrease if the additional predictors do not improve the model significantly.

**Interpretation:**
- A higher adjusted $R^2$ suggests that the model is not just overfitting, but has genuine explanatory power with the number of predictors taken into account.
- It is especially useful when comparing models with different numbers of predictors.
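
To see how the penalty grows with the number of predictors, the sketch below applies the formula to one fixed $R^2$ value. The helper `adjusted_r2` and the numbers are hypothetical and chosen only for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n data points and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the adjustment to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2 = 0.90
n = 50
few = adjusted_r2(r2, n, p=5)
many = adjusted_r2(r2, n, p=20)
print(few, many)
```

With the same $R^2$ of 0.90 on 50 points, the adjusted value is about 0.889 for 5 predictors and about 0.831 for 20, so the model with more predictors is rated lower unless the extra predictors raise $R^2$ enough to compensate.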

---

### 5. Mean Absolute Error (MAE)

**Formula:**
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{\text{true}_i} - y_{\text{pred}_i}|
$$

**Description:**
- **Mean Absolute Error (MAE)** measures the average of the absolute errors between the predicted and actual values.
- Compared with MSE and RMSE, MAE is less sensitive to outliers because it does not square the errors, so a large error contributes only in proportion to its size.

**Interpretation:**
- MAE provides a straightforward understanding of the average error magnitude.
- A lower MAE suggests better model accuracy, but it may not highlight the impact of large errors as much as MSE or RMSE.
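
A short numeric comparison shows how MAE and RMSE react to one large error. The helpers `mae` and `rmse` and the toy values are illustrative assumptions.

```python
import math

def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean squared difference, for comparison."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]   # absolute errors: 1, 0, 1, 4

print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Here the MAE is 1.5 while the RMSE is about 2.12; the gap between the two comes from the single error of 4, which RMSE weights more heavily.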

## sklearn template [scikit-learn: AdaBoostRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html)

### `class sklearn.ensemble.AdaBoostRegressor(estimator=None, *, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None)`

| **Parameter**        | **Description**                                                                                          | **Default**  |
|---------------------|----------------------------------------------------------------------------------------------------------|--------------|
| `estimator`         | Base estimator from which boosted ensemble is built. If None, uses DecisionTreeRegressor(max_depth=3)     | `None`       |
| `n_estimators`      | Maximum number of estimators at which boosting is terminated. Values must be in range [1, inf)            | `50`         |
| `learning_rate`     | Weight applied to each regressor. Higher rate increases each regressor's contribution. Range (0.0, inf)    | `1.0`        |
| `loss`              | Loss function to use when updating weights after each boosting iteration                                   | `'linear'`   |
| `random_state`      | Controls random seed for estimator boosting iterations and weight bootstrapping                           | `None`       |

---

| **Attribute**           | **Description**                                                                                        |
|------------------------|--------------------------------------------------------------------------------------------------------|
| `estimator_`           | The base estimator from which the ensemble is grown                                                    |
| `estimators_`          | The collection of fitted sub-estimators                                                                |
| `estimator_weights_`   | Weights for each estimator in the boosted ensemble                                                     |
| `estimator_errors_`    | Regression error for each estimator in the boosted ensemble                                            |
| `feature_importances_` | The impurity-based feature importances                                                                 |
| `n_features_in_`       | Number of features seen during fit                                                                     |
| `feature_names_in_`    | Names of features seen during fit (if X has feature names)                                             |

---

| **Method**             | **Description**                                                                                        |
|------------------------|--------------------------------------------------------------------------------------------------------|
| `fit(X, y)`           | Build a boosted regressor from the training set (X, y)                                                 |
| `predict(X)`          | Predict regression value for X using weighted median prediction                                         |
| `score(X, y)`         | Return the coefficient of determination R²                                                             |
| `staged_predict(X)`   | Return staged predictions after each boosting iteration                                                |
| `staged_score(X, y)`  | Return staged scores after each boosting iteration                                                     |
| `get_params()`        | Get parameters of this estimator                                                                       |
| `set_params(**params)`| Set parameters of this estimator                                                                       |
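
As a hedged usage sketch of the methods above, `staged_predict` can be used to track the test error after each boosting iteration and pick a reasonable ensemble size. The synthetic data, the train/test split, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic 1-D regression problem (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# One test-set MSE per boosting iteration
staged_mse = [
    mean_squared_error(y_test, y_hat) for y_hat in model.staged_predict(X_test)
]
best_n = int(np.argmin(staged_mse)) + 1  # iterations are counted from 1

print("fitted regressors:", len(model.estimators_))
print("lowest test MSE at iteration:", best_n)
print("test R^2 of the full ensemble:", model.score(X_test, y_test))
```

Choosing the ensemble size on the same test set that is later used for reporting gives an optimistic estimate, so a separate validation split is usually preferable in practice.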


# XXXXXXXX regression - Example

## Data loading

## Data processing

## Plotting data

## Model definition

## Model evaluation