# Boosting
## Terminologies
Understanding the terminology and concepts of boosting is crucial for applying these methods effectively in machine learning. Here’s a breakdown of the key terms and concepts:

### Key Terminologies in Boosting

1. **Weak Learner**:
   - **Definition**: A weak learner is a model that performs slightly better than random chance. It’s a simple model, like a decision tree with a shallow depth (e.g., a stump), which individually may not have high accuracy but can contribute to the final model when combined.
   - **Role in Boosting**: Boosting algorithms iteratively train weak learners to correct the errors of previous learners. By combining many weak learners, boosting creates a strong learner with high predictive power.

2. **Strong Learner**:
   - **Definition**: A strong learner is a model that performs well and has high accuracy. It is typically an ensemble of weak learners combined to form a more accurate and robust model.
   - **Role in Boosting**: The goal of boosting is to create a strong learner by aggregating the predictions of weak learners.

3. **Ensemble Learning**:
   - **Definition**: Ensemble learning refers to combining multiple models to improve performance and robustness compared to a single model.
   - **Role in Boosting**: Boosting is a form of ensemble learning where multiple weak learners are combined to form a strong learner. Each weak learner corrects the errors of its predecessors, leading to improved performance.

4. **Boosting**:
   - **Definition**: Boosting is an iterative ensemble technique that combines the predictions of several base models (weak learners) to produce a single strong model. Each new model in the sequence focuses on the errors made by the previous models.
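   - **How it Works**: Each round emphasizes the cases the current ensemble handles poorly, either by increasing the weights of misclassified instances (as in AdaBoost) or by fitting the next learner to the remaining errors (as in gradient boosting). This primarily reduces bias; variance is kept in check with shallow learners, a small learning rate, and a limited number of rounds.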

5. **Learning Rate**:
   - **Definition**: The learning rate (or shrinkage) is a parameter that controls the contribution of each weak learner to the final model. It scales each learner's output before that output is added to the ensemble.
   - **Role in Boosting**: A smaller learning rate requires more weak learners to converge, which often yields a more robust model at the cost of increased computation. A larger learning rate speeds up training but increases the risk of overfitting.

6. **Weighted Data**:
   - **Definition**: In boosting, instances that are misclassified or have higher errors are given higher weights so that the next model focuses more on them.
   - **Role in Boosting**: By re-weighting data, boosting focuses on correcting mistakes made by previous models, leading to improved accuracy.

7. **Residuals**:
   - **Definition**: Residuals are the differences between the actual values and the predicted values of the model. They represent the errors or discrepancies in predictions.
   - **Role in Boosting**: Boosting algorithms aim to reduce the residuals by iteratively training models to correct errors made by previous models.

8. **AdaBoost (Adaptive Boosting)**:
   - **Definition**: AdaBoost is a specific boosting algorithm that adjusts the weights of instances based on errors. It combines weak learners to improve model performance, focusing more on instances that were misclassified by previous models.
   - **Key Feature**: AdaBoost assigns higher weights to misclassified instances, making them more influential in the training of subsequent weak learners.

9. **Gradient Boosting**:
   - **Definition**: Gradient Boosting is another boosting algorithm that builds models in a stage-wise manner and optimizes a differentiable loss function using gradient descent in function space.
   - **Key Feature**: Each new model is fit to the negative gradient of the loss with respect to the current ensemble's predictions (the pseudo-residuals). For squared-error loss these are simply the residuals, so each stage corrects what the previous stages got wrong.

10. **XGBoost (Extreme Gradient Boosting)**:
    - **Definition**: XGBoost is an optimized implementation of gradient boosting that includes regularization to prevent overfitting, efficient handling of missing values, and parallel processing.
    - **Key Feature**: XGBoost often achieves high performance and efficiency due to its advanced optimization techniques and additional features.
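
To make the re-weighting idea from the AdaBoost and Weighted Data entries concrete, here is a minimal sketch of a single round of the classic binary AdaBoost update, written in plain Python. The labels, the weak learner's predictions, and the function name `adaboost_round` are made up for illustration; library implementations such as scikit-learn's handle multiclass problems and differ in detail.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One round of binary AdaBoost with labels in {-1, +1}.

    Assumes `weights` sums to 1. Returns the learner's vote weight (alpha)
    and the renormalized sample weights for the next round.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clip so that a perfect (or always wrong) learner does not cause log(0).
    eps = 1e-10
    err = min(max(err, eps), 1 - eps)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a mistake, so mistakes are scaled up and hits scaled down.
    scaled = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]

# Hypothetical toy data: the weak learner gets the sample at index 1 wrong.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]
weights = [1 / len(y_true)] * len(y_true)

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print("alpha:", round(alpha, 3))
print("new weights:", [round(w, 3) for w in new_weights])
```

With a weighted error of 0.2, the single misclassified sample ends up carrying half of the total weight, which is why the next weak learner is pushed to get it right.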

### Why Boosting?

Boosting is widely used because it offers several advantages:

1. **Improved Accuracy**:
   - Boosting combines multiple weak learners to create a strong learner with high accuracy, often outperforming individual models.

2. **Bias-Variance Tradeoff**:
   - Boosting primarily reduces bias: each round corrects systematic errors left by the previous learners. Variance is managed through regularization such as a small learning rate, shallow trees, and a limited number of rounds.

3. **Flexibility**:
   - Boosting can be applied to various types of models and can handle different types of data, making it versatile for various tasks.

4. **Robustness**:
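   - By iteratively correcting errors, boosting can create robust models that perform well on unseen data, provided the number of rounds and the learning rate are tuned so that the ensemble does not start fitting noise.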

5. **Feature Importance**:
   - Boosting algorithms can provide insights into feature importance, helping to understand which features are most influential for predictions.

Overall, boosting is a powerful technique that enhances model performance by iteratively refining predictions, making it a valuable tool in machine learning.
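
The idea of iteratively refining predictions can be sketched without any library. The following is a minimal illustration of gradient boosting for squared-error loss on one feature, using decision stumps as weak learners. The toy data and the helper names (`fit_stump`, `gradient_boost`, `mse`) are assumptions made for this example, not part of scikit-learn or XGBoost, and the real implementations are considerably more elaborate.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump on a single feature by least squares."""
    best = None
    # Try every threshold that leaves at least one point on each side.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, n_estimators=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each stump's correction by the learning rate.
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up toy data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.5, 1.2, 3.8, 4.1, 4.0, 6.9, 7.2]

mse_baseline = mse(y, [sum(y) / len(y)] * len(y))
model_few = gradient_boost(x, y, n_estimators=5)
model_many = gradient_boost(x, y, n_estimators=100)
mse_few = mse(y, [model_few(xi) for xi in x])
mse_many = mse(y, [model_many(xi) for xi in x])
print("baseline:", mse_baseline, "5 rounds:", mse_few, "100 rounds:", mse_many)
```

The training error drops as rounds are added. Note that this measures fit on the training data only; whether more rounds help on unseen data is exactly what the learning rate and the number of estimators have to be tuned for.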

## Boosting For Classification Problems

In [1]:
import seaborn as sns
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, AdaBoostRegressor, GradientBoostingRegressor, VotingClassifier, VotingRegressor
import xgboost as xgb

In [2]:
from sklearn.preprocessing import LabelEncoder

# Load iris dataset
iris = sns.load_dataset('iris')

# Encode 'species' as integer labels because it is a categorical variable
# Initialize LabelEncoder
le = LabelEncoder()

# Fit and transform the 'species' column
iris['species'] = le.fit_transform(iris['species'])

iris

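| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |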
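
To see what the label encoding above does, here is a plain-Python sketch of the same idea: the distinct class names are sorted and each one is replaced by its position. The short `species` list is a made-up sample, not the full column. With scikit-learn's `LabelEncoder`, the fitted mapping is available as `le.classes_`, and `le.inverse_transform` converts the integers back to names.

```python
# Hypothetical sample of the 'species' column.
species = ["setosa", "versicolor", "virginica", "setosa", "virginica"]

# Sort the distinct class names, then number them from 0.
classes = sorted(set(species))
to_code = {name: code for code, name in enumerate(classes)}

encoded = [to_code[name] for name in species]
decoded = [classes[code] for code in encoded]  # the inverse mapping

print(to_code)   # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
print(encoded)   # [0, 1, 2, 0, 2]
```

The integers are only identifiers for the classes; their numeric order carries no meaning for the classifiers used below.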


In [3]:
# Features and target
X = iris.drop('species', axis=1)
y = iris['species']

In [4]:
# For classification
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
# Initialize and train AdaBoost Classifier
ada_clf = AdaBoostClassifier()
ada_clf.fit(X_train_class, y_train_class)

# Predict and evaluate
y_pred_class = ada_clf.predict(X_test_class)
print("AdaBoost Classifier Accuracy:", accuracy_score(y_test_class, y_pred_class))

AdaBoost Classifier Accuracy: 1.0


In [6]:
# Initialize and train Gradient Boosting Classifier
gb_clf = GradientBoostingClassifier()
gb_clf.fit(X_train_class, y_train_class)

# Predict and evaluate
y_pred_class = gb_clf.predict(X_test_class)
print("Gradient Boosting Classifier Accuracy:", accuracy_score(y_test_class, y_pred_class))

Gradient Boosting Classifier Accuracy: 1.0


In [7]:
# Initialize and train XGBoost Classifier
xgb_clf = xgb.XGBClassifier()
xgb_clf.fit(X_train_class, y_train_class)

# Predict and evaluate
y_pred_class = xgb_clf.predict(X_test_class)
print("XGBoost Classifier Accuracy:", accuracy_score(y_test_class, y_pred_class))

XGBoost Classifier Accuracy: 1.0


### Hyperparameter Tuning for Classifier

#### For AdaBoost

In [8]:
param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1, 1]
}
grid_search = GridSearchCV(AdaBoostClassifier(), param_grid, cv=5)
grid_search.fit(X_train_class, y_train_class)
print("Best Parameters for AdaBoost:", grid_search.best_params_)

Best Parameters for AdaBoost: {'learning_rate': 1, 'n_estimators': 50}
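
It helps to know how much work a grid search does before launching it. The sketch below enumerates the AdaBoost grid by hand with `itertools.product`; the variable names are chosen for this example. It only counts candidates and cross-validation fits, it does not train anything.

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.01, 0.1, 1],
}
cv = 5  # number of cross-validation folds, as in the cell above

# Every combination of the listed values is one candidate.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_fits = len(candidates) * cv
print(len(candidates), "candidates x", cv, "folds =", n_fits, "fits")
for candidate in candidates:
    print(candidate)
```

This grid gives 6 candidates and 30 cross-validation fits. By default `GridSearchCV` then refits the best candidate once more on the full training set. The count grows multiplicatively with every parameter added, which is why the regression grids later in this notebook use fewer estimators on the much larger diamonds data.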


#### For Gradient Boosting

In [9]:
param_grid = {
    'n_estimators': [100, 200],
    'learning_rate': [0.01, 0.1],
    'max_depth': [3, 5, 7]
}
grid_search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=5)
grid_search.fit(X_train_class, y_train_class)
print("Best Parameters for Gradient Boosting:", grid_search.best_params_)

Best Parameters for Gradient Boosting: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100}


#### For XGBoost

In [10]:
param_grid = {
    'n_estimators': [100, 200],
    'learning_rate': [0.01, 0.1],
    'max_depth': [3, 5, 7]
}
grid_search = GridSearchCV(xgb.XGBClassifier(), param_grid, cv=5)
grid_search.fit(X_train_class, y_train_class)
print("Best Parameters for XGBoost:", grid_search.best_params_)

Best Parameters for XGBoost: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 200}


## Boosting For Regression Problems

In [11]:
import seaborn as sns
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, AdaBoostRegressor, GradientBoostingRegressor, VotingClassifier, VotingRegressor
import xgboost as xgb

In [12]:
# Load diamonds dataset
diamonds = sns.load_dataset('diamonds')
diamonds

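| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0.72 | Ideal | D | SI1 | 60.8 | 57.0 | 2757 | 5.75 | 5.76 | 3.50 |
| 53936 | 0.72 | Good | D | SI1 | 63.1 | 55.0 | 2757 | 5.69 | 5.75 | 3.61 |
| 53937 | 0.70 | Very Good | D | SI1 | 62.8 | 60.0 | 2757 | 5.66 | 5.68 | 3.56 |
| 53938 | 0.86 | Premium | H | SI2 | 61.0 | 58.0 | 2757 | 6.15 | 6.12 | 3.74 |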


In [13]:
# For simplicity, use only numerical features and a target variable
# Note: The 'price' column is our target variable
X = diamonds.select_dtypes(include=[np.number]).drop('price', axis=1).fillna(0)
y = diamonds['price']

In [14]:
X

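| | carat | depth | table | x | y | z |
|---|---|---|---|---|---|---|
| 0 | 0.23 | 61.5 | 55.0 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | 59.8 | 61.0 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | 56.9 | 65.0 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | 62.4 | 58.0 | 4.20 | 4.23 | 2.63 |
| 4 | 0.31 | 63.3 | 58.0 | 4.34 | 4.35 | 2.75 |
| ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0.72 | 60.8 | 57.0 | 5.75 | 5.76 | 3.50 |
| 53936 | 0.72 | 63.1 | 55.0 | 5.69 | 5.75 | 3.61 |
| 53937 | 0.70 | 62.8 | 60.0 | 5.66 | 5.68 | 3.56 |
| 53938 | 0.86 | 61.0 | 58.0 | 6.15 | 6.12 | 3.74 |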


In [15]:
y

0         326
1         326
2         327
3         334
4         335
         ... 
53935    2757
53936    2757
53937    2757
53938    2757
53939    2757
Name: price, Length: 53940, dtype: int64

In [16]:
# For regression
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X, y, test_size=0.2, random_state=42)

In [17]:
# Initialize and train AdaBoost Regressor
ada_reg = AdaBoostRegressor()
ada_reg.fit(X_train_reg, y_train_reg)

# Predict and evaluate
y_pred_reg = ada_reg.predict(X_test_reg)
print("AdaBoost Regressor MSE:", mean_squared_error(y_test_reg, y_pred_reg))

AdaBoost Regressor MSE: 3829080.421358767


In [18]:
# Initialize and train Gradient Boosting Regressor
gb_reg = GradientBoostingRegressor()
gb_reg.fit(X_train_reg, y_train_reg)

# Predict and evaluate
y_pred_reg = gb_reg.predict(X_test_reg)
print("Gradient Boosting Regressor MSE:", mean_squared_error(y_test_reg, y_pred_reg))

Gradient Boosting Regressor MSE: 1807043.3489397182


In [19]:
# Initialize and train XGBoost Regressor
xgb_reg = xgb.XGBRegressor()
xgb_reg.fit(X_train_reg, y_train_reg)

# Predict and evaluate
y_pred_reg = xgb_reg.predict(X_test_reg)
print("XGBoost Regressor MSE:", mean_squared_error(y_test_reg, y_pred_reg))

XGBoost Regressor MSE: 1856425.204958547
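
MSE is expressed in squared price units, which makes the raw numbers hard to judge. A small sketch, using only the three values printed above, converts them to root mean squared error (RMSE), which is on the same scale as `price`. The dictionary name `reported_mse` is chosen for this example.

```python
import math

# MSE values copied from the outputs printed above.
reported_mse = {
    "AdaBoost": 3829080.421358767,
    "Gradient Boosting": 1807043.3489397182,
    "XGBoost": 1856425.204958547,
}

# RMSE is the square root of MSE, so it is in the same units as the target.
rmse = {name: math.sqrt(value) for name, value in reported_mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:,.0f}")
```

Comparing the RMSE with the range of prices in the data gives a better sense of how far off a typical prediction is than the MSE alone.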


The MSE values are very high, so none of these models performs well yet; they are applied here mainly to illustrate the workflow. MSE is measured in squared price units, so the Gradient Boosting Regressor's MSE of about 1,807,043 corresponds to a typical error of roughly 1,344 in price. To reduce it, consider the following steps:

1. **Feature Engineering**: Reevaluate the features you're using. Sometimes, adding new features or transforming existing ones can improve model performance.

2. **Hyperparameter Tuning**: Gradient Boosting models have several hyperparameters that can be tuned, such as the number of trees (`n_estimators`), the learning rate (`learning_rate`), and the maximum depth of the trees (`max_depth`). Use techniques like Grid Search or Random Search to find the best combination of these parameters.

3. **Model Complexity**: Check if your model is too complex or too simple. Overfitting can occur if the model is too complex, while underfitting can occur if it's too simple. Adjust the model complexity by changing hyperparameters.

4. **Cross-Validation**: Use cross-validation to ensure that your model's performance is consistent across different subsets of the data.

5. **Data Quality**: Ensure that your data is clean and representative. Outliers, missing values, or noise in the data can negatively impact model performance.

6. **Comparison with Other Models**: Compare the performance of Gradient Boosting with other regression models like Random Forest, XGBoost, or even linear models to see if they perform better.

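7. **Scaling and Normalization**: Ensure that your features are properly scaled and normalized if required by the algorithm. Tree-based boosting models are generally insensitive to feature scaling, so this matters mainly when you compare against scale-sensitive models such as linear or distance-based regressors.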
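
As one concrete instance of the feature engineering step, the regression above drops `cut`, `color`, and `clarity` because they are not numeric. A possible way to bring them back is ordinal encoding. The sketch below uses plain Python on the first rows shown earlier; the grade orderings are an assumption based on the usual diamond grading scales (worst to best), and the helper `ordinal_map` is hypothetical. On the real DataFrame the same mapping could be applied with `Series.map`.

```python
# Assumed grade orderings, from worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

def ordinal_map(order):
    """Map each level to its rank in the given ordering."""
    return {level: rank for rank, level in enumerate(order)}

cut_map = ordinal_map(CUT_ORDER)
color_map = ordinal_map(COLOR_ORDER)
clarity_map = ordinal_map(CLARITY_ORDER)

# First rows of the diamonds table shown above (categorical columns only).
rows = [
    {"carat": 0.23, "cut": "Ideal", "color": "E", "clarity": "SI2"},
    {"carat": 0.21, "cut": "Premium", "color": "E", "clarity": "SI1"},
    {"carat": 0.23, "cut": "Good", "color": "E", "clarity": "VS1"},
]

encoded = [
    {
        **row,
        "cut": cut_map[row["cut"]],
        "color": color_map[row["color"]],
        "clarity": clarity_map[row["clarity"]],
    }
    for row in rows
]

for row in encoded:
    print(row)
```

Whether these extra features actually lower the MSE has to be checked by retraining and re-evaluating the models; this sketch only shows the encoding step.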


### Hyperparameter Tuning for Regressor

#### For AdaBoost

In [20]:
param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1, 1]
}
grid_search = GridSearchCV(AdaBoostRegressor(), param_grid, cv=5)
grid_search.fit(X_train_reg, y_train_reg)
print("Best Parameters for AdaBoost:", grid_search.best_params_)

Best Parameters for AdaBoost: {'learning_rate': 0.01, 'n_estimators': 100}


#### For Gradient Boosting

In [21]:
param_grid = {
    'n_estimators': [10, 50],
    'learning_rate': [0.01, 0.1],
    'max_depth': [3, 5, 7]
}
grid_search = GridSearchCV(GradientBoostingRegressor(), param_grid, cv=5)
grid_search.fit(X_train_reg, y_train_reg)
print("Best Parameters for Gradient Boosting:", grid_search.best_params_)

Best Parameters for Gradient Boosting: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 50}


#### For XGBoost

In [22]:
param_grid = {
    'n_estimators': [10, 50],
    'learning_rate': [0.01, 0.1],
    'max_depth': [3, 5, 7]
}
grid_search = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=5)
grid_search.fit(X_train_reg, y_train_reg)
print("Best Parameters for XGBoost:", grid_search.best_params_)

Best Parameters for XGBoost: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 50}
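
One detail worth keeping in mind for the regressor searches: when `scoring` is not set, `GridSearchCV` ranks candidates by the estimator's own `score` method, which for scikit-learn regressors is R², not MSE. Passing `scoring='neg_mean_squared_error'` would rank by MSE instead. The plain-Python sketch below shows how the two metrics relate on a few made-up values; the function names `r_squared` and `mse` are chosen for this example.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up prices and predictions for illustration.
y_true = [300, 500, 2500, 2800]
y_pred = [350, 450, 2400, 2900]

print("MSE:", mse(y_true, y_pred))
print("R^2:", round(r_squared(y_true, y_pred), 4))

# Predicting the mean everywhere gives R^2 = 0; a perfect fit gives R^2 = 1.
mean_price = sum(y_true) / len(y_true)
print("R^2 of mean predictor:", r_squared(y_true, [mean_price] * len(y_true)))
```

Because R² is MSE rescaled by the variance of the target within each fold, the two metrics usually favor similar candidates, but the rankings are not guaranteed to be identical.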


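## Custom Boosting with Multiple Algorithms

The boosted models can themselves be combined in a voting ensemble: `VotingClassifier` with `voting='soft'` averages the predicted class probabilities, and `VotingRegressor` averages the predicted values.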

### Classification

```python
voting_clf = VotingClassifier(estimators=[
    ('ada', AdaBoostClassifier()),
    ('gb', GradientBoostingClassifier()),
    ('xgb', xgb.XGBClassifier())
], voting='soft')

voting_clf.fit(X_train_class, y_train_class)
y_pred_class = voting_clf.predict(X_test_class)
print("Voting Classifier Accuracy:", accuracy_score(y_test_class, y_pred_class))
```
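
To illustrate what `voting='soft'` means, here is a plain-Python sketch of soft voting next to hard (majority) voting. The probability vectors are made up, and the functions `soft_vote` and `hard_vote` are hypothetical helpers written for this example, not scikit-learn APIs.

```python
from collections import Counter

def soft_vote(probas):
    """Average the class probabilities across models and pick the largest."""
    n_models = len(probas)
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / n_models for c in range(n_classes)]
    return avg.index(max(avg)), avg

def hard_vote(probas):
    """Each model votes for its most likely class; the majority wins."""
    votes = [p.index(max(p)) for p in probas]
    return Counter(votes).most_common(1)[0][0]

# Made-up predicted probabilities from three models for one sample.
probas = [
    [0.90, 0.10, 0.00],  # model A is very confident in class 0
    [0.40, 0.60, 0.00],  # model B leans slightly towards class 1
    [0.45, 0.55, 0.00],  # model C leans slightly towards class 1
]

soft_label, avg_proba = soft_vote(probas)
hard_label = hard_vote(probas)
print("soft vote:", soft_label, [round(p, 3) for p in avg_proba])
print("hard vote:", hard_label)
```

In this example the two schemes disagree: two of three models prefer class 1, but the averaged probabilities favor class 0 because one model is much more confident. Soft voting therefore relies on the base models producing reasonably calibrated probabilities.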

### Regression

```python
voting_reg = VotingRegressor(estimators=[
    ('ada', AdaBoostRegressor()),
    ('gb', GradientBoostingRegressor()),
    ('xgb', xgb.XGBRegressor())
])

voting_reg.fit(X_train_reg, y_train_reg)
y_pred_reg = voting_reg.predict(X_test_reg)
print("Voting Regressor MSE:", mean_squared_error(y_test_reg, y_pred_reg))
```
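
For regression, the voting ensemble's prediction is the average of the base models' predictions (uniform by default, or weighted if `weights` is passed to `VotingRegressor`). The sketch below reproduces that averaging in plain Python for a single made-up sample; `average_prediction` is a hypothetical helper.

```python
def average_prediction(preds, weights=None):
    """Weighted average of the base models' predictions (uniform if no weights)."""
    if weights is None:
        weights = [1.0] * len(preds)
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)

# Made-up predictions from three regressors for one diamond, and its true price.
preds = [1900.0, 1300.0, 1400.0]
target = 1500.0

ensemble = average_prediction(preds)
ensemble_sq_error = (ensemble - target) ** 2
mean_member_sq_error = sum((p - target) ** 2 for p in preds) / len(preds)

print("ensemble prediction:", round(ensemble, 2))
print("ensemble squared error:", round(ensemble_sq_error, 2))
print("mean squared error of the members:", round(mean_member_sq_error, 2))
```

With uniform weights, the squared error of the averaged prediction is never larger than the average of the members' squared errors, which is the main reason averaging tends to help. It can still be worse than the single best member.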


#### Prepared By,
Ahamed Basith