In [None]:
# -*- coding: utf-8 -*-
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Ensemble Learning
- Ensemble learning is a machine learning technique that combines multiple models to achieve better results than any individual model.
- These models are often referred to as weak learners, and the intuition behind ensemble learning is that several weak learners, when combined, can form a strong learner.
- Each weak learner is trained on the training set and provides predictions. The final prediction is then computed by combining the results from all the weak learners.

### Basic Ensemble Learning Techniques:
- Max Voting
- Averaging
- Weighted Average

### Advanced Ensemble Learning Techniques:
- Stacking
- Blending
- Bagging
- Boosting

In [None]:
# !pip install xgboost
import xgboost as xgb

# Max Voting

It is mainly used for classification problems. The Max Voting technique treats the predictions from each model as votes. The final prediction is determined by the prediction with the most votes.

#### Theoretical example: 

Imagine you have a dataset with movie ratings from various film critics. You want to determine if a movie will be successful or not (e.g., whether it will make a lot of money at the box office or receive positive reviews).
You have three classifiers, each trained on different data and using different approaches to evaluate movies:
- **Classifier 1** - Predicts a successful movie.
- **Classifier 2** - Predicts an unsuccessful movie.
- **Classifier 3** - Predicts a successful movie.
  
If the majority of classifiers (in this case, 2 out of 3) predict the movie as successful, it is likely to be successful.
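
The vote counting in this example can be sketched in a few lines of plain Python. The label strings and variable names below are illustrative assumptions, not part of any library API.

```python
from collections import Counter

# Hypothetical votes from the three classifiers in the example above
votes = ["successful", "unsuccessful", "successful"]

# Count how often each label was predicted and keep the most common one
vote_counts = Counter(votes)
final_prediction, n_votes = vote_counts.most_common(1)[0]

print(f"Final prediction: {final_prediction} ({n_votes} of {len(votes)} votes)")
```

With an even number of voters a tie is possible, so a real implementation would also need a tie-breaking rule.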

#### Practical example:
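In this example, three classification models (logistic regression, SVC, and random forest) are combined using scikit-learn's `VotingClassifier`. Note that the code sets `voting='soft'`, which averages the predicted class probabilities and returns the class with the highest average; use `voting='hard'` for the strict majority vote described above. The final prediction is stored in **prediction**.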

In [None]:
# Importing utility modules
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Importing machine learning models for prediction
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Importing voting classifier
from sklearn.ensemble import VotingClassifier

# Load the digits dataset
digits = load_digits()
X, y = digits.data, digits.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Initialize the base models (mostly default parameters)
model_lr = LogisticRegression(max_iter=10000)  # Increased max_iter for convergence
model_svc = SVC(probability=True)  # Enable probability for soft voting
model_rf = RandomForestClassifier()

# Making the final model using voting classifier
final_model = VotingClassifier(
    estimators=[('lr', model_lr), ('svc', model_svc), ('rf', model_rf)], voting='soft')

# Fit the final model on the training dataset
final_model.fit(X_train, y_train)

# Predict the output on the test dataset
prediction = final_model.predict(X_test)
 
# Print accuracy between actual and predicted value
accuracy = accuracy_score(y_test, prediction)
print(f'Accuracy of the final model: {accuracy:.2f}')
# The accuracy score provides a straightforward interpretation of how well the combined model performs.
# An accuracy of 1.0 means the model is perfect at classifying the digits in the test set.

# Averaging

It is mainly used for regression problems. The Averaging technique calculates the final output as the average of all predictions. For instance, in random forest regression, the final result is the average of the predictions from individual decision trees.

#### Theoretical example: 

Suppose you have three different temperature sensors in a particular location, but each sensor has some degree of error. Each sensor provides temperature values at a given moment:
- **Sensor 1** - Records 20°C.
- **Sensor 2** - Records 22°C.
- **Sensor 3** - Records 19°C.

When predicting the actual temperature, the result is obtained by averaging the values from all sensors:

(20 + 22 + 19) / 3 ≈ 20.33°C.

If the sensor errors are largely independent, they tend to cancel out, so the average is usually a more accurate estimate of the current temperature than any single reading.
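
As a quick check of the arithmetic, the same average can be computed in Python. The list name is an illustrative assumption.

```python
# Readings from the three sensors above, in degrees Celsius
readings = [20.0, 22.0, 19.0]

# Simple (unweighted) average of all readings
average_temperature = sum(readings) / len(readings)

print(f"Averaged estimate: {average_temperature:.2f} degrees Celsius")
```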

#### Practical example:
In this example, three regression models (linear regression, XGBoost, and random forest) are trained on the California housing dataset and their predictions are averaged. The final prediction is stored in **prediction**.

In [None]:
# Importing utility modules
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Importing machine learning models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

# Load the California housing dataset
california_housing = fetch_california_housing(as_frame=True)
df = california_housing.frame

# The target variable is already continuous and suitable for regression
X = df.drop('MedHouseVal', axis=1)  # Features
y = df['MedHouseVal']  # Target variable

# Split the data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Initialize models with their default parameters
model_Lr = LinearRegression()
model_Xgb = xgb.XGBRegressor(objective='reg:squarederror')  # Specify objective for regression
model_Rfr = RandomForestRegressor()

# Train all the models on the training data
model_Lr.fit(X_train, y_train)
model_Xgb.fit(X_train, y_train)
model_Rfr.fit(X_train, y_train)

# Predict on the validation dataset
pred_Lr = model_Lr.predict(X_test)
pred_Xgb = model_Xgb.predict(X_test)
pred_Rfr = model_Rfr.predict(X_test)

# Final prediction after averaging the predictions of all 3 models
prediction = (pred_Lr + pred_Xgb + pred_Rfr) / 3.0

# Print the mean squared error between the actual values and predicted values
mse = mean_squared_error(y_test, prediction)
# A lower MSE value indicates that the model is more accurate and that your predictions are closer to the actual values.
print("Mean Squared Error:", mse)

# Weighted Average

In weighted averaging, base models with higher predictive power are given more importance. Each model is assigned a weight, and the weights sum to one.

#### Theoretical example:

Imagine you are trying to predict stock prices in the market. You have three different models, and each model has a different track record of success in predicting stock prices.
The models are as follows:
- **Model 1** - Accuracy 60%.
- **Model 2** - Accuracy 75%.
- **Model 3** - Accuracy 80%.
  
In this case, you can assign weights to each model based on its accuracy. For example:

- **Model 1** - Weight 0.2.
- **Model 2** - Weight 0.3.
- **Model 3** - Weight 0.5.

The final stock price can be calculated using a weighted average prediction:

(0.2 * Model 1 Prediction) + (0.3 * Model 2 Prediction) + (0.5 * Model 3 Prediction) = Final Stock Price Prediction.

This way, you can effectively combine predictions from different models with varying levels of accuracy in predicting stock prices.
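
A minimal sketch of this formula is shown below. The weights come from the example above, while the three price predictions are made-up numbers used purely for illustration.

```python
# Weights from the example above (they sum to one)
weights = [0.2, 0.3, 0.5]

# Hypothetical price predictions from Model 1, Model 2, and Model 3
predictions = [100.0, 110.0, 105.0]

# Guard against weights that do not sum to one
if abs(sum(weights) - 1.0) > 1e-9:
    raise ValueError("weights must sum to 1")

# Weighted average: each prediction contributes in proportion to its weight
final_price = sum(w * p for w, p in zip(weights, predictions))

print(f"Weighted prediction: {final_price:.2f}")
```

Because Model 3 carries half of the total weight, the result sits closer to its prediction than a plain average would.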

#### Practical example:

In [None]:
# Importing utility modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing

# Importing machine learning models
from sklearn.linear_model import LinearRegression
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor

# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Assuming 'MedHouseVal' is the target variable
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']

# Split the data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Initialize models with their default parameters
model_Lr = LinearRegression()
model_Xgb = xgb.XGBRegressor(objective='reg:squarederror')  # Specify the objective to avoid warnings
model_Rfr = RandomForestRegressor()

# Train all the models on the training data
model_Lr.fit(X_train, y_train)
model_Xgb.fit(X_train, y_train)
model_Rfr.fit(X_train, y_train)

# Predict on the validation dataset
pred_Lr = model_Lr.predict(X_test)
pred_Xgb = model_Xgb.predict(X_test)
pred_Rfr = model_Rfr.predict(X_test)

# Illustrative weights for the models; in practice, choose them based on each model's validation performance
# Try changing the weights to see how the prediction and MSE change, but ensure the weights sum to 1
weights = {'Lr': 0.5, 'Xgb': 0.1, 'Rfr': 0.4}

# Calculate the weighted average of predictions
weighted_prediction = (pred_Lr * weights['Lr'] +
                       pred_Xgb * weights['Xgb'] +
                       pred_Rfr * weights['Rfr'])

# Calculate and print the MSE for the weighted average
weighted_mse = mean_squared_error(y_test, weighted_prediction)
print("Mean Squared Error for the weighted average:", weighted_mse)

# Stacking

Stacking is the process of combining various estimators in order to reduce their biases. Predictions from each estimator are stacked together and used as input to a final estimator (usually called a meta-model) that computes the final prediction. Training of the final estimator happens via cross-validation. Stacking can be done for both regression and classification problems.

#### Algorithm:

Stacking can be considered to happen in the following steps:
1. Split the train dataset into n parts.
2. A base model (say, linear regression) is fitted on n-1 parts and predictions are made for the nth part. This is done for each of the n parts of the train set.
3. The base model is then fitted on the whole train dataset.
4. This model is used to make predictions on the test dataset.
5. Steps 2 to 4 are repeated for another base model, which results in another set of predictions for the train and test datasets.
6. The predictions on the train dataset are used as features to build the new model.
7. This final model is used to make the predictions on the test dataset.
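
The steps above can be sketched by hand as follows. This is a minimal illustration, not how `StackingRegressor` is implemented internally; the synthetic dataset, the choice of base models, and all variable names are assumptions made for this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for this sketch)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

base_models = [LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)]

train_meta_features = []
test_meta_features = []
for model in base_models:
    # Steps 1-2: out-of-fold predictions for every training row (n = 5 parts)
    out_of_fold = cross_val_predict(model, X_train, y_train, cv=5)
    train_meta_features.append(out_of_fold)
    # Steps 3-4: refit on the whole train set and predict the test set
    model.fit(X_train, y_train)
    test_meta_features.append(model.predict(X_test))

# Step 6: base-model predictions become the features of the meta-model
train_meta = np.column_stack(train_meta_features)
test_meta = np.column_stack(test_meta_features)

# Step 7: the meta-model makes the final prediction on the test set
meta_model = LinearRegression()
meta_model.fit(train_meta, y_train)
stacked_pred = meta_model.predict(test_meta)

print("Meta-feature shapes:", train_meta.shape, test_meta.shape)
```

Using out-of-fold predictions means the meta-model never sees a base-model prediction for a row that the base model was trained on, which limits leakage.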

#### Practical example:

In [None]:
# Importing utility modules
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Importing machine learning models
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.ensemble import StackingRegressor

# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Assuming 'MedHouseVal' is the target variable
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']

# Split the data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Define base models
base_models = [
    ('rf', RandomForestRegressor(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingRegressor(n_estimators=10, random_state=42)),
    ('svr', SVR(C=1, gamma='auto'))
]

# Define meta-learner model
meta_model = LinearRegression()

# Create the stacking model
stacked_model = StackingRegressor(estimators=base_models, final_estimator=meta_model, cv=5)

# Train the stacking model
stacked_model.fit(X_train, y_train)

# Predict on the validation dataset
stacked_prediction = stacked_model.predict(X_test)

# Calculate and print the MSE for the stacking model
stacking_mse = mean_squared_error(y_test, stacked_prediction)
print("Mean Squared Error for the Stacking Model:", stacking_mse)

# Blending

Blending is similar to stacking, but uses a holdout set from the training set to make predictions. So, predictions are done on the holdout set only. The predictions and holdout set are used to build a final model that makes predictions on the test set. You can think of blending as a type of stacking, where the meta-model is trained on predictions made by the base model on the hold-out validation set.

#### Algorithm:
You can consider the blending process to be:
1. Split the data into training, holdout (validation), and test sets.
2. Fit the base models on the training set.
3. Make predictions on the holdout and test sets.
4. Use the holdout predictions and the holdout targets to build a final model.
5. Make final predictions by applying this model to the base-model predictions on the test set.

#### Practical example:

In [None]:
# Importing utility modules
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Importing machine learning models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR

# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Assuming 'MedHouseVal' is the target variable
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']

# Split the data into training, validation, and holdout sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=42)
X_valid, X_holdout, y_valid, y_holdout = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)

# Define base models
base_models = [
    ('rf', RandomForestRegressor(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingRegressor(n_estimators=10, random_state=42)),
    ('svr', SVR(C=1, gamma='auto'))
]

# Define meta-learner model
meta_model = LinearRegression()

# Train and create predictions for each base model, and stack the predictions together for the holdout set
base_predictions_holdout = np.hstack([
    model.fit(X_train, y_train).predict(X_holdout).reshape(-1,1)
    for _, model in base_models
])

# Convert the stacked array to DataFrame for holdout set
blend_data_holdout = pd.DataFrame(base_predictions_holdout)

# Train meta-learner on holdout predictions
meta_model.fit(blend_data_holdout, y_holdout)

# Generate and stack predictions for each base model on validation set
base_predictions_valid = np.hstack([
    model.predict(X_valid).reshape(-1,1)
    for _, model in base_models
])

# Convert the stacked array to DataFrame for validation set
blend_data_valid = pd.DataFrame(base_predictions_valid)

# Predict on the validation dataset using meta-learner
blended_prediction = meta_model.predict(blend_data_valid)

# Calculate and print the MSE for the blended model
blending_mse = mean_squared_error(y_valid, blended_prediction)
print("Mean Squared Error for the Blending Model:", blending_mse)

# Bagging

Bagging, short for Bootstrap Aggregating, is an ensemble learning technique in machine learning. It aims to improve the accuracy and stability of a predictive model by reducing variance and mitigating overfitting. Bagging works by creating multiple subsets (bags) of the training data using bootstrapping and training a separate base model on each subset. The final prediction is typically obtained by aggregating the predictions of these base models.

#### Algorithm:
The method involves:
1. Creating multiple subsets from the original dataset with replacement.
2. Building a base model for each of the subsets.
3. Running all the models in parallel.
4. Combining predictions from all models to obtain final predictions.
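
A hand-rolled version of these steps might look like the sketch below. It is only meant to illustrate the idea behind `BaggingRegressor`; the synthetic dataset, the number of models, and the variable names are assumptions, and the models are trained in a simple loop rather than in parallel.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for this sketch)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rng = np.random.default_rng(0)
n_models = 25
models = []
for _ in range(n_models):
    # Step 1: bootstrap sample, drawn with replacement, same size as the train set
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: fit one base model per bootstrap sample
    tree = DecisionTreeRegressor(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Steps 3-4: every model predicts independently; predictions are averaged
all_preds = np.stack([m.predict(X_test) for m in models])
bagged_pred = all_preds.mean(axis=0)

print("Predictions per model:", all_preds.shape)
```

For classification, the aggregation step would typically be a majority vote instead of a mean.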

#### Practical example:

In [None]:
# Importing utility modules
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Importing machine learning models
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Assuming 'MedHouseVal' is the target variable
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Define the base model for bagging
# Create a Bagging ensemble of Decision Trees
bagging_model = BaggingRegressor(
    estimator=DecisionTreeRegressor(),
    n_estimators=100,
    random_state=42
)

# Train the bagging ensemble
bagging_model.fit(X_train, y_train)

# Predict on the test dataset
bagging_prediction = bagging_model.predict(X_test)

# Calculate and print the MSE for the bagging model
bagging_mse = mean_squared_error(y_test, bagging_prediction)
print("Mean Squared Error for the Bagging Model:", bagging_mse)

# Boosting

Boosting is an ensemble learning technique in machine learning that focuses on converting a set of weak learners into a strong learner. It does this by sequentially training the weak learners and giving more weight to examples that the previous learners found difficult to classify correctly. This process continues iteratively, allowing each weak learner to fix the errors made by the previous ones. The final prediction is a weighted combination of the individual learners' predictions.

#### Algorithm:
Here’s what the entire process looks like:
1. Initialize all data points with the same weight.
2. Take a subset of the train dataset.
3. Train a base model on that subset.
4. Use this model to make predictions on the whole dataset.
5. Calculate errors using the predicted values and actual values.
6. Assign higher weights to incorrectly predicted data points.
7. Train another model so that the errors made by the previous model are mitigated or corrected.
8. Similarly, create multiple models, each successive model correcting the errors of the previous one.
9. The final model (strong learner) is the weighted mean of all the previous models (weak learners).
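
The reweighting idea in these steps can be sketched with a simplified AdaBoost-style loop for binary classification. This is an illustrative assumption rather than the exact procedure above: each weak learner is trained on the full, weighted dataset instead of a subset, the data is synthetic, and the labels are encoded as -1 and +1. Gradient boosting follows the same sequential idea but fits each new model to the residual errors instead of reweighting samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem (assumption); labels mapped from {0, 1} to {-1, +1}
X, y01 = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y01 - 1

n_rounds = 20
# Start with equal weights on all data points
weights = np.full(len(y), 1.0 / len(y))
learners, alphas = [], []

for _ in range(n_rounds):
    # Weak learner: a depth-1 decision tree trained on the weighted data
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error of this learner, clipped to avoid division by zero
    err = np.clip(np.sum(weights[pred != y]), 1e-10, 1 - 1e-10)
    # Learners with lower error get a larger say in the final vote
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weight of misclassified points, decrease the rest
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    learners.append(stump)
    alphas.append(alpha)

# Final strong learner: sign of the weighted sum of weak-learner votes
scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
ensemble_pred = np.sign(scores)
train_acc = np.mean(ensemble_pred == y)

print(f"Training accuracy of the boosted ensemble: {train_acc:.3f}")
```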

#### Practical example:

In [None]:
# Importing utility modules
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Importing machine learning models
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Assuming 'MedHouseVal' is the target variable
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Define the boosting model
boosting_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Train the boosting model
boosting_model.fit(X_train, y_train)

# Predict on the test dataset
boosting_prediction = boosting_model.predict(X_test)

# Calculate and print the MSE for the boosting model
boosting_mse = mean_squared_error(y_test, boosting_prediction)
print("Mean Squared Error for the Boosting Model:", boosting_mse)