### 8.2 BAGGING (BOOTSTRAP AGGREGATING) 

Bagging, short for Bootstrap Aggregating, creates multiple versions of the original dataset by randomly sampling it with replacement. Each bootstrap sample is used to train a separate model, and the final prediction is made by averaging the models' predictions (for regression) or taking a majority vote (for classification). Because the errors of the individual models partly cancel out, bagging is particularly useful for reducing the variance of the final model.

Bagging can be used with any type of model, but it is most effective for high-variance models such as decision trees. A Random Forest, used in the example below, is bagging applied to decision trees with one addition: each split considers only a random subset of the features.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate random data
np.random.seed(42)
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a Random Forest: bagged decision trees, each split choosing from a random subset of features
clf = RandomForestClassifier(n_estimators=10, max_features='sqrt')
clf.fit(X_train, y_train)

# Evaluate the model
score = clf.score(X_test, y_test)
print("Accuracy: %.2f%%" % (score * 100))


Accuracy: 87.50%
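
The example above uses a Random Forest, which adds random feature selection on top of bagging. Plain bagging, as described at the start of this section, can be sketched by hand as follows. This is a minimal illustration rather than scikit-learn's implementation: the number of models (`n_models`) and the helper name `bagged_predict` are choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_models = 25  # illustrative choice; odd, so a 0/1 majority vote cannot tie

# Train each tree on its own bootstrap sample (rows drawn with replacement)
models = []
for _ in range(n_models):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

def bagged_predict(models, X):
    """Majority vote over the 0/1 predictions of all models."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

bagged_accuracy = (bagged_predict(models, X_test) == y_test).mean()
print("Bagged trees accuracy: %.2f%%" % (bagged_accuracy * 100))
```

Each tree sees a slightly different dataset, so the trees disagree on some samples; the vote smooths out those disagreements.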


### 8.3 BOOSTING: ADAPTING THE WEAK TO THE STRONG 

Boosting combines many weak models (models that are only slightly better than random guessing) into a single strong model. The models are trained sequentially, and each new model concentrates on the mistakes of the models before it; AdaBoost, for example, does this by increasing the weights of the training samples that were misclassified. Boosting can be used with any type of model, but shallow decision trees are the most common choice.


In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# Generate random data
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

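# Fit AdaBoost classifier with a decision stump (depth-1 tree) as base estimator
dt_clf = DecisionTreeClassifier(max_depth=1)
# The base model is passed positionally: its keyword is `estimator` in
# scikit-learn >= 1.2 and `base_estimator` in earlier versions
ada_clf = AdaBoostClassifier(dt_clf, n_estimators=50, learning_rate=0.1, random_state=42)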
ada_clf.fit(X_train, y_train)

# Predict using trained AdaBoost classifier
y_pred = ada_clf.predict(X_test)

# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy score: {:.2f}%".format(accuracy*100))


Accuracy score: 85.50%
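
To show what "correcting the mistakes of the previous model" means in practice, the following sketch implements the classic discrete AdaBoost update by hand. It is an illustration under simplifying assumptions (binary labels, decision stumps, 50 rounds), not the algorithm variant scikit-learn uses internally; the helper name `boosted_predict` is made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

y_pm = 2 * y_train - 1                              # labels recoded as -1 / +1
weights = np.full(len(X_train), 1 / len(X_train))   # start with uniform sample weights
stumps, alphas = [], []

for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    pred_pm = 2 * stump.predict(X_train) - 1
    # Weighted error of this stump, clipped to avoid division by zero
    err = np.clip(weights[pred_pm != y_pm].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)           # the stump's say in the final vote
    # Raise the weights of misclassified samples, lower the rest, renormalize
    weights *= np.exp(-alpha * y_pm * pred_pm)
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

def boosted_predict(X):
    """Sign of the alpha-weighted sum of stump votes, mapped back to 0/1."""
    scores = sum(a * (2 * s.predict(X) - 1) for a, s in zip(alphas, stumps))
    return (scores > 0).astype(int)

boosted_accuracy = (boosted_predict(X_test) == y_test).mean()
print("Hand-written AdaBoost accuracy: %.2f%%" % (boosted_accuracy * 100))
```

After each round, the samples the current stump got wrong carry more weight, so the next stump is pushed toward a split that handles them.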


#### Gradient Boosting

Gradient Boosting is a powerful ensemble learning method used in supervised learning problems for classification and regression. It combines the power of decision trees with the concept of gradient descent, and its flexibility and high accuracy make it a popular choice for many machine learning problems.

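Gradient Boosting works by iteratively training a sequence of decision trees. In each iteration, a new tree is fitted to the residual errors of the current ensemble (more generally, to the negative gradient of the loss function), and its prediction, scaled by the learning rate, is added to the running total. The final prediction is the sum of the contributions of all trees.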

One of the key advantages of Gradient Boosting is that it can handle a variety of loss functions, which makes it a versatile method for different types of machine learning problems. The most commonly used loss functions are the mean squared error (MSE) for regression problems and the log loss for classification problems.


In [4]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate random dataset
np.random.seed(42)
X, y = make_regression(n_samples=1000, n_features=10, noise=20, random_state=42)

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit Gradient Boosting model
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb.fit(X_train, y_train)

# Make predictions on test data
y_pred = gb.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error: ", mse)


Mean squared error:  1600.1626279309244
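
The fitting loop described above can be written out in a few lines for the squared-error case. This is a minimal sketch, not the `GradientBoostingRegressor` implementation; it reuses the same settings as the cell above (100 trees, learning rate 0.1, depth 3), and the variable names are chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

learning_rate = 0.1
n_trees = 100

# Start from a constant prediction: the mean of the training targets
baseline = y_train.mean()
train_pred = np.full(len(y_train), baseline)
test_pred = np.full(len(y_test), baseline)
train_mse = []

for _ in range(n_trees):
    # For the loss 0.5 * (y - F)^2, the negative gradient is the residual y - F
    residuals = y_train - train_pred
    tree = DecisionTreeRegressor(max_depth=3, random_state=42)
    tree.fit(X_train, residuals)
    # Add a damped version of the new tree to the running prediction
    train_pred += learning_rate * tree.predict(X_train)
    test_pred += learning_rate * tree.predict(X_test)
    train_mse.append(np.mean((y_train - train_pred) ** 2))

test_mse = np.mean((y_test - test_pred) ** 2)
baseline_mse = np.mean((y_test - baseline) ** 2)
print("Baseline MSE: %.1f, boosted MSE: %.1f" % (baseline_mse, test_mse))
```

Swapping in a different loss only changes the line that computes `residuals`, which is why the method adapts so easily to other objectives.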


#### XGBoost (Extreme Gradient Boosting) 

XGBoost (Extreme Gradient Boosting) is a popular implementation of gradient boosting, known for its speed and accuracy on large-scale datasets. This section explains how it works and gives a coding example on a random dataset.

XGBoost uses decision trees for regression and classification problems. It builds the trees one after another, each new tree predicting the residual errors of the trees before it, so that the model improves gradually as the error shrinks at each iteration. Each tree is grown with a greedy algorithm that picks the best split at every node.
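
The greedy split search can be illustrated with the gain formula given in the XGBoost documentation's introduction to boosted trees, which scores a split from the sums of first and second derivatives of the loss (G and H) on each side. The following is a NumPy sketch of that idea for a single feature, not XGBoost's actual code; the function name `best_split`, the toy data, and the choice of `reg_lambda=1.0` and `gamma=0.0` are assumptions made for the example.

```python
import numpy as np

def best_split(x, grad, hess, reg_lambda=1.0, gamma=0.0):
    """Scan one feature for the threshold with the largest gain."""
    order = np.argsort(x)
    x, grad, hess = x[order], grad[order], hess[order]
    G, H = grad.sum(), hess.sum()
    parent_score = G ** 2 / (H + reg_lambda)

    best_gain, best_threshold = -np.inf, None
    G_left = H_left = 0.0
    for i in range(len(x) - 1):
        G_left += grad[i]
        H_left += hess[i]
        if x[i] == x[i + 1]:
            continue  # cannot split between identical values
        G_right, H_right = G - G_left, H - H_left
        gain = 0.5 * (G_left ** 2 / (H_left + reg_lambda)
                      + G_right ** 2 / (H_right + reg_lambda)
                      - parent_score) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, (x[i] + x[i + 1]) / 2
    return best_threshold, best_gain

# Toy data: one feature, binary labels, initial predicted probability 0.5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
p = np.full(len(y), 0.5)
grad = p - y          # first derivative of the log loss w.r.t. the raw score
hess = p * (1 - p)    # second derivative of the log loss

threshold, gain = best_split(x, grad, hess)

# Optimal leaf values on each side of the split: -G / (H + lambda)
left = x < threshold
leaf_left = -grad[left].sum() / (hess[left].sum() + 1.0)
leaf_right = -grad[~left].sum() / (hess[~left].sum() + 1.0)
print(threshold, round(gain, 4), round(leaf_left, 4), round(leaf_right, 4))
# prints: 3.5 1.2857 -0.8571 0.8571
```

The split lands between the last negative and the first positive sample, and the two leaves push the raw score down on the left and up on the right.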

In [7]:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

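# Generate random data. The labels are drawn independently of the features,
# so there is no signal to learn and accuracy is expected to hover around 50%.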
np.random.seed(42)
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create XGBoost DMatrix objects
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set hyperparameters for XGBoost
params = {
    'max_depth': 3, 
    'eta': 0.1, 
    'objective': 'binary:logistic', 
    'eval_metric': 'error'
}

# Train the model
num_round = 50
xg_model = xgb.train(params, dtrain, num_round)

# Make predictions
y_pred = xg_model.predict(dtest)
y_pred = [1 if x > 0.5 else 0 for x in y_pred]

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))


Accuracy: 40.00%


In [6]:
# If xgboost is not installed, uncomment the line below and run it.
# !pip install xgboost



#### LightGBM (Light Gradient Boosting Machine)

LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework that uses decision trees as its base model. It is designed to be lightweight, fast, and scalable, and it handles datasets with millions of instances and features, which makes it a popular choice for large-scale machine learning tasks. LightGBM grows trees leaf-wise: at each step it splits the leaf that yields the largest reduction in loss rather than expanding the tree level by level, which usually reaches a lower loss for the same number of leaves.

One of the key features of LightGBM is its ability to handle categorical features directly, without the need for one-hot encoding. This can significantly reduce the memory footprint and training time for datasets with a large number of categorical features.

Another important feature of LightGBM is its ability to handle imbalanced datasets. It provides a parameter called "is_unbalance" that can be set to true to automatically adjust the weights of the training instances based on their class distribution.


In [9]:
# If lightgbm is not installed, uncomment the line below and run it.
# !pip install lightgbm



In [10]:
import numpy as np
import lightgbm as lgb

np.random.seed(42)

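# Generate random data. The labels are independent of the features, so the
# validation log loss is expected to stay near ln(2), about 0.693, or drift upward.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, 1000)

# Split data into training and validation sets
train_data = lgb.Dataset(X[:800], label=y[:800])
test_data = lgb.Dataset(X[800:], label=y[800:], reference=train_data)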

# Set parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}

# Train model
model = lgb.train(params, train_data, valid_sets=[test_data])

# Make predictions on test set
y_pred = model.predict(X[800:])


[LightGBM] [Info] Number of positive: 405, number of negative: 395
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1275
[LightGBM] [Info] Number of data points in the train set: 800, number of used features: 5
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.506250 -> initscore=0.025001
[LightGBM] [Info] Start training from score 0.025001
[1]	valid_0's binary_logloss: 0.691565
[2]	valid_0's binary_logloss: 0.692726
[3]	valid_0's binary_logloss: 0.692952
[4]	valid_0's binary_logloss: 0.692721
[5]	valid_0's binary_logloss: 0.695557
[6]	valid_0's binary_logloss: 0.696472
[7]	valid_0's binary_logloss: 0.696717
[8]	valid_0's binary_logloss: 0.698908
[9]	valid_0's binary_logloss: 0.700466
[10]	valid_0's binary_logloss: 0.700591
[11]	valid_0's binary_logloss: 0.704054
[12]	valid_0's binary_logloss: 0.705939
[13]	valid_0's binary_logloss: 0.705556
[14]	valid_0's binary_logloss: 0.706172
[15]	valid_0's binary_logloss: 0.706068
[16]	valid_0's binary_loglos

### 8.4 STACKING: BUILDING A POWERFUL META MODEL 

Stacking is a technique that combines multiple models to create a stronger model. It works by training several base models on the same data and then using their predictions as inputs to train a final model, called the meta-model. Stacking can be used with any type of model, but it is particularly useful for combining models of different types.

To keep the meta-model from learning on predictions that the base models made for samples they had already seen, the meta-features are generated out-of-fold: the data is divided into k folds, each base model is trained on k-1 folds, and its predictions for the remaining fold are recorded. Repeating this for every fold yields one prediction per sample per base model. These predictions, optionally concatenated with the original features, are used to train the meta-model, which can be any type of model such as a linear model, decision tree, or neural network.


In [11]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Generate random dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Define base models
model_1 = RandomForestClassifier(n_estimators=50, random_state=42)
model_2 = GradientBoostingClassifier(n_estimators=50, random_state=42)

# Define meta model
meta_model = LogisticRegression(random_state=42)

# Create k-fold cross-validation splits
n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Collect out-of-fold predictions from each base model as meta-features
X_meta_train = np.zeros((len(X), 2))
for i, model in enumerate([model_1, model_2]):
    for train_index, test_index in skf.split(X, y):
        model.fit(X[train_index], y[train_index])
        X_meta_train[test_index, i] = model.predict_proba(X[test_index])[:, 1]

# Train meta model on meta features and evaluate
score = cross_val_score(meta_model, X_meta_train, y, cv=skf, scoring='roc_auc').mean()
print(f"Stacking AUC score: {score:.4f}")


Stacking AUC score: 0.9643
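
The manual loop above makes the mechanics visible. scikit-learn (version 0.22 and later) also ships a `StackingClassifier` that generates the out-of-fold meta-features internally. The sketch below uses the same base models and meta-model, but evaluates on a held-out test set instead of cross-validating the meta-model, so its score is not directly comparable to the one printed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,                          # folds used to build the out-of-fold meta-features
    stack_method="predict_proba",  # feed class probabilities to the meta-model
)
stack.fit(X_train, y_train)

stack_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"Held-out stacking AUC: {stack_auc:.4f}")
```

Setting `passthrough=True` would additionally hand the original features to the meta-model.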


### 8.5 BLENDING 

Blending is a technique similar to stacking: the predictions of several base models are used as inputs to train a final model. The main difference lies in how those predictions are produced. Stacking uses out-of-fold predictions from k-fold cross-validation, so every training sample contributes to the final model, while blending sets aside a single holdout set: the base models are trained on the rest of the data, and the final model is trained on their predictions for the holdout set. Blending is simpler and cheaper, but the final model sees less data.

With scikit-learn, blending can be implemented by hand using a simple holdout split. Here's an example of how to use blending to train a model on a dataset:


In [14]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Generate random dataset
X, y = make_regression(n_samples=1000, n_features=5, noise=0.5)

# Split the data into two parts
split = 0.8
split_idx = int(split * len(X))
X_train = X[:split_idx]
y_train = y[:split_idx]
X_blend = X[split_idx:]
y_blend = y[split_idx:]

# Train base models
models = [LinearRegression(), DecisionTreeRegressor()]
for model in models:
    model.fit(X_train, y_train)

# Make predictions on the blend data using base models
blend_preds = np.column_stack([model.predict(X_blend) for model in models])

# Train blending model on the blend data
blend_model = LinearRegression()
blend_model.fit(blend_preds, y_blend)

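# Make predictions using the blended model. Note that X[split_idx:] is the same
# holdout set the blending model was trained on, so the RMSE below is optimistic;
# a separate test set would give an unbiased estimate.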
test_preds = np.column_stack([model.predict(X[split_idx:]) for model in models])
final_preds = blend_model.predict(test_preds)

# Calculate the RMSE
rmse = np.sqrt(mean_squared_error(y[split_idx:], final_preds))
print(f"RMSE: {rmse}")


RMSE: 0.5098260853286533
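
In the example above, the blended model is evaluated on the same rows it was trained on (`X[split_idx:]`). A three-way split avoids this: one part for the base models, one for the blender, and one kept aside for the final evaluation. The sketch below is one way to arrange it; the 60/20/20 proportions and the helper name `base_predictions` are choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=5, noise=0.5, random_state=42)

# 60% for the base models, 20% for the blender, 20% for the final test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_blend, X_test, y_blend, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

# Train the base models on the training part only
models = [LinearRegression(), DecisionTreeRegressor(random_state=42)]
for model in models:
    model.fit(X_train, y_train)

def base_predictions(X):
    """One column of predictions per base model."""
    return np.column_stack([model.predict(X) for model in models])

# Train the blender on base-model predictions for the blend part
blend_model = LinearRegression()
blend_model.fit(base_predictions(X_blend), y_blend)

# Evaluate on rows that neither the base models nor the blender have seen
final_preds = blend_model.predict(base_predictions(X_test))
rmse = np.sqrt(mean_squared_error(y_test, final_preds))
print(f"Held-out RMSE: {rmse:.4f}")
```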


### 8.6 ROTATION FOREST 

Rotation Forest is an ensemble learning method that was introduced by Rodriguez et al. in 2006. It belongs to the family of decision tree-based ensemble methods, and its main idea is to increase the diversity among the base classifiers by applying a random feature transformation before building each tree.

To build each tree, Rotation Forest randomly splits the features into disjoint subsets and applies PCA (Principal Component Analysis) to each subset, typically on a bootstrap sample of the training data. All principal components are kept, and the per-subset loadings are assembled into a rotation matrix that is applied to the whole dataset before the tree is trained. Because the feature split and the bootstrap sample differ from tree to tree, the result is a set of diverse base classifiers that are less correlated with each other.

The idea behind Rotation Forest is that decision trees split along the feature axes, so rotating the axes gives each tree a different view of the data. Keeping all principal components preserves the information in the data, which helps keep the individual trees accurate while the random rotations make them diverse.


In [17]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import KFold

# Generate random data. The labels are independent of the features,
# so the accuracy is expected to be close to 50%.
np.random.seed(42)
X = np.random.rand(100, 10)
y = np.random.randint(2, size=100)

# Define the number of trees and features to select
num_trees = 10
num_features = 3

# Initialize an empty set of rotated datasets
rotated_datasets = []

# For each tree in the forest
for i in range(num_trees):
    # Randomly select k features from the dataset
    selected_features = np.random.choice(X.shape[1], size=num_features, replace=False)
    # Compute the PCA projection matrix for the selected features
    pca = PCA(n_components=num_features)
    pca.fit(X[:, selected_features])
    projection_matrix = pca.components_
    # Rotate the dataset using the projection matrix
    rotated_data = np.dot(X[:, selected_features], projection_matrix.T)
    rotated_datasets.append(rotated_data)

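# Train a single tree on the concatenated rotated features. This is a simplification:
# Rotation Forest proper trains one tree per rotation and combines their votes.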
meta_features = np.hstack(rotated_datasets)
model = DecisionTreeClassifier()
model.fit(meta_features, y)

# Evaluate the performance using 10-fold cross-validation
kf = KFold(n_splits=10)
scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Compute the rotated datasets for the training and test sets
    rotated_train = []
    rotated_test = []
    for dataset in rotated_datasets:
        rotated_train.append(dataset[train_index])
        rotated_test.append(dataset[test_index])
    meta_train = np.hstack(rotated_train)
    meta_test = np.hstack(rotated_test)
    # Train a meta-model on the rotated training set
    model = DecisionTreeClassifier()
    model.fit(meta_train, y_train)
    # Evaluate the meta-model on the rotated test set
    score = model.score(meta_test, y_test)
    scores.append(score)

print("Mean accuracy: {:.2f}%".format(np.mean(scores) * 100))


Mean accuracy: 53.00%
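
The cell above trains a single decision tree on concatenated PCA projections of random data. A version closer to the method itself trains one tree per rotation and averages the trees' predictions. The sketch below is a simplified illustration of that scheme, not a reference implementation: it omits the per-subset class subsampling of the original algorithm, and the settings (`n_trees`, `n_subsets`, `sample_fraction`) and the helper name `rotation_forest_predict` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_trees, n_subsets, sample_fraction = 10, 3, 0.75   # illustrative settings
n_features = X_train.shape[1]

forest = []  # list of (rotation_matrix, tree) pairs
for _ in range(n_trees):
    rotation = np.zeros((n_features, n_features))
    # Randomly partition the features into disjoint subsets
    for subset in np.array_split(rng.permutation(n_features), n_subsets):
        # Fit PCA on a bootstrap sample restricted to this subset, keeping all components
        rows = rng.choice(len(X_train), size=int(sample_fraction * len(X_train)), replace=True)
        pca = PCA().fit(X_train[np.ix_(rows, subset)])
        # Place the loadings in the block of the rotation matrix for this subset
        rotation[np.ix_(subset, subset)] = pca.components_.T
    # Train the tree on the full training set in the rotated feature space
    tree = DecisionTreeClassifier(random_state=0).fit(X_train @ rotation, y_train)
    forest.append((rotation, tree))

def rotation_forest_predict(X):
    """Average the class probabilities of all trees, each on its own rotation."""
    proba = np.mean([tree.predict_proba(X @ rot) for rot, tree in forest], axis=0)
    return proba.argmax(axis=1)

rf_accuracy = (rotation_forest_predict(X_test) == y_test).mean()
print("Rotation forest accuracy: {:.2f}%".format(rf_accuracy * 100))
```

Because every principal component is kept, each rotation matrix is orthogonal: the data is turned, not compressed.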


### 8.7 CASCADING CLASSIFIERS 

Cascading classifiers is a type of ensemble learning method that chains multiple classifiers into a sequence of stages. This method is often used in object detection applications where the objective is to identify an object within an image or a video.

The basic idea of cascading classifiers is to break the detection problem into multiple stages. The early stages are simple and cheap, and each later stage is more complex and more selective. A candidate region is passed to the next stage only if the current stage accepts it; as soon as one stage rejects it, the region is discarded.

The advantage of cascading classifiers is faster and more efficient object detection. Because most regions of an image do not contain the object, the cheap early stages can quickly eliminate them, and only the few promising regions reach the expensive later stages. This speeds up the overall detection process and reduces the number of false positives.

To implement cascading classifiers, we can use a combination of feature extraction techniques and machine learning algorithms. For feature extraction, we might use Haar-like features, which are commonly used in object detection applications. For the stage classifiers, we might use boosted ensembles, support vector machines (SVMs), or neural networks.


In [24]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate random dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=42)

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Stage 1: a fast linear SVM on standardized features
stage1 = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='linear', probability=True, random_state=42))
])
stage1.fit(X_train, y_train)

# Samples on which stage 1 is not confident are passed on to stage 2.
# (The true labels are unknown at prediction time, so confidence, not
# misclassification, decides which samples move to the next stage.)
confidence_threshold = 0.9
hard_train = stage1.predict_proba(X_train).max(axis=1) < confidence_threshold
hard_test = stage1.predict_proba(X_test).max(axis=1) < confidence_threshold

# Start from the stage 1 predictions
final_predictions = stage1.predict(X_test)

# Stage 2: a random forest trained only on the hard training samples
if hard_train.any() and hard_test.any():
    stage2 = RandomForestClassifier(n_estimators=100, random_state=42)
    stage2.fit(X_train[hard_train], y_train[hard_train])
    final_predictions[hard_test] = stage2.predict(X_test[hard_test])

# Calculate and print the accuracy of the cascade
accuracy = accuracy_score(y_test, final_predictions)
print(f'Accuracy score: {accuracy:.2f}')


### 8.8 ADVERSARIAL TRAINING 

Adversarial examples are crafted by adding a small perturbation to the input data that is barely noticeable to the human eye but can significantly change the model's output.

One common approach to generate adversarial examples is the Fast Gradient Sign Method (FGSM). This method computes the gradient of the loss function with respect to the input data and takes a single step of size epsilon in the direction of the sign of that gradient, which is the direction that increases the loss. Epsilon controls the magnitude of the perturbation.

Adversarial training involves generating such adversarial examples and incorporating them into the training data. During training, the model learns to classify these examples correctly, which makes it more robust to adversarial attacks.


In [30]:
# cleverhans provides ready-made adversarial attacks. If it is not installed, uncomment and run the line below:
# !pip install cleverhans


In [32]:
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate random data
np.random.seed(42)
X_train = np.random.rand(100, 2)
y_train = (X_train[:, 0] < X_train[:, 1]).astype(int)

# Train a linear SVM on the original data
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
print('Accuracy on original data:', accuracy_score(y_train, clf.predict(X_train)))

# Generate adversarial examples with FGSM, computed by hand. The cleverhans attack
# functions expect a differentiable deep-learning model rather than a scikit-learn
# estimator. For a linear decision function f(x) = w.x + b the gradient is known
# in closed form: the loss grows fastest when x moves against its true class,
# i.e. along -sign(w) for class 1 and +sign(w) for class 0.
eps = 0.1
w = clf.coef_[0]
y_signed = 2 * y_train - 1  # map labels {0, 1} to {-1, +1}
X_adv = X_train - eps * y_signed[:, None] * np.sign(w)
print('Accuracy on adversarial data:', accuracy_score(y_train, clf.predict(X_adv)))

# Adversarial training: refit on the original and adversarial examples together
clf_robust = SVC(kernel='linear')
clf_robust.fit(np.vstack([X_train, X_adv]), np.concatenate([y_train, y_train]))
print('Accuracy on adversarial data after adversarial training:',
      accuracy_score(y_train, clf_robust.predict(X_adv)))

### 8.9 VOTING CLASSIFIER 

One of the simplest and most popular ensemble methods is the Voting Classifier, which combines the predictions of multiple individual classifiers to make a final prediction. This section discusses the concept and provides a coding example.

#### What is a Voting Classifier?

The idea behind the Voting Classifier is that the individual classifiers make different errors, so combining their predictions tends to be more accurate than relying on any single one of them. This is a tendency rather than a guarantee: the ensemble helps most when the classifiers are reasonably accurate and their errors are not strongly correlated.

The Voting Classifier can be implemented in two ways: hard voting and soft voting. In hard voting, each individual classifier predicts a class label, and the final prediction is the label that receives the majority of the votes. In soft voting, each individual classifier produces a probability estimate for each class, and the final prediction is the class with the highest average probability across all individual classifiers.

#### Coding Example:

Let's implement a Voting Classifier on a random dataset using the scikit-learn library. We will first generate a random dataset using the make_classification function of scikit-learn, which generates a random n-class classification problem.


In [33]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate a random dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the individual classifiers
clf1 = DecisionTreeClassifier(random_state=42)
clf2 = LogisticRegression(random_state=42)
clf3 = SVC(kernel='linear', probability=True, random_state=42)

# Define the Voting Classifier
voting_clf = VotingClassifier(estimators=[('dt', clf1), ('lr', clf2), ('svm', clf3)], voting='hard')

# Train the Voting Classifier
voting_clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = voting_clf.predict(X_test)

# Evaluate the accuracy of the Voting Classifier
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)


Accuracy: 0.865
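
The example above uses hard voting. The difference from soft voting, described earlier in this section, can be seen on a single sample. The probabilities below are made up for illustration: two classifiers lean weakly toward class 1, while the third is strongly confident in class 0.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one sample
# (columns: class 0, class 1)
probas = np.array([
    [0.45, 0.55],
    [0.45, 0.55],
    [0.90, 0.10],
])

# Hard voting: each classifier votes for its most likely class
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the most likely class
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print("Hard voting predicts class", hard_prediction)  # class 1 (two votes to one)
print("Soft voting predicts class", soft_prediction)  # class 0 (mean probability 0.6)
```

In scikit-learn, soft voting is selected by passing `voting='soft'` to `VotingClassifier`; every base classifier must then support `predict_proba`, which is why the SVC in the example above is created with `probability=True`.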
