# Ensemble Learning


# The Technique (Ensemble Learning)

Ensemble learning is a widely used machine learning technique in which multiple individual models, often called base models, are combined into a single predictor that usually performs better than any one of them alone. **Bagging** and **Random Forest** are examples of ensemble learning.

# Bagging
Bagging, also known as bootstrap aggregating, is an ensemble learning technique that trains each base model on a bootstrap sample of the training data (drawn with replacement) and combines their predictions by voting or averaging. Its main effect in the bias-variance trade-off is to reduce variance, which helps limit overfitting. It can be used for both regression and classification, and is most often applied to high-variance models such as decision trees.
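To make the "bootstrap" part concrete, here is a minimal sketch of drawing one bootstrap sample with NumPy. The sample size `n_examples = 1000` and the seed are arbitrary choices for illustration, not values from this lab.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for repeatability
n_examples = 1000                # illustrative dataset size (an assumption)

# A bootstrap sample: n_examples row indices drawn with replacement.
bag = rng.choice(n_examples, size=n_examples, replace=True)

# Fraction of distinct training points that ended up in the bag.
unique_fraction = np.unique(bag).size / n_examples

# Points that were never drawn are "out-of-bag" for this sample.
oob_mask = np.ones(n_examples, dtype=bool)
oob_mask[bag] = False

print(f"in-bag fraction:     {unique_fraction:.3f}")
print(f"out-of-bag fraction: {oob_mask.mean():.3f}")
```

When the bag is as large as the dataset, the in-bag fraction tends towards 1 - 1/e (about 0.632) as the dataset grows, so roughly a third of the points are left out of each bag. Those left-out points are what an out-of-bag score is computed on.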

# Random Forest
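Random forest is an ensemble model that uses bagging as the ensemble method and decision trees as the base models. In addition, each split considers only a random subset of the features, which further decorrelates the trees.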
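As a rough illustration, a random forest can be fitted with scikit-learn's `RandomForestClassifier`. The hyperparameters and the variable names below (`X_tr`, `rf`, and so on) are assumptions made for this sketch, not settings prescribed by the lab.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same kind of toy data as the rest of this notebook.
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

# Bagged decision trees plus a random feature subset at every split.
rf = RandomForestClassifier(
    n_estimators=200,       # number of trees (illustrative value)
    max_depth=10,           # depth limit (illustrative value)
    max_features="sqrt",    # features considered per split
    oob_score=True,         # also estimate accuracy on out-of-bag points
    random_state=0,
)
rf.fit(X_tr, y_tr)

rf_accuracy = accuracy_score(y_te, rf.predict(X_te))
print("Random forest OOB score    ", rf.oob_score_)
print("Random forest test accuracy", rf_accuracy)
```

With only two input features, as in `make_moons`, the random feature subset has little room to vary, so the forest is likely to behave much like plain bagging here.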

# The Problem
We use ensemble learning to get better predictive performance than a single model gives: lower error for regression or higher accuracy for classification.

# Code

In [None]:
import numpy as np
from scipy.stats import mode
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators, max_depth=5, max_samples=200):
    """Train n_estimators decision trees, each on its own bootstrap sample."""
    n_examples = len(y)
    estimators = [DecisionTreeClassifier(max_depth=max_depth)
                  for _ in range(n_estimators)]

    for tree in estimators:
        # Draw max_samples row indices with replacement (a bootstrap sample)
        bag = np.random.choice(n_examples, max_samples, replace=True)
        tree.fit(X[bag, :], y[bag])

    return estimators


def bagging_predict(X, estimators):
    """Predict by majority vote over the individual trees."""
    # Shape (n_estimators, n_examples): one row of predictions per tree
    all_predictions = np.array([tree.predict(X) for tree in estimators])
    # The most frequent label in each column is the ensemble prediction
    ypred, _ = mode(all_predictions, axis=0)
    return np.squeeze(ypred)


X, y = make_moons(n_samples=300, noise=.25, random_state=0)
Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.33)

bag_ens = bagging_fit(Xtrn, ytrn, n_estimators=500,
                      max_depth=12, max_samples=200)
ypred = bagging_predict(Xtst, bag_ens)

# Result of Bagging

In [None]:
print('Accuracy of bagging', accuracy_score(ytst, ypred))

Accuracy of bagging 0.9595959595959596


# Results of scikit-learn BaggingClassifier

In [None]:
# Bagging with scikit-learn's BaggingClassifier and a decision tree base model
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

base_estimator = DecisionTreeClassifier(max_depth=10)
# The base model is passed positionally because its keyword was renamed from
# base_estimator to estimator in scikit-learn 1.2
bag_ens = BaggingClassifier(base_estimator, n_estimators=500,
                            max_samples=100, oob_score=True)
bag_ens.fit(Xtrn, ytrn)
ypred2 = bag_ens.predict(Xtst)

print('OOB score ', bag_ens.oob_score_)
print('Accuracy of BaggingClassifier', accuracy_score(ytst, ypred2))

OOB score  0.9601990049751243
Accuracy of BaggingClassifier 0.9595959595959596
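
The OOB score printed above is an accuracy estimate computed only from training points that a given tree did not see. The sketch below reproduces the idea by hand with hard majority votes; the names `X_demo`, `votes`, and the sizes are assumptions for illustration, and scikit-learn's own computation may differ in detail (for example, by averaging predicted probabilities instead of counting votes).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_demo, y_demo = make_moons(n_samples=200, noise=0.25, random_state=0)
n_examples, n_estimators, max_samples = len(y_demo), 100, 100

# votes[i, c] counts trees that did NOT train on example i and predicted class c.
votes = np.zeros((n_examples, 2), dtype=int)

for _ in range(n_estimators):
    # Bootstrap sample for this tree, and the indices it never saw.
    bag = rng.choice(n_examples, size=max_samples, replace=True)
    oob = np.setdiff1d(np.arange(n_examples), bag)

    tree = DecisionTreeClassifier(max_depth=10, random_state=0)
    tree.fit(X_demo[bag], y_demo[bag])

    # Each tree votes only on its own out-of-bag examples.
    votes[oob, tree.predict(X_demo[oob])] += 1

# Score only the examples that received at least one out-of-bag vote.
has_vote = votes.sum(axis=1) > 0
oob_pred = votes.argmax(axis=1)
oob_accuracy = np.mean(oob_pred[has_vote] == y_demo[has_vote])
print("Hand-computed OOB accuracy", oob_accuracy)
```

Because every prediction comes from trees that never trained on that example, the OOB estimate behaves like a built-in validation score and needs no separate held-out set.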


# Lab Assignment

Apply KNN and compare its results with the bagging results.

# Code

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Note: this is a new 80/20 split, not the split used for bagging above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# KNN with 10 neighbours and Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=10, metric='euclidean')
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

# Results of KNN

# Confusion Matrix

In [None]:
print('KNN Confusion Matrix')
print(confusion_matrix(y_test, y_pred))

KNN Confusion Matrix
[[22  1]
 [ 1 36]]


# Classification Report

In [None]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      0.96      0.96        23
           1       0.97      0.97      0.97        37

    accuracy                           0.97        60
   macro avg       0.96      0.96      0.96        60
weighted avg       0.97      0.97      0.97        60



# Conclusion

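On these runs, KNN reached an accuracy of about 0.97 (58 of 60 test points correct), slightly above the 0.96 obtained with bagging (95 of 99). The two models were evaluated on different train/test splits, however, so a gap this small does not show that either algorithm is better.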
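
One way to put the two models on equal footing is to score both on the same cross-validation folds. The following is a minimal sketch under assumed settings (5 stratified folds, 100 trees, and the names `X_all`, `models`, `cv_scores`), not part of the original lab.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = make_moons(n_samples=300, noise=0.25, random_state=0)

# Fixed folds so both models see exactly the same train/test partitions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(max_depth=10),
        n_estimators=100, max_samples=100, random_state=0,
    ),
    # The pipeline refits the scaler inside each fold, avoiding leakage.
    "knn": make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=10, metric="euclidean"),
    ),
}

cv_scores = {
    name: cross_val_score(model, X_all, y_all, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in cv_scores.items():
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

Comparing the mean accuracies together with their spread across folds gives a more reliable picture than a single split of 60 or 99 test points.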