# Bagging Logistic Regression

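We demonstrate how to use bagging (**b**ootstrap **agg**regat**ing**; Breiman, 1996) to reduce the test misclassification rate of logistic regression on the [Wisconsin breast cancer dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29). A single penalized model is fitted first and then compared with a bagged ensemble of the same model.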

In [1]:
import numpy as np
from sklearn.datasets import load_breast_cancer

from stattools.ensemble import BaggingClassifier
from stattools.glm import LogisticRegression
from stattools.optimization import NewtonRaphson
from stattools.preprocessing import Standardizer

In [2]:
# Set NumPy random number generator seed for replicability
np.random.seed(10)

## Loading the breast cancer dataset

In [3]:
data = load_breast_cancer()
x = data.data
# Map the integer targets to their string labels ("malignant"/"benign")
y = data.target_names[data.target]

# Shuffle data and split it into training/testing samples
idx = np.random.permutation(y.size)
x_train, x_test = np.array_split(x[idx], 2)
y_train, y_test = np.array_split(y[idx], 2)

# Standardize training data to have mean 0 and variance 1
std = Standardizer()
std.fit(x_train)
x_train = std.transform(x_train)

# Standardize testing data using training standardization
x_test = std.transform(x_test)
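
Fitting the standardizer on the training data only means the test set is scaled with the training means and standard deviations, so no test information enters the preprocessing. The following is a minimal NumPy sketch of that idea, assuming `Standardizer` performs column-wise centering and scaling. The helpers `fit_standardizer` and `apply_standardizer` and the toy arrays are hypothetical and are not part of `stattools`.

```python
import numpy as np

def fit_standardizer(x):
    """Return the column means and standard deviations of x."""
    return x.mean(axis=0), x.std(axis=0)

def apply_standardizer(x, mean, scale):
    """Center and scale x using previously computed statistics."""
    return (x - mean) / scale

# Toy data standing in for the training/testing halves
rng = np.random.default_rng(0)
toy_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
toy_test = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

# Statistics come from the training half only ...
mean, scale = fit_standardizer(toy_train)
toy_train_std = apply_standardizer(toy_train, mean, scale)
# ... and are reused unchanged for the testing half
toy_test_std = apply_standardizer(toy_test, mean, scale)
```

The standardized training columns have mean 0 and variance 1 by construction, while the testing columns are only approximately standardized.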

## Ordinary logistic regression model with an $L^2$ penalty

In [4]:
%%time
# The data were standardized above, so the model's own standardization is turned off
model = LogisticRegression(reg="l2", penalty=0.01, standardize=False)
newton = NewtonRaphson(iterations=100)
model.fit(x=x_train, y=y_train, optimizer=newton)

CPU times: user 90.2 ms, sys: 7.48 ms, total: 97.7 ms
Wall time: 71.3 ms
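
For readers unfamiliar with the optimizer, here is a rough NumPy sketch of Newton-Raphson iterations for an $L^2$-penalized logistic regression. It is not the `stattools` implementation. The 0/1 label coding, the unpenalized intercept, the averaging of the log-likelihood and the function name `fit_logistic_newton` are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic_newton(x, y, penalty=0.01, iterations=100, tol=1e-10):
    """Newton-Raphson for L2-penalized logistic regression with y in {0, 1}."""
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])  # prepend an intercept column
    # Penalize the slopes but not the intercept (an assumption of this sketch)
    ridge = penalty * np.eye(p + 1)
    ridge[0, 0] = 0.0
    beta = np.zeros(p + 1)
    for _ in range(iterations):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        # Gradient and Hessian of the penalized average negative log-likelihood
        grad = design.T @ (prob - y) / n + ridge @ beta
        hess = (design.T * (prob * (1.0 - prob))) @ design / n + ridge
        step = np.linalg.solve(hess, grad)
        beta -= step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy problem: the class depends on x0 - x1 plus noise
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(200, 2))
toy_y = (toy_x[:, 0] - toy_x[:, 1] + rng.normal(size=200) > 0).astype(float)
beta = fit_logistic_newton(toy_x, toy_y)
```

Each iteration solves a small linear system, which is why a single fit on 30 standardized features finishes in milliseconds.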


In [5]:
mcr_train = model.mcr(x_train, y_train)
mcr_test = model.mcr(x_test, y_test)
print(f"Training misclassification rate: {mcr_train:.4f}")
print(f"Testing misclassification rate:  {mcr_test:.4f}")

Training misclassification rate: 0.0035
Testing misclassification rate:  0.0423
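
The misclassification rate is the fraction of samples whose predicted label differs from the true label. A minimal sketch of that computation follows, assuming `model.mcr` does the equivalent internally. The function `misclassification_rate` and the toy labels are hypothetical.

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

toy_true = np.array(["malignant", "benign", "benign", "malignant"])
toy_pred = np.array(["malignant", "benign", "malignant", "malignant"])
toy_mcr = misclassification_rate(toy_true, toy_pred)  # one error in four samples
```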


## Bagging the previous logistic regression model

In [6]:
%%time
# Bag logistic regression models configured like the single model fitted earlier
model = BaggingClassifier(base=LogisticRegression, reg="l2", penalty=0.01, standardize=False)
newton = NewtonRaphson(iterations=100)
model.fit(x=x_train, y=y_train, optimizer=newton)

CPU times: user 10.2 s, sys: 628 ms, total: 10.8 s
Wall time: 8.54 s
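
Bagging (Breiman, 1996) fits one base model per bootstrap resample of the training data and combines their predictions by plurality vote. The sketch below shows that mechanism in plain NumPy. It does not reproduce `stattools.ensemble.BaggingClassifier`, whose internals and default number of base models are not shown in this notebook. The toy `NearestCentroid` base learner and the helpers `bagging_fit` and `bagging_predict` are hypothetical stand-ins.

```python
import numpy as np

class NearestCentroid:
    """Toy base classifier: predict the class with the closest mean."""

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x):
        dist = np.linalg.norm(x[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dist.argmin(axis=1)]

def bagging_fit(x, y, n_models=25, seed=0):
    """Fit one base classifier per bootstrap resample of (x, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # n draws with replacement
        models.append(NearestCentroid().fit(x[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Aggregate the base predictions by plurality vote."""
    votes = np.array([m.predict(x) for m in models])  # shape (n_models, n_samples)
    labels = np.unique(votes)
    counts = np.array([(votes == c).sum(axis=0) for c in labels])
    return labels[counts.argmax(axis=0)]

# Two well-separated toy classes
rng = np.random.default_rng(1)
toy_x = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
                   rng.normal(2.0, 1.0, size=(50, 2))])
toy_y = np.array(["benign"] * 50 + ["malignant"] * 50)

models = bagging_fit(toy_x, toy_y)
toy_pred = bagging_predict(models, toy_x)
```

Because every base model is fitted from scratch, training cost grows linearly with the number of models, which is in line with the much longer fitting time reported for the bagged model.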


In [7]:
mcr_train = model.mcr(x_train, y_train)
mcr_test = model.mcr(x_test, y_test)
print(f"Training misclassification rate: {mcr_train:.4f}")
print(f"Testing misclassification rate:  {mcr_test:.4f}")

Training misclassification rate: 0.0035
Testing misclassification rate:  0.0387
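
To put the two testing rates in perspective, it helps to convert them into error counts. The dataset has 569 samples, and `np.array_split` into two parts yields 285 training and 284 testing samples. The small sketch below, with illustrative variable names, recovers the counts implied by the rounded rates printed in this notebook.

```python
n_total = 569                  # samples in the breast cancer dataset
n_train = (n_total + 1) // 2   # np.array_split gives the extra sample to the first part
n_test = n_total - n_train

# Rounded rates reported above, converted back to whole error counts
single_test_errors = round(0.0423 * n_test)
bagged_test_errors = round(0.0387 * n_test)
train_errors = round(0.0035 * n_train)

print(n_train, n_test)                         # 285 284
print(single_test_errors, bagged_test_errors)  # 12 11
print(train_errors)                            # 1
```

On this particular split, bagging therefore corrects one additional test sample. The gain is small and may not carry over to a different seed or split.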


## References

Leo Breiman. "Bagging Predictors".
Machine Learning, 1996, Vol. 24, No. 2: pp. 123–140. ([DOI](https://doi.org/10.1007/BF00058655))