In [1]:
from sklearn.tree import DecisionTreeClassifier
from mlxtend.classifier import StackingClassifier
import gzip  # imported explicitly instead of relying on the star import from mnist
import time
from mnist import *
import numpy as np
import warnings

warnings.simplefilter("ignore")

training_set_path = "D:\\Projects\\ml-experiments\\datasets\\mnist\\train-images-idx3-ubyte.gz"
train_labels_path = "D:\\Projects\\ml-experiments\\datasets\\mnist\\train-labels-idx1-ubyte.gz"

f_train = gzip.open(training_set_path)
f_train_labels = gzip.open(train_labels_path)

training_set = parse_idx(f_train)
training_labels = parse_idx(f_train_labels)

training_set_tr = training_set.reshape((60000, 784))

**A utility function reused throughout the experiment**

In [2]:
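def get_random_digit(training_set, labels, digit):
    # Indexes of every instance labelled with the requested digit
    indexes = np.where(labels == digit)[0]
    # np.random.choice can return any of them; randint(0, len(indexes) - 1)
    # excluded the last index because its upper bound is exclusive
    return training_set[np.random.choice(indexes)]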

**Stacking is based on a simple idea: instead of using a trivial function (such as hard voting for classification or averaging for regression) to aggregate the predictions of all predictors in an ensemble, why not train a model to perform this aggregation?**

**The simplest stacking model has one layer of predictors and a single aggregating predictor on top, called a blender. First, the training set is split into two subsets. The first subset is used to train the predictors in the first layer. Next, the first-layer predictors make predictions on the second subset; these predictions are "clean" because the predictors never saw those instances during training. This yields n * m predictions, where n is the number of predictors in the first layer and m is the size of the second subset. Finally, these predictions and the m labels of the second subset are used to train the blender (m instances of n features each). After training, a new instance is fed to the first layer, whose n predictions form a new n-feature instance that is passed to the blender to make the final prediction. It is also possible to stack several layers of blenders, ending in a single final blender.**
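
**The hold-out procedure described above can be sketched by hand. This is only an illustration under stated assumptions: it uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in for MNIST, a 50/50 split, and trees of different depths so that the first-layer predictors differ. The helper names `first_layer_features` and `stacked_predict` are hypothetical.**

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled 8x8 digits dataset stands in for MNIST to keep this fast.
X, y = load_digits(return_X_y=True)

# Split the training data in two: one subset for the first layer, one for the blender.
X_layer, X_blend, y_layer, y_blend = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# First layer: n = 3 predictors trained on the first subset only.
# The depths are an arbitrary choice that makes the predictors differ.
predictors = [
    DecisionTreeClassifier(max_depth=depth, random_state=0) for depth in (4, 8, 12)
]
for clf in predictors:
    clf.fit(X_layer, y_layer)


def first_layer_features(instances):
    """Return an (m, n) matrix holding each predictor's label for each instance."""
    return np.column_stack([clf.predict(instances) for clf in predictors])


# "Clean" predictions: the predictors never saw the second subset during training.
blend_features = first_layer_features(X_blend)

# The blender learns to map the n first-layer predictions to the true label.
blender = DecisionTreeClassifier(random_state=0)
blender.fit(blend_features, y_blend)


def stacked_predict(instances):
    """Feed instances through the first layer, then through the blender."""
    return blender.predict(first_layer_features(instances))


prediction = stacked_predict(X[:1])
print(f"Hand-rolled stacking prediction is: {prediction}")
```

**Each row of `blend_features` is one instance of the second subset and each column is one first-layer predictor, which matches the "m instances of n features each" used to train the blender.**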

**scikit-learn versions before 0.22 do not support stacking out of the box (newer releases provide sklearn.ensemble.StackingClassifier), but it is easy to write a custom implementation or to use an existing extension such as the one in the mlxtend package.**
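
**For comparison, scikit-learn 0.22 and later ship their own `sklearn.ensemble.StackingClassifier`. The sketch below is not part of the original experiment: it assumes such a release is installed, uses the bundled 8x8 digits dataset instead of MNIST, and picks tree depths and `cv=3` arbitrarily. Unlike a single hold-out split, this class trains the final estimator on cross-validated predictions of the first layer.**

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled 8x8 digits dataset stands in for MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First-layer predictors are passed as (name, estimator) pairs.
estimators = [
    (f"tree_depth_{depth}", DecisionTreeClassifier(max_depth=depth, random_state=0))
    for depth in (4, 8, 12)
]

# final_estimator plays the role of the blender; cv controls the internal
# cross-validation used to produce its training features.
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=DecisionTreeClassifier(random_state=0),
    cv=3,
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test[:5])
print(f"scikit-learn StackingClassifier predictions: {predictions}")
```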

In [3]:
classifiers = []

for i in range(0, 3):
    classifiers.append(DecisionTreeClassifier())

blender = DecisionTreeClassifier()  # The predictors and the blender can be different types of classifiers

stacking_clf = StackingClassifier(classifiers=classifiers, meta_classifier=blender)
stacking_clf.fit(training_set_tr, training_labels)

a_six = get_random_digit(training_set_tr, training_labels, 6)
print(f"StackingClassifier prediction is: {stacking_clf.predict([a_six])}")

StackingClassifier prediction is: [6]
