# Exercises

8. Load the MNIST dataset (introduced in Chapter 3), and split it into a training set, a validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation, and 10,000 for testing). Then train various classifiers, such as a random forest classifier, an extra-trees classifier, and an SVM classifier. Next, try to combine them into an ensemble that outperforms each individual classifier on the validation set, using soft or hard voting. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?


9. Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image’s class. Train a classifier on this new training set. Congratulations, you have just trained a blender, and together with the classifiers it forms a stacking ensemble! Now evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble’s predictions. How does it compare to the voting classifier you trained earlier? Now try again using a StackingClassifier instead. Do you get better performance? If so, why?

In [54]:
from sklearn.datasets import fetch_openml
from sklearn.ensemble import ExtraTreesClassifier,VotingClassifier,RandomForestClassifier
from sklearn.svm import SVC
import warnings
warnings.filterwarnings('ignore')

In [55]:
mnist = fetch_openml('mnist_784',as_frame=False)

In [30]:
X = mnist.data
y = mnist.target
# 50,000 training, 10,000 validation, and 10,000 test instances
X_train, y_train = X[:50000], y[:50000]
X_valid, y_valid = X[50000:60000], y[50000:60000]
X_test, y_test = X[60000:], y[60000:]

In [31]:
X_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [32]:
# Hard voting is the default; probability=True on the SVC allows soft voting later
vote_clf = VotingClassifier(
    estimators=[
        ('rnd', RandomForestClassifier(random_state=42, n_jobs=-1)),
        ('extra_tree', ExtraTreesClassifier(random_state=42, n_jobs=-1)),
        ('svc', SVC(probability=True, random_state=42)),
    ],
    n_jobs=-1,
)

In [33]:
vote_clf.fit(X_train,y_train)

In [34]:
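from sklearn.preprocessing import LabelEncoder

# VotingClassifier fits its sub-estimators on integer-encoded labels, so the
# string targets ('0'..'9') must be encoded before scoring each one directly.
# All ten classes occur in y_valid, so this encoder yields the same mapping.
encoder = LabelEncoder()
y_valid_encoded = encoder.fit_transform(y_valid)

for name, clf in vote_clf.named_estimators_.items():
    print(name, '=', clf.score(X_valid, y_valid_encoded))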

rnd = 0.9736
extra_tree = 0.9743
svc = 0.9802


In [35]:
vote_clf.score(X_valid,y_valid)

0.9778

In [36]:
# Switch to soft voting (average the predicted class probabilities) and re-evaluate
vote_clf.voting = 'soft'
vote_clf.fit(X_train, y_train)
vote_clf.score(X_valid, y_valid)

0.9813
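
The gap between the hard-voting and soft-voting scores comes from how the votes are combined. The following is a minimal sketch of the two rules on made-up probabilities (the numbers are hypothetical, not MNIST outputs, and this is not scikit-learn's internal code):

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for two
# instances and three classes (illustrative numbers only).
# Shape: (n_classifiers, n_instances, n_classes)
probas = np.array([
    [[0.50, 0.45, 0.05], [0.2, 0.7, 0.1]],  # classifier A
    [[0.50, 0.45, 0.05], [0.3, 0.6, 0.1]],  # classifier B
    [[0.05, 0.90, 0.05], [0.1, 0.8, 0.1]],  # classifier C
])

# Hard voting: each classifier casts one vote (its argmax); the majority wins.
votes = probas.argmax(axis=2)  # shape (n_classifiers, n_instances)
hard = np.array([np.bincount(v, minlength=3).argmax() for v in votes.T])

# Soft voting: average the probabilities, then take the argmax.
soft = probas.mean(axis=0).argmax(axis=1)

print(hard)  # [0 1]
print(soft)  # [1 1]
```

For the first instance, two lukewarm votes for class 0 win under hard voting, while one highly confident vote for class 1 wins under soft voting.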

# Exercise 9

In [49]:
from sklearn.neural_network import MLPClassifier
import numpy as np
from sklearn.metrics import accuracy_score

In [39]:
random_clf = RandomForestClassifier(random_state=42,n_jobs=-1)
extra_tree_clf = ExtraTreesClassifier(random_state=42,n_jobs=-1)
mlp_clf = MLPClassifier(random_state=42)

In [40]:
classifiers = [random_clf,extra_tree_clf,mlp_clf]
for classifier in classifiers:
    classifier.fit(X_train,y_train)

In [41]:
predictions = [classifier.predict(X_valid) for classifier in classifiers]

In [44]:
preds = np.vstack(predictions).T
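
To make the shape of the blender's training set concrete, here is a small sketch of the same stacking step on toy data. The arrays and the `toy_` names are hypothetical and stand in for the real validation-set predictions:

```python
import numpy as np

# Hypothetical label predictions from three classifiers on four images.
toy_predictions = [
    np.array(['7', '2', '1', '0']),  # classifier 1
    np.array(['7', '2', '1', '6']),  # classifier 2
    np.array(['7', '3', '1', '0']),  # classifier 3
]

# vstack gives one row per classifier; transposing gives one row per image
# and one column per classifier, which is the layout the blender expects.
toy_preds = np.vstack(toy_predictions).T

print(toy_preds.shape)  # (4, 3)
print(toy_preds[1])     # ['2' '2' '3']
```

Each row is the feature vector for one image: the labels predicted by the three base classifiers.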

In [45]:
# Blender: a random forest trained on the base classifiers' validation-set predictions
random1 = RandomForestClassifier(random_state=42, n_jobs=-1)
random1.fit(preds, y_valid)

In [46]:
test_predictions = [classifier.predict(X_test) for classifier in classifiers]
test_preds = np.vstack(test_predictions).T

In [47]:
random1.predict(test_preds)

array(['7', '2', '1', ..., '4', '5', '6'], dtype=object)

In [50]:
accuracy_score(y_test,random1.predict(test_preds))

0.97

In [51]:
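# StackingClassifier builds the blender's training set with cross-validation,
# so no separate hold-out set is needed: use the first 60,000 images
X_train1, y_train1 = X[:60000], y[:60000]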

In [52]:
from sklearn.ensemble import StackingClassifier

stacking_clf = StackingClassifier(
    estimators=[
        ('rnd', RandomForestClassifier(random_state=42, n_jobs=-1)),
        ('extra_tree', ExtraTreesClassifier(random_state=42, n_jobs=-1)),
        ('mlp', MLPClassifier(random_state=42)),
    ],
    final_estimator=RandomForestClassifier(random_state=43),
    cv=5,
)

stacking_clf.fit(X_train1, y_train1)


In [53]:
stacking_clf.score(X_test,y_test)

0.9779

The StackingClassifier reaches 0.9779 on the test set, compared with 0.97 for the hand-built blender. A likely reason is that it uses 5-fold cross-validation to build the blender's training set, so the base classifiers are finally trained on all 60,000 images and the blender learns from 60,000 out-of-fold predictions rather than 10,000. By default it also passes class probabilities to the blender when the base classifiers provide them, instead of hard labels.