# Exercise 8
# Task

Load the MNIST data (introduced in Chapter 3), and split it into a training set, a
validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation,
and 10,000 for testing). Then train various classifiers, such as a Random
Forest classifier, an Extra-Trees classifier, and an SVM classifier. Next, try to combine
them into an ensemble that outperforms each individual classifier on the
validation set, using soft or hard voting. Once you have found one, try it on the
test set. How much better does it perform compared to the individual classifiers?

# Getting and splitting the data

In [None]:
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1)

In [4]:
mnist.keys()

dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])

In [11]:
x_train, x_val, x_test = mnist["data"][:50000], mnist["data"][50000 : 60000], mnist["data"][60000:]
y_train, y_val, y_test = mnist["target"][:50000], mnist["target"][50000 : 60000], mnist["target"][60000:]
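The three-way split can also be written with `np.split`, which cuts an array along its first axis at the given indices. This makes it explicit that the three parts are contiguous and do not overlap. This is a minimal sketch, assuming the data are NumPy arrays (for example, loaded with `fetch_openml(..., as_frame=False)`). The 70,000-row array below is a small stand-in for the real 70,000 × 784 matrix, and the `*_demo`/`*_part` names are hypothetical so that running the cell does not overwrite the real splits.

```python
import numpy as np

# Stand-in for MNIST: 70,000 rows (one feature each instead of 784) and labels 0-9
X_demo = np.arange(70_000).reshape(-1, 1)
y_demo = np.arange(70_000) % 10

# Split points 50,000 and 60,000 give rows [0, 50000), [50000, 60000) and [60000, 70000)
train_part, val_part, test_part = np.split(X_demo, [50_000, 60_000])
y_train_part, y_val_part, y_test_part = np.split(y_demo, [50_000, 60_000])

print(len(train_part), len(val_part), len(test_part))  # 50000 10000 10000
```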

# Training the classifiers (no hyperparameter tuning)

In [17]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier

In [31]:
svm_clf = SVC(probability=True)
svm_clf.fit(x_train, y_train)

In [32]:
forest_clf = RandomForestClassifier(max_depth=4)
forest_clf.fit(x_train, y_train)

In [33]:
extra_clf = ExtraTreesClassifier(max_depth=3)
extra_clf.fit(x_train, y_train)

In [34]:
from sklearn.metrics import accuracy_score

y_hat = svm_clf.predict(x_val)
accuracy = accuracy_score(y_val, y_hat)
print("SVM accuracy : {}".format(accuracy))

SVM accuracy : 0.9802


In [35]:
y_hat = forest_clf.predict(x_val)
accuracy = accuracy_score(y_val, y_hat)
print("Forest accuracy : {}".format(accuracy))

Forest accuracy : 0.8248


In [36]:
y_hat = extra_clf.predict(x_val)
accuracy = accuracy_score(y_val, y_hat)
print("ExtraForest accuracy : {}".format(accuracy))

ExtraForest accuracy : 0.7473


In [37]:
from sklearn.ensemble import VotingClassifier

svm_clf = SVC(probability=True)  # SVC does not accept an n_jobs parameter
forest_clf = RandomForestClassifier(max_depth=4, n_jobs=-1)
extra_clf = ExtraTreesClassifier(max_depth=3, n_jobs=-1)

hard_voting_clf = VotingClassifier(estimators=[("svm", svm_clf),
                                              ("forest", forest_clf),
                                              ("extra", extra_clf)],
                                  voting="hard")

hard_voting_clf.fit(x_train, y_train)

In [38]:
svm_clf = SVC(probability=True)  # probability=True is required for soft voting; SVC has no n_jobs
forest_clf = RandomForestClassifier(max_depth=4, n_jobs=-1)
extra_clf = ExtraTreesClassifier(max_depth=3, n_jobs=-1)

soft_voting_clf = VotingClassifier(estimators=[("svm", svm_clf),
                                              ("forest", forest_clf),
                                              ("extra", extra_clf)],
                                  voting="soft")

soft_voting_clf.fit(x_train, y_train)

In [39]:
y_hat = hard_voting_clf.predict(x_val)
accuracy = accuracy_score(y_val, y_hat)
print("HardVoting accuracy : {}".format(accuracy))

HardVoting accuracy : 0.847


In [40]:
y_hat = soft_voting_clf.predict(x_val)
accuracy = accuracy_score(y_val, y_hat)
print("SoftVoting accuracy : {}".format(accuracy))

SoftVoting accuracy : 0.9788
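On this validation set neither ensemble beats the SVM alone (0.9802). The two voting rules can be reproduced in a few lines of NumPy, which may help explain why hard voting (0.847) falls so far below soft voting (0.9788). This is a minimal sketch of the rules `VotingClassifier` documents, unweighted: hard voting takes the majority of the predicted labels, and soft voting takes the argmax of the averaged class probabilities. The probabilities below are made up for illustration, and the labels are derived with `argmax` for simplicity, whereas `VotingClassifier` uses each estimator's `predict`. With one strong model and two weak ones, the weak pair can outvote the strong model under hard voting, while under soft voting a confident prediction can still win.

```python
import numpy as np


def hard_vote(labels, n_classes):
    """Majority vote over predicted labels.

    labels: int array of shape (n_classifiers, n_samples).
    Ties go to the smallest class index, because np.argmax returns the first maximum.
    """
    # Count the votes for each class, sample by sample: shape (n_classes, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, labels, minlength=n_classes)
    return counts.argmax(axis=0)


def soft_vote(probas):
    """Unweighted soft vote.

    probas: float array of shape (n_classifiers, n_samples, n_classes).
    """
    # Average the class probabilities across classifiers, then pick the most probable class
    return probas.mean(axis=0).argmax(axis=1)


# Made-up probabilities for one image and three classes
probas = np.array([
    [[0.05, 0.90, 0.05]],  # strong, confident model (SVM-like)
    [[0.30, 0.25, 0.45]],  # weak model
    [[0.30, 0.25, 0.45]],  # second weak model with the same opinion
])
labels = probas.argmax(axis=2)  # each model's predicted class: [[1], [2], [2]]

print(hard_vote(labels, n_classes=3))  # [2]  (two weak votes beat one strong vote)
print(soft_vote(probas))               # [1]  (mean probabilities: about 0.22, 0.47, 0.32)
```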


# Exercise 9
# Task

Run the individual classifiers from the previous exercise to make predictions on
the validation set, and create a new training set with the resulting predictions:
each training instance is a vector containing the set of predictions from all your
classifiers for an image, and the target is the image’s class. Train a classifier on
this new training set. Congratulations, you have just trained a blender, and
together with the classifiers it forms a stacking ensemble! Now evaluate the
ensemble on the test set. For each image in the test set, make predictions with all
your classifiers, then feed the predictions to the blender to get the ensemble’s predictions.
How does it compare to the voting classifier you trained earlier?

In [41]:
svm_clf = SVC(probability=True)  # SVC does not accept an n_jobs parameter
forest_clf = RandomForestClassifier(max_depth=4, n_jobs=-1)
extra_clf = ExtraTreesClassifier(max_depth=3, n_jobs=-1)

In [43]:
svm_clf.fit(x_train, y_train)
y_hat_svm = svm_clf.predict(x_val)

In [44]:
forest_clf.fit(x_train, y_train)
y_hat_forest = forest_clf.predict(x_val)

In [45]:
extra_clf.fit(x_train, y_train)
y_hat_extra = extra_clf.predict(x_val)

In [47]:
import numpy as np

x_train_new = np.c_[y_hat_svm, y_hat_forest, y_hat_extra]

In [61]:
from sklearn.ensemble import RandomForestClassifier

# The blender is trained on the base classifiers' predictions for the validation set
final_forest_clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
final_forest_clf.fit(x_train_new, y_val)

In [62]:
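# Caution: this scores the blender on its own training data (the validation-set
# predictions), so the result is optimistic and is not a test-set score
y_hat_final = final_forest_clf.predict(x_train_new)
accuracy_score(y_val, y_hat_final)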

0.9823
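The cell above scores the blender on the same validation predictions it was trained on, so comparing the stacking ensemble with the voting classifiers still requires a test-set score. At prediction time, each test image goes through every base classifier first, and the blender then sees one column per classifier, the same layout as `x_train_new`. The helper `stacked_predict` below is a minimal sketch under that assumption. `Constant` and `MajorityBlender` are hypothetical toy stand-ins used only to make the example runnable without the trained models.

```python
import numpy as np


def stacked_predict(base_models, blender, X):
    """Predict with a stacking ensemble.

    base_models: fitted estimators exposing .predict(X) (e.g. svm_clf, forest_clf, extra_clf).
    blender: estimator fitted on column-stacked base predictions for a hold-out set.
    """
    # One column per base model, one row per instance (same layout as np.c_ in the notebook)
    features = np.column_stack([model.predict(X) for model in base_models])
    return blender.predict(features)


class Constant:
    """Hypothetical toy classifier that always predicts the same label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return np.full(len(X), self.label)


class MajorityBlender:
    """Hypothetical toy blender: returns the most common label in each feature row."""

    def predict(self, features):
        return np.array([np.bincount(row).argmax() for row in features])


X_demo = np.zeros((4, 784))
models = [Constant(7), Constant(3), Constant(7)]
print(stacked_predict(models, MajorityBlender(), X_demo))  # [7 7 7 7]

# With the notebook's fitted objects (not executed here):
# y_hat_test = stacked_predict([svm_clf, forest_clf, extra_clf], final_forest_clf, x_test)
# print(accuracy_score(y_test, y_hat_test))
```

The same test-set accuracy, computed for the voting classifiers, gives the comparison the task asks for.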