### 1. If you have trained five different models on the exact same training data and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?

Yes, there is a chance.  We can combine them into a voting ensemble, which often performs better than any single model.  This works best when the models are very different from one another, such as an SVM classifier, a decision tree classifier, and a logistic regression classifier, because their errors are then less correlated.  Training them on different training instances helps further, but it is not required as long as the models differ enough.
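As a rough illustration of this idea, the sketch below builds a hard voting ensemble from three different model families on a small synthetic dataset.  The dataset (`make_moons`), the variable names, and the hyperparameters are assumptions chosen for the example and are not part of the exercise.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration, not the exercise's data).
X_toy, y_toy = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, random_state=42)

# Three different model families, so their errors are less correlated.
toy_voting_clf = VotingClassifier(estimators=[
    ("lr", LogisticRegression(random_state=42)),
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("svc", SVC(random_state=42)),
])
toy_voting_clf.fit(X_toy_train, y_toy_train)

# Compare each fitted member with the ensemble on held-out data.
toy_scores = {name: clf.score(X_toy_test, y_toy_test)
              for name, clf in toy_voting_clf.named_estimators_.items()}
toy_scores["voting"] = toy_voting_clf.score(X_toy_test, y_toy_test)
for name, score in toy_scores.items():
    print(f"{name}: {score:.3f}")
```

The ensemble is not guaranteed to beat every member; whether it does depends on how different the members' mistakes are.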

### 2. What is the difference between a hard and soft voting classifier?

A hard voting classifier counts the votes of each classifier in the ensemble and picks the class that gets the most votes.  A soft voting classifier computes the average estimated class probability for each class and picks the class with the highest probability.  This gives high-confidence votes more weight and generally performs better, but only when we can estimate class probabilities from every classifier.  
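The difference can be seen on a single instance.  In the sketch below, the probabilities are made-up numbers for three hypothetical classifiers: two weakly prefer class 0 and one strongly prefers class 1, so hard and soft voting disagree.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one instance
# (rows: classifiers, columns: classes 0 and 1).
probas = np.array([
    [0.55, 0.45],   # weakly prefers class 0
    [0.60, 0.40],   # weakly prefers class 0
    [0.05, 0.95],   # strongly prefers class 1
])

# Hard voting: each classifier casts one vote for its top class.
votes = probas.argmax(axis=1)
hard_winner = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then take the top class.
mean_proba = probas.mean(axis=0)
soft_winner = mean_proba.argmax()

print("hard:", hard_winner, "soft:", soft_winner, "mean proba:", mean_proba)
```

Hard voting picks class 0 (two votes to one), while soft voting picks class 1 because the averaged probabilities are 0.4 and 0.6.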

### 3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, random forests, or stacking ensembles?

Yes.  Training a bagging ensemble can be distributed across multiple servers, since each predictor in the ensemble is independent of the others.  The same goes for pasting ensembles and random forests.  In boosting, however, each predictor is built on the previous one, so training must be sequential.  In stacking ensembles, all predictors in a given layer are independent of each other, but the predictors in one layer can only be trained after the predictors in the previous layer have all been trained.
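Scikit-learn reflects this distinction in its API.  The sketch below is a single-machine analogue only: `n_jobs` spreads the work across CPU cores, and distributing across servers would need extra tooling that is not shown here.  The toy dataset and variable names are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration).
X_par, y_par = make_moons(n_samples=500, noise=0.3, random_state=42)

# Bagging predictors are independent, so they can be trained in parallel.
parallel_bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50, n_jobs=2, random_state=42)
parallel_bag_clf.fit(X_par, y_par)

# Bagging exposes an n_jobs parameter; AdaBoost does not, because each
# boosting round depends on the result of the previous one.
bagging_has_n_jobs = "n_jobs" in BaggingClassifier().get_params()
boosting_has_n_jobs = "n_jobs" in AdaBoostClassifier().get_params()
print(bagging_has_n_jobs, boosting_has_n_jobs)
```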

### 4. What is the benefit of out-of-bag evaluation?

In out-of-bag evaluation, each predictor in a bagging ensemble is evaluated on the instances it was not trained on (the ones left out of its bootstrap sample).  This gives an approximately unbiased evaluation of the ensemble without needing a separate validation set.  As a result, more instances are available for training, so the ensemble can perform slightly better.
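A minimal sketch of out-of-bag evaluation with scikit-learn follows, assuming a toy dataset and illustrative variable names.  Setting `oob_score=True` makes the ensemble compute the out-of-bag accuracy during training, which can then be compared with the accuracy on a held-out set.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration).
X_oob, y_oob = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_oob_train, X_oob_test, y_oob_train, y_oob_test = train_test_split(
    X_oob, y_oob, random_state=42)

# bootstrap=True samples with replacement, so each tree leaves some
# training instances out; oob_score=True evaluates on those instances.
oob_bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=True, oob_score=True, random_state=42)
oob_bag_clf.fit(X_oob_train, y_oob_train)

oob_accuracy = oob_bag_clf.oob_score_
held_out_accuracy = oob_bag_clf.score(X_oob_test, y_oob_test)
print(f"OOB: {oob_accuracy:.3f}  held-out: {held_out_accuracy:.3f}")
```

The two numbers are usually close, which is why the out-of-bag score can stand in for a validation score.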

### 5. What makes extra-trees more random than regular random forest? How can the extra randomness help? Are extra-trees slower or faster than regular random forests?

When we grow a tree in a random forest, only a random subset of the features is considered for splitting at each node.  The same is true for extra-trees, but they go one step further: they use random thresholds for each feature instead of searching for the best possible threshold.  The extra randomness acts as a form of regularization: if a random forest overfits the training data, extra-trees might perform better.  Since they do not search for the best possible thresholds, they tend to be much faster to train.  When making predictions, however, they are neither faster nor slower than random forests.
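The comparison can be tried directly, as in the sketch below.  The synthetic dataset, its size, and the variable names are assumptions; on data this small the timing gap may be modest or noisy, so the printed times are only indicative.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X_cmp, y_cmp = make_classification(
    n_samples=2000, n_features=20, n_informative=10, random_state=42)
X_cmp_train, X_cmp_test, y_cmp_train, y_cmp_test = train_test_split(
    X_cmp, y_cmp, random_state=42)

comparison = {}
for name, model in [
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("extra_trees", ExtraTreesClassifier(n_estimators=100, random_state=42)),
]:
    # Time only the training step, then score on held-out data.
    start = time.perf_counter()
    model.fit(X_cmp_train, y_cmp_train)
    fit_seconds = time.perf_counter() - start
    comparison[name] = {
        "fit_seconds": fit_seconds,
        "test_accuracy": model.score(X_cmp_test, y_cmp_test),
    }
    print(name, comparison[name])
```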

### 6. If your AdaBoost ensemble underfits the training data, which hyperparameters should you tweak and how?

If an AdaBoost ensemble underfits the training data, we can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator.  We could also try slightly increasing the learning rate.
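The sketch below shows which knobs these are in scikit-learn's `AdaBoostClassifier`.  The toy dataset and the specific values are assumptions picked to make the contrast visible, not recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration).
X_ada, y_ada = make_moons(n_samples=500, noise=0.3, random_state=42)

# Deliberately weak: few decision stumps and a small learning rate.
underfit_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5, learning_rate=0.1, random_state=42)
underfit_clf.fit(X_ada, y_ada)

# More estimators, a less constrained base estimator (deeper trees),
# and a higher learning rate.
stronger_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200, learning_rate=1.0, random_state=42)
stronger_clf.fit(X_ada, y_ada)

# Underfitting shows up as low accuracy on the training set itself.
underfit_train_acc = underfit_clf.score(X_ada, y_ada)
stronger_train_acc = stronger_clf.score(X_ada, y_ada)
print(f"weak: {underfit_train_acc:.3f}  stronger: {stronger_train_acc:.3f}")
```

Pushing these settings too far trades underfitting for overfitting, so the changes are best checked against a validation set.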

### 7. If a gradient boosting ensemble overfits the training data, should we increase or decrease the learning rate? 

We should try decreasing the learning rate.  We can also use early stopping to find the right number of predictors, since an overfitting ensemble probably has too many.
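Both remedies can be combined in scikit-learn's `GradientBoostingClassifier`, as sketched below.  The toy dataset and the chosen values are assumptions.  Setting `n_iter_no_change` holds out a fraction of the training data and stops adding trees once the validation score stops improving.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier

# Toy data (an assumption for illustration).
X_gb, y_gb = make_moons(n_samples=1000, noise=0.3, random_state=42)

gb_clf = GradientBoostingClassifier(
    learning_rate=0.05,       # lower learning rate: each tree contributes less
    n_estimators=500,         # upper bound on the number of trees
    n_iter_no_change=10,      # early stopping: stop after 10 rounds without gain
    validation_fraction=0.1,  # share of the training data held out for the check
    random_state=42,
)
gb_clf.fit(X_gb, y_gb)

# n_estimators_ is the number of trees actually kept after early stopping.
print("trees actually trained:", gb_clf.n_estimators_)
```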

### 8. MNIST dataset

1. Load the MNIST dataset and split it into a training set, validation set, and test set.  

In [2]:
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(np.uint8)

In [6]:
from sklearn.model_selection import train_test_split
X_train_val, X_test, y_train_val, y_test = train_test_split(
    mnist.data, mnist.target, test_size=10000, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=10000, random_state=42)

2. Train various classifiers, such as random forest, extra-trees, and SVM. 

In [4]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.svm import LinearSVC

random_forest_clf = RandomForestClassifier(n_estimators=100, random_state=42)
extra_trees_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
svm_clf = LinearSVC(random_state=42)

In [7]:
estimators = [random_forest_clf, extra_trees_clf, svm_clf]
for estimator in estimators:
    print("Training the", estimator)
    estimator.fit(X_train, y_train)

Training the RandomForestClassifier(random_state=42)
Training the ExtraTreesClassifier(random_state=42)
Training the LinearSVC(random_state=42)




In [8]:
[estimator.score(X_val, y_val) for estimator in estimators]

[0.9692, 0.9715, 0.8695]

So the linear SVM performs much worse than the two tree-based models (about 87% versus 97% validation accuracy).  We could remove it, but we will keep it for now. 

3. Try to combine them into an ensemble that outperforms each individual classifier on the validation set, using soft or hard voting.  Once you have found one, try it on the test set.  How much better does it perform compared to the individual classifiers?

In [9]:
from sklearn.ensemble import VotingClassifier

named_estimators = [
    ("random_forest_clf", random_forest_clf),
    ("extra_trees_clf", extra_trees_clf),
    ("svm_clf", svm_clf)
]

In [10]:
voting_clf = VotingClassifier(named_estimators)
voting_clf.fit(X_train, y_train)



VotingClassifier(estimators=[('random_forest_clf',
                              RandomForestClassifier(random_state=42)),
                             ('extra_trees_clf',
                              ExtraTreesClassifier(random_state=42)),
                             ('svm_clf', LinearSVC(random_state=42))])

In [11]:
voting_clf.score(X_val, y_val)

0.9699

In [12]:
[estimator.score(X_val, y_val) for estimator in voting_clf.estimators_]

[0.9692, 0.9715, 0.8695]

Let's try to remove the SVC and see if this improves our score:

In [13]:
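# Drop the fitted SVM from the list of trained estimators to avoid retraining.
# Note: voting_clf.estimators still lists it, so calling fit() again would bring it back.
del voting_clf.estimators_[2]
voting_clf.score(X_val, y_val)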

0.9713

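This is slightly better (0.9713 versus 0.9699), which suggests the linear SVM was hurting performance, although the ensemble still does not beat the extra-trees classifier on its own (0.9715).  Let's try a soft voting classifier: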

In [14]:
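# Soft voting needs predict_proba(), which LinearSVC lacks; this works only
# because the SVM was removed from estimators_ above.
voting_clf.voting = "soft"
voting_clf.score(X_val, y_val)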

0.9719

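This outperforms the hard voting classifier and, by a small margin, the best individual classifier on the validation set (0.9719 versus 0.9715).  So let's now try the model on the test set: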

In [15]:
voting_clf.score(X_test, y_test)

0.9681

In [16]:
[estimator.score(X_test, y_test) for estimator in voting_clf.estimators_]

[0.9645, 0.9691]

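So on the test set the voting classifier (0.9681) beats the random forest (0.9645) but falls slightly short of the extra-trees classifier (0.9691): it did not improve over our best individual model. 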

### 9. More MNIST

Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image's class. Train a classifier on this new training set.

In [17]:
X_val_predictions = np.empty((len(X_val), len(estimators)), dtype=np.float32)

for index, estimator in enumerate(estimators):
    X_val_predictions[:, index] = estimator.predict(X_val)

In [18]:
rnd_forest_blender = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rnd_forest_blender.fit(X_val_predictions, y_val)

RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)

In [19]:
rnd_forest_blender.oob_score_

0.9698

Congratulations, you have just trained a blender, and together with the classifiers they form a stacking ensemble! Now let's evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble's predictions. How does it compare to the voting classifier you trained earlier?

In [20]:
X_test_predictions = np.empty((len(X_test), len(estimators)), dtype=np.float32)

for index, estimator in enumerate(estimators):
    X_test_predictions[:, index] = estimator.predict(X_test)

In [21]:
y_pred = rnd_forest_blender.predict(X_test_predictions)

In [22]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.9665

So this stacking ensemble (0.9665 test accuracy) does not perform quite as well as the soft voting classifier trained earlier (0.9681). 
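For comparison, scikit-learn also provides a `StackingClassifier` that automates these steps.  The sketch below is not the approach used above: it trains the blender on out-of-fold predictions from cross-validation rather than on a separate validation set, and by default it feeds the blender class probabilities or decision scores instead of hard class predictions.  To keep it fast, it uses the small `load_digits` dataset rather than MNIST; that choice, the hyperparameters, and the variable names are all assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Small 8x8 digits dataset (an assumption, used here instead of MNIST for speed).
X_digits, y_digits = load_digits(return_X_y=True)
X_digits_train, X_digits_test, y_digits_train, y_digits_test = train_test_split(
    X_digits, y_digits, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=42)),
        ("svm", LinearSVC(max_iter=5000, random_state=42)),
    ],
    # The blender, trained on out-of-fold outputs of the estimators above.
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    cv=5,
)
stacking_clf.fit(X_digits_train, y_digits_train)

stacking_accuracy = stacking_clf.score(X_digits_test, y_digits_test)
print(f"stacking test accuracy: {stacking_accuracy:.3f}")
```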