1. If you have trained five different models on the exact same training data, and
they all achieve 95% precision, is there any chance that you can combine these
models to get better results? If so, how? If not, why?

Yes. You can try combining them into a voting ensemble, which will often give you even
better results. It works better if the models are very different classifiers/regressors, because they then tend to make different kinds of errors.

2. What is the difference between hard and soft voting classifiers?

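A hard voting classifier counts the votes of each classifier and picks the class that gets the most votes (the mode). A soft voting classifier averages the estimated class probabilities over all classifiers and picks the class with the
highest average probability. This gives high-confidence votes more weight, but it only works if every classifier can estimate class probabilities (e.g., has a `predict_proba()` method).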
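A small pure-Python sketch of the difference, using made-up probability estimates for a single instance (the numbers and the `argmax` helper are illustrative assumptions, not taken from the notebook). Two lukewarm votes for class 0 win under hard voting, while one confident vote for class 1 wins under soft voting.

```python
from collections import Counter

# Hypothetical predict_proba outputs of three classifiers for one instance
# (columns are classes 0 and 1).
probas = [
    [0.55, 0.45],
    [0.55, 0.45],
    [0.05, 0.95],
]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


# Hard voting: each classifier casts one vote for its most likely class.
hard_votes = [argmax(p) for p in probas]
hard_winner = Counter(hard_votes).most_common(1)[0][0]

# Soft voting: average the probabilities per class, then pick the largest.
n_classes = len(probas[0])
mean_proba = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
soft_winner = argmax(mean_proba)

print(hard_winner, soft_winner)  # 0 1
```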

3. Is it possible to speed up training of a bagging ensemble by distributing it across
multiple servers? What about pasting ensembles, boosting ensembles, Random
Forests, or stacking ensembles?

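Bagging can be distributed across servers, because each predictor is trained independently of the others. The same holds for pasting ensembles and Random Forests. Boosting cannot be distributed this way: each predictor is built to correct its predecessor, so training is sequential. In a stacking ensemble, the predictors within a layer are independent of each other and can be trained in parallel on multiple servers, but a layer can only be trained after all the predictors in the previous layer have been trained and have passed on their predictions.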
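The same independence is what lets scikit-learn train the trees of a Random Forest in parallel on one machine through the `n_jobs` parameter. The sketch below uses that as a single-machine stand-in for multiple servers; the toy dataset and the choice of two workers are assumptions made only for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset, used here only to keep the example fast.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# n_jobs=2 requests two parallel workers; each tree is fit independently,
# so the work splits cleanly across them.
forest = RandomForestClassifier(n_estimators=20, n_jobs=2, random_state=42)
forest.fit(X, y)

n_trained = len(forest.estimators_)
print(n_trained)  # 20
```

Boosting ensembles such as `AdaBoostClassifier` have no comparable option for training their predictors in parallel, which matches the sequential argument above.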

4. What is the benefit of out-of-bag evaluation?

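With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated on the instances it was not trained on (the ones left out of its bootstrap sample). This gives a fairly unbiased estimate of performance on unseen data without setting aside a separate validation set, so more instances remain available for training.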
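A minimal sketch of out-of-bag evaluation with scikit-learn's `BaggingClassifier`, assuming a small synthetic dataset (`make_moons`) rather than anything from this notebook. Setting `oob_score=True` makes the fitted ensemble expose `oob_score_`, which can be compared with accuracy on a held-out test set.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data, chosen only to keep the example quick.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True samples with replacement, so each tree leaves some
# training instances out; oob_score=True evaluates on those.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    oob_score=True,
    random_state=42,
)
bag_clf.fit(X_train, y_train)

oob_accuracy = bag_clf.oob_score_
test_accuracy = bag_clf.score(X_test, y_test)
print(oob_accuracy, test_accuracy)
```

The two numbers are typically close, which is the point: the out-of-bag estimate comes for free from the training set.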

5. What makes Extra-Trees more random than regular Random Forests? How can
this extra randomness help? Are Extra-Trees slower or faster than regular Random
Forests?

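Extra-Trees have more randomness because they use random thresholds at every node of the estimators (decision trees) instead of searching for the best threshold for each split. This extra randomness acts as a form of regularization: it trades more bias for lower variance, so if a Random Forest overfits the training data, Extra-Trees may generalize better. Extra-Trees are faster to train, since no time is invested in finding the best threshold, but they are neither faster nor slower than Random Forests when making predictions.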

6. If your AdaBoost ensemble underfits the training data, which hyperparameters
should you tweak and how?

Increase the learning rate. <span style="color:red;">If your AdaBoost ensemble underfits the training data, you can try increasing the
number of estimators or reducing the regularization hyperparameters of the base
estimator. You may also try slightly increasing the learning rate.</span>
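A hedged sketch of those adjustments with scikit-learn's `AdaBoostClassifier`. The dataset and the specific hyperparameter values are assumptions picked for illustration; the base estimator is passed positionally because its keyword name differs between scikit-learn versions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Heavily constrained ensemble: few decision stumps and a small learning rate.
weak_ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5,
    learning_rate=0.1,
    random_state=42,
)
weak_ada.fit(X, y)

# More estimators, a less regularized base estimator (deeper trees),
# and a larger learning rate.
tuned_ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
tuned_ada.fit(X, y)

weak_train_acc = weak_ada.score(X, y)
tuned_train_acc = tuned_ada.score(X, y)
print(weak_train_acc, tuned_train_acc)
```

Training accuracy is compared here on purpose, since underfitting shows up on the training set; a validation set is still needed to check that the tuned ensemble has not swung into overfitting.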

7. If your Gradient Boosting ensemble overfits the training set, should you increase
or decrease the learning rate?

Decrease the learning rate. You can also use early stopping to find the right number of predictors (you probably have too many).
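One way to get early stopping in scikit-learn is the `n_iter_no_change` parameter of `GradientBoostingClassifier`, which holds out `validation_fraction` of the training data and stops adding trees once the validation score stops improving. The sketch below assumes a toy dataset and arbitrary hyperparameter values.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

gbc = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of trees
    learning_rate=0.05,       # small learning rate to limit overfitting
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    validation_fraction=0.2,  # share of the training data held out internally
    random_state=42,
)
gbc.fit(X, y)

# Number of trees actually trained before early stopping kicked in.
n_used = gbc.n_estimators_
print(n_used)
```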

8. Load the MNIST data (introduced in Chapter 3), and split it into a training set, a
validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation,
and 10,000 for testing). Then train various classifiers, such as a Random
Forest classifier, an Extra-Trees classifier, and an SVM classifier. Next, try to combine
them into an ensemble that outperforms each individual classifier on the
validation set, using soft or hard voting. Once you have found one, try it on the
test set. How much better does it perform compared to the individual classifiers?

In [3]:
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(np.uint8)

In [5]:
mnist.data.shape

(70000, 784)

In [7]:
mnist.target.shape

(70000,)

In [11]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(mnist.data, mnist.target, random_state=24, test_size=10000)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(60000, 784)
(10000, 784)
(60000,)
(10000,)


In [12]:
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, random_state=24, test_size=10000)
print(x_train.shape)
print(x_val.shape)
print(y_train.shape)
print(y_val.shape)

(50000, 784)
(10000, 784)
(50000,)
(10000,)


In [13]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import LinearSVC

rfc = RandomForestClassifier(n_estimators=100, random_state=24)
ext = ExtraTreesClassifier(n_estimators=100, random_state=24)
svc = LinearSVC(random_state=24)

In [15]:
est_list = [rfc, ext, svc]
for est in est_list:
    est.fit(x_train, y_train)



In [19]:
from sklearn.metrics import accuracy_score

for est in est_list:
    est_pred = est.predict(x_val)
    # accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is unchanged
    print(accuracy_score(y_val, est_pred))

0.9687
0.9698
0.835


In [20]:
from sklearn.ensemble import VotingClassifier

est_tups = [
    ('rfc', rfc),
    ('ext', ext),
    ('svc', svc)
]

vc = VotingClassifier(est_tups)

In [21]:
vc.fit(x_train, y_train)



VotingClassifier(estimators=[('rfc',
                              RandomForestClassifier(bootstrap=True,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=None,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     n_estimators=100,
                                                     n_jobs=None,
         

In [23]:
vc.score(x_val, y_val)

0.9682

In [24]:
[est.score(x_val, y_val) for est in vc.estimators_]

[0.9687, 0.9698, 0.835]

In [26]:
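# Drop the fitted LinearSVC (index 2): it is much weaker than the tree ensembles,
# and it has no predict_proba(), which soft voting needs
del vc.estimators_[2]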

In [27]:
vc.score(x_val, y_val)

0.9697

In [29]:
vc.voting = "soft"

In [31]:
vc.score(x_val, y_val)

0.971

In [32]:
vc.score(x_test, y_test)

0.971

In [33]:
vc.voting = "hard"
vc.score(x_test, y_test)

0.97

9. Run the individual classifiers from the previous exercise to make predictions on
the validation set, and create a new training set with the resulting predictions:
each training instance is a vector containing the set of predictions from all your
classifiers for an image, and the target is the image’s class. Train a classifier on
this new training set. Congratulations, you have just trained a blender, and
together with the classifiers it forms a stacking ensemble! Now evaluate the
ensemble on the test set. For each image in the test set, make predictions with all
your classifiers, then feed the predictions to the blender to get the ensemble’s predictions.
How does it compare to the voting classifier you trained earlier?

In [34]:
preds = []
for est in vc.estimators_:
    preds.append(est.predict(x_val))
len(preds)

2

In [41]:
stack_data = np.vstack(preds).T
stack_data.shape

(10000, 2)

In [47]:
new_est_tups = [
    ('rfc', rfc),
    ('ext', ext)
]

blender_vc = VotingClassifier(new_est_tups)

In [48]:
blender_vc.fit(stack_data, y_val)

VotingClassifier(estimators=[('rfc',
                              RandomForestClassifier(bootstrap=True,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=None,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     n_estimators=100,
                                                     n_jobs=None,
         

In [46]:
preds_test = []
for est in vc.estimators_:
    preds_test.append(est.predict(x_test))
stack_data_test = np.vstack(preds_test).T
stack_data_test.shape

(10000, 2)

In [51]:
blender_vc.score(stack_data_test, y_test)

0.9708

The blender did slightly worse than the soft voting classifier on the test set (97.08% vs. 97.10%). A gap this small could well be insignificant, and the two may perform similarly on production data.
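For comparison, recent scikit-learn versions (0.22 and later) ship a `StackingClassifier` that automates this workflow. It is not the method used above: it trains the blender on cross-validated predictions instead of a held-out validation set, and by default feeds it class probabilities instead of hard predictions. The sketch below uses the small `load_digits` dataset as a stand-in for MNIST so that it runs quickly; the estimator settings are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 8x8 digit images, a much smaller stand-in for MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=24
)

stack = StackingClassifier(
    estimators=[
        ("rfc", RandomForestClassifier(n_estimators=50, random_state=24)),
        ("ext", ExtraTreesClassifier(n_estimators=50, random_state=24)),
    ],
    # The blender; it is trained on out-of-fold predictions of the estimators.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)

stack_accuracy = stack.score(X_test, y_test)
print(stack_accuracy)
```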