## Ex 1: If you have trained 5 different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?

Yes! As long as the models make different sorts of errors, combining them in an ensemble voting classifier will likely cancel out some of the errors made by individual models and so result in a higher precision. The more independent the errors, the bigger the gain.
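
To get a feel for the size of the effect, the sketch below computes how often a majority vote over five binary classifiers is right when each one is right 95% of the time. It assumes the errors are fully independent, which is optimistic for models trained on the same data, and it uses per-instance accuracy as a stand-in for precision. `majority_vote_accuracy` is a hypothetical helper, not a library function.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent models is correct."""
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


ensemble_acc = majority_vote_accuracy(5, 0.95)
print(f"{ensemble_acc:.4f}")  # 0.9988
```

In practice the gain is smaller than this, because models trained on the same data tend to make correlated errors.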

## Ex 2: What is the difference between hard and soft voting classifiers?

They are both examples of ensemble voting classifiers. Hard voting classifiers use a majority vote to determine the class predicted by the ensemble, with each predictor given equal voting weight. Soft voting classifiers take into account the confidence that each predictor has in its prediction for each instance: they predict the class with the highest probability averaged over all the classifiers. Soft voting therefore requires every predictor to be able to estimate class probabilities.
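
The two schemes can disagree. The sketch below uses made-up probabilities for a single instance from three hypothetical classifiers: two lean weakly towards class 1 and one is very confident in class 0.

```python
import numpy as np

# Rows: classifiers, columns: estimated probability of class 0 and class 1.
# These numbers are invented purely for illustration.
probas = np.array([
    [0.45, 0.55],
    [0.45, 0.55],
    [0.95, 0.05],
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
soft_prediction = probas.mean(axis=0).argmax()

print(hard_prediction)  # 1
print(soft_prediction)  # 0
```

Hard voting follows the two weak votes, while soft voting lets the single confident classifier win.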

## Ex 3: Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, random forests, or stacking ensembles?

Bagging training can be distributed: the predictors in the ensemble are trained independently of one another, so each one can be trained in parallel on a different server. The same holds for pasting ensembles and random forests.
Boosting ensembles cannot be distributed this way because their predictors are trained sequentially: each predictor is built from the results of the previous one.
Stacking ensembles can be partially distributed. Within a layer of the stack the constituent models are independent, so they can be trained on multiple servers, but each layer can only be trained after the layer below it has finished.
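
The independence of bagged predictors can be sketched on a single machine, with worker threads standing in for servers (an assumption for illustration; a real multi-server setup would need a distributed framework). `train_one` is a hypothetical helper that trains one tree on its own bootstrap sample.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X_toy, y_toy = make_classification(n_samples=500, n_features=10, random_state=42)


def train_one(seed: int) -> DecisionTreeClassifier:
    """Train one tree on a bootstrap sample; needs nothing from the other trees."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X_toy), size=len(X_toy))  # sample with replacement
    return DecisionTreeClassifier(random_state=seed).fit(X_toy[idx], y_toy[idx])


# Each call to train_one is independent, so the calls can run in any order
# or at the same time. An odd number of trees avoids tied votes.
with ThreadPoolExecutor(max_workers=4) as pool:
    trees = list(pool.map(train_one, range(11)))

# Aggregate by majority vote over the binary labels.
all_votes = np.stack([tree.predict(X_toy) for tree in trees])
ensemble_pred = (all_votes.mean(axis=0) > 0.5).astype(int)
print((ensemble_pred == y_toy).mean())
```

A boosting ensemble could not be written this way, since training predictor `k` would need the fitted predictor `k - 1` as input.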

## Ex 4: What is the benefit of out-of-bag evaluation?

Since a predictor in a bagging ensemble never sees its own out-of-bag (OOB) instances during training, it can be evaluated on these instances without the need for a separate validation set or cross-validation. This leaves more instances available for training. OOB evaluation can be used to evaluate the ensemble itself by averaging the OOB evaluations for each predictor.
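
As a minimal sketch on a toy dataset (the dataset and hyperparameters are assumptions chosen for speed), scikit-learn exposes this through `oob_score=True`. Note that its `oob_score_` is computed by aggregating, for each training instance, the predictions of the trees that did not see it, which is a slightly different calculation from averaging per-predictor scores.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_moons, y_moons = make_moons(n_samples=1000, noise=0.3, random_state=42)
Xm_train, Xm_test, ym_train, ym_test = train_test_split(
    X_moons, y_moons, random_state=42
)

# oob_score=True asks the forest to score itself on out-of-bag instances.
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
forest.fit(Xm_train, ym_train)

oob_acc = forest.oob_score_
test_acc = forest.score(Xm_test, ym_test)
print(f"OOB accuracy:  {oob_acc:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
```

The two numbers are usually close, which is what makes the OOB score a cheap substitute for a held-out validation set.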

## Ex 5: What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?

Extra-Trees use a random threshold for each candidate feature at each node rather than searching for the best possible threshold. This extra randomness acts as regularisation, reducing variance at the cost of increasing bias, so it can help when a Random Forest overfits. Extra-Trees are faster to train because finding the best threshold for each feature at every node is one of the most time-consuming parts of training a random forest.
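
A rough way to check the speed claim is to time both ensembles on the same data. The sketch below uses a synthetic dataset and 100 trees per ensemble (both assumptions); timings depend on the machine and the data, so no expected numbers are shown.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X_syn, y_syn = make_classification(
    n_samples=2000, n_features=40, n_informative=10, random_state=42
)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, random_state=42
)

timing_results = {}
for name, cls in [
    ("random forest", RandomForestClassifier),
    ("extra-trees", ExtraTreesClassifier),
]:
    model = cls(n_estimators=100, random_state=42)
    start = perf_counter()
    model.fit(Xs_train, ys_train)
    elapsed = perf_counter() - start
    # Store (training time in seconds, test accuracy) for each ensemble.
    timing_results[name] = (elapsed, model.score(Xs_test, ys_test))
    print(f"{name}: fit in {elapsed:.2f}s, accuracy {timing_results[name][1]:.3f}")
```

Which of the two generalises better is hard to predict in advance, so comparing both with cross-validation is the usual approach.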

## Ex 6: If your AdaBoost ensemble underfits the training data, what hyperparameters should you tweak and how?

- Increase the number of estimators. More estimators will mean a closer fit to the training set.
- Increase the learning rate. This increases the weight given to each new estimator.
- Reduce regularisation of the base estimator. This allows the base estimator to produce a closer fit to the training set.
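
The three adjustments above can be sketched with scikit-learn's `AdaBoostClassifier`. The dataset and the specific values are assumptions for illustration, and the base estimator is passed positionally because its keyword name differs between scikit-learn versions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X_ada, y_ada = make_moons(n_samples=500, noise=0.3, random_state=42)

# Deliberately weak setup: few estimators, low learning rate, decision stumps.
underfit_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5,
    learning_rate=0.1,
    random_state=42,
).fit(X_ada, y_ada)

# More estimators, higher learning rate, less regularised base estimator.
tuned_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200,
    learning_rate=1.0,
    random_state=42,
).fit(X_ada, y_ada)

underfit_train_acc = underfit_clf.score(X_ada, y_ada)
tuned_train_acc = tuned_clf.score(X_ada, y_ada)
print(f"weak setup, training accuracy:  {underfit_train_acc:.3f}")
print(f"tuned setup, training accuracy: {tuned_train_acc:.3f}")
```

Only training accuracy is compared here because the exercise is about underfitting; pushing these settings too far will overfit, so a validation set is still needed to pick final values.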

## Ex 7: If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate? 

You should decrease the learning rate. This regularisation technique is known as shrinkage: the impact of each individual estimator is reduced, which usually results in predictions that generalise better, although more estimators are then needed to fit the training set.
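
A quick way to see the effect is to train the same Gradient Boosting model with a high and a low learning rate and compare training and validation accuracy. This is a sketch on a noisy toy dataset; the values 1.0 and 0.1 and the 200 estimators are assumptions, and the outcome on other data may differ.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_gb, y_gb = make_moons(n_samples=1000, noise=0.35, random_state=42)
Xg_train, Xg_val, yg_train, yg_val = train_test_split(X_gb, y_gb, random_state=42)

gb_scores = {}
for lr in (1.0, 0.1):
    gbrt = GradientBoostingClassifier(
        n_estimators=200, learning_rate=lr, max_depth=3, random_state=42
    )
    gbrt.fit(Xg_train, yg_train)
    # Store (training accuracy, validation accuracy) per learning rate.
    gb_scores[lr] = (gbrt.score(Xg_train, yg_train), gbrt.score(Xg_val, yg_val))
    print(f"learning_rate={lr}: train {gb_scores[lr][0]:.3f}, val {gb_scores[lr][1]:.3f}")
```

A large gap between training and validation accuracy is the sign of overfitting to look for.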

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC, SVC
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.metrics import accuracy_score

In [2]:
# Get the mnist data
mnist = datasets.fetch_openml('mnist_784', version=1, cache=True)
X, y = mnist["data"], mnist["target"]

In [3]:
# Split data into three: 10k test, 10k validation, 50k training
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, random_state=42, test_size=10000
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, random_state=42, test_size=10000
)

In [4]:
X_train.shape

(50000, 784)

# Train various classifiers:
- random forest
- extra trees
- svm

In [5]:
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)



RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [6]:
et_clf = ExtraTreesClassifier()
et_clf.fit(X_train, y_train)



ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
                     max_depth=None, max_features='auto', max_leaf_nodes=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
                     oob_score=False, random_state=None, verbose=0,
                     warm_start=False)

In [7]:
# sv_clf = LinearSVC()
sv_clf = SVC(probability=True)
sv_clf.fit(X_train, y_train)



KeyboardInterrupt: 

# Save models

In [None]:
from joblib import dump, load

In [None]:
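import os

# Save into the same directory that the load cell reads from
os.makedirs("saved_models", exist_ok=True)
dump(rf_clf, "saved_models/rf_clf.joblib")
dump(et_clf, "saved_models/et_clf.joblib")
dump(sv_clf, "saved_models/sv_clf.joblib")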

# Load models

In [10]:
rf_clf = load("saved_models/rf_clf.joblib")
et_clf = load("saved_models/et_clf.joblib")
sv_clf = load("saved_models/sv_clf.joblib")

# Compare scores of individual classifiers

In [12]:
# Random forest
rf_clf.score(X_val, y_val)
# y_pred = rf_clf.predict(X_val)
# accuracy_score(y_val, y_pred)

0.9456

In [13]:
# Extra trees
et_clf.score(X_val, y_val)

0.9493

In [14]:
# SVM
sv_clf.score(X_val, y_val)

0.1158

# Combine into ensemble
- try both soft and hard voting

In [16]:
named_models = [
    ("rf_clf", rf_clf),
    ("et_clf", et_clf),
    ("sv_clf", sv_clf),
]

voting_clf = VotingClassifier(
    estimators=named_models,
    voting="soft"
)
voting_clf.fit(X_train, y_train)



VotingClassifier(estimators=[('rf_clf',
                              RandomForestClassifier(bootstrap=True,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=None,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     n_estimators=10,
                                                     n_jobs=None,
       

In [17]:
voting_clf.score(X_val, y_val)

AttributeError: predict_proba is not available when  probability=False
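
The error above is the one `SVC` raises when it was constructed without `probability=True`, which soft voting needs. Below is a minimal, self-contained sketch of hard and soft voting that runs quickly. It swaps in the small `load_digits` dataset instead of MNIST and wraps the SVM in a scaling pipeline; both choices are assumptions made for this example, not what the notebook did.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_digits, y_digits = load_digits(return_X_y=True)
Xd_train, Xd_val, yd_train, yd_val = train_test_split(
    X_digits, y_digits, random_state=42
)

estimators = [
    ("rf_clf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("et_clf", ExtraTreesClassifier(n_estimators=100, random_state=42)),
    # probability=True is required for soft voting (it enables predict_proba).
    ("sv_clf", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
]

voting_scores = {}
for voting in ("hard", "soft"):
    # VotingClassifier clones and refits the estimators, so the list can be reused.
    clf = VotingClassifier(estimators=estimators, voting=voting)
    clf.fit(Xd_train, yd_train)
    voting_scores[voting] = clf.score(Xd_val, yd_val)
    print(f"{voting} voting: {voting_scores[voting]:.4f}")
```

Because `VotingClassifier.fit` retrains clones of the estimators passed to it, previously fitted or loaded models are not reused; the `probability=True` setting has to be on the estimator objects given to the ensemble.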

In [None]:
# Try best on the test set