### 1. If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?
Yes. You can combine the models into an **ensemble**, for example with a **voting classifier** or **stacking**. Even if all five models reach the same score individually, they may make errors on different samples, so combining their predictions (e.g., through majority voting) can correct some individual mistakes. The gain is largest when the models are diverse, for instance when they use different algorithms, because their errors are then less correlated. If the models are near-copies that fail on the same samples, the ensemble will help little.
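
The effect of majority voting can be illustrated with a small simulation. This is only a sketch under an idealized assumption: five simulated models whose errors are fully independent, measured with accuracy rather than precision to keep it short. Real models trained on the same data have correlated errors, so the actual gain is usually much smaller.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_models, accuracy = 100_000, 5, 0.95

# True binary labels, plus five simulated models that are each right 95% of
# the time, with errors drawn independently (an idealized assumption).
y_true = rng.integers(0, 2, size=n_samples)
is_correct = rng.random((n_models, n_samples)) < accuracy
predictions = np.where(is_correct, y_true, 1 - y_true)

# Majority vote: predict 1 when at least 3 of the 5 models say 1.
majority = (predictions.sum(axis=0) >= 3).astype(int)

individual_acc = (predictions == y_true).mean(axis=1)
ensemble_acc = (majority == y_true).mean()
print("individual:", individual_acc.round(4))
print("majority vote:", round(float(ensemble_acc), 4))
```

With independent errors the majority is wrong only when at least three of the five models fail on the same sample, which is why the ensemble score ends up well above 95% in this setting.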



### 2. What is the difference between hard and soft voting classifiers?
- **Hard Voting:** The ensemble predicts the class that receives the majority of votes from the individual classifiers. 
- **Soft Voting:** The ensemble averages the predicted probabilities for each class from all classifiers and selects the class with the highest average probability. 
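- **Key Difference:** Soft voting gives more weight to confident predictions, so it often performs better when the classifiers are well-calibrated. It only works if every classifier can estimate class probabilities (in scikit-learn, a `predict_proba()` method).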
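
The two rules can disagree on the same sample. The sketch below uses made-up probabilities for three hypothetical classifiers on a binary problem; the numbers are illustrative only.

```python
import numpy as np

# Hypothetical P(class 1) from three classifiers for a single sample.
proba_class1 = np.array([0.45, 0.45, 0.90])
probas = np.column_stack([1 - proba_class1, proba_class1])  # shape (3, 2)

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the class probabilities, then take the argmax.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print("hard voting predicts class", hard_prediction)
print("soft voting predicts class", soft_prediction)
```

Two classifiers lean weakly toward class 0 and one is strongly confident in class 1, so hard voting picks class 0 while soft voting picks class 1.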



### 3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, random forests, or stacking ensembles?
- **Bagging (and Pasting) Ensembles:** Yes, you can speed up training by distributing it across multiple servers because each predictor is trained independently on its own random sample of the training set (drawn with replacement for bagging, without replacement for pasting).
- **Boosting Ensembles:** Boosting is sequential (each model corrects the errors of the previous one), so it cannot be easily parallelized.
- **Random Forests:** Since Random Forests are a form of bagging, they can be distributed across multiple servers for faster training.
- **Stacking Ensembles:** The first layer (base classifiers) can be trained in parallel, but the blender (second layer) must be trained sequentially after the base classifiers are trained.
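
As a small-scale illustration of why bagging parallelizes well, the sketch below trains a bagging ensemble with scikit-learn's `n_jobs` parameter. Note the assumptions: `n_jobs` spreads work over local CPU cores, not over servers, and the synthetic dataset is only a stand-in. Distributing across machines would need a cluster backend that is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Each of the 50 trees is fit on its own bootstrap sample, so the fits are
# independent and can be spread over workers (here: 2 local processes).
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50, n_jobs=2, random_state=42
)
bag_clf.fit(X, y)

n_trained = len(bag_clf.estimators_)
print(n_trained, "trees trained")
```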



### 4. What is the benefit of out-of-bag evaluation?
**OOB evaluation** provides an unbiased estimate of the ensemble's performance without needing a separate validation set. This saves data, allowing more data to be used for training, and offers a quick way to check the model’s generalization performance.
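
A minimal sketch of out-of-bag evaluation in scikit-learn, assuming a synthetic dataset as a stand-in. The held-out test set is there only to show that the OOB score tends to land close to it; it is not required for OOB evaluation itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# oob_score=True scores each training instance using only the trees whose
# bootstrap sample did not contain it.
rf_clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rf_clf.fit(X_train, y_train)

oob_acc = rf_clf.oob_score_
test_acc = rf_clf.score(X_test, y_test)
print("OOB accuracy: ", oob_acc)
print("Test accuracy:", test_acc)
```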



### 5. What makes Extra-Trees more random than regular random forests? How can this extra randomness help? Are extra-trees slower or faster to train compared to regular random forests?
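**Extra-Trees (Extremely Randomized Trees)** add randomness on top of a regular random forest:
- **Source of extra randomness:** Like a random forest, each node considers a random subset of features, but instead of searching for the best threshold for each feature, Extra-Trees draw **random split thresholds**.
- **How it helps:** The extra randomness acts as a form of regularization, trading slightly higher bias for lower variance. If a random forest overfits the training data, Extra-Trees may generalize better, especially on noisy datasets.
- **Faster or slower?** Extra-Trees are generally faster to train because they skip the search for the best split threshold, which is one of the most time-consuming parts of growing a tree. Prediction speed is about the same.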
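
One way to check these claims on your own data is to time both models side by side. The sketch below does this on a synthetic dataset (an assumption, not the data used elsewhere in these notes); timings depend on hardware and data, so treat the printed numbers as indicative only.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=40, n_informative=10, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "extra-trees": ExtraTreesClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    start = perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = perf_counter() - start
    results[name] = (fit_seconds, model.score(X_test, y_test))
    print(f"{name}: fit in {fit_seconds:.2f}s, test accuracy {results[name][1]:.4f}")
```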



### 6. If your AdaBoost ensemble underfits the training data, what hyperparameters should you tweak and how?
If your **AdaBoost** ensemble underfits:
- **Increase the number of estimators (`n_estimators`)** so the ensemble has more rounds to fit the data.
- **Reduce the regularization of the base estimator**, for example by allowing deeper trees than the default decision stumps.
- **Slightly increase the learning rate (`learning_rate`)** so each estimator contributes more to the final prediction. Be cautious, though: raising it too much can lead to overfitting.
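
The sketch below compares a deliberately constrained AdaBoost ensemble with one that has more estimators and a larger learning rate. The dataset (`make_moons`) and the specific hyperparameter values are illustrative assumptions, and the comparison is on training accuracy only, since the question is about underfitting.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Few estimators and a small learning rate: likely to underfit.
weak_ada = AdaBoostClassifier(n_estimators=5, learning_rate=0.1, random_state=42)
# More estimators and a larger learning rate: more capacity to fit the data.
tuned_ada = AdaBoostClassifier(n_estimators=200, learning_rate=1.0, random_state=42)

weak_acc = weak_ada.fit(X, y).score(X, y)
tuned_acc = tuned_ada.fit(X, y).score(X, y)
print("training accuracy, constrained:", weak_acc)
print("training accuracy, tuned:      ", tuned_acc)
```

A higher training score alone does not mean a better model; check a validation set before settling on the larger values.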



### 7. If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?
If your **Gradient Boosting** ensemble is overfitting, you should **decrease the learning rate**. A smaller learning rate makes the model learn more slowly, which can help avoid overfitting. You may need to compensate by increasing the number of trees (`n_estimators`) to maintain performance.
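
A minimal sketch of this trade-off in scikit-learn, on a synthetic dataset. It also uses early stopping through `n_iter_no_change`, which is an addition not mentioned in the answer above: it lets you set a generous `n_estimators` and stop once an internal validation score stalls. All values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=2000, n_features=20, flip_y=0.1, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

gb_clf = GradientBoostingClassifier(
    learning_rate=0.05,       # smaller steps: each tree contributes less
    n_estimators=500,         # upper bound on the number of trees
    n_iter_no_change=10,      # stop once the internal validation score stalls
    validation_fraction=0.1,  # share of the training data held out for that
    random_state=42,
)
gb_clf.fit(X_train, y_train)

n_trees_used = gb_clf.n_estimators_
val_acc = gb_clf.score(X_val, y_val)
print("trees actually trained:", n_trees_used)
print("validation accuracy:   ", val_acc)
```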



### 8. Load the MNIST data (introduced in Chapter 3), and split it into a training set, a validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation, and 10,000 for testing). Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and an SVM. Next, try to combine them into an ensemble that outperforms them all on the validation set, using a soft or hard voting classifier. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?
Here's a step-by-step process for this:
1. **Load MNIST dataset** and split it into 50,000 training, 10,000 validation, and 10,000 test sets.
2. **Train individual classifiers**: Random Forest, Extra-Trees, and SVM.
3. **Combine them using hard or soft voting** to create an ensemble.
4. **Evaluate on the validation set**: Compare the ensemble’s performance to that of the individual classifiers.
5. **Test the ensemble on the test set**: Expect a small improvement at best. In the run below, hard voting without the SVM scores 0.9734 on the validation set, against 0.9715 for the best individual classifier (Extra-Trees), and 0.9714 on the test set.



In [3]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml

# Load the MNIST dataset and split it into 50000 training, 10000 validation and 10000 test images
def load_mnist():
    # Load the MNIST dataset
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)

    # Split the dataset into training, validation and test sets
    X_train_val, X_test, y_train_val, y_test = train_test_split(
        mnist.data, mnist.target, test_size=10000, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_val, y_train_val, test_size=10000, random_state=42)

    return X_train, X_val, X_test, y_train, y_val, y_test

In [4]:
# Load the MNIST dataset
X_train, X_val, X_test, y_train, y_val, y_test = load_mnist()

In [5]:
# Train individual classifiers
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier

random_forest_clf = RandomForestClassifier(n_estimators=100, random_state=42)
extra_trees_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
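# tol=20 is a very loose stopping tolerance that keeps training short; the SVM's
# validation accuracy below (0.0997) is near chance, so it adds little here.
svm_clf = LinearSVC(max_iter=100, tol=20, random_state=42)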
mlp_clf = MLPClassifier(random_state=42)

In [6]:
estimators = [random_forest_clf, extra_trees_clf, svm_clf, mlp_clf]
for estimator in estimators:
    print("Training the", estimator)
    estimator.fit(X_train, y_train)

Training the RandomForestClassifier(random_state=42)
Training the ExtraTreesClassifier(random_state=42)
Training the LinearSVC(max_iter=100, random_state=42, tol=20)
Training the MLPClassifier(random_state=42)


In [7]:
[estimator.score(X_val, y_val) for estimator in estimators]

[0.9692, 0.9715, 0.0997, 0.9619]

In [8]:
from sklearn.ensemble import VotingClassifier

named_estimators = [
    ("random_forest_clf", random_forest_clf),
    ("extra_trees_clf", extra_trees_clf),
    ("svm_clf", svm_clf),
    ("mlp_clf", mlp_clf),
]

voting_clf = VotingClassifier(named_estimators)

In [9]:
voting_clf.fit(X_train, y_train)

In [10]:
voting_clf.score(X_val, y_val)

0.9719

In [11]:
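# Note: VotingClassifier fits clones of the estimators on label-encoded targets
# (integer class indices), so scoring them against the string labels in y_val
# reports 0.0 for every estimator. Score against encoded labels instead, e.g.
# voting_clf.le_.transform(y_val).
[estimator.score(X_val, y_val) for estimator in voting_clf.estimators_]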

[0.0, 0.0, 0.0, 0.0]

In [12]:
# Recent scikit-learn versions expect "drop" (not None) to remove an estimator
voting_clf.set_params(svm_clf="drop")

In [13]:
voting_clf.estimators

[('random_forest_clf', RandomForestClassifier(random_state=42)),
 ('extra_trees_clf', ExtraTreesClassifier(random_state=42)),
 ('svm_clf', 'drop'),
 ('mlp_clf', MLPClassifier(random_state=42))]

In [14]:
voting_clf.estimators_

[RandomForestClassifier(random_state=42),
 ExtraTreesClassifier(random_state=42),
 LinearSVC(max_iter=100, random_state=42, tol=20),
 MLPClassifier(random_state=42)]

In [15]:
del voting_clf.estimators_[2]

In [16]:
voting_clf.score(X_val, y_val)

0.9734

In [17]:
voting_clf.voting = "soft"

In [18]:
voting_clf.score(X_val, y_val)

0.9693

In [19]:
voting_clf.voting = "hard"
voting_clf.score(X_test, y_test)

0.9714

In [20]:
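# Note: the fitted sub-estimators predict integer class indices, so comparing
# them with the string labels in y_test reports 0.0 for each of them.
[estimator.score(X_test, y_test) for estimator in voting_clf.estimators_]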

[0.0, 0.0, 0.0]
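
The zeros above come from a label mismatch, not from bad models: `VotingClassifier` trains its internal copies on label-encoded targets, while the targets here are strings. The sketch below shows one way to score the fitted sub-estimators correctly, using the ensemble's fitted `le_` label encoder. It runs on the small `load_digits` dataset with labels cast to strings as a stand-in for MNIST, so its scores are not comparable to the ones in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split

# Small stand-in for MNIST, with string labels like those used above.
X, y = load_digits(return_X_y=True)
y = y.astype(str)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier([
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=42)),
])
voting_clf.fit(X_train, y_train)

# The fitted clones predict integer class indices, so encode y_val the same
# way before scoring them individually.
y_val_encoded = voting_clf.le_.transform(y_val)
sub_scores = [est.score(X_val, y_val_encoded) for est in voting_clf.estimators_]

print("sub-estimator scores:", sub_scores)
print("ensemble score:      ", voting_clf.score(X_val, y_val))
```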

### 9. Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image’s class. Train a classifier on this new training set. Congratulations, you have just trained a blender, and together with the classifiers, it forms a stacking ensemble! Now let’s evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble’s predictions. How does it compare to the voting classifier you trained earlier?
1. **Run the individual classifiers** on the validation set and collect their predictions.
2. **Create a new training set**: Each instance is a vector of predictions from all classifiers, and the target is the actual class of the image.
3. **Train a blender** on this new training set.
4. **Evaluate on the test set**: For each test image, get predictions from all base classifiers, then feed them to the blender for the final prediction.
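5. **Comparison:** A stacking ensemble can outperform a voting classifier because the blender learns how much to trust each classifier, but this is not guaranteed. In the run below it reaches 0.9661 on its test set, slightly below the 0.9714 of the hard-voting classifier from exercise 8. The two runs use different splits and base classifiers, so the figures are only roughly comparable.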


In [21]:
# Import required libraries
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the MNIST dataset
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]

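# Split the data into train, validation, and test sets.
# Note: with MNIST's 70,000 images these fractions give 56,000/7,000/7,000,
# not the 50,000/10,000/10,000 split used in exercise 8.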
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Train individual classifiers
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
ext_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
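# probability=True is not needed here because only predict() is called below;
# it adds an internal cross-validation step that makes training slower.
svm_clf = SVC(probability=True, random_state=42)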

rnd_clf.fit(X_train, y_train)
ext_clf.fit(X_train, y_train)
svm_clf.fit(X_train, y_train)

# Make predictions on the validation set
rnd_val_pred = rnd_clf.predict(X_val)
ext_val_pred = ext_clf.predict(X_val)
svm_val_pred = svm_clf.predict(X_val)

# Combine predictions to create a new training set for the blender
X_blender_train = np.column_stack((rnd_val_pred, ext_val_pred, svm_val_pred))
y_blender_train = y_val

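# Train the blender (e.g., a Logistic Regression model).
# Note: its inputs are the predicted digit labels, which the model treats as
# plain numbers. The default max_iter triggers the convergence warning shown
# in the output below; a larger max_iter is one way to address it.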
blender = LogisticRegression(random_state=42)
blender.fit(X_blender_train, y_blender_train)

# Make predictions on the test set using the base classifiers
rnd_test_pred = rnd_clf.predict(X_test)
ext_test_pred = ext_clf.predict(X_test)
svm_test_pred = svm_clf.predict(X_test)

# Combine predictions for the test set
X_blender_test = np.column_stack((rnd_test_pred, ext_test_pred, svm_test_pred))

# Use the blender to make final predictions
y_pred = blender.predict(X_blender_test)

# Evaluate the stacking ensemble
accuracy = accuracy_score(y_test, y_pred)
print(f"Stacking Ensemble Accuracy: {accuracy:.4f}")


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Stacking Ensemble Accuracy: 0.9661
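
As an alternative to wiring the blender by hand, scikit-learn provides `StackingClassifier`. This is a different technique from the hold-out blending above: it trains the blender on out-of-fold predictions from cross-validation and, for classifiers that support it, feeds class probabilities rather than predicted labels to the blender. The sketch below is a minimal example on the small `load_digits` dataset (an assumption made to keep it fast), so its accuracy is not comparable to the MNIST results.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=42)),
    ],
    # The blender; max_iter is raised to reduce the risk of a convergence warning.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions become the blender's training set
)
stacking_clf.fit(X_train, y_train)

stacking_acc = stacking_clf.score(X_test, y_test)
print(f"Stacking accuracy: {stacking_acc:.4f}")
```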
