<a href="https://colab.research.google.com/github/MCn21thCntry/Practical-Machine-Learning---from-the-rooter-to-the-tooter/blob/main/Module_6_Random_Forests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
dansbecker_home_data_for_ml_course_path = kagglehub.dataset_download('dansbecker/home-data-for-ml-course')

print('Data source import complete.')


## Module 6: Ensemble Methods: Decision Trees, Bagging, Pasting, and Random Forests - A Deep Dive for Practical Machine Learning

**Welcome to Module 6!** This module offers a comprehensive exploration of Ensemble Methods in Machine Learning, focusing on the practical implementation and comparative analysis of Decision Trees, Bagging, Pasting, and Random Forests. We will systematically build our understanding, starting from the fundamental Decision Tree algorithm and progressing to more sophisticated ensemble techniques.

**Module Objectives:**

By the end of this module, you will be able to:

1.  **Understand Ensemble Learning:**  Grasp the core concept of ensemble learning and its advantages in improving model performance and robustness.
2.  **Implement Decision Tree Baselines:**  Create and evaluate single Decision Tree Regressor and Classifier models as baseline references for performance comparison.
3.  **Master Bagging and Pasting:** Implement and analyze Bagging and Pasting ensembles for both regression and classification tasks, understanding their sampling methodologies and performance characteristics.
4.  **Explore Random Forests:**  Delve into Random Forests, understanding their unique combination of Bagging and feature randomness, and implement them for regression and classification.
5.  **Tune Random Forest Hyperparameters:**  Learn how to tune key hyperparameters of Random Forests, such as `n_estimators` and `max_depth`, to optimize model performance.
6.  **Compare Ensemble Methods:**  Systematically compare the performance of Decision Trees, Bagging, Pasting, and Random Forests across regression and classification tasks, using appropriate evaluation metrics and visualizations.
7.  **Analyze Feature Importance:**  Utilize Random Forests to gain insights into feature importance and understand the relevance of different features in predictive modeling.
8.  **Evaluate Pros and Cons:**  Critically assess the advantages and disadvantages of ensemble methods, considering factors like accuracy, interpretability, computational cost, and complexity.
9.  **Appreciate Advanced Ensemble Concepts:**  Gain a foundational awareness of more advanced ensemble techniques like Boosting and Stacking, and explore future directions in ensemble learning research.

**Module Structure:**

This module is structured in a step-by-step manner, progressing from simpler to more complex concepts and models:

*   **Introduction to Ensemble Learning:** Setting the stage for why combining models is powerful.
*   **Step 0: Decision Tree Baselines:** Establishing performance benchmarks using single Decision Tree Regressors and Classifiers.
*   **Step 1: Bagging Ensembles:** Exploring Bagging for both Regression and Classification, and comparing performance to baselines.
*   **Step 2: Pasting Ensembles:** Investigating Pasting for Regression and Classification, comparing performance to baselines and Bagging.
*   **Step 3: Random Forest Ensembles:** Implementing and evaluating Random Forests for Regression and Classification, comparing performance to baselines, Bagging, and Pasting.
*   **Step 4: Hyperparameter Tuning for Random Forests:**  Learning to tune `n_estimators` and `max_depth` to optimize Random Forest models, with visual analysis of tuning effects.
*   **Step 5: Comprehensive Model Performance Comparison:**  A direct, side-by-side comparison of all models across both regression and classification tasks, using tables and plots to visualize performance differences.
*   **Step 6: Unveiling Feature Importance in Random Forests:**  Exploring and visualizing feature importances to understand model insights and data characteristics.
*   **Step 7: Critical Evaluation: Pros and Cons of Ensemble Methods:**  Discussing the strengths and limitations of ensemble methods in practical scenarios.
*   **Step 8: Expanding Horizons: Advanced Techniques and Future Directions:**  Briefly introducing more advanced ensemble methods and emerging research areas.
*   **Step 9: Conclusion and Summary:**  Recap of key learnings and preparation for the next module.

Let's begin our journey into the world of Ensemble Methods, starting with Decision Tree baselines!


## Ensemble Learning: A Detailed Overview

Ensemble learning is a powerful machine learning technique that combines the predictions of multiple individual models (often called "base learners" or "weak learners") to produce a more accurate and robust prediction than any single model could achieve alone.  Think of it as leveraging the **"wisdom of the crowd"** in the realm of machine learning.  Instead of relying on a single expert, we consult a diverse group of experts and aggregate their opinions to make a better decision.

**In detail, Ensemble Learning works by:**

1.  **Training Multiple Base Learners:**  Creating a set of diverse individual models. This diversity is crucial and can be achieved through various methods such as:
    *   **Different Algorithms:** Using different types of machine learning algorithms (e.g., Decision Trees, Linear Regression, Neural Networks) as base learners.
    *   **Different Training Data Subsets:**  Training each base learner on a different subset of the original training data (e.g., Bagging, Pasting).
    *   **Different Feature Subsets:**  Training each base learner on a different subset of the features (e.g., Random Subspace).
    *   **Different Initializations:** For algorithms sensitive to initial conditions (like Neural Networks), using different random initializations.

2.  **Combining Predictions:**  Developing a strategy to aggregate the predictions of all the base learners. Common aggregation methods include:
    *   **Voting (for Classification):**  Each base learner "votes" for a class. In *hard voting*, the class with the most votes is the final prediction; in *soft voting*, the predicted class probabilities are averaged and the class with the highest average wins.
    *   **Averaging (for Regression):**  Simply averaging the predictions of all base learners.
    *   **Weighted Averaging:**  Assigning weights to each base learner based on its performance and then calculating a weighted average of predictions.
    *   **Stacking (Stacked Generalization):** Training a "meta-learner" on the predictions of the base learners to learn the optimal way to combine them.
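To make the aggregation rules above concrete, here is a minimal NumPy sketch of hard voting, soft voting, and averaging. The probabilities and price predictions are invented for illustration; they are not outputs of any model in this module.

```python
import numpy as np

# Hypothetical predicted probabilities from three base classifiers for ONE sample.
# Columns are [P(class 0), P(class 1)]; the numbers are made up for illustration.
probas = np.array([
    [0.55, 0.45],   # learner A leans slightly towards class 0
    [0.55, 0.45],   # learner B leans slightly towards class 0
    [0.05, 0.95],   # learner C is very confident in class 1
])

# Hard voting: each learner casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_vote = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_vote = mean_proba.argmax()

# Averaging (regression): the ensemble prediction is the mean of the base predictions.
regression_preds = np.array([210_000.0, 198_000.0, 204_000.0])
ensemble_pred = regression_preds.mean()

print("Hard vote:", hard_vote)
print("Soft vote:", soft_vote)
print("Averaged regression prediction:", ensemble_pred)
```

In this toy case the two voting rules disagree: two lukewarm votes win under hard voting, while one confident learner tips the averaged probabilities under soft voting.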

**When to Use Ensemble Learning:**

Ensemble learning is particularly beneficial and often recommended in the following situations:

*   **High Accuracy is Critical:** When you need to achieve the highest possible predictive accuracy, ensemble methods often outperform single models, especially for complex problems.
*   **Reducing Overfitting:**  Ensembles, particularly methods like Random Forests and Boosting, are effective at reducing overfitting and improving generalization to unseen data.
*   **Handling Complex Datasets:**  When dealing with datasets that are high-dimensional, noisy, or have complex relationships between features and the target variable, ensembles can capture more intricate patterns.
*   **Robustness and Stability:** Ensembles are generally more robust to noise and outliers in the data compared to single models, as errors from individual learners tend to cancel each other out.
*   **Competitive Machine Learning:** In many machine learning competitions, ensemble methods are frequently used to achieve top rankings due to their superior performance.
*   **Better Generalization:** Ensembles tend to generalize better to unseen data because they average out the errors of individual models, leading to more consistent performance.

**Pros and Cons of Ensemble Learning:**

| **Pros**                                  | **Cons**                                      |
|-------------------------------------------|-----------------------------------------------|
| **Higher Accuracy:** Often achieves better predictive performance than single models. | **Increased Complexity:** Ensembles are generally more complex to understand and implement than single models. |
| **Improved Robustness:** More stable and less sensitive to noisy data or outliers. | **Higher Computational Cost:** Training and prediction can be more computationally expensive, especially for large ensembles or complex base learners. |
| **Better Generalization:** Reduces overfitting and improves performance on unseen data. | **Reduced Interpretability:** Ensembles can be "black boxes," making it harder to understand *why* a particular prediction is made compared to simpler models (though feature importance techniques can help). |
| **Versatility:** Can be applied to various types of data and machine learning tasks (classification, regression, etc.). | **Potential for Over-Engineering:**  Careful tuning and selection of base learners and combination methods are needed to avoid unnecessary complexity or decreased performance. |
| **Reduced Variance or Bias:**  Bagging-style ensembles mainly reduce variance, while boosting mainly reduces bias. | **Requires More Data (sometimes):**  Some ensemble methods might benefit from larger datasets to train diverse and effective base learners. |

**Table of Ensemble Models:**

| Ensemble Model         | Description                                                                      | Pros                                                                                                                               | Cons                                                                                                                              |
|--------------------------|----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------|
| **Bagging (Bootstrap Aggregating)** | Trains multiple instances of the same base learner on bootstrapped subsets of the training data and averages their predictions. | Simple to implement, reduces variance significantly, improves stability and robustness, parallelizable training.                         | Can slightly increase bias if base learners are already very complex, may not significantly improve performance if base learners are already strong. |
| **Pasting**              | Similar to Bagging, but samples are drawn without replacement.                      | Less randomness than Bagging, can sometimes outperform Bagging, potentially faster training in some cases.                         | May not reduce variance as effectively as Bagging, can still be sensitive to overfitting if base learners are complex.               |
| **Random Forests**        | An extension of Bagging that introduces further randomness by also selecting random subsets of features at each node split in decision trees. | Very high accuracy, robust to outliers and noise, reduces variance and overfitting effectively, provides feature importance estimates. | Less interpretable than single decision trees, can be computationally expensive for very large datasets or many trees, hyperparameter tuning can be important. |
| **Boosting (Adaptive Boosting - AdaBoost)** | Sequentially trains base learners, where each learner attempts to correct the mistakes of its predecessors. Weights are adjusted to focus on misclassified instances. | High accuracy, effective at reducing bias and variance, relatively simple to implement, can work well with weak learners.     | Sensitive to noisy data and outliers, can overfit if base learners are too complex or training data is too small, can be slow for very large datasets. |
| **Gradient Boosting Machines (GBM)** | Similar to AdaBoost but uses gradient descent to minimize a loss function in a stage-wise additive manner. Typically uses decision trees as base learners. | Very high accuracy, flexible loss functions, effective at handling complex relationships, robust to outliers and missing data (to some extent). | Can be prone to overfitting if not tuned properly, computationally intensive to train and tune, hyperparameter tuning is crucial. |
| **XGBoost (Extreme Gradient Boosting)** | An optimized and highly efficient implementation of Gradient Boosting, with regularization and parallel processing capabilities. | State-of-the-art performance, very fast and efficient, regularized to prevent overfitting, handles missing data well, feature importance. | More complex to understand and tune than simpler ensembles, can still overfit if not carefully tuned, requires careful hyperparameter optimization. |
| **LightGBM (Light Gradient Boosting Machine)** | Another highly efficient Gradient Boosting framework that uses tree-based learning algorithms and optimized techniques for speed and memory efficiency. | Very fast and efficient, lower memory consumption, high accuracy, handles large datasets well, supports categorical features directly.   | Can be more sensitive to overfitting with small datasets compared to XGBoost, hyperparameter tuning is still important for optimal performance. |
| **CatBoost (Categorical Boosting)** | A Gradient Boosting algorithm that excels at handling categorical features directly and robustly, without extensive preprocessing. | Excellent at handling categorical features natively, robust and accurate, good out-of-the-box performance, feature importance.     | Can be slower than LightGBM or XGBoost in some cases, may require more computational resources for very large datasets, hyperparameter tuning can still be beneficial. |
| **Stacking (Stacked Generalization)** | Trains a meta-learner to combine the predictions of diverse base learners. The meta-learner learns the optimal way to weight or combine the base learner predictions. | Can achieve very high accuracy by leveraging the strengths of different base learners, flexible in terms of base learners and meta-learner choice. | More complex to implement and tune than simpler ensembles, prone to overfitting if the meta-learner is too complex or training data is insufficient, computationally expensive. |

**Bagging (Bootstrap Sampling):**

Bagging uses bootstrap sampling, which means it draws samples from the original training data with replacement.

With replacement means that when a data point is selected for a subset, it's put back into the original dataset, so it has a chance of being selected again for the same subset or other subsets.
This results in subsets that are the same size as the original training data but contain some repeated data points and potentially leave out some data points in each subset.

**Pasting (Sampling Without Replacement):**

Pasting, on the other hand, draws samples from the original training data without replacement.

Without replacement means that once a data point is selected for a subset, it's not put back into the original dataset. This prevents the same data point from appearing multiple times in the same subset.
Each subset created in pasting contains unique data points from the training set, and generally each subset is smaller than the original training set.

**Bootstrapped Subsets:** In machine learning, bootstrapping is a resampling technique used to create multiple subsets of data from a single original dataset. These subsets are called bootstrapped subsets or bootstrap samples.
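The two sampling schemes can be sketched with plain NumPy index sampling. This is only an illustration: the training-set size and the 50% pasting fraction are arbitrary choices, not values used elsewhere in this module.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n_samples = 10_000               # hypothetical training-set size
indices = np.arange(n_samples)

# Bagging: draw n_samples indices WITH replacement (a bootstrap sample).
bootstrap_idx = rng.choice(indices, size=n_samples, replace=True)

# Pasting: draw a smaller subset WITHOUT replacement (here 50% of the data).
pasting_idx = rng.choice(indices, size=n_samples // 2, replace=False)

# Share of distinct original points that made it into the bootstrap sample.
bootstrap_unique_fraction = np.unique(bootstrap_idx).size / n_samples
# Pasting can never repeat a point, so this should be False.
pasting_has_duplicates = np.unique(pasting_idx).size < pasting_idx.size

print(f"Bootstrap sample size: {bootstrap_idx.size}")
print(f"Fraction of distinct points in bootstrap sample: {bootstrap_unique_fraction:.3f}")
print(f"Pasting sample size: {pasting_idx.size}")
print(f"Pasting sample has duplicates: {pasting_has_duplicates}")
```

For large datasets the fraction of distinct points in a bootstrap sample approaches 1 - 1/e, roughly 63%; the points left out are often called "out-of-bag" samples. In scikit-learn, `BaggingRegressor` and `BaggingClassifier` switch between the two schemes with `bootstrap=True` (bagging, the default) and `bootstrap=False` (pasting), together with `max_samples`.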

This overview provides a comprehensive understanding of ensemble learning, its benefits, drawbacks, and a range of popular ensemble models. Choosing the right ensemble method depends on the specific problem, dataset characteristics, and desired trade-off between accuracy, complexity, and computational cost.

## Decision Trees vs. Random Forests: Key Differences

The fundamental difference between **Random Forests** and **Decision Trees** is that:

*   **Decision Tree:** Employs a **single tree-like model**.
    *   Video: https://youtu.be/_L39rN6gz7Y?si=rFUeyJkguETXThx3
*   **Random Forest:** Utilizes a **"forest" of many Decision Trees** working in concert.
    *   Video: https://youtu.be/J4Wdy0Wc_xQ?si=ZP_Q8TiTSdXbt5Xc

Let's delve into a more detailed comparison:

**1. Number of Models:**

*   **Decision Tree:** Operates with **a solitary decision tree** to generate predictions, deriving a single set of rules from the training data.
*   **Random Forest:** Comprises an **ensemble of multiple decision trees** (typically hundreds to thousands), collectively contributing to the final prediction.

**2. Training Process and Data Utilization:**

*   **Decision Tree:** Trained on the **entire training dataset**, aiming to construct a single, intricate tree that closely fits the training data.
*   **Random Forest:** Each Decision Tree within a Random Forest is trained on a **distinct bootstrap sample** of the training data. **Bootstrap sampling** involves training each tree on a random subset of the original data, sampled *with replacement*. This process may result in some data points being repeated in a tree's training set, while others are excluded.

**3. Feature Selection at Each Split:**

*   **Decision Tree:** Considers **all available features** when determining node splits, seeking the optimal feature and split point to maximize information gain or minimize impurity.
*   **Random Forest:** At each node split within each tree, only a **random subset of features** is considered. The algorithm randomly selects a few features and then identifies the best split among *those* selected features, introducing a key element of randomness beyond data sampling.

**4. Overfitting Tendency:**

*   **Decision Tree:** Is highly **prone to overfitting**, particularly with deep trees. Single decision trees can memorize training data patterns too well, leading to poor generalization on new data.
*   **Random Forest:** Significantly **reduces overfitting**. By averaging predictions from numerous trees trained on diverse data subsets and with feature randomness, Random Forests achieve greater robustness and better generalization.

**5. Variance and Bias:**

*   **Decision Tree:** Characterized by **low bias and high variance**. It can model complex relationships (low bias) but is highly sensitive to training data variations (high variance).
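*   **Random Forest:** Achieves **lower variance** than individual decision trees while keeping bias comparable (in practice often slightly higher, because each tree sees only a bootstrap sample and a random subset of features at each split). Bagging and feature randomness drive the variance reduction at little cost in bias.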

**6. Performance (Accuracy/Generalization):**

*   **Decision Tree:** Can perform adequately but is generally **less accurate and less robust** than Random Forests, especially on complex datasets or when generalization is critical.
*   **Random Forest:** Known for being **more accurate and robust**, typically outperforming single decision trees because it averages over many decorrelated trees.

**7. Interpretability:**

*   **Decision Tree:** **Highly interpretable**. Decision rules are easily visualized and understood by tracing tree paths.
*   **Random Forest:** **Less interpretable** as a whole, acting more as a "black box." However, they offer **feature importance** measures, providing some insight into feature influence.

**8. Computational Cost:**

*   **Decision Tree:** **Faster to train and predict** due to its simplicity as a single tree.
*   **Random Forest:** **More computationally intensive** due to the training and aggregation of many trees. The performance gains, however, often justify the added cost.
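The comparison above can be checked on a small example. The sketch below is not part of the module's main pipeline: it uses the built-in Digits dataset, `demo_`-prefixed variable names chosen here to avoid clashing with later cells, and default-like settings (`n_estimators=100`, `max_features="sqrt"`) that are assumptions rather than tuned values.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in classification dataset, split into training and validation parts.
demo_X, demo_y = load_digits(return_X_y=True)
demo_X_train, demo_X_val, demo_y_train, demo_y_val = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=0
)

# One tree: sees all training rows and considers all features at every split.
demo_tree = DecisionTreeClassifier(random_state=0)
demo_tree.fit(demo_X_train, demo_y_train)

# Forest: many trees, each on a bootstrap sample, with a random feature subset per split.
demo_forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
demo_forest.fit(demo_X_train, demo_y_train)

demo_tree_acc = accuracy_score(demo_y_val, demo_tree.predict(demo_X_val))
demo_forest_acc = accuracy_score(demo_y_val, demo_forest.predict(demo_X_val))

print(f"Single tree validation accuracy:   {demo_tree_acc:.3f}")
print(f"Random forest validation accuracy: {demo_forest_acc:.3f}")
```

On this dataset the forest should score noticeably higher on the validation set than the single tree, at the cost of training 100 trees instead of one.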


**Summary Table:**

| Feature             | Decision Tree                                 | Random Forest                                      |
|----------------------|-----------------------------------------------|----------------------------------------------------|
| Number of Models    | One                                            | Many (Ensemble)                                   |
| Training Data       | Entire training set                             | Bootstrap samples of training set                 |
| Feature Selection   | All features considered at each split         | Random subset of features considered at each split |
| Overfitting         | High tendency to overfit                        | Low tendency to overfit                           |
| Variance            | High                                            | Low                                                |
| Bias                | Low                                             | Low (similar to Decision Tree)                     |
| Accuracy            | Lower                                           | Higher                                             |
| Interpretability    | High                                            | Lower (but provides feature importance)             |
| Computational Cost | Lower                                           | Higher                                             |

**Use Cases:**

*   **Decision Tree:** Ideal for scenarios prioritizing **interpretability** and simplicity, or when computational resources are limited.
*   **Random Forest:** Preferred when **accuracy and robustness** are key, offering high performance and generalization, suitable as a go-to model for many ML problems despite higher computational demands.

Let's clarify how **Decision Trees** and **Random Forests** handle training and validation data, as their approaches and purposes differ significantly due to their fundamental nature:

**Decision Trees:**

*   **Training Data:**
    *   A single Decision Tree is trained using the **entire training dataset**.
    *   The algorithm examines all features in the training data to determine the optimal splits at each node of the tree.
    *   The goal during training is to build a tree structure that effectively learns the patterns and relationships within the training data to predict the target variable.
    *   The training process continues until certain stopping criteria are met (e.g., maximum tree depth, minimum samples per leaf node, or perfect classification/regression on the training subset at a node).

*   **Validation Data:**
    *   Validation data is **not directly used during the training process** of a Decision Tree.
    *   After the Decision Tree is fully trained on the training data, the **validation data is used to evaluate the model's performance on unseen data.**
    *   This evaluation is crucial for:
        *   **Assessing Generalization:** To estimate how well the trained Decision Tree will perform on new data it has not encountered during training.
        *   **Detecting Overfitting:**  If a Decision Tree performs very well on the training data but poorly on the validation data, it's a strong indication of overfitting. The tree has likely memorized the training data rather than learning generalizable patterns.
        *   **Potentially for Pruning (though less common in basic examples):** In some advanced scenarios, validation data might be used to guide tree pruning techniques. Pruning aims to simplify the tree by removing branches that do not improve performance on the validation set, helping to reduce overfitting.

**In essence, for a Decision Tree, the training data is for *building* the model, and the validation data is for *testing* and *evaluating* its generalization ability after it's built.**
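As a rough illustration of using validation data to spot overfitting, the sketch below compares a shallow tree with an unconstrained one on the Digits dataset. The depth value of 3 and the `gap_`-prefixed names are choices made for this example only.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

gap_X, gap_y = load_digits(return_X_y=True)
gap_X_train, gap_X_val, gap_y_train, gap_y_val = train_test_split(
    gap_X, gap_y, test_size=0.2, random_state=0
)

# Maps max_depth -> (training accuracy, validation accuracy).
gap_results = {}
for depth in (3, None):   # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(gap_X_train, gap_y_train)   # training data builds the tree
    train_acc = accuracy_score(gap_y_train, tree.predict(gap_X_train))
    val_acc = accuracy_score(gap_y_val, tree.predict(gap_X_val))   # validation data only evaluates it
    gap_results[depth] = (train_acc, val_acc)
    print(f"max_depth={depth}: train accuracy={train_acc:.3f}, validation accuracy={val_acc:.3f}")
```

A large gap between training and validation accuracy for the unconstrained tree is the overfitting signal described above. The shallow tree has a smaller gap but underfits, which shows up as low accuracy on both sets.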

**Random Forests:**

*   **Training Data:**
    *   A Random Forest, being an ensemble method, trains **multiple Decision Trees**.
    *   Each individual Decision Tree in the Random Forest is trained on a **different subset of the training data**. This subset is typically created using **bootstrap sampling** (sampling with replacement from the training data). Pasting, by contrast, samples without replacement; it is a separate ensemble variant rather than part of a standard Random Forest.
    *   Because each tree is trained on a different subset of the training data, they become diverse and learn slightly different aspects of the data.
    *   The feature randomness (considering only a subset of features at each split) also contributes to the diversity of the individual trees during training.

*   **Validation Data:**
    *   Similar to Decision Trees, validation data is **not directly used during the training of individual trees** within a Random Forest.
    *   After all the trees in the Random Forest are trained, the **validation data is used to evaluate the performance of the *entire ensemble***.
    *   To make a prediction for a validation data point, each tree in the Random Forest independently makes a prediction.
    *   These individual predictions are then **aggregated** to get the final prediction of the Random Forest. For regression, predictions are typically averaged. For classification, they might be voted on (majority vote) or averaged (for predicted probabilities).
    *   The validation data is used to calculate performance metrics (like MAE, accuracy) for these aggregated predictions, thus evaluating the generalization ability of the Random Forest as a whole.
    *   **Hyperparameter Tuning:** Validation data plays a critical role in **hyperparameter tuning for Random Forests**. When you adjust hyperparameters like `n_estimators` or `max_depth`, you train multiple Random Forest models with different hyperparameter settings. You then use the validation data to compare the performance of these different models and choose the hyperparameter settings that yield the best performance on the validation set (indicating better generalization).
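The aggregation step can be verified directly in scikit-learn. The sketch below trains a small `RandomForestRegressor` on synthetic data (generated with `make_regression`, an assumption made purely for this illustration) and checks that averaging the per-tree predictions reproduces the forest's own prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, used only for this illustration.
agg_X, agg_y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
agg_X_train, agg_X_val, agg_y_train, agg_y_val = train_test_split(
    agg_X, agg_y, test_size=0.2, random_state=0
)

agg_forest = RandomForestRegressor(n_estimators=50, random_state=0)
agg_forest.fit(agg_X_train, agg_y_train)

# One row of predictions per tree: shape (n_trees, n_validation_samples).
per_tree_preds = np.array([tree.predict(agg_X_val) for tree in agg_forest.estimators_])

# Aggregate by averaging across trees, then compare with the forest's own output.
manual_average = per_tree_preds.mean(axis=0)
forest_preds = agg_forest.predict(agg_X_val)

print("Per-tree prediction matrix shape:", per_tree_preds.shape)
print("Manual average matches forest.predict:", np.allclose(manual_average, forest_preds))
```

For classification, scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities (soft voting) rather than counting hard votes.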

**Key Differences in Handling Data:**

| Feature           | Decision Tree                                  | Random Forest                                                                    |
|--------------------|------------------------------------------------|---------------------------------------------------------------------------------|
| **Training Data Usage** | Trained on the *entire* training dataset.    | Each tree trained on a *different subset* (bootstrap or pasting) of training data. |
| **Validation Data Usage** | Used *after* training for evaluation and overfitting detection. | Used *after* ensemble training for evaluation, hyperparameter tuning, and generalization assessment. |
| **Data Diversity in Training** | Learns from a single, complete training set. | Learns from multiple, diverse subsets of the training data (for each tree).    |
| **Model Evaluation**| Evaluation of a *single* tree's performance.  | Evaluation of the *aggregated predictions* of the entire forest.                 |

**In summary,** both Decision Trees and Random Forests use training data to learn and validation data to evaluate generalization. However, Random Forests leverage the training data more extensively by creating multiple models on different subsets, and the validation data is used to assess the performance of the *ensemble* prediction, especially during hyperparameter tuning to optimize generalization. The use of validation data remains crucial for both types to ensure models are not just memorizing training data but learning to make accurate predictions on new, unseen data.
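A minimal sketch of validation-based hyperparameter selection follows. It uses synthetic data and a handful of candidate settings picked arbitrarily for illustration; the `tune_`-prefixed names and the candidate values are assumptions, not the settings used later in this module.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, used only for this illustration.
tune_X, tune_y = make_regression(n_samples=400, n_features=10, noise=15.0, random_state=0)
tune_X_train, tune_X_val, tune_y_train, tune_y_val = train_test_split(
    tune_X, tune_y, test_size=0.2, random_state=0
)

# Hypothetical candidate settings to compare.
candidate_settings = [
    {"n_estimators": 10, "max_depth": 3},
    {"n_estimators": 50, "max_depth": 3},
    {"n_estimators": 50, "max_depth": None},
]

# Maps candidate index -> validation MAE.
tune_val_mae = {}
for i, params in enumerate(candidate_settings):
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(tune_X_train, tune_y_train)                  # fit on training data only
    preds = model.predict(tune_X_val)                      # score on validation data
    tune_val_mae[i] = mean_absolute_error(tune_y_val, preds)
    print(f"{params}: validation MAE = {tune_val_mae[i]:.2f}")

# Keep the setting with the lowest validation error.
best_index = min(tune_val_mae, key=tune_val_mae.get)
print("Best setting on the validation set:", candidate_settings[best_index])
```

Because the winning setting is chosen using the validation set, its validation score tends to be slightly optimistic; a separate held-out test set gives a cleaner final estimate.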

In [None]:
!pip install scikit-learn --upgrade

In [None]:
# Code Cell 1: Importing necessary libraries

import kagglehub
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor, RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_digits, load_iris # For classification examples
from sklearn.metrics import ConfusionMatrixDisplay # Import ConfusionMatrixDisplay instead of plot_confusion_matrix


kagglehub.login()
dansbecker_home_data_for_ml_course_path = kagglehub.dataset_download('dansbecker/home-data-for-ml-course')

print('Data source import complete.')

In [None]:
# Code Cell 2: Loading Datasets - Iowa Housing (Regression), Digits (Classification), and Iris (Comparison)

# Load Iowa Housing Dataset for Regression
iowa_file_path = dansbecker_home_data_for_ml_course_path + '/train.csv'
iowa_data = pd.read_csv(iowa_file_path)
print("Iowa dataset (Regression) loaded successfully!")

# Load Digits Dataset for Classification
digits = load_digits()
digits_X, digits_y = digits.data, digits.target
print("\nDigits dataset (Classification) loaded successfully!")

# Load Iris Dataset for Comparison (Classification)
iris = load_iris()
iris_X, iris_y = iris.data, iris.target
print("\nIris dataset (Comparison) loaded successfully!")

### **Explanation of Code Cell 2: Loading Datasets**

In this code cell, we load **all three datasets** that we will use throughout this module at once:

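*   **Iowa Housing Dataset (for Regression):**
    *   `iowa_file_path = dansbecker_home_data_for_ml_course_path + '/train.csv'`: Builds the path to `train.csv` inside the dataset folder downloaded with `kagglehub`.
    *   `iowa_data = pd.read_csv(iowa_file_path)`: Reads the CSV file into a pandas DataFrame named `iowa_data`.
    *   `print("Iowa dataset (Regression) loaded successfully!")`: Confirmation message.

*   **Digits Dataset (for Classification):**
    *   `digits = load_digits()`: Loads the **Digits dataset** using `load_digits()` from `sklearn.datasets`.
    *   `digits_X, digits_y = digits.data, digits.target`: Separates the Digits dataset into features (`digits_X`) and target labels (`digits_y`).
    *   `print("\nDigits dataset (Classification) loaded successfully!")`: Confirmation message.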

*   **Iris Dataset (for Dataset Comparison):**
    *   `iris = load_iris()`: Loads the **Iris dataset** using `load_iris()` from `sklearn.datasets`. We load it now so it's ready for the dataset comparison in Step 10.
    *   `iris_X, iris_y = iris.data, iris.target`: Separates Iris dataset into features (`iris_X`) and target labels (`iris_y`).
    *   `print("\nIris dataset (Comparison) loaded successfully!")`: Confirmation message.

By loading all datasets upfront, we ensure that all necessary data is available throughout the module, and we are ready to proceed with data preparation and model building.
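As a quick sanity check on the loaded Digits data, the sketch below inspects its shapes and turns one flat sample back into an 8x8 image. It reloads the dataset under the name `digits_demo` (a name chosen here so it does not overwrite the variables from Code Cell 2).

```python
import numpy as np
from sklearn.datasets import load_digits

digits_demo = load_digits()

# `data` holds one flat row of 64 pixel values per sample;
# `images` holds the same pixels arranged as 8x8 arrays.
print("Flat feature matrix shape:", digits_demo.data.shape)
print("Image array shape:", digits_demo.images.shape)

# Reshape the first flat sample back into its 8x8 image form.
first_image = digits_demo.data[0].reshape(8, 8)
print("Label of first sample:", digits_demo.target[0])
print(first_image)

# The reshaped row should be identical to the stored 8x8 image.
reshape_matches = np.array_equal(first_image, digits_demo.images[0])
print("Reshaped row equals stored image:", reshape_matches)
```

To view the digit as a picture rather than a grid of numbers, `plt.imshow(first_image, cmap="gray")` with `matplotlib.pyplot` imported as `plt` is one option.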

In [None]:
#------------------------------------
# Step 1: Prepare Data for Regression (Iowa Dataset)
#------------------------------------
print("\n\n----- Step 1: Prepare Data for Regression (Iowa Dataset) -----")

iowa_y = iowa_data['SalePrice']
print("Regression Target variable 'iowa_y' created successfully!")

iowa_numeric_features = iowa_data.select_dtypes(include=np.number)
iowa_X = iowa_numeric_features.drop('SalePrice', axis=1)
print("\nRegression Predictive features 'iowa_X' created successfully!")

# Split data for Regression
iowa_X_train, iowa_X_val, y_train, y_val = train_test_split(iowa_X, iowa_y, test_size=0.2, random_state=0)
print("\nRegression Data split into training and validation sets successfully!")

In [None]:
#------------------------------------
# Step 2: Prepare Data for Classification (Digits Dataset)
#------------------------------------
print("\n\n----- Step 2: Prepare Data for Classification (Digits Dataset) -----")

digits_X = digits_X
digits_y = digits_y
print("Classification Features 'digits_X' and Target 'digits_y' created successfully!")
# The Digits dataset represents 8x8 images: each sample is a flat array of 64 numbers,
# one per pixel. To visualize an image, reshape the array into an 8x8 matrix.
# Each number is a grayscale pixel intensity ranging from 0 (black) to 16 (white).

# Split data for Classification
digits_X_train, digits_X_val, digits_y_train, digits_y_val = train_test_split(digits_X, digits_y, test_size=0.2, random_state=0)
print("\nClassification Data split into training and validation sets successfully!")

# Print shapes of the data
print("Shape of digits_X_train:", digits_X_train.shape)
print("Shape of digits_y_train:", digits_y_train.shape)
print("Shape of digits_X_val:", digits_X_val.shape)
print("Shape of digits_y_val:", digits_y_val.shape)
print("\n")

# Print some examples from the training data
print("\nFirst 5 samples of digits_X_train:")
print(digits_X_train[:5])
print("\nFirst 5 labels of digits_y_train:")
print(digits_y_train[:5])
print("\n")

# Print some examples from the validation data
print("\nFirst 5 samples of digits_X_val:")
print(digits_X_val[:5])
print("\nFirst 5 labels of digits_y_val:")
print(digits_y_val[:5])
print("\n")

# Print unique values of target variable in training and validation sets
print("\nUnique values in digits_y_train:", np.unique(digits_y_train))
print("Unique values in digits_y_val:", np.unique(digits_y_val))
print("\n")

### **Explanation of Code Cell for Step 2: Prepare Data for Classification (Digits Dataset)**

This code cell focuses on preparing the **Digits dataset** specifically for classification tasks. It takes the already loaded Digits dataset (from Code Cell 2) and splits it into training and validation sets, ready for model training and evaluation. Here's a breakdown:

*   `print("\n\n----- Step 2: Prepare Data for Classification (Digits Dataset) -----")`: This line simply prints a heading to clearly indicate the start of Step 2 and that it's dealing with the Digits dataset for classification.

*   `digits_X = digits_X`
    `digits_y = digits_y`:
    These two lines are self-assignments and have no effect: `digits_X` and `digits_y` were already created when the Digits dataset was loaded. They are kept only to make it visible at the top of the cell which feature matrix and target vector the rest of the step uses, and can be removed without changing the results.

*   `print("Classification Features 'digits_X' and Target 'digits_y' created successfully!")`: This line prints a confirmation message to indicate that the feature and target variables for classification have been successfully prepared.

*   `digits_X_train, digits_X_val, digits_y_train, digits_y_val = train_test_split(digits_X, digits_y, test_size=0.2, random_state=0)`:
    This is the core of the data preparation step. It uses the `train_test_split` function from `sklearn.model_selection` to divide the Digits dataset into training and validation sets.
    *   `digits_X` and `digits_y`: These are the feature matrix and target variable for the Digits dataset, which are being split.
    *   `test_size=0.2`: This argument specifies that 20% of the data should be reserved for the validation set, and the remaining 80% will be used for training.
    *   `random_state=0`: This sets a seed for the random number generator used by the `train_test_split` function. Setting a `random_state` ensures that the data split is reproducible. If you run the code multiple times, you will get the same training and validation sets.
    *   The function returns four variables:
        *   `digits_X_train`: Features for the training set.
        *   `digits_X_val`: Features for the validation set.
        *   `digits_y_train`: Target labels for the training set.
        *   `digits_y_val`: Target labels for the validation set.

*   `print("\nClassification Data split into training and validation sets successfully!")`: This line prints a confirmation message indicating that the Digits classification data has been successfully split into training and validation sets.

In summary, this code cell takes the Digits dataset and prepares it for classification modeling by splitting it into distinct training and validation portions. This split is crucial to train models on one part of the data and then evaluate their performance on unseen data (the validation set), providing a realistic estimate of how well the models will generalize to new, real-world examples.
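
As a quick check of the points above, here is a minimal sketch. It assumes the Digits data is loaded directly with `load_digits` (the module loads it in an earlier cell, so the variable names below are illustrative only). It confirms that a fixed `random_state` reproduces the same split and shows how a 64-value sample maps back to an 8x8 image.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Assumption: load the data directly so this sketch runs on its own.
X, y = load_digits(return_X_y=True)

# Two splits with the same seed select exactly the same rows.
X_tr_a, X_val_a, y_tr_a, y_val_a = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr_b, X_val_b, y_tr_b, y_val_b = train_test_split(X, y, test_size=0.2, random_state=0)
same_split = np.array_equal(X_val_a, X_val_b) and np.array_equal(y_val_a, y_val_b)
print("Identical validation sets:", same_split)

# Each sample is a flat vector of 64 intensities; reshape it to view it as an image.
first_image = X_tr_a[0].reshape(8, 8)
print("Image shape:", first_image.shape)
print("Train / validation sizes:", X_tr_a.shape[0], "/", X_val_a.shape[0])
```

Changing `random_state` to another integer would give a different, but equally reproducible, split.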

In [None]:
#------------------------------------
# Step 0: Decision Tree - Baseline Models (Regressor and Classifier)
#------------------------------------
print("\n\n----- Step 0: Decision Tree - Baseline Models (Regressor and Classifier) -----")
# Explanation: Create and evaluate single Decision Tree models as baselines for regression and classification.

# 0.1 Create and Train DecisionTreeRegressor Baseline
baseline_tree_regressor = DecisionTreeRegressor(random_state=0)
baseline_tree_regressor.fit(iowa_X_train, y_train)
print("DecisionTreeRegressor baseline model trained successfully for Regression!")

# 0.2 Make predictions and evaluate DecisionTreeRegressor (Regression)
baseline_regressor_predictions = baseline_tree_regressor.predict(iowa_X_val)
mae_baseline_regressor = mean_absolute_error(y_val, baseline_regressor_predictions)
mse_baseline_regressor = mean_squared_error(y_val, baseline_regressor_predictions)
print(f"MAE of DecisionTreeRegressor baseline (Regression): {mae_baseline_regressor}")
print(f"MSE of DecisionTreeRegressor baseline (Regression): {mse_baseline_regressor}")

# 0.3 Create and Train DecisionTreeClassifier Baseline
baseline_tree_classifier = DecisionTreeClassifier(random_state=0)
baseline_tree_classifier.fit(digits_X_train, digits_y_train)
print("\nDecisionTreeClassifier baseline model trained successfully for Classification!")

# 0.4 Make predictions and evaluate DecisionTreeClassifier (Classification)
baseline_classifier_predictions = baseline_tree_classifier.predict(digits_X_val)
accuracy_baseline_classifier = accuracy_score(digits_y_val, baseline_classifier_predictions)
print(f"Accuracy of DecisionTreeClassifier baseline (Classification): {accuracy_baseline_classifier}")
print("\nClassification Report for DecisionTreeClassifier baseline:")
print(classification_report(digits_y_val, baseline_classifier_predictions))
print("\nConfusion Matrix for DecisionTreeClassifier baseline:")
print(confusion_matrix(digits_y_val, baseline_classifier_predictions))

# Explanation:
# We are creating simple Decision Tree Regressor and Classifier models as baselines
# to compare against ensemble methods later. We train them, make predictions, and evaluate
# their performance using appropriate metrics for regression (MAE, MSE) and
# classification (Accuracy, Classification Report, Confusion Matrix).

# 0.5 Plotting Predictions of Decision Tree Regressor (Regression)
plt.figure(figsize=(6, 6))
plt.scatter(y_val, baseline_regressor_predictions, alpha=0.5)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Decision Tree Regressor: Actual vs Predicted (Baseline Regression Model)")
plt.plot([min(y_val), max(y_val)], [min(y_val), max(y_val)], color='red')
plt.tight_layout()
plt.show()

# 0.6 Plotting Confusion Matrix of Decision Tree Classifier (Classification)
cm = confusion_matrix(digits_y_val, baseline_classifier_predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(digits_y))
disp.plot(cmap=plt.cm.Blues) # Use cmap for color scheme
plt.title('Confusion Matrix - Decision Tree Classifier (Baseline Classification Model - Digits)') # Updated title
plt.xticks(rotation=45) # Rotate x-axis labels for better visibility
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()

### Interpretation of Classification Report for Digit Classification

The classification report printed by the cell above summarizes how well the baseline Decision Tree classifies handwritten digits (0-9) in the validation set. Here's a breakdown:

**Structure:**

1. **Per-Class Metrics:**
   - The first 10 rows (for digits 0 through 9) provide individual metrics for each class:
     - **precision:** Out of all the times the model predicted a specific digit, what proportion was actually correct?
     
      - **Precision = True Positives / (True Positives + False Positives)**

     - **recall:** Out of all the actual instances of a specific digit, what proportion did the model correctly identify?

      -  **Recall = True Positives / (True Positives + False Negatives)**

     - **f1-score:** The harmonic mean of precision and recall, so it is high only when both are high.
     - **support:** The number of actual instances of that digit in the validation set.

2. **Overall Metrics:**
   - **accuracy:** The overall proportion of correctly classified digits. In this case, it's 85%, meaning the model got 85% of the digits right.
   - **macro avg:** Averages the precision, recall, and F1-score across all classes *without* considering class imbalances (equal weight to each class).
   - **weighted avg:** Averages the precision, recall, and F1-score across all classes, taking into account the number of instances in each class (weighted by support).
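
The formulas above can be checked by hand. The sketch below uses a small set of hypothetical labels for a 3-class problem (not the Digits results) and derives per-class precision and recall from the confusion matrix, then compares them with scikit-learn's own `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for a 3-class problem (illustration only).
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class

true_positives = np.diag(cm)
precision_manual = true_positives / cm.sum(axis=0)  # TP / (TP + FP): column totals
recall_manual = true_positives / cm.sum(axis=1)     # TP / (TP + FN): row totals
support = cm.sum(axis=1)                            # actual instances per class

# Macro average weights every class equally; weighted average weights by support.
macro_recall = recall_manual.mean()
weighted_recall = np.average(recall_manual, weights=support)

print("Precision per class:", precision_manual)
print("Recall per class:   ", recall_manual)
print("Macro vs weighted recall:", macro_recall, weighted_recall)
print("Matches scikit-learn:",
      np.allclose(precision_manual, precision_score(y_true, y_pred, average=None)),
      np.allclose(recall_manual, recall_score(y_true, y_pred, average=None)))
```

Note that the support-weighted recall equals the overall accuracy, which is why those two numbers coincide in a classification report.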

**Interpretation:**

- **Accuracy (0.85):** The baseline tree classifies 85% of the validation digits correctly, a reasonable starting point for a single, untuned Decision Tree.
- **Per-Class Performance:**
    - The model performs well on most digits (precision, recall, and F1-score above 0.8 for many).
    - Digit '8' has the weakest performance, especially in recall (0.62): the model misses roughly 38% of the actual '8's and labels them as other digits.
- **Macro vs. Weighted Average:**
    - The macro and weighted averages are very close (both 0.85), which is what we expect when the classes (digits 0-9) are roughly balanced. The `support` column is the direct way to check this.

**Insights and Next Steps:**

- **Focus on '8':** Investigate why the model is struggling with digit '8'. Consider collecting more data for '8's or adjusting model parameters to improve recognition.
- **Overall Improvement:** Experiment with different model settings or ensemble methods (Bagging, Random Forest, etc.) to try and boost the overall accuracy and per-class performance.
- **Data Augmentation:** If you have limited data for specific digits, consider techniques like data augmentation (rotating, shifting existing images) to synthetically increase the training set.
- **Feature Engineering:** Analyze the image data to see if you can extract more informative features that might help the model differentiate between digits more effectively.

### **Explanation of Code Cell for Step 0: Decision Tree - Baseline Models (Regressor and Classifier)**

This code cell establishes **baseline performance** using single **Decision Tree models** for both regression and classification tasks. These baselines will serve as a point of comparison to evaluate the improvements offered by ensemble methods in subsequent steps.

**Regression Baseline (DecisionTreeRegressor):**

*   `print("\n\n----- Step 0: Decision Tree - Baseline Models (Regressor and Classifier) -----")`: Prints a heading for Step 0, indicating the creation of baseline models.

*   `baseline_tree_regressor = DecisionTreeRegressor(random_state=0)`:
    This line creates a `DecisionTreeRegressor` object from `sklearn.tree`.
    *   `DecisionTreeRegressor()`: Initializes a Decision Tree Regressor model.
    *   `random_state=0`: Sets a seed for the random number generator within the Decision Tree algorithm to ensure reproducibility.

*   `baseline_tree_regressor.fit(iowa_X_train, y_train)`:
    This line trains the `baseline_tree_regressor` model using the training data prepared for regression (Iowa dataset).
    *   `.fit(iowa_X_train, y_train)`:  The `.fit()` method is used to train the model. It takes the training features (`iowa_X_train`) and the corresponding training target variable (`y_train`) as input.

*   `print("DecisionTreeRegressor baseline model trained successfully for Regression!")`: Confirmation message after training the regression baseline model.

*   `baseline_regressor_predictions = baseline_tree_regressor.predict(iowa_X_val)`:
    This line uses the trained `baseline_tree_regressor` to make predictions on the validation set for the Iowa housing data.
    *   `.predict(iowa_X_val)`: The `.predict()` method takes the validation features (`iowa_X_val`) as input and generates predictions based on the trained model.

*   `mae_baseline_regressor = mean_absolute_error(y_val, baseline_regressor_predictions)`:
    `mse_baseline_regressor = mean_squared_error(y_val, baseline_regressor_predictions)`:
    These lines calculate the performance metrics for the regression baseline model.
    *   `mean_absolute_error(y_val, baseline_regressor_predictions)`: Calculates the Mean Absolute Error (MAE) between the true validation target values (`y_val`) and the predictions made by the baseline regressor (`baseline_regressor_predictions`). MAE measures the average absolute difference between predictions and actual values.
    *   `mean_squared_error(y_val, baseline_regressor_predictions)`: Calculates the Mean Squared Error (MSE) between the true validation target values (`y_val`) and the predictions. MSE measures the average squared difference between predictions and actual values.

*   `print(f"MAE of DecisionTreeRegressor baseline (Regression): {mae_baseline_regressor}")`
    `print(f"MSE of DecisionTreeRegressor baseline (Regression): {mse_baseline_regressor}")`:
    These lines print the calculated MAE and MSE values for the Decision Tree Regressor baseline.
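
To make the two metrics concrete, here is a small worked sketch on four hypothetical sale prices (made-up numbers, not taken from the Iowa data). It computes MAE and MSE by hand and checks them against `mean_absolute_error` and `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical sale prices, for illustration only.
actual = np.array([200_000, 150_000, 320_000, 275_000])
predicted = np.array([210_000, 140_000, 300_000, 275_000])

errors = actual - predicted
mae_manual = np.mean(np.abs(errors))  # average absolute error, in dollars
mse_manual = np.mean(errors ** 2)     # average squared error, in dollars squared
rmse_manual = np.sqrt(mse_manual)     # square root brings the unit back to dollars

print("MAE :", mae_manual, "| sklearn:", mean_absolute_error(actual, predicted))
print("MSE :", mse_manual, "| sklearn:", mean_squared_error(actual, predicted))
print("RMSE:", rmse_manual)
```

Because MSE squares each error, the single 20,000 miss dominates it, while MAE treats all misses proportionally. This is why the MSE values printed for the housing data are so much larger than the MAE values.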

**Classification Baseline (DecisionTreeClassifier):**

*   `baseline_tree_classifier = DecisionTreeClassifier(random_state=0)`:
    Creates a `DecisionTreeClassifier` object from `sklearn.tree` for classification baseline.  Similar to the regressor, `random_state=0` ensures reproducibility.

*   `baseline_tree_classifier.fit(digits_X_train, digits_y_train)`:
    Trains the `baseline_tree_classifier` using the training data prepared for classification (Digits dataset).

*   `print("\nDecisionTreeClassifier baseline model trained successfully for Classification!")`: Confirmation message after training the classification baseline model.

*   `baseline_classifier_predictions = baseline_tree_classifier.predict(digits_X_val)`:
    Makes predictions using the trained `baseline_tree_classifier` on the validation set of the Digits dataset.

*   `accuracy_baseline_classifier = accuracy_score(digits_y_val, baseline_classifier_predictions)`:
    Calculates the accuracy score for the classification baseline model.
    *   `accuracy_score(digits_y_val, baseline_classifier_predictions)`: Calculates the accuracy, which is the proportion of correctly classified instances (predictions matching the true labels `digits_y_val`).

*   `print(f"Accuracy of DecisionTreeClassifier baseline (Classification): {accuracy_baseline_classifier}")`:
    Prints the accuracy score for the Decision Tree Classifier baseline.

*   `print("\nClassification Report for DecisionTreeClassifier baseline:")`
    `print(classification_report(digits_y_val, baseline_classifier_predictions))`:
    Prints a detailed classification report.
    *   `classification_report(digits_y_val, baseline_classifier_predictions)`: Generates a classification report that includes precision, recall, F1-score, and support for each class in the Digits dataset. This provides a more comprehensive view of the classifier's performance beyond just accuracy, especially in multi-class classification problems like Digits.

*   `print("\nConfusion Matrix for DecisionTreeClassifier baseline:")`
    `print(confusion_matrix(digits_y_val, baseline_classifier_predictions))`:
    Prints the confusion matrix.
    *   `confusion_matrix(digits_y_val, baseline_classifier_predictions)`: Generates a confusion matrix. For this 10-class problem it is a 10x10 table in which row *i*, column *j* counts the validation samples whose true digit is *i* and whose predicted digit is *j*. Correct predictions lie on the diagonal; off-diagonal cells show which digits are being confused with each other.

**Plotting (Regression and Classification):**

*   **Regression Plot (Actual vs Predicted Prices):**
    *   Uses `matplotlib.pyplot` to create a scatter plot of actual vs. predicted prices for the `DecisionTreeRegressor`. This visual helps to assess how well the model's predictions align with the actual values. The red diagonal line represents perfect predictions.

*   **Classification Plot (Confusion Matrix):**
    *   Uses `ConfusionMatrixDisplay` from `sklearn.metrics` to draw the confusion matrix for the `DecisionTreeClassifier` as a heatmap with the `Blues` colormap. The 10 digit classes are passed as `display_labels`, the x-axis tick labels are rotated for readability, and the axes are labeled 'True label' and 'Predicted label'. This plot provides a visual representation of the classifier's performance in classifying each digit and helps identify potential areas of confusion between different digits.

In essence, Step 0 establishes the performance of single Decision Tree models as a starting point. By evaluating both regression and classification Decision Trees and visualizing their performance, we have a clear baseline to compare against the ensemble methods that will be introduced in the following steps. This allows us to quantify the benefits of using ensemble techniques.

In [None]:
#------------------------------------
# Step 1: Bagging - Regression and Classification
#------------------------------------
print("\n\n----- Step 1: Bagging - Regression and Classification -----")
# Explanation: Create and evaluate Bagging ensembles for both regression and classification, comparing to Decision Tree baselines.

# 1.1 Create and Train BaggingRegressor
# Use 'estimator' instead of 'base_estimator'
bagging_regressor = BaggingRegressor(estimator=DecisionTreeRegressor(random_state=0), n_estimators=10, random_state=0)
bagging_regressor.fit(iowa_X_train, y_train)
print("\nBaggingRegressor model trained successfully!")

# 1.2 Make predictions and evaluate BaggingRegressor (Regression)
bagging_regressor_predictions = bagging_regressor.predict(iowa_X_val)
mae_bagging_regressor = mean_absolute_error(y_val, bagging_regressor_predictions)
mse_bagging_regressor = mean_squared_error(y_val, bagging_regressor_predictions)
print(f"MAE of BaggingRegressor (Regression): {mae_bagging_regressor}")
print(f"MSE of BaggingRegressor (Regression): {mse_bagging_regressor}")
print(f"Improvement in MAE over DecisionTreeRegressor: {mae_baseline_regressor - mae_bagging_regressor:.2f}")
print(f"Improvement in MSE over DecisionTreeRegressor: {mse_baseline_regressor - mse_bagging_regressor:.2f}")

# 1.3 Create and Train BaggingClassifier
# Use 'estimator' instead of 'base_estimator'
bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0), n_estimators=10, random_state=0)
bagging_classifier.fit(digits_X_train, digits_y_train)
print("\nBaggingClassifier model trained successfully!")

# 1.4 Make predictions and evaluate BaggingClassifier (Classification)
bagging_classifier_predictions = bagging_classifier.predict(digits_X_val)
accuracy_bagging_classifier = accuracy_score(digits_y_val, bagging_classifier_predictions)
print(f"Accuracy of BaggingClassifier (Classification): {accuracy_bagging_classifier}")
print(f"Improvement in Accuracy over DecisionTreeClassifier: {accuracy_bagging_classifier - accuracy_baseline_classifier:.2f}")
print("\nClassification Report for BaggingClassifier:")
print(classification_report(digits_y_val, bagging_classifier_predictions))
print("\nConfusion Matrix for BaggingClassifier:")
print(confusion_matrix(digits_y_val, bagging_classifier_predictions))

# Explanation:
# We are creating Bagging ensembles for both regression and classification using
# Decision Trees as base estimators. We train them, make predictions, and
# evaluate their performance, comparing against the Decision Tree baselines.

# 1.5 Plotting Predictions of BaggingRegressor (Regression)
plt.figure(figsize=(6, 6))
plt.scatter(y_val, bagging_regressor_predictions, alpha=0.5)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("BaggingRegressor: Actual vs Predicted (Ensemble Regression Model)")
plt.plot([min(y_val), max(y_val)], [min(y_val), max(y_val)], color='red')
plt.tight_layout()
plt.show()

# 1.6 Plotting Confusion Matrix of BaggingClassifier (Classification)
# Instead of using plot_confusion_matrix, use ConfusionMatrixDisplay
cm = confusion_matrix(digits_y_val, bagging_classifier_predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(digits_y))
disp.plot(cmap=plt.cm.Blues) # Use cmap for color scheme
plt.title('Confusion Matrix - BaggingClassifier (Ensemble Classification Model - Digits)')
plt.xticks(rotation=45) # Rotate x-axis labels for better visibility
plt.tight_layout()
plt.show()

### **Explanation of Code Cell for Step 1: Bagging - Regression and Classification**

This code cell implements and evaluates **Bagging ensembles** for both regression and classification tasks. Bagging (Bootstrap Aggregating) is an ensemble technique that involves training multiple instances of the same base learner (in this case, Decision Trees) on different bootstrapped subsets of the training data. The predictions of these base learners are then aggregated (averaged for regression, voted for classification) to make the final prediction.
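
The two ingredients named above, bootstrapped training sets and aggregated predictions, can be written out by hand. The following is a minimal sketch of the idea rather than scikit-learn's internal implementation; it assumes synthetic data from `make_regression` in place of the Iowa features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data stands in for the Iowa features and prices.
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
n_trees = 10
tree_predictions = []
for i in range(n_trees):
    # Bootstrap: draw len(X_train) row indices WITH replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeRegressor(random_state=i).fit(X_train[idx], y_train[idx])
    tree_predictions.append(tree.predict(X_val))

# Aggregate: average the per-tree predictions (regression).
tree_predictions = np.array(tree_predictions)
ensemble_prediction = tree_predictions.mean(axis=0)

individual_maes = [mean_absolute_error(y_val, p) for p in tree_predictions]
ensemble_mae = mean_absolute_error(y_val, ensemble_prediction)
print(f"Mean MAE of the individual trees: {np.mean(individual_maes):.2f}")
print(f"MAE of the averaged ensemble:     {ensemble_mae:.2f}")
```

Averaging can never do worse than the average individual tree on MAE, which is the intuition behind the improvement that `BaggingRegressor` is expected to show over a single tree.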

**Bagging Regressor (BaggingRegressor):**

*   `print("\n\n----- Step 1: Bagging - Regression and Classification -----")`: Prints a heading for Step 1, indicating Bagging models.

*   `bagging_regressor = BaggingRegressor(estimator=DecisionTreeRegressor(random_state=0), n_estimators=10, random_state=0)`:
    This line creates a `BaggingRegressor` object from `sklearn.ensemble`.
    *   `BaggingRegressor(...)`: Initializes a Bagging Regressor ensemble.
    *   `estimator=DecisionTreeRegressor(random_state=0)`: Specifies that the base learner for Bagging is a `DecisionTreeRegressor`. We are using Decision Trees as the building blocks of our Bagging ensemble. `random_state=0` ensures the Decision Tree base estimator is also reproducible. (This parameter was called `base_estimator` in scikit-learn versions before 1.2.)
    *   `n_estimators=10`: Sets the number of base estimators (Decision Trees) in the Bagging ensemble to 10. This means the Bagging ensemble will consist of 10 Decision Trees.
    *   `random_state=0`: Sets a seed for the random number generator used by the Bagging process itself (e.g., for bootstrapping). This ensures the Bagging ensemble creation is reproducible.

*   `bagging_regressor.fit(iowa_X_train, y_train)`:
    Trains the `bagging_regressor` ensemble using the Iowa housing training data.  This step will train 10 Decision Tree regressors, each on a different bootstrapped sample of the training data.

*   `print("\nBaggingRegressor model trained successfully!")`: Confirmation message after training the Bagging Regressor model.

*   `bagging_regressor_predictions = bagging_regressor.predict(iowa_X_val)`:
    Uses the trained `bagging_regressor` ensemble to make predictions on the Iowa housing validation set. The predictions from all 10 Decision Trees in the ensemble are averaged to get the final prediction for each instance.

*   `mae_bagging_regressor = mean_absolute_error(y_val, bagging_regressor_predictions)`:
    `mse_bagging_regressor = mean_squared_error(y_val, bagging_regressor_predictions)`:
    Calculates the MAE and MSE for the Bagging Regressor, similar to the baseline evaluation in Step 0.

*   `print(f"MAE of BaggingRegressor (Regression): {mae_bagging_regressor}")`
    `print(f"MSE of BaggingRegressor (Regression): {mse_bagging_regressor}")`:
    Prints the MAE and MSE of the Bagging Regressor.

*   `print(f"Improvement in MAE over DecisionTreeRegressor: {mae_baseline_regressor - mae_bagging_regressor:.2f}")`
    `print(f"Improvement in MSE over DecisionTreeRegressor: {mse_baseline_regressor - mse_bagging_regressor:.2f}")`:
    Calculates and prints the improvement in MAE and MSE achieved by Bagging Regressor compared to the Decision Tree Regressor baseline from Step 0. This helps to quantify the benefit of using Bagging.

**Bagging Classifier (BaggingClassifier):**

*   `bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0), n_estimators=10, random_state=0)`:
    Creates a `BaggingClassifier` object from `sklearn.ensemble`. It is configured similarly to `BaggingRegressor`, but for classification. It uses `DecisionTreeClassifier` as the base estimator and also uses 10 estimators.

*   `bagging_classifier.fit(digits_X_train, digits_y_train)`:
    Trains the `bagging_classifier` ensemble using the Digits training data. This will train 10 Decision Tree classifiers, each on a bootstrapped sample.

*   `print("\nBaggingClassifier model trained successfully!")`: Confirmation message.

*   `bagging_classifier_predictions = bagging_classifier.predict(digits_X_val)`:
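    Makes predictions using the trained `bagging_classifier` ensemble on the Digits validation set. For classification, scikit-learn's `BaggingClassifier` averages the class probabilities predicted by the 10 trees and picks the most probable class; it falls back to majority voting only when the base estimator has no `predict_proba` method. With fully grown trees, whose leaves are usually pure, this behaves much like majority voting.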

*   `accuracy_bagging_classifier = accuracy_score(digits_y_val, bagging_classifier_predictions)`:
    Calculates the accuracy of the Bagging Classifier.

*   `print(f"Accuracy of BaggingClassifier (Classification): {accuracy_bagging_classifier}")`
    `print(f"Improvement in Accuracy over DecisionTreeClassifier: {accuracy_bagging_classifier - accuracy_baseline_classifier:.2f}")`:
    Prints the accuracy and the improvement in accuracy over the Decision Tree Classifier baseline.

*   `print("\nClassification Report for BaggingClassifier:")`
    `print(classification_report(digits_y_val, bagging_classifier_predictions))`:
    `print("\nConfusion Matrix for BaggingClassifier:")`
    `print(confusion_matrix(digits_y_val, bagging_classifier_predictions))`:
    Prints the classification report and confusion matrix for the Bagging Classifier, providing detailed performance evaluation.

**Plotting (Regression and Classification):**

*   **Regression Plot (Actual vs Predicted Prices for BaggingRegressor):**
    *   Creates a scatter plot of actual vs. predicted prices for the `BaggingRegressor`, similar to the baseline plot in Step 0.

*   **Classification Plot (Confusion Matrix for BaggingClassifier):**
    *   Displays the confusion matrix for the `BaggingClassifier` as a heatmap, again similar to the baseline plot but now for the Bagging model. The plot is adapted for the 10 classes of the Digits dataset.

In summary, Step 1 implements Bagging for both regression and classification using Decision Trees as base learners. It trains and evaluates these Bagging ensembles, calculates performance metrics, and visualizes the results. Crucially, it compares the performance of Bagging to the Decision Tree baselines from Step 0, allowing students to observe the potential benefits of ensemble methods in improving model accuracy and robustness.

In [None]:
#------------------------------------
# Step 2: Pasting - Regression and Classification
#------------------------------------
print("\n\n----- Step 2: Pasting - Regression and Classification -----")
# Explanation: Create and evaluate Pasting ensembles for both regression and classification, comparing to Decision Tree and Bagging models.

# 2.1 Create and Train PastingRegressor
pasting_regressor = BaggingRegressor(estimator=DecisionTreeRegressor(random_state=0), # Changed 'base_estimator' to 'estimator'
                                     n_estimators=10,
                                     random_state=0,
                                     bootstrap=False)

pasting_regressor.fit(iowa_X_train, y_train)
print("\nPastingRegressor model trained successfully!")

# 2.2 Make predictions and evaluate PastingRegressor (Regression)
pasting_regressor_predictions = pasting_regressor.predict(iowa_X_val)
mae_pasting_regressor = mean_absolute_error(y_val, pasting_regressor_predictions)
mse_pasting_regressor = mean_squared_error(y_val, pasting_regressor_predictions)
print(f"MAE of PastingRegressor (Regression): {mae_pasting_regressor}")
print(f"MSE of PastingRegressor (Regression): {mse_pasting_regressor}")
print(f"Improvement in MAE over DecisionTreeRegressor: {mae_baseline_regressor - mae_pasting_regressor:.2f}")
print(f"Improvement in MSE over DecisionTreeRegressor: {mse_baseline_regressor - mse_pasting_regressor:.2f}")
print(f"Comparison of MAE with BaggingRegressor: {mae_bagging_regressor - mae_pasting_regressor:.2f}")
print(f"Comparison of MSE with BaggingRegressor: {mse_bagging_regressor - mse_pasting_regressor:.2f}")


# 2.3 Create and Train PastingClassifier
pasting_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0),  # Changed 'base_estimator' to 'estimator'
                                       n_estimators=10,
                                       random_state=0,
                                       bootstrap=False)

pasting_classifier.fit(digits_X_train, digits_y_train)
print("\nPastingClassifier model trained successfully!")

# 2.4 Make predictions and evaluate PastingClassifier (Classification)
pasting_classifier_predictions = pasting_classifier.predict(digits_X_val)
accuracy_pasting_classifier = accuracy_score(digits_y_val, pasting_classifier_predictions)
print(f"Accuracy of PastingClassifier (Classification): {accuracy_pasting_classifier}")
print(f"Improvement in Accuracy over DecisionTreeClassifier: {accuracy_pasting_classifier - accuracy_baseline_classifier:.2f}")
print(f"Comparison of Accuracy with BaggingClassifier: {accuracy_bagging_classifier - accuracy_pasting_classifier:.2f}")
print("\nClassification Report for PastingClassifier:")
print(classification_report(digits_y_val, pasting_classifier_predictions))
print("\nConfusion Matrix for PastingClassifier:")
print(confusion_matrix(digits_y_val, pasting_classifier_predictions))

# Explanation:
# We are creating Pasting ensembles for both regression and classification using
# Decision Trees as base estimators. We train them, make predictions, and
# evaluate their performance, comparing against Decision Tree and Bagging models.

# 2.5 Plotting Predictions of PastingRegressor (Regression)
plt.figure(figsize=(6, 6))
plt.scatter(y_val, pasting_regressor_predictions, alpha=0.5)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("PastingRegressor: Actual vs Predicted (Ensemble Regression Model)")
plt.plot([min(y_val), max(y_val)], [min(y_val), max(y_val)], color='red')
plt.tight_layout()
plt.show()

# 2.6 Plotting Confusion Matrix of PastingClassifier (Classification)
cm = confusion_matrix(digits_y_val, pasting_classifier_predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(digits_y))
disp.plot(cmap=plt.cm.Blues) # Use cmap for color scheme
plt.title('Confusion Matrix - PastingClassifier (Ensemble Classification Model - Digits)')
plt.xticks(rotation=45) # Rotate x-axis labels for better visibility
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()

### **Explanation of Code Cell for Step 2: Pasting - Regression and Classification**

This code cell implements and evaluates **Pasting ensembles** for both regression and classification tasks. Pasting is very similar to Bagging, but the key difference is in the data sampling method. While Bagging uses **bootstrapping** (sampling with replacement), Pasting uses **sampling without replacement**. This means no data point is repeated within the subset given to a single base learner, although the subsets drawn for different learners can still overlap with one another. In scikit-learn, Pasting is obtained by passing `bootstrap=False` to `BaggingRegressor` or `BaggingClassifier`; the size of each subset is controlled by `max_samples`.
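
The difference between the two sampling schemes is easy to see on plain row indices. The sketch below uses only NumPy; the 80% subset size is an assumed, illustrative value and not a setting taken from this module.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
indices = np.arange(n_samples)

# Bagging: draw n_samples indices WITH replacement, so duplicates occur.
bootstrap_sample = rng.choice(indices, size=n_samples, replace=True)

# Pasting: draw a subset WITHOUT replacement, so every index is unique.
# Assumption: an 80% subset is used purely for illustration.
pasting_sample_a = rng.choice(indices, size=800, replace=False)
pasting_sample_b = rng.choice(indices, size=800, replace=False)
shared = np.intersect1d(pasting_sample_a, pasting_sample_b).size

print("Unique rows in the bootstrap sample:", np.unique(bootstrap_sample).size)
print("Unique rows in a pasting sample:    ", np.unique(pasting_sample_a).size)
print("Rows shared by two pasting samples: ", shared)

# Drawing ALL rows without replacement merely reorders the full training set.
full_draw = rng.choice(indices, size=n_samples, replace=False)
print("Full draw covers every row:", np.array_equal(np.sort(full_draw), indices))
```

A bootstrap sample of size n typically contains only about 63% of the distinct rows. The last check shows why subset size matters for pasting: drawing every row without replacement leaves nothing to vary between learners.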

**Pasting Regressor (PastingRegressor):**

*   `print("\n\n----- Step 2: Pasting - Regression and Classification -----")`: Prints a heading for Step 2, indicating Pasting models.

*   `pasting_regressor = BaggingRegressor(estimator=DecisionTreeRegressor(random_state=0), n_estimators=10, random_state=0, bootstrap=False)`:
    Creates a `BaggingRegressor` object from `sklearn.ensemble` configured for pasting. scikit-learn has no separate `PastingRegressor` class; pasting is selected with `bootstrap=False`.
    *   `estimator=DecisionTreeRegressor(random_state=0)`:  Similar to Bagging, it uses `DecisionTreeRegressor` as the base learner.
    *   `n_estimators=10`: Sets the number of base estimators to 10.
    *   `random_state=0`: Sets a seed for reproducibility.
    *   `bootstrap=False`: Draws each tree's training rows without replacement, which is what distinguishes pasting from bagging.

*   `pasting_regressor.fit(iowa_X_train, y_train)`:
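    Trains the `pasting_regressor` ensemble using the Iowa housing training data.  This trains 10 Decision Tree regressors on rows sampled *without* replacement. Because `max_samples` is left at its default of 1.0, each tree actually receives the full training set, so the trees differ only through their random seeds; setting `max_samples` below 1.0 gives each tree a genuinely different subset.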

*   `print("\nPastingRegressor model trained successfully!")`: Confirmation message.

*   `pasting_regressor_predictions = pasting_regressor.predict(iowa_X_val)`:
    Makes predictions using the trained `pasting_regressor` on the Iowa housing validation set. Predictions are aggregated (averaged) from the 10 base regressors.

*   `mae_pasting_regressor = mean_absolute_error(y_val, pasting_regressor_predictions)`:
    `mse_pasting_regressor = mean_squared_error(y_val, pasting_regressor_predictions)`:
    Calculates MAE and MSE for the Pasting Regressor.

*   `print(f"MAE of PastingRegressor (Regression): {mae_pasting_regressor}")`
    `print(f"MSE of PastingRegressor (Regression): {mse_pasting_regressor}")`:
    Prints the MAE and MSE.

*   `print(f"Improvement in MAE over DecisionTreeRegressor: {mae_baseline_regressor - mae_pasting_regressor:.2f}")`
    `print(f"Improvement in MSE over DecisionTreeRegressor: {mse_baseline_regressor - mse_pasting_regressor:.2f}")`:
    Calculates and prints the improvement over the Decision Tree Regressor baseline.

*   `print(f"Comparison of MAE with BaggingRegressor: {mae_bagging_regressor - mae_pasting_regressor:.2f}")`
    `print(f"Comparison of MSE with BaggingRegressor: {mse_bagging_regressor - mse_pasting_regressor:.2f}")`:
    Calculates and prints a comparison of MAE and MSE between Pasting and Bagging Regressors, allowing for a direct performance comparison between these two ensemble techniques.
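
For reference, scikit-learn exposes pasting through `BaggingRegressor` with `bootstrap=False`. The sketch below is an assumption about how such a model could be configured, not the notebook's own cell: it uses synthetic data from `make_regression` instead of the Iowa data, and `max_samples=0.8` is an illustrative choice. With the default `max_samples=1.0` and `bootstrap=False`, every tree would receive the full training set, so a fraction below 1.0 is needed for the subsets to differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: not the Iowa housing data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Pasting = bagging machinery with bootstrap=False.
# The base estimator is passed positionally because its keyword name changed
# from `base_estimator` to `estimator` in scikit-learn 1.2.
pasting_sketch = BaggingRegressor(
    DecisionTreeRegressor(random_state=0),
    n_estimators=10,
    max_samples=0.8,    # each tree sees 80% of the rows (illustrative value)
    bootstrap=False,    # sample without replacement
    random_state=0,
)
pasting_sketch.fit(X, y)

# estimators_samples_ holds the row indices drawn for each tree
first_tree_rows = pasting_sketch.estimators_samples_[0]
print("Rows drawn for the first tree:", len(first_tree_rows))
print("All rows unique:", len(set(first_tree_rows)) == len(first_tree_rows))
```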

**Pasting Classifier (PastingClassifier):**

*   `pasting_classifier = PastingClassifier(base_estimator=DecisionTreeClassifier(random_state=0), n_estimators=10, random_state=0)`:
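    Creates a `PastingClassifier` object, configured similarly to `PastingRegressor` but for classification using `DecisionTreeClassifier` as the base estimator. As with the regressor, `sklearn.ensemble` has no class of this name; the scikit-learn equivalent is `BaggingClassifier(..., bootstrap=False)`.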

*   `pasting_classifier.fit(digits_X_train, digits_y_train)`:
    Trains the `pasting_classifier` ensemble on the Digits training data.

*   `print("\nPastingClassifier model trained successfully!")`: Confirmation message.

*   `pasting_classifier_predictions = pasting_classifier.predict(digits_X_val)`:
    Makes predictions on the Digits validation set using the Pasting Classifier, with predictions aggregated via voting.

*   `accuracy_pasting_classifier = accuracy_score(digits_y_val, pasting_classifier_predictions)`:
    Calculates the accuracy of the Pasting Classifier.

*   `print(f"Accuracy of PastingClassifier (Classification): {accuracy_pasting_classifier}")`
    `print(f"Improvement in Accuracy over DecisionTreeClassifier: {accuracy_pasting_classifier - accuracy_baseline_classifier:.2f}")`:
    Prints the accuracy and improvement over the Decision Tree Classifier baseline.

*   `print(f"Comparison of Accuracy with BaggingClassifier: {accuracy_bagging_classifier - accuracy_pasting_classifier:.2f}")`:
    Prints a comparison of accuracy between Pasting and Bagging Classifiers.

*   `print("\nClassification Report for PastingClassifier:")`
    `print(classification_report(digits_y_val, pasting_classifier_predictions))`:
    `print("\nConfusion Matrix for PastingClassifier:")`
    `print(confusion_matrix(digits_y_val, pasting_classifier_predictions))`:
    Prints the classification report and confusion matrix for the Pasting Classifier.
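
The voting step mentioned above can be sketched in plain Python. The per-tree predictions below are made-up values for a single validation image, and `majority_vote` is a hypothetical helper rather than a scikit-learn function. (scikit-learn's ensembles may instead average predicted class probabilities when the base estimator provides them, which often coincides with a hard vote when the trees are fully grown.)

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical predictions from 10 trees for one digit image
tree_votes = [3, 3, 8, 3, 5, 3, 8, 3, 3, 9]

ensemble_prediction = majority_vote(tree_votes)
print("Ensemble prediction:", ensemble_prediction)
```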

**Plotting (Regression and Classification):**

*   **Regression Plot (Actual vs Predicted Prices for PastingRegressor):**
    *   Creates a scatter plot for `PastingRegressor` predictions, similar to previous regression plots.

*   **Classification Plot (Confusion Matrix for PastingClassifier):**
    *   Displays the confusion matrix for `PastingClassifier` as a heatmap, adapted for the Digits dataset.

Step 2 implements Pasting for both regression and classification, allowing students to explore the effect of sampling without replacement in ensemble methods. By comparing Pasting to both Decision Tree baselines and Bagging ensembles, students can start to understand the nuances of different ensemble techniques and their relative performance. The focus is on the difference in sampling strategy (without replacement in Pasting vs. with replacement in Bagging) and its impact on model performance.

In [None]:
#------------------------------------
# Step 3: Random Forests - Regression and Classification
#------------------------------------
print("\n\n----- Step 3: Random Forests - Regression and Classification -----")
# Explanation: Create and evaluate Random Forest ensembles for both regression and classification, comparing to Decision Tree, Bagging, and Pasting models.

# 3.1 Create and Train RandomForestRegressor
forest_regressor = RandomForestRegressor(n_estimators=10, random_state=0)
forest_regressor.fit(iowa_X_train, y_train)
print("\nRandomForestRegressor model trained successfully!")

# 3.2 Make predictions and evaluate RandomForestRegressor (Regression)
forest_regressor_predictions = forest_regressor.predict(iowa_X_val)
mae_forest_regressor = mean_absolute_error(y_val, forest_regressor_predictions)
mse_forest_regressor = mean_squared_error(y_val, forest_regressor_predictions)
print(f"MAE of RandomForestRegressor (Regression): {mae_forest_regressor}")
print(f"MSE of RandomForestRegressor (Regression): {mse_forest_regressor}")
print(f"Improvement in MAE over DecisionTreeRegressor: {mae_baseline_regressor - mae_forest_regressor:.2f}")
print(f"Improvement in MSE over DecisionTreeRegressor: {mse_baseline_regressor - mse_forest_regressor:.2f}")
print(f"Comparison of MAE with BaggingRegressor: {mae_bagging_regressor - mae_forest_regressor:.2f}")
print(f"Comparison of MSE with BaggingRegressor: {mse_bagging_regressor - mse_forest_regressor:.2f}")
print(f"Comparison of MAE with PastingRegressor: {mae_pasting_regressor - mae_forest_regressor:.2f}")
print(f"Comparison of MSE with PastingRegressor: {mse_pasting_regressor - mse_forest_regressor:.2f}")


# 3.3 Create and Train RandomForestClassifier
forest_classifier = RandomForestClassifier(n_estimators=10, random_state=0)
forest_classifier.fit(digits_X_train, digits_y_train)
print("\nRandomForestClassifier model trained successfully!")
print("\n")

# 3.4 Make predictions and evaluate RandomForestClassifier (Classification)
forest_classifier_predictions = forest_classifier.predict(digits_X_val)
accuracy_forest_classifier = accuracy_score(digits_y_val, forest_classifier_predictions)
print(f"Accuracy of RandomForestClassifier (Classification): {accuracy_forest_classifier}")
print(f"Improvement in Accuracy over DecisionTreeClassifier: {accuracy_forest_classifier - accuracy_baseline_classifier:.2f}")
print(f"Comparison of Accuracy with BaggingClassifier: {accuracy_bagging_classifier - accuracy_forest_classifier:.2f}")
print(f"Comparison of Accuracy with PastingClassifier: {accuracy_pasting_classifier - accuracy_forest_classifier:.2f}")

print("\nClassification Report for RandomForestClassifier:")
print(classification_report(digits_y_val, forest_classifier_predictions))
print("\nConfusion Matrix for RandomForestClassifier:")
print(confusion_matrix(digits_y_val, forest_classifier_predictions))

# Explanation:
# We are creating Random Forest ensembles for both regression and classification.
# Random Forests are an extension of Bagging that introduces additional
# randomness by also selecting random subsets of features when splitting nodes
# in the Decision Trees. We train them, make predictions, and evaluate their
# performance, comparing against Decision Tree, Bagging, and Pasting models.

# 3.5 Plotting Predictions of RandomForestRegressor (Regression)
plt.figure(figsize=(6, 6))
plt.scatter(y_val, forest_regressor_predictions, alpha=0.5)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("RandomForestRegressor: Actual vs Predicted (Ensemble Regression Model)")
plt.plot([min(y_val), max(y_val)], [min(y_val), max(y_val)], color='red')
plt.tight_layout()
plt.show()

# 3.6 Plotting Confusion Matrix of RandomForestClassifier (Classification)
cm = confusion_matrix(digits_y_val, forest_classifier_predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(digits_y))
disp.plot(cmap=plt.cm.Blues) # Use cmap for color scheme
plt.title('Confusion Matrix - RandomForestClassifier (Ensemble Classification Model - Digits)') # Title names the model being plotted
plt.xticks(rotation=45) # Rotate x-axis labels for better visibility
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()

### **Explanation of Code Cell for Step 3: Random Forests - Regression and Classification**

This code cell implements and evaluates **Random Forest ensembles** for both regression and classification tasks. Random Forests are a highly popular and effective ensemble method that builds upon the concept of Bagging.  They introduce an additional layer of randomness by not only bootstrapping the data but also by randomly selecting a subset of features at each node split in the Decision Trees. This feature randomness further decorrelates the trees in the forest, often leading to improved performance and generalization.
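
To make the feature-randomness idea concrete, the sketch below draws the candidate features for one node split. It is an illustration only: real implementations do this inside the tree-building code, and the square-root rule used here is a common convention for classification that is assumed rather than taken from the notebook. The value 64 matches the 8x8 pixel features of the Digits dataset.

```python
import math
import random

random.seed(0)

n_features = 64                                # Digits images are 8x8 pixels
n_candidates = int(math.sqrt(n_features))      # square-root rule (assumed)

def candidate_features_for_split():
    """Pick the random feature subset examined at a single node split."""
    return random.sample(range(n_features), k=n_candidates)

# Each split draws a fresh subset, which is what decorrelates the trees
print("Split 1 candidates:", sorted(candidate_features_for_split()))
print("Split 2 candidates:", sorted(candidate_features_for_split()))
```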

**Random Forest Regressor (RandomForestRegressor):**

*   `print("\n\n----- Step 3: Random Forests - Regression and Classification -----")`: Prints a heading for Step 3, indicating Random Forest models.

*   `forest_regressor = RandomForestRegressor(n_estimators=10, random_state=0)`:
    Creates a `RandomForestRegressor` object from `sklearn.ensemble`.
    *   `RandomForestRegressor(...)`: Initializes a Random Forest Regressor ensemble.
    *   `n_estimators=10`: Sets the number of trees in the Random Forest to 10.
    *   `random_state=0`: Sets a seed for reproducibility.

*   `forest_regressor.fit(iowa_X_train, y_train)`:
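    Trains the `forest_regressor` ensemble using the Iowa housing training data. This step trains 10 Decision Tree regressors, each on a bootstrapped sample of the data. Note that scikit-learn's `RandomForestRegressor` considers all features at every split by default (in recent versions, `max_features=1.0`), so feature randomness only takes effect if `max_features` is set to a smaller value. This is likely one reason the Random Forest and Bagging regression results are so close.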

*   `print("\nRandomForestRegressor model trained successfully!")`: Confirmation message.

*   `forest_regressor_predictions = forest_regressor.predict(iowa_X_val)`:
    Makes predictions using the trained `forest_regressor` on the Iowa housing validation set. Predictions are averaged across the 10 trees.

*   `mae_forest_regressor = mean_absolute_error(y_val, forest_regressor_predictions)`:
    `mse_forest_regressor = mean_squared_error(y_val, forest_regressor_predictions)`:
    Calculates MAE and MSE for the Random Forest Regressor.

*   `print(f"MAE of RandomForestRegressor (Regression): {mae_forest_regressor}")`
    `print(f"MSE of RandomForestRegressor (Regression): {mse_forest_regressor}")`:
    Prints the MAE and MSE.

*   `print(f"Improvement in MAE over DecisionTreeRegressor: {mae_baseline_regressor - mae_forest_regressor:.2f}")`
    `print(f"Improvement in MSE over DecisionTreeRegressor: {mse_baseline_regressor - mse_forest_regressor:.2f}")`:
    Prints the improvement over the Decision Tree Regressor baseline.

*   `print(f"Comparison of MAE with BaggingRegressor: {mae_bagging_regressor - mae_forest_regressor:.2f}")`
    `print(f"Comparison of MSE with BaggingRegressor: {mse_bagging_regressor - mse_forest_regressor:.2f}")`:
    `print(f"Comparison of MAE with PastingRegressor: {mae_pasting_regressor - mae_forest_regressor:.2f}")`
    `print(f"Comparison of MSE with PastingRegressor: {mse_pasting_regressor - mse_forest_regressor:.2f}")`:
    Prints comparisons of MAE and MSE between Random Forest Regressor and both Bagging and Pasting Regressors, allowing for performance comparisons among all three ensemble methods and the Decision Tree baseline.
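
The averaging step used for the regression prediction can be sketched as follows. The per-tree prices are invented numbers for a single house, chosen only to show the arithmetic.

```python
from statistics import mean

# Hypothetical predictions (in dollars) from 10 trees for one house
tree_predictions = [
    182000, 175500, 190000, 178250, 185000,
    181000, 176750, 188500, 179000, 184000,
]

# The ensemble prediction is the plain average of the individual trees
ensemble_prediction = mean(tree_predictions)
print(f"Ensemble prediction: {ensemble_prediction:.2f}")
```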

**Random Forest Classifier (RandomForestClassifier):**

*   `forest_classifier = RandomForestClassifier(n_estimators=10, random_state=0)`:
    Creates a `RandomForestClassifier` object, configured similarly to `RandomForestRegressor` but for classification.

*   `forest_classifier.fit(digits_X_train, digits_y_train)`:
    Trains the `forest_classifier` ensemble on the Digits training data.

*   `print("\nRandomForestClassifier model trained successfully!")`: Confirmation message.

*   `forest_classifier_predictions = forest_classifier.predict(digits_X_val)`:
    Makes predictions on the Digits validation set using the Random Forest Classifier, with predictions aggregated via voting.

*   `accuracy_forest_classifier = accuracy_score(digits_y_val, forest_classifier_predictions)`:
    Calculates the accuracy of the Random Forest Classifier.

*   `print(f"Accuracy of RandomForestClassifier (Classification): {accuracy_forest_classifier}")`
    `print(f"Improvement in Accuracy over DecisionTreeClassifier: {accuracy_forest_classifier - accuracy_baseline_classifier:.2f}")`:
    Prints the accuracy and improvement over the Decision Tree Classifier baseline.

*   `print(f"Comparison of Accuracy with BaggingClassifier: {accuracy_bagging_classifier - accuracy_forest_classifier:.2f}")`
    `print(f"Comparison of Accuracy with PastingClassifier: {accuracy_pasting_classifier - accuracy_forest_classifier:.2f}")`:
    Prints comparisons of accuracy between Random Forest Classifier and both Bagging and Pasting Classifiers.

*   `print("\nClassification Report for RandomForestClassifier:")`
    `print(classification_report(digits_y_val, forest_classifier_predictions))`:
    `print("\nConfusion Matrix for RandomForestClassifier:")`
    `print(confusion_matrix(digits_y_val, forest_classifier_predictions))`:
    Prints the classification report and confusion matrix for the Random Forest Classifier.

**Plotting (Regression and Classification):**

*   **Regression Plot (Actual vs Predicted Prices for RandomForestRegressor):**
    *   Creates a scatter plot for `RandomForestRegressor` predictions.

*   **Classification Plot (Confusion Matrix for RandomForestClassifier):**
    *   Displays the confusion matrix for `RandomForestClassifier` as a heatmap, adapted for the Digits dataset.

Step 3 introduces Random Forests, a powerful ensemble technique that combines Bagging with feature randomness. By implementing and evaluating Random Forests for both regression and classification, and comparing their performance to Decision Trees, Bagging, and Pasting, students can appreciate the effectiveness of Random Forests and understand the impact of feature randomness in further improving ensemble performance. This step highlights Random Forests as a widely used, strong general-purpose ensemble method and allows for a comprehensive comparison of all ensemble techniques covered so far.

In [None]:
#------------------------------------
# Step 4: Hyperparameter Tuning for Random Forests
#------------------------------------
print("\n\n----- Step 4: Hyperparameter Tuning for Random Forests -----")
# Explanation: Explore the impact of key hyperparameters in RandomForestRegressor and RandomForestClassifier.

# 4.1 Tuning n_estimators for RandomForestRegressor
n_estimators_vals = [10, 50, 100, 200, 500]
mae_scores_n_estimators_reg = []
for n_est in n_estimators_vals:
    rf_reg = RandomForestRegressor(n_estimators=n_est, random_state=0)
    rf_reg.fit(iowa_X_train, y_train)
    predictions = rf_reg.predict(iowa_X_val)
    mae = mean_absolute_error(y_val, predictions)
    mae_scores_n_estimators_reg.append(mae)
print("\nMAE scores for RandomForestRegressor with different n_estimators:", mae_scores_n_estimators_reg)

# 4.2 Tuning max_depth for RandomForestRegressor
max_depth_vals = [None, 5, 10, 15, 20]
mae_scores_max_depth_reg = []
for max_d in max_depth_vals:
    rf_reg = RandomForestRegressor(max_depth=max_d, n_estimators=100, random_state=0) # Fixed n_estimators for this tuning
    rf_reg.fit(iowa_X_train, y_train)
    predictions = rf_reg.predict(iowa_X_val)
    mae = mean_absolute_error(y_val, predictions)
    mae_scores_max_depth_reg.append(mae)
print("\nMAE scores for RandomForestRegressor with different max_depth:", mae_scores_max_depth_reg)


# 4.3 Tuning n_estimators for RandomForestClassifier
n_estimators_vals_clf = [10, 20, 30, 40, 50, 75, 100, 150, 200]
accuracy_scores_n_estimators_clf = []
for n_est in n_estimators_vals_clf:
    rf_clf = RandomForestClassifier(n_estimators=n_est, random_state=0)
    rf_clf.fit(digits_X_train, digits_y_train)
    predictions = rf_clf.predict(digits_X_val)
    accuracy = accuracy_score(digits_y_val, predictions)
    accuracy_scores_n_estimators_clf.append(accuracy)
print("\nAccuracy scores for RandomForestClassifier with different n_estimators:", accuracy_scores_n_estimators_clf)

# 4.4 Tuning max_depth for RandomForestClassifier
max_depth_vals_clf = [None, 5, 10, 15, 20]
accuracy_scores_max_depth_clf = []
for max_d in max_depth_vals_clf:
    rf_clf = RandomForestClassifier(max_depth=max_d, n_estimators=100, random_state=0) # Fixed n_estimators for this tuning
    rf_clf.fit(digits_X_train, digits_y_train)
    predictions = rf_clf.predict(digits_X_val)
    accuracy = accuracy_score(digits_y_val, predictions)
    accuracy_scores_max_depth_clf.append(accuracy)
print("\nAccuracy scores for RandomForestClassifier with different max_depth:", accuracy_scores_max_depth_clf)


# Explanation:
# We are exploring the impact of two key hyperparameters for Random Forests:
# - n_estimators: Number of trees in the forest.
# - max_depth: Maximum depth of each tree.
# We iterate through different values for each hyperparameter, train Random Forest
# models, evaluate performance, and print the scores to observe the impact.

# 4.5 Plotting n_estimators vs MAE for RandomForestRegressor
plt.figure(figsize=(6, 4))
plt.plot(n_estimators_vals, mae_scores_n_estimators_reg, marker='o')
plt.xlabel("n_estimators")
plt.ylabel("Mean Absolute Error (MAE)")
plt.title("RandomForestRegressor: n_estimators vs MAE")
plt.grid(True)
plt.tight_layout()
plt.show()

# 4.6 Plotting max_depth vs MAE for RandomForestRegressor
plt.figure(figsize=(6, 4))
# Use string labels on the x-axis so the max_depth=None run gets its own tick
plt.plot([str(d) for d in max_depth_vals], mae_scores_max_depth_reg, marker='o')
plt.xlabel("max_depth")
plt.ylabel("Mean Absolute Error (MAE)")
plt.title("RandomForestRegressor: max_depth vs MAE")
plt.grid(True)
plt.tight_layout()
plt.show()

# 4.7 Plotting n_estimators vs Accuracy for RandomForestClassifier
plt.figure(figsize=(6, 4))
plt.plot(n_estimators_vals_clf, accuracy_scores_n_estimators_clf, marker='o')
plt.xlabel("n_estimators")
plt.ylabel("Accuracy")
plt.title("RandomForestClassifier: n_estimators vs Accuracy (Digits Dataset)") # Updated title
plt.grid(True)
plt.tight_layout()
plt.show()

# 4.8 Plotting max_depth vs Accuracy for RandomForestClassifier
plt.figure(figsize=(6, 4))
# Use string labels on the x-axis so the max_depth=None run gets its own tick
plt.plot([str(d) for d in max_depth_vals_clf], accuracy_scores_max_depth_clf, marker='o')
plt.xlabel("max_depth")
plt.ylabel("Accuracy")
plt.title("RandomForestClassifier: max_depth vs Accuracy (Digits Dataset)") # Updated title
plt.grid(True)
plt.tight_layout()
plt.show()

### Interpretation of Hyperparameter Tuning Results for Random Forests

This section analyzes the performance of Random Forest models (Regressor and Classifier) with varying `n_estimators` and `max_depth` hyperparameters. The goal is to identify optimal settings for improved model accuracy.

**Understanding the Output:**

The output lists performance scores (MAE for the Regressor, Accuracy for the Classifier) as `n_estimators` (number of trees) and `max_depth` (maximum tree depth) are varied one at a time; while `max_depth` is varied, `n_estimators` is fixed at 100.

**Interpreting the Scores:**

1. **MAE scores for RandomForestRegressor with different n_estimators:**
    * These scores represent the Mean Absolute Error (MAE) of the Regressor with varying numbers of trees.
    * **Interpretation:** MAE generally decreases as `n_estimators` increases, indicating that more trees improve accuracy. However, improvement might plateau.

2. **MAE scores for RandomForestRegressor with different max_depth:**
    * These scores show the MAE of the Regressor when varying the maximum depth of the trees.
    * **Interpretation:** MAE does not follow a clear trend as `max_depth` increases. A good value lets the trees grow deep enough to fit the data without overfitting. `max_depth=None` (no depth limit) is included as the unconstrained reference.

3. **Accuracy scores for RandomForestClassifier with different n_estimators:**
    * These scores represent the accuracy of the Classifier with different numbers of trees.
    * **Interpretation:** Accuracy generally increases with increasing `n_estimators`, but improvement may plateau.

4. **Accuracy scores for RandomForestClassifier with different max_depth:**
    * These scores show the accuracy of the Classifier when varying the maximum depth of the trees.
    * **Interpretation:** Accuracy does not follow a clear trend as `max_depth` increases. A good value lets the trees grow deep enough to fit the data without overfitting. `max_depth=None` (no depth limit) is included as the unconstrained reference.
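
One way to read such score lists programmatically is to pick the smallest setting whose score is within a tolerance of the best one, which captures the plateau idea. The scores below are placeholder values, not results from this notebook, and both `smallest_within_tolerance` and the 1% tolerance are illustrative assumptions.

```python
def smallest_within_tolerance(values, errors, tolerance=0.01):
    """Return the first value whose error is within `tolerance`
    (relative) of the lowest error in the list."""
    best_error = min(errors)
    for value, error in zip(values, errors):
        if error <= best_error * (1 + tolerance):
            return value

n_estimators_vals = [10, 50, 100, 200, 500]
# Placeholder MAE scores (not taken from the notebook's output)
placeholder_mae = [19400.0, 18200.0, 17950.0, 17900.0, 17880.0]

chosen = smallest_within_tolerance(n_estimators_vals, placeholder_mae)
print("Smallest n_estimators within 1% of the best MAE:", chosen)
```

Preferring the smallest adequate forest keeps training and prediction cheaper once extra trees stop helping.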


### **Explanation of Code Cell for Step 4: Hyperparameter Tuning for Random Forests**

This code cell focuses on **hyperparameter tuning** for **Random Forest models**. Hyperparameters are settings of a machine learning model that are not learned from the data but are set prior to training. Tuning these hyperparameters is crucial to optimize model performance and generalization. This step explores the impact of two key hyperparameters in Random Forests: `n_estimators` (number of trees in the forest) and `max_depth` (maximum depth of each tree).

**Hyperparameter Tuning for RandomForestRegressor:**

*   `print("\n\n----- Step 4: Hyperparameter Tuning for Random Forests -----")`: Prints a heading for Step 4, indicating hyperparameter tuning.

*   **Tuning `n_estimators` for `RandomForestRegressor`:**
    *   `n_estimators_vals = [10, 50, 100, 200, 500]`: Defines a list of different values for the `n_estimators` hyperparameter to be tested. These values represent the number of trees that will be in the Random Forest.
    *   `mae_scores_n_estimators_reg = []`: Initializes an empty list to store the MAE scores for each `n_estimators` value.
    *   `for n_est in n_estimators_vals:`: Starts a loop that iterates through each value in the `n_estimators_vals` list.
        *   `rf_reg = RandomForestRegressor(n_estimators=n_est, random_state=0)`: Creates a `RandomForestRegressor` model with the current `n_estimators` value (`n_est`) and a fixed `random_state` for reproducibility.
        *   `rf_reg.fit(iowa_X_train, y_train)`: Trains the `RandomForestRegressor` model using the Iowa housing training data.
        *   `predictions = rf_reg.predict(iowa_X_val)`: Makes predictions on the Iowa housing validation set using the trained model.
        *   `mae = mean_absolute_error(y_val, predictions)`: Calculates the MAE for the current model's predictions.
        *   `mae_scores_n_estimators_reg.append(mae)`: Appends the calculated MAE score to the `mae_scores_n_estimators_reg` list.
    *   `print("\nMAE scores for RandomForestRegressor with different n_estimators:", mae_scores_n_estimators_reg)`: After the loop finishes, this line prints the list of MAE scores corresponding to each `n_estimators` value tested.

*   **Tuning `max_depth` for `RandomForestRegressor`:**
    *   `max_depth_vals = [None, 5, 10, 15, 20]`: Defines a list of different values for the `max_depth` hyperparameter. `None` means no maximum depth limit.
    *   `mae_scores_max_depth_reg = []`: Initializes an empty list to store MAE scores for each `max_depth` value.
    *   `for max_d in max_depth_vals:`: Starts a loop that iterates through each value in `max_depth_vals`.
        *   `rf_reg = RandomForestRegressor(max_depth=max_d, n_estimators=100, random_state=0)`: Creates a `RandomForestRegressor` model with the current `max_depth` value (`max_d`), a *fixed* `n_estimators=100` (to isolate the effect of `max_depth`), and `random_state=0`.
        *   The rest of the loop is similar to the `n_estimators` tuning loop: train, predict, calculate MAE, and append to the scores list.
    *   `print("\nMAE scores for RandomForestRegressor with different max_depth:", mae_scores_max_depth_reg)`: Prints the list of MAE scores for each `max_depth` value.

**Hyperparameter Tuning for RandomForestClassifier:**

*   **Tuning `n_estimators` for `RandomForestClassifier`:**
    *   `n_estimators_vals_clf = [10, 20, 30, 40, 50, 75, 100, 150, 200]`: Defines `n_estimators` values to test for the classifier.
    *   `accuracy_scores_n_estimators_clf = []`: Initializes a list to store accuracy scores for each `n_estimators` value.
    *   The loop structure and operations are similar to the `n_estimators` tuning for `RandomForestRegressor`, but it uses `RandomForestClassifier`, trains on Digits data, and calculates `accuracy_score` instead of MAE.
    *   `print("\nAccuracy scores for RandomForestClassifier with different n_estimators:", accuracy_scores_n_estimators_clf)`: Prints accuracy scores for different `n_estimators`.

*   **Tuning `max_depth` for `RandomForestClassifier`:**
    *   `max_depth_vals_clf = [None, 5, 10, 15, 20]`: Defines `max_depth` values for the classifier.
    *   `accuracy_scores_max_depth_clf = []`: Initializes a list for accuracy scores for each `max_depth`.
    *   The loop structure is again similar, but for `RandomForestClassifier` and `max_depth` tuning, using a fixed `n_estimators=100`.
    *   `print("\nAccuracy scores for RandomForestClassifier with different max_depth:", accuracy_scores_max_depth_clf)`: Prints accuracy scores for different `max_depth` values.

**Plotting Hyperparameter Tuning Results:**

*   **Plots for `RandomForestRegressor`:**
    *   **`n_estimators` vs MAE Plot:** Creates a line plot showing how MAE changes as `n_estimators` varies for `RandomForestRegressor`. This plot helps visualize the relationship between the number of trees and regression performance.
    *   **`max_depth` vs MAE Plot:** Creates a line plot showing how MAE changes as `max_depth` varies for `RandomForestRegressor`. This visualizes the impact of tree depth on regression performance.

*   **Plots for `RandomForestClassifier`:**
    *   **`n_estimators` vs Accuracy Plot:** Creates a line plot of accuracy vs. `n_estimators` for `RandomForestClassifier` on the Digits dataset.
    *   **`max_depth` vs Accuracy Plot:** Creates a line plot of accuracy vs. `max_depth` for `RandomForestClassifier` on the Digits dataset.

In summary, Step 4 systematically explores the impact of `n_estimators` and `max_depth` hyperparameters on the performance of both `RandomForestRegressor` and `RandomForestClassifier`. By iterating through different hyperparameter values, training models, evaluating performance, and visualizing the results, students can understand how these hyperparameters influence model accuracy and complexity. These plots help in making informed decisions about hyperparameter settings for Random Forest models in practice.

In [None]:
#------------------------------------
# Step 5: Comparative Analysis and Visualization
#------------------------------------
print("\n\n----- Step 5: Comparative Analysis and Visualization -----")
# Explanation: Compare the performance of all models (Decision Tree, Bagging, Pasting, Random Forest) for both regression and classification using metrics and visualizations.

# 5.1 Collect MAE and MSE for all Regression Models
regression_model_names = ["Decision Tree", "Bagging", "Pasting", "Random Forest"]
mae_scores_reg = [mae_baseline_regressor, mae_bagging_regressor, mae_pasting_regressor, mae_forest_regressor]
mse_scores_reg = [mse_baseline_regressor, mse_bagging_regressor, mse_pasting_regressor, mse_forest_regressor]

# Print MAE and MSE values before plotting
print("\nRegression Model MAE Scores:")
for model_name, mae_score in zip(regression_model_names, mae_scores_reg):
    print(f"{model_name}: {mae_score:.2f}")

print("\nRegression Model MSE Scores:")
for model_name, mse_score in zip(regression_model_names, mse_scores_reg):
    print(f"{model_name}: {mse_score:.2f}")


# 5.2 Collect Accuracy for all Classification Models
classification_model_names = ["Decision Tree", "Bagging", "Pasting", "Random Forest"]
accuracy_scores_clf = [accuracy_baseline_classifier, accuracy_bagging_classifier, accuracy_pasting_classifier, accuracy_forest_classifier]

# Print Accuracy values before plotting
print("\nClassification Model Accuracy Scores:")
# Loop variable is named `acc` so it does not overwrite sklearn's accuracy_score function
for model_name, acc in zip(classification_model_names, accuracy_scores_clf):
    print(f"{model_name}: {acc:.2f}")

# Explanation: We are collecting the performance metrics from previous steps for easy comparison.

# 5.3 Plotting MAE Comparison for Regression Models
plt.figure(figsize=(8, 5))
plt.bar(regression_model_names, mae_scores_reg, color='skyblue')
plt.ylabel("Mean Absolute Error (MAE)")
plt.title("Comparison of MAE for Regression Models")
plt.ylim(min(mae_scores_reg) * 0.9, max(mae_scores_reg) * 1.1) # Adjust y-axis limits for better visualization
for i, v in enumerate(mae_scores_reg):
    plt.text(i, v + 0.01 * max(mae_scores_reg), f"{v:.2f}", ha='center', va='bottom') # Add value labels on bars
plt.tight_layout()
plt.show()

# 5.4 Plotting MSE Comparison for Regression Models
plt.figure(figsize=(8, 5))
plt.bar(regression_model_names, mse_scores_reg, color='lightcoral')
plt.ylabel("Mean Squared Error (MSE)")
plt.title("Comparison of MSE for Regression Models")
plt.ylim(min(mse_scores_reg) * 0.9, max(mse_scores_reg) * 1.1) # Adjust y-axis limits for better visualization
for i, v in enumerate(mse_scores_reg):
    plt.text(i, v + 0.01 * max(mse_scores_reg), f"{v:.2f}", ha='center', va='bottom') # Add value labels on bars
plt.tight_layout()
plt.show()

# 5.5 Plotting Accuracy Comparison for Classification Models
plt.figure(figsize=(8, 5))
plt.bar(classification_model_names, accuracy_scores_clf, color='lightgreen')
plt.ylabel("Accuracy")
plt.title("Comparison of Accuracy for Classification Models (Digits Dataset)") # Updated title
plt.ylim(min(accuracy_scores_clf) * 0.9, max(accuracy_scores_clf) * 1.1) # Adjust y-axis limits for better visualization
for i, v in enumerate(accuracy_scores_clf):
    plt.text(i, v + 0.01 * max(accuracy_scores_clf), f"{v:.2f}", ha='center', va='bottom') # Add value labels on bars
plt.tight_layout()
plt.show()

## Interpretation of Step 5 Output and Final Conclusion

This section analyzes the performance of four different machine learning models (Decision Tree, Bagging, Pasting, and Random Forest) on both regression and classification tasks. Here's a breakdown:

### Regression Task (Iowa Housing Dataset):

**Performance Metrics:**

| Model | MAE | MSE |
|---|---|---|
| Decision Tree | 27432.52 | 2790012884.63 |
| Bagging | 19014.07 | 1249452686.82 |
| Pasting | 24434.84 | 1295328430.72 |
| Random Forest | 19419.13 | 1220709259.25 |

**Observations:**

- **Bagging and Random Forest outperform Decision Tree and Pasting** in terms of both MAE and MSE. Lower values indicate better predictive accuracy.
- **Bagging** has the lowest MAE (19014.07), suggesting it has the smallest average error in predicting housing prices.
- **Random Forest** has the lowest MSE (1220709259.25), indicating it makes slightly fewer or smaller large errors.
- **Pasting** has an MSE (1295328430.72) close to Bagging's, but its MAE (24434.84) is much higher, so it trails both Bagging and Random Forest overall.
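
The table values can be turned into more interpretable quantities. The sketch below computes each model's relative MAE reduction over the Decision Tree and converts MSE to RMSE, which is expressed in the same units as the target. The numbers are copied from the table above; the variable names are illustrative.

```python
import math

# (MAE, MSE) copied from the regression table above
results = {
    "Decision Tree": (27432.52, 2790012884.63),
    "Bagging":       (19014.07, 1249452686.82),
    "Pasting":       (24434.84, 1295328430.72),
    "Random Forest": (19419.13, 1220709259.25),
}

baseline_mae = results["Decision Tree"][0]

mae_reduction = {}   # relative MAE reduction vs. the single tree
rmse = {}            # root mean squared error, same units as the target
for name, (mae, mse) in results.items():
    mae_reduction[name] = (baseline_mae - mae) / baseline_mae
    rmse[name] = math.sqrt(mse)
    print(f"{name:13s}  MAE reduction: {mae_reduction[name]:6.1%}  RMSE: {rmse[name]:10.2f}")
```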


### Classification Task (Digits Dataset):

**Performance Metrics:**

| Model | Accuracy |
|---|---|
| Decision Tree | 0.85 |
| Bagging | 0.92 |
| Pasting | 0.87 |
| Random Forest | 0.94 |

**Observations:**

- **Bagging and Random Forest** significantly outperform Decision Tree and Pasting in terms of accuracy. Higher accuracy means better classification performance.
- **Random Forest** achieves the highest accuracy (0.94), demonstrating its effectiveness in recognizing patterns for accurate digit classification.


## Understanding the Trade-off between Bagging and Random Forest for Regression

This section focuses on clarifying the following conclusion about Bagging and Random Forest for regression tasks:

**3. For regression, Bagging demonstrates a slight edge with the lowest MAE, while Random Forest provides better handling of outliers and larger errors with a lower MSE. The choice between the two can depend on the specific problem and priorities regarding error types.**

### Understanding MAE and MSE:

- **MAE (Mean Absolute Error):** The average of the absolute differences between predicted and actual values. It treats all errors equally, regardless of size. A lower MAE means predictions are generally closer to true values.
- **MSE (Mean Squared Error):** The average of the squared differences between predicted and actual values. Squaring penalizes larger errors more heavily, so a lower MSE suggests the model makes fewer or smaller large errors. For the same reason, the metric itself is sensitive to outliers: a few extreme misses can dominate it.
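
To make the difference concrete, here is a minimal sketch in plain Python with two made-up error patterns (the numbers are invented for illustration, not taken from the Iowa data). Both have the same MAE, but the one that concentrates its error in a single large miss has a much higher MSE:

```python
def mae(errors):
    """Mean absolute error for a list of (prediction - actual) differences."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error for a list of (prediction - actual) differences."""
    return sum(e ** 2 for e in errors) / len(errors)


# Hypothetical residuals: same total absolute error, distributed differently
even_errors = [10, -10, 10, -10]   # four moderate misses
spiky_errors = [0, 0, 0, 40]       # one large miss

print(mae(even_errors), mse(even_errors))    # 10.0 100.0
print(mae(spiky_errors), mse(spiky_errors))  # 10.0 400.0
```

This is why two models can rank differently on MAE and MSE: the metrics disagree whenever one model's errors are more concentrated in a few large misses.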

### Bagging's Edge with Lower MAE:

- In this experiment, Bagging achieves the lowest MAE (19014.07 vs. 19419.13 for Random Forest), indicating that, on average, its predictions are slightly closer to the actual values.
- If your priority is minimizing the average prediction error, regardless of occasional larger errors, Bagging might be a good choice.

### Random Forest's Advantage with Lower MSE:

- In this experiment, Random Forest achieves the lowest MSE, indicating it is better at avoiding large prediction errors, even though its average error (MAE) is slightly higher than Bagging's.
- Because MSE weights large errors heavily, the lower value suggests Random Forest's predictions include fewer or smaller extreme misses. Note that this describes the model's errors, not the metric: MSE amplifies the influence of outliers rather than reducing it.

### Choice Based on Problem and Priorities:

- **Sensitivity to Outliers:** If your dataset has outliers or you are particularly concerned about large prediction errors, Random Forest's lower MSE might make it a better choice.
- **Focus on Average Error:** If minimizing the average prediction error is your primary goal, and you are less concerned about occasional larger errors, Bagging's lower MAE might be preferable.
- **Business Context:** The best choice also depends on the specific business problem. For example, in some cases, a large prediction error might have more severe consequences than in others.

### Conclusion:

Based on the corrected output and observations:

1. **Ensemble methods (Bagging and Random Forest) remain superior to single Decision Trees** for both regression and classification, reducing overfitting and enhancing overall performance.

2. **Random Forest is the strongest classifier in this comparison**, with the highest accuracy (0.94) on the Digits dataset, which makes it a sensible default for similar classification tasks.

3. **For regression, Bagging has a slight edge with the lowest MAE**, while Random Forest has the lowest MSE, meaning it makes fewer or smaller large errors. The choice between the two can depend on the specific problem and priorities regarding error types.

4. **Pasting, with the corrected MSE, performs better than Decision Tree but not as well as Bagging or Random Forest**. It might be considered an alternative but not the primary choice.

The choice between Bagging and Random Forest for regression depends on your tolerance for different types of errors and the specific needs of your problem. If you need more consistent predictions with fewer large errors, Random Forest is a good choice. If minimizing the average error is most important, Bagging might be preferred.


The key takeaway remains that ensembles, especially Bagging and Random Forest, are powerful tools for improving predictive performance in machine learning. The choice between them depends on the specific task, dataset, and desired trade-offs between error metrics and computational cost.

### **Explanation of Code Cell for Step 5: Comparative Analysis and Visualization**

This code cell performs a **comparative analysis** of all the models implemented in this module: Decision Tree, Bagging, Pasting, and Random Forest, for both regression and classification tasks. It collects the performance metrics calculated in the previous steps and visualizes them using bar plots, allowing for a clear side-by-side comparison of the effectiveness of each method.

**Data Collection for Comparison:**

*   `print("\n\n----- Step 5: Comparative Analysis and Visualization -----")`: Prints a heading for Step 5, indicating comparative analysis.

*   **Regression Model Metrics:**
    *   `regression_model_names = ["Decision Tree", "Bagging", "Pasting", "Random Forest"]`: Defines a list of names for the regression models, which will be used as labels in the plots.
    *   `mae_scores_reg = [mae_baseline_regressor, mae_bagging_regressor, mae_pasting_regressor, mae_forest_regressor]`: Collects the MAE scores calculated for each regression model in Steps 0, 1, 2, and 3 into a list.
    *   `mse_scores_reg = [mse_baseline_regressor, mse_bagging_regressor, mse_pasting_regressor, mse_forest_regressor]`: Collects the MSE scores for each regression model.

*   **Classification Model Metrics:**
    *   `classification_model_names = ["Decision Tree", "Bagging", "Pasting", "Random Forest"]`: Defines names for classification models.
    *   `accuracy_scores_clf = [accuracy_baseline_classifier, accuracy_bagging_classifier, accuracy_pasting_classifier, accuracy_forest_classifier]`: Collects the accuracy scores for each classification model from Steps 0, 1, 2, and 3.

*   `# Explanation: We are collecting the performance metrics from previous steps for easy comparison.`: A comment explaining the purpose of this data collection.

**Bar Plots for Performance Comparison:**

*   **MAE Comparison for Regression Models:**
    *   `plt.figure(figsize=(8, 5))`: Creates a figure for the plot with a specified size.
    *   `plt.bar(regression_model_names, mae_scores_reg, color='skyblue')`: Creates a bar plot.
        *   `regression_model_names`: Provides the x-axis labels (model names).
        *   `mae_scores_reg`: Provides the bar heights (MAE values).
        *   `color='skyblue'`: Sets the color of the bars.
    *   `plt.ylabel("Mean Absolute Error (MAE)")`: Sets the y-axis label.
    *   `plt.title("Comparison of MAE for Regression Models")`: Sets the plot title.
    *   `plt.ylim(min(mae_scores_reg) * 0.9, max(mae_scores_reg) * 1.1)`: Sets the y-axis limits to provide better visualization, slightly expanding beyond the min and max MAE values.
    *   `for i, v in enumerate(mae_scores_reg): ...`:  This loop adds text labels on top of each bar to display the exact MAE value.
        *   `plt.text(i, v + 0.01 * max(mae_scores_reg), f"{v:.2f}", ha='center', va='bottom')`: Adds a text annotation at the top of each bar (`v`) at position `i` (bar index), formatted to two decimal places.
    *   `plt.tight_layout()`: Adjusts plot layout to prevent labels from overlapping.
    *   `plt.show()`: Displays the plot.

*   **MSE Comparison for Regression Models:**
    *   Creates a bar plot for MSE comparison for regression models, similar to the MAE plot but using `mse_scores_reg` and `color='lightcoral'`.

*   **Accuracy Comparison for Classification Models:**
    *   Creates a bar plot for Accuracy comparison for classification models, using `accuracy_scores_clf`, `classification_model_names`, `color='lightgreen'`, and updating the `plt.title` to indicate "Digits Dataset".

*   `# Explanation: ...`: A comment explaining the purpose of the bar plots: to visually compare model performance side-by-side.

In summary, Step 5 consolidates the performance metrics of all the models and presents them visually using bar plots. These plots provide a direct and easy-to-understand comparison of how Decision Trees, Bagging, Pasting, and Random Forests perform for both regression and classification tasks. By visually comparing MAE, MSE, and Accuracy across different models, students can quickly grasp the relative effectiveness of each method and observe the improvements gained by using ensemble techniques, particularly Random Forests, over single Decision Trees. The value labels on top of the bars enhance the readability and precision of the comparison.
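
For reference, the MAE plot described above can be condensed into a short helper. This is a sketch rather than the notebook's exact cell: the function name `plot_metric_bars` is hypothetical, and the MAE values are typed in from the regression table instead of being read from the `mae_*` variables.

```python
import matplotlib.pyplot as plt


def plot_metric_bars(names, scores, ylabel, title, color="skyblue"):
    """Draw one bar per model and print each value above its bar."""
    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(names, scores, color=color)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    # Zoom the y-axis so differences between models are visible
    ax.set_ylim(min(scores) * 0.9, max(scores) * 1.1)
    for i, v in enumerate(scores):
        # Place the label slightly above the top of bar i
        ax.text(i, v + 0.01 * max(scores), f"{v:.2f}", ha="center", va="bottom")
    fig.tight_layout()
    return fig, bars


model_names = ["Decision Tree", "Bagging", "Pasting", "Random Forest"]
mae_scores = [27432.52, 19014.07, 24434.84, 19419.13]  # from the regression table
fig, bars = plot_metric_bars(
    model_names,
    mae_scores,
    ylabel="Mean Absolute Error (MAE)",
    title="Comparison of MAE for Regression Models",
)
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` afterwards. Because the y-axis does not start at zero, the bar heights exaggerate the gaps between models, so read the printed values rather than the heights alone.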

In [None]:
import seaborn as sbn
#------------------------------------
# Step 6: Feature Importance in Random Forests
#------------------------------------
print("\n\n----- Step 6: Feature Importance in Random Forests -----")
# Explanation: Explore feature importance provided by Random Forest models for both regression and classification.

# 6.1 Feature Importance for RandomForestRegressor
feature_importances_reg = forest_regressor.feature_importances_
feature_names_reg = iowa_X_train.columns # Get feature names from training data

print("####################################################################################",
      "####################################################################################")
reg_models = [baseline_tree_regressor,
              #bagging_regressor,
              #pasting_regressor,
              forest_regressor,
              rf_reg]
clf_models = [baseline_tree_classifier,
              #bagging_classifier,
              #pasting_classifier,
              forest_classifier,
              rf_clf]
## Additional - more practical
def plot_importance(model, features, num=15, save=False):
    """Plot the `num` most important features of a fitted tree-based model."""
    print("Feature Importance Section: ", model.__class__.__name__)
    feature_imp = pd.DataFrame({"Value": model.feature_importances_,
                                "Feature": features.columns})
    plt.figure(figsize=(10, 10))
    sbn.set(font_scale=1)
    sbn.barplot(x="Value", y="Feature",
                data=feature_imp.sort_values(by='Value', ascending=False)[0:num])
    plt.title(f"{model.__class__.__name__} Features")
    plt.tight_layout()
    # Save before plt.show(): after show() the current figure is cleared,
    # so saving afterwards would write an empty image.
    if save:
        plt.savefig(f"{model.__class__.__name__}importances.png")
    plt.show()

for reg_model in reg_models:
    plot_importance(reg_model, iowa_X_train)

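# Wrap the digits features in a DataFrame once (not on every loop iteration)
# so that plot_importance can read column names from it.
digits_X_train_df = pd.DataFrame(digits_X_train, columns=[f'feature_{i}' for i in range(digits_X_train.shape[1])])
for clf_model in clf_models:
    plot_importance(clf_model, digits_X_train_df)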
print("####################################################################################",
      "####################################################################################")

# Print all features before plotting
print("\nAll Features for Regression:")
for feature_name in feature_names_reg:
    print(feature_name)

# Sort feature importances in descending order and get top features
indices_reg = np.argsort(feature_importances_reg)[::-1]
top_n_features_reg = 10 # Display top N features
top_feature_indices_reg = indices_reg[:top_n_features_reg]
top_feature_names_reg = [feature_names_reg[i] for i in top_feature_indices_reg]
top_feature_importance_values_reg = feature_importances_reg[top_feature_indices_reg]

# 6.2 Plotting Feature Importances for RandomForestRegressor
plt.figure(figsize=(10, 6))
plt.title("Feature Importances - RandomForestRegressor (Top 10)")
plt.bar(range(top_n_features_reg), top_feature_importance_values_reg, align="center")
plt.xticks(range(top_n_features_reg), top_feature_names_reg, rotation=45, ha='right') # Rotate x-axis labels
plt.xlim([-1, top_n_features_reg])
plt.ylabel("Importance")
plt.tight_layout()
plt.show()

# 6.3 Feature Importance for RandomForestClassifier
feature_importances_clf = forest_classifier.feature_importances_
feature_names_clf = digits.feature_names # Feature names from Digits dataset (pixels)

print('digits.data\n',digits.data)


# Print all features before plotting
print("\nAll Features for Classification:")
for i in range(len(feature_names_clf)):
    print(f"pixel_{i}")  # Print pixel_0, pixel_1, etc.

# Sort feature importances in descending order and get top features
indices_clf = np.argsort(feature_importances_clf)[::-1]
top_n_features_clf = 10 # Display top N features
top_feature_indices_clf = indices_clf[:top_n_features_clf]
top_feature_names_clf = [f"pixel{i}" for i in top_feature_indices_clf] # Pixel names
top_feature_importance_values_clf = feature_importances_clf[top_feature_indices_clf]

# 6.4 Plotting Feature Importances for RandomForestClassifier
plt.figure(figsize=(10, 6))
plt.title("Feature Importances - RandomForestClassifier (Digits Dataset - Top 10 Pixels)") # Updated title
plt.bar(range(top_n_features_clf), top_feature_importance_values_clf, align="center", color='coral') # Different color for classifier plot
plt.xticks(range(top_n_features_clf), top_feature_names_clf, rotation=45, ha='right') # Rotate x-axis labels
plt.xlim([-1, top_n_features_clf])
plt.ylabel("Importance")
plt.tight_layout()
plt.show()

### **Explanation of Code Cell for Step 6: Feature Importance in Random Forests**

This code cell explores **feature importance** as provided by **Random Forest models**. Feature importance is a valuable tool to understand which features in the dataset are most influential in making predictions. Random Forests, as tree-based models, can naturally estimate feature importances based on how much each feature contributes to reducing impurity (e.g., Gini impurity for classification, variance for regression) across all trees in the forest.
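
A small sketch of what `feature_importances_` contains, using a freshly trained forest on the Digits data rather than the notebook's `forest_classifier` (the variable names and the `n_estimators=20` setting are choices made for this example). In scikit-learn, the forest-level importances are the per-tree impurity-based importances averaged over the trees and normalized to sum to 1:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

digits_demo = load_digits()
demo_forest = RandomForestClassifier(n_estimators=20, random_state=42)
demo_forest.fit(digits_demo.data, digits_demo.target)

forest_importances = demo_forest.feature_importances_

# Average the importances reported by each individual tree
per_tree = np.array([tree.feature_importances_ for tree in demo_forest.estimators_])
mean_of_trees = per_tree.mean(axis=0)

print("Number of features:", forest_importances.shape[0])
print("Sum of importances:", forest_importances.sum())
print("Matches mean over trees:", np.allclose(forest_importances, mean_of_trees))
```

These importances are computed from the training data, so they describe what the trees used rather than what generalizes. scikit-learn's `sklearn.inspection.permutation_importance` is a common cross-check on held-out data.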

**Feature Importance for RandomForestRegressor:**

*   `print("\n\n----- Step 6: Feature Importance in Random Forests -----")`: Prints a heading for Step 6, indicating feature importance exploration.

*   `feature_importances_reg = forest_regressor.feature_importances_`:
    Retrieves the feature importances from the already trained `forest_regressor` (RandomForestRegressor model from Step 3).
    *   `.feature_importances_`: This attribute of a trained `RandomForestRegressor` (and `RandomForestClassifier`) object stores the feature importances as an array. The importance of each feature is a numerical value, with higher values indicating greater importance.

*   `feature_names_reg = iowa_X_train.columns`:
    Gets the column names (feature names) from the Iowa housing training data (`iowa_X_train`). These names will be used to label the features in the plot.

*   **Sorting and Selecting Top Features (Regression):**
    *   `indices_reg = np.argsort(feature_importances_reg)[::-1]`: Sorts the feature importances in descending order and gets the indices.
        *   `np.argsort(feature_importances_reg)`: Returns the indices that would sort the `feature_importances_reg` array in ascending order.
        *   `[::-1]`: Slices the index array to reverse it, resulting in indices that sort in descending order (from most important to least important).
    *   `top_n_features_reg = 10`: Sets the number of top features to display in the plot (top 10 in this case).
    *   `top_feature_indices_reg = indices_reg[:top_n_features_reg]`: Selects the indices of the top N features from the sorted indices.
    *   `top_feature_names_reg = [feature_names_reg[i] for i in top_feature_indices_reg]`: Gets the names of the top N features using their indices and the `feature_names_reg` list.
    *   `top_feature_importance_values_reg = feature_importances_reg[top_feature_indices_reg]`: Gets the importance values of the top N features.

*   **Plotting Feature Importances for RandomForestRegressor:**
    *   `plt.figure(figsize=(10, 6))`: Creates a figure for the plot.
    *   `plt.title("Feature Importances - RandomForestRegressor (Top 10)")`: Sets the plot title.
    *   `plt.bar(range(top_n_features_reg), top_feature_importance_values_reg, align="center")`: Creates a bar plot.
        *   `range(top_n_features_reg)`: Provides x-axis positions for the bars (0 to 9 for top 10 features).
        *   `top_feature_importance_values_reg`: Provides the bar heights (importance values of top features).
        *   `align="center"`: Centers the bars on the x-axis ticks.
    *   `plt.xticks(range(top_n_features_reg), top_feature_names_reg, rotation=45, ha='right')`: Sets the x-axis tick positions and labels.
        *   `range(top_n_features_reg)`: Tick positions.
        *   `top_feature_names_reg`: Tick labels (feature names).
        *   `rotation=45, ha='right'`: Rotates x-axis labels by 45 degrees and horizontally aligns them to the right for better readability.
    *   `plt.xlim([-1, top_n_features_reg])`: Sets x-axis limits.
    *   `plt.ylabel("Importance")`: Sets y-axis label.
    *   `plt.tight_layout()`: Adjusts layout.
    *   `plt.show()`: Displays the plot.
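
The sort-and-slice steps above can be tried on a toy array. In this sketch the importance values and feature names are made up for illustration; they stand in for `forest_regressor.feature_importances_` and `iowa_X_train.columns`:

```python
import numpy as np

# Hypothetical importances and names (not from the Iowa data)
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
names = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"]

order = np.argsort(importances)[::-1]   # indices from most to least important
top_n = 3
top_indices = order[:top_n]
top_names = [names[i] for i in top_indices]
top_values = importances[top_indices]   # fancy indexing with the index array

print(top_names)            # ['feat_b', 'feat_d', 'feat_e']
print(top_values.tolist())  # [0.4, 0.3, 0.15]
```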

**Feature Importance for RandomForestClassifier:**

*   `feature_importances_clf = forest_classifier.feature_importances_`:
    Retrieves feature importances from the trained `forest_classifier` (RandomForestClassifier).

*   `feature_names_clf = digits.feature_names`:
    Gets the feature names from `digits.feature_names`. In recent scikit-learn releases this is a list of 64 names of the form `pixel_<row>_<col>` (for example `pixel_0_0`), one per pixel of the 8x8 image. The code only uses the length of this list when printing the features.

*   **Sorting and Selecting Top Features (Classification):**
    *   Similar sorting and top feature selection logic as for regression.
    *   `top_feature_names_clf = [f"pixel{i}" for i in top_feature_indices_clf]`: Rather than reusing the names in `digits.feature_names`, this line builds labels from the flattened pixel index, such as "pixel0", "pixel1", etc., for the top pixel features (index `i` corresponds to row `i // 8`, column `i % 8` of the image). These labels identify each bar in the plot.

*   **Plotting Feature Importances for RandomForestClassifier:**
    *   Similar plotting logic as for regression, but:
        *   `plt.title("Feature Importances - RandomForestClassifier (Digits Dataset - Top 10 Pixels)")`: Updated title to specify Digits dataset and "Pixels".
        *   `color='coral'`: Uses a different color for the classifier plot.

In summary, Step 6 demonstrates how to extract and visualize feature importances from Random Forest models. It shows how to retrieve the `feature_importances_` attribute, sort and select top features, handle feature names (including creating artificial names for pixel features in the Digits dataset), and create bar plots to display the importance of each feature. These plots provide valuable insights into which features are most influential in the models' predictions, aiding in understanding the data and model behavior. This step highlights the interpretability aspect of Random Forests through feature importance analysis.

## Module 6: Ensemble Methods - Homework Assignment

**Instructions:** Please answer the following questions to the best of your ability. For coding tasks, you can modify and run the Colab Notebook provided for Module 6. Please submit your answers and code modifications (if any) as instructed by your instructor.

---

### Conceptual Understanding

1.  **Ensemble Learning Principles**

    *   Explain in your own words what **ensemble learning** is and why it is often more effective than using a single machine learning model.  Use the "wisdom of the crowd" analogy to support your explanation.
        * **Answer: Ensemble learning combines the predictions of several models. It is often more effective than a single model because the individual models make different errors, and aggregating their predictions tends to cancel those errors out. As in the "wisdom of the crowd" analogy, the combined judgment of many models is usually better than that of any single one.**

    *   What are the two main approaches to ensemble learning we discussed in this module in terms of how base learners are trained? Briefly describe **Bagging** (Bootstrap Aggregating) and **Boosting**.  *(Note: While we focused on Bagging-related methods, briefly mentioning Boosting for broader understanding is helpful)*.
        * **Answer: Bagging (Bootstrap Aggregating) trains multiple models independently, each on a random sample of the training data drawn with replacement. For regression, the ensemble prediction is generally the mean of the models' predictions; for classification, it is the class that receives the most votes.**
        * **Unlike Bagging, Boosting trains models sequentially, so that each new model tries to correct the errors of the previous one. In other words, each subsequent learner focuses mostly on the observations its predecessors got wrong.**

2.  **Bagging vs. Pasting**

    *   What is the key difference between **Bagging** and **Pasting** in terms of how they sample the training data to train multiple base learners?
        * **Answer: As mentioned above, Bagging samples with replacement, so instances can be repeated within a training subset. Pasting samples without replacement, so no subset contains duplicated instances. In short: Bagging uses bootstrap samples, while Pasting samples without replacement.**
    *   In what situations might Bagging be preferred over Pasting, and in what situations might Pasting be preferred over Bagging? Consider factors like dataset size and the goal of variance reduction.
        * **Answer: Bagging is preferred when the dataset is smaller or noisy. Bootstrap resampling makes the base learners more diverse, which reduces variance and keeps the ensemble from learning overly specific patterns in the training data. Pasting is preferred when the dataset is large, because distinct subsets without duplicates use the data efficiently and help avoid bias or underfitting.**
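
The sampling difference between the two methods can be seen with nothing more than Python's `random` module. This is an illustrative sketch with a toy dataset of 1,000 row indices; the seed, the 80% subset size, and the variable names are arbitrary choices for the example:

```python
import random

rng = random.Random(42)
n_samples = 1000
indices = list(range(n_samples))

# Bagging: draw n_samples indices WITH replacement (a bootstrap sample)
bagging_sample = rng.choices(indices, k=n_samples)

# Pasting: draw a subset WITHOUT replacement (here 80% of the data)
pasting_sample = rng.sample(indices, k=int(0.8 * n_samples))

unique_in_bagging = len(set(bagging_sample))
print("Bagging: unique rows =", unique_in_bagging, "of", n_samples)
print("Pasting: unique rows =", len(set(pasting_sample)), "of", len(pasting_sample))

# Expected fraction of distinct rows in a bootstrap sample: 1 - (1 - 1/n)^n
expected_unique_fraction = 1 - (1 - 1 / n_samples) ** n_samples
print(f"Expected unique fraction for bagging: {expected_unique_fraction:.3f}")
```

A bootstrap sample leaves out roughly a third of the rows, which is what makes out-of-bag evaluation possible. Pasting only produces different subsets when each learner sees fewer rows than the full dataset; drawing all rows without replacement would give every learner identical data.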

3.  **Random Forests: Adding Randomness**

    *   Explain what a **Random Forest** is and how it builds upon the concept of Bagging. What is the additional source of randomness introduced in Random Forests compared to basic Bagging?
        * **Answer: A Random Forest is an ensemble of decision trees built on top of Bagging. In addition to training each tree on a bootstrap sample, it considers only a randomly selected subset of features at each split instead of all features.**

    *   Why is this additional randomness (feature randomness) helpful in Random Forests? How does it contribute to improved model performance and generalization?
        * **Answer: Feature randomness is helpful because it reduces the correlation between the trees and prevents the forest from relying too heavily on a few strong features. Averaging less-correlated trees reduces variance and overfitting, which improves model performance and generalization.**

4.  **Hyperparameters in Random Forests**

    *   Name and briefly describe **two** important hyperparameters in Random Forest models (`RandomForestRegressor` and `RandomForestClassifier`) that we experimented with in this module.
        * **Answer: The two hyperparameters are `n_estimators` and `max_depth`. `n_estimators` is the total number of trees in the forest, while `max_depth` is the maximum depth (number of levels) each tree is allowed to grow to.**

    *   Explain in general terms how increasing the value of `n_estimators` and `max_depth` might affect the performance and complexity of a Random Forest model.
        * **Answer: Increasing `n_estimators` generally improves performance by reducing variance, at the cost of more computation, and the gains level off once the forest is large enough. Increasing `max_depth` makes each tree more complex and lets it capture more intricate patterns, which can lead to overfitting.**

5.  **Random Forest Hyperparameter Tuning: `n_estimators` for Classification**

    *   **Task:** Using the Colab Notebook, revisit Step 4 (Hyperparameter Tuning) focusing on `RandomForestClassifier`. Experiment with a wider range of `n_estimators` values than tested in the notebook (e.g., `n_estimators_vals_clf = [10, 20, 30, 40, 50, 75, 100, 150, 200]`).
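
One possible starting point for this task is sketched below. It is not the notebook's Step 4 cell: the split parameters, `random_state`, and the `*_hw` variable names are assumptions made for the example, so adapt them to the variables already defined in the notebook.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed split for the example; the notebook may use a different one
digits_hw = load_digits()
X_train_hw, X_test_hw, y_train_hw, y_test_hw = train_test_split(
    digits_hw.data, digits_hw.target, test_size=0.25, random_state=42
)

n_estimators_vals_clf = [10, 20, 30, 40, 50, 75, 100, 150, 200]
accuracy_by_n = {}
for n in n_estimators_vals_clf:
    # Train one forest per setting and record its test accuracy
    clf = RandomForestClassifier(n_estimators=n, random_state=42)
    clf.fit(X_train_hw, y_train_hw)
    accuracy_by_n[n] = accuracy_score(y_test_hw, clf.predict(X_test_hw))
    print(f"n_estimators={n:>3}: accuracy={accuracy_by_n[n]:.4f}")
```

The results come from a single train/test split, so small differences between neighboring settings may be noise rather than a real effect.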

---
**Grading:** This homework will be graded based on the completeness and correctness of your answers to the conceptual questions, the successful implementation of the practical exercises, the quality and clarity of your code modifications, and the thoughtfulness and depth of your analysis and discussions.