**Author:** Shahab Fatemi

**Email:** shahab.fatemi@umu.se   ;   shahab.fatemi@amitiscode.com

**Created:** 2025-08-10

**Last update:** 2025-10-02

**MIT License** — Shahab Fatemi (2025); For use in the *Machine Learning in Physics* course, Umeå University, Sweden; See the full license text in the parent folder.

<hr>

📢 <span style="color:red"><strong> Note for Students:</strong></span>

* Before working on the labs, review your lecture notes.

* Please read all sections, code blocks, and comments **carefully** to fully understand the material. Throughout the labs, my instructions are provided to you in written form, guiding you through the materials step-by-step.

* All concepts covered in this lab are part of the course and may be included in the final exam.

* I strongly encourage you to work in pairs and discuss your findings, observations, and reasoning with each other.

* If something is unclear, don't hesitate to ask.

* I have done my best to make the lab files as bug-free (and error-free) as possible, but remember: *there is no such thing as bug-free code.* If you observe any bugs, errors, typos, or other issues, I would greatly appreciate it if you report them to me by email. Verbal notifications do not work, as I will likely forget 🙂

* Your answers for the "⚡ Mandatory" sections of each lab <span style="color:red"><strong>must be submitted before the start of the next lab session</strong></span>.

ENJOY WORKING ON THIS LAB.
***

# 🛠️ Purpose and Learning Outcomes:

- This lab aims to introduce you to ensemble learning techniques. You will learn how to implement and evaluate various ensemble models using Scikit-Learn.

- Our intention in this notebook is to use various ensemble models to solve a multi-feature (multi-dimensional) classification problem.

- You will learn how to cross-validate your models and tune their hyperparameters to improve their performance.

***

In [None]:
import sys
import os
sys.path.append(os.path.abspath('../utils'))
from notebook_config import *

***
# Ensemble Learning

Ensemble learning is a powerful ML technique that enhances the performance of predictive models by combining the predictions of multiple base learners (weak learners). Instead of relying on just one strong model, we use a group of "base/weak learners" and blend their answers. This usually gives us more accurate and reliable results because the strengths of one model can help make up for the weaknesses of another.
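To make "blending their answers" concrete, here is a minimal sketch of one common combination rule: hard (majority) voting. The predictions below are made up for three hypothetical base learners, and `majority_vote` is a helper written only for this illustration. Real ensembles use variations of this idea. Bagging-style methods aggregate by voting or by averaging predicted class probabilities (scikit-learn's Random Forest averages probabilities), while boosting combines its learners with weights.

```python
import numpy as np

# Hypothetical class predictions from three base learners (rows) on four samples (columns)
predictions = np.array([
    [0, 1, 2, 1],   # learner A
    [0, 2, 2, 1],   # learner B
    [1, 1, 2, 0],   # learner C
])

def majority_vote(preds):
    """Return the most frequent class label in each column (ties go to the smallest label)."""
    return np.array([np.bincount(column).argmax() for column in preds.T])

ensemble_prediction = majority_vote(predictions)
print(ensemble_prediction)  # [0 1 2 1]
```

Each single disagreement, by B on sample 1 and by C on samples 0 and 3, is outvoted by the other two learners. This is how one model's weakness can be compensated by the others.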

In this section, we will go over the main ideas behind ensemble learning that we talked about in class, and introduce a few boosting-based models we did not cover in class (XGBoost, LightGBM, and CatBoost). By the end, you will understand how these methods work and how they can be used in real-world ML projects. As I explained in class, some of these models are very popular in practice.
***

- In the code section below, we have a function that generates a dataset for classification, simulating a real-world scenario with multiple features. All generated features are *informative*, meaning they directly contribute to the class labels. In contrast, non-informative features carry no additional information about the labels: *redundant* features are random linear combinations of the informative ones, and any remaining features are pure noise. Here, we want all features to be informative; therefore, `n_informative` is set to the total number of features and `n_redundant` to zero. You can read more about these options in the Scikit-Learn documentation of the `make_classification()` function.

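- Another new aspect of the function below is input validation using `assert`. The `assert` statements act as guardrails: they check that the input parameters meet the preconditions the function needs to operate correctly, and stop with an informative error message otherwise. It is good practice to validate the input parameters of your functions this way. Keep in mind that Python skips `assert` statements when it runs with optimizations enabled (`python -O`), so for validating user input in library code, raising an exception such as `ValueError` is the more robust choice.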
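To see what "redundant" means in practice, here is a small sketch. The parameter values are chosen only for illustration and are not the ones used in this lab. It generates 3 informative and 2 redundant features and checks the rank of the feature matrix. With `shuffle=False`, `make_classification` keeps the informative columns first, followed by the redundant ones.

```python
import numpy as np
from sklearn.datasets import make_classification

# 3 informative + 2 redundant features; shuffle=False keeps the column order
# [informative..., redundant...] so the structure is easy to inspect.
X_demo, y_demo = make_classification(n_samples=500, n_features=5,
                                     n_informative=3, n_redundant=2,
                                     n_repeated=0, shuffle=False,
                                     random_state=0)

# Redundant columns are linear combinations of the informative ones,
# so the 5-column matrix has rank 3: they add no new directions of information.
rank = np.linalg.matrix_rank(X_demo)
print("Matrix rank:", rank)  # 3
```

In our `make_multiclasses_classification` function, all columns are informative, so the same check would return the full number of features.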

In [None]:
from sklearn.datasets import make_classification

# Generate a multi-class classification dataset with Gaussian noise added to the features
def make_multiclasses_classification(n_samples=1000, 
                                     n_features=5  , 
                                     n_classes=3   , 
                                     noise_std=0.3 ,
                                     random_state=42):
    assert n_samples  >= 100, "Number of samples must be at least 100."
    assert n_features >= 2, "Number of features must be at least 2."
    assert n_classes  >= 2 and n_classes <= 5, "Number of classes must be between 2 and 5."
    # make_classification requires n_classes * n_clusters_per_class <= 2**n_informative
    assert n_classes <= 2**n_features, "Too many classes for the given number of features."

    X, y = make_classification(n_samples=n_samples,
                               n_features=n_features,
                               n_informative=n_features,  # all features are informative
                               n_redundant=0,
                               n_classes=n_classes,
                               n_clusters_per_class=1,
                               class_sep=1.5,
                               random_state=random_state)
    
    # Add Gaussian noise to the features (seeded, so that the dataset is reproducible)
    rng = np.random.default_rng(random_state)
    noise = rng.normal(loc=0.0, scale=noise_std, size=X.shape)
    X_noisy = X + noise

    return X_noisy, y

Here, we generate data using our `make_multiclasses_classification` function, and visualize the training dataset. We generate a grid of scatter plots (i.e., a **pairwise scatter plot matrix**) for the training data, where each subplot shows the relationship between two different features. The diagonal subplots display histograms of individual features. In the previous notebook on Decision Trees, we used Seaborn's `pairplot` to create pairwise plots from a Pandas DataFrame. In this notebook, I demonstrate an alternative approach to visualizing pairwise data without relying on the Pandas framework.

In [None]:
from sklearn.model_selection import train_test_split

# ========== MAIN ==========
n_samples  = 2000  # Number of samples
n_features = 5     # Number of features
n_classes  = 3     # Number of classes
noise_std  = 0.1   # Standard deviation of Gaussian noise

# Generate dataset
X, y = make_multiclasses_classification(n_samples, n_features, n_classes, noise_std)

# Split into training and test sets.
# Here, 70% of the data is used for training and 30% for testing;
# 80-20 and 70-30 splits are both common in practice.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(f"Dataset shape: X_train={X_train.shape}, y_train={y_train.shape}, X_test={X_test.shape}, y_test={y_test.shape}")

# ===== plot data =====
fig = plt.figure(figsize=(15, 15))
plt_num = 1  # Initialize plot number

for i in range(n_features):
    for j in range(n_features):
        ax = fig.add_subplot(n_features, n_features, plt_num)
        if i == j:
            # Diagonal: histogram of a single feature
            ax.hist(X_train[:, i], bins=25, color='gray')
        else:
            # Off-diagonal: feature j (x-axis) vs. feature i (y-axis), colored by class
            ax.scatter(X_train[:, j], X_train[:, i], c=np.array(colors)[y_train], s=30, alpha=0.3)

        if i == n_features - 1:   # label the x-axes on the bottom row only
            ax.set_xlabel(f'$x_{{{j}}}$', fontsize=22)

        if j == 0:                # label the y-axes on the first column only
            ax.set_ylabel(f'$x_{{{i}}}$', fontsize=22)

        ax.grid(True)
        plt_num += 1

plt.show()

***
We want to use ensemble models to solve this multi-feature classification problem. Before proceeding with this notebook, make sure you have installed the following Python packages, as they are needed to run the next code sections:

- `xgboost`
- `lightgbm`
- `catboost`

You can install these packages using one of the following methods:

- If you use `pip`:

```bash
pip install xgboost lightgbm catboost
```

- If you use `conda`:

```bash
conda install -c conda-forge xgboost lightgbm catboost
```

In the worst case, if you cannot install these packages, comment out the code that uses them. Do not spend a lot of time trying to install them; if you know your environment, installation should not take more than a minute.
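Before commenting anything out, you can quickly check which of the three packages your environment can find. This is a small sketch using only the standard library. `importlib.util.find_spec` returns `None` when a top-level package is not installed, so nothing is actually imported here.

```python
import importlib.util

# Third-party packages used later in this notebook (not part of scikit-learn)
optional_packages = ["xgboost", "lightgbm", "catboost"]

# find_spec returns None if the package cannot be found in the current environment
available = {name: importlib.util.find_spec(name) is not None for name in optional_packages}

for name, ok in available.items():
    status = "installed" if ok else "MISSING -> comment out the code that uses it"
    print(f"{name:10s}: {status}")
```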

***

Below, I've developed a comprehensive class named `ClassifierComparison` to evaluate and compare various non-parametric classification models. After almost 5 weeks of using Python, I assume you are familiar with this type of code and know how to read and interpret it. However, if you have any questions about the code, please ask me.

The `fit_models` function trains a series of classifiers (Decision Tree, Random Forest, Bagging, AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost) on the training data, measures their training and prediction time (computation run-time), and calculates key performance metrics (accuracy, precision, recall, and F1-score). 

The `print_summary` function displays the model performance metrics, once sorted by accuracy and once sorted by total (training + prediction) time, for easy comparison. The `plot_confusion_matrices` function visualizes the confusion matrix of each trained model, and `plot_decision_boundaries` visualizes the decision boundaries learned by each model for a specified pair of features. To plot a decision boundary on a 2D plane, we need to select two features; the plot then gives a graphical representation of how each classifier separates the classes in that feature space. Please note that our data is multi-dimensional: in these plots, the remaining features are held fixed at zero (their training mean after standard scaling), so each plot shows only a 2D slice through the full feature space.

Carefully study the `ClassifierComparison` class in the code section below. The code is lengthy, but I have added comments to help you understand each part. If you have any questions about the code, please ask me.

In [None]:
import time
import pandas as pd
from matplotlib.colors import ListedColormap
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score, 
                             recall_score, f1_score, 
                             confusion_matrix, ConfusionMatrixDisplay)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier, 
                              AdaBoostClassifier, GradientBoostingClassifier)
from xgboost import XGBClassifier        # Requires installation of the package; not part of scikit-learn
from lightgbm import LGBMClassifier      # Requires installation of the package; not part of scikit-learn
from catboost import CatBoostClassifier  # Requires installation of the package; not part of scikit-learn

# This class is designed to compare various classifiers on a given dataset.
class ClassifierComparison:
    def __init__(self, X, y, test_size=0.3, use_bootstrap=True, random_state=42):
        # Split data stratified by labels
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=random_state)
        
        # Scale features
        scaler       = StandardScaler()
        self.X_train = scaler.fit_transform(X_train)
        self.X_test  = scaler.transform(X_test)

        # Take local copies for later use
        self.y_train = y_train
        self.y_test  = y_test
        self.use_bootstrap = use_bootstrap
        self.models     = {}
        self.results    = {}
        self.results_df = None

    def fit_models(self):
        # Define the classifiers to be compared
        self.models = {
            'Decision Tree': DecisionTreeClassifier(random_state=42),
            'Random Forest': RandomForestClassifier(n_estimators=100, bootstrap=self.use_bootstrap, random_state=42),
            'Bagging' : BaggingClassifier(n_estimators=100, bootstrap=self.use_bootstrap, random_state=42),
            'AdaBoost': AdaBoostClassifier(n_estimators=100, algorithm='SAMME', random_state=42),
            'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
            'XGBoost' : XGBClassifier(n_estimators=100, eval_metric='mlogloss', random_state=42),
            'LightGBM': LGBMClassifier(n_estimators=100, random_state=42),
            'CatBoost': CatBoostClassifier(verbose=0, iterations=100, random_state=42)
        }

        results_list = [] # List to store results for DataFrame

        for name, model in self.models.items():
            # tic-toc the training and prediction time
            # (time.perf_counter() is a high-resolution clock intended for measuring durations)
            start_time_train = time.perf_counter()  # start the timer for training
            model.fit(self.X_train, self.y_train)   # train the model
            end_time_train = time.perf_counter()    # stop the timer for training
            
            start_time_pred = end_time_train        # start the timer for prediction
            y_pred = model.predict(self.X_test)     # make predictions
            end_time_pred = time.perf_counter()     # stop the timer for prediction

            # Calculate metrics and run time
            metrics = {
                'Model': name,
                'Accuracy': accuracy_score(self.y_test, y_pred),
                'Precision': precision_score(self.y_test, y_pred, average='weighted', zero_division=0),
                'Recall': recall_score(self.y_test, y_pred, average='weighted'),
                'F1 Score': f1_score(self.y_test, y_pred, average='weighted'),
                'Training Time (s)': (end_time_train - start_time_train),
                'Prediction Time (s)': (end_time_pred - start_time_pred),
                'Total Time (s)': (end_time_pred - start_time_train)
            }

            self.results[name] = {
                'model': model,
                'confusion_matrix': confusion_matrix(self.y_test, y_pred)
            }

            results_list.append(metrics)

        # Create a DataFrame from the results list
        self.results_df = pd.DataFrame(results_list)

    def print_summary(self):
        print("\n")
        print("------ Results Sorted by Accuracy ------")
        print(self.results_df.sort_values(by='Accuracy', ascending=False).to_string(index=False))

        print("\n")
        print("------ Results Sorted by Total Time ------")
        print(self.results_df.sort_values(by='Total Time (s)', ascending=True).to_string(index=False))

    def plot_confusion_matrices(self):
        for name, result in self.results.items():
            disp = ConfusionMatrixDisplay(confusion_matrix=result['confusion_matrix'])
            disp.plot()
            plt.title(f"Confusion Matrix: {name}")
            plt.show()

    # Plot decision boundaries for 2 features of different classifiers.
    # The default feature indices are (0, 1), but you can specify any two features.
    def plot_decision_boundaries(self, feature_indices=(0, 1)):
        i, j = feature_indices
        if self.X_train.shape[1] < 2:
            print("Decision boundary plot requires at least 2 features.")
            return

        h = 0.01
        x_min, x_max = self.X_train[:, i].min() - 0.5, self.X_train[:, i].max() + 0.5
        y_min, y_max = self.X_train[:, j].min() - 0.5, self.X_train[:, j].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))

        X_vis = np.zeros((xx.size, self.X_train.shape[1]))
        X_vis[:, i] = xx.ravel()
        X_vis[:, j] = yy.ravel()

        for name, result in self.results.items():
            model = result['model']
            # Predict the decision boundary. 
            # We use try-except to handle any potential errors during prediction.
            # Google the "try-except" in Python and learn it.
            try:
                prd = model.predict(X_vis)
                prd = prd.reshape(xx.shape)
                
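                # Create a colormap for the regions (one color per class)
                n_cls = len(np.unique(self.y_train))
                region_cmap = ListedColormap(colors[:n_cls])
                point_colors = np.array(colors)[self.y_train]

                plt.figure()
                # Contour levels centered on the integer class labels, so each class region gets exactly one color
                plt.contourf(xx, yy, prd, levels=np.arange(n_cls + 1) - 0.5, alpha=0.3, cmap=region_cmap)
                plt.scatter(self.X_train[:, i], self.X_train[:, j],
                            c=point_colors, edgecolor='k', s=40, alpha=0.8)
                # f-strings, so the axis labels show the actual feature indices
                plt.xlabel(f"$x_{{{i}}}$", fontsize=14)
                plt.ylabel(f"$x_{{{j}}}$", fontsize=14)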
                plt.title(f"Decision Boundary: {name} (Features x{i} vs x{j})")
                plt.grid(True, linestyle='--', alpha=0.7)
                plt.show()
            except Exception as e:
                print(f"Skipping {name}: failed to predict decision boundary. Error: {e}")

Now we create an instance of the `ClassifierComparison` class from the full dataset `X, y`; the class performs its own stratified train/test split and feature scaling. Then we fit multiple classification models to the training data, evaluate their performance on the test data, display confusion matrices for each model, and plot their decision boundaries using the first two features.

In [None]:
# ===== MAIN =====
n_samples  = 2000  # Number of samples
n_features = 5     # Number of features
n_classes  = 3     # Number of classes
noise_std  = 0.1   # Standard deviation of Gaussian noise

# Generate a multiclass classification dataset
X, y = make_multiclasses_classification(n_samples, n_features, n_classes, noise_std)

# Create an instance of ClassifierComparison and fit the models
clf = ClassifierComparison(X, y)
clf.fit_models()

In [None]:
# Print the summary of results
clf.print_summary()

***

### ✅ Check your understanding

- Carefully study the report above and investigate the accuracy and run time of the different models. Which model is the fastest? Which model is the slowest? Which model has the highest accuracy? Is there any trade-off between accuracy and run time?

⚠️ Do not generalize your answers. Your answers are specific to this dataset and the parameters used in the models.

***

You can also plot the confusion matrix for each model using the `plot_confusion_matrices` function, but I've commented it out because it does not add much value: we already have the accuracy, precision, recall, and F1-score metrics in the summary table. If you want to see the confusion matrices, uncomment the line below.

In [None]:
# Plot confusion matrices for each model
# clf.plot_confusion_matrices()

In [None]:
# Plot decision boundaries obtained from each model for the first two features
clf.plot_decision_boundaries((0, 1))

***
### ✅ Check your understanding

- Increase the number of input features from 5 to 50 and re-run the classifier comparison. What changes do you observe in model performance and computation time? Which classifier handles the increased feature space most effectively? I suggest writing a separate code block for this, because you want to compare the results with the previous ones.

- In our class, we used `StandardScaler()` for feature scaling. Read the Scikit-Learn documentation of the `sklearn.preprocessing` module, where you will find several other feature scaling methods (e.g., `MinMaxScaler`). Read about them and think about which scaler is most suitable for your data. You can also try different scalers and see how they affect model performance. This is indeed a difficult task.
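If you want to try several scalers systematically, here is a minimal sketch of one way to do it. It assumes a small synthetic stand-in dataset and a Random Forest, not the exact setup of our class. Each scaler is wrapped together with the model in a `Pipeline` via `make_pipeline`, so the scaler is fitted only on the training folds during cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Synthetic stand-in for the lab data (illustrative assumption)
X_demo, y_demo = make_classification(n_samples=600, n_features=5, n_informative=5,
                                     n_redundant=0, n_classes=3,
                                     n_clusters_per_class=1, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    # The scaler inside the pipeline is re-fitted on each training fold only
    pipe = make_pipeline(scaler, RandomForestClassifier(n_estimators=100, random_state=0))
    scores[type(scaler).__name__] = cross_val_score(pipe, X_demo, y_demo, cv=cv).mean()

for name, score in scores.items():
    print(f"{name:15s} mean CV accuracy: {score:.3f}")
```

Tree-based models split on feature thresholds, so a per-feature linear rescaling like these three scalers usually leaves their accuracy (nearly) unchanged. Distance- or gradient-based models are typically much more sensitive to the choice of scaler.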

***

### ⚠️⚠️ IMPORTANT ⚠️⚠️
Generally speaking, you see how many different parameters and choices you have in building a machine learning model. This is why machine learning is more of an art than a science. There is no one-size-fits-all solution. You need to experiment and find what works best for your specific problem and dataset.
***

## Room for Improvement?

Are we happy with the results we obtained above? Can we do better?

Of course we can! Before that, let's recap two techniques we can use to improve our models' performance: cross-validation and hyperparameter tuning.

### Cross-Validation

In case you forgot, cross-validation is a fundamental technique for evaluating and improving the generalizability of ML models. It plays a key role in ensuring that your model performs well not just on the training data, but also on new, unseen data. For classification problems, `StratifiedKFold` is highly recommended, especially when the dataset has imbalanced classes, because it keeps the class proportions approximately the same in every fold. See your earlier lecture notes or check out [this link](https://scikit-learn.org/stable/modules/cross_validation.html).
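As a quick refresher, the sketch below runs stratified 5-fold cross-validation on an imbalanced toy dataset. The class weights and model are chosen only for illustration. It prints the class counts in each validation fold, showing that the proportions are preserved, and then uses `cross_val_score` to train and score the model once per fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Imbalanced 3-class toy data (roughly 70% / 20% / 10%); values chosen for illustration
X_demo, y_demo = make_classification(n_samples=500, n_features=5, n_informative=5,
                                     n_redundant=0, n_classes=3, n_clusters_per_class=1,
                                     weights=[0.7, 0.2, 0.1], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each validation fold keeps (almost) the same class proportions as the full dataset
for fold, (train_idx, val_idx) in enumerate(skf.split(X_demo, y_demo)):
    print(f"Fold {fold}: validation class counts = {np.bincount(y_demo[val_idx], minlength=3)}")

# cross_val_score trains the model on 4 folds and scores it on the 5th, five times
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X_demo, y_demo, cv=skf)
print(f"Accuracy per fold: {np.round(scores, 3)}")
print(f"Mean +/- std: {scores.mean():.3f} +/- {scores.std():.3f}")
```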

### Hyperparameter Tuning: Optimizing model performance

Hyperparameters are settings that define how a model operates, such as the maximum depth of a decision tree, the number of estimators in an ensemble method, or the learning rate for gradient boosting. Choosing the right combination of hyperparameters can significantly impact a model's accuracy, stability, and overall effectiveness.

Rather than relying on default settings or trial-and-error, hyperparameter tuning systematically searches for the best configuration. One popular tool for this task is Scikit-Learn's `GridSearchCV`, which combines exhaustive search with cross-validation. You define a "grid" of hyperparameter values you want to test; for each combination in the grid, the model is trained and validated using cross-validation (e.g., `StratifiedKFold`). By evaluating performance across all combinations, `GridSearchCV` identifies the hyperparameter settings that give the best results on the chosen metric, such as accuracy, precision, or (for regression) mean squared error.
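Here is a minimal sketch of `GridSearchCV` on a single model before we scale it up to many classifiers. The toy data and the grid values are illustrative assumptions. A grid of 4 × 3 = 12 combinations evaluated with 5-fold cross-validation means 60 fits. By default (`refit=True`), one additional fit retrains the best combination on all the data passed to `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=500, n_features=5, n_informative=5,
                                     n_redundant=0, n_classes=3, n_clusters_per_class=1,
                                     random_state=0)

# The "grid": every combination below (4 x 3 = 12) is evaluated with 5-fold CV
param_grid = {
    'max_depth': [None, 3, 5, 10],
    'min_samples_split': [2, 5, 10],
}

grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                           scoring='accuracy',
                           cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
grid_search.fit(X_demo, y_demo)   # 12 combinations x 5 folds = 60 fits, plus 1 refit

print("Best parameters:", grid_search.best_params_)
print(f"Best mean CV accuracy: {grid_search.best_score_:.3f}")
print("Number of candidates evaluated:", len(grid_search.cv_results_['params']))  # 12
```

`grid_search.best_estimator_` is the refitted best model, ready for evaluation on a separate test set.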

The combination of cross-validation and hyperparameter tuning is a cornerstone of **modern machine learning workflows (pipelines)**. Together, these techniques make sure that your model is not only well evaluated but also fine-tuned for optimal performance. By investing time in these steps, you can build models that generalize well to new data, avoid pitfalls like overfitting or underfitting, and achieve better results across the board.



***

### ⚠️⚠️ REMINDER ⚠️⚠️
Generally speaking, you see how many different parameters and choices you have in building a machine learning model. This is why machine learning is more of an art than a science. There is no one-size-fits-all solution. You need to experiment and find what works best for your specific problem and dataset.


⚠️ NOTE:
**You need to perform cross-validation and hyperparameter tuning to improve your model's performance for your FINAL, practical project. Write it down in your to-do list, and remember it, not only for your project, but for any future machine learning tasks you work on.**

***


In the code section below, I have developed a modified version of the `ClassifierComparison` class, named `ClassifierComparisonOpt` (for "optimized"). This class integrates both cross-validation and hyperparameter tuning to give a more accurate and fair comparison across popular classifiers like Decision Trees, Random Forests, XGBoost, and others. I intentionally leave out LightGBM and CatBoost to reduce the execution time, and I recommend not including them in this lab.

Our new class automatically scales features, runs a grid search over predefined hyperparameter ranges using k-fold cross-validation, and evaluates the best model on a hold-out test set. This way, each model is tuned and validated properly before we compare their accuracy, precision, recall, and F1 score. The class also stores the confusion matrix of each model and can plot feature importances.

First, focus on the `get_models_with_params` function in the `ClassifierComparisonOpt` class. The function defines a dictionary of popular classifiers paired with their hyperparameter grids (not all hyperparameters of each model, but the most important ones). Each entry consists of a model instance and a dictionary specifying the range of values to explore for key hyperparameters like `max_depth`, `n_estimators`, `learning_rate`, or `min_samples_split`. This setup allows the class to systematically apply `GridSearchCV` to each model, so that every algorithm is tuned over a meaningful range of settings for a fair and optimized comparison.

Then, the `fit_models` function performs hyperparameter tuning using `GridSearchCV` with stratified k-fold cross-validation for each classifier. By default, `GridSearchCV` refits the best configuration on the whole training set; `fit_models` then evaluates this best model on the test set and records its performance metrics and the best parameters found. Note that the reported training time includes the entire grid search, not only the final fit.

In [None]:
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# This is an optimized version of the ClassifierComparison class that includes 
# additional features such as cross-validation, hyperparameter tuning, and more detailed metrics. 
# It is designed to handle larger datasets and provide a more comprehensive analysis of classifier performance.
# -----------------------------------------------------------------------------------------------------------
# Feel free to use it in your projects, with some modifications e.g., with classifiers and hyperparameters.
# -----------------------------------------------------------------------------------------------------------
class ClassifierComparisonOpt:
    def __init__(self, X, y, test_size=0.3, use_bootstrap=True, random_state=42, cv_folds=5):
        # Split data stratified by labels
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=random_state)

        # Scale features
        scaler = StandardScaler()
        self.X_train = scaler.fit_transform(X_train)
        self.X_test  = scaler.transform(X_test)

        self.y_train       = y_train
        self.y_test        = y_test
        self.use_bootstrap = use_bootstrap
        self.cv_folds      = cv_folds
        self.models        = {}
        self.results       = {}
        self.results_df    = None

    # Define the classifiers and their hyperparameters
    def get_models_with_params(self):
        return {
            'Decision Tree': (DecisionTreeClassifier(random_state=42), {
                'max_depth': [None, 3, 5, 10, 20],
                'min_samples_split': [2, 5, 7, 10]
            }),
            'Random Forest': (RandomForestClassifier(bootstrap=self.use_bootstrap, random_state=42), {
                'n_estimators': [50, 100, 200],
                'max_depth': [None, 3, 5, 10, 20],
                'min_samples_split': [2, 5, 7, 10]
            }),
            'Bagging': (BaggingClassifier(bootstrap=self.use_bootstrap, random_state=42), {
                'n_estimators': [50, 100, 200],
                'max_samples': [0.5, 1.0],
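                # oob_score only adds an out-of-bag accuracy estimate; it does not change the fitted
                # model's predictions, so tuning it doubles the number of Bagging candidates
                # (and oob_score=True fails when bootstrap=False).
                'oob_score': [True, False],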
            }),
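            # SAMME: Stagewise Additive Modeling using a Multi-class Exponential loss function.
            # In recent scikit-learn versions, SAMME is the only option and the `algorithm`
            # parameter is deprecated; remove it if your version warns about it or rejects it.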
            'AdaBoost': (AdaBoostClassifier(algorithm='SAMME', random_state=42), {
                'n_estimators': [50, 100, 200],
                'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0]
            }),
            'Gradient Boosting': (GradientBoostingClassifier(random_state=42), {
                'n_estimators': [50, 100, 200],
                'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
                'max_depth': [3, 5, 10]
            }),
            'XGBoost': (XGBClassifier(eval_metric='mlogloss', random_state=42), {
                'n_estimators': [50, 100, 200],
                'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
                'max_depth': [3, 5, 10]
            }),
            #'LightGBM': (LGBMClassifier(random_state=42), {
            #    'n_estimators': [50, 100, 200],
            #    'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
            #    'max_depth': [-1, 5, 10]
            #}),
            #'CatBoost': (CatBoostClassifier(verbose=0, random_state=42), {
            #    'iterations': [50, 100, 200],
            #    'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
            #    'depth': [4, 6, 10]
            #})
        }

    def fit_models(self):
        results_list = []
        cv = StratifiedKFold(n_splits=self.cv_folds, shuffle=True, random_state=42)
        models_with_params = self.get_models_with_params()

        for name, (model, param_grid) in models_with_params.items():
            print(f"Tuning {name} ...")
            
            grid_search = GridSearchCV(model, param_grid, scoring='accuracy', cv=cv, n_jobs=-1)
            
            start_train = time.time()
            grid_search.fit(self.X_train, self.y_train)
            end_train = time.time()

            best_model = grid_search.best_estimator_
            y_pred = best_model.predict(self.X_test)
            end_pred = time.time()

            self.models[name] = {
                'model': best_model,
                'confusion_matrix': confusion_matrix(self.y_test, y_pred)
            }

            metrics = {
                'Model': name,
                'Accuracy': accuracy_score(self.y_test, y_pred),
                'Precision': precision_score(self.y_test, y_pred, average='weighted', zero_division=0),
                'Recall': recall_score(self.y_test, y_pred, average='weighted'),
                'F1 Score': f1_score(self.y_test, y_pred, average='weighted'),
                'Best Params': grid_search.best_params_,
                'Training Time (s)': (end_train - start_train),
                'Prediction Time (s)': (end_pred - end_train),
                'Total Time (s)': (end_pred - start_train)
            }

            results_list.append(metrics)

        self.results_df = pd.DataFrame(results_list)

    def print_summary(self):
        print("\n------ Results Sorted by Accuracy ------")
        print(self.results_df.sort_values(by='Accuracy', ascending=False).to_string(index=False))

        print("\n------ Results Sorted by Total Time ------")
        print(self.results_df.sort_values(by='Total Time (s)', ascending=True).to_string(index=False))

    # Show feature importance for models that support it          
    def show_feature_importance(self):
        importance = {}

        for name, result in self.models.items():
            model = result['model']
            if hasattr(model, 'feature_importances_'):
                importance[name] = model.feature_importances_
            elif hasattr(model, 'coef_'):
                coef = model.coef_
                if coef.ndim == 1:
                    importance[name] = np.abs(coef)
                else:
                    importance[name] = np.mean(np.abs(coef), axis=0)
            else:
                print(f"Feature importance not available for model {name}")

        for name, imp in importance.items():
            sorted_idx = np.argsort(imp)[::-1]
            plt.figure()
            plt.bar(range(len(imp)), imp[sorted_idx], align='center')
            plt.xticks(range(len(imp)), sorted_idx)
            plt.title(f"Feature importance for {name}")
            plt.xlabel("Feature index")
            plt.ylabel("Importance score")
            plt.grid(True, linestyle='--', alpha=0.6)
            plt.show()

The code section below may take a while to run, depending on your machine's performance and the number of hyperparameter combinations being tested. Please be patient while it executes. On my office desktop, it takes ~27 seconds, and the XGBoost model is the most time-consuming of the listed models. You can also exclude it from the list of models to reduce the execution time.
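To see where the run time comes from, you can count the number of model fits the grid search performs. The sketch below copies the grids from `get_models_with_params` and uses Scikit-Learn's `ParameterGrid` to count the combinations. It assumes the default `cv_folds=5` and one final refit per model, as `GridSearchCV` does by default.

```python
from sklearn.model_selection import ParameterGrid

# Copies of the hyperparameter grids in ClassifierComparisonOpt.get_models_with_params
grids = {
    'Decision Tree':     {'max_depth': [None, 3, 5, 10, 20], 'min_samples_split': [2, 5, 7, 10]},
    'Random Forest':     {'n_estimators': [50, 100, 200], 'max_depth': [None, 3, 5, 10, 20],
                          'min_samples_split': [2, 5, 7, 10]},
    'Bagging':           {'n_estimators': [50, 100, 200], 'max_samples': [0.5, 1.0],
                          'oob_score': [True, False]},
    'AdaBoost':          {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0]},
    'Gradient Boosting': {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
                          'max_depth': [3, 5, 10]},
    'XGBoost':           {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2, 0.5, 1.0],
                          'max_depth': [3, 5, 10]},
}
cv_folds = 5  # default cv_folds in ClassifierComparisonOpt

n_fits = {}
for name, grid in grids.items():
    n_candidates = len(ParameterGrid(grid))      # number of hyperparameter combinations
    n_fits[name] = n_candidates * cv_folds + 1   # one fit per fold, plus the final refit
    print(f"{name:18s}: {n_candidates:3d} combinations -> {n_fits[name]:4d} fits")

print("Total model fits:", sum(n_fits.values()))  # 991
```

Random Forest has the most combinations, while each Gradient Boosting or XGBoost fit builds its trees sequentially. Shrinking any of these grids is the most direct way to cut the run time.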

In [None]:
# ===== MAIN =====
# Tune the classifiers and find their best hyperparameters
clf_opt = ClassifierComparisonOpt(X, y)
clf_opt.fit_models()

In [None]:
# Print the summary of results
clf_opt.print_summary()

In [None]:
# Show feature importance for each model
clf_opt.show_feature_importance()

***
### ⚡ Mandatory submission

- Study the results obtained from the optimized classifier comparison. Compare these results with those from the initial comparison without cross-validation and hyperparameter tuning. Which models showed the most improvement? How did the computation time change? Briefly discuss your observations.

- What do the feature importance plots tell you? You can read about feature importance in Scikit-Learn [here](https://scikit-learn.org/stable/modules/ensemble.html#feature-importance).

***

## Parallelization

Parallelization in ML allows multiple computations to run simultaneously, which can significantly speed up training and evaluation, especially when working with many classifiers or large datasets.

Some classifiers, such as `RandomForestClassifier` and `BaggingClassifier`, natively support parallel training through the `n_jobs` parameter, because their base estimators are trained independently of each other. Scikit-Learn's `GradientBoostingClassifier` has no `n_jobs` parameter, since boosting builds its trees one after another; XGBoost and LightGBM instead parallelize the construction of each tree and also expose an `n_jobs` parameter. If `n_jobs=k`, the computations are partitioned into `k` parallel jobs that run on `k` CPU cores of the machine; if `n_jobs=-1`, all available cores are used. Note that because of inter-process communication overhead, the speed-up is usually not linear (i.e., using `k` jobs will unfortunately not be `k` times as fast). A significant speed-up can still be achieved when building a large number of trees, or when building a single tree takes a fair amount of time (e.g., on large datasets).

When tuning hyperparameters across multiple models, `GridSearchCV` also supports parallelization with `n_jobs=-1`, which trains and evaluates multiple parameter combinations at once. You can see this setting in the `GridSearchCV` call of the `ClassifierComparisonOpt` class we developed above.
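A small experiment can show the effect of `n_jobs` on your own machine. The sketch below trains the same Random Forest with one core and with all cores and reports the timings. The dataset size and number of trees are illustrative assumptions, and the speed-up you see depends on your CPU.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, large enough for parallelism to matter a little (illustrative sizes)
X_demo, y_demo = make_classification(n_samples=2000, n_features=20, random_state=0)

timings = {}
for n_jobs in (1, -1):   # 1 = single core, -1 = all available cores
    model = RandomForestClassifier(n_estimators=200, n_jobs=n_jobs, random_state=0)
    start = time.perf_counter()
    model.fit(X_demo, y_demo)
    timings[n_jobs] = time.perf_counter() - start
    print(f"n_jobs={n_jobs:>2}: {timings[n_jobs]:.2f} s")

# Usually smaller than the number of cores, because of parallelization overhead
print(f"Speed-up: {timings[1] / timings[-1]:.1f}x")
```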

***
# ⛷️ Exercise

In this exercise, you should work on the **Breast Cancer** dataset from `sklearn.datasets.load_breast_cancer` to build, train, and evaluate a classifier. Your goal is to classify breast cancer cases as **malignant** (cancer) or **benign** (no cancer) using the provided features.

Carefully analyze the dataset, train your model on the training data, and evaluate its performance using appropriate metrics.

Here are some hints and todos:
- How to load the dataset:
```python
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X, y = data.data, data.target
```

- Explore the dataset:
    * Check the shape of X and y to understand the dimensions.
    * Display the feature names and target classes to familiarize yourself with the data.
    * Ensure there are no missing values or anomalies in the data.

- Split and scale the data (and impute missing values, if needed), similar to what we did in `ClassifierComparisonOpt`.

- Use different classifiers and experiment with hyperparameters to improve the model's performance. Use what you learned earlier from the `ClassifierComparisonOpt` class.

- Finally, evaluate the model using the following metrics:
    * Confusion Matrix
    * Classification Report
    * Accuracy Score
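For the "Explore the dataset" step, a minimal starting point could look like the sketch below. It covers only the exploration, not the modeling. In this dataset, `target_names` lists `'malignant'` first, so label 0 means malignant and label 1 means benign. Keep this in mind when you read the confusion matrix and classification report.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

print("X shape:", X.shape, "| y shape:", y.shape)
print("Target classes:", list(data.target_names))      # label 0 -> 'malignant', 1 -> 'benign'
print("First 5 feature names:", list(data.feature_names[:5]))
print("Missing values in X:", np.isnan(X).sum())
print("Samples per class:", dict(zip(data.target_names, np.bincount(y))))
```

The classes are imbalanced (about 63% benign), which is one more reason to use a stratified split and to look beyond accuracy alone.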
***
END
***