# Compare Classifier Package Tutorial
A short tutorial on comparing the performance of multiple classifiers, both individually and combined through Voting or Stacking ensembles. **Please note this package only works with scikit-learn models.**

## The Prerequisites

### An Example Dataset

Here is the example dataset we will use:

![example_dataset](./img/dataset.png "Example Dataset")

This dataset contains numerical physicochemical features of wines, the wine color (column `is_red`: 1 = red; 0 = white), and a target column, `quality`, representing the wine quality score. We are training models to predict the quality score, an integer from 3 (lowest quality) to 9 (highest quality).

We recommend running the package functions in the order demonstrated in this tutorial so that you can make an informed decision when choosing a final model for your classification task: first check confusion matrices and F1 scores for the existing models, then compare F1 scores using ensembles.

### Step 1: Train + Test Data Split

First, you will need to split the dataset into `X_train`, `y_train`, `X_test` and `y_test` based on the feature and target columns. Here are some [instructions on data splitting](https://www.geeksforgeeks.org/how-to-split-the-dataset-with-scikit-learns-train_test_split-function/), if you need them.
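A minimal sketch of the split, assuming the data is loaded into a pandas DataFrame with the `quality` target column described above. The tiny synthetic frame, the `alcohol` column name, and the `test_size` and `random_state` values are placeholders for illustration, not part of the package:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny synthetic stand-in for the wine data (placeholder values only).
wine = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 9.0, 10.1, 11.8, 12.4, 10.9],
    "is_red": [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    "quality": [5, 5, 6, 6, 7, 5, 6, 7, 7, 6],
})

X = wine.drop(columns=["quality"])  # feature columns
y = wine["quality"]                 # target column

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)
print(X_train.shape, X_test.shape)
```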

### Step 2: Prepare Your Models

Then, you will need to build your scikit-learn machine learning models and perform [hyperparameter tuning](https://scikit-learn.org/1.5/modules/grid_search.html) to make sure they perform as well as they can. We will assume your final candidates are as follows:

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

logreg = LogisticRegression(multi_class='multinomial', solver='lbfgs', C=92)
gb = GradientBoostingClassifier(learning_rate=0.001, n_estimators=1000)
svm = SVC(kernel='rbf', decision_function_shape='ovr', max_iter=2000)
rf = RandomForestClassifier(n_estimators=10)
knn5 = KNeighborsClassifier(n_neighbors=5)
```

Our package functions accept the models as a list of `(name, model)` tuples, so we will need to convert them into something like this:
```python
multi_ind = [
    ('logreg', logreg),
    ('gb', gb),
    ('svm', svm),
    ('rf', rf),
    ('knn5', knn5)
]
```

Note that the names you give the models will appear as labels in function outputs, so if you'd like to see something more descriptive (for example, `Logistic Regression (C=92)` instead of `logreg` as we have above), please rename your models here.

> _**A Note on Pipelines:**_ For simplicity's sake, all the models used in this tutorial are individual sklearn classifiers. However, our functions also accept sklearn pipelines, so the following is valid input as well.

```python
from sklearn.preprocessing import RobustScaler
from sklearn.pipeline import make_pipeline

# Wrap each classifier defined above in a pipeline that scales the features first.
pipe_svm = make_pipeline(RobustScaler(), svm)
pipe_rf = make_pipeline(RobustScaler(), rf)
pipe_knn5 = make_pipeline(RobustScaler(), knn5)
pipe_gb = make_pipeline(RobustScaler(), gb)
pipe_logreg = make_pipeline(RobustScaler(), logreg)

multi_pipe = [
    ('pipe_svm', pipe_svm),
    ('pipe_rf', pipe_rf),
    ('pipe_knn5', pipe_knn5),
    ('pipe_gb', pipe_gb),
    ('pipe_logreg', pipe_logreg)
]
```

## Let's Get Started!

### Step 1: Import Package Functions

First, let's import the package functions.

```python
from compare_classifiers.confusion_matrices import confusion_matrices
from compare_classifiers.compare_f1 import compare_f1
from compare_classifiers.ensemble_compare_f1 import ensemble_compare_f1
```

### Step 2: Compare Confusion Matrices for Candidate Models

A [confusion matrix](https://www.geeksforgeeks.org/confusion-matrix-machine-learning/) evaluates a classification model by comparing its predicted values against the actual values. It gives a detailed breakdown of correct and incorrect predictions for each class, which allows a deeper analysis of the model's strengths and weaknesses than overall accuracy alone. We can take the models in `multi_ind` from the **Prerequisites - Step 2: Prepare Your Models** section and compare their confusion matrices side by side.

```python
confusion_matrices(multi_ind, X_train, X_test, y_train, y_test)
```

You will see output like the following:

![cm1](./img/cm1.png "cm1")
![cm2](./img/cm2.png "cm2")
![cm3](./img/cm3.png "cm3")
![cm4](./img/cm4.png "cm4")
![cm5](./img/cm5.png "cm5")
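
To make the plots easier to read, here is a minimal sketch of what a single confusion matrix contains, using scikit-learn's `confusion_matrix` directly on made-up labels (the values below are illustrative, not results from the wine data). In scikit-learn's convention, rows are actual classes and columns are predicted classes, so the diagonal holds the correct predictions:

```python
from sklearn.metrics import confusion_matrix

# Made-up actual and predicted quality scores for six wines.
y_true = [5, 5, 6, 6, 6, 7]
y_pred = [5, 6, 6, 6, 5, 7]

# Fix the label order so rows and columns correspond to classes 5, 6, 7.
cm = confusion_matrix(y_true, y_pred, labels=[5, 6, 7])
print(cm)
# [[1 1 0]
#  [1 2 0]
#  [0 0 1]]
```

Here the middle row says that, of the three wines whose actual score is 6, one was predicted as 5 and two were predicted correctly.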

### Step 3: Compare Fit Time and F1 Scores for Candidate Models

[F1 score](https://www.v7labs.com/blog/f1-score-guide) is a machine learning metric that combines a model's precision and recall into a single number by taking their harmonic mean. It is commonly used as an evaluation metric in binary and multi-class classification. Our package uses the macro-averaged F1 score (or macro F1 score), computed as the arithmetic (unweighted) mean of all the per-class F1 scores. This method treats all classes equally regardless of their support (the number of samples in each class).

Let's take a look at the fit times and F1 scores using 5-fold cross-validation.
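
As a worked example of the macro average, the sketch below computes the per-class F1 by hand for a small set of made-up labels and then takes the unweighted mean. The helper name `macro_f1` is hypothetical and not part of the package:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        # Harmonic mean of precision and recall, written in terms of counts.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


y_true = [5, 5, 6, 6, 6, 7]
y_pred = [5, 6, 6, 6, 5, 7]

# Per-class F1: class 5 -> 0.5, class 6 -> 0.667, class 7 -> 1.0
print(round(macro_f1(y_true, y_pred), 3))  # 0.722
```

Class 7 has a single sample but contributes as much to the average as class 6, which is what "treats all classes equally" means in practice.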

```python
compare_f1(multi_ind, X_train, y_train)
```

You will see output like the following:

![f1](./img/f1.png "f1")

_(Please excuse the low test scores! Due to the limited time for the project, I only trained the models on the first 50 rows of the data using random hyperparameters, so I was not expecting impressive results.)_
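
For readers who want to sanity-check such numbers outside the package, the following is a rough sketch using scikit-learn's `cross_validate` with `scoring="f1_macro"`. It is an assumption about a comparable setup, not the package's actual implementation, and it uses a synthetic dataset from `make_classification` as a stand-in for the wine data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-class data (placeholder for X_train, y_train).
X_train, y_train = make_classification(
    n_samples=150, n_features=8, n_informative=5, n_classes=3, random_state=0
)

models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn5", KNeighborsClassifier(n_neighbors=5)),
]

rows = []
for name, model in models:
    # 5-fold cross-validation, scoring each fold with macro F1.
    cv = cross_validate(
        model, X_train, y_train, cv=5, scoring="f1_macro", return_train_score=True
    )
    rows.append({
        "model": name,
        "fit_time": cv["fit_time"].mean(),
        "train_f1": cv["train_score"].mean(),
        "test_f1": cv["test_score"].mean(),
    })

results = pd.DataFrame(rows).set_index("model")
print(results)
```

A large gap between `train_f1` and `test_f1` for a model is a common sign of overfitting.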

### Step 4: Compare Fit Time and F1 Scores Through Voting and Stacking Ensembles

The goal of sklearn [ensemble methods](https://medium.com/@abhishekjainindore24/different-types-of-ensemble-techniques-bagging-boosting-stacking-voting-blending-b04355a03c93) is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator. There are different types of combination methods to build ensembles and each one yields slightly different results. Our package has two ensembles available from sklearn: [Voting](https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.VotingClassifier.html) and [Stacking](https://scikit-learn.org/dev/modules/generated/sklearn.ensemble.StackingClassifier.html) with default settings. 
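
As a rough illustration of the two ensembles, the sketch below builds a `VotingClassifier` and a `StackingClassifier` with default settings from a list of `(name, model)` tuples and scores both with 5-fold cross-validated macro F1. The synthetic data and the untuned models are assumptions for illustration; the package may wire things up differently:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic 3-class data (placeholder for the wine training set).
X, y = make_classification(
    n_samples=150, n_features=8, n_informative=5, n_classes=3, random_state=0
)

estimators = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("svm", SVC(kernel="rbf")),
    ("knn5", KNeighborsClassifier(n_neighbors=5)),
]

ensembles = {
    # Default voting is "hard": each model casts one vote, majority wins.
    "voting": VotingClassifier(estimators=estimators),
    # Default final estimator is a LogisticRegression trained on the
    # base models' cross-validated predictions.
    "stacking": StackingClassifier(estimators=estimators),
}

scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in ensembles.items()
}
print(scores)
```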

Assume we've chosen `logreg`, `svm` and `knn5` as our finalists because they show the least overfitting. We can then look at their performance with both ensemble methods, again using 5-fold cross-validation.

```python
finalists = [
    ('logreg', logreg),
    ('svm', svm),
    ('knn5', knn5)
]

ensemble_compare_f1(finalists, X_train, y_train)
```

You will see output like the following:

![ensemble_diff](./img/ensemble_diff.png "ensemble_diff")

As we can see, Voting is clearly the winner!

### Step 5: Compare Ensemble Results With Individual Classifier Results

This step does not require running any package functions, but we believe it is an essential step in comparing model performance. Since the Voting F1 score does not show a significant increase over some of our classifiers' individual F1 scores (see the results from Step 3 above), we might want to assess whether to go forward with an ensemble model or without one.

**Now, if you have decided to go with one of the individual classifiers you built (let's say `logreg` in our example, since its test score, 0.21, is higher than the Voting ensemble's, 0.19, and it does not show much sign of overfitting), then you have already found your best model!**

However, if you have decided to use an ensemble to predict instead, then in the next section, we will demonstrate how to use the Voting ensemble method to predict the target classes. 

## (Optional) Predict Using Ensemble

### Predict Unseen Data With Chosen Ensemble Method

Now it's time to put our winning ensemble method to the test and have it predict some test or unseen data.


```python
from compare_classifiers.ensemble_predict import ensemble_predict

ensemble_predict(finalists, X_train, y_train, 'voting', unseen_data)
```

> _**Note:**_ Since Voting won in our example, we pass `'voting'` as the fourth argument of the function. If you would like to use Stacking as the ensemble method instead, pass `'stacking'`.

You will see output like the following:

![predict](./img/predict.png "predict")

These are the wine quality scores we get on the unseen data with our Voting ensemble classifier from the three finalist models we defined in Step 4!
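
For reference, a comparable prediction can be sketched with scikit-learn alone by fitting a `VotingClassifier` on the training data and calling `predict` on new rows. This is an assumed equivalent, not the package's implementation; the synthetic data and the `unseen_data` slice are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-class data; the last 20 rows play the role of unseen data.
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, y_train = X[:100], y[:100]
unseen_data = X[100:]

voting = VotingClassifier(estimators=[
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn5", KNeighborsClassifier(n_neighbors=5)),
])
voting.fit(X_train, y_train)

# One predicted class label per unseen row.
predictions = voting.predict(unseen_data)
print(predictions[:5])
```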