# Lab 5: Comparing Machine Learning Models for Predicting Red Wine Quality 🍷

**Author:** 🧑‍💻 Justin Schroder

**Date:** 📅 4/9/2025

## 📝 Introduction

This project compares machine learning models for predicting red wine quality. We train Random Forest and AdaBoost classifiers, then compare them with a Gradient Boosting model and an MLP classifier from peer projects, to find the model that predicts accurately and generalizes well to new data.

---


## Section 1. Import and Inspect the Data
In the code cell below, import the necessary Python libraries for this notebook.  

### 1.1 Imports

In [184]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,
    GradientBoostingClassifier,
    BaggingClassifier,
    VotingClassifier,
)
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

### 1.2 Load the dataset

In [185]:
# Load the wine quality dataset
df = pd.read_csv("C:/Projects/applied-ml-justin/lab05/wine-dataset/winequality-red.csv", sep=";")

# Display structure and first few rows
df.info()
df.head()

# The dataset includes 11 physicochemical input variables (features):
# ---------------------------------------------------------------
# - fixed acidity          mostly tartaric acid
# - volatile acidity       mostly acetic acid (vinegar)
# - citric acid            can add freshness and flavor
# - residual sugar         remaining sugar after fermentation
# - chlorides              salt content
# - free sulfur dioxide    protects wine from microbes
# - total sulfur dioxide   sum of free and bound forms
# - density                related to sugar content
# - pH                     acidity level (lower = more acidic)
# - sulphates              antioxidant and microbial stabilizer
# - alcohol                % alcohol by volume

# The target variable is:
# - quality (integer score from 0 to 10, rated by wine tasters)

# We will simplify this target into three categories:
#   - low (3–4), medium (5–6), high (7–8) to make classification feasible.
#   - we will also make this numeric (we want both for clarity)
# The dataset contains 1599 samples and 12 columns (11 features + target).

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


--- 

## Section 2: Prepare the Data

In [186]:
# Helper function to map quality scores to string labels
def quality_to_label(q):
    if q <= 4:
        return "low"
    elif q <= 6:
        return "medium"
    else:
        return "high"

In [187]:
# Create a new column with quality labels
df["quality_label"] = df["quality"].apply(quality_to_label)

# Helper function to map quality scores to numeric labels
def quality_to_number(q):
    if q <= 4:
        return 0
    elif q <= 6:
        return 1
    else:
        return 2

In [188]:
# Create a new column with numeric quality categories
df["quality_numeric"] = df["quality"].apply(quality_to_number)
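The two helper functions above apply the same thresholds twice. As a minimal sketch, assuming the same cut points (4 and 6), the binning can also be expressed with `pd.cut`, which yields the string label and the numeric code from a single definition. The `sample` series below is hypothetical and only illustrates the mapping; it is not drawn from the dataset.

```python
import pandas as pd

# Hypothetical raw quality scores, used only to illustrate the mapping
sample = pd.Series([3, 4, 5, 6, 7, 8], name="quality")

# Bin edges mirror the helpers above: <= 4 is low, 5-6 is medium, >= 7 is high
labels = pd.cut(
    sample,
    bins=[0, 4, 6, 10],
    labels=["low", "medium", "high"],
    include_lowest=True,  # so a score of 0 would still fall in the "low" bin
)

# Category codes follow the label order: low = 0, medium = 1, high = 2
codes = labels.cat.codes

print(pd.DataFrame({"quality": sample, "label": labels, "code": codes}))
```

Either approach gives the same categories; the explicit helpers are easier to read, while `pd.cut` keeps the thresholds in one place.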

--- 

## Section 3: Feature Selection and Justification
* Features (X): We exclude quality, quality_label, and quality_numeric because each one is the target or is derived directly from it. Keeping any of them as an input would leak the answer to the model.
* Target (y): We use quality_numeric as the target since it is the numeric encoding (0 = low, 1 = medium, 2 = high) of the wine quality category.

In [189]:
# Define input features (X) and target (y)

# Features: all columns except 'quality', 'quality_label', and 'quality_numeric'
# These columns are the original quality score or labels derived from it, so using them as inputs would leak the target.
X = df.drop(columns=["quality", "quality_label", "quality_numeric"])  # Features

# Target: quality_numeric, which is the numeric encoding of the quality score we want to predict
y = df["quality_numeric"]  # Target

---

## Section 4: Train a Classification Model for Case 1
### 4.1 Split the Data

* Split the Data: We divide the data into a training set (80%) and a test set (20%) so the model can learn from one set and be evaluated on another to check its performance on unseen data.
* Stratify: We use stratify=y to make sure the class distribution of the target (y) is the same in both the training and test sets.

In [190]:
# Train/test split (stratify to preserve class balance)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
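To see what `stratify` does, the sketch below splits a small synthetic label array and compares class proportions before and after the split. The class counts (40, 800, 160) and the names `y_demo` and `X_demo` are assumptions made for illustration; they are not the real class counts in the wine data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels (0 = low, 1 = medium, 2 = high)
y_demo = np.array([0] * 40 + [1] * 800 + [2] * 160)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

def class_share(labels):
    """Return the fraction of samples in each class, ordered 0, 1, 2."""
    return np.bincount(labels, minlength=3) / len(labels)

full_share = class_share(y_demo)
train_share = class_share(y_tr)
test_share = class_share(y_te)

print("full :", full_share.round(3))
print("train:", train_share.round(3))
print("test :", test_share.round(3))
```

With stratification the three rows should match closely; without it, a rare class can end up over- or under-represented in the test set by chance.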

---

## Section 5: Evaluate Model Performance

### 5.1 Train the Models

In [191]:
# Helper function to train and evaluate models
def evaluate_model(name, model, X_train, y_train, X_test, y_test, results):
    model.fit(X_train, y_train)

    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc = accuracy_score(y_test, y_test_pred)
    train_f1 = f1_score(y_train, y_train_pred, average="weighted")
    test_f1 = f1_score(y_test, y_test_pred, average="weighted")

    print(f"\n{name} Results")
    print("Confusion Matrix (Test):")
    print(confusion_matrix(y_test, y_test_pred))
    print(f"Train Accuracy: {train_acc:.4f}, Test Accuracy: {test_acc:.4f}")
    print(f"Train F1 Score: {train_f1:.4f}, Test F1 Score: {test_f1:.4f}")

    results.append(
        {
            "Model": name,
            "Train Accuracy": train_acc,
            "Test Accuracy": test_acc,
            "Train F1": train_f1,
            "Test F1": test_f1,
        }
    )

In [192]:
results = []

### 5.2 Model 1: Random Forest (100)

In [193]:
# 1. Random Forest
evaluate_model(
    "Random Forest (100)",
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_train,
    y_train,
    X_test,
    y_test,
    results,
)


Random Forest (100) Results
Confusion Matrix (Test):
[[  0  13   0]
 [  0 256   8]
 [  0  15  28]]
Train Accuracy: 1.0000, Test Accuracy: 0.8875
Train F1 Score: 1.0000, Test F1 Score: 0.8661
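The weighted F1 score reported above averages per-class F1 values by class size, so the large medium class dominates. As a rough check, the sketch below recomputes the weighted F1 from the Random Forest test confusion matrix printed above and compares it with an unweighted (macro) average. The macro figure is not part of the original evaluation and is shown only for contrast.

```python
import numpy as np

# Random Forest (100) test confusion matrix from the output above
# Rows are true classes, columns are predicted classes (low, medium, high)
cm = np.array([[0, 13, 0],
               [0, 256, 8],
               [0, 15, 28]])

tp = np.diag(cm).astype(float)   # correct predictions per class
support = cm.sum(axis=1)         # true samples per class
predicted = cm.sum(axis=0)       # predicted samples per class

# F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (predicted + support)
denom = predicted + support
f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

weighted_f1 = np.average(f1, weights=support)  # matches average="weighted"
macro_f1 = f1.mean()                           # every class counts equally

print("Per-class F1:", f1.round(4))
print(f"Weighted F1: {weighted_f1:.4f}")
print(f"Macro F1:    {macro_f1:.4f}")
```

The weighted value reproduces the reported test F1 of 0.8661, while the macro average is much lower because the low class has an F1 of zero.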


### 5.3 Model 2: AdaBoost (100)

In [194]:
# 2. AdaBoost
evaluate_model(
    "AdaBoost (100)",
    AdaBoostClassifier(n_estimators=100, random_state=42),
    X_train,
    y_train,
    X_test,
    y_test,
    results,
)


AdaBoost (100) Results
Confusion Matrix (Test):
[[  1  12   0]
 [  5 240  19]
 [  0  20  23]]
Train Accuracy: 0.8342, Test Accuracy: 0.8250
Train F1 Score: 0.8209, Test F1 Score: 0.8158
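Overall accuracy hides how each class is handled. The sketch below computes per-class recall from the two test confusion matrices printed above, assuming the usual scikit-learn layout in which rows are true classes and columns are predicted classes. The helper name `per_class_recall` is introduced here for illustration.

```python
import numpy as np

class_names = ["low", "medium", "high"]

# Test confusion matrices copied from the outputs above
confusion = {
    "Random Forest (100)": np.array([[0, 13, 0], [0, 256, 8], [0, 15, 28]]),
    "AdaBoost (100)": np.array([[1, 12, 0], [5, 240, 19], [0, 20, 23]]),
}

def per_class_recall(cm):
    """Recall per class: correct predictions divided by true samples in that class."""
    return np.diag(cm) / cm.sum(axis=1)

recalls = {name: per_class_recall(cm) for name, cm in confusion.items()}

for name, values in recalls.items():
    summary = ", ".join(f"{c}: {v:.3f}" for c, v in zip(class_names, values))
    print(f"{name} -> {summary}")
```

Both models identify almost none of the 13 low-quality wines in the test set, so the accuracy figures mostly reflect performance on the medium class.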


--- 

## Section 6: Compare Results 

In [195]:
# Create a DataFrame from the results
results_df = pd.DataFrame(results)

# Calculate the gap between train and test accuracy
results_df['Accuracy Gap'] = results_df['Train Accuracy'] - results_df['Test Accuracy']

# Calculate the gap between train and test F1 score
results_df['F1 Gap'] = results_df['Train F1'] - results_df['Test F1']

# Sort the DataFrame by Test Accuracy (descending order)
results_df = results_df.sort_values(by='Test Accuracy', ascending=False)

# Display the summary of all models
print("\nSummary of All Models:")
display(results_df)



Summary of All Models:


| | Model | Train Accuracy | Test Accuracy | Train F1 | Test F1 | Accuracy Gap | F1 Gap |
|---|---|---|---|---|---|---|---|
| 0 | Random Forest (100) | 1.0 | 0.8875 | 1.0 | 0.866056 | 0.1125 | 0.133944 |
| 1 | AdaBoost (100) | 0.834246 | 0.825 | 0.820863 | 0.815803 | 0.009246 | 0.00506 |


---

## 🔍 Section 7: Conclusions and Insights

### 7.1 Overall Performance:
#### Random Forest (100):
* Test Accuracy: 88.75%
* Test F1 Score: 86.61%
* Accuracy Gap: 11.25%
* F1 Gap: 13.39%
#### AdaBoost (100):
* Test Accuracy: 82.50%
* Test F1 Score: 81.58%
* Accuracy Gap: 0.92%
* F1 Gap: 0.51%

### 7.2 Overall Performance from Peer Projects:
#### [Link to Kate's Project](https://github.com/katehuntsman/applied-ml-huntsman/blob/main/lab05/ensemble-huntsman.ipynb)
#### Gradient Boosting (100):
* Test Accuracy: 85.62%
* Test F1 Score: 84.11%
* Accuracy Gap: 10.38%
* F1 Gap: 11.73%

#### [Link to Brett's Project](https://github.com/bncodes19/applied-ml-bneely/blob/main/lab05/ensemble-neely.ipynb)
#### MLP Classifier:
* Test Accuracy: 84.38%
* Test F1 Score: 80.73%
* Accuracy Gap: 0.77%
* F1 Gap: 0.68%
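To compare all four models side by side, the figures listed above can be collected into one table. This is a small sketch that uses only the percentages reported in Sections 7.1 and 7.2, with the Random Forest F1 gap taken from the Section 6 summary table (0.133944). The variable names are illustrative.

```python
import pandas as pd

# Percentages as reported above; peer results are taken from the linked projects
comparison = pd.DataFrame(
    [
        {"Model": "Random Forest (100)", "Test Accuracy": 88.75, "Test F1": 86.61, "Accuracy Gap": 11.25, "F1 Gap": 13.39},
        {"Model": "AdaBoost (100)", "Test Accuracy": 82.50, "Test F1": 81.58, "Accuracy Gap": 0.92, "F1 Gap": 0.51},
        {"Model": "Gradient Boosting (100)", "Test Accuracy": 85.62, "Test F1": 84.11, "Accuracy Gap": 10.38, "F1 Gap": 11.73},
        {"Model": "MLP Classifier", "Test Accuracy": 84.38, "Test F1": 80.73, "Accuracy Gap": 0.77, "F1 Gap": 0.68},
    ]
)

# Rank once by test accuracy and once by how closely train and test agree
by_accuracy = comparison.sort_values("Test Accuracy", ascending=False)
by_gap = comparison.sort_values("Accuracy Gap")

print(by_accuracy.to_string(index=False))
print()
print(by_gap.to_string(index=False))
```

Sorting by each criterion separately makes the trade-off between raw test accuracy and generalization gap easier to see.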

### 7.3 Conclusion:
* Random Forest (100) has the best test accuracy (88.75%) but overfits the training data: it scores 100% on the training set, leaving large gaps between train and test accuracy (11.25%) and F1 (13.39%).
* AdaBoost (100) is more stable, with small gaps (0.92% accuracy, 0.51% F1), but its test accuracy (82.50%) is the lowest of the four models.
* Gradient Boosting (100) from Kate's project and the MLP Classifier from Brett's project reach similar test accuracies (85.62% and 84.38%) but generalize differently: Gradient Boosting shows gaps nearly as large as Random Forest's (10.38% accuracy, 11.73% F1), while the MLP's gaps are small (0.77% accuracy, 0.68% F1).

Overall, I would recommend AdaBoost (100) because it generalizes most consistently of the two models I trained: its train and test scores are nearly identical, so its test results are a reliable estimate of how it will perform on new data. The trade-off is about 6 percentage points of test accuracy compared with Random Forest (82.50% vs. 88.75%).

--- 