# Lab 5: Ensemble Machine Learning – Wine Dataset
**Author:** Mhamed  
**Date:** 04, 10, 2025 

## Introduction

This project predicts the quality of red wine using machine learning ensemble methods.
To simplify the classification task, the original quality scores were grouped into three categories: low (4 or below), medium (5 to 6), and high (7 or above). These categories were then encoded numerically (0, 1, 2) for model training.

The project explores and evaluates multiple ensemble models, including Random Forest and Gradient Boosting, to classify wine quality. Performance is assessed using metrics such as accuracy, F1 score, and confusion matrices on both training and test data. A final comparison table highlights each model's performance, with attention to generalization gaps.

## Objective

We're working with the Wine Quality dataset, and our goal is to predict the quality of red wine using its chemical properties.

## Section 1. Import and Inspect the Data

In [97]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Ensemble and base models
from sklearn.ensemble import RandomForestClassifier as RF
from sklearn.ensemble import AdaBoostClassifier as AdaBoost
from sklearn.ensemble import GradientBoostingClassifier as GB
from sklearn.ensemble import BaggingClassifier as Bagging
from sklearn.ensemble import VotingClassifier

from sklearn.tree import DecisionTreeClassifier as DT
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression as LR
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.neural_network import MLPClassifier as MLP

# Tools for splitting and evaluating
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

In [98]:
# 1.1 Load the wine quality dataset

df = pd.read_csv("winequality-red.csv", sep=";")

# Display basic structure
df.info()
df.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 5 | 7.4 | 0.66 | 0.0 | 1.8 | 0.075 | 13.0 | 40.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 6 | 7.9 | 0.6 | 0.06 | 1.6 | 0.069 | 15.0 | 59.0 | 0.9964 | 3.3 | 0.46 | 9.4 | 5 |
| 7 | 7.3 | 0.65 | 0.0 | 1.2 | 0.065 | 15.0 | 21.0 | 0.9946 | 3.39 | 0.47 | 10.0 | 7 |
| 8 | 7.8 | 0.58 | 0.02 | 2.0 | 0.073 | 9.0 | 18.0 | 0.9968 | 3.36 | 0.57 | 9.5 | 7 |
| 9 | 7.5 | 0.5 | 0.36 | 6.1 | 0.071 | 17.0 | 102.0 | 0.9978 | 3.35 | 0.8 | 10.5 | 5 |


## Section 2. Prepare the Data
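This section covers data preparation: checking whether cleaning is needed, feature engineering, and encoding the target. Splitting and helper functions follow in Sections 4 and 5.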

In [99]:
# Convert quality score to labels: low, medium, high
def quality_to_label(q):
    if q <= 4:
        return "low"
    elif q <= 6:
        return "medium"
    else:
        return "high"

df["quality_label"] = df["quality"].apply(quality_to_label)

In [100]:
# Convert quality score to numeric class: 0 = low, 1 = medium, 2 = high
def quality_to_number(q):
    if q <= 4:
        return 0
    elif q <= 6:
        return 1
    else:
        return 2

df["quality_numeric"] = df["quality"].apply(quality_to_number)

# Display updated DataFrame
print("\nUpdated Dataset Sample:")
print(df[["quality", "quality_label", "quality_numeric"]].head())


Updated Dataset Sample:
   quality quality_label  quality_numeric
0        5        medium                1
1        5        medium                1
2        5        medium                1
3        6        medium                1
4        5        medium                1


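### What we did and why

- Loaded the data and confirmed with `df.info()` that every column has 1599 non-null values, so no rows needed to be cleaned or dropped.
- Created two features from the original quality score: `quality_label` (low, medium, high) and `quality_numeric` (0, 1, 2).
- Prepared the dataset for modeling by converting the categorical label into a numeric target.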
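
The two helper functions above apply the same thresholds twice. As an alternative, the binning could be expressed in one step with `pd.cut`. This is only a sketch on a hypothetical sample of scores (the `scores` series and the `binned` frame are illustrative names, not part of the lab), and it assumes the same cut points of 4 and 6:

```python
import pandas as pd

# Hypothetical sample of raw quality scores, used only for illustration
scores = pd.Series([3, 4, 5, 6, 7, 8], name="quality")

# Bins are right-inclusive: (-inf, 4] -> low, (4, 6] -> medium, (6, inf) -> high
labels = pd.cut(
    scores,
    bins=[-float("inf"), 4, 6, float("inf")],
    labels=["low", "medium", "high"],
)

# Category codes follow the label order: low = 0, medium = 1, high = 2
codes = labels.cat.codes

binned = pd.DataFrame(
    {"quality": scores, "quality_label": labels, "quality_numeric": codes}
)
print(binned)
```

Keeping the thresholds in a single `bins` list means the label and numeric columns cannot drift apart if the cut points are changed later.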

## Section 3. Feature Selection and Justification

In [101]:
# Define input features (X) and target (y)
X = df.drop(columns=["quality", "quality_label", "quality_numeric"])  # features
y = df["quality_numeric"]  # target (numeric version of wine quality)

## Section 4. Split the Data into Train and Test Sets

In [102]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
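
The split above passes `stratify=y`, which matters here because the three quality classes are far from balanced. The following sketch shows the effect on hypothetical labels (the class counts and the `X_demo`/`y_demo` names are assumptions made for illustration, not values from the wine data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 40 low (0), 800 medium (1), 160 high (2)
y_demo = np.array([0] * 40 + [1] * 800 + [2] * 160)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, each subset keeps roughly the same class shares as the full data
full_share = np.bincount(y_demo) / len(y_demo)
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)

print("full :", full_share)
print("train:", train_share)
print("test :", test_share)
```

Without stratification, a rare class could end up over- or under-represented in the test set purely by chance, which would make the test metrics harder to interpret.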

## Section 5. Evaluate Model Performance

In [103]:

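def evaluate_model(name, model, X_train, y_train, X_test, y_test, results):
    """Fit `model`, print its test confusion matrix and train/test scores, and append the scores to `results`."""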
    model.fit(X_train, y_train)

    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc = accuracy_score(y_test, y_test_pred)
    train_f1 = f1_score(y_train, y_train_pred, average="weighted")
    test_f1 = f1_score(y_test, y_test_pred, average="weighted")

    print(f"\n{name} Results")
    print("Confusion Matrix (Test):")
    print(confusion_matrix(y_test, y_test_pred))
    print(f"Train Accuracy: {train_acc:.4f}, Test Accuracy: {test_acc:.4f}")
    print(f"Train F1 Score: {train_f1:.4f}, Test F1 Score: {test_f1:.4f}")

    results.append({
        "Model": name,
        "Train Accuracy": train_acc,
        "Test Accuracy": test_acc,
        "Train F1": train_f1,
        "Test F1": test_f1,
    })

In [104]:

# List to store evaluation results
results = []

In [105]:
# Model 1: Random Forest (100 trees)
evaluate_model(
    "Random Forest (100)",
    RF(n_estimators=100, random_state=42),
    X_train, y_train, X_test, y_test,
    results
)

# Model 2: Gradient Boosting (100)
evaluate_model(
    "Gradient Boosting (100)",
    GB(  # GradientBoostingClassifier is imported under the alias GB
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
    ),
    X_train,
    y_train,
    X_test,
    y_test,
    results,
)


Random Forest (100) Results
Confusion Matrix (Test):
[[  0  13   0]
 [  0 256   8]
 [  0  15  28]]
Train Accuracy: 1.0000, Test Accuracy: 0.8875
Train F1 Score: 1.0000, Test F1 Score: 0.8661

Gradient Boosting (100) Results
Confusion Matrix (Test):
[[  0  13   0]
 [  3 247  14]
 [  0  16  27]]
Train Accuracy: 0.9601, Test Accuracy: 0.8562
Train F1 Score: 0.9584, Test F1 Score: 0.8411
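
The overall accuracy hides how differently the three classes are treated. As a quick check, per-class recall can be read directly from the Random Forest confusion matrix printed above. This sketch assumes scikit-learn's convention that rows are true classes and columns are predicted classes, in the order low, medium, high:

```python
import numpy as np

# Test-set confusion matrix reported above for Random Forest (100)
# Rows = true class, columns = predicted class; order: low, medium, high
cm = np.array([
    [0, 13, 0],
    [0, 256, 8],
    [0, 15, 28],
])

support = cm.sum(axis=1)            # number of true samples per class
recall = np.diag(cm) / support      # correctly predicted share of each class
accuracy = np.trace(cm) / cm.sum()  # correct predictions over all samples

for name, r, n in zip(["low", "medium", "high"], recall, support):
    print(f"{name:<6} recall = {r:.3f} (n = {n})")
print(f"accuracy = {accuracy:.4f}")
```

The computed accuracy matches the reported 0.8875, while the recall for the low class is zero: all 13 low-quality test wines are predicted as medium.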


## Section 6. Compare Results

In [106]:
# Create a DataFrame from the results list
results_df = pd.DataFrame(results)

# Add gap columns
results_df["Accuracy Gap"] = results_df["Train Accuracy"] - results_df["Test Accuracy"]
results_df["F1 Gap"] = results_df["Train F1"] - results_df["Test F1"]

# Sort by Test Accuracy
results_df = results_df.sort_values(by="Test Accuracy", ascending=False)

# Display final comparison table
print("\nSummary of All Models:")
display(results_df)


Summary of All Models:


| | Model | Train Accuracy | Test Accuracy | Train F1 | Test F1 | Accuracy Gap | F1 Gap |
|---|---|---|---|---|---|---|---|
| 0 | Random Forest (100) | 1.0 | 0.8875 | 1.0 | 0.866056 | 0.1125 | 0.133944 |
| 1 | Gradient Boosting (100) | 0.960125 | 0.85625 | 0.95841 | 0.841106 | 0.103875 | 0.117304 |


## Section 7. Conclusion and Insights

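Both Random Forest and Gradient Boosting classify most test wines correctly, with test accuracies of 0.8875 and 0.8562 respectively. Random Forest reaches perfect training scores, a sign of overfitting, and has the larger train-test gaps (0.1125 in accuracy, 0.1339 in F1). Gradient Boosting has slightly smaller gaps (0.1039 and 0.1173), which suggests somewhat better generalization and stability, although its test accuracy and F1 are lower than Random Forest's. Neither model correctly identifies any of the 13 low-quality wines in the test set, so both would need better handling of the minority class before real-world deployment.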

## Next Step

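The next step is to evaluate the remaining imported models (AdaBoost, Bagging, and Voting classifiers) with the same `evaluate_model` helper and compare them against these two.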
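
As a rough sketch of what that comparison could look like, the block below fits three of the ensemble types imported in Section 1 on synthetic data. Everything here is an assumption for illustration: the data comes from `make_classification` rather than the wine CSV, and the hyperparameters and the `candidates`/`scores` names are placeholders, not tuned or recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in with 11 features; NOT the real wine dataset
X_syn, y_syn = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

# Placeholder configurations for the ensembles not yet evaluated in the lab
candidates = {
    "AdaBoost (50)": AdaBoostClassifier(n_estimators=50, random_state=42),
    "Bagging (DT, 50)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=42
    ),
    "Voting (DT + LR)": VotingClassifier(
        estimators=[
            ("dt", DecisionTreeClassifier(random_state=42)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    ),
}

# Record (train accuracy, test accuracy) for each candidate
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = (
        accuracy_score(y_tr, model.predict(X_tr)),
        accuracy_score(y_te, model.predict(X_te)),
    )
    print(f"{name}: train = {scores[name][0]:.4f}, test = {scores[name][1]:.4f}")
```

In the notebook itself, the same model objects could be passed to `evaluate_model` with the real `X_train` and `X_test` so that their rows land in the existing comparison table.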