# Lab 5: Ensemble Machine Learning - Wine Dataset

Name: Terry Konkin  
Date: April 13, 2025  
Objective: To compare two ensemble models on the red Wine Quality dataset

### Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



In [2]:
from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,
    GradientBoostingClassifier,
    BaggingClassifier,
    VotingClassifier,
)
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

### Section 1. Load and Inspect the Data

In [3]:
# Load the dataset (download from UCI and save in the same folder)
df = pd.read_csv("winequality-red.csv", sep=";")

# Display structure and first few rows
df.info()
df.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


The dataset includes 11 physicochemical input variables (features):
 ---------------------------------------------------------------
 - fixed acidity -           mostly tartaric acid
 - volatile acidity -        mostly acetic acid (vinegar)
 - citric acid -             can add freshness and flavor
 - residual sugar -          remaining sugar after fermentation
 - chlorides -               salt content
 - free sulfur dioxide -     protects wine from microbes
 - total sulfur dioxide -    sum of free and bound forms
 - density -                 related to sugar content
 - pH -                      acidity level (lower = more acidic)
 - sulphates -               antioxidant and microbial stabilizer
 - alcohol -                 % alcohol by volume



The target variable is:
 - quality (integer score from 0 to 10, rated by wine tasters)  
  
We will simplify this target into three categories to make classification feasible:
 - low (3–4), medium (5–6), high (7–8)
 - each category is also encoded as a number (0, 1, 2); we keep both the label and the numeric code for clarity

The dataset contains 1599 samples and 12 columns (11 features + target).

### Section 2. Prepare the Data  
  
Includes cleaning, feature engineering, encoding, splitting, helper functions

In [4]:
# Define a helper function that takes one input, the quality score
# (named q inside the function), and returns the quality label
# as a string (low, medium, high).
# It is used to create the quality_label column.

def quality_to_label(q):
    if q <= 4:
        return "low"
    elif q <= 6:
        return "medium"
    else:
        return "high"

# Call the apply() method on the quality column to create the new quality_label column
df["quality_label"] = df["quality"].apply(quality_to_label)     

# Then, create a numeric column for modeling: 0 = low, 1 = medium, 2 = high
def quality_to_number(q):
    if q <= 4:
        return 0
    elif q <= 6:
        return 1
    else:
        return 2

df["quality_numeric"] = df["quality"].apply(quality_to_number)   
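
The same three-way binning can also be expressed without calling a Python function once per row. The following is a minimal sketch, assuming `pd.cut` with right-closed bins; the toy `quality` Series and the names `bins`, `labels`, and `label_to_number` are illustrative and not part of the lab code.

```python
import pandas as pd

# Toy scores covering the categories described above (3 to 8)
quality = pd.Series([3, 4, 5, 6, 7, 8], name="quality")

# Right-closed bins: (-inf, 4] -> low, (4, 6] -> medium, (6, inf) -> high
bins = [-float("inf"), 4, 6, float("inf")]
labels = ["low", "medium", "high"]
quality_label = pd.cut(quality, bins=bins, labels=labels).astype(str)

# Map each label to the numeric code used for modeling
label_to_number = {"low": 0, "medium": 1, "high": 2}
quality_numeric = quality_label.map(label_to_number)

print(quality_label.tolist())
print(quality_numeric.tolist())
```

Keeping the bin edges in one list means the label and numeric columns cannot drift apart, which can happen when two separate functions repeat the same thresholds.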


### Section 3. Feature Selection and Justification

In [5]:
# Define input features (X) and target (y)
# Features: all columns except 'quality', 'quality_label', and 'quality_numeric' (dropped from the input array)
# Target: quality_numeric (the numeric column we just created)

X = df.drop(columns=["quality", "quality_label", "quality_numeric"])  # Features
y = df["quality_numeric"]  # Target


Summary:  

The input features (X) are all columns in the original dataset with the exception of 'quality'.  
The target (y) is the newly created feature 'quality_numeric'.  This column aggregates the quality ratings into 3 categories:  
0 (low), 1 (medium), and 2 (high).

### Section 4. Split the Data into Train and Test

In [6]:
# Train/test split (stratify to preserve class balance)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
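
To see what `stratify` does, the sketch below splits a synthetic, imbalanced label array and compares class proportions before and after the split. The class counts (40, 800, 160) and the names `X_demo` and `y_demo` are made up for illustration and are not the wine data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels (hypothetical counts): 40 low, 800 medium, 160 high
y_demo = np.array([0] * 40 + [1] * 800 + [2] * 160)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Share of each class in the full data, the training split, and the test split
full_share = np.bincount(y_demo) / len(y_demo)
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)
print(full_share, train_share, test_share)
```

With stratification, the rare class keeps roughly the same share in both splits, so the test set is not left with too few examples of it by chance.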


### Section 5. Evaluate Model Performance (Choose 2)

Train models

In [12]:
# Helper function to train and evaluate models

results = []

def evaluate_model(name, model, X_train, y_train, X_test, y_test, results):
    model.fit(X_train, y_train)
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc = accuracy_score(y_test, y_test_pred)
    train_f1 = f1_score(y_train, y_train_pred, average="weighted")
    test_f1 = f1_score(y_test, y_test_pred, average="weighted")

    print(f"\n{name} Results")
    print("Confusion Matrix (Test):")
    print(confusion_matrix(y_test, y_test_pred))
    print(f"Train Accuracy: {train_acc:.4f}, Test Accuracy: {test_acc:.4f}")
    print(f"Train F1 Score: {train_f1:.4f}, Test F1 Score: {test_f1:.4f}") 

    results.append(
        {
            "Model": name,
            "Train Accuracy": train_acc,
            "Test Accuracy": test_acc,
            "Train F1": train_f1,
            "Test F1": test_f1,
        }
    )



Option 1: Random Forest (100)

In [None]:
# Call the function for Random Forest (100)


evaluate_model(
    "Random Forest (100)",
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_train,
    y_train,
    X_test,
    y_test,
    results,
)





Random Forest (100) Results
Confusion Matrix (Test):
[[  0  13   0]
 [  0 256   8]
 [  0  15  28]]
Train Accuracy: 1.0000, Test Accuracy: 0.8875
Train F1 Score: 1.0000, Test F1 Score: 0.8661


Option 3: AdaBoost

In [None]:
# Call the function for AdaBoost


evaluate_model(
    "AdaBoost (100)",
    AdaBoostClassifier(n_estimators=100, random_state=42),
    X_train,
    y_train,
    X_test,
    y_test,
    results,
)






AdaBoost (100) Results
Confusion Matrix (Test):
[[  1  12   0]
 [  5 240  19]
 [  0  20  23]]
Train Accuracy: 0.8342, Test Accuracy: 0.8250
Train F1 Score: 0.8209, Test F1 Score: 0.8158


### Section 6. Compare Results

In [15]:
# Create a table of results 
results_df = pd.DataFrame(results)

print("\nSummary of All Models:")
display(results_df)



Summary of All Models:


|   | Model | Train Accuracy | Test Accuracy | Train F1 | Test F1 |
|---|---|---|---|---|---|
| 0 | Random Forest (100) | 1.0 | 0.8875 | 1.0 | 0.866056 |
| 1 | AdaBoost (100) | 0.834246 | 0.825 | 0.820863 | 0.815803 |


### Section 7. Conclusions and Insights

From the list of available options, two types of ensemble models were utilized:  
  
Random Forest (100)   
Multiple decision trees (100 in this case) are trained independently, each on a bootstrap sample of the training data, so they can be built in parallel; their predictions are then averaged.
  
AdaBoost (100)  
Weak learners (100 in this case) are trained sequentially, with each new learner giving more weight to the samples the previous ones misclassified.
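
The difference between the two approaches can be observed directly on a fitted model. The sketch below is illustrative only: it uses synthetic data from `make_classification` rather than the wine features, and 25 estimators instead of 100 to keep it quick. The names `X_demo`, `y_demo`, `tree_mean`, and `staged` are assumptions for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Synthetic 3-class data standing in for the wine features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Random Forest: trees are fit independently; the forest's class
# probabilities are the mean of the individual trees' probabilities
rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_demo, y_demo)
tree_mean = np.mean([t.predict_proba(X_demo) for t in rf.estimators_], axis=0)
rf_matches_mean = np.allclose(rf.predict_proba(X_demo), tree_mean)
print("Forest probabilities equal mean of trees:", rf_matches_mean)

# AdaBoost: learners are added one at a time; staged_score yields the
# accuracy on the given data after each boosting round
ada = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X_demo, y_demo)
staged = list(ada.staged_score(X_demo, y_demo))
print("Accuracy after round 1:", staged[0], "| after last round:", staged[-1])
```

Passing the test split to `staged_score` instead of the training data would show how many boosting rounds are actually useful before test accuracy levels off.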

Performance Metrics

Test Accuracy: 

Random Forest 88.75%; AdaBoost 82.50%  
Random Forest is more accurate by 6.25 percentage points.
  


Test F1 Score:  
Random Forest 86.61%; AdaBoost 81.58%  
Random Forest scores about 5 percentage points higher on weighted F1.

Confusion Matrix:  
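True positives (low, medium, high): Random Forest 0, 256, 28; AdaBoost 1, 240, 23  
Both models almost never identify the low class (13 test samples): Random Forest predicts it for no sample, and AdaBoost gets 1 of 13 right. Random Forest has more true positives for the medium and high classes.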
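
The reported accuracy and weighted F1 scores can be re-derived from the two test confusion matrices printed in Section 5. This is a small sketch using only NumPy; the helper name `summarize` is hypothetical and not part of the lab code.

```python
import numpy as np

# Test-set confusion matrices copied from the Section 5 outputs
# (rows = actual class, columns = predicted class; order: low, medium, high)
cms = {
    "Random Forest (100)": np.array([[0, 13, 0], [0, 256, 8], [0, 15, 28]]),
    "AdaBoost (100)": np.array([[1, 12, 0], [5, 240, 19], [0, 20, 23]]),
}

def summarize(cm):
    tp = np.diag(cm)             # correct predictions per class
    actual = cm.sum(axis=1)      # samples that truly belong to each class
    predicted = cm.sum(axis=0)   # samples assigned to each class
    accuracy = tp.sum() / cm.sum()
    recall = tp / actual
    f1 = 2 * tp / (actual + predicted)  # per-class F1 = 2TP / (2TP + FP + FN)
    weighted_f1 = (f1 * actual).sum() / actual.sum()
    return accuracy, recall, weighted_f1

summary = {name: summarize(cm) for name, cm in cms.items()}
for name, (acc, rec, wf1) in summary.items():
    print(f"{name}: accuracy={acc:.4f}, recall={np.round(rec, 4)}, weighted F1={wf1:.4f}")
```

The per-class recall makes the class imbalance visible: the overall scores are dominated by the 264 medium samples, while the 13 low samples contribute almost nothing to either model's result.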

Summary  
  
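Based on the above metrics, Random Forest performed better of the two models on the test set. Its perfect training scores (1.0000) against a test accuracy of 0.8875 suggest some overfitting, whereas AdaBoost's training and test accuracy are close (0.8342 vs. 0.8250).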