

# Assignment 2 – Supervised Learning with Real Data
## Learning from Real-World Patterns

**Student:** Samuel Troci  
**Course:** CEN352  

📌 *This notebook is fully compatible with **Google Colab**.*  
You can upload it directly to https://colab.research.google.com



## Dataset Choice & Justification

**Dataset:** Wine Quality – Red Wine (UCI Machine Learning Repository)

**Justification:**
- Publicly available real-world dataset (UCI repository)
- Not used in lectures (not Iris, MNIST, Titanic)
- Non-trivial supervised learning task
- Relevant to real applications in food quality control and manufacturing

The goal is to predict whether a wine is **good** (quality score ≥ 6) or **bad** (quality score < 6) from its physicochemical properties.
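As a minimal illustration of the labeling rule used in the data-preparation cell (1 = good for quality ≥ 6, 0 = bad), here it is applied to a few scores. The `quality` values below are hypothetical toy numbers, not rows from the UCI file.

```python
import pandas as pd

# Hypothetical toy scores; the real UCI file contains one integer score per wine
quality = pd.Series([5, 6, 7, 4, 5, 6, 8, 3])

# Same rule as the notebook: 1 = "good" (quality >= 6), 0 = "bad" (quality < 6)
label = (quality >= 6).astype(int)

print(label.tolist())  # [0, 1, 1, 0, 0, 1, 1, 0]
print(label.mean())    # share of "good" wines: 0.5
```

Running `label.mean()` on the real target shows how balanced the two classes are, which is why a stratified split is a sensible default.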



## Setup (Colab Ready)

All required libraries are pre-installed in Google Colab.


```python
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
```


## Load and Prepare Dataset

```python
# Load the red-wine data directly from the UCI repository (the file is semicolon-separated)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(url, sep=';')

# Convert quality to binary classification
# Good wine: quality >= 6 (label 1), bad wine: quality < 6 (label 0)
data['quality'] = (data['quality'] >= 6).astype(int)

X = data.drop('quality', axis=1)
y = data['quality']

# Stratified 80/20 train-test split keeps the good/bad ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling: fit on the training set only, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
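Fitting the scaler on the training split only keeps test-set statistics out of preprocessing. If the evaluation is extended to cross-validation, the same guarantee can be kept by placing the scaler inside a scikit-learn `Pipeline`, so it is refit on each training fold. A minimal sketch, assuming synthetic data from `make_classification` (11 features, mirroring the 11 wine inputs) so that it runs offline; scores on the real wine data will differ:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the wine features (assumption): 11 features, binary target
X, y = make_classification(n_samples=400, n_features=11, n_informative=6, random_state=42)

# The scaler is refit on each training fold only, so held-out folds never influence its mean/std
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")

print(f"F1 per fold: {scores.round(3)}")
print(f"Mean F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting a mean and spread over folds gives a better sense of variability than a single 80/20 split.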


## Model Training

```python
# Decision-tree-based ensemble (Random Forest)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Support Vector Machine
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train, y_train)

# Multi-Layer Perceptron
mlp = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=500, random_state=42)
mlp.fit(X_train, y_train)
```
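scikit-learn's `MLPClassifier` emits a `ConvergenceWarning` when the optimizer reaches `max_iter` before its stopping criterion is met. A sketch of how one might check this programmatically is shown below. It assumes synthetic data, and `early_stopping=True` is an addition not used in the notebook (it holds out 10% of the training data and stops when the validation score stops improving):

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled wine features (assumption)
X, y = make_classification(n_samples=400, n_features=11, random_state=42)
X = StandardScaler().fit_transform(X)

mlp = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=500,
                    early_stopping=True, random_state=42)

# Record warnings raised during fit so a ConvergenceWarning can be detected in code
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    mlp.fit(X, y)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("epochs run:", mlp.n_iter_)
print("finished without ConvergenceWarning:", converged)
```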




## Performance Evaluation

```python
models = {
    "Random Forest": rf,
    "SVM": svm,
    "MLP": mlp
}

results = {}

for name, model in models.items():
    y_pred = model.predict(X_test)
    results[name] = {
        "Accuracy": accuracy_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
        "Confusion Matrix": confusion_matrix(y_test, y_pred)
    }

results
```


```
{'Random Forest': {'Accuracy': 0.803125,
  'F1 Score': 0.8119402985074626,
  'Confusion Matrix': array([[121,  28],
         [ 35, 136]])},
 'SVM': {'Accuracy': 0.7625,
  'F1 Score': 0.7639751552795031,
  'Confusion Matrix': array([[121,  28],
         [ 48, 123]])},
 'MLP': {'Accuracy': 0.771875,
  'F1 Score': 0.7871720116618076,
  'Confusion Matrix': array([[112,  37],
         [ 36, 135]])}}
```
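The reported scores can be checked by hand from the confusion matrices. In scikit-learn's layout, rows are true labels and columns are predictions (class 0 = bad, class 1 = good), so for the Random Forest TN = 121, FP = 28, FN = 35, TP = 136. Accuracy is (TP + TN) / total = 257 / 320, and F1 = 2TP / (2TP + FP + FN) = 272 / 335. The same arithmetic in code:

```python
import numpy as np

# Random Forest confusion matrix, copied from the output above
# rows = true class (0 = bad, 1 = good), columns = predicted class
cm = np.array([[121, 28],
               [35, 136]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)      # 0.803125
print(round(f1, 4))  # 0.8119
```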


## Insightful Analysis

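On this split, the Random Forest performs best (accuracy 0.803, F1 0.812), likely because an ensemble of
decision trees captures non-linear feature interactions while averaging many trees reduces overfitting.
The MLP comes second (accuracy 0.772, F1 0.787) and the RBF-kernel SVM third (accuracy 0.763, F1 0.764).
The SVM's weakness is false negatives: it labels 48 good wines as bad, versus 35 for the Random Forest
and 36 for the MLP, while the MLP produces the most false positives (37). The SVM depends on feature
scaling (handled here by `StandardScaler`) and on the kernel and its hyperparameters `C` and `gamma`,
which were left at their defaults. The MLP is sensitive to its architecture, learning rate, and iteration
budget, and needs enough data to generalize without overfitting. Because all scores come from a single
80/20 split with mostly default hyperparameters, the gaps between models are indicative rather than conclusive.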
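The SVM above uses scikit-learn's default `C` and `gamma`. A grid search with cross-validation is one way to probe how sensitive it is to these hyperparameters. A minimal sketch, assuming synthetic data so it runs offline; the grid values are illustrative assumptions, not tuned choices for the wine data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the wine features (assumption)
X, y = make_classification(n_samples=400, n_features=11, n_informative=6, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled using its own training part
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {
    "svm__C": [0.1, 1, 10],             # illustrative values
    "svm__gamma": ["scale", 0.01, 0.1],  # "scale" is scikit-learn's default
}
search = GridSearchCV(pipe, param_grid, scoring="f1",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)

print(search.best_params_)
print(f"best CV F1: {search.best_score_:.3f}")
```

Comparing `search.cv_results_["mean_test_score"]` across the grid shows how much the score moves with `C` and `gamma`.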



## Ethical Reflection

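Misclassification in wine quality prediction has real costs. Here the positive class is "good"
(quality ≥ 6), so a false positive is a low-quality wine labeled good: it may be sold as premium,
misleading consumers and harming trust. A false negative is a good wine labeled bad, which may
unfairly penalize its producer. On the test set, the Random Forest made 28 false positives and
35 false negatives. Ethical deployment therefore requires transparency about these error rates,
validation on new data, and human oversight of final decisions.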
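Which error type matters more is a deployment decision, and it can be adjusted without retraining by changing the probability threshold at which a wine is called "good". A sketch on synthetic data (an assumption, so it runs offline); the stricter 0.7 cut-off is an illustrative value, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 11 features, binary target with 1 = "good"
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
p_good = rf.predict_proba(X_test)[:, 1]  # column 1 = estimated probability of class 1

counts = {}
for threshold in (0.5, 0.7):
    # "> 0.5" matches rf.predict for two classes; 0.7 is a stricter illustrative cut-off
    y_pred = (p_good > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    counts[threshold] = (fp, fn)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the threshold can only keep or reduce false positives and keep or increase false negatives, so the choice encodes whose interests (consumers or producers) are prioritized.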
