# Machine Learning Assignment
This notebook contains the exercises for the "Machine Learning" course, academic year 2024-2025.

The dataset I used is the Wine Quality Dataset from UC Irvine (https://archive.ics.uci.edu/dataset/186/wine+quality). It has 11 features and 1 label, the wine quality, which is a score from 0 to 10. For the purposes of the exercise, we will assume that each score belongs to one of 5 categories:

*   Very Bad (score 0-1)
*   Bad (score 2-3)
*   Alright (score 4-5)
*   Good (score 6-7)
*   Excellent (score 8-10)
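
The binning in the list above can be sketched with `np.digitize`, assuming half-open bins so that a score sitting exactly on a boundary (2, 4, 6, 8) goes to the higher category. The names `bin_edges` and `scores` are illustrative and not part of the assignment code.

```python
import numpy as np

# Assumed bin edges: [0,2) -> 0, [2,4) -> 1, [4,6) -> 2, [6,8) -> 3, [8,10] -> 4
bin_edges = np.array([2, 4, 6, 8])

scores = np.array([0, 3, 5, 6, 8, 10])    # Illustrative quality scores
classes = np.digitize(scores, bin_edges)   # Class index for each score
one_hot = np.eye(5)[classes]               # One row per score, one column per class

print(classes)
```

Integer class indices keep the later one-hot encoding to a single indexing operation.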

The questions are too many for a single notebook, so I decided to split them into pairs. This notebook covers the following questions:

5.   Naive Bayes
6.   Multilayer Perceptron (MLP) in Pytorch

The last file covers the remaining topics, in the following order:

7.   Support Vector Machine (SVM)
8.   K Means

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from sklearn.model_selection import train_test_split

from google.colab import drive
drive.mount('/content/drive')

import os
os.chdir('/content/drive/MyDrive/MLAssignment/Data')

Mounted at /content/drive


# Data Preprocessing

In [None]:
wines = pd.read_csv('winequality-red.csv', sep=';')

features = wines.drop("quality", axis=1)
label = wines["quality"]

X = np.array(features)
y = wines[["quality"]].to_numpy()  # Shape (n, 1)

# Simple threshold binning: a score exactly on a boundary (e.g. 6) goes to the higher class.
# Classes are ints rather than strings, which makes the one-hot encoding below easier.

def classify(score):
    """Maps a quality score to a class index from 0 (Very Bad) to 4 (Excellent)."""
    if score < 2.0:
        return 0
    elif score < 4.0:
        return 1
    elif score < 6.0:
        return 2
    elif score < 8.0:
        return 3
    return 4

num_classes = 5
# ravel() flattens (n, 1) to (n,); `score` avoids shadowing the `label` Series above
y_classified = np.array([classify(score) for score in y.ravel()])

y_one_hot = np.eye(num_classes)[y_classified]  # Transform the labels to one-hot vectors

In [None]:
def normalize_features(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    X_normalized = (X - mean) / std
    return X_normalized

X_normalized = normalize_features(X)
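
A variant worth considering, assuming the data is later split into train and test sets: compute the mean and standard deviation on the training rows only and reuse them for the test rows, so that no test statistics enter the preprocessing. The helpers `fit_scaler` and `apply_scaler` and the toy arrays are hypothetical.

```python
import numpy as np

def fit_scaler(X_train):
    """Returns per-feature mean and std computed on the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # Guard against constant features
    return mean, std

def apply_scaler(X, mean, std):
    """Standardizes X with previously fitted statistics."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_toy = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # Toy data, not the wine dataset
X_tr, X_te = X_toy[:70], X_toy[70:]

mean, std = fit_scaler(X_tr)
X_tr_norm = apply_scaler(X_tr, mean, std)  # Zero mean, unit std by construction
X_te_norm = apply_scaler(X_te, mean, std)  # Scaled with the training statistics
```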

# 5) Naive Bayes Classifier

In [None]:
def calculate_mean_std_per_class(X, y):
    """Calculates mean and std of features for each class."""
    classes = np.unique(y)
    means = {}
    stds = {}
    for c in classes:
        X_c = X[y == c]  # Filter data for class c
        means[c] = np.mean(X_c, axis=0) # Mean of each feature for class c
        stds[c] = np.std(X_c, axis=0) # Std of each feature for class c
    return means, stds

In [None]:
def gaussian_log_pdf(x, mean, std):
    """Calculates the log of the probability density function for a Gaussian."""
    epsilon = 1e-8 # Small value
    std = std + epsilon # Adding a small number to avoid std = 0
    exponent = -((x - mean) ** 2) / (2 * std**2)
    log_coefficient = -0.5 * np.log(2 * np.pi) - np.log(std)
    return log_coefficient + exponent
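
For reference, the quantity that `gaussian_log_pdf` computes for each feature is the log-density of a univariate Gaussian (ignoring the small epsilon added to the standard deviation):

```latex
\log \mathcal{N}(x \mid \mu, \sigma)
  = -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{(x - \mu)^2}{2\sigma^2}
```

The first two terms correspond to `log_coefficient` and the last one to `exponent`. Under the Naive Bayes independence assumption, these per-feature values are summed to give the class log-likelihood.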

In [None]:
def calculate_class_priors(y):
    """Calculates prior probability for each class"""
    classes = np.unique(y)
    priors = {}
    for c in classes:
        priors[c] = np.sum(y == c) / len(y)  # Fraction of samples that belong to class c
    return priors
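
The same priors can also be obtained in one call with `np.unique(..., return_counts=True)`. A small sketch on made-up labels (the array `y_toy` is illustrative):

```python
import numpy as np

y_toy = np.array([1, 2, 2, 3, 3, 3, 3, 2])  # Made-up class labels

# Unique labels and how often each one occurs
classes, counts = np.unique(y_toy, return_counts=True)
priors = {int(c): n / len(y_toy) for c, n in zip(classes, counts)}

print(priors)
```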

In [None]:
def ml_naive_bayes_train(X, y):
    """Trains the Naive Bayes classifier."""
    means, stds = calculate_mean_std_per_class(X, y)
    priors = calculate_class_priors(y)
    return means, stds, priors

In [None]:
def ml_naive_bayes_test(means, stds, priors, X_test):
    """Applies the Naive Bayes classifier to the test data."""
    classes = np.array(list(priors.keys()))
    num_samples = X_test.shape[0]
    class_log_probs = np.zeros((num_samples, len(classes)))
    for idx, c in enumerate(classes):
        log_likelihood = gaussian_log_pdf(X_test, means[c], stds[c])
        class_log_probs[:, idx] = np.sum(log_likelihood, axis=1) + np.log(priors[c])
    # argmax gives a column index; map it back to the class label, because the
    # labels present in y need not start at 0 or be contiguous
    y_test = classes[np.argmax(class_log_probs, axis=1)]
    return y_test

In [None]:
# Train and Predict
means, stds, priors = ml_naive_bayes_train(X_normalized, y_classified) #Train
y_pred = ml_naive_bayes_test(means, stds, priors, X_normalized) #Test on training data

#Calculate Accuracy
accuracy = np.mean(y_pred == y_classified) # Calculate acc on the training set
print(f"Accuracy: {accuracy}")

Accuracy: 0.13758599124452783
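
One detail to check when reading this accuracy: `np.argmax` returns a column index, not a class label. If some class never occurs in `y` (so that `np.unique(y)` does not start at 0 or has gaps), the two differ, and predictions have to be mapped back through the class list before they are compared with the labels. A toy illustration with made-up log-probabilities:

```python
import numpy as np

classes = np.array([1, 2, 3, 4])  # Labels present in the data; class 0 is missing
# Made-up log-probabilities: one row per sample, one column per entry of `classes`
class_log_probs = np.array([
    [-1.0, -5.0, -9.0, -9.0],   # Best column 0 -> label 1
    [-7.0, -2.0, -3.0, -8.0],   # Best column 1 -> label 2
    [-9.0, -9.0, -4.0, -0.5],   # Best column 3 -> label 4
])

col_idx = np.argmax(class_log_probs, axis=1)  # Column indices
y_pred = classes[col_idx]                      # Actual class labels
y_true = np.array([1, 2, 4])

acc_raw = np.mean(col_idx == y_true)     # Comparing raw indices with labels
acc_mapped = np.mean(y_pred == y_true)   # Comparing mapped labels with labels
print(acc_raw, acc_mapped)
```

In this toy case every prediction is right, yet the raw indices never match the labels because they are shifted by one.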


# 6) MLP

In [None]:
!pip install torchmetrics

Collecting torchmetrics
  Downloading torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics)
  Downloading lightning_utilities-0.11.9-py3-none-any.whl.metadata (5.2 kB)
Downloading torchmetrics-1.6.1-py3-none-any.whl (927 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 927.3/927.3 kB 14.3 MB/s eta 0:00:00
Downloading lightning_utilities-0.11.9-py3-none-any.whl (28 kB)
Installing collected packages: lightning-utilities, torchmetrics
Successfully installed lightning-utilities-0.11.9 torchmetrics-1.6.1


In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader
from torchmetrics import Accuracy
torch.manual_seed(42) # Setting the seed

<torch._C.Generator at 0x7f573954e2d0>

In [None]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y_classified, test_size=0.3, random_state=42)

# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Create TensorDatasets and DataLoaders for batching
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [None]:
# Define the MLP model
class MLP(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.linear1 = torch.nn.Linear(input_size, hidden_size)
        self.tanh_act = torch.nn.Tanh()
        self.linear2 = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.linear1(x)
        x = self.tanh_act(x)
        x = self.linear2(x)
        return x
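
To make the forward pass concrete, here is a NumPy sketch of what the two linear layers and the tanh compute. It is not PyTorch code: the weights are random placeholders, and the sizes (11 inputs, 128 hidden units, 5 classes) are the ones used elsewhere in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
input_size, hidden_size, num_classes = 11, 128, 5

# Random placeholder parameters (a trained model would hold learned values).
# Stored as (in, out) here for readability; torch.nn.Linear stores (out, in).
W1 = rng.normal(scale=0.1, size=(input_size, hidden_size))
b1 = np.zeros(hidden_size)
W2 = rng.normal(scale=0.1, size=(hidden_size, num_classes))
b2 = np.zeros(num_classes)

def forward(x):
    """linear1 -> tanh -> linear2, returning raw class scores (logits)."""
    hidden = np.tanh(x @ W1 + b1)
    return hidden @ W2 + b2

batch = rng.normal(size=(32, input_size))  # One batch of 32 standardized samples
logits = forward(batch)
print(logits.shape)  # (32, 5)
```

The output has one row per sample and one column per class; no softmax is applied at this stage.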

In [None]:
# Model parameters
input_size = X_normalized.shape[1] # Number of features
hidden_size = 128

# Initialize the model, loss function, and optimizer
model = MLP(input_size, hidden_size, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005) # You can tune learning rate, momentum, and other optimizer parameters
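
`CrossEntropyLoss` expects raw logits and integer class labels, which is why the model's `forward` ends with a linear layer and no softmax. A NumPy sketch of the same computation (log-softmax followed by the mean negative log-likelihood); `cross_entropy` is an illustrative helper, not a library function:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # For numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.array([2, 3, 1])
uniform_logits = np.zeros((3, 5))  # No preference among the 5 classes
loss = cross_entropy(uniform_logits, labels)
print(loss)  # ln(5), about 1.609
```

A 5-class model that predicts roughly uniformly should therefore start near ln 5 ≈ 1.609, which is in line with the first-epoch loss in the training log.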

In [None]:
# Training Loop
num_epochs = 500
for epoch in range(num_epochs):
    model.train() # Set the model to training mode
    running_loss = 0.0
    for inputs, labels in train_loader:
        # Zero gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Calculate the loss
        loss = criterion(outputs, labels)

        # Backward pass and optimization step
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {running_loss/len(train_loader):.4f}')

Epoch 1/500, Loss: 1.5230
Epoch 2/500, Loss: 1.3222
Epoch 3/500, Loss: 1.1711
Epoch 4/500, Loss: 1.0589
Epoch 5/500, Loss: 0.9754
Epoch 6/500, Loss: 0.9130
Epoch 7/500, Loss: 0.8656
Epoch 8/500, Loss: 0.8289
Epoch 9/500, Loss: 0.7997
Epoch 10/500, Loss: 0.7761
Epoch 11/500, Loss: 0.7568
Epoch 12/500, Loss: 0.7406
Epoch 13/500, Loss: 0.7268
Epoch 14/500, Loss: 0.7153
Epoch 15/500, Loss: 0.7051
Epoch 16/500, Loss: 0.6961
Epoch 17/500, Loss: 0.6883
Epoch 18/500, Loss: 0.6816
Epoch 19/500, Loss: 0.6758
Epoch 20/500, Loss: 0.6699
Epoch 21/500, Loss: 0.6651
Epoch 22/500, Loss: 0.6605
Epoch 23/500, Loss: 0.6566
Epoch 24/500, Loss: 0.6527
Epoch 25/500, Loss: 0.6496
Epoch 26/500, Loss: 0.6464
Epoch 27/500, Loss: 0.6434
Epoch 28/500, Loss: 0.6405
Epoch 29/500, Loss: 0.6385
Epoch 30/500, Loss: 0.6361
Epoch 31/500, Loss: 0.6339
Epoch 32/500, Loss: 0.6318
Epoch 33/500, Loss: 0.6298
Epoch 34/500, Loss: 0.6282
Epoch 35/500, Loss: 0.6264
Epoch 36/500, Loss: 0.6249
Epoch 37/500, Loss: 0.6235
Epoch 38/5

In [None]:
# Evaluation mode
model.eval()
accuracy_metric = Accuracy(task="multiclass", num_classes=num_classes) #initialize accuracy

with torch.no_grad(): # Disable gradient calculations for evaluation
    for inputs, labels in test_loader:
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)  # Index of the largest logit is the predicted class
        accuracy_metric.update(predicted, labels)

# Calculate evaluation metrics
accuracy = accuracy_metric.compute()

print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.7292
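
Accuracy alone can hide how the model does on rare classes. A minimal NumPy sketch of a confusion matrix and per-class recall, using made-up labels and predictions rather than this notebook's results (`confusion_matrix` here is a hand-written helper):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # Count each (true, predicted) pair
    return cm

y_true = np.array([2, 2, 3, 3, 3, 1, 4])  # Made-up labels
y_pred = np.array([2, 3, 3, 3, 2, 2, 3])  # Made-up predictions

cm = confusion_matrix(y_true, y_pred, num_classes=5)
accuracy = np.trace(cm) / cm.sum()
support = cm.sum(axis=1)  # Number of true samples per class
# Recall per class; classes with no true samples are reported as NaN
recall = np.where(support > 0, np.diag(cm) / np.maximum(support, 1), np.nan)
```

With the real test loader, the same counts could be accumulated batch by batch from `predicted` and `labels`.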
