# Model Evaluation

## Introduction
This notebook selects hyperparameters for two models, an SVM and a logistic regression, and then evaluates the performance of both on the test set.

### Imports
The analysis commences with the necessary imports.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
from pathlib import Path

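# Walk up from the working directory until the folder containing "src" is found
project_root = Path.cwd()
while not (project_root / "src").exists():
    if project_root == project_root.parent:
        # Reached the filesystem root: stop instead of looping forever
        raise FileNotFoundError("No 'src' directory found above the working directory")
    project_root = project_root.parent

# Make the project's own modules importable
sys.path.append(str(project_root / "src"))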

from model_selection import grid_search_cv
from models import SVM, LogisticRegression

### Data Loading
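The preprocessed train and test splits are loaded, and the `quality` score is binarized: wines rated 6 or higher are labeled 1, all others -1.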

In [None]:
X_train_df = pd.read_csv('../data/processed/X_train.csv')
y_train_df = pd.read_csv('../data/processed/y_train.csv')
X_test_df = pd.read_csv('../data/processed/X_test.csv')
y_test_df = pd.read_csv('../data/processed/y_test.csv')

y_train = np.where(y_train_df['quality'] >= 6, 1, -1)
y_test = np.where(y_test_df['quality'] >= 6, 1, -1)

X_train = X_train_df.to_numpy()
X_test = X_test_df.to_numpy()
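
Since F1 is used as the score later on, it can help to check how balanced the two classes are after binarization. The snippet below is a minimal sketch: `class_balance` is a hypothetical helper (not part of the project), and the toy array stands in for the real `quality` column.

```python
import numpy as np

def class_balance(y):
    """Return {label: fraction of samples} for a label array."""
    labels, counts = np.unique(y, return_counts=True)
    return {int(label): float(count) / len(y) for label, count in zip(labels, counts)}

# Toy quality scores, for illustration only (not the real data)
toy_quality = np.array([5, 6, 7, 4, 6, 5, 8, 3])
toy_labels = np.where(toy_quality >= 6, 1, -1)  # same threshold as above

print(class_balance(toy_labels))
```

In the notebook itself, the same helper could be called on `y_train` and `y_test`.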

## Hyperparameter Tuning
To identify good hyperparameters, grid search is run in several rounds: a coarse grid first, followed by finer grids around the best values found.
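
A grid search evaluates every combination of the values listed for each parameter. The implementation of `grid_search_cv` in `src/model_selection` is not shown here, so the following is only a sketch of how such combinations are typically enumerated; `iter_param_combinations` is a hypothetical name.

```python
from itertools import product

def iter_param_combinations(param_grid):
    """Yield one {name: value} dict per combination in the grid."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))

# Small example grid (illustrative values)
example_grid = {'n_iters': [1000, 2000], 'lambda_param': [1e-1, 1e-2, 1e-3]}
combos = list(iter_param_combinations(example_grid))

print(len(combos))   # 2 * 3 = 6 combinations
print(combos[0])
```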

### SVM
For the SVM, two parameters are tuned: the number of iterations (*n_iters*) and the regularization parameter lambda (*lambda_param*). The number of folds is typically between 5 and 10; here, however, the SVM trains quickly enough that 100 folds complete in under three minutes. This moves the procedure toward Leave-One-Out validation, since each model is trained on 99% of the training set, with the aim of a more robust validation estimate.
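
To make the effect of the fold count concrete, the sketch below splits a made-up dataset of 1000 samples into 100 folds. It assumes a shuffled split without stratification; how `grid_search_cv` actually builds its folds is not shown in this notebook, and `kfold_indices` is a hypothetical helper.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

# 1000 samples is an illustrative size, not the size of the real training set
splits = list(kfold_indices(n_samples=1000, n_folds=100))
train_idx, val_idx = splits[0]
print(len(splits), len(train_idx), len(val_idx))
```

With 100 folds, each validation fold holds only 1% of the samples, so a single fold's score is noisy even though the average is taken over many folds.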

In [None]:
svm_param_grid = {
    'n_iters': [1000, 2000, 3000, 4000, 5000, 6000],
    'lambda_param': [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
}

svm_best_params, svm_best_metrics = grid_search_cv(SVM, svm_param_grid, X_train, y_train, 100)
print(f'SVM best parameter: {svm_best_params}')
print(f'SVM best metrics: {svm_best_metrics}')

The best hyperparameters found are *n_iters: 5000* and *lambda_param: 0.1*. A finer search is now run in the neighborhood of these values.

In [None]:
svm_param_grid = {
    'n_iters': [4500, 4750, 5000, 5250, 5500],
    'lambda_param': [5e-1, 3e-1, 1e-1, 9e-2, 7e-2],
}

svm_best_params, svm_best_metrics = grid_search_cv(SVM, svm_param_grid, X_train, y_train, 100)
print(f'SVM best parameter: {svm_best_params}')
print(f'SVM best metrics: {svm_best_metrics}')

Because the grid search takes time to run, the selected hyperparameters are hard-coded below; the search itself remains fully reproducible.

In [None]:
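svm_n_iters = 5000
# NOTE: 1e-7 is not among the lambda_param values in either grid above; confirm against the search output
svm_lambda_param = 1e-7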

### Logistic Regression
As with the SVM, the parameters include *n_iters* and *lambda_param*; this model additionally has a learning rate (*learning_rate*). This logistic regression implementation is considerably slower to train than the SVM, so fewer folds are used and the search proceeds through small, targeted grids rather than one large exhaustive grid. The first step is to establish the appropriate orders of magnitude.

In [None]:
lr_param_grid = {
    'n_iters': [10, 100, 300],
    'lambda_param': [1e-1, 1e-3],
    'learning_rate': [1e-1, 1e-3],
}

lr_best_params, lr_best_metrics = grid_search_cv(LogisticRegression, lr_param_grid, X_train, y_train, 5)
print(f'Logistic Regression best parameter: {lr_best_params}')
print(f'Logistic Regression best metrics: {lr_best_metrics}')

The search favors small values of lambda and of the learning rate, while the number of iterations settles on an intermediate value. Because only 5 folds are used, the scores are probably noisy, so the neighborhood of these parameters is explored cautiously.

In [None]:
lr_param_grid = {
    'n_iters': [50, 150],
    'lambda_param': [1e-2, 1e-4],
    'learning_rate': [1e-2, 1e-4],
}

lr_best_params, lr_best_metrics = grid_search_cv(LogisticRegression, lr_param_grid, X_train, y_train, 5)
print(f'Logistic Regression best parameter: {lr_best_params}')
print(f'Logistic Regression best metrics: {lr_best_metrics}')

Notably, the best score is not obtained at the highest iteration count, which suggests that the model converges quickly. The learning curves later in this notebook make this easier to see.

Starting from the best parameters of the previous rounds, the number of folds is doubled from 5 to 10 to reduce noise and identify robust hyperparameter values.

In [None]:
lr_param_grid = {
    'n_iters': [50, 100],
    'lambda_param': [1e-3, 1e-4],
    'learning_rate': [1e-2, 1e-3],
}

lr_best_params, lr_best_metrics = grid_search_cv(LogisticRegression, lr_param_grid, X_train, y_train, 10)
print(f'Logistic Regression best parameter: {lr_best_params}')
print(f'Logistic Regression best metrics: {lr_best_metrics}')

In [None]:
lr_param_grid = {
    'n_iters': [50, 100],
    'lambda_param': [1e-4, 1e-5, 1e-6],
    'learning_rate': [1e-3],
}

lr_best_params, lr_best_metrics = grid_search_cv(LogisticRegression, lr_param_grid, X_train, y_train, 10)
print(f'Logistic Regression best parameter: {lr_best_params}')
print(f'Logistic Regression best metrics: {lr_best_metrics}')
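
The difference in search budget between the two models can be tallied by counting model fits. The sketch below assumes that a grid search trains one model per parameter combination per fold, which is the usual scheme but is not confirmed for `grid_search_cv`; `count_fits` is a hypothetical helper, and the two grids are the coarse grids used above.

```python
from math import prod

def count_fits(param_grid, n_folds):
    """Number of model fits, assuming one fit per combination per fold."""
    n_combinations = prod(len(values) for values in param_grid.values())
    return n_combinations * n_folds

svm_coarse_grid = {
    'n_iters': [1000, 2000, 3000, 4000, 5000, 6000],
    'lambda_param': [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
}
lr_coarse_grid = {
    'n_iters': [10, 100, 300],
    'lambda_param': [1e-1, 1e-3],
    'learning_rate': [1e-1, 1e-3],
}

print(count_fits(svm_coarse_grid, 100))  # 42 combinations * 100 folds = 4200
print(count_fits(lr_coarse_grid, 5))     # 12 combinations * 5 folds = 60
```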

Because the grid search takes time to run, the selected hyperparameters are hard-coded below; the search itself remains fully reproducible.

In [None]:
lr_n_iters = 50
lr_lambda_param = 1e-5
lr_learning_rate = 1e-3

## Learning Curves
Learning curves show how and when each algorithm converges. Here, each point on a curve corresponds to a model retrained from scratch with a given number of iterations.

### Helper Functions

In [None]:
def calculate_f1(predictions, y_true):
    """F1-score of the positive class for labels in {-1, 1}."""
    tp = np.sum((predictions == 1) & (y_true == 1))
    fp = np.sum((predictions == 1) & (y_true == -1))
    fn = np.sum((predictions == -1) & (y_true == 1))
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
    
    return f1

def plot_learning_curve(model_class, X_train, y_train, X_test, y_test, iterations_list, **model_kwargs):
    """Retrain the model for each value in iterations_list and plot train/test F1."""
    train_scores = []
    test_scores = []
    
    for n_iter in iterations_list:
        model = model_class(n_iters=n_iter, **model_kwargs)
        model.fit(X_train, y_train)
        
        train_pred = model.predict(X_train)
        test_pred = model.predict(X_test)
        
        train_score = calculate_f1(train_pred, y_train)
        test_score = calculate_f1(test_pred, y_test)
        
        train_scores.append(train_score)
        test_scores.append(test_score)
        
        print(f"Iter {n_iter}: Train={train_score:.3f}, Test={test_score:.3f}")
    
    plt.figure(figsize=(8, 5))
    plt.plot(iterations_list, train_scores, 'o-', label='Training', color='blue')
    plt.plot(iterations_list, test_scores, 'o-', label='Test', color='red')
    
    plt.xlabel('Iteration Number')
    plt.ylabel('F1-Score')
    plt.title('Learning Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

### SVM

In [None]:
svm_iterations = [100, 300, 500, 1000, 1500, 2000, 2500, 3000,
                  3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000]
plot_learning_curve(SVM, X_train, y_train, X_test, y_test,
                    svm_iterations, lambda_param=svm_lambda_param)

### Logistic Regression

In [None]:
lr_iterations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50]
plot_learning_curve(LogisticRegression, X_train, y_train, X_test, y_test,
                    lr_iterations, lambda_param=lr_lambda_param,
                    learning_rate=lr_learning_rate)

### Conclusions
Comparing the two graphs reveals clear differences in model behavior. The first graph (SVM) is erratic: the F1-score oscillates from one iteration count to the next without stabilizing. The model does not appear to converge on a stable solution, which suggests that the optimization has difficulty settling in a consistent direction.

The second graph (logistic regression) looks very different. The model starts with modest performance, improves rapidly over the first iterations, and then stabilizes at a consistently high level. This is the classic convergence pattern: rapid initial improvement followed by a plateau.

From a practical perspective, the logistic regression would be the more reliable model to deploy: it achieves better performance and does so in a predictable, stable manner. The SVM occasionally reaches noteworthy peaks, but its instability makes it a risky choice for real-world use.
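
The visual impression of stability can be backed by a number, for instance the spread of the last few points on each curve. The following is a minimal sketch with made-up score lists (not the notebook's results); `tail_stability` is a hypothetical helper.

```python
import numpy as np

def tail_stability(scores, tail=3):
    """Return (mean, standard deviation) of the last `tail` scores."""
    tail_scores = np.asarray(scores[-tail:], dtype=float)
    return float(tail_scores.mean()), float(tail_scores.std())

# Made-up curves for illustration: one oscillating, one converging
oscillating = [0.60, 0.75, 0.55, 0.78, 0.52, 0.74]
converging = [0.40, 0.65, 0.76, 0.77, 0.77, 0.77]

osc_mean, osc_std = tail_stability(oscillating)
conv_mean, conv_std = tail_stability(converging)
print(f"oscillating: mean={osc_mean:.3f}, std={osc_std:.3f}")
print(f"converging:  mean={conv_mean:.3f}, std={conv_std:.3f}")
```

To apply this to the real curves, `plot_learning_curve` would need to return its `train_scores` and `test_scores` lists, which it currently does not.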

## Evaluation

### Helper Functions

In [None]:
def calculate_metrics(predictions, y_test):
    """Accuracy, precision, recall, F1 and confusion counts for labels in {-1, 1}."""
    tp = np.sum((predictions == 1) & (y_test == 1))
    fp = np.sum((predictions == 1) & (y_test == -1))
    tn = np.sum((predictions == -1) & (y_test == -1))
    fn = np.sum((predictions == -1) & (y_test == 1))
    
    accuracy = (tp + tn) / len(y_test)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn
    }

def plot_metrics(predictions, y_test):
    """Plot the four metrics as a bar chart next to the confusion matrix."""
    metrics = calculate_metrics(predictions, y_test)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    names = ['Accuracy', 'Precision', 'Recall', 'F1']
    values = [metrics['accuracy'], metrics['precision'], 
              metrics['recall'], metrics['f1']]
    
    ax1.bar(names, values, color=['skyblue', 'lightcoral', 'lightgreen', 'orange'])
    ax1.set_ylim(0, 1)
    ax1.set_title('Metrics')
    
    for i, v in enumerate(values):
        ax1.text(i, v + 0.02, f'{v:.3f}', ha='center')
    
    cm = [[metrics['tn'], metrics['fp']], 
          [metrics['fn'], metrics['tp']]]
    
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax2,
                xticklabels=['Bad', 'Good'], yticklabels=['Bad', 'Good'])
    ax2.set_title('Confusion Matrix')
    ax2.set_xlabel('Predicted')
    ax2.set_ylabel('Actual')
    
    plt.tight_layout()
    plt.show()

### SVM

In [None]:
# Keyword arguments avoid relying on the constructor's positional order
svm = SVM(n_iters=svm_n_iters, lambda_param=svm_lambda_param)
svm.fit(X_train, y_train)
predictions = svm.predict(X_test)
plot_metrics(predictions, y_test)

### Logistic Regression

In [None]:
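# Keyword arguments avoid relying on the constructor's positional order
lr = LogisticRegression(n_iters=lr_n_iters, lambda_param=lr_lambda_param, learning_rate=lr_learning_rate)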
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
plot_metrics(predictions, y_test)

### Conclusions

The metric charts and confusion matrices above indicate that logistic regression generally outperforms the SVM on this dataset.
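
A side-by-side table can make such a comparison easier to read than two separate figures. The sketch below uses toy labels and predictions (not the notebook's results); `binary_metrics` is a hypothetical helper that mirrors the metric definitions used above, and `model_a` and `model_b` are placeholder names.

```python
import numpy as np

def binary_metrics(predictions, y_true):
    """Accuracy, precision, recall and F1 for labels in {-1, 1}."""
    tp = int(np.sum((predictions == 1) & (y_true == 1)))
    fp = int(np.sum((predictions == 1) & (y_true == -1)))
    tn = int(np.sum((predictions == -1) & (y_true == -1)))
    fn = int(np.sum((predictions == -1) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return {'accuracy': (tp + tn) / len(y_true), 'precision': precision,
            'recall': recall, 'f1': f1}

# Toy data for illustration only
y_true = np.array([1, 1, -1, -1, 1, -1])
toy_predictions = {
    'model_a': np.array([1, -1, -1, 1, 1, -1]),
    'model_b': np.array([1, 1, -1, -1, 1, 1]),
}
results = {name: binary_metrics(pred, y_true) for name, pred in toy_predictions.items()}

# Print one row per model
metric_names = ['accuracy', 'precision', 'recall', 'f1']
print(f"{'model':<10}" + "".join(f"{m:>11}" for m in metric_names))
for name, metrics in results.items():
    print(f"{name:<10}" + "".join(f"{metrics[m]:>11.3f}" for m in metric_names))
```

In the notebook, both evaluation cells assign their output to `predictions`, so the two arrays would need distinct names before they could be compared this way.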