# Spam Classification Challenge

>If, at any point before the written exam, you (as a group) submit an e-mail spam classifier that beats my precision/recall performance on the e-mail data test set, I will award everyone a 2.5% bonus in the written exam (e.g., if you score 87.5% in the exam, I will bump it up to 90%).


## Rules
- Must be submitted before the day of the written exam.
- Must be implemented in Python.
- Must use a method discussed in the course (variations/modifications are okay).
- Must be a single submission that all of you unanimously agree on.
- You win and earn the 2.5% bonus if the F1-score of your model on the test set, 2*(precision*recall)/(precision+recall), is larger than mine.
- I will upload the training set and my own code on May 12, 2025. You won’t have access to the test set.
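
The F1 criterion from the rules can be sketched as follows. This is a minimal illustration, not the course's official scoring code: it assumes spam is the positive class with label `1`, and the function name `f1_score` and the toy labels are made up for this example.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2 * (precision * recall) / (precision + recall).

    Assumption: `positive` marks the spam class; the course's actual
    label encoding is not shown in this notebook.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when nothing is predicted or labeled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 4))
```

Because F1 ignores true negatives, a model that rarely predicts spam can score a high accuracy and still a low F1.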

### 1. Load imports and data
The classification data is loaded as a bag-of-words feature matrix: one row per e-mail and one column per vocabulary word.

In [1]:
from models import NaiveBayes, NeuralNetwork, DecisionTreeClassifier, LinearClassifierClosedForm, LinearClassifierGD, KNN
from utils import DataLoader, Evaluator


# Load spam classification data
X, y = DataLoader.load_spam_data('./data_train')

Vocabulary loaded
Loaded 4125 emails (1176 spam, 2949 no-spam)
Feature matrix shape: (4125, 50371)
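
To make the bag-of-words representation concrete, here is a small sketch on three toy messages. It is not the implementation behind `DataLoader.load_spam_data`: the whitespace tokenization, the use of word counts (rather than binary indicators), and the helper names `build_vocabulary` and `bag_of_words` are all assumptions for illustration.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct lowercase word to a column index (sorted for determinism)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(documents, vocab):
    """Return one count vector per document; word order in the text is discarded."""
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        row = [0] * len(vocab)
        for word, count in counts.items():
            if word in vocab:  # words outside the vocabulary are ignored
                row[vocab[word]] = count
        matrix.append(row)
    return matrix


emails = ["win money now", "meeting now", "win win prize"]
vocab = build_vocabulary(emails)   # columns: meeting, money, now, prize, win
X_toy = bag_of_words(emails, vocab)
for row in X_toy:
    print(row)
```

With 50371 vocabulary columns and only 4125 e-mails, most entries of the real matrix are likely zero, which is worth keeping in mind when choosing a model.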


### 2. Add the model you want to evaluate
Add as many models as you want. You can add a model several times with different hyperparameters.

*Recommended:  Add a meaningful description for easier evaluation.*

Use existing models with different hyperparameters, or add your own model by inheriting from the `Model` class.

In [6]:
# Define models to evaluate
models = [
    # KNN(name="A", k=2, distance="manhattan"),  # disabled: needs some fixes before it can be used
    LinearClassifierGD(name="LinearClassifier Gradient Descent", epochs=10, lr=0.1),
    LinearClassifierClosedForm(name="LinearClassifier Mathematical"),
    DecisionTreeClassifier(max_depth=1),
    NaiveBayes(name="Naive Bayes"),
    NeuralNetwork(name="NN (Logistic Loss, Hidden=16, Epochs=5, LR=0.01)",
        hidden_dim=16, epochs=5, lr=0.01, loss_type="logistic2"
    )
]
print(f"{len(models)} models defined.")

5 models defined.
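
Adding your own model by inheriting from `Model` might look roughly like the sketch below. The real `Model` base class is not shown in this notebook, so the stand-in defined here and its `fit`/`predict` interface are assumptions. Check the project's `models` module for the actual method names before copying this pattern.

```python
class Model:
    """Hypothetical stand-in for the project's `Model` base class (interface assumed)."""

    def __init__(self, name):
        self.name = name

    def fit(self, X, y):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError


class MajorityClassifier(Model):
    """Illustrative custom model: always predicts the most frequent training label."""

    def __init__(self, name="Majority baseline"):
        super().__init__(name)
        self.majority_label = None

    def fit(self, X, y):
        labels = list(y)
        # Pick the label that occurs most often in the training targets.
        self.majority_label = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        # The features are ignored on purpose; this is only a baseline.
        return [self.majority_label for _ in X]


baseline = MajorityClassifier().fit([[0, 1], [1, 0], [1, 1]], [0, 0, 1])
predictions = baseline.predict([[1, 1], [0, 0]])
print(baseline.name, predictions)
```

A baseline like this gives a reference point: any serious model should beat it on both accuracy and F1.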


### 3. Train all models
The evaluation uses a fixed random state for reproducibility.

All defined models are evaluated with k-fold cross-validation to get a robust performance estimate. This should reveal overfitting and show how well a model generalizes to unseen data. A higher k results in a longer training time.
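
The idea behind k-fold cross-validation can be sketched in a few lines. This is not the `Evaluator` implementation, whose splitting strategy (for example, whether it stratifies by class) is not shown here. The helper name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, n_splits, random_state=None):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Every sample lands in exactly one validation fold. A fixed `random_state`
    makes the shuffle, and therefore the folds, reproducible.
    """
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)

    # Deal the shuffled indices into n_splits folds of near-equal size.
    folds = [indices[i::n_splits] for i in range(n_splits)]

    for i in range(n_splits):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=10, n_splits=3, random_state=42))
for train, validation in splits:
    print(len(train), len(validation))
```

Each model is trained k times on k-1 folds and scored on the held-out fold, which is why a higher k increases the total training time.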

In [7]:
print(f"\nEvaluating {len(models)} models...")

# Create evaluator and run k-fold cross-validation
evaluator = Evaluator(models=models, n_splits=3, random_state=42)

results = evaluator.evaluate(X, y, verbose=True)


Evaluating 5 models...

=== K-Fold Evaluation for Model: LinearClassifier Gradient Descent ===
Epoch 1/10, Loss: 1.224037
Epoch 2/10, Loss: 0.904395
Epoch 3/10, Loss: 0.843533
Epoch 4/10, Loss: 0.826610
Epoch 5/10, Loss: 0.781258
Epoch 6/10, Loss: 0.787429
Epoch 7/10, Loss: 0.759552
Epoch 8/10, Loss: 0.763252
Epoch 9/10, Loss: 0.751402
Epoch 10/10, Loss: 0.748896
Fold 1/3: Accuracy = 0.9760; F1 Score = 0.9613
Fold 2/3: Accuracy = 0.9840; F1 Score = 0.9710
Fold 3/3: Accuracy = 0.9782; F1 Score = 0.9613

Model: LinearClassifier Gradient Descent
Mean Accuracy: 0.9794 ± 0.0034

=== K-Fold Evaluation for Model: LinearClassifier Mathematical ===
Closed-form solution found. Final MSE loss: 0.000000
Fold 1/3: Accuracy = 0.6182; F1 Score = 0.6062
Fold 2/3: Accuracy = 0.5927; F1 Score = 0.5625
Fold 3/3: Accuracy = 0.6065; F1 Score = 0.5744

Model: LinearClassifier Mathematical
Mean Accuracy: 0.6058 ± 0.0104

=== K-Fold Evaluation for Model: DecisionTreeClassifier ===
Fold 1/3: Accuracy = 0.6953

### 4. Evaluate the trained models

In [8]:
evaluator.print_summary()

best_model_acc = evaluator.best_model("mean_accuracy")
if best_model_acc:
    print(f"\nBest performing model (ACC): {best_model_acc}")
    best_acc = results[best_model_acc]['mean_accuracy']
    best_std = results[best_model_acc]['std_accuracy']
    print(f"Best accuracy: {best_acc:.4f} ± {best_std:.4f}")
    
best_model_f1 = evaluator.best_model("mean_f1")
if best_model_f1:
    print(f"\nBest performing model (F1): {best_model_f1}")
    best_f1 = results[best_model_f1]['mean_f1']
    best_std = results[best_model_f1]['std_f1']
    print(f"Best f1-score: {best_f1:.4f} ± {best_std:.4f}")

print("\nEvaluation completed successfully!")


FINAL K-FOLD COMPARISON RESULTS
LinearClassifier Gradient Descent                 : ACC(0.9794 ± 0.0034); F1(0.9645 0.0045)
LinearClassifier Mathematical                     : ACC(0.6058 ± 0.0104); F1(0.5810 0.0184)
DecisionTreeClassifier                            : ACC(0.7149 ± 0.0140); F1(0.0000 0.0000)
Naive Bayes                                       : ACC(0.8364 ± 0.0173); F1(0.6188 0.0306)
NN (Logistic Loss, Hidden=16, Epochs=5, LR=0.01)  : ACC(0.9091 ± 0.0355); F1(0.8002 0.0964)

Best performing model (ACC): LinearClassifier Gradient Descent
Best accuracy: 0.9794 ± 0.0034

Best performing model (F1): LinearClassifier Gradient Descent
Best f1-score: 0.9645 ± 0.0045

Evaluation completed successfully!
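
The `DecisionTreeClassifier` row pairs an accuracy of 0.7149 with an F1-score of 0.0000. One possible explanation, which is an assumption and has not been verified against the model code, is that the depth-1 tree predicts no-spam for every e-mail. The check below uses only the class counts reported when the data was loaded (1176 spam, 2949 no-spam) and assumes spam is the positive class with label `1`.

```python
# Class counts taken from the data-loading output above.
n_spam, n_no_spam = 1176, 2949

y_true = [1] * n_spam + [0] * n_no_spam   # assumption: 1 = spam, 0 = no-spam
y_pred = [0] * len(y_true)                # hypothetical model: always "no-spam"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# No spam is ever predicted, so precision and recall both fall back to 0.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"accuracy = {accuracy:.4f}, f1 = {f1:.4f}")
```

The no-spam share of the training set, 2949/4125, matches the reported accuracy to four decimals. This is why the challenge is scored on F1 rather than accuracy.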
