# Exercise 3: Implementing SVM for Spam Email Classification

Support Vector Machine (SVM) is a supervised machine learning algorithm primarily used for classification, but also effective for regression.

It works by finding the hyperplane that separates data points of different classes with the largest margin. The margin is the distance between the hyperplane and the closest data points from either class. These closest points are called support vectors, and they alone determine where the hyperplane lies. Kernels let an SVM handle non-linear data: a kernel computes inner products as if the data had been mapped into a higher-dimensional feature space, where the classes may become linearly separable, without ever computing that mapping explicitly.
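To make the margin and support vectors concrete, here is a minimal sketch on a hypothetical six-point 2-D dataset (not the spam data). A linear `SVC` with a very large `C` approximates a hard-margin SVM. Its weight vector `w` and offset `b` define the hyperplane w·x + b = 0, and the distance from that hyperplane to the closest training point is the margin as defined above (equal to 1/‖w‖).

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-D points (not the spam data)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # class 0
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations,
# so this approximates a hard-margin linear SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w·x + b = 0
b = clf.intercept_[0]   # offset b

# Distance of every training point to the hyperplane
distances = np.abs(X @ w + b) / np.linalg.norm(w)
margin = distances.min()  # distance to the closest point(s), about 1 / ||w||

print("w =", w, "b =", b)
print("margin =", margin)
print("support vectors:\n", clf.support_vectors_)
```

For these points the support vectors should be (2, 1), (1, 2), and (4, 4), the points nearest the gap between the two clusters; the other three points could move (without crossing the margin) and the hyperplane would not change.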

Common kernel types:

| Kernel | Description |
|---|---|
| `linear` | For linearly separable data |
| `rbf` (Gaussian) | Popular for non-linear data |
| `poly` | Polynomial separation |
| `sigmoid` | Similar to a neural network activation function |
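Each kernel name corresponds to a formula for the similarity k(x, z) between two points. The sketch below writes those formulas out by hand and cross-checks them against `sklearn.metrics.pairwise`. The values of `gamma`, `coef0`, and `degree` are illustrative assumptions, not `SVC`'s defaults (`SVC` uses `gamma='scale'`, `coef0=0.0`, `degree=3`). The last check illustrates the "higher-dimensional space" idea: the degree-2 polynomial kernel (x·z)² equals an ordinary dot product after an explicit map `phi` (a hypothetical helper) into 3-D.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, rbf_kernel, polynomial_kernel, sigmoid_kernel,
)

# Two hypothetical 2-D points, as single-row matrices (the shape sklearn expects)
x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])
gamma, coef0, degree = 0.5, 1.0, 3  # illustrative hyperparameter values

dot = (x @ z.T).item()  # x·z = 1*0.5 + 2*(-1) = -1.5

# Hand-written kernel formulas
k = {
    "linear": dot,                                   # x·z
    "rbf": np.exp(-gamma * np.sum((x - z) ** 2)),    # exp(-gamma * ||x - z||^2)
    "poly": (gamma * dot + coef0) ** degree,         # (gamma * x·z + coef0)^degree
    "sigmoid": np.tanh(gamma * dot + coef0),         # tanh(gamma * x·z + coef0)
}

# Cross-check against scikit-learn's implementations
assert np.isclose(k["linear"], linear_kernel(x, z)[0, 0])
assert np.isclose(k["rbf"], rbf_kernel(x, z, gamma=gamma)[0, 0])
assert np.isclose(k["poly"], polynomial_kernel(x, z, degree=degree, gamma=gamma, coef0=coef0)[0, 0])
assert np.isclose(k["sigmoid"], sigmoid_kernel(x, z, gamma=gamma, coef0=coef0)[0, 0])

# Kernel trick (hypothetical helper): explicit feature map for k(x, z) = (x·z)^2
def phi(v):
    v1, v2 = v
    return np.array([v1 ** 2, np.sqrt(2) * v1 * v2, v2 ** 2])

# The 3-D dot product matches the kernel value computed in 2-D
assert np.isclose(dot ** 2, phi(x[0]) @ phi(z[0]))
print(k)
```

An SVM only ever needs these kernel values, so it can work in the mapped space without constructing `phi` at all; for `rbf` the corresponding feature space is infinite-dimensional.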

Regularization parameter C: C controls the trade-off between maximizing the margin and minimizing classification error on the training data.

- Small C → wider margin; allows some misclassification (soft margin).
- Large C → narrower margin; penalizes misclassification more strictly (approaching a hard margin).
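A minimal sketch of this trade-off, assuming hypothetical, partly overlapping two-class data from `make_blobs` (not the spam dataset): fit a linear SVM for several values of `C` and report the margin 1/‖w‖ together with the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical, partly overlapping 2-class data (not the spam dataset)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

margins, n_support = {}, {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Distance from the hyperplane to each margin boundary
    margins[C] = 1.0 / np.linalg.norm(w)
    # Points on or inside the margin (including misclassified ones) are support vectors
    n_support[C] = int(clf.n_support_.sum())
    print(f"C={C:>6}: margin={margins[C]:.3f}, support vectors={n_support[C]}")
```

The margin should shrink (or stay the same) as `C` grows, because a larger `C` weights training errors more heavily relative to ‖w‖. The support-vector count usually falls as well, since fewer points end up inside a narrower margin, but that trend is typical rather than guaranteed.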

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

# Toy dataset: 8 emails described by 4 numeric features plus a binary spam label
data = {
    'Email Length': [120, 350, 180, 500, 200, 400, 150, 300],
    'Num. of Special Characters': [5, 12, 2, 15, 3, 10, 1, 8],
    'Num. of Uppercase Letters': [3, 10, 1, 20, 2, 8, 1, 7],
    'Num. of Links': [0, 3, 0, 5, 1, 2, 0, 3],
    'Spam (Target)': [0, 1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)
df.head()

# Separate the features from the target label
X = df.drop('Spam (Target)', axis=1)
y = df['Spam (Target)']

# Standardize the features, since SVMs are sensitive to feature scale.
# Note: fitting the scaler on all rows lets test-set statistics leak into training.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Hold out 20% of the 8 rows (2 emails) for testing; without stratify=y,
# both test emails can come from the same class
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train one SVM per kernel and record metrics for the spam class (label 1)
kernels = ['linear', 'rbf', 'poly', 'sigmoid']
results = {}
for kernel in kernels:
    model = SVC(kernel=kernel, C=1.0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # zero_division=0 reports undefined precision/recall as 0.0 instead of warning
    report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
    results[kernel] = {
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': report['1']['precision'],
        'Recall': report['1']['recall'],
        'F1-Score': report['1']['f1-score'],
        'Confusion Matrix': confusion_matrix(y_test, y_pred)
    }
print(results)
```

```
{'linear': {'Accuracy': 1.0, 'Precision': 1.0, 'Recall': 1.0, 'F1-Score': 1.0, 'Confusion Matrix': array([[2]])}, 'rbf': {'Accuracy': 1.0, 'Precision': 1.0, 'Recall': 1.0, 'F1-Score': 1.0, 'Confusion Matrix': array([[2]])}, 'poly': {'Accuracy': 0.0, 'Precision': 0.0, 'Recall': 0.0, 'F1-Score': 0.0, 'Confusion Matrix': array([[0, 0],
       [2, 0]])}, 'sigmoid': {'Accuracy': 1.0, 'Precision': 1.0, 'Recall': 1.0, 'F1-Score': 1.0, 'Confusion Matrix': array([[2]])}}
```
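With only 8 emails, `test_size=0.2` leaves 2 emails for testing, and the output above shows that both were spam: the linear, rbf, and sigmoid confusion matrices are 1×1, and the poly matrix has counts only in the true-spam row. Each accuracy is therefore measured on two spam emails, so the 1.0 vs 0.0 scores say little about how the kernels compare. The sketch below is one sturdier alternative, assuming the goal is still to compare kernels on this same toy data: stratified 4-fold cross-validation, with scaling wrapped in a pipeline so the scaler is fit on each training fold only. With 2 test emails per fold the scores remain very noisy; treat this as an illustration of the procedure, not a definitive result.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same toy dataset as in the exercise
data = {
    'Email Length': [120, 350, 180, 500, 200, 400, 150, 300],
    'Num. of Special Characters': [5, 12, 2, 15, 3, 10, 1, 8],
    'Num. of Uppercase Letters': [3, 10, 1, 20, 2, 8, 1, 7],
    'Num. of Links': [0, 3, 0, 5, 1, 2, 0, 3],
    'Spam (Target)': [0, 1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)
X = df.drop('Spam (Target)', axis=1)
y = df['Spam (Target)']

# 4 stratified folds: each test fold holds 2 emails, one spam and one not spam
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

cv_results = {}
for kernel in ['linear', 'rbf', 'poly', 'sigmoid']:
    # The pipeline refits the scaler on each training fold, so no test statistics leak
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    cv_results[kernel] = (scores.mean(), scores.std())

for kernel, (mean, std) in cv_results.items():
    print(f"{kernel:8s} accuracy = {mean:.2f} +/- {std:.2f}")
```

Stratification guarantees both classes appear in every test fold, and the pipeline keeps preprocessing inside the cross-validation loop, which also removes the leakage noted in the exercise code.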

