In [79]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

In [81]:
data = pd.read_csv(r"C:\Users\ROHIT\OneDrive\Desktop\SM\AIML\datasets\emails_16_17_18_19.csv")

In [83]:
data.head()

,Email No.,the,to,ect,and,for,of,a,you,hou,...,connevey,jay,valued,lay,infrastructure,military,allowing,ff,dry,Prediction
0,Email 1,0,0,1,0,0,0,2,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Email 2,8,13,24,6,6,2,102,1,27,...,0,0,0,0,0,0,0,1,0,0
2,Email 3,0,0,1,0,0,0,8,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Email 4,0,5,22,0,5,1,51,2,10,...,0,0,0,0,0,0,0,0,0,0
4,Email 5,7,6,17,1,5,2,57,0,9,...,0,0,0,0,0,0,0,1,0,0


In [85]:
data.shape

(5172, 3002)

In [87]:
#Prepare Data
X = data.drop(columns=["Email No.","Prediction"]).astype(np.float32).values
y = data["Prediction"].values

In [89]:
#Split data
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=42)

In [91]:
#Convert labels {0,1} -> {-1,1}
y_train = np.where(y_train == 0,-1,1)
y_test_mod = np.where(y_test == 0,-1,1)

In [93]:
# Polynomial Kernel Function
def polynomial_kernel(x1, x2, degree=2, coef0 = 1):
    return (np.dot(x1,x2.T)+coef0) ** degree
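
Note: for degree 2 and coef0 = 1, the kernel value equals an ordinary dot product in an expanded feature space, which is why no polynomial features need to be built. The minimal sketch below checks this on two made-up 3-dimensional vectors; explicit_degree2_features is a hypothetical helper written only for this illustration, not part of the notebook.

```python
import numpy as np

def polynomial_kernel(x1, x2, degree=2, coef0=1):
    return (np.dot(x1, x2.T) + coef0) ** degree

def explicit_degree2_features(x):
    """Hypothetical helper: feature map whose dot product equals (x.z + 1)^2."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    linear = np.sqrt(2.0) * x                      # sqrt(2) * x_i
    squares = x ** 2                               # x_i^2
    cross = np.array([np.sqrt(2.0) * x[i] * x[j]   # sqrt(2) * x_i * x_j for i < j
                      for i in range(n) for j in range(i + 1, n)])
    return np.concatenate(([1.0], linear, squares, cross))

# Made-up vectors, used only to compare the two computations
x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

k_implicit = polynomial_kernel(x, z)                                    # one dot product
k_explicit = explicit_degree2_features(x) @ explicit_degree2_features(z)

print(k_implicit, k_explicit)
print(np.isclose(k_implicit, k_explicit))
```

With the 3000 feature columns used here, the explicit map would hold 1 + 3000 + 3000 + 3000*2999/2 = 4,504,501 entries per email, whereas the kernel needs only one 3000-term dot product per pair.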

In [95]:
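class SVM:
    def __init__(self, lr=0.001, lambda_param=0.01, n_iters=100, degree=2, coef0=1):
        self.lr = lr
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.degree = degree
        self.coef0 = coef0
        self.alpha = None    # dual weights, one per training sample
        self.b = None        # bias term
        self.y_train = None  # training labels, stored by fit for use in predict

    def fit(self, X, y):
        n_samples = X.shape[0]
        self.alpha = np.zeros(n_samples)
        self.b = 0
        self.y_train = y

        # Gram matrix: K[j, i] is the kernel between training samples j and i
        K = polynomial_kernel(X, X, self.degree, self.coef0)

        for _ in range(self.n_iters):
            for i in range(n_samples):
                pred = np.sum(self.alpha * y * K[:, i]) + self.b
                condition = y[i] * pred >= 1

                if condition:
                    # Margin satisfied: only shrink alpha[i] (regularization)
                    self.alpha[i] -= self.lr * (self.lambda_param * self.alpha[i])
                else:
                    # Margin violated: lower alpha[i] by lr * (1 - y[i] * pred)
                    # and nudge the bias toward y[i]
                    self.alpha[i] -= self.lr * (1 - y[i] * pred)
                    self.b += self.lr * y[i]

    def predict(self, X_train, X):
        # Kernel between every training sample (rows) and every sample in X (columns)
        K = polynomial_kernel(X_train, X, self.degree, self.coef0)
        # Use the labels stored by fit instead of relying on a global y_train
        return np.sign(np.dot(self.alpha * self.y_train, K) + self.b)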



In [97]:
#Train and Evaluate
svm_poly = SVM(lr = 0.0005, lambda_param = 0.01, n_iters = 30)
svm_poly.fit(X_train,y_train)
y_pred_poly = svm_poly.predict(X_train, X_test)



In [101]:
print("Confusion Matrix:\n", confusion_matrix(y_test_mod,y_pred_poly))
print("\nClassification report:\n",classification_report(y_test_mod, y_pred_poly,target_names = ["Not Span", "Spam"]))

Confusion Matrix:
 [[739   0]
 [296   0]]

Classification report:
               precision    recall  f1-score   support

    Not Span       0.71      1.00      0.83       739
        Spam       0.00      0.00      0.00       296

    accuracy                           0.71      1035
   macro avg       0.36      0.50      0.42      1035
weighted avg       0.51      0.71      0.59      1035
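
The figures in the report can be reproduced by hand from the confusion matrix, which helps when reading it. The sketch below assumes scikit-learn's convention that rows are true classes and columns are predicted classes, ordered here as (-1, +1); the variable names are illustrative only.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[739, 0],
               [296, 0]])

tn, fp = cm[0]   # true Not Spam: predicted Not Spam / predicted Spam
fn, tp = cm[1]   # true Spam:     predicted Not Spam / predicted Spam

accuracy = (tn + tp) / cm.sum()
precision_not_spam = tn / (tn + fn)   # of all "Not Spam" predictions, how many were right
recall_not_spam = tn / (tn + fp)      # of all true Not Spam emails, how many were found
f1_not_spam = (2 * precision_not_spam * recall_not_spam
               / (precision_not_spam + recall_not_spam))
recall_spam = tp / (tp + fn)          # of all true Spam emails, how many were found
predicted_spam = tp + fp              # number of emails predicted as Spam

print(round(accuracy, 2), round(precision_not_spam, 2),
      round(recall_not_spam, 2), round(f1_not_spam, 2))
print(recall_spam, predicted_spam)
```

Because no email is predicted as Spam, the Spam precision is 0/0; scikit-learn reports it as 0.00 and emits a warning about the undefined metric.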





Good afternoon, ma’am. This script builds a spam vs. not-spam email classifier using a kernel SVM coded from scratch and evaluates it with a confusion matrix and a classification report.
First, I import pandas and NumPy for data frames and numerical arrays, and from scikit-learn I import train_test_split and the two metrics printed at the end, confusion_matrix and classification_report. I read the CSV emails_16_17_18_19.csv into data, call head() to preview the first five rows, and check shape, which reports 5172 rows and 3002 columns (those two calls only display information; nothing is stored).
To prepare features and labels, I drop two columns from data: “Email No.”, which is only an identifier, and “Prediction”, which is the target rather than a feature. I convert the remaining 3000 feature columns to float32 and take their .values as $\mathbf{X}$. The target $\mathbf{y}$ is the original Prediction column as a NumPy array. I split $\mathbf{X}$ and $\mathbf{y}$ into training and test sets with an $80/20$ split and a fixed random seed (random_state=42) for reproducibility. Because this SVM implementation expects labels $-1$ and $+1$, I map class $0$ (Not Spam) to $-1$ and class $1$ (Spam) to $+1$; the mapped training labels overwrite y_train, and the mapped test labels are stored in y_test_mod.
Next, I define the polynomial kernel function polynomial_kernel(x1, x2, degree=2, coef0=1), which returns $(x_1 \cdot x_2^\top + \text{coef0})^{\text{degree}}$. This lets the SVM separate classes in a higher-dimensional feature space without explicitly creating polynomial features.
Then I define the SVM class. In __init__, I set the hyperparameters: the learning rate lr, the regularization strength lambda_param, the number of passes n_iters, and the kernel parameters degree and coef0. I also declare the dual weights $\alpha$ and the bias $b$ (decision threshold), which start as None and are learned in fit.
In fit, I create $\alpha$ as a zero vector with one entry per training sample and set $b = 0$. I precompute the Gram matrix $\mathbf{K}$ by calling polynomial_kernel(X, X, degree, coef0), so each entry $\mathbf{K}[j, i]$ is the kernel similarity between training samples $j$ and $i$. Then I run n_iters training passes, looping over all samples on each pass. For sample $i$, I compute the decision value $\text{pred} = \sum_j \alpha_j \, y_j \, \mathbf{K}[j, i] + b$ and check the margin condition $y_i \cdot \text{pred} \ge 1$. If it holds, the point is correctly classified with margin, so I apply only regularization, shrinking $\alpha_i$ by the factor $(1 - \text{lr} \cdot \text{lambda\_param})$. Otherwise, the point violates the margin or is misclassified, so I subtract $\text{lr} \cdot (1 - y_i \cdot \text{pred})$ from $\alpha_i$ and move the bias $b$ by $\text{lr} \cdot y_i$, a small step toward the true label. Note that this step lowers $\alpha_i$, whereas a subgradient step on the hinge loss would raise it, so the rule is a heuristic rather than a true hinge-loss descent. The intent of the loop is to move the decision boundary so that it better separates spam and not-spam in the kernel-induced space.
The predict method computes the kernel between all training points and the test matrix, forms the decision function $(\alpha \odot y_{\text{train}})^\top \mathbf{K} + b$, and returns its sign ($-1$ or $+1$) as the predicted class.
To train and evaluate, I create svm_poly with a small learning rate (lr = 0.0005), lambda_param = 0.01, $30$ passes, and the default degree-2 polynomial kernel. I call fit on the training data, then predict on the test set to get y_pred_poly. Finally, I print the confusion matrix, which shows counts of true/false positives and negatives, and the classification report, which summarizes precision, recall, and F1-score for each class (the first label is displayed as “Not Span”, a small typo for “Not Spam”). In this run the confusion matrix is [[739, 0], [296, 0]]: all 1035 test emails are predicted as not spam, so accuracy is 0.71 but recall for Spam is 0.00, which means the model as trained has not learned to detect spam.
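
For comparison with the update in fit described above, here is a minimal sketch of a margin-perceptron-style rule in the dual, where a margin violation raises the weight of the violating sample. It is not the method used in this notebook: fit_margin_perceptron is a hypothetical helper, the four XOR-style points are made-up toy data, and the step size is chosen for that toy case, so it says nothing about how such a rule would behave on the email data.

```python
import numpy as np

def polynomial_kernel(x1, x2, degree=2, coef0=1):
    return (np.dot(x1, x2.T) + coef0) ** degree

def fit_margin_perceptron(X, y, lr=0.1, lambda_param=0.01, n_iters=20):
    """Hypothetical helper: dual updates that raise alpha[i] on a margin violation."""
    n_samples = X.shape[0]
    alpha = np.zeros(n_samples)
    b = 0.0
    K = polynomial_kernel(X, X)   # Gram matrix of the toy data

    for _ in range(n_iters):
        for i in range(n_samples):
            margin = y[i] * (np.sum(alpha * y * K[:, i]) + b)
            if margin >= 1:
                # Margin satisfied: mild shrinkage only
                alpha[i] -= lr * lambda_param * alpha[i]
            else:
                # Margin violated: give this sample more weight
                alpha[i] += lr
                b += lr * y[i]
    return alpha, b

# Toy XOR-style data: not linearly separable, but separable with a degree-2 kernel
X_toy = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y_toy = np.array([1, 1, -1, -1])

alpha, b = fit_margin_perceptron(X_toy, y_toy)
scores = np.dot(alpha * y_toy, polynomial_kernel(X_toy, X_toy)) + b
pred_toy = np.sign(scores)
print(pred_toy)
```

The only structural difference from the loop in fit is the sign and size of the step in the else branch; the decision function and the sign-based prediction are the same.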
Overall, this code loads and prepares the email feature matrix, implements a from-scratch kernel SVM that learns dual weights via a simple gradient-style loop on the hinge-loss margin condition, makes predictions by evaluating the kernel against the training set, and reports metrics that show how well (or, as in this run, how poorly) it distinguishes spam from non-spam emails.