Aim:

Train a Support Vector Machine (SVM) classifier on a non-vectorial dataset (raw text messages labelled ham or spam).

Algorithm:

Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. For classification, it finds the hyperplane that best separates the data points of different classes in a high-dimensional space. "Best" here means maximizing the margin, which is the distance between the hyperplane and the closest points of each class; these closest points are called support vectors. SVM handles both linear and non-linear classification problems through kernel functions such as the linear, polynomial, or radial basis function (RBF) kernels. Because it maximizes the margin and copes well with high-dimensional data, SVM is particularly effective when the classes are clearly separated.

Non-vectorial data such as text cannot be passed to an SVM directly, so each message is first converted into a numeric feature vector. Here this is done with TF-IDF.
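
The margin and support vectors described above can be illustrated on a tiny example. This is a minimal sketch, assuming scikit-learn is installed; the four 2-D points are hypothetical toy data and are not part of spam.csv. For a linear kernel the margin width equals 2 / ||w||, where w is the learned weight vector.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two columns of points, 3 units apart on the x-axis.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                         # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin boundaries

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)     # close to 3.0, the gap between the classes
toy_pred = clf.predict([[1.0, 0.5], [2.0, 0.5]])
print("predictions:", toy_pred)
```

The separating line sits halfway between the two groups (x = 1.5), so a point at x = 1.0 is assigned to class 0 and a point at x = 2.0 to class 1.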

Code:



In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

In [2]:
# Path to the dataset (a local file, read with latin-1 encoding)
path = "spam.csv"
df = pd.read_csv(path, encoding="latin-1")


In [3]:
# Drop the unused extra columns and keep only the label (v1) and message (v2)
df = df.drop(columns=["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"])
df.head()

   v1    v2
0  ham   Go until jurong point, crazy.. Available only ...
1  ham   Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3  ham   U dun say so early hor... U c already then say...
4  ham   Nah I don't think he goes to usf, he lives aro...


In [4]:
# Give the columns descriptive names
df.columns = ["label", "message"]

# Encode the labels as integers: ham -> 0, spam -> 1
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

In [5]:
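# Hold out 20% of the messages for testing
X_train, X_test, y_train, y_test = train_test_split(
    df['message'], df['label'], test_size=0.2, random_state=42
)

# Convert the raw text into TF-IDF feature vectors.
# The vectorizer is fitted on the training messages only, then reused on the test messages.
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)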
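
To make the vectorization step concrete, the sketch below shows what TF-IDF does to a handful of strings. It assumes scikit-learn is installed; the three messages and the names `toy_vectorizer`, `train_matrix`, and `new_matrix` are hypothetical and not taken from the notebook. Default `TfidfVectorizer` settings are used rather than the `stop_words` and `max_features` values above, to keep the vocabulary easy to follow.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, not taken from spam.csv.
train_messages = ["free prize win", "see you tonight", "win free tickets"]

toy_vectorizer = TfidfVectorizer()
train_matrix = toy_vectorizer.fit_transform(train_messages)  # learns the vocabulary

vocabulary = sorted(toy_vectorizer.vocabulary_)
print(vocabulary)          # the 7 distinct words seen during fitting
print(train_matrix.shape)  # (3, 7): one row per message, one column per word

# By default each row is scaled to unit L2 length.
row_norms = np.linalg.norm(train_matrix.toarray(), axis=1)
print(row_norms)

# transform() reuses the learned vocabulary; unseen words such as "lunch" are ignored.
new_matrix = toy_vectorizer.transform(["free lunch"])
print(new_matrix.nnz)      # 1: only "free" is in the vocabulary
```

This is why the test messages are passed through `transform` rather than `fit_transform`: the columns must mean the same words for training and test data.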



In [6]:
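# Train an SVM with an RBF kernel on the TF-IDF features.
# Note: random_state only matters for SVC when probability=True.
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_model.fit(X_train_tfidf, y_train)

# Predict labels (0 = ham, 1 = spam) for the held-out messages
y_pred = svm_model.predict(X_test_tfidf)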
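
A new raw message has to pass through the same fitted vectorizer before it reaches the classifier. One way to keep the two steps together is a scikit-learn pipeline. The sketch below assumes scikit-learn is installed; the six training messages, their labels, and the name `spam_pipeline` are hypothetical, and with so little data the predictions themselves are not meaningful.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy messages and labels (0 = ham, 1 = spam), for illustration only.
messages = [
    "win a free prize now",
    "free entry to win cash",
    "claim your free tickets today",
    "see you at lunch",
    "are you coming home tonight",
    "ok talk to you later",
]
labels = [1, 1, 1, 0, 0, 0]

# The pipeline applies TF-IDF and then the SVM, for both fit and predict.
spam_pipeline = make_pipeline(
    TfidfVectorizer(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
spam_pipeline.fit(messages, labels)

# Raw strings go in; the vectorization happens inside the pipeline.
new_messages = ["win a free prize", "see you at lunch"]
predictions = spam_pipeline.predict(new_messages)
print(predictions)
```

The design choice here is convenience: the pipeline cannot be called with a vectorizer that was fitted on different data, which removes one common source of mistakes.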

Output:

In [7]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(classification_report(y_test, y_pred))

Accuracy: 0.9767
              precision    recall  f1-score   support

           0       0.97      1.00      0.99       965
           1       0.99      0.83      0.91       150

    accuracy                           0.98      1115
   macro avg       0.98      0.92      0.95      1115
weighted avg       0.98      0.98      0.98      1115
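
The figures in the report can be cross-checked by hand. The counts below are reconstructed from the rounded values above (they are an inference, not something the notebook printed): 964 ham and 125 spam messages classified correctly, 1 ham message flagged as spam, and 25 spam messages missed. The sketch recomputes the main metrics from those counts in plain Python.

```python
# Counts reconstructed from the rounded report (an inference, not notebook output).
tn, fp = 964, 1     # true ham:  965 messages
fn, tp = 25, 125    # true spam: 150 messages

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

spam_precision = tp / (tp + fp)   # of messages flagged as spam, how many were spam
spam_recall = tp / (tp + fn)      # of actual spam messages, how many were caught
spam_f1 = 2 * spam_precision * spam_recall / (spam_precision + spam_recall)

ham_precision = tn / (tn + fn)
ham_recall = tn / (tn + fp)

print(f"accuracy       {accuracy:.4f}")        # 0.9767
print(f"spam precision {spam_precision:.2f}")  # 0.99
print(f"spam recall    {spam_recall:.2f}")     # 0.83
print(f"spam f1        {spam_f1:.2f}")         # 0.91
print(f"ham precision  {ham_precision:.2f}")   # 0.97
print(f"ham recall     {ham_recall:.2f}")      # 1.00
```

Read this way, almost all of the errors are spam messages that were let through as ham, which is what the spam recall of 0.83 reflects.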



Result:

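An SVM classifier with an RBF kernel was trained on TF-IDF features extracted from the text messages. It reached an accuracy of 0.9767 on the 1,115-message test set. Recall for the spam class was 0.83, so some spam messages are still classified as ham.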