## Applying a Support Vector Machine (SVM) to the "Social_Network_Ads.csv" dataset for classification

In [11]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix


In [2]:
# Load the dataset
data = pd.read_csv("Social_Network_Ads.csv")

In [3]:
X = data.iloc[:, [2, 3]].values  # features: Age and EstimatedSalary
y = data.iloc[:, 4].values       # target: purchase label (0 = not purchased, 1 = purchased)

In [4]:
# Hold out 25% of the rows as a test set; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

In [5]:
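# Fit the scaler on the training set only, then apply the same mean/std to the test set
# so that no test-set statistics leak into training
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)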
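To see why this scaling step matters for an RBF SVM, the sketch below uses a few made-up rows on the same scale as Age and EstimatedSalary (hypothetical values, not read from the CSV). It compares each feature's contribution to the squared Euclidean distance, the quantity inside the RBF kernel, before and after `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows laid out like X: [Age, EstimatedSalary]
X_toy = np.array([
    [19, 19000],
    [35, 20000],
    [26, 43000],
    [27, 57000],
    [47, 25000],
], dtype=float)

# Per-feature squared differences between the first two rows (unscaled).
# Their sum is the ||x - x'||^2 term used by the RBF kernel.
raw_terms = (X_toy[0] - X_toy[1]) ** 2

# Standardize each column to mean 0 and (population) standard deviation 1
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)
scaled_terms = (X_scaled[0] - X_scaled[1]) ** 2

print("raw    [age, salary]:", raw_terms)
print("scaled [age, salary]:", np.round(scaled_terms, 3))
print("column means after scaling:", np.round(X_scaled.mean(axis=0), 6))
print("column stds after scaling: ", np.round(X_scaled.std(axis=0), 6))
```

Before scaling, a 1,000 difference in salary outweighs a 16-year age gap by a factor of almost 4,000. After scaling, each difference is measured relative to the spread of its own column, so the large age gap now dominates.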

In [6]:
# SVM Model Selection
# We'll use a radial basis function (RBF) kernel because it can learn non-linear decision boundaries;
# C and gamma are left at scikit-learn's defaults
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
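For a binary problem, a fitted `SVC` scores a point `x` as `f(x) = sum_i dual_coef_[i] * K(support_vectors_[i], x) + intercept_` and predicts the second class in `classes_` when `f(x) > 0`. For the RBF kernel, `K(x, x') = exp(-gamma * ||x - x'||^2)`. The sketch below checks this on synthetic two-moons data (a stand-in, not the ads dataset) by rebuilding `decision_function` from the fitted model's `support_vectors_`, `dual_coef_` and `intercept_`. It passes `gamma` explicitly, set to the value that `gamma='scale'` would use, `1 / (n_features * X.var())`, so the kernel can be recomputed without relying on private attributes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (stand-in for the ads features)
X_demo, y_demo = make_moons(n_samples=200, noise=0.2, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

# Same value SVC computes for gamma='scale': 1 / (n_features * X.var())
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

clf = SVC(kernel="rbf", gamma=gamma, C=1.0)
clf.fit(X_demo, y_demo)

# K[j, i] = exp(-gamma * ||x_j - sv_i||^2) for every point x_j and support vector sv_i
K = rbf_kernel(X_demo, clf.support_vectors_, gamma=gamma)

# For a binary problem, dual_coef_ has shape (1, n_support_vectors) and holds alpha_i * y_i
manual_scores = K @ clf.dual_coef_.ravel() + clf.intercept_[0]
sklearn_scores = clf.decision_function(X_demo)

# A positive score maps to clf.classes_[1], otherwise to clf.classes_[0]
manual_pred = np.where(manual_scores > 0, clf.classes_[1], clf.classes_[0])

print("max |manual - sklearn| score difference:", np.abs(manual_scores - sklearn_scores).max())
print("predictions agree:", np.array_equal(manual_pred, clf.predict(X_demo)))
```

Only the support vectors enter the sum, which is why an SVM's prediction cost grows with the number of support vectors rather than with the full training set size.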

In [7]:
# Model Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.93


In [8]:
# Calculating the confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)

In [9]:
# Displaying the confusion matrix
print("\nConfusion Matrix:")
print(conf_matrix)


Confusion Matrix:
[[64  4]
 [ 3 29]]
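scikit-learn's `confusion_matrix` puts true classes on the rows and predicted classes on the columns, so for labels 0 and 1 the flattened matrix reads TN, FP, FN, TP. The sketch below builds illustrative label vectors that reproduce the counts printed above. They are constructed for the example and are not the actual test rows. It then recovers the reported accuracy of 0.93 from the four counts. New names (`y_true`, `y_hat`) are used so the notebook's `y_test` and `y_pred` are not overwritten.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative label vectors chosen to reproduce the printed matrix [[64, 4], [3, 29]]:
# 68 true "not purchased" (64 predicted correctly, 4 predicted as purchased) and
# 32 true "purchased" (3 predicted as not purchased, 29 predicted correctly)
y_true = np.array([0] * 68 + [1] * 32)
y_hat = np.array([0] * 64 + [1] * 4 + [0] * 3 + [1] * 29)

cm = confusion_matrix(y_true, y_hat)

# For binary labels [0, 1], ravel() yields the cells in row-major order: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
accuracy_from_counts = (tn + tp) / cm.sum()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print("accuracy from counts:", accuracy_from_counts)
print("accuracy_score:      ", accuracy_score(y_true, y_hat))
```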


In [10]:
# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.94      0.95        68
           1       0.88      0.91      0.89        32

    accuracy                           0.93       100
   macro avg       0.92      0.92      0.92       100
weighted avg       0.93      0.93      0.93       100
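The per-class numbers in this report follow directly from the confusion-matrix counts. Precision is `TP / (TP + FP)`, recall is `TP / (TP + FN)`, and F1 is their harmonic mean, each computed with the class in question treated as positive. The sketch below recomputes them by hand from the printed matrix and cross-checks them against `precision_recall_fscore_support`. The label vectors are again illustrative reconstructions of the same counts.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Counts from the printed confusion matrix (rows = true class, columns = predicted class)
cm = np.array([[64, 4],
               [3, 29]])

def per_class_scores(cm, k):
    """Precision, recall and F1 for class k, treating k as the positive class."""
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp   # predicted k but actually another class
    fn = cm[k, :].sum() - tp   # actually k but predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

manual = {k: per_class_scores(cm, k) for k in (0, 1)}
for k, (p, r, f) in manual.items():
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={cm[k].sum()}")

# Cross-check with scikit-learn using label vectors that reproduce the same matrix
y_true = np.repeat([0, 0, 1, 1], [64, 4, 3, 29])
y_hat = np.repeat([0, 1, 0, 1], [64, 4, 3, 29])
p_sk, r_sk, f_sk, support_sk = precision_recall_fscore_support(y_true, y_hat)
```

Rounded to two decimals, the hand-computed values match the report (0.96 / 0.94 / 0.95 for class 0 and 0.88 / 0.91 / 0.89 for class 1). The "macro avg" row is the unweighted mean of the two classes, and the "weighted avg" row weights each class by its support.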



1. Preprocessing Steps:
   - Train-Test Split: We split the dataset into training and test sets with train_test_split (test_size=0.25, random_state=0), which held out 100 samples for testing. Training on one set and evaluating on the other shows how well the model generalizes to unseen data.
   - Feature Scaling: We used StandardScaler to standardize the features (Age and EstimatedSalary) to a mean of 0 and a standard deviation of 1. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training. Scaling matters for an RBF SVM because the kernel is based on Euclidean distances: without it, EstimatedSalary, which has a much larger numeric range than Age, would dominate the distance.

2. Kernel Choice:
   - Radial Basis Function (RBF) Kernel: We selected the RBF kernel for the SVM model. It suits data that are not linearly separable: through the kernel trick, it implicitly maps the data into a higher-dimensional space where a separating hyperplane may exist, without ever computing that mapping explicitly. The RBF kernel is a popular default and often performs well in practice across a wide range of datasets.

3. Evaluation Metrics Used:
   - Accuracy: The overall proportion of correct predictions, i.e. the number of correctly predicted instances divided by the total number of instances. Here the model classified 93 of the 100 test samples correctly (0.93).
   - Classification Report: A per-class summary of precision, recall, F1-score, and support for both classes (not purchased = 0, purchased = 1). It shows that the model is somewhat weaker on the purchased class (precision 0.88, recall 0.91) than on the not-purchased class (precision 0.96, recall 0.94).
   - Confusion Matrix: A breakdown of predictions into true negatives, false positives, false negatives, and true positives. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the matrix above contains 64 true negatives, 4 false positives, 3 false negatives, and 29 true positives (treating purchased = 1 as the positive class). This makes the errors for each class explicit.
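To illustrate the kernel-choice argument in item 2, the sketch below compares a linear-kernel SVC with an RBF-kernel SVC on scikit-learn's synthetic `make_circles` data, where one class forms a ring around the other, so no straight line can separate them. This is a toy dataset chosen for illustration, not the ads data. The gap between the two kernels on Age and EstimatedSalary has not been measured here and may be much smaller.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D feature space
X_c, y_c = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_c_train, X_c_test, y_c_train, y_c_test = train_test_split(
    X_c, y_c, test_size=0.25, random_state=0
)

# Same preprocessing pattern as the notebook: fit the scaler on the training split only
scaler = StandardScaler().fit(X_c_train)
X_c_train = scaler.transform(X_c_train)
X_c_test = scaler.transform(X_c_test)

scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel, random_state=0).fit(X_c_train, y_c_train)
    scores[kernel] = model.score(X_c_test, y_c_test)  # mean accuracy on the test split
    print(f"{kernel:>6} kernel test accuracy: {scores[kernel]:.2f}")
```

A linear boundary can at best cut off part of the outer ring, while the RBF kernel can wrap a closed boundary around the inner class.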