# Day 53 – Support Vector Machine (SVM) Classification

## Introduction

In this notebook, I focus on the **Support Vector Machine (SVM) Classifier**, a supervised learning algorithm that finds the optimal decision boundary (hyperplane) to separate different classes.

I begin with a short theory on how SVM works, the concept of **margins and support vectors**, and the role of the **kernel trick** in handling non-linear data. Then I implement an SVM model in Python, evaluate its performance using metrics such as **accuracy, confusion matrix, and classification report**, and also make **future predictions** using the trained model.

By the end of this notebook, it becomes clear how SVM builds robust classifiers, why kernels are important for complex datasets, and how SVM can be applied effectively for classification tasks.

---

## 1. Introduction to Support Vector Machines (SVM)

Support Vector Machines are supervised machine learning algorithms used for both classification and regression tasks. In classification, an SVM's goal is to find an optimal way to separate data points belonging to different classes. SVMs are particularly effective in high-dimensional spaces, including cases where the number of features is greater than the number of samples.

## 2. SVM Intuition: The Hyperplane

At its core, an SVM classifier works by finding a **hyperplane** that best divides a dataset into its classes.

-   **Hyperplane**: In a 2-dimensional space, a hyperplane is just a line. In a 3-dimensional space, it's a flat plane. For data with more than three dimensions, it's an abstract concept that separates the data points. The goal of the SVM is to find the "best" hyperplane.

-   **Support Vectors**: These are the data points from each class that are closest to the hyperplane. They are called "support vectors" because they are the points that "support" or define the position and orientation of the hyperplane. They are the most critical data points for the model: removing a support vector can change the decision boundary, whereas removing any other point leaves it unchanged.

-   **Margin**: The margin is the distance between the hyperplane and the nearest data points from each class (the support vectors). The SVM's primary objective is to find the hyperplane that maximizes this margin.

-   **Maximum Margin**: The best possible hyperplane is the one that has the largest possible distance to the support vectors. This is known as the **maximum margin hyperplane**. Maximizing the margin provides a clearer separation between classes and helps the model generalize better to new, unseen data, which reduces the risk of overfitting.
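
To make these terms concrete, here is a minimal sketch on a made-up, linearly separable toy dataset. The points, the variable names, and the large `C` used to approximate a hard margin are my own assumptions and are not part of the notebook's dataset. It fits a linear `SVC`, lists the support vectors, and computes the margin width as `2 / ||w||`, where `w` is the learned weight vector.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable points (invented for illustration).
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)
print("Margin width:", margin_width)
```

Only the points returned in `support_vectors_` determine `w` and `b`; the remaining points could be moved farther from the boundary without changing the fitted hyperplane.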



---

## 3. Linear vs. Non-Linear SVM

SVMs can handle both linear and non-linear classification problems.

-   **Linear SVM**: If the data can be separated by a single straight line (or a flat plane in higher dimensions), the SVM will find the optimal linear hyperplane.

-   **Non-Linear SVM**: When the data points are not linearly separable, a straight line is insufficient. In such cases, the SVM uses a technique called the **kernel trick** to transform the data into a higher-dimensional space where a linear separation is possible.
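
The difference can be sketched on synthetic data where one class forms a ring around the other. The dataset below comes from scikit-learn's `make_circles` and is not the notebook's data; the parameter values and variable names are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner cluster surrounded by an outer ring,
# so no straight line can separate the two classes.
X_c, y_c = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)
X_c_train, X_c_test, y_c_train, y_c_test = train_test_split(
    X_c, y_c, test_size=0.25, random_state=0
)

# Same data, two kernels.
linear_acc = SVC(kernel="linear").fit(X_c_train, y_c_train).score(X_c_test, y_c_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_c_train, y_c_train).score(X_c_test, y_c_test)

print("Linear kernel test accuracy:", linear_acc)
print("RBF kernel test accuracy:", rbf_acc)
```

On data like this, the linear kernel should score close to chance, while the RBF kernel should separate the classes almost perfectly.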

## 4. Important SVM Concepts: The Kernel Trick

The **kernel trick** is one of the most powerful concepts in SVM. It allows SVMs to work on complex, non-linear data without actually computing the coordinates in a higher-dimensional space.

-   **How it Works**: A kernel function computes the similarity between two data points, and that value equals their inner product in a higher-dimensional feature space. Because the SVM only needs these inner products, it never has to transform the data explicitly (which would be computationally expensive). The resulting decision boundary is non-linear in the original feature space but corresponds to a linear hyperplane in the higher-dimensional space.
-   **Common Kernels**: Some popular kernel functions include:
    * **Linear**: Used when the data is (approximately) linearly separable.
    * **Polynomial**: Captures interactions between features up to a chosen degree, which suits some non-linear problems.
    * **Radial Basis Function (RBF)**: A very common and versatile kernel that is often a good default for non-linear datasets.
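
The kernel trick can be checked numerically for a simple case. The sketch below uses a homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is small enough to write out. The function names `poly2_kernel` and `phi` are hypothetical helpers, not scikit-learn APIs.

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit 3-D feature map for 2-D input matching poly2_kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

kernel_value = poly2_kernel(x, z)        # computed in the original 2-D space
explicit_value = np.dot(phi(x), phi(z))  # computed in the 3-D feature space

print(kernel_value, explicit_value)
```

Both values agree, which is the point: the kernel gives the inner product in the larger space without ever building the vectors `phi(x)` and `phi(z)`.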

By understanding these core components (hyperplanes, support vectors, margins, and kernels), you can grasp the intuition behind how SVMs find the best possible boundary for classification.

---

## Import Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')

## Load the dataset

In [2]:
dataset = pd.read_csv(r"C:\Users\Arman\Downloads\dataset\logit classification.csv")

In [3]:
dataset

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0
...,...,...,...,...,...
395,15691863,Female,46,41000,1
396,15706071,Male,51,23000,1
397,15654296,Female,50,20000,1
398,15755018,Male,36,33000,0


## Feature Selection
### Split into features (X) and target (y)
- X: Features (Age, EstimatedSalary)
- y: Target (Purchased)

In [4]:
X = dataset[["Age", "EstimatedSalary"]].values
y = dataset["Purchased"].values

## Splitting the dataset into the Training set and Test set

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

## Feature Scaling
### Apply StandardScaler

In [6]:
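# Fit the scaler on the training set only, then apply the same transformation
# to the test set so no test-set statistics leak into training.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)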
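
As a quick check of what `StandardScaler` computes, the sketch below standardizes the Age and EstimatedSalary values from the first five rows displayed earlier and compares the result with a z-score computed by hand. The variable names are my own, and five rows are used only to keep the example small.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and EstimatedSalary from the first five rows shown in the dataset preview.
sample = np.array([[19, 19000], [35, 20000], [26, 43000],
                   [27, 57000], [19, 76000]], dtype=float)

scaler = StandardScaler()
scaled = scaler.fit_transform(sample)

# Same computation by hand: z = (x - mean) / std, column by column.
manual = (sample - sample.mean(axis=0)) / sample.std(axis=0)

print("Column means:", scaler.mean_)
print("Scaled sample:\n", scaled)
print("Matches manual z-score:", np.allclose(scaled, manual))
```

After scaling, both columns have mean 0 and standard deviation 1, so the large salary values no longer dominate the distance computations inside the RBF kernel.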

## Train SVM Classifier

In [7]:
# RBF kernel; probability=True additionally enables svc.predict_proba (slower to fit).
svc = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
svc.fit(X_train, y_train)

## Model Evaluation

In [8]:
y_pred = svc.predict(X_test)

print("SVM Classifier Results")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

SVM Classifier Results
Accuracy: 0.93
Confusion Matrix:
 [[64  4]
 [ 3 29]]
Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.94      0.95        68
           1       0.88      0.91      0.89        32

    accuracy                           0.93       100
   macro avg       0.92      0.92      0.92       100
weighted avg       0.93      0.93      0.93       100
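
The figures in the classification report follow directly from the confusion matrix. As a worked check, the sketch below recomputes accuracy and the class-1 precision, recall, and F1 score from the four counts reported above. The variable names are my own.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class.
cm = np.array([[64, 4],
               [3, 29]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions / all predictions
precision_1 = tp / (tp + fp)      # of predicted 1s, how many are truly 1
recall_1 = tp / (tp + fn)         # of true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.2f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
```

Rounded to two decimals, these values match the accuracy line and the class-1 row of the report (0.93, 0.88, 0.91, 0.89).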



## Training and Testing Accuracy

In [9]:
# Accuracy on the training set
train_accuracy = svc.score(X_train, y_train)
print(train_accuracy)

# Accuracy on the held-out test set
test_accuracy = svc.score(X_test, y_test)
print(test_accuracy)

0.9133333333333333
0.93


## Now Predict Future Data
## Upload Future Data

In [10]:
dataset1 = pd.read_csv(r"C:\Users\Arman\Downloads\dataset\final1.csv")

In [11]:
d2 = dataset1.copy()

In [12]:
d2

Unnamed: 0.1,Unnamed: 0,User ID,Gender,Age,EstimatedSalary
0,0,15724611,Male,45,60000
1,1,15725621,Female,79,64000
2,2,15725622,Male,23,78000
3,3,15720611,Female,34,45000
4,4,15588044,Male,29,76000
5,5,15746039,Female,70,89000
6,6,15704887,Male,86,120000
7,7,15746009,Female,46,23000
8,8,15876009,Male,32,70000
9,9,15886009,Female,100,90000


### Dataset Description – Unseen Data
This dataset contains new records with the same feature columns as the training dataset.  
- Features: [Age, EstimatedSalary]  
- Target: Not available (since this is future data).  
Goal: Predict class labels for these unseen samples using the trained SVM model.  

## Select Required Columns

In [13]:
# Select the same two features used for training, by name rather than by position.
dataset1 = dataset1[["Age", "EstimatedSalary"]].values

## Feature Scaling

In [14]:
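# Reuse the scaler fitted on the training set in In [6]. Fitting a new StandardScaler
# on these rows would scale them differently from the data the model was trained on.
M = sc.transform(dataset1)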
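
One way to guarantee that new data is always scaled with the training statistics is to bundle the scaler and the classifier in a scikit-learn pipeline. The sketch below is only an illustration: the rows in `X_demo` and `X_new` are made up, and the variable names are my own.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up (Age, EstimatedSalary) rows and labels, for illustration only.
X_demo = np.array([[22, 25000], [25, 32000], [30, 40000], [35, 30000],
                   [45, 90000], [50, 110000], [55, 95000], [60, 120000]],
                  dtype=float)
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# fit() learns the scaling statistics from the training rows only.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_demo, y_demo)

# predict() reuses those training means and standard deviations on new rows.
X_new = np.array([[28, 35000], [52, 100000]], dtype=float)
new_pred = model.predict(X_new)
print(new_pred)
```

With a pipeline there is no separate scaler object to re-fit by accident, so training and prediction cannot drift onto different scales.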

## Prediction

In [15]:
y_pred1 = pd.DataFrame()

In [16]:
d2['y_pred1'] = svc.predict(M)

In [17]:
d2

Unnamed: 0.1,Unnamed: 0,User ID,Gender,Age,EstimatedSalary,y_pred1
0,0,15724611,Male,45,60000,0
1,1,15725621,Female,79,64000,1
2,2,15725622,Male,23,78000,0
3,3,15720611,Female,34,45000,0
4,4,15588044,Male,29,76000,0
5,5,15746039,Female,70,89000,1
6,6,15704887,Male,86,120000,1
7,7,15746009,Female,46,23000,0
8,8,15876009,Male,32,70000,0
9,9,15886009,Female,100,90000,1


## Interpretation of Predictions
- Each predicted value (0 or 1) indicates the class label assigned by the trained SVM model.
- Because the new data has no true labels, these predictions cannot be scored; they show how the trained model is applied to records outside the original dataset.

## Save Result to CSV

In [18]:
# index=False avoids writing the row index as an extra "Unnamed: 0" column.
d2.to_csv('final2.csv', index=False)

---
## Summary

In this notebook, I explored the **Support Vector Machine (SVM) Classifier**, starting with the theoretical concepts of **hyperplanes, margins, and support vectors**, followed by the **kernel trick** to handle non-linear data. I then implemented SVM using Python’s `scikit-learn`, applied feature scaling, trained the model, and evaluated its performance through **accuracy, confusion matrix, and classification report**.

Additionally, I used the trained model to make **future predictions**, demonstrating how SVM can be applied to unseen data. The experiments highlighted how SVM constructs an optimal decision boundary and adapts to complex datasets with the help of kernels.

Overall, the notebook provides both **conceptual clarity** and **practical understanding** of how SVM works and why it is considered one of the most reliable algorithms for classification tasks.



## Key Takeaways

* **SVM is a margin-based classifier** that seeks the optimal hyperplane to separate classes.
* **Support vectors** are the critical data points that influence the decision boundary.
* The **kernel trick** enables SVM to handle non-linear and high-dimensional datasets.
* **Scaling features** is essential for SVM to perform effectively.
* Model performance can be evaluated using metrics like **accuracy, confusion matrix, and classification report**, and the evaluation can be complemented with **ROC–AUC analysis**.
* SVM works well for both **binary and multi-class classification problems**, but it can be computationally heavy on very large datasets.
* Using the trained model for **future predictions** shows how SVM is applied to unseen data; without true labels, those predictions cannot be scored.
