## Support Vector Machines (SVM): A Detailed Overview

Support Vector Machines (SVM) are powerful supervised learning algorithms used for classification and regression tasks. They are particularly effective in high-dimensional spaces and for problems with clear margins of separation. This repo

## Introduction

Support Vector Machines aim to find the optimal hyperplane that separates data into different classes. For non-linearly separable data, SVM employs kernel functions to map the data into a higher-dimensional space where a linear separator can be found.


## SVM for Binary Classification

### Hyperplane and Decision Boundary

The hyperplane is the decision boundary that maximizes the margin between two classes. In a binary classification problem, the goal of SVM is to find the hyperplane defined as:

$$
\mathbf{w}^T \mathbf{x} + b = 0
$$

Where:
- $\mathbf{w}$: Weight vector.
- $\mathbf{x}$: Feature vector.
- $b$: Bias term.

The classes are separated as:
- $\mathbf{w}^T \mathbf{x} + b > 0$ for class $+1$
- $\mathbf{w}^T \mathbf{x} + b < 0$ for class $-1$
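
The decision rule above can be sketched in a few lines of NumPy. The hyperplane values (`w`, `b`) and the two demo points are made up for illustration, and assigning a score of exactly zero to class $+1$ is an assumption, since the rule above leaves that case open.

```python
import numpy as np

def predict(w, b, X):
    """Return +1 or -1 for each row of X from the sign of w^T x + b."""
    scores = X @ w + b
    # Assumption: points lying exactly on the hyperplane are assigned to class +1
    return np.where(scores >= 0, 1, -1)

# Hypothetical 2-D hyperplane: x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0

X_demo = np.array([[2.0, 2.0],   # score = 3  -> class +1
                   [0.0, 0.0]])  # score = -1 -> class -1
labels = predict(w, b, X_demo)
print(labels)
```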

### Mathematics of SVM

The margin is the distance between the hyperplane and the nearest data points from each class (support vectors). The optimization problem can be formulated as:

#### Objective Function:

$$
\min_{\mathbf{w}, b} \frac{1}{2} \|\mathbf{w}\|^2
$$

#### Subject to:

$$
y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 \quad \forall i
$$

Where:
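- $\|\mathbf{w}\|$ is the Euclidean norm of the weight vector. Under the constraint above the margin width is $2 / \|\mathbf{w}\|$, so minimizing the norm maximizes the margin.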
- $y_i \in \{-1, +1\}$ are the class labels.
- $\mathbf{x}_i$ are the feature vectors.
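
As a small illustration of the hard-margin formulation, the sketch below checks the constraint $y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1$ on a toy dataset and computes the resulting margin width $2 / \|\mathbf{w}\|$. The data and the hyperplane are chosen by hand for this example; no optimizer is involved.

```python
import numpy as np

# Toy linearly separable data (made up for illustration)
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y_toy = np.array([1, 1, -1, -1])

# A candidate hyperplane chosen by hand
w = np.array([0.5, 0.5])
b = -1.0

# y_i * (w^T x_i + b) must be at least 1 for every sample
functional_margins = y_toy * (X_toy @ w + b)
feasible = bool(np.all(functional_margins >= 1))

# Distance between the two margin boundaries w^T x + b = +1 and w^T x + b = -1
margin_width = 2.0 / np.linalg.norm(w)

print("Constraints satisfied:", feasible)
print("Margin width:", margin_width)
```

The points whose functional margin equals exactly 1, here `[2, 2]` and `[0, 0]`, lie on the margin boundaries and play the role of support vectors.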


## Kernel Trick

For non-linearly separable data, the kernel trick maps the input features into a higher-dimensional space, allowing a linear hyperplane to separate the data. The kernel function computes the dot product in the transformed feature space without explicitly computing the transformation.

Common kernel functions include:

1. **Linear Kernel:**
   $$
   K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j
   $$

2. **Polynomial Kernel:**
   $$
   K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^T \mathbf{x}_j + c)^d
   $$

3. **Gaussian (RBF) Kernel:**
   $$
   K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)
   $$

4. **Sigmoid Kernel:**
   $$
   K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\alpha \mathbf{x}_i^T \mathbf{x}_j + c)
   $$
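
The four kernels listed above can be written directly in NumPy. This is a minimal sketch; the default values for `c`, `d`, `sigma`, and `alpha` are arbitrary choices for the example. The helper `phi` is a hypothetical explicit feature map for the degree-2 polynomial kernel with $c = 0$ in two dimensions, included only to show that the kernel returns the same value as a dot product in the transformed space.

```python
import numpy as np

def linear_kernel(xi, xj):
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, c=1.0, d=2):
    return float((np.dot(xi, xj) + c) ** d)

def rbf_kernel(xi, xj, sigma=1.0):
    diff = xi - xj
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def sigmoid_kernel(xi, xj, alpha=1.0, c=0.0):
    return float(np.tanh(alpha * np.dot(xi, xj) + c))

def phi(x):
    """Explicit feature map for (x^T z)^2 in 2-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])

k_lin = linear_kernel(xi, xj)
k_poly = polynomial_kernel(xi, xj, c=0.0, d=2)
k_explicit = float(np.dot(phi(xi), phi(xj)))  # same value, computed in the mapped space

print(k_lin, k_poly, k_explicit)
print(rbf_kernel(xi, xj), sigmoid_kernel(xi, xj))
```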


## Soft Margin and Regularization

Real-world data is often noisy and not linearly separable. To handle such cases, SVM introduces a soft margin that lets some points fall inside the margin or on the wrong side of the hyperplane, at a cost. The optimization problem becomes:

#### Objective Function:

$$
\min_{\mathbf{w}, b, \xi} \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i
$$

#### Subject to:

$$
\begin{aligned}
& y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 - \xi_i \\
& \xi_i \geq 0 \quad \forall i
\end{aligned}
$$

Where:
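- $\xi_i$: Slack variable measuring how far sample $i$ violates the margin. $\xi_i = 0$ means the margin constraint is met, $0 < \xi_i \leq 1$ means the sample lies inside the margin but is still correctly classified, and $\xi_i > 1$ means it is misclassified.
- $C$: Regularization parameter that controls the trade-off between margin width and margin violations. A larger $C$ penalizes violations more heavily and gives a narrower margin; a smaller $C$ tolerates more violations in exchange for a wider margin.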
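
To make the slack variables concrete, the sketch below evaluates the soft-margin objective for a fixed hyperplane. For given $\mathbf{w}$ and $b$, the smallest slack that satisfies both constraints is $\xi_i = \max(0, 1 - y_i(\mathbf{w}^T \mathbf{x}_i + b))$. The toy data and the hyperplane are made up for illustration, and `soft_margin_objective` is a name introduced here.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C=1.0):
    """Return the soft-margin objective and the per-sample slack values."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # smallest feasible xi_i for this (w, b)
    objective = 0.5 * np.dot(w, w) + C * np.sum(slack)
    return objective, slack

# Toy data (made up): the last two points violate the margin
X_toy = np.array([[2.0, 2.0], [0.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y_toy = np.array([1, -1, -1, 1])
w = np.array([0.5, 0.5])
b = -1.0

objective, slack = soft_margin_objective(w, b, X_toy, y_toy, C=1.0)
print("Slack:", slack)
print("Objective:", objective)
```

Here the third point has slack greater than 1 and is misclassified, while the fourth has slack between 0 and 1: it sits inside the margin but on the correct side.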


## Mathematics of SVM Optimization

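The optimization problem is usually solved through its Lagrangian dual, which introduces one Lagrange multiplier $\alpha_i$ per training sample. The dual form of the soft-margin problem, with the inner product replaced by a kernel, is: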

#### Dual Objective Function:

$$
\max_{\alpha} \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
$$

#### Subject to:

$$
\begin{aligned}
& \sum_{i=1}^n \alpha_i y_i = 0 \\
& 0 \leq \alpha_i \leq C \quad \forall i
\end{aligned}
$$

Where:
- $\alpha_i$: Lagrange multipliers.
- $K(\mathbf{x}_i, \mathbf{x}_j)$: Kernel function.

The decision function is:

$$
\hat{y} = \text{sign}\left(\sum_{i=1}^n \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)
$$
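
The decision function can be evaluated directly once the multipliers are known. In the sketch below the multipliers are derived by hand for a two-point toy problem with a linear kernel (both equal $0.25$, with $b = 0$); they are not produced by a solver, and the function names are introduced here for illustration.

```python
import numpy as np

def linear_kernel(u, v):
    return float(np.dot(u, v))

def decision_function(alphas, y_train, X_train, b, x, kernel):
    """Evaluate sum_i alpha_i * y_i * K(x_i, x) + b for a single point x."""
    total = sum(a * yi * kernel(xi, x)
                for a, yi, xi in zip(alphas, y_train, X_train))
    return total + b

# Two-point toy problem: one sample per class, symmetric about the origin
X_train = np.array([[1.0, 1.0], [-1.0, -1.0]])
y_train = np.array([1, -1])
alphas = np.array([0.25, 0.25])  # hand-derived; satisfies sum_i alpha_i * y_i = 0
b = 0.0

score_a = decision_function(alphas, y_train, X_train, b, np.array([2.0, 0.0]), linear_kernel)
score_b = decision_function(alphas, y_train, X_train, b, np.array([-3.0, 1.0]), linear_kernel)
print(np.sign(score_a), np.sign(score_b))

# With a linear kernel the primal weights can be recovered as w = sum_i alpha_i * y_i * x_i
w = (alphas * y_train) @ X_train
print(w)
```

Only samples with $\alpha_i > 0$ contribute to the sum, which is why the support vectors alone determine the classifier.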



In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load Titanic dataset
file_path = 'titanic.csv'  # Adjust this if needed
titanic_data = pd.read_csv(file_path)

### Step 1: Preprocessing
1. Drop irrelevant features
2. Handle missing values
3. Encode categorical variables
4. Separate features and target variable

In [20]:
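# 1. Drop columns that are not used as features
titanic_data_cleaned = titanic_data.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])

# 2. Fill missing values; plain assignment avoids chained in-place fillna,
#    which newer pandas versions warn about
titanic_data_cleaned['Age'] = titanic_data_cleaned['Age'].fillna(titanic_data_cleaned['Age'].mean())  # Fill Age with mean
titanic_data_cleaned['Embarked'] = titanic_data_cleaned['Embarked'].fillna('missing')  # Fill Embarked with placeholder

# 3. One-hot encode categorical columns as floats so they can be standardized
titanic_data_encoded = pd.get_dummies(titanic_data_cleaned, drop_first=True, dtype=float)

# 4. Separate features and target variable
X = titanic_data_encoded.drop(columns=['Survived'])
y = titanic_data_encoded['Survived']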

### Step 2: Standardize the data

In [24]:
X_mean = X.mean()
X_std = X.std()
X_standardized = (X - X_mean) / X_std

### Step 3: Compute the covariance matrix

In [27]:
cov_matrix = np.cov(X_standardized.T)

### Step 4: Compute eigenvalues and eigenvectors

In [30]:
# The covariance matrix is symmetric, so eigh is used: it returns real eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

### Step 5: Sort eigenvalues and eigenvectors in descending order

In [33]:
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]

### Step 6: Project data onto the top 5 principal components

In [36]:
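k = 5  # number of principal components to keep
top_eigenvectors = eigenvectors[:, :k]
X_pca = np.dot(X_standardized, top_eigenvectors)  # project the standardized data
# Share of total variance carried by each component
explained_variance_ratio = eigenvalues / np.sum(eigenvalues)
print("Explained Variance Ratio:", explained_variance_ratio[:k])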

Explained Variance Ratio: [0.2052705  0.19121186 0.17205714 0.10949081 0.09226028]
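
As a quick check on how much information the five components retain, the ratios printed above can be accumulated. The values below are copied from that output, so this sketch does not need the dataset.

```python
import numpy as np

# Ratios copied from the printed output above
explained_variance_ratio = np.array(
    [0.2052705, 0.19121186, 0.17205714, 0.10949081, 0.09226028]
)

# Running total of variance explained by the first 1, 2, ..., 5 components
cumulative = np.cumsum(explained_variance_ratio)
print("Cumulative explained variance:", cumulative.round(4))
```

The last entry is about 0.77, meaning the five retained components account for roughly 77% of the variance in the standardized features.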


In [38]:
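X = X_pca  # use the PCA-reduced features as the SVM input

The features were reduced with PCA above; the SVM below is trained on the projected data.
## Final Implementation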

In [44]:
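def hinge_loss(w, X, y, C=1):
    # Soft-margin objective without a bias term: 0.5 * ||w||^2 + C * sum of hinge losses
    return 0.5 * np.dot(w, w) + C * np.sum(np.maximum(0, 1 - y * (np.dot(X, w))))

def gradient(w, X, y, C=1):
    # Each sample is weighted by its hinge value max(0, 1 - y_i * w^T x_i), so this is the
    # gradient of 0.5 * ||w||^2 + (C / 2) * sum of squared hinge losses,
    # not of hinge_loss above.
    return w - C * np.dot(X.T, (y * (np.maximum(0, 1 - y * np.dot(X, w)))))

def train_svm(X, y, learning_rate=0.001, epochs=1000, C=1):
    # Full-batch gradient descent on a linear model with no bias term
    w = np.zeros(X.shape[1])
    y = 2 * y - 1  # Convert target from {0, 1} to {-1, 1}

    for epoch in range(epochs):
        grad = gradient(w, X, y, C)
        w -= learning_rate * grad

    return w

w = train_svm(X, y, learning_rate=0.001, epochs=1000, C=1)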

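# Predict on the same data used for training (there is no held-out test set)
y_pred = np.sign(np.dot(X, w))  # Predictions in {-1, 1}; 0 only if the score is exactly zero

# Map back to the original {0, 1} labels (a score of exactly 0 is treated as class 1)
y_pred_binary = np.where(y_pred == -1, 0, 1)

# Confusion-matrix counts
TP = np.sum((y_pred_binary == 1) & (y == 1))
TN = np.sum((y_pred_binary == 0) & (y == 0))
FP = np.sum((y_pred_binary == 1) & (y == 0))
FN = np.sum((y_pred_binary == 0) & (y == 1))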

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP) if (TP + FP) != 0 else 0
recall = TP / (TP + FN) if (TP + FN) != 0 else 0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0

print(f"Accuracy: {accuracy * 100:.2f}%")
print(f"Precision: {precision * 100:.2f}%")
print(f"Recall: {recall * 100:.2f}%")
print(f"F1 Score: {f1_score * 100:.2f}%")

Accuracy: 77.67%
Precision: 70.61%
Recall: 71.64%
F1 Score: 71.12%
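
The `gradient` function above weights each margin-violating sample by its hinge value, which corresponds to a squared-hinge penalty. For comparison, the sketch below uses the subgradient of the plain hinge loss from the soft-margin objective and also learns a bias term $b$. It is an alternative under stated assumptions, not the method that produced the metrics above: it runs on synthetic two-cluster data generated here, and the names `hinge_subgradient` and `train_linear_svm` are introduced for illustration.

```python
import numpy as np

def hinge_subgradient(w, b, X, y, C=1.0):
    """Subgradient of 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w^T x_i + b)))."""
    margins = y * (X @ w + b)
    active = margins < 1  # samples that violate the margin
    grad_w = w - C * (X[active].T @ y[active])
    grad_b = -C * np.sum(y[active])
    return grad_w, grad_b

def train_linear_svm(X, y, learning_rate=0.001, epochs=1000, C=1.0):
    """Full-batch subgradient descent; y must already be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad_w, grad_b = hinge_subgradient(w, b, X, y, C)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Synthetic, well-separated clusters (not the Titanic data)
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X_syn = np.vstack([X_pos, X_neg])
y_syn = np.concatenate([np.ones(50), -np.ones(50)])

w_syn, b_syn = train_linear_svm(X_syn, y_syn)
train_accuracy = np.mean(np.sign(X_syn @ w_syn + b_syn) == y_syn)
print("Training accuracy on synthetic data:", train_accuracy)
```

To try this on the PCA features, the labels would first need the same `2 * y - 1` conversion used in `train_svm`, and holding out part of the data would give a less optimistic estimate than training-set metrics.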
