# Support Vector Machine

## For Binary Classification

### Decision Function

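The decision function for a Support Vector Machine (SVM) in binary classification computes a raw score for an input, and the sign of that score gives the predicted label:


### f(X) = W<sup>T</sup> X + b,  Label = sign(f(X))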


where:

- \( W \) is the weight vector.
- \( X \) is the input vector.
- \( b \) is the bias term.

### Decision Boundary

The equation \( W^T X + b = 0 \) defines a hyperplane that separates the two classes in feature space.

### Classification:

1. **Positive Class:** If \( W^T X + b > 0 \), the input \( X \) is assigned to the positive class (label +1).
2. **Negative Class:** If \( W^T X + b < 0 \), the input \( X \) is assigned to the negative class (label -1).
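
The rule above can be sketched in a few lines of NumPy. The weight vector, bias, and inputs are hypothetical values chosen for illustration, and sending a score of exactly 0 to the negative class is an arbitrary choice, since the rule leaves that case open.

```python
import numpy as np

def decision_function(W, X, b):
    """Return the raw score W^T X + b for every row of X."""
    return X @ W + b

def classify(W, X, b):
    """Map positive scores to +1 and all other scores to -1."""
    return np.where(decision_function(W, X, b) > 0, 1, -1)

# Hypothetical parameters for a 2-feature problem
W = np.array([2.0, -1.0])
b = -1.0
X = np.array([[2.0, 1.0],    # score = 4 - 1 - 1 =  2
              [0.0, 1.0]])   # score = 0 - 1 - 1 = -2

scores = decision_function(W, X, b)
labels = classify(W, X, b)
print(scores, labels)
```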

### Objective of SVM

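The objective of SVM is to find the hyperplane that maximizes the margin, the distance between the hyperplane and the closest training points of each class. Those closest points are the support vectors: they lie on the margin boundaries and determine the position of the hyperplane. A wider margin generally makes the classifier more robust to small changes in the inputs.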
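
As a small numeric illustration of the margin, assume the common scaling in which the closest points satisfy |W<sup>T</sup> X + b| = 1 (the text above does not fix a scaling, so this is an assumption). The margin width is then 2 / ||W||, and a point on a margin boundary sits at half that distance from the hyperplane. The values of `W`, `b`, and `x_on_margin` are hypothetical.

```python
import numpy as np

W = np.array([3.0, 4.0])   # hypothetical weight vector, ||W|| = 5
b = -5.0                   # hypothetical bias

def distance_to_hyperplane(W, b, x):
    """Geometric distance from x to the hyperplane W^T x + b = 0."""
    return abs(W @ x + b) / np.linalg.norm(W)

# Full margin width under the scaling |W^T x + b| = 1 at the support vectors
margin_width = 2.0 / np.linalg.norm(W)

# This point satisfies W^T x + b = 3*2 + 4*0 - 5 = 1, so it lies on the margin boundary
x_on_margin = np.array([2.0, 0.0])
d = distance_to_hyperplane(W, b, x_on_margin)

print(margin_width, d)
```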


### SVM Components

<ol>
<li>Hyperplane</li><br>

<li>Support Vector</li><br>

<li>Margin</li><br>
    
<li>Kernels</li><br>

</ol>


### SVM Kernels

<ol>
<li>Linear</li><br>

<li>Polynomial</li><br>

<li>Radial Basis Function (rbf)</li><br>
    
<li>Sigmoid</li><br>

</ol>


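1. **Linear**:

K(X<sub>1</sub>, X<sub>2</sub>) = X<sub>1</sub><sup>T</sup>X<sub>2</sub>

2. **Polynomial**:

K(X<sub>1</sub>, X<sub>2</sub>) = (X<sub>1</sub><sup>T</sup>X<sub>2</sub> + r)<sup>d</sup>

3. **Radial Basis Function (rbf)**:

K(X<sub>1</sub>, X<sub>2</sub>) = exp(-γ ||X<sub>1</sub> - X<sub>2</sub>||<sup>2</sup>)

4. **Sigmoid**:

K(X<sub>1</sub>, X<sub>2</sub>) = tanh(γ X<sub>1</sub><sup>T</sup>X<sub>2</sub> + r)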
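
In practice a kernel is often evaluated for all pairs of samples at once, which gives a Gram matrix. Below is a minimal NumPy sketch for the RBF kernel. The function name `rbf_gram_matrix` and the sample points are hypothetical, and `gamma=0.1` is just an illustrative default.

```python
import numpy as np

def rbf_gram_matrix(A, B, gamma=0.1):
    """Pairwise RBF kernel values: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    # Squared distances via the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Rounding can produce tiny negative values, so clip them to zero
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)

# Three hypothetical 2-D points
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram_matrix(X, X)
print(K)
```

The matrix is symmetric and has ones on the diagonal, because every point has distance zero to itself.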

## Loss Function

<p style="color:red">Here we are going to use Hinge Loss</p>

## <mark>L = max(0, 1 - y<sub>i</sub>(W<sup>T</sup>X<sub>i</sub> + b))</mark>

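Here y<sub>i</sub> is the label encoded as -1 or +1.

`Loss = 0` when the sample is classified correctly and lies on or outside the margin, i.e. y<sub>i</sub>(W<sup>T</sup>X<sub>i</sub> + b) >= 1

`Loss > 0` when the sample is misclassified or falls inside the margin; the loss grows linearly with the size of the violation

`loss = max(0, 1 - decision_value)`, where `decision_value` = y<sub>i</sub>(W<sup>T</sup>X<sub>i</sub> + b)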
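
A short NumPy sketch of the hinge loss, assuming labels encoded as -1 and +1. The scores are hypothetical raw outputs of W<sup>T</sup>X + b.

```python
import numpy as np

def hinge_loss(y, scores):
    """Per-sample hinge loss for labels y in {-1, +1} and raw scores W^T X + b."""
    return np.maximum(0.0, 1.0 - y * scores)

y = np.array([1, 1, -1, -1])
scores = np.array([2.0, 0.5, -3.0, 0.5])

losses = hinge_loss(y, scores)
# y * scores = [2.0, 0.5, 3.0, -0.5]
# losses     = [0.0, 0.5, 0.0,  1.5]
print(losses)
```

The second sample is on the correct side but inside the margin, so it still incurs a loss of 0.5. The fourth sample is misclassified and incurs a loss greater than 1.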


## Gradient for SVM Classifier

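For one sample (x, y), the regularized objective is J = λ||w||<sup>2</sup> + max(0, 1 - y(W.X + b)), using the same decision function W.X + b as above.

### if ( y . (W.X + b) >= 1 ):

dJ/dw = 2 λw

dJ/db = 0

### else ( y . (W.X + b) < 1 ):

dJ/dw = 2 λw - y.x

dJ/db = -y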
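
The two gradient cases translate into a single stochastic gradient step for a linear SVM. This is a minimal sketch that assumes the decision function W.X + b from the Decision Function section, so the bias gradient in the second case is -y. The function name `sgd_step`, the toy data, and the hyperparameter values are hypothetical.

```python
import numpy as np

def sgd_step(w, b, x, y, learning_rate=0.01, lambda_parameter=0.01):
    """One stochastic gradient step on J = lambda * ||w||^2 + max(0, 1 - y (w.x + b))."""
    if y * (w @ x + b) >= 1:
        # Sample is on or outside the margin: only the regularizer contributes
        dw = 2 * lambda_parameter * w
        db = 0.0
    else:
        # Sample violates the margin: the hinge term contributes as well
        dw = 2 * lambda_parameter * w - y * x
        db = -y
    return w - learning_rate * dw, b - learning_rate * db

# Toy linearly separable data with labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

w, b = np.zeros(2), 0.0
for _ in range(100):
    for x_i, y_i in zip(X, y):
        w, b = sgd_step(w, b, x_i, y_i)

predictions = np.sign(X @ w + b)
print(w, b, predictions)
```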

## Dual Form and the Alpha Coefficients

When data is not linearly separable, we switch to a kernelized SVM, which relies on the dual form of the SVM optimization problem.

In the dual form:

<ol>

<li>Instead of learning the weights directly, we learn an alpha coefficient (alpha<sub>i</sub>) for each training sample i</li><br>

<li>Each alpha<sub>i</sub> represents the "importance" of each training sample in determining the decision boundary</li><br>

<li>Most of the alpha<sub>i</sub> values will be zero except for a few "support vectors" — the points closest to the decision boundary that influence its position and orientation.</li>

</ol>
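
In the dual form the decision function is f(x) = Σ<sub>i</sub> alpha<sub>i</sub> y<sub>i</sub> K(x<sub>i</sub>, x) + b. The sketch below uses a linear kernel to show that this gives the same score as an explicit weight vector w = Σ<sub>i</sub> alpha<sub>i</sub> y<sub>i</sub> x<sub>i</sub>. The training points and alpha values are hypothetical and chosen by hand for illustration; they are not the output of a solver.

```python
import numpy as np

# Hypothetical training set and hand-picked alpha values
X_train = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0]])
y_train = np.array([1, 1, -1])
alpha = np.array([0.5, 0.0, 0.5])   # alpha = 0: the second sample is not a support vector
b = 0.0

def dual_decision(x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, here with the linear kernel."""
    kernel_values = X_train @ x      # K(x_i, x) = x_i . x for every training sample
    return np.sum(alpha * y_train * kernel_values) + b

# With a linear kernel the weight vector can be recovered explicitly
w = (alpha * y_train) @ X_train      # sum_i alpha_i * y_i * x_i

x_new = np.array([3.0, -1.0])
f_dual = dual_decision(x_new)
f_primal = w @ x_new + b
print(w, f_dual, f_primal)
```

For non-linear kernels there is generally no explicit `w` to recover, which is why prediction sums over the training samples with non-zero alpha.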

In [None]:
import cupy as cp

class SVM:
    def __init__(self, learning_rate=0.01, num_of_iter=1000, lambda_parameter=0.01, kernel='linear'):
        self.learning_rate = learning_rate
        self.num_of_iter = num_of_iter
        self.lambda_parameter = lambda_parameter
        self.kernel = kernel
        self.alpha = None  # Coefficients for support vectors
        self.bias = 0
        self.X_train = None
        self.y_train = None

    # Kernel functions
    def linear_kernel(self, X1, X2):
        return cp.dot(X1, X2)

    def polynomial_kernel(self, X1, X2, degree=3, r=1):
        return (cp.dot(X1, X2) + r) ** degree

    def rbf_kernel(self, X1, X2, gamma=0.1):
        return cp.exp(-gamma * cp.linalg.norm(X1 - X2) ** 2)

    def sigmoid_kernel(self, X1, X2, gamma=0.1, r=1):
        return cp.tanh(gamma * cp.dot(X1, X2) + r)

    def apply_kernel(self, X1, X2):
        if self.kernel == 'linear':
            return self.linear_kernel(X1, X2)
        elif self.kernel == 'polynomial':
            return self.polynomial_kernel(X1, X2)
        elif self.kernel == 'rbf':
            return self.rbf_kernel(X1, X2)
        elif self.kernel == 'sigmoid':
            return self.sigmoid_kernel(X1, X2)
        else:
            raise ValueError(f"Unknown kernel: {self.kernel}")

    def fit(self, X, y):
        
        X = cp.asarray(X)
        y = cp.asarray(y)
        
        # Initialize parameters
        self.rows, self.cols = X.shape
        self.alpha = cp.zeros(self.rows)  # Alpha coefficients for dual SVM
        self.bias = 0
        y_mod = cp.where(y == 0, -1, 1)  # Convert labels to -1 and 1
        self.X_train = X
        self.y_train = y_mod

        # Training using stochastic gradient-style updates on the dual form.
        # Each epoch evaluates the kernel rows * rows times, so this is slow on large datasets.
        for _ in range(self.num_of_iter):
            for i in range(self.rows):
                # Calculate the decision function with the kernel applied
                decision_value = y_mod[i] * (cp.sum(cp.array([self.alpha[j] * y_mod[j] * self.apply_kernel(X[j], X[i])
                                                     for j in range(self.rows)])) + self.bias)
                
                # Hinge loss condition
                if decision_value < 1:
                    # Update alpha and bias using hinge loss gradient
                    self.alpha[i] += self.learning_rate * (1 - decision_value)
                    self.bias += self.learning_rate * y_mod[i]
                else:
                    # L2 regularization only
                    self.alpha[i] -= self.learning_rate * self.lambda_parameter * self.alpha[i]
    
    def predict(self, X):
        X = cp.asarray(X)  
        y_pred = []
        
        for x in X:
            # Sum up contributions from each support vector
            output = cp.sum(cp.array([self.alpha[j] * self.y_train[j] * self.apply_kernel(self.X_train[j], x)
                             for j in range(self.rows)])) + self.bias
            
            predicted_label = cp.sign(output)
            y_pred.append(1 if predicted_label > 0 else 0)
        
        return cp.array(y_pred)


In [None]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd

In [None]:
df = pd.read_csv("diabetes_prediction_dataset.csv")

In [None]:
df.head()

### Basic Pre Processing

In [None]:
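# Merge the 'Other' category into 'Female' so that gender becomes binary
df.loc[df['gender'] == 'Other', 'gender'] = 'Female'

# Encode gender as 0/1 (the first category is dropped, leaving one indicator column)
df['gender'] = pd.get_dummies(df['gender'], drop_first=True).astype(int)

# Drop the categorical smoking_history column instead of encoding it
df.drop(columns='smoking_history', inplace=True)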

In [None]:
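# train_test_split returns the train/test features first, then the train/test labels;
# the last column of df is used as the target
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :-1].values, df.iloc[:, -1].values)

In [None]:
model = SVM()

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
# predict returns a CuPy array, so copy it to host memory before passing it to scikit-learn
print("The accuracy of the model is", accuracy_score(y_test, cp.asnumpy(y_pred)))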