# Support Vector Machine (SVM) from scratch
## Overview 🔍
In this project, I’m implementing a custom **Support Vector Machine (SVM)** from scratch using Python and NumPy. SVM is a powerful supervised learning algorithm used primarily for classification tasks. The goal of SVM is to find the optimal hyperplane that best separates the data points into distinct classes.

### Key Concepts:
- **Support Vector Machine (SVM)**: A classification algorithm that finds the hyperplane that best separates data into different classes with maximum margin.
  
- **Margin**: The distance between the hyperplane and the nearest data points from each class. A larger margin makes the classifier less sensitive to small perturbations of the data, which usually helps it generalize.

- **Support Vectors**: The data points that lie closest to the hyperplane and are crucial for defining the margin.

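- **Hyperplane**: A decision boundary that separates the different classes in the feature space: a line in 2D, a plane in 3D, and in general a flat surface with one dimension fewer than the feature space.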

---

## Objective 🎯
The goal of this project is to:
1. Implement a custom Support Vector Machine class from scratch.
   
2. Train the model on synthetic 2D data for binary classification.
   
3. Evaluate the model's performance by calculating accuracy.

---

## Support Vector Machine (SVM) Explanation 🧠

<img src="./figures/SVM.png" alt="SVM" width="1000" height="600"/>

### How SVM Works
The Support Vector Machine works by finding the **optimal hyperplane** that best separates the data into distinct classes. The key steps are:

1. **Calculate the Margin**: The model tries to maximize the margin, which is the distance between the hyperplane and the closest points from each class (the support vectors).

2. **Find the Hyperplane**: The hyperplane is determined by the weights and bias parameters, which are calculated during training.

3. **Classify New Data**: Once the hyperplane is determined, it can be used to classify new data points by determining which side of the hyperplane they fall on.
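A small sketch of step 1, using a hand-picked hyperplane (the `w` and `b` values below are hypothetical, not learned by this project's code). The signed distance of a point $x$ to the hyperplane is $f(x)/\lVert w\rVert$, and if $w$ and $b$ are scaled so that the support vectors satisfy $|w \cdot x + b| = 1$, the margin width is $2/\lVert w\rVert$. Maximizing the margin is therefore the same as minimizing $\lVert w\rVert$.

```python
import numpy as np

# Hypothetical hyperplane, chosen by hand (not learned): 3*x1 + 4*x2 - 5 = 0
w = np.array([3.0, 4.0])  # ||w|| = 5
b = -5.0

def signed_distance(X, w, b):
    """Signed Euclidean distance of each row of X to the hyperplane w·x + b = 0."""
    return (X @ w + b) / np.linalg.norm(w)

def margin_width(w):
    """Margin width 2 / ||w||, assuming |w·x + b| = 1 at the support vectors."""
    return 2.0 / np.linalg.norm(w)

X = np.array([[3.0, 1.0],   # f(x) = 9 + 4 - 5 = 8  -> distance 1.6 on the positive side
              [0.0, 0.0],   # f(x) = -5             -> distance 1.0 on the negative side
              [1.0, 0.5]])  # f(x) = 3 + 2 - 5 = 0  -> lies on the hyperplane
distances = signed_distance(X, w, b)  # values: 1.6, -1.0, 0.0
width = margin_width(w)               # 2 / 5 = 0.4
print(distances, width)
```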

### Support Vectors and Hyperplane
- The **support vectors** are the data points closest to the hyperplane, which define the margin.
  
- The **hyperplane** is the decision boundary that separates the two classes. The goal of SVM is to maximize the margin between the classes while classifying the training points correctly (or, with a soft margin, while penalizing points that violate the margin).
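If $w$ and $b$ are scaled so that the closest points satisfy $|w \cdot x + b| = 1$ (the usual SVM convention), the support vectors are the training points whose functional margin $y_i (w \cdot x_i + b)$ is at most 1: points on the margin or, with a soft margin, inside it or misclassified. A sketch with a hypothetical hyperplane and toy points (the helper name `find_support_vectors` is illustrative):

```python
import numpy as np

w = np.array([1.0, 1.0])  # hypothetical weights
b = 0.0                   # hypothetical bias

X = np.array([[0.5, 0.5],     # y * f(x) = 1.0 -> on the margin
              [2.0, 2.0],     # y * f(x) = 4.0 -> well outside the margin
              [-0.5, -0.5],   # y * f(x) = 1.0 -> on the margin
              [-3.0, -1.0],   # y * f(x) = 4.0 -> well outside the margin
              [0.2, 0.1]])    # y * f(x) = 0.3 -> inside the margin
y = np.array([1, 1, -1, -1, 1])

def find_support_vectors(X, y, w, b, tol=1e-8):
    """Indices of points with functional margin y * (w·x + b) <= 1 (small tolerance for rounding)."""
    functional_margin = y * (X @ w + b)
    return np.where(functional_margin <= 1 + tol)[0]

sv_indices = find_support_vectors(X, y, w, b)  # indices 0, 2, 4
print(sv_indices)
```

Points with a functional margin above 1 could be removed from the training set without changing this hyperplane, which is why only the support vectors "define" it.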

### Decision Function
The decision function for SVM is:
$$
f(x) = w \cdot x + b
$$
Where:
- $w$: The weight vector that is perpendicular to the hyperplane.
- $x$: The input data.
- $b$: The bias term.

For classification:
- If $f(x) > 0$, the data point is classified as one class (e.g., +1).
- If $f(x) < 0$, the data point is classified as the other class (e.g., -1).
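The classification rule can be sketched in NumPy with a hypothetical `w` and `b`. The rule above leaves $f(x) = 0$ open, and `np.sign` would return 0 there, so this sketch assigns points exactly on the hyperplane to -1 by convention:

```python
import numpy as np

def decision_function(X, w, b):
    """f(x) = w·x + b for every row of X."""
    return X @ w + b

def classify(X, w, b):
    """+1 where f(x) > 0, otherwise -1 (points on the hyperplane go to -1)."""
    return np.where(decision_function(X, w, b) > 0, 1, -1)

w = np.array([2.0, -1.0])  # hypothetical weights
b = 0.5                    # hypothetical bias
X = np.array([[1.0, 1.0],   # f = 2 - 1 + 0.5 = 1.5  -> +1
              [0.0, 2.0],   # f = 0 - 2 + 0.5 = -1.5 -> -1
              [0.0, 0.5]])  # f = 0 - 0.5 + 0.5 = 0  -> -1 (on the hyperplane)
labels = classify(X, w, b)
print(labels)
```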

### Objective Function
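SVM’s optimization balances two competing goals:
1. **Maximize the margin** between the classes.
   
2. **Minimize the classification error** on the training data, i.e., penalize points that fall inside the margin or on the wrong side of the hyperplane.

The regularization parameter (`lambda_param` in the code below) sets the trade-off between these goals: a larger value favors a wider margin, while a smaller value favors fitting the training points more closely.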
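One standard way to combine both goals into a single problem is the regularized hinge-loss objective, which many from-scratch implementations minimize with (sub)gradient descent. Reading $\lambda$ as this project's `lambda_param` is an assumption, since the current placeholder `fit` does not optimize anything:

$$
\min_{w,\,b}\; \lambda \lVert w \rVert^2 + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w \cdot x_i + b)\bigr)
$$

The first term rewards a wide margin (the margin width is $2/\lVert w \rVert$), and the hinge term is zero for points on the correct side of the margin and grows linearly for points inside it or misclassified. The classic formulation $\min_{w,b,\xi} \tfrac{1}{2}\lVert w\rVert^2 + C\sum_i \xi_i$ subject to $y_i(w\cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ expresses the same trade-off, with $C = 1/(2\lambda n)$, so a large $C$ corresponds to a small $\lambda$.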

---

## Implementation 🛠️

The `SVM` class includes methods to:
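1. **Fit the Model**: In this version, `fit` does not optimize anything. It assigns random values to the weights and bias as a placeholder, for demonstration purposes only. A real implementation would minimize the regularized hinge loss (for example with subgradient descent or quadratic programming), which is what the `learning_rate` and `lambda_param` hyperparameters are meant for.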
   
2. **Predict**: Classify new data points based on the learned weights and bias.
   
3. **Evaluate**: Assess the model's performance by calculating accuracy on the training dataset.
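As noted above, `fit` currently assigns random values instead of optimizing. A minimal sketch of an optimizing version, assuming the regularized hinge loss $\lambda\lVert w\rVert^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i(w\cdot x_i + b))$ and plain per-sample subgradient descent. This is one common from-scratch approach, not the only one (solving the dual with quadratic programming is another); the class name `SubgradientSVM` and the `n_iters` parameter are hypothetical and not part of the original code.

```python
import numpy as np

class SubgradientSVM:
    """Linear soft-margin SVM trained with per-sample subgradient descent (sketch).

    Minimizes  lambda_param * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    for labels y in {-1, +1}. The bias is not regularized.
    """

    def __init__(self, learning_rate=0.01, lambda_param=0.01, n_iters=1000):
        self.learning_rate = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters  # hypothetical: number of passes (epochs) over the data

    def fit(self, X, y):
        n_features = X.shape[1]
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(self.n_iters):
            for x_i, y_i in zip(X, y):
                if y_i * (x_i @ self.weights + self.bias) >= 1:
                    # Correct side, outside the margin: hinge loss is 0,
                    # so only the regularization term contributes.
                    grad_w = 2 * self.lambda_param * self.weights
                    grad_b = 0.0
                else:
                    # Inside the margin or misclassified: the hinge term is active.
                    grad_w = 2 * self.lambda_param * self.weights - y_i * x_i
                    grad_b = -y_i
                self.weights -= self.learning_rate * grad_w
                self.bias -= self.learning_rate * grad_b
        return self

    def predict(self, X):
        return np.where(X @ self.weights + self.bias > 0, 1, -1)

    def accuracy(self, X, y):
        return np.mean(self.predict(X) == y)


# Usage on data shaped like this project's synthetic set (separate seeded generator, illustrative only)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2))])
y_demo = np.hstack([np.ones(50), -np.ones(50)])
model = SubgradientSVM(learning_rate=0.01, lambda_param=0.01, n_iters=100).fit(X_demo, y_demo)
print(f"Training accuracy: {model.accuracy(X_demo, y_demo) * 100:.2f}%")
```

Swapping this `fit` into the project's `SVM` class would make `learning_rate` and `lambda_param` actually affect the result. Leaving the bias unregularized is a common choice, but not a universal one.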

### Key Features:
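- **Training with synthetic data**: We use 2D data points for binary classification. Each class is drawn from a Gaussian with unit variance, centered at (2, 2) for the +1 class and at (-2, -2) for the -1 class.
  
- **Standardization**: The data is standardized to zero mean and unit variance per feature. SVMs are sensitive to feature scale because the margin is a Euclidean distance, so a feature with a much larger range would otherwise dominate it.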
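A small sketch of the standardization step as a reusable pair of functions. The helper names `fit_standardizer` and `standardize` are hypothetical; the point is that the mean and standard deviation computed on the training data should be reused, unchanged, when transforming any new points before calling `predict`.

```python
import numpy as np

def fit_standardizer(X):
    """Per-feature mean and standard deviation of the training data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features (avoids division by zero)
    return mean, std

def standardize(X, mean, std):
    """Apply previously computed training statistics to any data."""
    return (X - mean) / std

X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mean, std = fit_standardizer(X_train)
Z = standardize(X_train, mean, std)  # each column now has mean 0 and standard deviation 1

X_new = np.array([[3.0, 30.0]])       # equals the training mean
z_new = standardize(X_new, mean, std) # reusing training statistics maps it to [[0, 0]]
```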

---

## Results 📊
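- The model is evaluated based on its **accuracy**, the percentage of correctly classified samples.
- The two synthetic classes are well separated, so a properly trained linear SVM should classify nearly all points correctly. Keep in mind, though, that accuracy is measured on the same data the model was fit on, and that the current `fit` assigns random weights instead of training. The 94% reported below therefore comes from a random hyperplane (whose non-negative random weights happen to point roughly toward the direction separating the two clusters), not from a learned one.

---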

```python
import numpy as np

class SVM:
    def __init__(self, learning_rate=0.01, lambda_param=0.01):
        """
        Initializes the SVM model with hyperparameters.

        Args:
        - learning_rate (float): Step size an iterative optimizer (e.g., gradient descent) would use.
        - lambda_param (float): Regularization strength; larger values penalize large weights more,
          which favors a wider margin and helps prevent overfitting.

        Note: the placeholder `fit` below does not use these hyperparameters yet.
        """
        self.learning_rate = learning_rate  # Step size for optimization
        self.lambda_param = lambda_param    # Regularization strength

    def fit(self, X, y):
        """
        Placeholder fit: assigns random weights and bias instead of training.

        Args:
        - X (numpy.ndarray): Feature matrix (n_samples x n_features).
        - y (numpy.ndarray): Target labels (n_samples), where each label is either +1 or -1.
          (Unused by this placeholder.)

        A real implementation would find the weights and bias that minimize the regularized
        hinge loss, e.g., by (sub)gradient descent or by solving the dual problem with
        quadratic programming.
        """
        n_features = X.shape[1]
        # Mock-up: random values stand in for the result of an optimization step.
        self.weights = np.random.rand(n_features)  # One weight per feature, drawn from [0, 1)
        self.bias = np.random.rand()               # Scalar bias, drawn from [0, 1)

    def predict(self, X):
        """
        Predict the class label for each sample based on the current weights and bias.

        Args:
        - X (numpy.ndarray): Feature matrix for which predictions are needed.

        Returns:
        - numpy.ndarray: Predicted class labels (+1 or -1).

        The decision function is f(x) = x · weights + bias. If f(x) > 0, predict +1; otherwise, predict -1.
        """
        scores = np.dot(X, self.weights) + self.bias
        # np.where (instead of np.sign) also maps points exactly on the hyperplane to -1
        return np.where(scores > 0, 1, -1)

    def accuracy(self, X, y):
        """
        Calculate the accuracy of the model by comparing predicted labels to actual labels.

        Args:
        - X (numpy.ndarray): Feature matrix to evaluate on.
        - y (numpy.ndarray): Actual labels for the samples.

        Returns:
        - float: Accuracy as the fraction of correct predictions (multiply by 100 for a percentage).
        """
        predictions = self.predict(X)     # Get the predictions from the model
        return np.mean(predictions == y)  # Proportion of predictions that match the labels
```

```python
# Generate synthetic 2D data for binary classification
np.random.seed(42)  # Ensures that results are reproducible (same every time you run the code)

n_samples = 100  # Total number of data points (100 samples in total)
# Generate data for the positive class (+1): points centered around (2, 2)
X_positive = np.random.randn(n_samples // 2, 2) + np.array([2, 2])

# Generate data for the negative class (-1): points centered around (-2, -2)
X_negative = np.random.randn(n_samples // 2, 2) + np.array([-2, -2])

# Combine positive and negative data
X = np.vstack([X_positive, X_negative])  # Vertically stack the positive and negative samples into one array

# Create corresponding labels: +1 for the positive class, -1 for the negative class
y = np.hstack([np.ones(n_samples // 2), -np.ones(n_samples // 2)])  # Labels (+1 and -1)
```

```python
# Standardize the data (important for SVM and gradient-based methods)
# Standardization gives every feature zero mean and unit variance
mean = np.mean(X, axis=0)  # Compute the mean of each feature (column)
std = np.std(X, axis=0)  # Compute the standard deviation of each feature (column)
X = (X - mean) / std  # Standardize the data (subtract mean, divide by std)

# Initialize the SVM model and fit it to the synthetic data
svm = SVM(learning_rate=0.01, lambda_param=0.1)  # Set SVM hyperparameters
svm.fit(X, y)  # Placeholder fit: assigns random weights and bias (see SVM.fit)

# Calculate and print the model's accuracy
accuracy = svm.accuracy(X, y)  # Accuracy on the training data (same data used for fitting)
print(f"Accuracy: {accuracy * 100:.2f}%")  # Print the accuracy as a percentage
```

```
Accuracy: 94.00%
```


## When to Use Support Vector Machine (SVM) 🚀

Support Vector Machine (SVM) is a powerful tool for both classification and regression tasks. Here’s when SVM works best:

- **Binary Classification**: SVM is great for tasks where you need to separate data into two groups (like spam or not spam).

- **Lots of Features**: SVM works well when you have a lot of features (or columns) in your data, even when the number of features is large compared with the number of samples, because maximizing the margin acts as built-in regularization.

- **Non-Linear Boundaries**: SVM can handle complex data that isn’t easy to separate with a straight line by using a technique called the *kernel trick*.

- **Tolerates Some Outliers**: Points that lie far from the boundary on the correct side do not affect the solution at all, and with a soft margin the model can accept a few misclassified outliers instead of shifting the boundary to fit them.

- **Text and Image Classification**: SVM is great for tasks like classifying text (like emails) or recognizing images, because these problems produce high-dimensional feature vectors, which SVMs handle well.
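The kernel trick replaces the dot product $x \cdot z$ with a kernel $K(x, z)$ that equals a dot product in some higher-dimensional feature space, without ever building that space. A small check for the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2D inputs, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names are illustrative, and the implementation in this project is linear and does not use kernels.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x·z)**2, computed in the original 2D space."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit 3D feature map whose dot product reproduces (x·z)**2 for 2D inputs."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_direct = poly2_kernel(x, z)           # (3 - 2)**2 = 1.0
k_feature = np.dot(phi(x), phi(z))      # 9 - 12 + 4 = 1.0 (up to floating-point rounding)
print(k_direct, k_feature)
```

Kernels such as the RBF kernel $\exp(-\gamma\lVert x - z\rVert^2)$ work the same way, but their implicit feature space is infinite-dimensional, so computing $K$ directly is the only practical option.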

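## Pros of Support Vector Machine (SVM) ✅

- **Works Well with Many Features**: SVM is strong when there are many features in the data, like in text or image classification.

- **Efficient Memory Use**: Once trained, the decision function depends only on the most important data points (called *support vectors*), so a kernel SVM only needs to store those points, and a linear SVM only needs its weight vector and bias.

- **Helps Control Overfitting**: Maximizing the margin acts as regularization, and the regularization parameter lets you trade training accuracy for a simpler, wider-margin model. It still needs tuning, especially with flexible kernels.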

- **Can Handle Complex Boundaries**: SVM can separate data in non-straight ways, which is useful when the data doesn’t follow a simple pattern.

- **Clear Decision Boundaries**: SVM tries to find the best possible margin (space) between groups, which helps it perform better on new data.

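## Cons of Support Vector Machine (SVM) ❌

- **Slow with Large Datasets**: Training a kernel SVM can take a long time on large datasets, since its training time typically grows at least quadratically with the number of samples.

- **Choosing the Right Kernel is Hard**: The success of a kernel SVM depends on picking the right *kernel* (the similarity function that implicitly maps the data into a space where a linear boundary works) and its parameters, which often takes trial and error.

- **High Memory Use on Huge Datasets**: Kernel SVMs work with pairwise similarities between samples, and the full kernel matrix grows quadratically with the number of samples, so memory can become a bottleneck on very large datasets.

- **Can Be Sensitive to Noise**: If the data contains many mislabeled or overlapping points, SVM may perform worse, and results depend more heavily on the choice of regularization parameter.

- **Harder for Multi-Class Tasks**: SVM was originally designed for two classes (binary classification), so multi-class tasks require combining several binary classifiers, for example one-vs-rest or one-vs-one.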
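A sketch of the one-vs-rest strategy: train one binary SVM per class (that class labeled +1, all others -1) and predict the class whose decision function is largest. The helper names are hypothetical, and the per-class weights `W` and biases `b` below are hand-picked stand-ins for already-trained binary classifiers.

```python
import numpy as np

def ovr_labels(y, k):
    """Binary labels for the k-th one-vs-rest classifier: +1 for class k, -1 otherwise."""
    return np.where(y == k, 1, -1)

def ovr_predict(X, W, b):
    """One-vs-rest prediction.

    W: (n_classes, n_features) weights, one row per binary classifier.
    b: (n_classes,) biases.
    Returns, for each sample, the index of the classifier with the highest score.
    """
    scores = X @ W.T + b  # shape (n_samples, n_classes)
    return np.argmax(scores, axis=1)

# Hypothetical trained classifiers for 3 classes in 2D
W = np.array([[1.0, 0.0],     # class 0: large x1
              [0.0, 1.0],     # class 1: large x2
              [-1.0, -1.0]])  # class 2: both features small
b = np.zeros(3)
X = np.array([[3.0, 0.5], [0.2, 2.0], [-1.0, -2.0]])
predicted = ovr_predict(X, W, b)  # scores per row: [3, 0.5, -3.5], [0.2, 2, -2.2], [-1, -2, 3]
print(predicted)
```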

## Conclusion 🎯

Support Vector Machine (SVM) is a strong tool for classification and regression, especially for data with many features or, with a suitable kernel, non-linear boundaries. While it can be slow to train on large datasets and requires a careful choice of kernel and regularization parameter, it is very effective when these are tuned properly.