# Support Vector Machines (SVMs)
The original SVM algorithm was invented by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1964. In 1992, Bernhard Boser, Isabelle Guyon and Vladimir Vapnik suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes. The "soft margin" incarnation, as is commonly used in software packages, was proposed by Corinna Cortes and Vapnik in 1993 and published in 1995.

<p align="center">
    <img src="./../assets/svm.png" width="400">
</p>

> Image from acte.in

## Logic behind SVMs
A support vector machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It works by finding the best hyperplane that separates different classes in the data, maximizing the margin between them.

### What are Support Vectors?
**Support vectors** are the data points that lie closest to the decision boundary (hyperplane) in an SVM model. These points are crucial because they alone define the margin of the classifier.

**Margin** is the distance between the hyperplane and the nearest data points from either class.

The goal of SVM is to maximize this margin, thereby creating a robust classifier that generalizes well to unseen data.
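
The definitions above can be checked numerically. The following is a minimal sketch, assuming scikit-learn is available; the synthetic blobs, the large `C` (used here to approximate a hard margin), and all variable names are illustrative choices, not part of the original notes. For a linear SVM with weight vector `w`, the margin as defined above equals `1 / ||w||`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (assumed toy data), so a hard margin exists.
X, y = make_blobs(
    n_samples=40, centers=[[-2, -2], [2, 2]], cluster_std=0.5, random_state=0
)

# A large C approximates the hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane

# Margin: distance from the hyperplane to the nearest training points.
margin = 1.0 / np.linalg.norm(w)

# Distance of every training point to the hyperplane w.x + b = 0.
distances = np.abs(X @ w + b) / np.linalg.norm(w)

print("number of support vectors:", len(clf.support_vectors_))
print("margin (1 / ||w||):", margin)
print("smallest point-to-hyperplane distance:", distances.min())
```

Only the few points returned in `clf.support_vectors_` sit at the margin distance; every other training point lies farther from the hyperplane and could be removed without changing the fitted boundary.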

### The Role of Training Data
The amount and quality of training data significantly impact the number of support vectors and, consequently, the performance of the SVM classifier. Here’s how:
1. **Data Complexity**: If the training data is complex and not easily separable, the SVM will require more support vectors to define the decision boundary. For instance, in a high-dimensional space with intricate patterns, more support vectors are needed to capture the nuances of the data distribution.

2. **Sample Size**: The number of training samples directly influences the number of support vectors. When the training set is large, the SVM might end up using a substantial portion of the data as support vectors, especially if the data is noisy or not well separated. Conversely, with a smaller training set, fewer support vectors might be sufficient, but this can lead to overfitting if the model becomes too sensitive to the limited data.

3. **Feature Space**: The dimensionality of the feature space also plays a role. Higher-dimensional spaces can lead to more complex decision boundaries, requiring more support vectors. However, this also increases the risk of overfitting, where the model captures noise rather than the underlying pattern.
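
The effect of class overlap and sample size on the support-vector count can be sketched as follows. This assumes scikit-learn; the helper `count_support_vectors`, the cluster centers, and the spread values are hypothetical choices made only for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def count_support_vectors(cluster_std, n_samples=200, seed=0):
    """Fit a linear SVM on two synthetic blobs and return its support-vector count."""
    X, y = make_blobs(
        n_samples=n_samples,
        centers=[[-2, -2], [2, 2]],
        cluster_std=cluster_std,  # larger spread means more class overlap
        random_state=seed,
    )
    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    return int(clf.n_support_.sum())


# Same sample size, different amounts of overlap between the classes.
clean = count_support_vectors(cluster_std=0.5)
noisy = count_support_vectors(cluster_std=3.0)

# Same overlap, different sample sizes.
noisy_small = count_support_vectors(cluster_std=3.0, n_samples=50)
noisy_large = count_support_vectors(cluster_std=3.0, n_samples=500)

print("well-separated data:", clean, "support vectors")
print("overlapping data:   ", noisy, "support vectors")
print("overlapping, n=50:  ", noisy_small, "support vectors")
print("overlapping, n=500: ", noisy_large, "support vectors")
```

On overlapping data every point on the wrong side of its margin becomes a support vector, which is why the count grows with both noise and sample size in this sketch.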


### Impact of Support Vectors on Classifier Performance
The number of support vectors has a direct impact on both the accuracy and computational efficiency of the SVM classifier.

#### Accuracy
1. **Generalization**: A model with too many support vectors might indicate overfitting, where the classifier performs well on training data but poorly on unseen data. This is because the model is too complex and captures noise in the training data.

2. **Underfitting**: Conversely, too few support vectors might lead to underfitting, where the model is too simplistic to capture the underlying patterns in the data, resulting in poor performance on both training and test data.

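3. **Trade-off**: *The trade-off between margin maximization and classification error is controlled by the regularization parameter C*. A high C value penalizes misclassifications heavily, which narrows the margin and can produce a more complex decision boundary that overfits; because fewer points fall on or inside the narrow margin, it typically yields fewer support vectors. A lower C value widens the margin and tolerates more misclassifications, which simplifies the decision boundary but typically increases the number of support vectors, since every point on or inside the margin becomes one.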
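
A quick way to see how `C` interacts with the margin and the support-vector count is to sweep it on a fixed dataset. This is a minimal sketch assuming scikit-learn; the overlapping blobs and the three `C` values are arbitrary illustrative choices, and results on other data may differ.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Moderately overlapping classes (assumed toy data), so C actually matters.
X, y = make_blobs(
    n_samples=200, centers=[[-1, -1], [1, 1]], cluster_std=1.0, random_state=0
)

sv_counts = {}
margins = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    sv_counts[C] = int(clf.n_support_.sum())
    # For a linear kernel the margin is 1 / ||w||.
    margins[C] = 1.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:<6} margin={margins[C]:.3f} support vectors={sv_counts[C]}")
```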

#### Computational Complexity
1. **Training Time**: Training an SVM means solving a quadratic optimization problem whose size grows with the number of training samples. Datasets that yield many support vectors also tend to train more slowly, because more samples end up with non-zero coefficients that the solver must keep optimizing.

2. **Prediction Time**: During prediction, the SVM classifier evaluates the kernel function between the test point and each support vector (for a linear kernel this is a plain dot product). Hence, prediction cost grows linearly with the number of support vectors, and a large number of them makes the model less efficient for real-time applications.
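
The per-support-vector cost at prediction time can be made concrete by recomputing the decision function by hand. This sketch assumes scikit-learn and an RBF kernel with an explicitly fixed `gamma`; the dataset and the query points in `X_new` are made up for illustration. It relies on the fitted attributes `support_vectors_`, `dual_coef_`, and `intercept_` of `SVC`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

gamma = 0.5  # fixed explicitly so the manual kernel matches the model's
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

# Hypothetical query points.
X_new = np.array([[0.0, 0.0], [1.0, -0.5], [2.0, 0.5]])

# One kernel evaluation per (query point, support vector) pair.
K = rbf_kernel(X_new, clf.support_vectors_, gamma=gamma)

# f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b
manual = K @ clf.dual_coef_[0] + clf.intercept_[0]

print("kernel evaluations per query point:", K.shape[1])
print("manual decision values: ", manual)
print("sklearn decision values:", clf.decision_function(X_new))
```

The kernel matrix `K` has one column per support vector, so halving the number of support vectors roughly halves the work per prediction.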

#### Optimizing SVM Performance: Practical Considerations
1. **Kernel Choice**: The choice of kernel function (linear, polynomial, radial basis function, etc.) affects the number of support vectors. Non-linear kernels, while powerful, often result in more support vectors due to their ability to capture complex patterns in the data.

2. **Parameter Tuning**: Proper tuning of SVM parameters, such as the regularization parameter C and kernel parameters, is crucial. Techniques like cross-validation can help in finding the optimal parameters that balance the number of support vectors and classifier performance.

3. **Data Preprocessing**: Preprocessing steps like normalization, feature selection, and dimensionality reduction can reduce the complexity of the data, potentially decreasing the number of support vectors needed and improving the model’s performance.
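
The three considerations above can be combined in one workflow. The following is a sketch, assuming scikit-learn: feature scaling and the classifier are chained in a `Pipeline`, and `GridSearchCV` cross-validates over kernel, `C`, and `gamma`. The dataset, the step names `scale` and `svc`, and the parameter grid are illustrative assumptions, not recommended defaults.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so it is re-fit on each CV training fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Separate grids: gamma only applies to the RBF kernel.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_svc = search.best_estimator_.named_steps["svc"]
n_support = int(best_svc.n_support_.sum())
test_accuracy = search.score(X_test, y_test)

print("best parameters:", search.best_params_)
print("support vectors in best model:", n_support)
print("held-out accuracy:", test_accuracy)
```

Reporting the support-vector count next to the held-out accuracy makes it easier to compare candidate models that score similarly but differ in prediction cost.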

<a href="https://www.geeksforgeeks.org/machine-learning/optimizing-svm-classifiers-the-role-of-support-vectors-in-training-data-and-performance/">Notes from geeksforgeeks</a>