### Support Vector Machine (SVM)

---

#### **What is a Support Vector Machine (SVM)?**
- SVM is a **supervised machine learning algorithm** that finds the best hyperplane to separate two classes.

**Key difference from logistic regression:**
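  - **Logistic Regression**: A probabilistic model; it estimates class probabilities and fits them by maximizing likelihood.
  - **SVM**: A geometric, margin-based model; it chooses the boundary that lies farthest from the nearest training points.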

- **Which hyperplane is selected?**
  - Among the many possible hyperplanes, SVM selects the one with the **maximum margin**, that is, the largest distance between the hyperplane and the nearest points of each class.

---

#### **Logistic Regression vs Support Vector Machine (SVM)**
- **When to use each**:
  - **SVM**: Best for small, complex datasets.
  - **Logistic Regression**: Good starting point; if performance is poor, try SVM without a kernel.
- Both algorithms seek the best hyperplane, but feature selection can determine which is more efficient.

---

#### **Types of SVM Algorithms**
- **Linear SVM**: Used when data is linearly separable, or nearly so when a soft margin is allowed (a single straight line in 2D).
- **Non-Linear SVM**: When data is not linearly separable; employs **kernel tricks** to classify. Most real-world data requires this.

---

#### **Important Terms**
- **Support Vectors**: Points closest to the hyperplane; they define the separating boundary.
- **Margin**: The distance between the hyperplane and the closest data points (support vectors). A larger margin is preferable.

---

#### **Margins in Support Vector Machine (SVM)**
- **Hard Margin**: No tolerance for misclassification; it only works if the data is perfectly separable.
- **Soft Margin**: Allows some misclassifications for better generalization.

---

#### **How Does Support Vector Machine Work?**
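- The decision boundary is determined by the **support vectors** alone; moving or removing the other observations does not change it.
- Logistic regression, by contrast, is influenced by every training point. Training an SVM still has to examine the whole dataset to find the support vectors, but the final boundary depends only on them.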
- **Goal**: Maximize the margin using support vectors to determine the best hyperplane.
- A **hyperplane** is a decision boundary in higher dimensions.

---

#### **Mathematical Intuition Behind Support Vector Machine**
- SVM involves complex mathematics, but for implementation:
  - Focus on the margin, primal and dual formulations, and dot product concepts.
  - Topics like **Lagrange multipliers** are more relevant for research purposes.



### Understanding the Dot Product

---

#### **Vectors and Mathematical Operations**
- A **vector** is a quantity with both **magnitude** and **direction**.
- Just like numbers, vectors can undergo mathematical operations such as **addition** and **multiplication**.
- Vector multiplication can be performed in two ways:
  - **Dot Product**: Results in a scalar value.
  - **Cross Product**: Results in another vector.

---

#### **Definition of Dot Product**
- The dot product measures the **projection of one vector onto another**, multiplied by the magnitude of the second vector.
- Mathematically, the dot product of vectors $\vec{A}$ and $\vec{B}$ is given by:
  $$
  \vec{A} \cdot \vec{B} = |\vec{A}| \cdot |\vec{B}| \cdot \cos(\theta)
  $$
  - $|\vec{A}|$: Magnitude of vector $\vec{A}$.
  - $|\vec{B}|$: Magnitude of vector $\vec{B}$.
  - $\theta$: Angle between vectors $\vec{A}$ and $\vec{B}$.

- In this equation:
  - $|\vec{A}| \cdot \cos(\theta)$ is the **projection** of $\vec{A}$ onto $\vec{B}$.
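
As a quick check of the formula above, the sketch below computes the dot product two ways: component-wise (the sum of $a_i b_i$, a standard identity not stated above) and as $|\vec{A}| \cdot |\vec{B}| \cdot \cos(\theta)$. The vectors `A` and `B` are illustrative values, not data from these notes.

```python
import math

def dot(a, b):
    """Component-wise dot product: sum of a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))

def magnitude(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

# Illustrative 2D vectors (assumed values, not from the notes).
A = [3.0, 4.0]
B = [4.0, 0.0]

# Angle between the vectors, taken from each vector's direction.
theta = math.atan2(A[1], A[0]) - math.atan2(B[1], B[0])

component_form = dot(A, B)                                      # 3*4 + 4*0
geometric_form = magnitude(A) * magnitude(B) * math.cos(theta)  # |A||B|cos(theta)
projection_of_A_on_B = magnitude(A) * math.cos(theta)           # |A|cos(theta)

print(component_form, geometric_form, projection_of_A_on_B)
```

Both forms agree up to floating-point rounding, and the projection of `A` onto `B` is simply the component of `A` along the x-axis here, since `B` points along that axis.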

---

#### **Simplification for SVM**
- In SVM, we only need the **projection** of $\vec{A}$, not the magnitude of $\vec{B}$.
- To get just the projection, use the **unit vector** of $\vec{B}$ (denoted as $\hat{B}$), which has a magnitude of 1.
- Therefore, the equation becomes:
  $$
  \vec{A} \cdot \hat{B} = |\vec{A}| \cdot \cos(\theta)
  $$

---

### **Use of Dot Product in SVM**

#### **Determining Position Relative to a Hyperplane**
- Suppose we have a point $\vec{X}$ and want to determine if it is on the right side or the left side of a hyperplane.
- **Steps**:
  1. Treat $\vec{X}$ as a vector.
  2. Create a vector $\vec{w}$ that is **perpendicular** to the hyperplane.
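  3. Let $c$ be the distance from the origin to the decision boundary along $\vec{w}$ (taking $\vec{w}$ to be a unit vector, so that $\vec{X} \cdot \vec{w}$ is a length that can be compared with $c$).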
  4. Compute the projection of $\vec{X}$ on $\vec{w}$ using the dot product: $\vec{X} \cdot \vec{w}$.
     - If $\vec{X} \cdot \vec{w} > c$: Point is on the **right** side.
     - If $\vec{X} \cdot \vec{w} < c$: Point is on the **left** side.
     - If $\vec{X} \cdot \vec{w} = c$: Point lies **on** the decision boundary.
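
The steps above can be sketched in a few lines. This is a minimal illustration, assuming a unit normal `w` and a threshold `c` chosen by hand; the function name `side_of_hyperplane` and the tolerance `tol` are hypothetical helpers, not part of any library.

```python
def dot(a, b):
    """Component-wise dot product."""
    return sum(x * y for x, y in zip(a, b))

def side_of_hyperplane(x, w, c, tol=1e-12):
    """Compare the projection of x on w with the threshold c."""
    projection = dot(x, w)
    if projection > c + tol:
        return "right"
    if projection < c - tol:
        return "left"
    return "on boundary"

# Assumed example: unit normal along the first axis, so the
# decision boundary is the vertical line x1 = 2.
w = [1.0, 0.0]
c = 2.0

for point in ([3.0, 5.0], [1.0, -1.0], [2.0, 7.0]):
    print(point, side_of_hyperplane(point, w, c))
```

Only the first coordinate matters in this example because `w` has no component along the second axis; the second coordinate moves a point along the boundary, not across it.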

---

#### **Why Use a Perpendicular Vector in SVM?**
- The aim is to determine the **distance** of a point $\vec{X}$ from the decision boundary.
- Since there are infinite points on the boundary, a standard reference is needed.
- A **perpendicular vector** to the hyperplane is used as a reference for projections.
- Projections of other points on this perpendicular vector are compared to find their relative positions.

---

#### **Equation of a Hyperplane in SVM**
- SVM involves optimizing the **margin**, the distance between support vectors and the hyperplane.
- The next step will be to derive the **hyperplane equation** and understand what needs to be optimized.


## What is a Support Vector Machine (SVM)?

- **SVM** is a **supervised machine learning algorithm** that finds a **hyperplane** to best separate two classes.
  
#### Key Difference Between SVM and Logistic Regression
- **Logistic Regression**: A **probabilistic** approach.
- **SVM**: A **geometric** approach, focusing on maximizing the margin between classes.

#### Choosing the Best Hyperplane
- There are multiple possible hyperplanes that can separate two classes.
- SVM selects the **optimal hyperplane** by maximizing the **margin**.
  - The margin is the distance between the hyperplane and the closest data points from each class.

#### Goal of SVM
- To find a hyperplane that results in the **largest possible margin**, ensuring the best separation between the two classes.


## Logistic Regression vs Support Vector Machine (SVM)

- **Choosing Between Logistic Regression and SVM**:
  - The decision depends on the **number of features** in your dataset.
  - Generally, **Logistic Regression** is a good starting point.


- **When to Use SVM**:
  - **Small and Complex Datasets**: SVM is often more effective.
  - If Logistic Regression does not provide good accuracy, try **SVM without a kernel**.
    - SVM without a kernel performs similarly to Logistic Regression but may handle complex relationships better depending on the features.


## Types of Support Vector Machine (SVM) Algorithms


- **Linear SVM**:
  - Used when data is **perfectly linearly separable**.
  - In 2D, data points can be classified using a **single straight line**.


- **Non-Linear SVM**:
  - Applied when data is **not linearly separable**.
  - Utilizes advanced techniques like **kernel tricks** to handle complex data.
  - Real-world data is often **non-linearly separable**, requiring these techniques.


## Important Terms


- **Support Vectors**:
  - Points that are **closest to the hyperplane**.
  - They help in defining the **separating line**.


- **Margin**:
  - The **distance between the hyperplane** and the nearest observations (support vectors).
  - A **large margin** is preferred in SVM for better classification.
  - There are two types of margins:
    - **Hard Margin**: Strict separation with no misclassifications.
    - **Soft Margin**: Allows some misclassifications for better generalization.


![567891.webp](attachment:567891.webp)


## Key Definitions

- **Maximum Margin**: The largest distance between the separating hyperplane and the closest data points from either class.

- **Positive Hyperplane**: A hyperplane that passes through the support vectors of the positive class.

- **Support Vector**: Data points that are closest to the hyperplane and influence its position and orientation.

- **Maximum Margin Hyperplane**: The optimal hyperplane that maximizes the margin between the two classes.

- **Negative Hyperplane**: A hyperplane that passes through the support vectors of the negative class.


## How Does Support Vector Machine Work?

SVM focuses on the **support vectors** (the closest data points to the hyperplane) to define the optimal hyperplane. Unlike **logistic regression**, whose boundary is influenced by all data points, the SVM boundary is determined by the support vectors alone, so the remaining points can be moved or removed without changing it.

### Example:
Suppose we have a dataset with two classes: **green** and **blue**. The goal is to classify a new data point as either **green** or **blue**. 

SVM will find a hyperplane that best separates the two classes by maximizing the margin (the distance between the hyperplane and the closest points of each class). This gives the decision boundary with the maximum separation, which tends to generalize well to new points.

![467902-1.webp](attachment:467902-1.webp)


## Finding the Best Decision Boundary in SVM

To classify data points, there can be many possible decision boundaries. The question is: which one is the best? 

- In **2D**, the decision boundary is a straight line. 
- In **higher dimensions**, the decision boundary is referred to as a **hyperplane**.


![492453.webp](attachment:492453.webp)

The best decision boundary is the one that maximizes the **margin** — the distance between the boundary and the closest points from each class (the **support vectors**). This maximization ensures the best separation between the classes, providing a more accurate classifier.



![729834.webp](attachment:729834.webp)

## Mathematical Intuition Behind Support Vector Machine

### Understanding Dot-Product

A **vector** is a quantity that has both magnitude and direction. Just like numbers, vectors can be added, subtracted, or multiplied. In this section, we focus on the multiplication of vectors, which can be done in two ways: **dot product** and **cross product**.

- The **dot product** results in a **scalar** value.
- The **cross product** results in a **vector**.

The dot product can be understood as the projection of one vector onto another, multiplied by the magnitude of the second vector. Mathematically, the dot product of two vectors **A** and **B** is given by:

$$
A \cdot B = |A| |B| \cos \theta
$$

Where:
- $|A|$ and $|B|$ are the magnitudes of vectors **A** and **B**, respectively.
- $\theta$ is the angle between the two vectors.

In the context of SVM, we are often interested in the **projection** of one vector onto another, which is why the dot product plays an essential role in determining the distance from a point to the decision boundary (hyperplane).


![204495.webp](attachment:204495.webp)

### Dot Product Calculation

Given two vectors **A** and **B**, to find their dot product, we first need to determine the magnitudes of both vectors. To do this, we use the **Pythagorean theorem** or the **distance formula**. Once we have the magnitudes, we multiply them by the cosine of the angle between the vectors. Mathematically, the dot product is expressed as:

$$
A \cdot B = |A| \cdot |B| \cdot \cos \theta
$$

Where:
- $|A|$ and $|B|$ are the magnitudes of vectors **A** and **B**, respectively.
- $\theta$ is the angle between the vectors.

Here, $|A| \cdot \cos \theta$ represents the **projection** of vector **A** onto vector **B**. This is the key quantity we're interested in when working with SVM.

Now, in **SVM**, we are only concerned with the **projection** of vector **A** onto **B**, and we don't need the full magnitude of vector **B**. To simplify, we can use the **unit vector** of **B** (a vector in the same direction as **B**, but with magnitude 1). This leads to the modified formula:

$$
A \cdot \hat{B} = |A| \cdot \cos \theta
$$

Where:
- $\hat{B}$ is the **unit vector** of **B**.
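
A small sketch of this simplification: dividing **B** by its own length gives the unit vector, and the dot product of **A** with that unit vector is the scalar projection $|A| \cos \theta$. The helper names `unit` and `scalar_projection` and the example vectors are assumptions made for illustration.

```python
import math

def dot(a, b):
    """Component-wise dot product."""
    return sum(x * y for x, y in zip(a, b))

def unit(b):
    """Vector in the direction of b with magnitude 1."""
    norm = math.sqrt(dot(b, b))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in b]

def scalar_projection(a, b):
    """Length of the shadow of a along b, i.e. |a| * cos(theta)."""
    return dot(a, unit(b))

A = [3.0, 4.0]
long_B = [10.0, 0.0]   # same direction, different magnitudes
short_B = [0.5, 0.0]

print(scalar_projection(A, long_B), scalar_projection(A, short_B))
```

The two calls return the same value, which is the point of using the unit vector: the projection depends on the direction of **B** but not on its magnitude.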

### Use of Dot Product in SVM

Now, consider a random point **X**. We want to determine whether it lies on the right side or the left side of the hyperplane (the decision boundary). This is a key question when classifying data in SVM.

To determine this, we assume:
- The point **X** is represented as a vector.
- We have a vector **w**, which is perpendicular (normal) to the hyperplane.

The goal is to find out whether the point lies on the positive side or negative side of the hyperplane, which is done by checking the **projection** of **X** onto **w** (the normal vector to the hyperplane).

This is where the dot product is useful, as it gives the projection of the point **X** onto the vector **w**.

![885076.webp](attachment:885076.webp)

### Using Dot Product for Classification in SVM
![947387.webp](attachment:947387.webp)

To determine the position of a point **X** relative to the hyperplane, we proceed with the following steps:

1. **Assume Point X as a Vector**: The given point **X** is represented as a vector in the feature space.
   
2. **Create Perpendicular Vector (w)**: We define a vector **w**, which is **perpendicular** to the hyperplane (this is the normal vector to the hyperplane).

3. **Distance from the Origin**: Let **c** denote the distance from the origin to the decision boundary, measured along **w**. This scalar is the threshold that projections are compared against; it equals the actual distance when **w** is a unit vector.

4. **Projection of X on w**: To determine whether the point **X** lies on the right side or left side of the hyperplane, we calculate the **projection** of vector **X** onto vector **w**. This projection is obtained using the **dot product**:

$$
X \cdot w
$$

The dot product of **X** and **w** gives a scalar value. Comparing this value with **c** tells us on which side of the hyperplane **X** lies:

- If the projection $X \cdot w$ is **equal to c**, point **X** lies exactly on the decision boundary (hyperplane).
- If the projection $X \cdot w$ is **greater than c**, point **X** lies on one side of the hyperplane (for example, the positive class).
- If the projection $X \cdot w$ is **less than c**, point **X** lies on the other side of the hyperplane (for example, the negative class).


Thus, the dot product helps to measure the position of **X** relative to the hyperplane and classify it into one of the two classes.



![328308.webp](attachment:328308.webp)

### Why Perpendicular Vector (w) in SVM?

We choose the vector **w** to be perpendicular to the hyperplane to calculate the **distance** of a data point **X** from the decision boundary. Since there are infinite points on the boundary, using a perpendicular vector **w** standardizes the calculation. By projecting all data points onto **w**, we can consistently measure their distances and determine which side of the hyperplane they lie on.
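
To make the distance concrete, here is a minimal sketch assuming the boundary is written as the set of points with $X \cdot w = c$, where **w** need not be a unit vector. Dividing by the length of **w** turns the difference between the projection and the threshold into a signed distance. The function name `signed_distance` and the numbers are illustrative assumptions.

```python
import math

def dot(a, b):
    """Component-wise dot product."""
    return sum(x * y for x, y in zip(a, b))

def signed_distance(x, w, c):
    """Signed perpendicular distance from x to the boundary {z : z . w = c}.

    Positive on the side w points toward, negative on the other side,
    zero on the boundary itself.
    """
    return (dot(x, w) - c) / math.sqrt(dot(w, w))

w = [3.0, 4.0]   # normal vector with length 5 (assumed example)
c = 10.0

print(signed_distance([6.0, 8.0], w, c))   # far side, in the direction of w
print(signed_distance([2.0, 1.0], w, c))   # 3*2 + 4*1 = 10, so on the boundary
print(signed_distance([0.0, 0.0], w, c))   # the origin, on the opposite side
```

The sign gives the side of the hyperplane and the absolute value gives how far away the point is, so one projection onto **w** answers both questions.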


## Margin in Support Vector Machine (SVM)

In SVM, the equation of a hyperplane is given by:

$$
w \cdot x + b = 0
$$

Where **w** is a vector normal (perpendicular) to the hyperplane, and **b** is an offset.

![677668.1.webp](attachment:677668.1.webp)

To classify a point as negative or positive, we define a **decision rule** based on which side of the hyperplane the point falls on. The **margin** is the distance between the hyperplane and the closest data points (support vectors). SVM aims to maximize this margin for better classification performance.



![555649.webp](attachment:555649.webp)

### Margin and Classification in SVM

In SVM, if $w \cdot x + b > 0$, we classify the point as **positive**; otherwise, we classify it as **negative**.

To maximize the margin, we need to find the values of **w** and **b** such that the distance between the hyperplane and the closest support vectors is maximized. Let's call this distance **d**.

The objective of SVM is to find **w** and **b** that maximize this margin **d** to ensure better classification performance.
![844918.2.webp](attachment:844918.2.webp)
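
The decision rule and the distance **d** can be sketched as below. This does not train an SVM; it only evaluates a hand-picked candidate `w` and `b` on a toy dataset, all of which are assumed values. The names `predict` and `geometric_margin` are hypothetical helpers.

```python
import math

def dot(a, b):
    """Component-wise dot product."""
    return sum(x * y for x, y in zip(a, b))

def predict(x, w, b):
    """Decision rule: +1 if w . x + b > 0, otherwise -1."""
    return 1 if dot(w, x) + b > 0 else -1

def geometric_margin(points, w, b):
    """Distance d from the hyperplane to the closest point."""
    norm = math.sqrt(dot(w, w))
    return min(abs(dot(w, x) + b) for x in points) / norm

# Toy data and a candidate hyperplane x1 + x2 - 2 = 0 (assumed values).
points = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
w = [1.0, 1.0]
b = -2.0

labels = [predict(x, w, b) for x in points]
d = geometric_margin(points, w, b)
print(labels, d)
```

Training an SVM amounts to searching over `w` and `b` for the pair that makes `d` as large as possible while every point stays on its correct side.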

### Optimization Function and Its Constraints in SVM

In Support Vector Machines (SVM), the goal is to find the hyperplane that maximizes the margin between two classes while minimizing classification errors. This can be formulated as an **optimization problem** with constraints.

#### 1. **Optimization Function**:
The objective of SVM is to maximize the margin $d$ between the two classes. This is equivalent to minimizing the following cost function:

$$
\text{Minimize} \quad \frac{1}{2} \|w\|^2
$$

Where:
- $w$ is the weight vector normal to the hyperplane.
- The factor $\frac{1}{2}$ is used for simplification in the derivative.
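
The step from maximizing the margin to minimizing $\frac{1}{2} \|w\|^2$ can be sketched as follows. It assumes the usual scaling convention, in which $w$ and $b$ are rescaled so that the closest points $x_s$ (the support vectors) satisfy $|w \cdot x_s + b| = 1$:

$$
\begin{aligned}
d &= \frac{|w \cdot x_s + b|}{\|w\|} = \frac{1}{\|w\|} \\
\max_{w, b} \; \frac{1}{\|w\|} \;&\Longleftrightarrow\; \min_{w, b} \; \|w\| \;\Longleftrightarrow\; \min_{w, b} \; \frac{1}{2} \|w\|^2
\end{aligned}
$$

Squaring does not change the minimizer because $\|w\| \geq 0$, and it makes the objective smooth and convex, which is what makes the problem convenient to optimize.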

#### 2. **Constraints**:
For the classifier to work properly, the constraints ensure that all data points are correctly classified with respect to the hyperplane.

- For a positive class point $x_i$, we require:

$$
w \cdot x_i + b \geq 1
$$

- For a negative class point $x_i$, we require:

$$
w \cdot x_i + b \leq -1
$$

These constraints enforce that each data point lies on the correct side of the margin.

#### 3. **Final Formulation**:
The optimization problem for SVM can be expressed as:

$$
\text{Minimize} \quad \frac{1}{2} \|w\|^2
$$

Subject to:

$$
y_i (w \cdot x_i + b) \geq 1 \quad \text{for all} \quad i = 1, 2, \dots, n
$$

Where:
- $y_i$ is the label of the data point $x_i$ (either $+1$ or $-1$).
- $n$ is the number of data points.
- $w$ and $b$ are the parameters of the hyperplane.
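
To see how the objective and the constraint interact, the sketch below checks two hand-picked candidates against $y_i (w \cdot x_i + b) \geq 1$ on a toy dataset. The data, the candidates, and the helper names `satisfies_constraints` and `objective` are assumptions for illustration; no optimization is performed.

```python
def dot(a, b):
    """Component-wise dot product."""
    return sum(x * y for x, y in zip(a, b))

def satisfies_constraints(X, y, w, b, tol=1e-9):
    """True if y_i * (w . x_i + b) >= 1 holds for every point."""
    return all(yi * (dot(w, xi) + b) >= 1 - tol for xi, yi in zip(X, y))

def objective(w):
    """The quantity being minimized: 0.5 * ||w||^2."""
    return 0.5 * dot(w, w)

# Toy data: two positive and two negative points (assumed values).
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]

feasible = ([0.5, 0.5], -1.0)      # candidate (w, b)
too_small = ([0.25, 0.25], -0.5)   # same direction, half the scale

for w, b in (feasible, too_small):
    print(w, b, objective(w), satisfies_constraints(X, y, w, b))
```

The second candidate has a smaller objective but violates the constraint, which shows why the constraint is needed: without it, shrinking $w$ toward zero would always look better.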

#### 4. **Dual Form**:
SVM can be solved efficiently using the **dual form** by introducing Lagrange multipliers $\alpha_i$ to convert the problem into a form easier to optimize.

The dual problem is formulated as:

$$
\text{Maximize} \quad \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
$$

Subject to:

$$
\sum_{i=1}^n \alpha_i y_i = 0
$$

$$
0 \leq \alpha_i \leq C \quad \text{for all} \quad i = 1, 2, \dots, n
$$

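Where $C$ is the regularization parameter that controls the trade-off between margin maximization and classification error. The upper bound $C$ comes from the soft-margin version of the problem; for the hard-margin problem stated above, the constraint is simply $\alpha_i \geq 0$.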
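
As a sanity check on the dual, the sketch below evaluates the dual objective on a two-point toy problem where the multipliers can be worked out by hand. It assumes the standard relation $w = \sum_i \alpha_i y_i x_i$ for recovering the weight vector, which is not derived in these notes. The data, the values in `alpha`, the choice `C = 1.0`, and the helper names are illustrative assumptions.

```python
def dot(a, b):
    """Component-wise dot product."""
    return sum(x * y for x, y in zip(a, b))

def dual_objective(alpha, X, y):
    """sum(alpha_i) - 0.5 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quad

def recover_w(alpha, X, y):
    """Assumed standard relation: w = sum_i alpha_i * y_i * x_i."""
    dim = len(X[0])
    return [sum(alpha[i] * y[i] * X[i][k] for i in range(len(X))) for k in range(dim)]

# Two points, one per class; both are support vectors (assumed toy data).
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
alpha = [0.25, 0.25]   # worked out by hand for this toy problem
C = 1.0                # hypothetical regularization value

equality_ok = abs(sum(a * yi for a, yi in zip(alpha, y))) < 1e-12
box_ok = all(0.0 <= a <= C for a in alpha)

w = recover_w(alpha, X, y)
dual_value = dual_objective(alpha, X, y)
primal_value = 0.5 * dot(w, w)
print(equality_ok, box_ok, w, dual_value, primal_value)
```

The dual and primal values coincide for these multipliers, which is what one expects at the optimum. Note also that the data enter the dual only through the dot products $x_i \cdot x_j$.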

#### 5. **Conclusion**:
The optimization in SVM focuses on finding the optimal values of $w$ and $b$ that maximize the margin while satisfying the constraints for correct classification. This can be done efficiently using the dual form and Lagrange multipliers.
