# **Day 5: Support Vector Machine (SVM)**
## **1. What is SVM?**

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression.
It looks for the separating boundary (hyperplane) that leaves the widest gap between the classes.

## **2. Hyperplane**

A hyperplane is a decision boundary that separates different classes.

- In 2D, it’s a line
- In 3D, it’s a plane
- In higher dimensions, it’s a hyperplane

SVM chooses the hyperplane that best separates the data with maximum margin.
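As a minimal sketch of what "separating" means: a hyperplane can be written as `w · x + b = 0`, and the sign of `w · x + b` tells which side a point falls on. The values of `w`, `b`, and `points` below are made up for illustration and do not come from a trained model.

```python
import numpy as np

# Hypothetical hyperplane in 2D (a line): w . x + b = 0
w = np.array([1.0, -1.0])  # normal vector, assumed for illustration
b = 0.0                    # offset, assumed for illustration

points = np.array([
    [2.0, 1.0],
    [1.0, 3.0],
    [0.5, 0.5],
])

scores = points @ w + b   # signed score of each point
sides = np.sign(scores)   # +1 or -1 = side of the line, 0 = exactly on it

for p, s, side in zip(points, scores, sides):
    print(p, s, side)
```

A trained SVM does the same thing at prediction time, except that `w` and `b` are learned from the data.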

## **3. Margin**

The margin is the distance between the hyperplane and the closest data points (support vectors).

- Large margin → better generalization → less overfitting
- Small margin → the boundary sits close to the training points, so small changes in the data can flip predictions

SVM aims to maximize this margin.
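For a linear SVM the margin can be read off the learned weights: the distance from the hyperplane to the closest points is `1 / ||w||`, so the full gap between the two classes is `2 / ||w||`. The sketch below assumes a tiny made-up dataset and uses a very large `C` to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up): class 0 at x=0, class 1 at x=2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard margin (no violations tolerated)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin = 1.0 / np.linalg.norm(w)  # hyperplane to closest points
gap = 2.0 * margin                # full width between the classes

print("w =", w, "b =", clf.intercept_[0])
print("margin =", margin, "gap =", gap)
```

The two classes are 2 units apart along the first axis, so the margin should come out close to 1 and the gap close to 2.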

## **4. Support Vectors**

These are the critical data points closest to the boundary.
Removing one of them can change the decision boundary, while removing points far from the boundary leaves it unchanged.

They are the “most powerful” points controlling the classifier.
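The claim about removing points can be checked on a small example. This is a sketch on made-up 1D data; `boundary` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.svm import SVC

def boundary(X, y):
    """Fit a near-hard-margin linear SVM on 1D data.

    Returns the x value where the predicted class switches, and the model.
    """
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    return -clf.intercept_[0] / clf.coef_[0, 0], clf

# Made-up data: class 0 at x=0 and x=1, class 1 at x=3 and x=4
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

full_boundary, clf = boundary(X, y)
print("support vector indices:", clf.support_)
print("boundary with all points:", full_boundary)

# Drop x=0, which is far from the boundary
keep = np.array([False, True, True, True])
without_far_point, _ = boundary(X[keep], y[keep])
print("boundary without x=0:", without_far_point)

# Drop x=1, which is one of the closest points
keep = np.array([True, False, True, True])
without_support_vector, _ = boundary(X[keep], y[keep])
print("boundary without x=1:", without_support_vector)
```

Dropping the far point should leave the boundary where it was, while dropping the closest point of class 0 should move the boundary toward class 0.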

## **5. Types of SVM**

- Linear SVM → when the data is (approximately) linearly separable
- Non-Linear SVM → uses the kernel trick when the classes cannot be separated by a straight line or flat hyperplane

## **6. Kernels (Kernel Trick)**

Kernels help SVM handle non-linear data by mapping it to a higher-dimensional space *without explicitly creating those dimensions.*

**Common Kernels**

- **Linear Kernel** → for linearly separable data
- **Polynomial Kernel** → curved boundaries
- **RBF (Gaussian) Kernel** → flexible boundaries for complex patterns; the default in scikit-learn's `SVC`
- **Sigmoid Kernel** → based on tanh, so it behaves like a neural network activation
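To get a feel for how the kernels listed above differ, the sketch below fits one `SVC` per kernel on a synthetic dataset of two concentric circles, which no straight line can separate. The dataset and its parameters are assumptions chosen for illustration; scores on other data will look different.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: a small circle of one class inside a larger circle of the other
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # accuracy on held-out data
    print(kernel, scores[kernel])
```

On this kind of data the RBF kernel should clearly beat the linear kernel, since the linear kernel can only draw a straight line through the circles.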

## **7. Why Kernel Trick?**

Instead of manually transforming data to higher dimensions, the kernel function:

- Computes similarity between points in high-D space
- Makes non-linear separation possible
- Avoids computational cost of explicit transformation
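The points above can be illustrated with a small worked example. For 2D inputs, the degree-2 polynomial kernel `(a · b)^2` gives the same number as a dot product in a 3D feature space, without ever building the 3D vectors. The feature map `phi` below is one standard choice for this particular kernel; scikit-learn's polynomial kernel has extra `gamma` and `coef0` terms, so this is a simplified special case.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D vector: 2D -> 3D."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # transform first, then take the dot product
implicit = poly_kernel(x, z)       # stay in 2D and apply the kernel

print(explicit, implicit)
```

Both values come out as 121 (up to floating-point rounding), since `x · z = 11` and `11^2 = 121`. The saving matters more as the implicit feature space grows; for the RBF kernel it is infinite-dimensional, so the explicit route is not available at all.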

## **8. Soft Margin vs Hard Margin**

- **Hard Margin** → no misclassifications allowed
    - Used when data is perfectly separable

- **Soft Margin** → allows some misclassifications
    - Used in real-world noisy data
    - Controlled by parameter C

## **9. Parameter C**
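- Large C → heavy penalty on misclassification → narrower margin that tries to classify everything correctly → risk of overfitting
- Small C → light penalty → wider margin that tolerates some errors → often better generalization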
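One way to see the effect of `C` is to count support vectors: with a small `C` the margin is wide and many points fall inside it, so many points become support vectors. The sketch below assumes a synthetic dataset of two overlapping clusters; the specific `C` values are arbitrary.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters (synthetic), so some errors are unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

n_support = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = int(clf.n_support_.sum())  # support vectors across both classes
    print(f"C={C}: support vectors={n_support[C]}, "
          f"train accuracy={clf.score(X, y):.3f}")
```

In practice `C` is usually chosen by cross-validation rather than by inspecting support vector counts.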

## **10. When to Use SVM?**
- When data is high-dimensional
- When classes are well-separated
- When non-linear patterns exist (use a kernel)
- Works well for:
    - Text classification
    - Image classification
    - Bioinformatics

In [100]:
import pandas as pd
from sklearn.datasets import load_iris

In [101]:
iris = load_iris()
dir(iris)  # list the attributes of the returned Bunch object

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [102]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [103]:
# Build a DataFrame with one column per feature
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [104]:
iris.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [105]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [106]:
df['target'] = iris.target  # class labels encoded as 0, 1, 2

In [107]:
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [108]:
# Features (every column except the label) and the label itself
X = df.drop('target', axis=1)
y = df['target']

In [109]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [110]:
from sklearn.svm import SVC

model = SVC()  # default settings: RBF kernel, C=1.0

In [111]:
model.fit(X_train,y_train)

| Parameter | Value |
|---|---|
| C | 1.0 |
| kernel | 'rbf' |
| degree | 3 |
| gamma | 'scale' |
| coef0 | 0.0 |
| shrinking | True |
| probability | False |
| tol | 0.001 |
| cache_size | 200 |
| class_weight | None |


In [112]:
model.score(X_test, y_test)  # accuracy on the held-out test set

0.9666666666666667

In [113]:
model.score(X_train, y_train)  # accuracy on the training set, for comparison

0.975

In [114]:
from sklearn.metrics import classification_report

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      0.92      0.96        13
           2       0.86      1.00      0.92         6

    accuracy                           0.97        30
   macro avg       0.95      0.97      0.96        30
weighted avg       0.97      0.97      0.97        30
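The report above summarizes performance per class but does not show which classes were confused with each other. A confusion matrix fills that gap. The sketch below is self-contained and repeats the same split and default `SVC` as the cells above, working on the raw arrays instead of the DataFrame; exact counts may vary with the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1
)

model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(iris.target_names)
print(cm)
```

Off-diagonal entries are the misclassifications; entry `[i, j]` counts test samples of true class `i` that were predicted as class `j`.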

