# Support Vector Regression

## Table of Contents
1. What is Support Vector Machine?
2. How does it work?
3. Pros and Cons associated with SVM
4. Whether Feature Scaling is required?
5. Kernel Trick
6. Impact of Missing Values?
7. Overfitting And Underfitting
8. Important point and Parameters

## Types of Problems It Can Solve (Supervised)
1. Classification
2. Regression

## What is Support Vector Machine?

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems, although it is mostly used for classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyper-plane that best separates the two classes.
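As a minimal sketch of the idea above, the snippet below fits a linear SVM on two synthetic 2-D clusters and reads back the hyper-plane it found. The data (generated with `make_blobs`), the cluster centers and the two `new_points` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated 2-D data (an assumption for illustration only)
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=0)

# A linear kernel means the decision boundary is a flat hyper-plane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# For a linear kernel the hyper-plane is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
train_accuracy = clf.score(X, y)

# Classify two hypothetical new points, one near each cluster
new_points = np.array([[-2.5, -3.0], [3.5, 2.0]])
predicted = clf.predict(new_points)

print("w =", w, "b =", b)
print("training accuracy:", train_accuracy)
print("predicted classes:", predicted)
```

Each feature is one coordinate of the point, so `w` has one entry per feature.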

![SVM_1.webp](attachment:SVM_1.webp)

## How does it work?

### Let’s understand:

### 1. Identify the right hyper-plane (Scenario-1):
Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify the stars and circles.
You need to remember a rule of thumb to identify the right hyper-plane:

![SVM_21.webp](attachment:SVM_21.webp)

“Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” has excellently performed this job.


### 2. Identify the right hyper-plane (Scenario-2):
Here, we have three hyper-planes (A, B and C), and all of them segregate the classes well. Now, how can we identify the right hyper-plane?
Here, maximizing the distance between the hyper-plane and the nearest data point (of either class) helps us decide on the right hyper-plane. This distance is called the margin.


![SVM_3.webp](attachment:SVM_3.webp)

Above, you can see that the margin for hyper-plane C is larger than for both A and B. Hence, we name C as the right hyper-plane. Another important reason for selecting the hyper-plane with the larger margin is robustness. If we select a hyper-plane with a small margin, there is a high chance of misclassification.
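The margin can also be measured numerically. The sketch below assumes synthetic, linearly separable data and uses a very large `C` to approximate a hard margin. For a linear SVM with hyper-plane w . x + b = 0, the distance of a point x to the hyper-plane is |w . x + b| / ||w||, and the distance from the hyper-plane to the nearest point should be close to 1 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, linearly separable data (an assumption for illustration only)
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=1)

# A very large C leaves almost no tolerance for margin violations
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]
b = clf.intercept_[0]

# Distance of every training point to the hyper-plane
distances = np.abs(X @ w + b) / np.linalg.norm(w)

# Distance from the hyper-plane to the margin boundary
half_margin = 1.0 / np.linalg.norm(w)
nearest = distances.min()

print("distance to nearest point:", nearest)
print("1 / ||w||:", half_margin)
print("number of support vectors:", len(clf.support_vectors_))
```

The points that sit at this minimum distance are the support vectors that give the method its name.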


### 3. Identify the right hyper-plane (Scenario-3):
Hint: Use the rules as discussed in previous section to identify the right hyper-plane

![SVM_5.webp](attachment:SVM_5.webp)

Some of you may have selected hyper-plane B, as it has a larger margin than A. But here is the catch: SVM selects the hyper-plane that classifies the classes accurately before maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.

### 4. Can we classify two classes (Scenario-4)?:
Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.



![SVM_61.webp](attachment:SVM_61.webp)


The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say, SVM classification is robust to outliers.

![SVM_71.webp](attachment:SVM_71.webp)

## Advantages

1. SVM is effective in high-dimensional spaces.
2. SVM is relatively memory efficient, because the decision function uses only a subset of the training points (the support vectors).
3. SVMs are a reasonable choice when we have little prior knowledge about the data.
4. Works well even with unstructured and semi-structured data such as text, images and trees.
5. The kernel trick is the real strength of SVM. With an appropriate kernel function, we can solve many complex, non-linear problems.
6. SVM models tend to generalize well in practice, so the risk of over-fitting is lower.
7. It works really well when there is a clear margin of separation.


## Disadvantages
1. Training time grows quickly as the dataset gets larger.
2. It is difficult to choose a good kernel function.
3. The main SVM hyper-parameters are the cost C and gamma. They are not easy to fine-tune, and it is hard to visualize their impact.
4. It does not perform very well when the dataset is noisy, i.e. when the target classes overlap.
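One common way to ease the tuning problem from point 3 is a cross-validated grid search. The snippet below is a sketch only: the dataset (`make_moons`), the candidate values for `C` and `gamma`, and the 5-fold setting are all assumptions, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, noisy two-class data (an assumption for illustration only)
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# make_pipeline names the SVC step "svc", hence the "svc__" prefix below
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical candidate values, spaced on a log scale
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```

The grid grows multiplicatively with every added parameter, which is part of why tuning an SVM on a large dataset is slow.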

### Whether Feature Scaling is required?
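Yes. SVM relies on distances between data points, so features with larger numeric ranges would otherwise dominate the margin and the kernel computations.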
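The effect of scaling can be checked with a quick comparison. The sketch below assumes the wine dataset bundled with scikit-learn, whose features have very different ranges, and compares a default `SVC` with and without a `StandardScaler` step using 5-fold cross-validation.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Wine data: 13 features on very different numeric scales
X, y = load_wine(return_X_y=True)

# Same model, once on raw features and once on standardized features
unscaled_score = cross_val_score(SVC(), X, y, cv=5).mean()
scaled_score = cross_val_score(
    make_pipeline(StandardScaler(), SVC()), X, y, cv=5
).mean()

print("mean accuracy without scaling:", unscaled_score)
print("mean accuracy with scaling:", scaled_score)
```

Putting the scaler inside the pipeline means it is fitted on the training folds only, which avoids leaking information from the validation fold.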

## There are two main types of classification SVM algorithms: Hard Margin and Soft Margin
### 1. Hard Margin:
Aims to find the best hyperplane without tolerating any form of misclassification.
### 2. Soft Margin:
We add a degree of tolerance to the SVM. In this way, we allow the model to deliberately misclassify a few data points if that leads to a hyperplane that generalises better to unseen data.
Soft Margin SVM can be implemented in Scikit-Learn through the C penalty term in svm.SVC. The bigger C is, the larger the penalty the algorithm receives for each misclassification.
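To make the role of C concrete, the sketch below fits a linear `SVC` on overlapping synthetic clusters with a small and a large value of C. It reports the margin width 2 / ||w|| and the number of support vectors. The data and the two C values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so no hyperplane separates them perfectly (assumed data)
X, y = make_blobs(n_samples=200, centers=[[0, 0], [2, 2]],
                  cluster_std=1.5, random_state=0)

margin_width = {}
n_support = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Width of the margin band around the hyperplane
    margin_width[C] = 2.0 / np.linalg.norm(clf.coef_[0])
    # Points on or inside the margin become support vectors
    n_support[C] = int(clf.n_support_.sum())

for C in margin_width:
    print(f"C={C}: margin width={margin_width[C]:.3f}, "
          f"support vectors={n_support[C]}")
```

A small C tolerates more violations and yields a wider margin, while a large C pushes the model toward hard-margin behaviour.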

## Kernel Trick
If the data we are working with is not linearly separable (and therefore leads to poor linear SVM classification results), we can apply a technique known as the Kernel Trick. This method maps our non-linearly separable data into a higher dimensional space in which it becomes linearly separable. A linear SVM can then be applied in this new space.

![1_zWzeMGyCc7KvGD9X8lwlnQ.png](attachment:1_zWzeMGyCc7KvGD9X8lwlnQ.png)

There are many different types of kernels which can be used to create this higher dimensional space. Some examples are:
1. Linear
2. Polynomial
3. Sigmoid
4. Radial Basis Function (RBF)

In Scikit-Learn, a kernel function can be specified through the kernel parameter of svm.SVC. An additional parameter, gamma, controls how far the influence of a single training example reaches: low values mean a far reach and high values mean a close reach.


When working with a large amount of data using RBF, speed might become a constraint to take into account.
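The kernels listed above can be compared on data that is not linearly separable. The sketch below assumes two concentric rings generated with `make_circles` and fits one `SVC` per kernel. The dataset and its parameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them (assumed data)
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # gamma="scale" is the scikit-learn default; the linear kernel ignores it
    clf = SVC(kernel=kernel, gamma="scale").fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

for kernel, score in scores.items():
    print(f"{kernel}: test accuracy = {score:.2f}")
```

On this kind of data the RBF kernel separates the rings where the linear kernel cannot.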

## Impact of Missing Values?
Although SVMs are an attractive option when constructing a classifier, they do not easily accommodate missing covariate information. As with other prediction and classification methods, inattention to missing data when constructing an SVM can reduce the accuracy and utility of the resulting classifier.
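In scikit-learn, `SVC` refuses input that contains NaN values, so missing values have to be handled before fitting. The sketch below shows one possible approach, median imputation inside a pipeline. The iris data, the 10% missing rate and the choice of median imputation are assumptions for illustration, not a recommendation for every dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Knock out roughly 10% of the values at random (artificial missingness)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# Fitting directly on data with NaN raises a ValueError
try:
    SVC().fit(X_missing, y)
    raw_fit_failed = False
except ValueError:
    raw_fit_failed = True

# Impute, scale, then fit the SVM
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), SVC())
model.fit(X_missing, y)
accuracy = model.score(X_missing, y)

print("fit on raw data failed:", raw_fit_failed)
print("training accuracy after imputation:", accuracy)
```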

## Overfitting And Underfitting

In SVM, to avoid overfitting, we choose a Soft Margin instead of a Hard one, i.e. we intentionally let some data points enter our margin (while still penalizing them) so that our classifier does not overfit the training sample.
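The gap between training and test accuracy gives a rough picture of overfitting. The sketch below assumes noisy synthetic data from `make_moons` and compares a moderate setting with an extreme one, where a very large C approximates a hard margin and a very large gamma makes the RBF kernel very local. The specific values are assumptions chosen to exaggerate the effect.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-class data, split half for training and half for testing (assumed)
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Hypothetical settings: one moderate, one deliberately extreme
settings = {
    "moderate": dict(C=1.0, gamma=1.0),
    "overfit": dict(C=1000.0, gamma=1000.0),
}

results = {}
for name, params in settings.items():
    clf = SVC(kernel="rbf", **params).fit(X_train, y_train)
    # Store (training accuracy, test accuracy)
    results[name] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A training accuracy near 1.0 combined with a clearly lower test accuracy is the typical sign that the model has memorized the training sample.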

## Important point and Parameters

`class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)`

### 1. C: float, default=1.0
Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.

### 2. kernel: {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=‘rbf’
Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).

### 3. degree: int, default=3
Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

### 4. gamma: {‘scale’, ‘auto’} or float, default=‘scale’
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

If gamma=‘scale’ (default) is passed, it uses 1 / (n_features * X.var()) as the value of gamma.

If gamma=‘auto’, it uses 1 / n_features.
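The two formulas can be checked by hand. The sketch below assumes the iris dataset (4 features) and compares an `SVC` fitted with `gamma='scale'` against one fitted with the manually computed value 1 / (n_features * X.var()). Comparing decision values is an indirect check chosen here because it relies only on public attributes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

# Value that gamma='scale' should resolve to for this dense array
manual_gamma = 1.0 / (n_features * X.var())

clf_scale = SVC(kernel="rbf", gamma="scale").fit(X, y)
clf_manual = SVC(kernel="rbf", gamma=manual_gamma).fit(X, y)

# Identical gamma should give identical decision values
same_decisions = np.allclose(
    clf_scale.decision_function(X), clf_manual.decision_function(X)
)

# Value that gamma='auto' corresponds to
auto_gamma = 1.0 / n_features

print("manual 'scale' gamma:", manual_gamma)
print("decision values match:", same_decisions)
print("'auto' gamma:", auto_gamma)
```

Note that X.var() is the variance over all entries of X, not a per-feature variance.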

### 5. coef0: float, default=0.0
Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.