<a href="https://colab.research.google.com/github/hussain0048/Machine-Learning/blob/master/Support_Vector_Machine_(SVM)_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machine (SVM) 

**Introduction:**

SVMs are the most popular algorithm for classification in machine learning algorithms. Their mathematical background is quintessential in building the foundational block for the geometrical distinction between the two classes.[1] 


![](https://drive.google.com/uc?export=view&id=17EEgUZ0z3lMOLOZoU4BS2EY-iIOgEBpI)

In [None]:
!git clone https://github.com/hussain0048/Machine-Learning.git

**What is SVM?**

Support Vector Machines are a type of supervised machine learning algorithm that provides analysis of data for classification and regression analysis. While they can be used for regression, SVM is mostly used for classification. We carry out plotting in the n-dimensional space. Value of each feature is also the value of the specific coordinate. Then, we find the ideal hyperplane that differentiates between the two classes.

These support vectors are the coordinate representations of individual observation. It is a frontier method for segregating the two classes [1]

![](https://drive.google.com/uc?export=view&id=1cWAgdofFTVDzY5GQl9X8zp14ZnS13Ol8)

**How does SVM work?**

The basic principle behind the working of Support vector machines is simple – Create a hyperplane that separates the dataset into classes. Let us start with a sample problem. Suppose that for a given dataset, you have to classify red triangles from blue circles. Your goal is to create a line that classifies the data into two classes, creating a distinction between red triangles and blue circles.

![](https://drive.google.com/uc?export=view&id=1i-PtkBz_aGEsSNJhcfvIzrJnrriRFgqK)

While one can hypothesize a clear line that separates the two classes, there can be many lines that can do this job. Therefore, there is not a single line that you can agree on which can perform this task. Let us visualize some of the lines that can differentiate between the two classes as follows –

![](https://drive.google.com/uc?export=view&id=1FJcF0cs2fDfINXZ3uAJ-tYxaKJ5EOqBz)

In the above visualizations, we have a green line and a red line. Which one do you think would better differentiate the data into two classes? If you choose the red line, then it is the ideal line that partitions the two classes properly. However, we still have not concretized the fact that it is the universal line that would classify our data most efficiently.

The green line cannot be the ideal line as it lies too close to the red class. Therefore, it does not provide a proper generalization which is our end goal.

According to SVM, we have to find the points that lie closest to both the classes. These points are known as **support vectors**. In the next step, we find the proximity between our dividing plane and the support vectors. The distance between the points and the dividing line is known as **margin**. The aim of an SVM algorithm is to maximize this very margin. When the margin reaches its maximum, the hyperplane becomes the optimal one.

![](https://drive.google.com/uc?export=view&id=1M8XNNcRVa0eVsmzEWZyYsYg9qYRZXU3F)

The SVM model tries to enlarge the distance between the two classes by creating a well-defined decision boundary. In the above case, our hyperplane divided the data. While our data was in 2 dimensions, the hyperplane was of 1 dimension. For higher dimensions, say, an n-dimensional Euclidean Space, we have an n-1 dimensional subset that divides the space into two disconnected components.

#**2-How to implement SVM in Python?**

## **2.1 - Importing necessary libraries**

In [2]:
import pandas as pd
import numpy as np                            #DataFlair
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
%pylab inline

Populating the interactive namespace from numpy and matplotlib


## **2.2 - Load Datasets**
In the second step of implementation of SVM in Python, we will use the iris dataset that is available with the load_iris() method. We will only make use of the petal length and width in this analysis.


In [None]:
pylab.rcParams['figure.figsize'] = (10, 6)
iris_data = datasets.load_iris()
# We'll use the petal length and width only for this analysis
X = iris_data.data[:, [2, 3]]
y = iris_data.target
# Input the iris data into the pandas dataframe
iris_dataframe = pd.DataFrame(iris_data.data[:, [2, 3]],
                  columns=iris_data.feature_names[2:])
# View the first 5 rows of the data
print(iris_dataframe.head())
# Print the unique labels of the dataset
print('\n' + 'Unique Labels contained in this data are '
     + str(np.unique(y)))

## **2.3 - Splitting Data Into Train/Test Sets**
In the next step, we will split our data into training and test set using the train_test_split() function as follows –

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print('The training set contains {} samples and the test set contains {} samples'.format(X_train.shape[0], X_test.shape[0]))

## **2.4 - Visualizing Data**
Let us now visualize our data. We observe that one of the classes is linearly separable.


In [None]:
markers = ('x', 's', 'o')
colors = ('red', 'blue', 'green')
cmap = ListedColormap(colors[:len(np.unique(y_test))])
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
           c=cmap(idx), marker=markers[idx], label=cl)

## **2.5-Data Scalling**
Then, we will perform scaling on our data. Scaling will ensure that all of our data-values lie on a common range such that there are no extreme values.

In [None]:
standard_scaler = StandardScaler()
#DataFlair
standard_scaler.fit(X_train)
X_train_standard = standard_scaler.transform(X_train)
X_test_standard = standard_scaler.transform(X_test)
print('The first five rows after standardisation look like this:\n')
print(pd.DataFrame(X_train_standard, columns=iris_dataframe.columns).head())

##**2.6- Fitting SVM Model**
After we have pre-processed our data, the next step is the implementation of the SVM model as follows. We will make use of the SVC function provided to us by the sklearn library. In this instance, we will select our kernel as ‘rbf’.

In [None]:
#DataFlair
SVM = SVC(kernel='rbf', random_state=0, gamma=.10, C=1.0)
SVM.fit(X_train_standard, y_train)
print('Accuracy of our SVM model on the training data is {:.2f} out of 1'.format(SVM.score(X_train_standard, y_train)))
print('Accuracy of our SVM model on the test data is {:.2f} out of 1'.format(SVM.score(X_test_standard, y_test)))


## 7-Evaluating Trained Model On Test Data ##
Almost all models in Scikit-Learn API provides predict() method which can be used to predict the target variables on Test Set passed to it. Most of the models also provide score() method which generally returns accuracy in the case of classification models. We'll utilize both methods below to compare results on test data.

The majority of classifiers in scikit-learn also provide the predict_proba() method which can be used to see probability generated by the model for each class of classification task.

In [None]:
 Y_preds = classifier.predict(X_test)
print(Y_preds)
print(Y_test)
print('Accuracy : %.3f'%(Y_preds == Y_test).mean() )
print('Accuracy : %.3f'%classifier.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.

In [None]:
print(classifier.predict_proba(X_test)[:10])  ## It returns probability predicted by model for each class for each example.

As we discussed above, logistic regression tries to generate lines through data to separate classes. We can access coordinates of those lines through coef_ and intercept_ attributes of classifier. In the case of binary classification, only 1 line separating both classes is generated. But in our case which consists of 3 classes, there are 3 lines generated separating each class from the other 2 classes

In [None]:
print('Weight Coefficients : '+str(classifier.coef_))
print('Y-axis Intercept : '+str(classifier.intercept_))

## 8 -Visualizing Prediction Results On Test Data ##

Below we are trying to visualize how our model performed on test data by plotting scatter chart of sepal length vs petal width and color-encoding them with flower class.


In [None]:
with plt.style.context(('ggplot','seaborn')):
    plt.figure(figsize=(12,5))
    plt.subplot(121)
    for i,c in [(0,'red'),(1,'green'),(2,'blue')]:
        plt.scatter(X_test[Y_test==i,0],X_test[Y_test==i,3], c=c, s=40, marker='s', label=iris.target_names[i])
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[3])
    plt.legend(loc='best')
    plt.title('Actual')

    plt.subplot(122)
    for i,c in [(0,'red'),(1,'green'),(2,'blue')]:
        plt.scatter(X_test[Y_preds==i,0],X_test[Y_preds==i,3], c=c, s=40, marker='s', label=iris.target_names[i])
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[3])
    plt.legend(loc='best')
    plt.title('Prediction');

## 9- Finetuning Model By Doing Grid Search On Various Hyperparameters

Below are list of hypterparameters that we can tune to get best estimator for our data. 
1. penalty - Penalty to be used in model to penalize weights to avoid over-fitting and under-fitting. It accepts string like l1, l2, elasticnet, and none. elasticnet refers to using both l1 and l2 in proportion. default=l2

2. fit_intercept - It's boolean value referring whether to include intercept in model or not ( y=mx+c  - here c is referring to intercept).default=True
3. C - It's inverse of regularization strength(1/ α  whereas  α  is regularization strength in our cost function). default=1.0
4. solver - Algorithms for optimization. It accepts string from list ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'] default=liblinear 
5. l1_ratio - When penalty is elasticnet then this parameter helps in determining proportion of l1 & l2 penalties. It accepts float(0.0-1.0] or None value. l1_ratio=0 is equivalent to using penalty=l2. l1_ratio=1 is equivalent to using penalty=l1. default=None
**GridSearchCV**

It's a wrapper class provided by sklearn which loops through all parameters provided as params_grid parameter with a number of cross-validation folds provided as cv parameter, evaluates model performance on all combinations and stores all results in cv_results_ attribute. It also stores model which performs best in all cross-validation folds in best_estimator_ attribute and best score in best_score_ attribute.

Note: n_jobs parameter is provided by many estimators. It accepts a number of cores to use for parallelization. If the value of -1 is given then it uses all cores. We are also using %%time which jupyter notebook cell magic command which prints time taken by that cell to complete running. Time will be different on different computers based on their configurations.

Below we are trying liblinear solver for our purpose. We can only use penalties l2, l1 with this algorithm. It works faster for small datasets.

In [None]:
from sklearn.model_selection import GridSearchCV

params = {'penalty' : ['l1', 'l2',],
         'fit_intercept': [True, False],
         'C': np.linspace(0.1,1.0,10)}

grid = GridSearchCV(LogisticRegression(random_state=1, n_jobs=-1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)

print('Train Accuracy : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)

Train Accuracy : 0.975
Test Accuracy : 0.967
Best Score Through Grid Search : 0.975
Best Parameters :  {'C': 0.4, 'fit_intercept': False, 'penalty': 'l2'}


## 10 - Printing First Few Cross-Validation Results##
GridSearchCV object maintains all different parameters tried and results generated for each split of data in an attribute cv_results_ as a dictionary. Below we are loading that cross-validation results as pandas dataframe and printing first few entries.

In [None]:
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.

Below we are trying saga solver for our purpose. We can only use penalties l2, l1, elasticnet or no penalty(none) with this algorithm. It's the only algorithm which supports elasticnet penalty. It works faster for large datasets.# New Section

In [None]:
%%time

params = {'penalty' : ['l1', 'l2','elasticnet', 'none'],
         'fit_intercept': [True, False],
         'C': np.linspace(0.1,1.0,10),
         'l1_ratio': np.linspace(0.1,1.0,10)}

grid = GridSearchCV(LogisticRegression(random_state=1, solver='saga', n_jobs=-1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)

print('Train Accuracy : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)

**Printing First Few Cross Validation Results**

In [None]:
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.

Below we are trying sag solver for our purpose. We can only use penalty l2 or no penalty(none) with this algorithm. It works faster for large datasets.

In [None]:
%%time

params = {'penalty' : ['l2', 'none'],
         'fit_intercept': [True, False],
         'C': np.linspace(0.1,1.0,10),
         'l1_ratio': np.linspace(0.1,1.0,10)}

grid = GridSearchCV(LogisticRegression(random_state=1, solver='sag', n_jobs=-1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)

print('Train Accuracy : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)

**Printing First Few Cross Validation Results**

In [None]:
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.

Below we are trying lbfgs solver for our purpose. We can only use penalty l2 or no penalty(none) with this algorithm.

In [None]:
%%time

params = {'penalty' : ['l2','none'],
         'fit_intercept': [True, False],
         'C': np.linspace(0.1,1.0,10),
         'l1_ratio': np.linspace(0.1,1.0,10)}

grid = GridSearchCV(LogisticRegression(random_state=1, solver='lbfgs', n_jobs=-1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)

print('Train Accuracy : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)

**Printing First Few Cross Validation Results**

In [None]:
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.

Below we are trying newton-cg solver for our purpose. We can only use penalty l2 or no penalty(none) with this algorithm

In [None]:
%%time

params = {'penalty' : ['l2','none'],
         'fit_intercept': [True, False],
         'C': np.linspace(0.1,1.0,10),
         'l1_ratio': np.linspace(0.1,1.0,10)}

grid = GridSearchCV(LogisticRegression(random_state=1, solver='newton-cg', n_jobs=-1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)

print('Train Accuracy : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)

Train Accuracy : 0.975
Test Accuracy : 0.967
Best Score Through Grid Search : 0.975
Best Parameters :  {'C': 0.4, 'fit_intercept': False, 'l1_ratio': 0.1, 'penalty': 'l2'}
CPU times: user 2.01 s, sys: 80.2 ms, total: 2.09 s
Wall time: 1min 2s


  "(penalty={})".format(self.penalty))


**Printing First Few Cross Validation Results**

In [None]:
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.

## 11 - K-Nearest Neighbors##

 K-nearest neighbor is one of the simplest algorithms which maintains all points from the train dataset and class to which it belongs. Later on, whenever a new unknown point comes for prediction it checks a predefined number of points nearer to that new point and based on majority class it assigns that majority class to a new point.n_neighbors is used to set the number of neighbors to check for predicting class for new unseen points.

### 11.1 Initializing Model

In [None]:
 from sklearn.neighbors import KNeighborsClassifier
knn_classifier = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
knn_classifier

### 11.2 Fitting Model To Train Data

In [None]:
knn_classifier.fit(X_train,Y_train)

### 11.3 - Evaluating Trained Model On Test Data.###
    

In [None]:
Y_preds = knn_classifier.predict(X_test)
print(Y_preds)
print(Y_test)
print('Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Accuracy : %.3f'%knn_classifier.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.

In [None]:
print(knn_classifier.predict_proba(X_test)[:10]) ## It returns probability predicted by model for each class for each example.

### 11.4 Visualizing Prediction Results On Test Data

In [None]:
with plt.style.context(('ggplot','seaborn')):
    plt.figure(figsize=(12,5))
    plt.subplot(121)
    for i,c in [(0,'red'),(1,'green'),(2,'blue')]:
        plt.scatter(X_test[Y_test==i,0],X_test[Y_test==i,3], c=c, s=40, marker='s', label=iris.target_names[i])
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[3])
    plt.legend(loc='best')
    plt.title('Actual')

    plt.subplot(122)
    for i,c in [(0,'red'),(1,'green'),(2,'blue')]:
        plt.scatter(X_test[Y_preds==i,0],X_test[Y_preds==i,3], c=c, s=40, marker='s', label=iris.target_names[i])
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[3])
    plt.legend(loc='best')
    plt.title('Prediction');

###11.5 -Finetuning Model By Doing Grid Search On Various Hyperparameters.#
Below are list of hypterparameters that we can tune to get best estimator for our data.

**n_neighbors** - Number of neighbors to use to determine class of target. default=5

**algorithm** - Algorithm for finding nearest neighbors. It takes one of the values from list [ball_tree, kd_tree, brute, auto]. default=auto

**leaf_size** - Leaf size of KDTree and BallTree. It controls speed of construction and quer of tree as well as memory requirement of tree.default=30

In [None]:
%%time

params = {'n_neighbors' : np.arange(1,10),
         'leaf_size': np.arange(5,50,5),
         'algorithm': ['ball_tree', 'kd_tree', 'brute', 'auto']}

grid = GridSearchCV(KNeighborsClassifier(n_jobs=-1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)

print('Train Accuracy : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)

### 11.6 Printing First Few Cross Validation Results

In [None]:
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.

# **Refences**
[1] Support Vector Machines Tutorial

https://data-flair.training/blogs/svm-support-vector-machine-tutorial/?fbclid=IwAR0WAHSGp4wFaVpT38IfpQXsHTgSzM8ziTkrjaXGQtzAPmbQy9oMcDjrRvE

How to insert an inline image in Google Colaboratory from Google Drive

https://stackoverflow.com/questions/50670920/how-to-insert-an-inline-image-in-google-colaboratory-from-google-drive