# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

### Not for Grading

### Learning Objectives:

At the end of the experiment, you will be able to:

* understand how data that is not linearly separable can become linearly separable, and be visualized as such, in a higher-dimensional space

### Dataset

#### Description

In this experiment, we use the `make_circles()` function from sklearn to generate the dataset. `make_circles()` generates a binary classification problem in which the two classes fall into concentric circles. This makes it well suited to algorithms that can learn complex non-linear manifolds.
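A quick way to see what `make_circles()` returns is to check the array shapes, the class balance, and each point's distance from the origin. This is a minimal sketch; `random_state=0` is an assumption added here only to make the output reproducible.

```python
import numpy as np
from sklearn.datasets import make_circles

# Same settings as used in this experiment; random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

# Distance of each point from the origin (the common centre of both circles)
radius = np.linalg.norm(X, axis=1)

print("X shape:", X.shape)                  # one row per point, two features
print("points per class:", np.bincount(y))  # make_circles splits the points evenly
print("mean radius of class 0:", radius[y == 0].mean())  # outer circle
print("mean radius of class 1:", radius[y == 1].mean())  # inner circle
```

The two classes differ only in their distance from the centre, which is exactly the information a straight line in the original two coordinates cannot capture.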



### Import required packages

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from mlxtend.plotting import plot_decision_regions

### Load and Visualize the Data

Generate the data with `make_circles()` from scikit-learn's datasets module.

In [None]:
# Generate 100 points in total
# factor = 0.1 is the scale factor between the inner and outer circle (inner radius = 0.1 x outer radius). The inner circle is one class and the outer circle is the other class.
# noise = 0.1 is the standard deviation of the Gaussian noise added to the data

X, y = make_circles(n_samples=100, factor=0.1, noise=0.1)

To get a sense of the data, let us visualize it.

In [None]:
# c=y colors each point according to its class label
# s=50 sets the marker size
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plt.show()

### Try to separate the data with a linear SVM classifier

Fit an SVM classifier with a linear kernel and check whether it can separate the two classes.

In [None]:
clf_linear = SVC(kernel='linear').fit(X, y)

Let us visualize the decision boundary learned by the linear SVM.

In [None]:
plot_decision_regions(X, y, clf_linear, legend=1)

From the above plot, observe that the linear SVM cannot separate the two classes: no straight line divides the inner circle from the outer circle.
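The plot can be backed by a number: the training accuracy of the linear model. This is a minimal sketch, again assuming `random_state=0`; the exact value depends on the random sample, but a straight line can at best cut off part of the outer circle together with the inner one, so the accuracy stays well below 100%.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

clf_linear = SVC(kernel='linear').fit(X, y)

# Fraction of training points the linear model classifies correctly.
# With two balanced classes, 0.5 is what a constant guess achieves.
linear_acc = clf_linear.score(X, y)
print(f"linear SVM training accuracy: {linear_acc:.2f}")
```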

One strategy for separating the classes is to compute a **basis function** centered at every point in the dataset.
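To make this strategy concrete, the sketch below places one RBF at every point, turning the 2 original features into 100 new ones, and then fits an ordinary linear SVM on them. It uses scikit-learn's `rbf_kernel` helper; `gamma=1.0` and `random_state=0` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

# One RBF centred at every point: column j holds exp(-gamma * ||x - x_j||^2)
# gamma=1.0 is an assumed value, matching the formula used in this experiment
features = rbf_kernel(X, X, gamma=1.0)
print(features.shape)  # 100 points, 100 basis-function features

# Every point is at distance 0 from its own centre, so the diagonal is all ones
print("diagonal all ones:", np.allclose(np.diag(features), 1.0))

# A *linear* SVM trained on the new features instead of the raw coordinates
clf_basis = SVC(kernel='linear').fit(features, y)
basis_acc = clf_basis.score(features, y)
print(f"training accuracy on RBF features: {basis_acc:.2f}")
```

The model is still linear, but in the 100-dimensional space of basis-function values rather than in the original plane.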

### How to work with non-linearly separable data in SVM?

* Transform the two-dimensional dataset into a new three-dimensional (higher-dimensional) feature space via a mapping function under which the classes become linearly separable

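Mapping function (radial basis function)

* The radial basis function (RBF) is commonly used as a kernel in support vector machine classification. The RBF kernel corresponds to an implicit mapping of the input space into an infinite-dimensional feature space.
* Here we use the RBF to add one more dimension to the original two-dimensional data, so that we can see the classes become linearly separable in the higher-dimensional space. For this visualization, a single RBF centered at the center of the data is used.
* The formula for the RBF is given below. $\gamma$ can be any positive value; it controls how quickly the kernel decays with distance. Here we take $\gamma = 1$.

   $K(X, X_i) = \exp\left(-\gamma \sum_j (X_j - X_{i,j})^2\right) = \exp\left(-\gamma \lVert X - X_i \rVert^2\right)$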
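A small worked example of the formula, with two hand-picked points (chosen only for illustration) and $\gamma = 1$, compared against scikit-learn's `rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 1.0
x = np.array([1.0, 2.0])    # hypothetical point, chosen for illustration
x_i = np.array([2.0, 0.0])  # hypothetical centre of the basis function

# Direct evaluation of K(x, x_i) = exp(-gamma * sum((x - x_i)^2))
# squared distance = (1 - 2)^2 + (2 - 0)^2 = 5, so K = exp(-5)
k_manual = np.exp(-gamma * np.sum((x - x_i) ** 2))

# scikit-learn's implementation of the same kernel (it expects 2-D arrays)
k_sklearn = rbf_kernel(x.reshape(1, -1), x_i.reshape(1, -1), gamma=gamma)[0, 0]

print(k_manual, k_sklearn)  # both ≈ 0.006738
```

The kernel equals 1 when $X = X_i$ and falls toward 0 as the points move apart; a larger $\gamma$ makes it fall faster.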

In [None]:
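# Radial basis function (gamma = 1) centered at the center of the data
gamma = 1
center = X.mean(axis=0)  # per-feature mean; np.mean(X) would average both columns into one scalar
rbf = np.exp(-gamma * np.sum((X - center) ** 2, axis=1))
print(rbf)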

Visualization of the data in 3D

In [None]:
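# Visualizing the data in 3D: the third axis is the RBF value of each point
from mpl_toolkits import mplot3d  # 3D toolkit; importing it registers the '3d' projection on older Matplotlib versions
fig = plt.figure(figsize=(10, 8))
ax = plt.axes(projection='3d')
ax.scatter(X[:, 0], X[:, 1], rbf, c=y, s=20, cmap='autumn')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('rbf')
plt.show()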

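From the above plot, observe that the data becomes linearly separable in three dimensions: points of the inner circle have high RBF values and points of the outer circle have low ones, so a horizontal plane at a suitable height separates the two classes.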
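The visual impression can be checked numerically: stack the RBF value as a third feature and fit a plain linear SVM on the 3D data. This is a minimal sketch, assuming `random_state=0` and the centroid-centred RBF with $\gamma = 1$; if the classes are linearly separable in 3D, the training accuracy should be at or near 1.0.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

# Third coordinate: RBF (gamma = 1) centred at the per-feature mean of the data
rbf = np.exp(-np.sum((X - X.mean(axis=0)) ** 2, axis=1))
X_3d = np.column_stack([X, rbf])

# A linear SVM in 3D learns a plane w . [x, y, rbf] + b = 0
clf_3d = SVC(kernel='linear').fit(X_3d, y)
acc_3d = clf_3d.score(X_3d, y)
print(f"linear SVM accuracy in 3D: {acc_3d:.2f}")
print("plane coefficients (x, y, rbf):", clf_3d.coef_[0])
```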

This type of basis function transformation is known as a kernel transformation, as it is based on a similarity relationship (or kernel) between each pair of points.
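The kernel view can be made explicit with scikit-learn's `kernel='precomputed'` option: the SVM is trained on the matrix of pairwise RBF similarities instead of the raw coordinates. This sketch assumes $\gamma = 1$ and `random_state=0`; it should agree with an SVM that evaluates the same RBF kernel internally.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)
gamma = 1.0  # assumed value

# Gram matrix: K[i, j] is the RBF similarity between points i and j
K = rbf_kernel(X, X, gamma=gamma)

# SVM that only ever sees pairwise similarities, never the raw coordinates
clf_pre = SVC(kernel='precomputed').fit(K, y)

# SVM that evaluates the same kernel internally
clf_rbf = SVC(kernel='rbf', gamma=gamma).fit(X, y)

# To predict, the precomputed model needs similarities to the training points
agreement = np.mean(clf_pre.predict(K) == clf_rbf.predict(X))
print(f"fraction of identical predictions: {agreement:.2f}")
```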

### Apply an SVM classifier with an RBF kernel

In scikit-learn, we can apply a kernelized SVM simply by changing the kernel from `'linear'` to `'rbf'` (radial basis function).

In [None]:
# Kernel is 'rbf'
clf_RBF = SVC(kernel='rbf').fit(X, y)
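The RBF-kernel `SVC` has a $\gamma$ parameter, the same $\gamma$ as in the formula above. When it is not set, scikit-learn's default `gamma='scale'` uses $1 / (\text{n\_features} \cdot \operatorname{Var}(X))$. The sketch below reproduces that choice by hand, assuming `random_state=0` for the data.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

# With the default gamma='scale', scikit-learn uses gamma = 1 / (n_features * X.var())
gamma_scale = 1.0 / (X.shape[1] * X.var())
print(f"gamma used by default: {gamma_scale:.3f}")

clf_default = SVC(kernel='rbf').fit(X, y)                     # gamma='scale'
clf_explicit = SVC(kernel='rbf', gamma=gamma_scale).fit(X, y)  # same value, set by hand

# Both models should make the same predictions
same = np.array_equal(clf_default.predict(X), clf_explicit.predict(X))
print("identical predictions:", same)
```

A larger $\gamma$ makes each RBF narrower and the decision boundary more flexible; a smaller $\gamma$ gives a smoother boundary.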

Visualize the decision boundary learned with the RBF kernel

In [None]:
plot_decision_regions(X, y, clf_RBF, legend=1)

Using this kernelized support vector machine, we obtain a suitable nonlinear decision boundary: a roughly circular curve around the inner circle that separates it from the outer circle.
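A quick numerical check of the RBF model: its training accuracy and its predictions for two new, hand-picked points (the origin, inside the inner circle, and (1, 0), on the outer circle). This is a minimal sketch assuming `random_state=0`; `make_circles` labels the outer circle 0 and the inner circle 1.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# random_state=0 is an assumption for reproducibility
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

clf_RBF = SVC(kernel='rbf').fit(X, y)
rbf_acc = clf_RBF.score(X, y)
print(f"RBF kernel training accuracy: {rbf_acc:.2f}")

# Hypothetical new points: the centre (inner circle) and a point on the outer circle
new_points = [[0.0, 0.0], [1.0, 0.0]]
preds = clf_RBF.predict(new_points)
print("predicted classes:", preds)
```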