### Formal Definition:

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples. Here "optimal" means the hyperplane with the largest margin, i.e. the greatest distance to the nearest training points of either class (the support vectors). In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on one side.
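The sketch below illustrates this definition. It fits scikit-learn's `SVC` with a linear kernel to a hypothetical six-point toy dataset (not the notebook's data), then reads the learned hyperplane w · x + b = 0 back from `coef_` and `intercept_`. The large `C=1e3` is an assumption, chosen so that the soft-margin solver behaves like a hard-margin one on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2-D.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],    # class 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e3).fit(X, y)

w = clf.coef_[0]                  # normal vector of the hyperplane w . x + b = 0
b = clf.intercept_[0]             # offset of the hyperplane
margin = 2 / np.linalg.norm(w)    # width of the gap between the two classes

print("w =", w, "b =", b, "margin =", margin)
print("support vectors:\n", clf.support_vectors_)

# New points are classified by which side of the line they fall on.
new_points = np.array([[1.0, 2.0], [5.0, 5.0]])
print(clf.predict(new_points))
print(np.sign(new_points @ w + b))   # positive side -> class 1, negative side -> class 0
```

For a linear kernel, `clf.decision_function(X)` equals `X @ w + b`. The support vectors are the training points that sit on the edges of the margin.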

In [1]:
# Support Vector Machine (SVM)

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Importing the dataset
Ad = pd.read_csv('Advertisement.csv')
X = Ad.iloc[:, [2, 3]].values
y = Ad.iloc[:, 4].values

In [4]:
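# Splitting the dataset into the Training set and Test set
# (sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)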
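The split can be sketched on hypothetical random data. The 400 rows are an assumption chosen only so the numbers come out round. With `test_size=0.25`, a quarter of the rows go to the test set. Fixing `random_state` makes the shuffle, and therefore the split, reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the dataset: 400 rows, 2 features, binary labels.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(400, 2))
y_demo = rng.randint(0, 2, size=400)

# test_size=0.25 sends 25% of the rows to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(X_tr.shape, X_te.shape)   # (300, 2) (100, 2)

# The same random_state reproduces exactly the same split.
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(np.array_equal(X_te, X_te2))   # True
```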


In [6]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
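SVMs are sensitive to feature scale. Both the margin and the RBF kernel are computed from Euclidean distances, so without scaling a feature measured in large units would dominate. The sketch below uses hypothetical random data with two columns on deliberately different scales. It checks that `StandardScaler` computes z = (x - mean) / std from training-set statistics only. That is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test features on very different scales.
rng = np.random.RandomState(0)
X_tr = np.column_stack([rng.uniform(18, 60, 300), rng.uniform(15_000, 150_000, 300)])
X_te = np.column_stack([rng.uniform(18, 60, 100), rng.uniform(15_000, 150_000, 100)])

sc = StandardScaler()
X_tr_scaled = sc.fit_transform(X_tr)   # learns mean_ and scale_ from the training set only
X_te_scaled = sc.transform(X_te)       # reuses the training statistics, so no test-set leakage

# The same transform by hand: z = (x - mean_train) / std_train (population std, ddof=0)
mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
assert np.allclose(X_tr_scaled, (X_tr - mu) / sigma)
assert np.allclose(X_te_scaled, (X_te - mu) / sigma)

# Training columns now have mean ~0 and standard deviation ~1.
print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
```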

In [None]:
# C - The regularization parameter (called C in Python's scikit-learn library)
# tells the SVM optimization how much you want to avoid misclassifying each training example.

# For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane 
# does a better job of getting all the training points classified correctly. Conversely, a very small 
# value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that 
# hyperplane misclassifies more points.

#_________________________________________

# Kernel: the radial basis function (RBF) kernel is a popular kernel function used in
# various kernelized learning algorithms, and is especially common in support vector
# machine classification. A kernel lets the SVM learn a non-linear decision boundary
# by implicitly mapping the inputs into a higher-dimensional feature space.

#_________________________________________

# Learn more about this from:

#  https://chrisalbon.com/machine_learning/support_vector_machines/svc_parameters_using_rbf_kernel/
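For reference, scikit-learn's `SVC` documents the RBF kernel as K(x, x') = exp(-gamma * ||x - x'||^2). The sketch below evaluates that formula by hand for two arbitrary points and compares it with `sklearn.metrics.pairwise.rbf_kernel`. The helper name `rbf` is hypothetical. The kernel equals 1 for identical points and decays toward 0 as the distance grows; `gamma` sets how fast.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, z, gamma):
    """Hypothetical helper: RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))


x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])   # squared distance ||x - z||^2 = 2

for gamma in (0.1, 1.0, 10.0):
    manual = rbf(x, z, gamma)
    library = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]
    # Larger gamma -> the similarity of these two points drops toward 0 faster.
    print(f"gamma={gamma}: K={manual:.6f} (sklearn: {library:.6f})")
```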



#### Gamma
`gamma` is a parameter of the RBF kernel and can be thought of as the ‘spread’ of the kernel and therefore of the decision region. It scales the squared distance between points inside the kernel, so a larger gamma makes each training point's influence fade more quickly with distance.
**When gamma is low, the decision boundary curves only gently and the decision region is very broad. When gamma is high, the decision boundary curves sharply**, which creates islands of
decision boundaries around individual data points. We will see this very clearly below.
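A quick way to see this ‘spread’ is to score a point that lies outside the training data. The sketch uses synthetic `make_moons` data as a hypothetical stand-in for the notebook's dataset; the two gamma values are arbitrary. For an RBF SVC, `decision_function(x)` is a weighted sum of K(x_i, x) over the support vectors plus the intercept. With a large gamma, every kernel term is essentially 0 away from the data, so the score collapses to the intercept. This is the ‘island’ behaviour.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical 2-D data with a curved class boundary (not the notebook's dataset).
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# A point lying outside the cloud of training data.
outside_point = np.array([[3.0, 3.0]])

results = {}
for gamma in (0.1, 100.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
    # decision_function(x) = sum_i dual_coef_i * K(x_i, x) + intercept.
    # With a large gamma every K(x_i, x) is ~0 away from the data, leaving only the intercept.
    score = clf.decision_function(outside_point)[0]
    results[gamma] = {"train_acc": clf.score(X, y), "score": score, "b": clf.intercept_[0]}
    print(f"gamma={gamma}: train acc={results[gamma]['train_acc']:.3f}, "
          f"score at (3, 3)={score:.4f}, intercept={clf.intercept_[0]:.4f}")
```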

#### C
C is a parameter of the SVC learner and is the penalty for misclassifying a data point. When C is small, the classifier tolerates misclassified data points (high bias, low variance). When C is large, the classifier is heavily penalized for misclassified data and therefore bends over backwards to avoid any misclassified data points (low bias, high variance).
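You can check directly that a larger C trades margin width for fewer training errors. With a linear kernel, the margin width is 2 / ||w||. The sketch below uses hypothetical overlapping blobs from `make_blobs`; the values C=0.01 and C=10 are arbitrary illustrations.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical 2-D data: two overlapping clusters, so some training points must be misclassified.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

results = {}
for C in (0.01, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])   # margin width of a linear SVM
    n_sv = int(clf.n_support_.sum())            # number of support vectors
    results[C] = (margin, n_sv, clf.score(X, y))
    print(f"C={C}: margin={margin:.3f}, support vectors={n_sv}, "
          f"train acc={results[C][2]:.3f}")

# Larger C -> narrower margin: the optimizer gives up width to reduce training errors.
assert results[0.01][0] > results[10.0][0]
```

On data like this, the small-C model also tends to keep many more support vectors, because more points fall inside its wide margin.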

In [9]:
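# Fitting SVM to the Training set
from sklearn.svm import SVC
# kernel='linear' fits a straight-line boundary: gamma is ignored for this kernel and C keeps
# its default of 1.0. random_state only matters when probability=True.
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)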

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=0, shrinking=True,
  tol=0.001, verbose=False)

In [10]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)

In [11]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

In [12]:
cm

array([[66,  2],
       [ 8, 24]], dtype=int64)
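The confusion matrix can be condensed into single scores. In scikit-learn's layout, rows are true classes and columns are predicted classes, in sorted label order. The sketch assumes the labels are 0 and 1 and treats 1 as the positive class; the notebook does not say what the labels mean.

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true class, columns = predicted class).
cm = np.array([[66,  2],
               [ 8, 24]])

tn, fp, fn, tp = cm.ravel()   # assumes class order [0, 1] with 1 as "positive"

accuracy = (tp + tn) / cm.sum()   # fraction of test points classified correctly
precision = tp / (tp + fp)        # of the points predicted as 1, how many really are 1
recall = tp / (tp + fn)           # of the true 1s, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.900 precision=0.923 recall=0.750
```

`sklearn.metrics.accuracy_score(y_test, y_pred)`, `precision_score` and `recall_score` compute the same values directly from the label arrays.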