### SVM
Support vector machines (SVMs) often achieve high accuracy compared with other classifiers such as logistic regression and decision trees, although the result depends on the dataset. SVM is known for its kernel trick, which handles nonlinear input spaces. It is used in a variety of applications such as face detection, intrusion detection, classification of emails, news articles and web pages, classification of genes, and handwriting recognition.

Support Vector Machines are generally considered a classification approach, but they can be applied to both classification and regression problems. They can handle multiple continuous variables, as well as categorical variables once these are encoded numerically. An SVM constructs a hyperplane in multidimensional space to separate the classes. The optimal hyperplane is found iteratively by minimizing an error term. The core idea of SVM is to find the maximum marginal hyperplane (MMH) that best divides the dataset into classes.

Recommended reading: [Towards Data Science](https://towardsdatascience.com/support-vector-machine-explained-8bfef2f17e71), [MonkeyLearn](https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/)

In [None]:
from IPython.display import Image
Image("img/svm.png",width=300)

#### Support Vectors
Support vectors are the data points closest to the hyperplane. They determine the position of the separating line, because the margin is calculated from them. This makes them the most relevant points for constructing the classifier.

#### Hyperplane
A hyperplane is a decision boundary that separates objects belonging to different classes.

#### Margin
A margin is the gap between the two lines that pass through the closest points of each class. It is calculated as the perpendicular distance from the separating line to the support vectors (the closest points). A larger margin between the classes is considered a good margin; a smaller margin is considered a bad one.
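
The three terms above can be inspected on a fitted model. The following is a minimal sketch, assuming a hand-made two-dimensional toy dataset (`X_toy` and `y_toy` are made up for illustration) and a very large `C` to approximate a hard margin. For a linear kernel, scikit-learn exposes the hyperplane as `coef_` and `intercept_`, and the margin width follows as 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated groups of three points each.
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (no training errors tolerated).
toy_clf = SVC(kernel="linear", C=1e6).fit(X_toy, y_toy)

# Hyperplane: all points x with w . x + b = 0.
w = toy_clf.coef_[0]
b = toy_clf.intercept_[0]

# Margin: perpendicular distance between the two lines through the support vectors.
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:\n", toy_clf.support_vectors_)
print("w =", w, " b =", b)
print("Margin width:", margin_width)
```

Only the points listed in `support_vectors_` influence `w` and `b`; moving any other point (without crossing the margin) leaves the hyperplane unchanged.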

#### How does SVM work?
The main objective is to separate the given dataset in the best possible way. The distance between the hyperplane and the nearest points of either class is known as the margin. The objective is to select the hyperplane with the maximum possible margin between the support vectors in the given dataset. SVM searches for the maximum marginal hyperplane in the following steps:

1. Generate hyperplanes that separate the classes. The left-hand figure shows three hyperplanes in black, blue and orange. The blue and orange hyperplanes have a higher classification error, whereas the black one separates the two classes correctly.

2. Select the hyperplane with the maximum distance from the nearest data points of either class, as shown in the right-hand figure.
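
In formula form, the two steps above correspond to the standard hard-margin problem. As a sketch, assume training points $x_i$ with labels $y_i \in \{-1, +1\}$ and a hyperplane $w \cdot x + b = 0$ (this notation is introduced here for illustration and does not appear elsewhere in this notebook):

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \quad \text{for all } i
$$

The constraints express step 1 (every point lies on the correct side), and since the margin width is $2 / \lVert w \rVert$, minimizing $\lVert w \rVert$ expresses step 2 (the margin is as wide as possible).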



In [None]:
Image("img/svm2.png",width=600)

#### Dealing with non-linear and inseparable planes
Some problems can’t be solved using a linear hyperplane, as shown in the figure below (left-hand side).

In such a situation, SVM uses the kernel trick to transform the input space into a higher-dimensional space, as shown on the right. The data points are plotted on the x-axis and z-axis (z is the sum of the squares of x and y: z = x^2 + y^2). Now the points can easily be separated with a linear boundary.

In [None]:
Image("img/svm3.png",width=600)
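
The z = x^2 + y^2 transformation can be tried out directly. The sketch below assumes a synthetic dataset from `make_circles` (two concentric rings, not the data shown in the figure) and compares a linear SVM on the original coordinates with a linear SVM on the lifted (x, z) coordinates. The RBF kernel is included for comparison, since it performs a similar lifting implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration): an inner and an outer ring.
X_rings, y_rings = make_circles(n_samples=200, factor=0.3, noise=0.05,
                                random_state=0)

# A linear hyperplane in the original (x, y) space cannot separate the rings.
acc_2d = SVC(kernel="linear").fit(X_rings, y_rings).score(X_rings, y_rings)

# Explicit lifting: z = x^2 + y^2, then keep the (x, z) coordinates.
z = (X_rings ** 2).sum(axis=1)
X_lifted = np.column_stack([X_rings[:, 0], z])
acc_lifted = SVC(kernel="linear").fit(X_lifted, y_rings).score(X_lifted, y_rings)

# The RBF kernel handles the nonlinearity without an explicit transformation.
acc_rbf = SVC(kernel="rbf").fit(X_rings, y_rings).score(X_rings, y_rings)

print("Linear kernel, original space:", acc_2d)
print("Linear kernel, lifted space:  ", acc_lifted)
print("RBF kernel, original space:   ", acc_rbf)
```

The scores are training accuracies, which is enough to show separability but says nothing about generalization.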

### Exercise

In [2]:
# Load the built-in breast_cancer dataset: cancer
# NOTE: from sklearn datasets (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html)

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target


In [3]:
# print the names of the 30 features
for feature in cancer.feature_names:
    print("- ", feature)


# print the label type of cancer
print("Label type of cancer: ", cancer.target_names)


-  mean radius
-  mean texture
-  mean perimeter
-  mean area
-  mean smoothness
-  mean compactness
-  mean concavity
-  mean concave points
-  mean symmetry
-  mean fractal dimension
-  radius error
-  texture error
-  perimeter error
-  area error
-  smoothness error
-  compactness error
-  concavity error
-  concave points error
-  symmetry error
-  fractal dimension error
-  worst radius
-  worst texture
-  worst perimeter
-  worst area
-  worst smoothness
-  worst compactness
-  worst concavity
-  worst concave points
-  worst symmetry
-  worst fractal dimension
Label type of cancer:  ['malignant' 'benign']


In [4]:
# print the cancer data (top 3 records)
print(X[:3])


[[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
  1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
  6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
  1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
  4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 1.326e+03 8.474e-02 7.864e-02 8.690e-02
  7.017e-02 1.812e-01 5.667e-02 5.435e-01 7.339e-01 3.398e+00 7.408e+01
  5.225e-03 1.308e-02 1.860e-02 1.340e-02 1.389e-02 3.532e-03 2.499e+01
  2.341e+01 1.588e+02 1.956e+03 1.238e-01 1.866e-01 2.416e-01 1.860e-01
  2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 1.203e+03 1.096e-01 1.599e-01 1.974e-01
  1.279e-01 2.069e-01 5.999e-02 7.456e-01 7.869e-01 4.585e+00 9.403e+01
  6.150e-03 4.006e-02 3.832e-02 2.058e-02 2.250e-02 4.571e-03 2.357e+01
  2.553e+01 1.525e+02 1.709e+03 1.444e-01 4.245e-01 4.504e-01 2.430e-01
  3.613e-01 8.758e-02]]


In [5]:
# print the cancer labels (0:malignant, 1:benign)
print("Cancer labels:")
for idx, label in enumerate(y):
    print("Sample", idx, ": ", "Malignant" if label == 0 else "Benign")


Cancer labels:
Sample 0 :  Malignant
Sample 1 :  Malignant
Sample 2 :  Malignant
Sample 3 :  Malignant
Sample 4 :  Malignant
Sample 5 :  Malignant
Sample 6 :  Malignant
Sample 7 :  Malignant
Sample 8 :  Malignant
Sample 9 :  Malignant
Sample 10 :  Malignant
Sample 11 :  Malignant
Sample 12 :  Malignant
Sample 13 :  Malignant
Sample 14 :  Malignant
Sample 15 :  Malignant
Sample 16 :  Malignant
Sample 17 :  Malignant
Sample 18 :  Malignant
Sample 19 :  Benign
Sample 20 :  Benign
Sample 21 :  Benign
Sample 22 :  Malignant
Sample 23 :  Malignant
Sample 24 :  Malignant
Sample 25 :  Malignant
Sample 26 :  Malignant
Sample 27 :  Malignant
Sample 28 :  Malignant
Sample 29 :  Malignant
Sample 30 :  Malignant
Sample 31 :  Malignant
Sample 32 :  Malignant
Sample 33 :  Malignant
Sample 34 :  Malignant
Sample 35 :  Malignant
Sample 36 :  Malignant
Sample 37 :  Benign
Sample 38 :  Malignant
Sample 39 :  Malignant
Sample 40 :  Malignant
Sample 41 :  Malignant
Sample 42 :  Malignant
Sample 43 :  Malig

In [6]:
# Split dataset into training set and test set (sklearn train_test_split)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


In [7]:
# Create an SVM classifier: clf (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
# Use a linear kernel
from sklearn.svm import SVC
clf = SVC(kernel='linear')

# Train (fit) the model using the training sets
clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)


In [8]:
# Model Accuracy: how often is the classifier correct?
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

0.956140350877193


# Model Precision: what percentage of positive predictions are truly positive?


# Model Recall: what percentage of positive datapoints are labelled as such?
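
One possible way to fill in the two placeholders above is sketched below. It is self-contained, so it repeats the loading, splitting and fitting steps with the same settings as the earlier cells; `linear_clf` is a name chosen here for illustration. Note that `precision_score` and `recall_score` treat label 1 as the positive class by default, which in this dataset is "benign".

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data and split as in the cells above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Same linear-kernel classifier as above.
linear_clf = SVC(kernel="linear").fit(X_train, y_train)
y_pred = linear_clf.predict(X_test)

# Positive class defaults to label 1 (benign).
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# Pass pos_label=0 to score the malignant class instead.
recall_malignant = recall_score(y_test, y_pred, pos_label=0)

print("Precision (benign): {:.4f}".format(precision))
print("Recall (benign): {:.4f}".format(recall))
print("Recall (malignant): {:.4f}".format(recall_malignant))
```

In a medical setting the recall for the malignant class is often the more interesting number, which is why the `pos_label=0` variant is shown as well.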


#### Tuning Hyperparameters
* **Kernel:** The kernel transforms the input data into the required form. Common kernel functions are linear, polynomial, and radial basis function (RBF). Polynomial and RBF kernels are useful when the classes cannot be separated by a linear hyperplane, because they compute the separating boundary in a higher-dimensional space. When the classes are separated by a curved or otherwise nonlinear boundary, a more complex kernel is often suggested, and this transformation can lead to a more accurate classifier.
* **Regularization:** In scikit-learn, regularization is controlled by the `C` parameter. `C` is the penalty applied to the misclassification (error) term: it tells the SVM optimization how much error is tolerable and thereby controls the trade-off between a wide margin and few misclassified training points. A smaller value of `C` tolerates more errors and yields a larger-margin hyperplane, while a larger value of `C` penalizes errors more heavily and yields a smaller-margin hyperplane.
* **Gamma:** A lower value of gamma fits the training dataset loosely, whereas a higher value of gamma fits it very closely, which can cause overfitting. In other words, with a high gamma each training point influences only its immediate neighborhood, so only nearby points shape the separating boundary, while with a low gamma the influence of each point reaches far, so more distant points are also taken into account.
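
The effect of `C` and gamma can be checked empirically. The sketch below makes two assumptions for illustration: the `C` comparison uses overlapping synthetic blobs from `make_blobs` with a linear kernel (so the margin width 2 / ||w|| can be read off `coef_`), and the gamma comparison uses the breast cancer data with the same split as in this notebook. The specific values of `C` and gamma are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer, make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Part 1: C versus margin width on overlapping synthetic data.
X_blobs, y_blobs = make_blobs(n_samples=200, centers=2, cluster_std=2.0,
                              random_state=0)
margins = {}
for C in (0.01, 100):
    model = SVC(kernel="linear", C=C).fit(X_blobs, y_blobs)
    margins[C] = 2.0 / np.linalg.norm(model.coef_[0])
    print("C = {:<6} margin width = {:.3f}".format(C, margins[C]))

# Part 2: gamma versus train/test accuracy with an RBF kernel.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scores = {}
for gamma in (0.0001, 1):
    model = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    scores[gamma] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
    print("gamma = {:<6} train acc = {:.3f}  test acc = {:.3f}".format(
        gamma, *scores[gamma]))
```

A large gap between training and test accuracy for the high gamma value would be the overfitting signature described in the list above.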

In [9]:
from sklearn.model_selection import GridSearchCV
#(https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

# defining parameter range
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}


# Check how the function GridSearchCV works
# Use it with estimator SVC and with the above-given param_grid
# Set the verbose parameter to at least 1
clf = SVC()
grid_clf = GridSearchCV(clf, param_grid, verbose=1)

# Fit the created grid model on your train data
grid_clf.fit(X_train, y_train)


Fitting 5 folds for each of 25 candidates, totalling 125 fits


In [10]:
# Print the best parameters after tuning
# (the fitted grid model has an attribute named best_params_)
print("Best Parameters: ", grid_clf.best_params_)

# Print how our model looks after hyper-parameter tuning
# (check the best_estimator_ attribute)
print("Best Estimator: ", grid_clf.best_estimator_)


Best Parameters:  {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
Best Estimator:  SVC(C=10, gamma=0.0001)
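
Beyond the single best combination, the fitted search object also keeps the cross-validation score of every candidate in its `cv_results_` attribute. The sketch below is self-contained and, to keep it quick, assumes a reduced grid (`small_grid`, a hypothetical subset of the grid above) rather than the full 25 candidates.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same data and split as in the cells above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Reduced grid for illustration only (an assumption, not the grid used above).
small_grid = {'C': [1, 10, 100],
              'gamma': [0.001, 0.0001],
              'kernel': ['rbf']}
search = GridSearchCV(SVC(), small_grid, cv=5).fit(X_train, y_train)

# cv_results_ holds one entry per candidate; pair each with its mean CV score.
candidates = zip(search.cv_results_['params'],
                 search.cv_results_['mean_test_score'])
for params, mean_score in sorted(candidates, key=lambda item: -item[1]):
    print("{:.4f}  {}".format(mean_score, params))

# best_score_ is the mean cross-validated accuracy of the best candidate.
print("Best CV accuracy: {:.4f}".format(search.best_score_))
```

Looking at the whole table shows whether the best candidate is clearly ahead or whether several settings perform almost equally well.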


In [11]:
# Predict with the help of your new model: grid_pred
# As usual, this model also has a 'predict' function
grid_pred = grid_clf.predict(X_test)


In [12]:
# Evaluate your model: print its accuracy, precision and recall values
from sklearn.metrics import accuracy_score, precision_score, recall_score

accuracy = accuracy_score(y_test, grid_pred)
precision = precision_score(y_test, grid_pred)
recall = recall_score(y_test, grid_pred)

print("Accuracy: {:.4f}".format(accuracy))
print("Precision: {:.4f}".format(precision))
print("Recall: {:.4f}".format(recall))


Accuracy: 0.9474
Precision: 0.9452
Recall: 0.9718
