<center>

<img src="images/UNB.png">

# GGE 6322: IMAGE PROCESSING AND COMPUTER VISION
## Support Vector Machine

### By Vaasudevan Srinivasan presented on **March 26, 2019 09:30**
</center>

---

1. A Gentle Introduction to Classification and its Jargon 😊
2. Types of Classification (Supervised) 🤐
3. Support Vector Machines 🙂
4. Parameter Optimization 🤕
5. Code Along 😋

# A Gentle Introduction to Classification and its Jargon 😊

## What is Classification ?

**Classification** is the problem of identifying to which of a set of categories (sub-populations) a **new observation** belongs, on the **basis of a training set of data** containing observations (or instances) whose **category membership is known.**


## Classifier What?

An **algorithm** that implements classification, especially in a concrete implementation, is known as a **classifier**. The term "classifier" sometimes also refers to the **mathematical function**, implemented by a classification algorithm, that maps input data to a category.

## Features and Regions

A crude but **functional definition** of a feature is something that can be **measured in an image**. A feature is therefore a number or a set of numbers derived from a digital image.

Features are associated with **image regions**. An object within an image has a set of measurements (features) that can be used to characterize it.

## Training and Testing
"It is **standard practice** to measure and classify a set of data to establish a normal range for the features to be used in automatic classification. This is what is referred to as **training**, and it is an essential part of building a recognition."

## Class
**One of a set of enumerated target values for a label**. For example, in a binary classification model that detects spam, the two classes are **spam and not spam**. In a multi-class classification model that identifies dog breeds, the classes would be **poodle, beagle, pug**, and so on.

Classification = **Class** - ification

## Classification model
A type of machine learning model for distinguishing among two or more discrete classes.

## Decision Boundary
The **separator** between classes learned by a model in a binary or multi-class classification problem.

<img src="images/decision_boundary.png" width=400 height=400>

## Confusion matrix
An **NxN table** (N being the number of classes) that summarizes how successful a classification model's predictions were by comparing the predicted classes against the actual classes.

<center>
<img src="images/ConfusionMatrix.png" width=600 height=500>
</center>

<center>
<img src="images/fp_fn.jpeg" width=800 height=650>
</center>

### Accuracy:
The fraction of predictions that a classification model got right.

$\text{Accuracy} =
\frac{\text{Correct Predictions}} {\text{Total Number Of Examples}}$

### Precision:
Precision identifies the frequency with which a model was correct when predicting the positive class.

$\text{Precision} =
\frac{\text{True Positives}} {\text{True Positives} + \text{False Positives}}$

### Recall:
Out of all the possible positive labels, how many did the model correctly identify?

$\text{Recall} =
\frac{\text{True Positives}} {\text{True Positives} + \text{False Negatives}}$
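
The three formulas above can be checked by hand with a few lines of Python. This is a minimal sketch: the counts are hypothetical values chosen for illustration, and the function names are not from any library.

```python
# Hypothetical confusion-matrix counts for a binary spam classifier
# (illustrative values only, not taken from a real experiment).
tp, fp, fn, tn = 40, 10, 5, 45


def accuracy(tp, fp, fn, tn):
    """Correct predictions divided by the total number of examples."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """True positives divided by everything predicted as positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """True positives divided by everything that is actually positive."""
    return tp / (tp + fn)


acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)

print(f"Accuracy:  {acc:.3f}")
print(f"Precision: {prec:.3f}")
print(f"Recall:    {rec:.3f}")
```

With these counts the model is right 85 times out of 100, yet precision and recall differ because false positives and false negatives are counted separately.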

---

# Types of Supervised Classification 🤐

### 1.) Logistic Regression

Logistic regression is similar to linear regression, but it is used when the dependent variable is categorical rather than numeric (for example, a Yes/No response).

<img src="images/LogisticRegression.png">

### 2.) K-Nearest Neighbours (K-NN)

K-NN is one of the **simplest classification algorithms**. Given training points that are already separated into several classes, it predicts the class of a new sample point from the classes of its K nearest neighbours. K-NN is a non-parametric, **lazy learning algorithm**: it classifies new cases based on a **similarity measure** (e.g. distance functions).

Some of the distance metrics that are mentioned in the book are:

* Pythagorean (Euclidean) distance
* Manhattan distance or city block distance
* Mahalanobis distance
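
The three distance metrics listed above can be sketched with NumPy as follows. The function names, the two sample points and the covariance matrix are assumptions made for illustration; in practice the covariance would be estimated from the training data.

```python
import numpy as np


def euclidean(a, b):
    """Straight-line (Pythagorean) distance."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    """City block distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(a - b)))


def mahalanobis(a, b, cov):
    """Distance scaled by the covariance matrix of the features."""
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))


# Two hypothetical feature vectors.
a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

# Hypothetical covariance: the first feature varies more than the second.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])

print("Euclidean:  ", euclidean(a, b))
print("Manhattan:  ", manhattan(a, b))
print("Mahalanobis:", mahalanobis(a, b, cov))
```

When the covariance matrix is the identity, the Mahalanobis distance reduces to the Euclidean distance.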

### 3.) Naive Bayes

The Naive Bayes classifier is based on Bayes' theorem with the assumption that the predictors are independent of one another given the class.

<img src="images/NaiveBayes.png">

### 4.) Decision Tree Classification

Decision tree builds **classification or regression models in the form of a tree structure**. It breaks down a dataset into **smaller and smaller subsets** while at the same time an associated decision tree is incrementally developed. The final result is a **tree with decision nodes and leaf nodes**.

<img src="images/DecisionTrees.png">

---

# Support Vector Machines 🙂

## What is SVM ?
A **Support Vector Machine (SVM)** can be used for **both regression and classification**. It is based on the concept of decision planes that define decision boundaries. A decision plane (hyperplane) is one that separates sets of objects having different class memberships.
<table><tr>
    <td> <img src="images/SupportVectors.png"> </td>
    <td> <img src="images/Hyperplane.png"> </td>
</tr></table>

## Features of SVM
* SVM attempts to optimize the **line or plane** so that it is the **best one** that can be used.

* A **line** divides **two-dimensional data** into two parts; a **plane** divides **three-dimensional data** into two parts.

* The **maximum margin hyperplane** is always as far from both data sets as possible.

# Hyperplane ?
An SVM performs classification by finding the **hyperplane** that maximizes the margin between the two classes with the help of support vectors. A hyperplane is the set of points where a linear function equals zero, and it divides **N-dimensional data** into two parts.

<img src="images/Hyperplane_book.png">
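
As a small illustration of how a hyperplane divides data into two parts, the sketch below classifies points by the sign of a linear function. The weight vector `w`, the offset `b` and the sample points are hypothetical values picked by hand, not the result of training an SVM.

```python
import numpy as np

# Hypothetical hyperplane in 2-D: the line x + y = 3, written as w.x + b = 0.
w = np.array([1.0, 1.0])
b = -3.0


def side(point):
    """Return +1 or -1 depending on which side of the hyperplane the point lies."""
    return int(np.sign(w @ point + b))


def signed_distance(point):
    """Perpendicular distance from the point to the hyperplane, with sign."""
    return float((w @ point + b) / np.linalg.norm(w))


points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 0.5]])
labels = [side(p) for p in points]

for p, label in zip(points, labels):
    print(p, "side:", label, "distance:", round(signed_distance(p), 3))
```

The same two functions work unchanged for points with more than two features, as long as `w` has a matching length.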

# Convex Hull and Support Vectors

The basic idea is to use the feature vectors on the convex hull of each data set as candidates to guide the optimization.

These candidates are called support vectors and are illustrated in the figure below, along with the convex hulls of the data sets.

In the illustrated example, the support vectors completely define the maximal margin line, which is the line that passes as **far as possible from all three of those vectors**. There can be more than three support vectors; in general at least one is needed from each class.

<img src="images/SupportVectors_book.png">
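
To see which points end up as support vectors, one option is to fit a linear SVM with scikit-learn and inspect its `support_vectors_` attribute. The toy data below is hypothetical, and the very large `C` is an assumption used to approximate a hard margin on separable data.

```python
import numpy as np
from sklearn import svm

# Hypothetical, linearly separable toy data (two features per point).
X = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0],  # class 0
    [3.0, 3.0], [4.0, 3.0], [3.0, 4.0], [5.0, 5.0],    # class 1
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A linear kernel with a very large C behaves like a hard-margin SVM here.
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# Only the points closest to the other class are kept as support vectors.
support_vectors = clf.support_vectors_
print("Support vectors:\n", support_vectors)
print("Support vectors per class:", clf.n_support_)
```

The points far from the boundary do not appear in the output, which matches the idea that only the points facing the other class guide the optimization.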

# Non-Linear Hyperplane ?

Support vector machines can also find non-linear boundaries between classes, which is another major advantage over other classification methods.

It is accomplished by **transforming** those feature vectors so that a linear boundary can be found.

Consider the below example.

<img src="images/Non_linear_ex.png">

# Transformation

* Add a dimension and transform the points appropriately into a third dimension. Voila..!!

<img src="images/Non_linear_plane.png">
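
The idea of adding a dimension can be sketched with NumPy. The example assumes two hypothetical classes arranged as concentric rings, and the added coordinate `z = x² + y²` is one possible transformation chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def ring(radius, n):
    """Generate n points on a circle of the given radius."""
    angles = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])


# Two hypothetical classes: no straight line in 2-D can separate them.
inner = ring(1.0, 50)
outer = ring(3.0, 50)


def lift(points):
    """Add a third coordinate z = x^2 + y^2 to every 2-D point."""
    z = np.sum(points ** 2, axis=1)
    return np.column_stack([points, z])


inner3d = lift(inner)
outer3d = lift(outer)

# In 3-D the flat plane z = 4 now separates the two classes.
threshold = 4.0
print("Inner ring below the plane:", bool(np.all(inner3d[:, 2] < threshold)))
print("Outer ring above the plane:", bool(np.all(outer3d[:, 2] > threshold)))
```

After the lift, every inner point sits at a height near 1 and every outer point at a height near 9, so a linear boundary is enough.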

# Kernels

In SVM parlance, any given transformation uses a **kernel**, which is the function that maps the data from one feature space into another, typically of higher dimension.

<img src="images/Kernels.png" width=500 height=400>
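
The effect of the kernel choice can be tried out with scikit-learn's `SVC`, which accepts a `kernel` argument. The sketch below uses a synthetic concentric-circles data set from `make_circles` as a stand-in for non-linearly separable data; the data set and its parameters are assumptions for illustration.

```python
from sklearn import svm, model_selection as ms
from sklearn.datasets import make_circles

# Synthetic data: one class forms a small circle inside the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = ms.train_test_split(
    X, y, test_size=0.4, random_state=100
)

# Fit one classifier per kernel and record its accuracy on the test split.
scores = {}
for kernel in ("linear", "rbf"):
    clf = svm.SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel:>6} kernel accuracy: {scores[kernel]:.3f}")
```

On data like this the linear kernel has no straight line to find, while the RBF kernel separates the two circles.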

# Note

Any classifier that uses linear discriminants can distinguish between only two classes.

If there are more classes, an SVM classifier must approach them pair-wise.
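
The pair-wise approach can be observed in scikit-learn, whose `SVC` trains one binary classifier per pair of classes. The sketch below uses a synthetic four-class data set from `make_blobs` (an assumption for illustration) and asks for the raw pair-wise scores with `decision_function_shape="ovo"`.

```python
from itertools import combinations

from sklearn import svm
from sklearn.datasets import make_blobs

# Synthetic data with four classes (labels 0 to 3).
X, y = make_blobs(n_samples=120, centers=4, random_state=0)

clf = svm.SVC(kernel="linear", decision_function_shape="ovo")
clf.fit(X, y)

# Every pair of classes gets its own binary decision function.
pairs = list(combinations(clf.classes_, 2))
scores = clf.decision_function(X[:1])

print("Class pairs:", pairs)
print("Pair-wise scores for the first sample:", scores[0])
print("Predicted class:", clf.predict(X[:1])[0])
```

With four classes there are six pairs, so each sample receives six scores before a final class is chosen.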

---

# Parameter Optimization 🤕

# SVM C Parameter

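The C parameter controls the **trade-off** between a **smooth decision boundary** and **classifying the training points correctly**. A small C favours a smoother boundary with a wider margin, while a large C penalizes misclassified training points more heavily.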
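
One way to get a feel for C is to fit the same classifier with several values and compare the results. The sketch below assumes two overlapping synthetic clusters from `make_blobs` and a linear kernel; the particular C values are arbitrary choices for illustration.

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Two overlapping synthetic clusters, so no boundary is perfect.
X, y = make_blobs(
    n_samples=200, centers=[[0, 0], [3, 3]], cluster_std=1.5, random_state=0
)

support_counts = {}
for c in (0.001, 1.0, 100.0):
    clf = svm.SVC(kernel="linear", C=c)
    clf.fit(X, y)
    # clf.support_ holds the indices of the support vectors.
    support_counts[c] = len(clf.support_)
    print(
        f"C={c:<7} support vectors={support_counts[c]:<4} "
        f"training accuracy={clf.score(X, y):.3f}"
    )
```

A very small C tolerates many points inside the margin, so more of them become support vectors; a larger C narrows the margin around the training points.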




# Code Along 😍😋

In [None]:
# Importing the modules
import matplotlib.pyplot as plt
import pandas as pd
import cowsay

cowsay.dragon("Modules are imported successfully..!!")

In [None]:
# Iris Dataset
cols = ["SLength", "SWidth", "PLength", "PWidth", "Class"]
types = ["Setosa", "Versicolor", "Virginica"]
iris = pd.read_csv("Chap8_Datas_Code/iris-data.txt", sep="\t", names=cols)

<table> 
    <tr>
        <td style="text-align:center"> <h1>Setosa</h1> </td>
        <td style="text-align:center"> <h1>Versicolor</h1> </td>
        <td style="text-align:center"> <h1>Virginica</h1> </td>
    </tr>
    <tr>
        <td> <img src="images/Iris_Setosa.jpeg" width=400 height=400> </td>
        <td> <img src="images/Iris_Versicolor.jpeg" width=400 height=440> </td>
        <td> <img src="images/Iris_Virginica.jpeg" width=440 height=440> </td>
    </tr>
</table>

<center><img src="images/Petal_Sepal.png" width=600 height=600></center>

In [None]:
# Plotting
colors = [{1:'red', 2:'blue', 3:'green'}[i] for i in iris.Class]
plt.xlabel("Sepal Length")
plt.ylabel("Petal Length")

plt.scatter(iris.SLength, iris.PLength, c=colors)

# Scikit-learn to the rescue

In [None]:
# Scikit-learn
from sklearn import svm, model_selection as ms
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
# Split the Dataset into Training and Testing

# Keep only the two length features and the class label
iris_len = iris[["SLength", "PLength", "Class"]]

train, test = ms.train_test_split(iris_len, test_size=0.4, random_state=100)
cTrain, cTest = train.pop('Class'), test.pop('Class')

In [None]:
# Classifier
clf = svm.SVC(gamma='auto')
clf.fit(train, cTrain)

# Predict
predicted = clf.predict(test)

print(f"Accuracy: {accuracy_score(cTest, predicted) * 100}%", "\n")
print(classification_report(cTest, predicted, target_names=types))
print(confusion_matrix(cTest, predicted))

# References

*  Algorithms for Image Processing and Computer Vision Second Edition by J.R. Parker ([pdf](http://www.manalhelal.com/Books/crol/Algorithms%20for%20Image%20Processing%20and%20Computer%20Vision_2011.pdf))
*   https://en.wikipedia.org/wiki/Statistical_classification
*   https://developers.google.com/machine-learning/glossary/
*   https://towardsdatascience.com/supervised-machine-learning-classification-5e685fe18a6d


## Python
* <a href="https://scikit-learn.org/stable/"> Scikit-learn</a> | <a href="https://matplotlib.org/"> Matplotlib</a> | <a href="http://www.numpy.org/"> NumPy</a> | <a href="https://pandas.pydata.org/"> Pandas</a> | <a href="https://ipython.org/"> IPython</a> | <a href="https://github.com/VaasuDevanS/cowsay-python"> Cowsay</a>

Built with <b><a href="https://jupyter.org/">Jupyter-Notebook</a></b> and hosted with <b><a href="https://mybinder.org/">mybinder</a></b>