<a href="https://colab.research.google.com/github/RahulJuluru2/unit2assignments/blob/main/U2W10_17_SVMandKernels_BankNotes_Data_C_RJ.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objective

At the end of the experiment, you will be able to :

* train an SVM classifier using different kernels

In [None]:
#@title Experiment Walkthrough Video
from IPython.display import HTML

HTML("""<video width="854" height="480" controls>
  <source src="https://cdn.exec.talentsprint.com/content/svm_and_kernels.mp4" type="video/mp4">
</video>
""")

## Dataset

### History

Data were extracted from images taken of genuine and forged banknote-like specimens. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.


### Description

Whenever you deposit cash at a bank, the cashier places the banknotes in a machine that tells whether each banknote is real or not. To build such a classifier, you need a dataset of real as well as fake banknotes along with their features.


The Banknote Authentication Data Set consists of 1372 instances. This is a binary classification problem which consists of 2 classes. Here our task is to predict whether a bank currency note is authentic or not based upon four attributes of the note.



The dataset has the following attributes:

- variance of Wavelet Transformed image
- skewness of Wavelet Transformed image
- kurtosis of Wavelet Transformed image (spelled "curtosis" in the dataset)
- entropy of image



## AI / ML Technique

### SVM

In this experiment, we are using SVM.

**Below is a quick overview of SVM.**

* In its basic (hard-margin) form, SVM assumes that the data is linearly separable.

* Among all lines that separate the two classes, it chooses the one that is farthest from both classes.

In the SVM algorithm, we find the points from both classes that are closest to the line. These points are called support vectors. The distance between the line and the support vectors is called the margin, and our goal is to maximize it. The hyperplane for which the margin is maximum is called the optimal hyperplane.

![alttxt](https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Images/SVM.png)
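
The margin idea above can be made concrete with a few lines of plain Python. This is a minimal sketch, assuming a hand-picked 2-D hyperplane `w·x + b = 0`; the values of `w`, `b`, and the two points are hypothetical and are not taken from the banknote data. Under the usual scaling in which support vectors satisfy `w·x + b = ±1`, the margin width is `2 / ||w||`.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0 (illustrative values only)
w, b = [3.0, 4.0], -5.0

# Hypothetical support vectors: they lie exactly on w.x + b = +1 and -1
support_pos = [3.0, -0.75]   # 3*3 + 4*(-0.75) - 5 = +1
support_neg = [1.0, 0.25]    # 3*1 + 4*0.25 - 5 = -1

print(signed_distance(w, b, support_pos))  # distance of the positive support vector
print(signed_distance(w, b, support_neg))  # distance of the negative support vector
print(margin_width(w))                     # total width of the margin
```

A smaller `||w||` gives a wider margin, which is why maximizing the margin is usually formulated as minimizing `||w||`.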




## Setup Steps

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2216842" #@param {type:"string"}

In [None]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "9959488784" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython
import warnings
warnings.simplefilter("ignore")

ipython = get_ipython()
  
notebook= "U2W10_17_SVMandKernels_BankNotes_Data_C" #name of the notebook

def setup():
    # ipython.magic("sx pip3 install torch")
    # Install the plotting and modelling packages imported later in the notebook
    ipython.magic("sx pip install -qq seaborn")
    ipython.magic("sx pip install -qq scikit-learn mlxtend")
    ipython.magic("sx wget -qq https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/bill_authentication.csv")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None
    
    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getWalkthrough() and getComments() and getMentorSupport():
      # Read the exported notebook and close the file handle afterwards
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook, "feedback_walkthrough":Walkthrough ,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:        
        print(r["err"])
        return None   
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aiml.iiith.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if not Additional: 
      raise NameError
    else:
      return Additional  
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None
  
  
def getWalkthrough():
  try:
    if not Walkthrough:
      raise NameError
    else:
      return Walkthrough
  except NameError:
    print ("Please answer Walkthrough Question")
    return None
  
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None
  

def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError 
    else: 
      return Answer
  except NameError:
    print ("Please answer Question")
    return None
  

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup() 
else:
  print ("Please complete Id and Password cells before running setup")



## Importing required packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from mlxtend.plotting import plot_decision_regions

## Loading the data

In [None]:
dataset = "bill_authentication.csv"
bankdata = pd.read_csv(dataset)

In [None]:
# Check the dimensions of the data 
bankdata.shape

In [None]:
# Print first 5 rows of the data
bankdata.head()

**Examine pairwise feature relationships to see which feature pairs give linearly or non-linearly separable classes for SVM**

To see the relationship between the features in our dataset, let us use the `pairplot()` function from the Seaborn library. It takes the dataset as a parameter and plots every pair of features against each other, colored by class:

In [None]:
sns.pairplot(bankdata, hue="Class", palette="bright", height=2, aspect=1)

Considering 'Variance' and 'Skewness' features from the Banknote dataset, we see that the classes are not linearly separable.

## Storing the data into features and labels


In [None]:
# Storing the data and labels into "X" and "y" variables
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']
X = X.iloc[:,:2]
X.head()

## Visualizing the data

In [None]:
# Scatter plot of the two selected features, colored by class
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=y)
plt.xlabel(X.columns[0])
plt.ylabel(X.columns[1])

## Splitting the data into train and test sets 

Note: [Train-Test split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

## Training a SVM Classifier with different Kernels

### Apply 'Linear' Kernel

A linear kernel separates the two classes with a single straight line (a hyperplane in higher dimensions). Training with it shows how well the data can be classified by a linear boundary, that is, whether the dataset is linearly separable or close to it.

**Note:** Refer to  [SVM](https://scikit-learn.org/0.16/modules/generated/sklearn.svm.SVC.html)

In [None]:
# Create an object for 'SVC' with 'linear' kernel
svc_linear = SVC(kernel='linear') 

In [None]:
# Fit the data
svc_linear.fit(X_train, y_train)

# Get the predictions on the test data
y_pred = svc_linear.predict(X_test)

# Calculate the accuracy
accuracy_score(y_test, y_pred)

Visualization using linear Kernel

In [None]:
plot_decision_regions(X.values, y.values, svc_linear, legend=1)
plt.show()

### Apply 'poly' Kernel

The polynomial kernel is a kernel function that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing non-linear models to be learned. The polynomial kernel is computed as:

* $K(X, X_i) = (X \cdot X_i + r)^d$
    * $X$ and $X_i$ are vectors in the input space, i.e. vectors of features computed from training or test samples.
    * $r$ is a constant term that controls how much the lower-degree terms of the polynomial contribute.
    * $d$ is the degree of the polynomial.

**Note:** Refer to [SVM](https://scikit-learn.org/0.16/modules/generated/sklearn.svm.SVC.html)
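
The polynomial kernel formula can be checked by hand with a short sketch. The function names `polynomial_kernel` and `phi_degree2` below are hypothetical helpers written for illustration, not scikit-learn functions. In scikit-learn's `SVC`, $d$ corresponds to the `degree` parameter and $r$ to `coef0`, and the dot product is additionally multiplied by `gamma`; this sketch leaves that scaling out to match the formula as written above. The second half illustrates the kernel trick for $d = 2$, $r = 0$ in two dimensions: the kernel value equals a dot product in an explicit 3-D feature space, without ever building that space during training.

```python
import math

def polynomial_kernel(x, x_i, r=1.0, d=3):
    """Polynomial kernel (x . x_i + r)^d for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, x_i))
    return (dot + r) ** d

def phi_degree2(x):
    """Explicit feature map whose dot product equals (x . z)^2 for 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

# Illustrative vectors (not taken from the banknote data)
x, z = [1.0, 2.0], [3.0, 4.0]

# Kernel value with r = 1 and d = 2: (1*3 + 2*4 + 1)^2
k_poly = polynomial_kernel(x, z, r=1.0, d=2)
print(k_poly)

# Kernel trick check: kernel in input space vs. dot product in feature space
k_input = polynomial_kernel(x, z, r=0.0, d=2)
k_feature = sum(a * b for a, b in zip(phi_degree2(x), phi_degree2(z)))
print(k_input, k_feature)
```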

In [None]:
# Create an object for 'SVC' with 'poly' kernel
svc_poly = SVC(kernel='poly')

In [None]:
# Fit the data
svc_poly.fit(X_train, y_train)

# Get the predictions on the test data
poly_pred = svc_poly.predict(X_test)

# Calculate the accuracy
accuracy_score(y_test, poly_pred)

Visualization using polynomial kernel

In [None]:
plot_decision_regions(X.values, y.values, svc_poly, legend=1)
plt.show()

### Apply 'rbf' Kernel

The RBF (Radial Basis Function) kernel is also called the Gaussian kernel. It implicitly maps the input space into an infinite-dimensional feature space. This type of basis function transformation is known as a kernel transformation. The RBF kernel function for two points $X$ and $X_i$ computes their similarity, i.e. how close they are to each other. The RBF kernel is computed as:

* $K(X, X_i) = \exp\left(-\gamma \lVert X - X_i \rVert^2\right)$
    * $X$ and $X_i$ are the feature vectors of two samples.
    * $\lVert X - X_i \rVert^2$ is the squared Euclidean distance: the difference between the vectors is squared element-wise and summed.
    * $\gamma$ (gamma) scales the squared Euclidean distance and thus scales the influence the two points have on each other.
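
The effect of $\gamma$ is easier to see with numbers. The following is a minimal sketch of the formula in plain Python; the helper name `rbf_kernel_value` and the sample points are assumptions made for illustration. Identical points have similarity 1, and for a fixed pair of distinct points a larger $\gamma$ gives a smaller similarity, so each training point influences a smaller neighborhood.

```python
import math

def rbf_kernel_value(x, x_i, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - x_i||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * sq_dist)

# Illustrative points (not taken from the banknote data)
p, q = [0.0, 0.0], [1.0, 1.0]   # squared Euclidean distance = 2

same = rbf_kernel_value(p, p)                    # identical points
low_gamma = rbf_kernel_value(p, q, gamma=0.5)    # exp(-0.5 * 2) = exp(-1)
high_gamma = rbf_kernel_value(p, q, gamma=5.0)   # exp(-5 * 2) = exp(-10)

print(same, low_gamma, high_gamma)
```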

In [None]:
# Create an object for 'SVC' with 'rbf' kernel
svc_rbf = SVC(kernel='rbf')

In [None]:
# Fit the data
svc_rbf.fit(X_train, y_train)

# Get the predictions on the test data
rbf_pred = svc_rbf.predict(X_test)

# Calculate the accuracy
accuracy_score(y_test, rbf_pred)

Visualization using RBF Kernel

In [None]:
plot_decision_regions(X.values, y.values, svc_rbf, legend=1)
plt.show()
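
To compare the three kernels side by side, the fit/predict/score steps used above can be wrapped in a loop. The sketch below is an illustration under stated assumptions: it uses a synthetic two-ring dataset from scikit-learn's `make_circles` rather than the banknote data, and the variable names (`X_demo`, `y_demo`, `scores`) are hypothetical. All three `SVC` models use default hyperparameters, so the numbers say nothing about what tuned kernels could reach.

```python
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: one class forms a ring around the other, so no straight line separates them
X_demo, y_demo = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Train one classifier per kernel and record its test accuracy
scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = SVC(kernel=kernel)
    model.fit(X_tr, y_tr)
    scores[kernel] = accuracy_score(y_te, model.predict(X_te))

for kernel, score in scores.items():
    print(f"{kernel:>6}: {score:.3f}")
```

On data like this, the RBF kernel should clearly outperform the linear kernel, because the separating boundary is a closed curve. The same loop can be run on `X_train`, `X_test`, `y_train`, and `y_test` from this notebook to compare the kernels on the banknote features.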

# Please answer the questions below to complete the experiment:




In [None]:
#@title State True or False: The Kernel trick is used to classify only linear data { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "FALSE" #@param ["","TRUE","FALSE"]


In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "Everything is good" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [None]:
#@title  Experiment walkthrough video? { run: "auto", vertical-output: true, display-mode: "form" }
Walkthrough = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")