# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
## Not for grades

## Learning Objective

The objective of this experiment is to understand linear classifiers.

## Dataset

#### History

This is a multivariate dataset introduced by R. A. Fisher (Father of Modern Statistics) to showcase linear discriminant analysis. It is arguably the best-known dataset in the pattern recognition literature.


The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 
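
The separability claim can be checked on a single feature. The sketch below is only an illustration: it assumes scikit-learn's bundled copy of the dataset (`load_iris`) rather than the CSV files used later, and it relies on setosa being class `0` in that copy. It compares the largest setosa petal length with the smallest petal length of the other two classes.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
# Columns: sepal length, sepal width, petal length, petal width (all in cm)
X, y = iris.data, iris.target

petal_length = X[:, 2]
setosa_max = petal_length[y == 0].max()   # largest setosa petal length
others_min = petal_length[y != 0].min()   # smallest petal length in the other two classes

# If the setosa maximum lies below the minimum of the other classes,
# any threshold between the two values separates setosa using one feature.
setosa_separable = setosa_max < others_min
print(setosa_max, others_min, setosa_separable)
```

A gap on one feature is a sufficient condition for linear separability, not a necessary one, so the same check cannot be used to show that versicolor and virginica are inseparable.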

#### Description
The Iris dataset consists of 150 data instances. There are 3 classes (Iris Versicolor, Iris Setosa and Iris Virginica), each with 50 instances.


For each flower, we have the following attributes:

- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm

To keep the experiment simple, we replace the class names with numbers:

    "0": setosa
    "1": versicolor
    "2": virginica
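
The same numbering is used by scikit-learn's bundled copy of the dataset. As a small sketch, assuming `load_iris` is available, the mapping can be rebuilt from its `target_names` array. The dictionary name `label_to_name` is made up for this example.

```python
from sklearn.datasets import load_iris

# target_names is ordered so that position i holds the name of class i
target_names = load_iris().target_names

# Hypothetical helper dictionary: numeric label -> class name
label_to_name = {i: str(name) for i, name in enumerate(target_names)}
print(label_to_name)
```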
    

### Challenges

When the data has a large number of features (high dimensionality), models often struggle because

    1. Training time grows with the number of features, steeply for some algorithms.
    2. The risk of overfitting increases as the number of features grows.
    
To avoid these problems while learning data analysis, we use a simple, well-behaved dataset. This reduces the cognitive load and makes debugging easier, because we can better comprehend the data we are working with.

Hence, this is a good dataset to work on.

## Domain Information



Iris plants are flowering plants with showy flowers. They are very popular among movie directors because they make an excellent background.

They are predominantly found in dry, semi-desert, or colder rocky mountainous areas in Europe and Asia. They have long, erect flowering stems and can produce white, yellow, orange, pink, purple, lavender, blue or brown colored flowers. There are 260 to 300 types of iris.

![alt text](https://cdn-images-1.medium.com/max/1275/1*7bnLKsChXq94QjtAiRn40w.png)

As you can see, the flowers have 3 sepals and 3 petals. The sepals usually spread or droop downwards, while the petals stand upright, partly behind the sepal bases. However, the length and width of the sepals and petals vary for each type.


### Setup Steps

In [0]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "P181902118" #@param {type:"string"}


In [0]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "8860303743" #@param {type:"string"}


In [0]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook="Experiment_L_DT_Iris" #name of the notebook
Answer = "Ungraded"
def setup():
#  ipython.magic("sx pip3 install torch") 
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/Data_setosa_versicolor.csv")
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/Data_setosa_virginica.csv")
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/Data_versicolor_virginica.csv")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts():
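      # Read the exported notebook and base64-encode its contents;
      # the with-block closes the file handle afterwards
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())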

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      print("Your submission is successful.")
      print("Ref Id:", submission_id)
      print("Date of submission: ", r["date"])
      print("Time of submission: ", r["time"])
      print("View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions")
      print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else:
      # A feedback answer is missing, so nothing was submitted
      return None
    

def getAdditional():
  try:
    if Additional: return Additional      
    else: raise NameError('')
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getAnswer():
  try:
    return Answer
  except NameError:
    print ("Please answer Question")
    return None

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
    from IPython.display import HTML
    HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id))
  
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


In [0]:
!ls

Data_setosa_versicolor.csv     Experiment_K_KNN_Iris.ipynb
Data_setosa_virginica.csv      Experiment_L_DT_Iris.ipynb
Data_versicolor_virginica.csv  sample_data


#### Importing Required Packages

In [0]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn import linear_model
import pandas as pd

#### Loading the data

There are three files; choose one of them:

* Data_setosa_versicolor.csv
* Data_setosa_virginica.csv
* Data_versicolor_virginica.csv

In [0]:
dataset = "Data_versicolor_virginica.csv"

In [0]:
data = pd.read_csv(dataset)

In [0]:
data.shape

(100, 5)
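
If the CSV files are not at hand, a frame with the same shape can be rebuilt from scikit-learn's bundled copy of the dataset. This is only a sketch: the column names `x1` to `x4` and `targets` follow the printed output of this notebook, and it is an assumption that the CSV holds the same 100 rows as `load_iris`.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
frame = pd.DataFrame(iris.data, columns=["x1", "x2", "x3", "x4"])
frame["targets"] = iris.target

# Keep only versicolor (1) and virginica (2),
# mirroring what Data_versicolor_virginica.csv is assumed to contain
pair = frame[frame["targets"].isin([1, 2])].reset_index(drop=True)

print(pair.shape)
# Class balance: how many rows belong to each label
print(pair["targets"].value_counts())
```

Checking the class counts before training is a cheap way to confirm that the chosen file really contains the two classes you expect.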

### Splitting the data into train and test sets 

In [0]:
from sklearn.model_selection import train_test_split
print(data)

     x1   x2   x3   x4  targets
0   7.0  3.2  4.7  1.4        1
1   6.4  3.2  4.5  1.5        1
2   6.9  3.1  4.9  1.5        1
3   5.5  2.3  4.0  1.3        1
4   6.5  2.8  4.6  1.5        1
5   5.7  2.8  4.5  1.3        1
6   6.3  3.3  4.7  1.6        1
7   4.9  2.4  3.3  1.0        1
8   6.6  2.9  4.6  1.3        1
9   5.2  2.7  3.9  1.4        1
10  5.0  2.0  3.5  1.0        1
11  5.9  3.0  4.2  1.5        1
12  6.0  2.2  4.0  1.0        1
13  6.1  2.9  4.7  1.4        1
14  5.6  2.9  3.6  1.3        1
15  6.7  3.1  4.4  1.4        1
16  5.6  3.0  4.5  1.5        1
17  5.8  2.7  4.1  1.0        1
18  6.2  2.2  4.5  1.5        1
19  5.6  2.5  3.9  1.1        1
20  5.9  3.2  4.8  1.8        1
21  6.1  2.8  4.0  1.3        1
22  6.3  2.5  4.9  1.5        1
23  6.1  2.8  4.7  1.2        1
24  6.4  2.9  4.3  1.3        1
25  6.6  3.0  4.4  1.4        1
26  6.8  2.8  4.8  1.4        1
27  6.7  3.0  5.0  1.7        1
28  6.0  2.9  4.5  1.5        1
29  5.7  2.6  3.5  1.0        1
..  ... 

In [0]:
X_train, X_test, y_train, y_test = train_test_split(data.values[:,:4], data.values[:,4], test_size=0.33, random_state=42)

In [0]:
# Let us see the size of the train and test sets
X_train.shape, X_test.shape

((67, 4), (33, 4))
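
With `test_size=0.33` on 100 rows, 33 rows go to the test set and 67 to the training set. A plain random split does not guarantee that both classes are equally represented in each part. The sketch below shows the optional `stratify` argument of `train_test_split`, which keeps the class proportions roughly equal. It uses `load_iris` as a stand-in for the CSV file, which is an assumption of this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
mask = iris.target != 0                      # drop setosa, keep classes 1 and 2
X, y = iris.data[mask], iris.target[mask]

# stratify=y asks for the same class proportions in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

print(X_tr.shape, X_te.shape)
labels, counts = np.unique(y_te, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```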

In [0]:
# Let us see first five rows of the training data

X_train[:5]

array([[6. , 2.9, 4.5, 1.5],
       [6.8, 3.2, 5.9, 2.3],
       [5.7, 2.8, 4.5, 1.3],
       [6.5, 3. , 5.5, 1.8],
       [6.4, 3.2, 5.3, 2.3]])

In [0]:
y_train

array([1., 2., 1., 2., 2., 1., 1., 1., 1., 1., 2., 1., 1., 2., 1., 2., 1.,
       1., 1., 1., 1., 1., 2., 1., 2., 1., 2., 2., 2., 2., 1., 2., 2., 1.,
       2., 2., 2., 2., 1., 2., 1., 2., 2., 2., 1., 2., 2., 2., 2., 1., 1.,
       1., 2., 1., 1., 1., 2., 2., 2., 2., 2., 1., 2., 2., 1., 2., 2.])

### Training a Linear Classifier

In [0]:
linear_classifier = linear_model.SGDClassifier()

In [0]:
# Training or fitting the model with the train data
linear_classifier.fit(X_train,y_train)



SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=None,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty='l2',
       power_t=0.5, random_state=None, shuffle=True, tol=None,
       validation_fraction=0.1, verbose=0, warm_start=False)
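
The printed parameters show `random_state=None`, so the fitted weights, and therefore the score, can change from run to run. Stochastic gradient descent is also sensitive to feature scale. The following sketch is one possible variation and not the setup used in this experiment: it fixes the seed and standardises the features inside a pipeline. The data comes from `load_iris` as a stand-in for the CSV file.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
mask = iris.target != 0                      # versicolor (1) vs virginica (2)
X, y = iris.data[mask], iris.target[mask]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)

# The scaler is fitted on the training data only, then reused on the test data
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X_tr, y_tr)

predictions = model.predict(X_te)
score = model.score(X_te, y_te)
print(score)
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training.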

In [0]:
# Testing the trained model
linear_classifier.predict(X_test)

array([2., 2., 2., 1., 1., 1., 2., 2., 1., 1., 2., 1., 2., 2., 2., 1., 2.,
       2., 1., 1., 2., 2., 1., 1., 2., 1., 1., 2., 1., 2., 1., 1., 2.])
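
What makes the classifier linear is that each prediction comes from the sign of a weighted sum `w · x + b`. The sketch below recomputes that sum from the fitted `coef_` and `intercept_` attributes and compares it with `decision_function` and `predict`. It trains on `load_iris` data for two classes, an assumption made to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

iris = load_iris()
mask = iris.target != 0                      # versicolor (1) vs virginica (2)
X, y = iris.data[mask], iris.target[mask]

clf = SGDClassifier(random_state=0).fit(X, y)

# For two classes, coef_ has shape (1, n_features) and intercept_ has shape (1,)
scores = (X @ clf.coef_.T + clf.intercept_).ravel()

# A positive score maps to classes_[1], otherwise classes_[0]
manual = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(scores, clf.decision_function(X)))
print(np.array_equal(manual, clf.predict(X)))
```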

In [0]:
# Calculating the score
linear_classifier.score(X_test,y_test)

0.9090909090909091
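
For a classifier, `score` returns the accuracy, that is, the fraction of test samples predicted correctly. On 33 test samples, 0.909 corresponds to 30 correct predictions. The sketch below uses a tiny made-up pair of label arrays (not the results of this experiment) to show how accuracy is computed and how a confusion matrix breaks the errors down by class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only
y_true = np.array([1, 1, 2, 2, 2])
y_pred = np.array([1, 2, 2, 2, 1])

# Accuracy is the mean of the element-wise matches
accuracy = np.mean(y_true == y_pred)

# Rows are true classes, columns are predicted classes, in the order given by labels
cm = confusion_matrix(y_true, y_pred, labels=[1, 2])

print(accuracy, accuracy_score(y_true, y_pred))
print(cm)
```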

### Please answer the questions below to complete the experiment:

In [0]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good, But Not Challenging for me" #@param ["Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [0]:
#@title If it was very easy, what more you would have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "test" #@param {type:"string"}

In [0]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["Yes", "No"]

In [0]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 973
Date of submission:  10 Mar 2019
Time of submission:  12:07:34
View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions
For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.
