# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

The objective of this experiment is to understand how overfitting negatively impacts the performance of a model on new data.

In this experiment we will use the famous Iris data set. This is perhaps the best-known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day.

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 

#### Data Attributes

  1. sepal length in cm 
  2. sepal width in cm 
  3. petal length in cm 
  4. petal width in cm 
  5. class: 
     -- Iris Setosa  
     -- Iris Versicolour 
     -- Iris Virginica
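
The description above can be checked with a short sketch. It assumes scikit-learn's bundled copy of the data set, loaded with the same `datasets.load_iris()` call used later in this notebook; the variable name `class_counts` is only for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# iris.data holds the four measurements per flower,
# iris.target holds the class index (0, 1 or 2).
print(iris.data.shape)
print(iris.feature_names)
print(iris.target_names)

# Count how many samples belong to each class.
class_counts = np.bincount(iris.target)
print(class_counts)
```

The samples are stored class by class, which matters later when the data is split into train, test and validation sets.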

#### Overfitting

Overfitting refers to a model that models the training data too well.

Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.

In this experiment we use 2 features from the Iris dataset to visualise overfitting step by step:
  1. Plot the training error and the test error.
  2. Observe where overfitting starts in the plot.
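
To make the definition concrete, here is a minimal sketch on a synthetic toy problem (not the Iris experiment itself). It fits polynomials of increasing degree to 15 noisy points and reports the error on the training points and on fresh points. The data-generating function, the noise level and the degree range are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a smooth curve; the curve and noise level are arbitrary choices.
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(15)    # small training set, easy to memorise
x_test, y_test = make_data(200)     # fresh data the model never saw

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with these coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

degrees = range(1, 10)
train_mse, test_mse = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, d)   # least-squares fit of degree d
    train_mse.append(mse(coeffs, x_train, y_train))
    test_mse.append(mse(coeffs, x_test, y_test))

for d, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree {d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

The training error can only go down as the model becomes more flexible, because each higher degree contains the lower ones as a special case. The test error has no such guarantee, and the point where it starts rising again is where the model has begun to fit noise.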

#### Keywords

* Train and Test Error
* Overfitting
* Underfitting

#### Expected time : 30mins




In [0]:
#@title Experiment Explanation Video
from IPython.display import HTML

HTML("""<video width="500" height="300" controls>
  <source src="https://cdn.talentsprint.com/talentsprint/archives/sc/aiml/aiml_2018_blr_b6/cfus/week_8/module_2_week_8_experment_4.mp4" type="video/mp4">
</video>
""")

### Setup Steps

In [0]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "P19A06E_test" #@param {type:"string"}


In [0]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "981234567" #@param {type:"string"}


In [3]:
#@title Run this cell to complete the setup for this Notebook

from IPython import get_ipython
ipython = get_ipython()
  
notebook="BLR_M2W8_SAT_EXP_4" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
   
   print ("Setup completed successfully")
   return

def submit_notebook():
    
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getComplexity() and getAdditional() and getConcepts():
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "id" : Id, "file_hash" : file_hash, "notebook" : notebook}

      r = requests.post(url, data = data)
      print("Your submission is successful.")
      print("Ref Id:", submission_id)
      print("Date of submission: ", datetime.datetime.now().date().strftime("%d %b %Y"))
      print("Time of submission: ", datetime.datetime.now().time().strftime("%H:%M:%S"))
      print("View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions")
      print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if Additional: return Additional      
    else: raise NameError('')
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
  
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully



In [0]:
## Importing required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

Loading the dataset from sklearn package

In [0]:
# Loading iris dataset from sklearn
iris = datasets.load_iris()
## Storing only 2 features 
X = iris.data[:,(0,2)]
## Storing the target data
Y = iris.target

#### Exercise 1 

Split the data into train, test and validation sets.

In [0]:
## Hint : you can use np.split
X_train, X_test, X_validation = ???
Y_train, Y_test, Y_validation = ???
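
One possible way to approach this exercise is sketched below. It is not the reference solution. The 60/20/20 ratio, the seed and the names `order` and `cuts` are assumptions. Because the Iris samples are stored class by class, the sketch shuffles the rows before calling `np.split`; slicing without shuffling would leave some classes out of the training set.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]   # sepal length and petal length
Y = iris.target

# Shuffle features and labels with the same permutation so they stay aligned.
rng = np.random.default_rng(42)
order = rng.permutation(len(X))
X_shuffled, Y_shuffled = X[order], Y[order]

# Cut points at 60% and 80% of the rows: 60% train, 20% test, 20% validation.
n = len(X)
cuts = [n * 60 // 100, n * 80 // 100]
X_train, X_test, X_validation = np.split(X_shuffled, cuts)
Y_train, Y_test, Y_validation = np.split(Y_shuffled, cuts)

print(X_train.shape, X_test.shape, X_validation.shape)
```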

In [0]:
## Linear function
def linf(m, x):
    return np.matmul(x,m)

def one_step(x, y, m, eta):
    # Predict with the current weights
    ypred = linf(m, x)
    # Sum of squared errors, the loss whose gradient is used below
    error = np.sum((y - ypred)**2)
    # Gradient of the loss with respect to m
    delta_m = -2*np.matmul(x.T,(y - ypred))
    # Gradient-descent update with learning rate eta
    m = m - (delta_m * eta)
    return m, error
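
The update in `one_step` uses `-2 * x.T @ (y - x @ m)`, which is the gradient of the sum of squared errors with respect to `m`. The sketch below compares that expression against a finite-difference estimate on random data. The helper names `sse` and `analytic_grad` and the random test data are assumptions introduced only for this check.

```python
import numpy as np

def sse(m, x, y):
    # Sum of squared errors for the linear model x @ m.
    return float(np.sum((y - np.matmul(x, m)) ** 2))

def analytic_grad(m, x, y):
    # Same expression as delta_m in one_step.
    return -2 * np.matmul(x.T, y - np.matmul(x, m))

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.normal(size=(20, 1))
m = rng.uniform(-1, 1, size=(3, 1))

# Central finite differences, one weight at a time.
eps = 1e-6
numeric_grad = np.zeros_like(m)
for i in range(m.shape[0]):
    step = np.zeros_like(m)
    step[i, 0] = eps
    numeric_grad[i, 0] = (sse(m + step, x, y) - sse(m - step, x, y)) / (2 * eps)

max_diff = float(np.max(np.abs(numeric_grad - analytic_grad(m, x, y))))
print("largest difference:", max_diff)
```

A difference close to zero indicates that the update direction matches the loss being reported.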

#### Exercise 2 

Calculate the test error

In [0]:
num_feat = len(X_train[0]) 
#Initializing m with random values
m = np.random.uniform(-1,1,(num_feat+1,1))
# Learning rate
eta = 2e-4
train_errs = []
test_errs = []
#reshaping the size of Y_test array
Y_test = np.reshape(Y_test, (Y_test.shape[0],1))
#reshaping the size of Y_train array
Y_train = np.reshape(Y_train, (Y_train.shape[0],1))
## adding additional ones to X_train and X_test arrays
X_train=np.hstack( (X_train,np.ones((X_train.shape[0],1)))) 
X_test=np.hstack( (X_test,np.ones((X_test.shape[0],1)))) 

for times in range(50):
    ## Calling the function
    m, error = one_step(X_train, Y_train, m, eta)
    if times%1==0:
        # appending the training error to train_errs
        train_errs.append(error)
        # Calculating the test errors and appending them to test_errs
        test_errs.append(???)
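
A hedged sketch of how the test error could be tracked is shown below. It uses synthetic stand-in data so that it runs without the earlier cells. The helper `sse`, the function `make_split` and the true weights are assumptions, not part of the notebook. The test error is computed with the same sum-of-squared-errors formula as the training error, and it is evaluated before each update so that both errors describe the same weights.

```python
import numpy as np

def linf(m, x):
    return np.matmul(x, m)

def sse(m, x, y):
    # Sum of squared errors of the linear model on (x, y).
    return float(np.sum((y - linf(m, x)) ** 2))

def one_step(x, y, m, eta):
    residual = y - linf(m, x)
    error = float(np.sum(residual ** 2))        # error before the update
    m = m - eta * (-2 * np.matmul(x.T, residual))
    return m, error

rng = np.random.default_rng(1)

def make_split(n):
    # Two random features plus a column of ones for the bias term.
    feats = rng.normal(size=(n, 2))
    x = np.hstack((feats, np.ones((n, 1))))
    true_m = np.array([[1.0], [-2.0], [0.5]])   # arbitrary weights for the toy data
    y = np.matmul(x, true_m) + rng.normal(0, 0.1, (n, 1))
    return x, y

X_train, Y_train = make_split(60)
X_test, Y_test = make_split(20)

m = rng.uniform(-1, 1, (3, 1))
eta = 2e-4
train_errs, test_errs = [], []
for _ in range(50):
    # Test error for the current weights, then one training step.
    test_errs.append(sse(m, X_test, Y_test))
    m, error = one_step(X_train, Y_train, m, eta)
    train_errs.append(error)

print(train_errs[0], train_errs[-1])
print(test_errs[0], test_errs[-1])
```

Because these errors are sums, they grow with the number of samples. Dividing each by its set size makes the train and test curves comparable when the two sets differ in size.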

In [0]:
## Plotting the train_errs and test_errs
plt.plot(train_errs)
plt.plot(test_errs)
plt.legend(["Train","Test"])
plt.show()

In [0]:
print('\nMinimum Training Error occurs at iteration {}.'.format(int(np.argmin(train_errs))))
print('Minimum Testing Error occurs at iteration {}.\n'.format(int(np.argmin(test_errs))))
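
The index returned by `np.argmin` is the position in the error list, that is, the training step at which the error was lowest. The sketch below shows how that index can be read as the point where overfitting starts. The error values are made up purely for illustration and are not results from this experiment.

```python
import numpy as np

# Hypothetical error curves: training error keeps falling,
# test error falls and then rises again.
train_errs = [9.0, 6.0, 4.0, 3.0, 2.5, 2.2, 2.0]
test_errs = [9.5, 6.8, 5.0, 4.6, 4.9, 5.5, 6.3]

# Step with the lowest test error; later steps only help the training set.
best_iter = int(np.argmin(test_errs))

# Gap between test and training error from that step onwards.
gap = np.array(test_errs) - np.array(train_errs)
print("lowest test error at step", best_iter)
print("train/test gap from that step on:", gap[best_iter:])
```

Stopping training at `best_iter` is the idea behind early stopping. In practice the validation set, rather than the test set, would normally be used to pick that step.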

#### Exercise 3

Vary the train, test and validation ratios and observe how overfitting changes.
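
A small helper can make it easier to try several ratios. The sketch below is one possible shape for it. The function name `split_by_ratio`, its parameters and the list of ratios are assumptions, and stand-in arrays with the same shape as the two-feature Iris data are used so the block runs on its own.

```python
import numpy as np

def split_by_ratio(X, Y, train_frac, test_frac, seed=0):
    # Shuffle, then cut into train / test / validation.
    # Whatever is left after train and test becomes the validation set.
    n = len(X)
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    n_test = int(round(test_frac * n))
    cuts = [n_train, n_train + n_test]
    return np.split(X[order], cuts), np.split(Y[order], cuts)

# Stand-in data: 150 rows, 2 features, 3 classes of 50.
X = np.arange(300, dtype=float).reshape(150, 2)
Y = np.repeat([0, 1, 2], 50)

sizes = {}
for train_frac, test_frac in [(0.8, 0.1), (0.6, 0.2), (0.4, 0.3), (0.2, 0.4)]:
    X_parts, Y_parts = split_by_ratio(X, Y, train_frac, test_frac)
    sizes[(train_frac, test_frac)] = [len(part) for part in X_parts]
    print(train_frac, test_frac, sizes[(train_frac, test_frac)])
```

Rerunning the training loop for each split and comparing the train and test curves shows how a smaller training set changes the gap between them.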

In [0]:
#### Your code here

### Please answer the questions below to complete the experiment:




In [0]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging me" #@param ["Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging me", "Was Tough, but I did it", "Too Difficult for me"]


In [0]:
#@title If it was very easy, what more would you have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "test" #@param {type:"string"}

In [0]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["Yes", "No"]

In [10]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 7101
Date of submission:  22 Dec 2018
Time of submission:  03:43:52
View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions
For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.
