
# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objectives

At the end of the experiment, you will be able to:


*   understand how overfitting to the training data negatively impacts the performance of the model on new data



In [None]:
#@title Experiment Explanation Video
from IPython.display import HTML

HTML("""<video width="800" height="300" controls>
  <source src="https://cdn.talentsprint.com/talentsprint1/archives/sc/misc/overfitting_iris.mp4" type="video/mp4">
</video>
""")



## Dataset

#### History

This is a multivariate dataset introduced by R. A. Fisher (often called the father of modern statistics) to showcase linear discriminant analysis. It is arguably the best-known dataset in the pattern recognition literature.


The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 

#### Description
The Iris dataset consists of 150 data instances. There are 3 classes (Iris Versicolor, Iris Setosa, and Iris Virginica), each with 50 instances.


For each flower, the following attributes are recorded:

- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
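
The description above can be checked directly against the copy of the dataset that ships with scikit-learn. This is a minimal sketch, assuming scikit-learn is installed; the variable name `class_counts` is introduced here for illustration only.

```python
from collections import Counter

from sklearn import datasets

# Load the copy of the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()

# 150 instances, 4 attributes per instance
print(iris.data.shape)

# Attribute names and class names as stored by scikit-learn
print(iris.feature_names)
print(iris.target_names)

# Number of instances per class label (0, 1, 2)
class_counts = Counter(iris.target.tolist())
print(class_counts)
```

Each of the three class labels should appear 50 times, matching the description.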


## Domain Information



Iris plants are flowering plants with showy flowers. They are very popular among movie directors because they make an excellent backdrop.

They are predominantly found in dry, semi-desert, or colder rocky mountainous areas in Europe and Asia. They have long, erect flowering stems and can produce white, yellow, orange, pink, purple, lavender, blue, or brown colored flowers. There are 260 to 300 types of iris.

![alt text](https://cdn-images-1.medium.com/max/1275/1*7bnLKsChXq94QjtAiRn40w.png)

As you can see, the flowers have 3 sepals and 3 petals. The sepals usually spread or droop downwards, and the petals stand upright, partly behind the sepal bases. However, the length and width of the sepals and petals vary for each type.


## AI / ML Technique

Overfitting refers to a model that models the training data too well.

Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.

In this experiment we are going to use 2 features from Iris Dataset to visualise Overfitting step by step.
  1. Plot train error and test error
  2. Observe when the overfitting starts in the plot.
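
Before turning to Iris, the idea can be illustrated on synthetic data. The sketch below is not part of the experiment: it swaps in polynomial curve fitting (via `np.polyfit`) on a hypothetical noisy sine curve, and all names and values in it are assumptions chosen for illustration. A degree-9 polynomial fitted to 10 training points can pass through every point, so its training error collapses while its error on fresh points does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a sine curve plus Gaussian noise
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = rng.uniform(0.0, 1.0, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.2, x_test.size)


def mse(coeffs, x, y):
    """Mean squared error of the polynomial `coeffs` on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 9):
    # Least-squares polynomial fit on the training points only
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print("degree", degree, "train MSE", results[degree][0], "test MSE", results[degree][1])
```

The more flexible model always fits the training points at least as well; the point to look at is the gap between its training error and its test error.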

### Setup Steps

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2100121" #@param {type:"string"}


In [None]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "5142192291" #@param {type:"string"}


In [None]:
#@title Run this cell to complete the setup for this Notebook  
from IPython import get_ipython

ipython = get_ipython()
  
notebook="U3W14_28_Overfitting_Iris_C" #name of the notebook
def setup():
#  ipython.magic("sx pip3 install torch")  
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None
    
    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getWalkthrough() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook, "feedback_walkthrough":Walkthrough ,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:        
        print(r["err"])
        return None   
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aiml.iiith.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if not Additional: 
      raise NameError
    else:
      return Additional  
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None
  
  
def getWalkthrough():
  try:
    if not Walkthrough:
      raise NameError
    else:
      return Walkthrough
  except NameError:
    print ("Please answer Walkthrough Question")
    return None
  
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None
  

def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError 
    else: 
      return Answer
  except NameError:
    print ("Please answer Question")
    return None
  

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup() 
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Importing the required packages

In [1]:
# Importing required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

### Loading the dataset from sklearn package and split into train and test data

In [None]:
# Loading iris dataset from sklearn
iris = datasets.load_iris()

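# Storing only 2 features: sepal length (column 0) and petal length (column 2)
X = iris.data[:,(0,2)]

# Storing the target data
Y = iris.target

# Using the first 45 samples for training and the remaining 105 for testing
X_train = X[:45,:]
Y_train = Y[:45]
X_test = X[45:,:]
Y_test = Y[45:]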
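
It may help to look at which classes end up on each side of this split. The sketch below assumes the labels are stored in class order (50 zeros, then 50 ones, then 50 twos), which is how `load_iris` returns `iris.target`; it rebuilds that label array with NumPy instead of loading the dataset. The shuffled split at the end is one common alternative shown only for comparison, not the method used in this experiment.

```python
import numpy as np

# Assumption: labels in class order, as returned by sklearn's load_iris
Y = np.repeat([0, 1, 2], 50)

# Same slicing as the split used in this experiment
Y_train, Y_test = Y[:45], Y[45:]
train_counts = np.bincount(Y_train, minlength=3).tolist()
test_counts = np.bincount(Y_test, minlength=3).tolist()
print("train class counts:", train_counts)  # [45, 0, 0]
print("test class counts:", test_counts)    # [5, 50, 50]

# For comparison only: a shuffled split of the same size
rng = np.random.default_rng(0)
shuffled = rng.permutation(len(Y))
shuffled_counts = np.bincount(Y[shuffled[:45]], minlength=3).tolist()
print("shuffled train class counts:", shuffled_counts)
```

With the ordered slice, the training set contains a single class while the test set is dominated by the other two, which goes a long way toward explaining the gap between training and test error seen later. scikit-learn's `train_test_split` provides `shuffle` and `stratify` options for building more balanced splits.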

### Linear Function

Define a function that takes the slope vector $m$ and the inputs $x$ and returns the output $y = f(x) = xm$, the matrix product of the inputs and the slope.

In [2]:
# Linear function
def linf(m, x):
    return np.matmul(x,m)
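
A small usage sketch of `linf`, with made-up inputs and slopes, shows the shapes involved: an input array of shape `(n, 2)` times a slope array of shape `(2, 1)` gives predictions of shape `(n, 1)`. Note that the function has no intercept term. The numbers below are hypothetical.

```python
import numpy as np


def linf(m, x):
    # Same linear function as in the experiment: matrix product of inputs and slope
    return np.matmul(x, m)


# Two hypothetical samples with two features each
x = np.array([[5.1, 1.4],
              [7.0, 4.7]])

# One slope per feature, stored as a column vector
m = np.array([[0.5],
              [-1.0]])

y = linf(m, x)
print(y.shape)  # (2, 1)
print(y)        # 5.1*0.5 - 1.4 = 1.15 and 7.0*0.5 - 4.7 = -1.2
```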

### Model evaluation



*   Get the predictions by applying the linear function.
*   Calculate the error by comparing actuals with predictions.
*   Calculate the change in slope and update the values.



In [None]:
def one_step(x, y, m, eta):
    #Predicting the values
    ypred = linf(m, x)
    
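    # Calculating the error: L2 norm of the element-wise squared residuals
    error = np.linalg.norm((y - ypred)**2)

    # Calculating the delta value: gradient of the sum of squared residuals with respect to m
    delta_m = -2*np.matmul(x.T,(y - ypred))

    # Updating m value with a gradient-descent step of size eta
    m = m - (delta_m * eta)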
    return m, error
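
One way to gain confidence in the `delta_m` expression is a finite-difference check. The sketch below assumes `delta_m` is meant to be the gradient of the sum of squared residuals; the helper names `sse` and `analytic_grad` and the random test data are hypothetical and not part of the experiment.

```python
import numpy as np


def sse(m, x, y):
    """Sum of squared residuals for slope m."""
    return float(np.sum((y - np.matmul(x, m)) ** 2))


def analytic_grad(m, x, y):
    """Same expression as delta_m in one_step."""
    return -2 * np.matmul(x.T, y - np.matmul(x, m))


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))
y = rng.normal(size=(6, 1))
m = rng.normal(size=(2, 1))

# Central finite differences, one coordinate of m at a time
eps = 1e-6
numeric_grad = np.zeros_like(m)
for j in range(m.shape[0]):
    step = np.zeros_like(m)
    step[j, 0] = eps
    numeric_grad[j, 0] = (sse(m + step, x, y) - sse(m - step, x, y)) / (2 * eps)

max_diff = float(np.max(np.abs(numeric_grad - analytic_grad(m, x, y))))
print("largest difference between numeric and analytic gradient:", max_diff)
```

A difference close to zero indicates that the analytic expression and the numerical estimate agree.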

### Calculate the train and test errors

In [None]:
num_feat = len(X_train[0]) 

np.random.seed(17)

# Initializing m with random values
m = np.random.uniform(-1,1,(num_feat,1))

# Learning rate
eta = 2e-4

train_errs = []
test_errs = []

# Reshaping the size of Y_train array 
Y_train = Y_train.reshape(-1,1) 

# Reshaping the size of Y_test array
Y_test = Y_test.reshape(-1,1)  
 
for i in range(50):
    # Calling the function
    m, error = one_step(X_train, Y_train, m, eta)
    
    # Appending the training error
    train_errs.append(error)

    # Calculating the test errors using the updated m value
    test_errs.append(np.linalg.norm((Y_test - linf(m, X_test))**2))

### Printing the training and testing errors

In [None]:
error = pd.DataFrame([train_errs, test_errs]).T
error

|   | 0 | 1 |
|---|---|---|
| 0 | 26.41712 | 65.033739 |
| 1 | 6.81591 | 34.822766 |
| 2 | 1.770713 | 23.311562 |
| 3 | 0.471133 | 18.440719 |
| 4 | 0.135416 | 16.217633 |
| 5 | 0.047263 | 15.155377 |
| 6 | 0.022629 | 14.635696 |
| 7 | 0.015211 | 14.379447 |
| 8 | 0.013028 | 14.253812 |
| 9 | 0.012418 | 14.193661 |
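
The experiment description mentions plotting the train and test errors. A minimal plotting sketch follows, using only the ten rows shown in the output above (column 0 is the training error, column 1 the test error). The log scale and the axis labels are choices made here for readability, not taken from the experiment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# First ten iterations, copied from the output above
train_errs = [26.41712, 6.81591, 1.770713, 0.471133, 0.135416,
              0.047263, 0.022629, 0.015211, 0.013028, 0.012418]
test_errs = [65.033739, 34.822766, 23.311562, 18.440719, 16.217633,
             15.155377, 14.635696, 14.379447, 14.253812, 14.193661]

fig, ax = plt.subplots()
ax.plot(train_errs, label="train error")
ax.plot(test_errs, label="test error")
ax.set_xlabel("iteration")
ax.set_ylabel("error")
ax.set_yscale("log")  # the two curves differ by orders of magnitude
ax.legend()
```

In the notebook itself, the full 50-element `train_errs` and `test_errs` lists from the training loop can be passed to `ax.plot` in place of the copied values.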


In [None]:
print('\nMinimum Training Error occurs at {}'.format(int(np.argmin(train_errs))))
print('Minimum Testing Error occurs at {}\n'.format(int(np.argmin(test_errs))))


Minimum Training Error occurs at 49
Minimum Testing Error occurs at 12
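
Read together, these two numbers suggest where overfitting sets in: the training error keeps falling until the last iteration, while the test error is lowest at iteration 12 and does not improve afterwards. The sketch below shows the same reading on short, hypothetical error curves; the helper `overfitting_onset` and the values are made up for illustration.

```python
def overfitting_onset(test_errs):
    """Return the index of the lowest test error (first one if tied)."""
    return min(range(len(test_errs)), key=test_errs.__getitem__)


# Hypothetical error curves: training error keeps falling, test error turns upward
train_errs = [9.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.12]
test_errs = [12.0, 8.0, 6.5, 6.0, 6.2, 6.6, 7.1]

onset = overfitting_onset(test_errs)
print("test error is lowest at iteration", onset)  # 3

# Gap between test and training error at each iteration
gap = [te - tr for tr, te in zip(train_errs, test_errs)]
print("gap after the onset:", gap[onset:])
```

Stopping training at the iteration where the test (or validation) error is lowest is the idea behind early stopping.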



### Please answer the questions below to complete the experiment:




In [None]:
#@title Over-fitting occurs when the variance is high and the model is complicated with lots of unnecessary curves and angles { run: "auto", form-width: "500px", display-mode: "form" }
Answer= "TRUE" #@param ["","TRUE","FALSE"]


In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good, But Not Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "non" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [None]:
#@title  Experiment walkthrough video? { run: "auto", vertical-output: true, display-mode: "form" }
Walkthrough = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 10933
Date of submission:  06 Dec 2020
Time of submission:  16:01:26
View your submissions: https://aiml.iiith.talentsprint.com/notebook_submissions
