# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objectives


At the end of the experiment, you will be able to:
- Understand the CIFAR-10 dataset
- Experiment with the perceptron algorithm
- Perform multi-class classification using a linear classifier (One vs Rest)

In [0]:
#@title Experiment Walkthrough
from IPython.display import HTML

HTML("""<video width="420" height="240" controls>
  <source src= "https://cdn.talentsprint.com/talentsprint/archives/sc/aiml/aiml_2018_b7_hyd/experiment_details_backup/experiment_perceptron.mp4" type="video/mp4">
</video>
""")

## Dataset

#### Description

In this experiment, we will use the CIFAR-10 dataset. It consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.


The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5,000 images from each class.

**Unpickling a data file returns its contents as a dictionary.**

There are 8 files in the CIFAR-10 directory. All of them except readme.html are pickled (to learn more about pickle, refer to the **Python_Pickle_Introduction** notebook).

1. batches.meta
2. data_batch_1
3. data_batch_2
4. data_batch_3
5. data_batch_4
6. data_batch_5
7. readme.html
8. test_batch

Getting into details of this dataset:


**data** : A 10,000x3072 numpy array of unsigned 8-bit integers (`uint8`) in each batch file; stacking the five training batches gives a 50,000x3072 array. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.

**labels** : A list of 10,000 numbers in the range 0-9 in each batch file (50,000 across the five training batches). The number at index i is the label of the ith image in the array data.
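
To make this layout concrete, here is a minimal sketch that builds one synthetic row in the same channel-by-channel format and recovers the three colour planes. The array `row` and its values are made up for illustration; they are not taken from the dataset.

```python
import numpy as np

# Synthetic stand-in for one row of `data`: 3072 uint8 values
# (1024 red, then 1024 green, then 1024 blue). Values are made up.
row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),   # red plane
    np.full(1024, 20, dtype=np.uint8),   # green plane
    np.full(1024, 30, dtype=np.uint8),   # blue plane
])

# Each plane is stored row-major, so reshaping its 1024 values to (32, 32)
# puts image rows along the first axis.
red = row[0:1024].reshape(32, 32)
green = row[1024:2048].reshape(32, 32)
blue = row[2048:3072].reshape(32, 32)

# Stack the planes along a new last axis to get the
# (height, width, channel) layout used for display.
img = np.stack([red, green, blue], axis=-1)

print(img.shape)   # (32, 32, 3)
print(img[0, 0])   # [10 20 30]
```

The result is the same as `row.reshape(3, 32, 32).transpose([1, 2, 0])`, which is the form used later in `visualize_image`.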



### Data Source

https://www.cs.toronto.edu/~kriz/cifar.html

#### Perceptron


A perceptron has one or more inputs, a bias, an activation function, and a single output. It multiplies each input by a weight, adds the weighted inputs and the bias together, and passes the sum through the activation function (a step function in the classic perceptron) to produce the output.
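
As a rough illustration of that description, the sketch below implements a single perceptron with a step activation and the classic mistake-driven update rule in NumPy. The function names, the learning rate, and the toy AND data are assumptions made for this example; the experiment itself uses scikit-learn's `Perceptron`.

```python
import numpy as np

def step(z):
    # Step activation: output 1 when the weighted sum is positive, else 0
    return np.where(z > 0, 1, 0)

def perceptron_output(x, w, b):
    # Weighted sum of the inputs plus the bias, passed through the activation
    return step(np.dot(x, w) + b)

def train_perceptron(X, y, epochs=20, lr=1.0):
    # Perceptron rule: the weights change only on misclassified samples,
    # because the error term is zero whenever the prediction is correct
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - perceptron_output(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Toy, linearly separable data: logical AND (illustrative only)
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_toy = np.array([0, 0, 0, 1])

w, b = train_perceptron(X_toy, y_toy)
print(perceptron_output(X_toy, w, b))   # [0 0 0 1]
```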


## Keywords



*   Perceptron
*   Linear Classifier
*   CIFAR-10
*   Multi-class Classification

### Setup Steps

In [0]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "P181902118" #@param {type:"string"}


In [0]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "8860303743" #@param {type:"string"}


In [0]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook="BLR_M1W3E6_Perceptron_CIFAR10" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch") 
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/week3/Exp3/AIML_DS_CIFAR-10_STD.zip")
    ipython.magic("sx unzip AIML_DS_CIFAR-10_STD.zip")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts():
      # Read the exported notebook and close the file when done
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      print("Your submission is successful.")
      print("Ref Id:", submission_id)
      print("Date of submission: ", r["date"])
      print("Time of submission: ", r["time"])
      print("View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions")
      print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else:
      return submission_id
    

def getAdditional():
  try:
    if Additional: return Additional      
    else: raise NameError('')
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getAnswer():
  try:
    return Answer
  except NameError:
    print ("Please answer Question")
    return None

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
    from IPython.display import HTML
    HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id))
  
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Expected time to complete experiment is 60 min

In [0]:
# Importing required packages
import numpy as np
import matplotlib.pyplot as plt
import os
import scipy.io as sio
import itertools
import operator
import random
import collections
from scipy import stats
from sklearn.metrics import accuracy_score

#### Function to unpickle the data

In [0]:
import pickle
def unpickle(file):
    with open(file, 'rb') as fo:
        dict_1 = pickle.load(fo, encoding='Latin1')
    return dict_1
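
For readers new to pickle, the following is a minimal round-trip sketch using only the standard library. The dictionary `toy_batch` is a made-up miniature of a batch file, not real CIFAR-10 content. As far as the `pickle` documentation describes it, the `encoding` argument only affects how strings from pickles written by Python 2 are decoded, so passing it here is harmless.

```python
import os
import pickle
import tempfile

# Hypothetical miniature "batch": same kind of dictionary, made-up contents
toy_batch = {"data": [[0, 1, 2], [3, 4, 5]], "labels": [7, 5]}

# Write the dictionary to a temporary file in binary mode
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pickle.dump(toy_batch, tmp)
    path = tmp.name

# Read it back the same way unpickle() does
with open(path, "rb") as fo:
    restored = pickle.load(fo, encoding="latin1")

os.remove(path)
print(restored == toy_batch)   # True
```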

### Visualizing the images in CIFAR-10 Dataset


When you pass the path of a pickled batch file to the get_data function, it returns the features, labels, and file names of that batch, along with the list of class names read from batches.meta.

In [0]:
def get_data(file):
  dict_1 = unpickle(file)
  X = np.asarray(dict_1['data']).astype("uint8")
  Y = np.asarray(dict_1['labels'])
  names = np.asarray(dict_1['filenames'])
  list_class=(unpickle("AIML_DS_CIFAR-10_STD/batches.meta")['label_names'])
  return X,Y,names,list_class

In [0]:
# Function to visualize the data
def visualize_image(X, Y, names, image_id,size=(5,5)):
    rgb = X[image_id,:]
    plt.figure(figsize = size)
    img = rgb.reshape(3, 32, 32).transpose([1, 2, 0])
    print(img.shape)
    plt.grid(False)
    plt.imshow(img)
    plt.title(names[image_id])
    plt.show()

In [0]:
# Modified version of visualize_image (replaces the definition above):
# prints raw channel values for inspection; the plotting calls are commented out
def visualize_image(X, Y, names, image_id,size=(5,5)):
    rgb = X[image_id,:]
    plt.figure(figsize = size)
    reshaped = rgb.reshape(3, 32, 32)
    
    print(reshaped[0][0][0], reshaped[1][0][0], reshaped[2][0][0])
    print(reshaped[0][0][1], reshaped[1][0][1], reshaped[2][0][1])
    
    img = rgb.reshape(3, 32, 32).transpose([1, 2, 0])
    print("img = ", img[:2, :5], len(img), len(img[0]), len(img[0][0]))
#     print("type(rgb.reshape(3, 32, 32)) = ", type(rgb.reshape(3, 32, 32)))
#     print(img.shape)
#     plt.grid(False)
#     plt.imshow(img)
#     plt.title(names[image_id])
#     plt.show()

In [0]:
# Read 10000 images -- from batch 3
X, Y, names, classes = get_data("AIML_DS_CIFAR-10_STD/data_batch_3")
# Pick the image at index 10 (the 11th image in the batch)
pick = 10
print("Class =",classes[Y[pick]])
visualize_image(X, Y, names, pick,size=(0.3,0.3)) # at this size the displayed image would be blurred
visualize_image(X, Y, names, pick,size=(3,3))

Class = horse
206 232 235
210 239 244
img =  [[[206 232 235]
  [210 239 244]
  [207 239 241]
  [210 237 244]
  [208 238 244]]

 [[214 238 244]
  [220 246 254]
  [218 247 252]
  [221 243 255]
  [220 244 255]]] 32 32 3


<Figure size 21.6x21.6 with 0 Axes>

In [0]:
# Modified cell: inspect the raw features and the label distribution

from collections import Counter


# Read 10000 images -- from batch 3
X, Y, names, classes = get_data("AIML_DS_CIFAR-10_STD/data_batch_3")
print("X[:10] = ", X[:10], len(X), len(X[0]))
CC = Counter(X[0])
print("CC for X[0] = ", sorted(CC.items()), "\n\n")

c = Counter(Y)
print("sorted(c.items()) for Y = ",  sorted(c.items()))
# Pick the image at index 10 (the 11th image in the batch)
pick = 10
print("Y = ",  Y, len(Y), "\n\n")

print("classes =", classes)

print("Y[pick] = ", Y[pick])
print("Class =",classes[Y[pick]])

visualize_image(X, Y, names, pick,size=(0.3,0.3)) # at this size the displayed image would be blurred
visualize_image(X, Y, names, pick,size=(3,3))

In [0]:
# Small array to explore how reshape lays out values

a = np.arange(30).reshape(5, 2, 3)
print(a)


[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]]

 [[24 25 26]
  [27 28 29]]]
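
A similarly small array helps show what `transpose([1, 2, 0])` does in `visualize_image`. The sketch below uses a made-up 3x2x2 array standing in for (channel, row, column) data; the shapes and values are illustrative only.

```python
import numpy as np

# Made-up "image" with 3 channels, 2 rows, 2 columns, stored channel-first
chw = np.arange(12).reshape(3, 2, 2)

# Move the channel axis to the end: (channel, row, col) -> (row, col, channel)
hwc = chw.transpose([1, 2, 0])

print(hwc.shape)   # (2, 2, 3)
print(hwc[0, 0])   # [0 4 8]
```

After the transpose, `hwc[r, c]` holds the three channel values of the pixel at row `r` and column `c`, which is the layout `plt.imshow` expects for colour images.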


**NOTE:**

**The images you see above are pixelated, and hence they look blurred.** (Pixelation happens when a low-resolution image is displayed on a larger canvas, such as a large screen, so that each pixel is drawn as a visible block. You can read more about it at https://whatis.techtarget.com/definition/pixelation.) This does not affect the predictions of your machine learning algorithm, because the algorithm works on the stored pixel values, not on how the image is rendered on screen.

In [0]:
from sklearn.linear_model import Perceptron

def predict(train_features,test_features,train_labels): 
  clf = Perceptron(tol=1e-3, random_state=0)
  # Fit the model to the training data
  clf.fit(train_features, train_labels)
  # Predicting the labels for test data
  predicted_values = clf.predict(test_features)
  return predicted_values


**Let us define a function to calculate accuracy score.**

In [0]:
from sklearn.metrics import accuracy_score
def calc_accuracy(train_features,test_features,train_labels,test_labels):
    # Calling predict function to get the predicted labels of test data
    pred = predict(train_features,test_features,train_labels)
    # accuracy_score expects the true labels first, then the predictions
    return accuracy_score(test_labels, pred)
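
With its default settings, `accuracy_score` returns the fraction of samples whose predicted label matches the true label. A minimal sketch of the same calculation in NumPy, using made-up labels for classes 5 and 7:

```python
import numpy as np

# Made-up true labels and predictions (illustrative only)
y_true = np.array([5, 7, 7, 5])
y_pred = np.array([5, 7, 5, 5])

# Accuracy = number of correct predictions / total number of predictions
accuracy = np.mean(y_true == y_pred)
print(accuracy)   # 0.75
```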
  

**Now let us unpickle the data and labels from the CIFAR-10 dataset and divide them into training and testing sets.**

In [0]:
train_features = []
train_labels = []
# Read all training features and labels
for j in "12345": 
    batch_file = 'AIML_DS_CIFAR-10_STD/data_batch_'+ j
    x_train, y_train, names_train, classes_train = get_data(batch_file)
    train_features.extend(x_train)
    train_labels.extend(y_train)

train_features = np.asarray(train_features)
train_labels = np.asarray(train_labels)

# Read all test features and labels
test_features, test_labels, names_test, classes_test = get_data("AIML_DS_CIFAR-10_STD/test_batch")

In [0]:
test_labels.shape, train_labels.shape, test_features.shape, train_features.shape

((10000,), (50000,), (10000, 3072), (50000, 3072))

In [0]:
# Function to extract the classes
def extract_2classes(class0, class1, X, Y):
    # Select class #0
    X_0 = X[Y == class0]
    Y_0 = Y[Y == class0]
    # Select class #1
    X_1 = X[Y == class1]
    Y_1 = Y[Y == class1]
    # Join the two classes to make the set
    X_2classes = np.vstack((X_0, X_1))
    Y_2classes = np.append(Y_0, Y_1)
    return X_2classes, Y_2classes
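
The selection in `extract_2classes` relies on NumPy boolean masks. The sketch below shows the idea on a tiny made-up dataset; `X_demo` and `Y_demo` are hypothetical names. It also shows `np.isin` as an alternative (not what the function above uses) that picks both classes in one step and keeps the samples in their original order.

```python
import numpy as np

# Tiny made-up dataset: 5 samples with 2 features each
X_demo = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
Y_demo = np.array([5, 3, 7, 5, 7])

# Boolean mask: True where the label equals 5
mask = Y_demo == 5
print(X_demo[mask])   # rows 0 and 3

# Alternative: select classes 5 and 7 together, preserving sample order
both = np.isin(Y_demo, [5, 7])
print(Y_demo[both])   # [5 7 5 7]
```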

In [0]:
# Select classes #5 and #7
X_train_2classes, Y_train_2classes = extract_2classes(5, 7, train_features, train_labels)
X_test_2classes, Y_test_2classes = extract_2classes(5, 7,test_features, test_labels)

In [0]:
calc_accuracy(X_train_2classes,X_test_2classes,Y_train_2classes,Y_test_2classes)

0.7235
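
The experiment above separates two classes. The One vs Rest idea from the learning objectives extends this to several classes: train one binary perceptron per class (that class against all the others), then predict the class whose perceptron gives the highest score. The following is a minimal NumPy sketch under stated assumptions: the function names are hypothetical, the data is a small synthetic set of three well-separated clusters rather than CIFAR-10, and this is not how scikit-learn implements it internally. scikit-learn's `Perceptron` accepts labels with more than two classes directly.

```python
import numpy as np

def train_binary_perceptron(X, y, epochs=1000):
    # y holds +1 for the target class and -1 for the rest
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified sample
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:                        # every sample is correct
            break
    return w, b

def train_one_vs_rest(X, y):
    # One (weights, bias) pair per class, each trained as "class k vs rest"
    classes = np.unique(y)
    models = [train_binary_perceptron(X, np.where(y == k, 1, -1))
              for k in classes]
    return classes, models

def predict_one_vs_rest(X, classes, models):
    # Score every sample with every binary model and pick the highest score
    scores = np.column_stack([X @ w + b for w, b in models])
    return classes[np.argmax(scores, axis=1)]

# Synthetic data: three small, well-separated 2-D clusters (made up)
centers = np.array([[0.0, 10.0], [10.0, -10.0], [-10.0, -10.0]])
offsets = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0],
                    [0.0, 1.0], [0.0, -1.0]])
X_demo = np.vstack([c + offsets for c in centers])
y_demo = np.repeat([0, 1, 2], len(offsets))

classes, models = train_one_vs_rest(X_demo, y_demo)
pred = predict_one_vs_rest(X_demo, classes, models)
print("Training accuracy:", np.mean(pred == y_demo))
```

On CIFAR-10 the classes are not linearly separable in raw pixel space, so each binary perceptron would stop at the epoch limit rather than converge, and the accuracy would be well below what this toy data gives.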

### Please answer the questions below to complete the experiment:

In [0]:
#@title Does pixelation affect a machine learning prediction accuracy? { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "No" #@param ["Yes", "No"]


In [0]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging me" #@param ["Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging me", "Was Tough, but I did it", "Too Difficult for me"]


In [0]:
#@title If it was very easy, what more you would have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = " good" #@param {type:"string"}

In [0]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["Yes", "No"]

In [0]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 2908
Date of submission:  25 Mar 2019
Time of submission:  19:28:33
View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions
For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.
