
# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objectives

At the end of the experiment, you will be able to:


* Classify the MNIST dataset using a neural network and then quantize the weights of the network
* Understand how quantization reduces the storage needs of the network

In [0]:
#@title Experiment Explanation Video
from IPython.display import HTML

HTML("""<video width="800" height="300" controls>
  <source src="https://cdn.talentsprint.com/aiml/AIML_BATCH_HYD_7/March31/uniform_nonuniform_quantization.mp4" type="video/mp4">
</video>
""")

## Dataset


### Description


1. The dataset contains 60,000 handwritten digits as training samples and 10,000 test samples, so each digit occurs approximately 6,000 times in the training set and 1,000 times in the test set.
2. Each image is size-normalized and centered.
3. Each image is 28 x 28 pixels, with grayscale values in the range 0-255.
4. Each image can therefore be represented as a 784-dimensional (28 x 28) vector in which each value lies in the range 0-255.
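
As a minimal sketch of the last point, the snippet below flattens a 28 x 28 grid into a 784-element vector and rescales the 0-255 values to [0, 1], a common preprocessing step. The image is a synthetic placeholder rather than a real MNIST digit, and plain Python lists stand in for the tensors used later in the experiment.

```python
# Hypothetical 28 x 28 grayscale image: every pixel is an integer in 0..255.
ROWS, COLS = 28, 28
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

# Flatten row by row into a single 784-dimensional vector.
flat = [pixel for row in image for pixel in row]

# Rescale to [0, 1] before feeding the pixels to a network.
scaled = [pixel / 255.0 for pixel in flat]

print(len(flat))                 # 784
print(min(scaled), max(scaled))  # 0.0 1.0
```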

### History

In the 1990s, Yann LeCun (Director of AI Research, Facebook; Courant Institute, NYU) was given the task of reading cheque numbers and the amounts associated with those cheques without manual intervention. That is when this dataset was created; it raised the bar and became a benchmark.

Yann LeCun and Corinna Cortes (Google Labs, New York) hold the copyright of MNIST dataset, which is a subset of the original NIST datasets. This dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license. 

It is a handwritten digits dataset in which half of the digits were written by Census Bureau employees and the rest by high school students. The digits collected from the Census Bureau employees are cleaner and easier to recognize than those collected from the students.


### Challenges

If you look at the images below, you will find that any two characters share certain similarities and differences. Teaching a machine to recognize these patterns and identify the correct output is an intriguing problem.

![altxt](https://www.researchgate.net/profile/Radu_Tudor_Ionescu/publication/282924675/figure/fig3/AS:319968869666820@1453297931093/A-random-sample-of-6-handwritten-digits-from-the-MNIST-data-set-before-and-after.png)

Hence, all these challenges make this a good problem to solve in Machine Learning.

## Domain Information

Handwriting varies from person to person. Some of us have neat handwriting and some, such as doctors, have illegible handwriting. Yet even a child who recognizes letters and numerals can identify the characters in a text written by a stranger, while even a technically knowledgeable adult cannot describe the process by which he or she recognizes them. This makes it an excellent challenge for Machine Learning.

![altxt](https://i.pinimg.com/originals/f2/7a/ac/f27aac4542c0090872110836d65f4c99.jpg)

The experiment handles a subset of text recognition, namely recognizing the 10 numerals (0 to 9) from scanned images.


## AI / ML Technique

### Quantization for Image Classification:

In this experiment, we train a neural network on the dataset to classify the images and then reduce the storage requirements of the network by quantizing its weights.

Neural network models can take up a lot of space on disk, and almost all of that space is taken up by the weights of the neural connections, which often number in the millions in a single model. As the weights are all slightly different floating point numbers, simple compression formats like zip don't compress them well. Quantization is a network compression technique that saves storage for the many parameters of a network by representing each weight with fewer bits. For example, if each weight is initially stored as a 2-byte value, it uses 2 * 8 = 16 bits of storage. If we compress the weights to 1-byte values, each uses only 1 * 8 = 8 bits, reducing our storage needs by half. Depending on how the weight space is divided into clusters, there are two types of quantization techniques:

1. **Uniform Quantization**: The cluster heads are uniformly spaced.
2. **Non-uniform Quantization**: The cluster heads are non-uniformly spaced and are learned using K-means clustering.
 
You will understand these in detail while working on the code in the experiment. 
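
The storage arithmetic can be sketched as follows. This is only an illustration: the helper `storage_bytes` and the one-million-weight layer are hypothetical, and the small overhead of storing the cluster heads themselves is ignored.

```python
def storage_bytes(num_weights, bits_per_weight):
    """Bytes needed to store num_weights values at the given bit width."""
    total_bits = num_weights * bits_per_weight
    return (total_bits + 7) // 8  # round up to whole bytes

# Hypothetical layer with one million weights.
num_weights = 1_000_000
baseline = storage_bytes(num_weights, 32)  # 32-bit floats as the reference

for bits in (32, 16, 8, 3, 1):
    size = storage_bytes(num_weights, bits)
    ratio = baseline / size
    print(f"{bits:>2} bits/weight: {size:>9,} bytes ({ratio:.1f}x smaller than 32-bit)")
```

Going from 16 to 8 bits per weight halves the storage, which is the case described above; fewer bits save more space but leave fewer distinct values to represent the weights.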

## Keywords

* Uniform quantization
* Non-uniform quantization
* K-means clustering


### Expected time to complete the experiment: 90 min

### Setup Steps

In [0]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "P181902118" #@param {type:"string"}


In [0]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "8860303743" #@param {type:"string"}


In [33]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook="BLR_M3W2E37_Uniform_Nonuniform_Quantization" #name of the notebook

def setup():
    ipython.magic("sx pip3 install torch")
    ipython.magic("sx pip3 install torchvision")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      print("Your submission is successful.")
      print("Ref Id:", submission_id)
      print("Date of submission: ", r["date"])
      print("Time of submission: ", r["time"])
      print("View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions")
      print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if Additional: return Additional      
    else: raise NameError('')
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getAnswer():
  try:
    return Answer
  except NameError:
    print ("Please answer Question")
    return None

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
    from IPython.display import HTML
    HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id))
  
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


Once again we do our regular imports.

In [0]:
import numpy as np
np.random.seed(1337)  # for reproducibility
from sklearn.cluster import KMeans
import torch 
import torchvision
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms

from torch.autograd import Variable
%matplotlib inline
import matplotlib.pyplot as plt

### Hyperparameters

In [0]:
num_epochs = 5
batch_size = 100
learning_rate = 0.001
use_reg = True

In [0]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### Downloading the MNIST dataset

In [6]:
train_dataset = dsets.MNIST(root='../data/',
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='../data/',
                           train=False, 
                           transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Processing...
Done!


### Dataloader

In [0]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

### Define the network

In [0]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc1 = nn.Linear(7*7*32, 300)
        self.fc2 = nn.Linear(300, 10)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
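
To see where the `7*7*32` input size of `fc1` comes from, the sketch below traces the spatial size of the feature maps through the three blocks using the standard convolution output-size formula, and counts the weights of each layer. The helpers `conv_out` and `pool_out` are hypothetical names used only for this illustration.

```python
def conv_out(size, kernel, padding, stride=1):
    """Spatial output size of a convolution (square input and kernel)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Spatial output size of max pooling with stride equal to the kernel size."""
    return size // kernel

size = 28                                   # MNIST input is 28 x 28
size = conv_out(size, kernel=5, padding=2)  # layer1 conv -> 28
size = conv_out(size, kernel=3, padding=1)  # layer2 conv -> 28
size = pool_out(size, 2)                    # layer2 pool -> 14
size = conv_out(size, kernel=3, padding=1)  # layer3 conv -> 14
size = pool_out(size, 2)                    # layer3 pool -> 7
flat_features = size * size * 32            # 32 channels after layer3
print(size, flat_features)                  # 7 1568

# Weight counts (biases excluded) for the conv and linear layers.
weights = {
    "conv1": 1 * 16 * 5 * 5,
    "conv2": 16 * 16 * 3 * 3,
    "conv3": 16 * 32 * 3 * 3,
    "fc1": flat_features * 300,
    "fc2": 300 * 10,
}
print(weights)
```

Almost all of the weights sit in `fc1`, so that layer dominates the storage cost that quantization aims to reduce.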

<b>The below function is called to reinitialize the weights of the network and define the required loss criterion and the optimizer.</b> 

In [0]:
def reset_model():
    net = Net()
    net = net.to(device)

    # Loss and Optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
    return net,criterion,optimizer

### Initializing the model

In [0]:
net, criterion, optimizer = reset_model()

### Defining a L1 Regularizer

In [0]:
def l1_regularizer(net, loss, beta):
    # Add beta * (sum of absolute values of all parameters) to the loss.
    reg_loss = 0
    for param in net.parameters():
        reg_loss += param.abs().sum()
        
    loss += beta * reg_loss
    return loss
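
As a minimal sketch of what this regularizer adds, the snippet below computes the same quantity, beta times the sum of absolute parameter values, on plain Python lists. The parameter values and the placeholder data loss are made up for illustration.

```python
def l1_penalty(params, beta):
    """Return beta times the sum of absolute values over all parameters."""
    return beta * sum(abs(w) for layer in params for w in layer)

# Hypothetical parameters: two tiny "layers" of weights.
params = [[0.5, -1.5, 0.0], [2.0, -0.25]]
data_loss = 0.30  # placeholder standing in for the cross-entropy loss

total = data_loss + l1_penalty(params, beta=0.001)
print(round(total, 6))
```

Because the penalty grows with the magnitude of every weight, training with it tends to push many weights towards zero.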

### Training function

In [0]:
# Train the Model

def training(net, reset = True):
    if reset == True:
        net, criterion, optimizer = reset_model()
    else:
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
    
    net.train()
    for epoch in range(num_epochs):
        total_loss = 0
        accuracy = []
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            temp_labels = labels
          

            # Forward + Backward + Optimize
            optimizer.zero_grad()
            outputs = net(images)
            loss = criterion(outputs, labels)

            if use_reg == True :
                loss = l1_regularizer(net,loss,beta=0.001)

            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == temp_labels).sum().item()
            accuracy.append(correct/float(batch_size))

        print('Epoch: %d, Loss: %.4f, Accuracy: %.4f' %(epoch+1,total_loss, (sum(accuracy)/float(len(accuracy)))))
    
    return net

### Testing function

In [0]:
# Test the Model
def testing(net):
    net.eval() 
    correct = 0
    total = 0
    # Disable gradient tracking: it is not needed for evaluation.
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)

            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Test Accuracy of the network on the 10000 test images: %.2f %%' % (100.0 * correct / total))

### Training and testing the network

In [17]:
reset = False
net = training(net, reset)
testing(net)



Epoch: 1, Loss: 535.3272, Accuracy: 0.9543
Epoch: 2, Loss: 235.3397, Accuracy: 0.9765
Epoch: 3, Loss: 187.6216, Accuracy: 0.9798
Epoch: 4, Loss: 167.9835, Accuracy: 0.9811
Epoch: 5, Loss: 158.0960, Accuracy: 0.9819
Test Accuracy of the network on the 10000 test images: 98.17 %


### Uniform Quantization

The simplest motivation for quantization is to shrink file sizes by storing the min and max for each layer, and then compressing each float value to an eight-bit integer representing the closest real number in a linear set of 256 within the range.

In the function below we pass 8 bits as input, meaning that each weight of the network is represented with only 8 bits when stored to disk. In other words, we use only 2^8 = 256 clusters, so each weight is stored as an 8-bit integer between 0 and 255.

Thus before using the weights during test time they need to be projected into the original weight space by using the following equation:

$$
W_{i} = min + \dfrac{max-min}{255}*W_{index}
$$
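
The round trip described by this equation can be sketched in plain Python. The function names `quantize` and `dequantize` and the sample weights are hypothetical, and the sketch assumes the weights are not all equal (otherwise the step size would be zero).

```python
def quantize(weights, bits):
    """Map each weight to the index of its nearest level in [min, max]."""
    levels = 2 ** bits
    w_min, w_max = min(weights), max(weights)
    step = (w_max - w_min) / (levels - 1)  # assumes w_max > w_min
    indices = [round((w - w_min) / step) for w in weights]
    return indices, w_min, w_max

def dequantize(indices, w_min, w_max, bits):
    """Project indices back to weight space: W = min + (max - min) / (levels - 1) * index."""
    step = (w_max - w_min) / (2 ** bits - 1)
    return [w_min + step * i for i in indices]

# Hypothetical weights for illustration.
weights = [-0.50, -0.12, 0.03, 0.27, 0.50]
indices, w_min, w_max = quantize(weights, bits=8)
restored = dequantize(indices, w_min, w_max, bits=8)

print(indices)  # integers in 0..255, one per weight
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(max_error)  # never more than half a step
```

Only the integer indices plus the per-layer `min` and `max` need to be stored; the reconstruction error of each weight is bounded by half the step size.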

In [0]:
def uniform_quantize(weight, bits):
    # Snap every entry of a flat weight array to the nearest of 2**bits
    # evenly spaced levels between its minimum and maximum (in place).
    print('-------------------------LAYER---------------------------')
    print("Number of unique parameters before quantization: " + str(len(np.unique(weight))))
    n_clusters = 2**bits
    
    maxim = np.amax(weight)
    minim = np.amin(weight)

    # Cluster heads: n_clusters evenly spaced values from minim to maxim.
    clusters = np.linspace(minim, maxim, n_clusters)

    for i in range(0,len(weight)):
        # Replace the weight with its nearest cluster head.
        dist = (clusters-weight[i])**2
        weight[i] = clusters[np.argmin(dist)]
        
    print("Number of unique parameters after quantization: " + str(len(np.unique(weight))))
    
    return weight  

### Applying Uniform Quantization

Different numbers of bits can be used to represent the weights and biases. The exact number is a design choice and may depend on the complexity of the task at hand, since using too few bits can result in poor performance. Here, we use 8 bits for quantizing the weights and 1 bit for the biases.
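
To get a feel for this trade-off, the sketch below quantizes a made-up 16-entry bias vector at several bit widths and reports how many distinct values remain and the worst-case rounding error. The helper `uniform_levels` and the bias values are hypothetical.

```python
def uniform_levels(values, bits):
    """Snap each value to the nearest of 2**bits evenly spaced levels."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)  # assumes hi > lo
    return [lo + step * round((v - lo) / step) for v in values]

# Hypothetical bias vector with 16 entries, spread over roughly [-0.3, 0.45].
biases = [-0.3 + 0.05 * i for i in range(16)]

for bits in (1, 2, 4, 8):
    quantized = uniform_levels(biases, bits)
    worst = max(abs(b - q) for b, q in zip(biases, quantized))
    # Columns: bit width, number of distinct values left, worst-case error.
    print(bits, len(set(quantized)), round(worst, 4))
```

With 1 bit every value collapses to either the minimum or the maximum, so only two distinct values survive; the error shrinks quickly as bits are added.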

In [0]:
for m in net.modules():
    if isinstance(m,nn.Conv2d) or isinstance(m,nn.BatchNorm2d) or isinstance(m,nn.Linear):
        temp_weight = m.weight.data.cpu().numpy()
        dims = temp_weight.shape
        temp_weight = temp_weight.flatten()
        temp_weight = uniform_quantize(temp_weight, 8)
        temp_weight=np.reshape(temp_weight,dims)
        m.weight.data = torch.FloatTensor(temp_weight).to(device)
        
        temp_bias = m.bias.data.cpu().numpy()
        dims = temp_bias.shape
        temp_bias = temp_bias.flatten()
        temp_bias = uniform_quantize(temp_bias, 1)
        temp_bias = np.reshape(temp_bias,dims)
        m.bias.data = torch.FloatTensor(temp_bias).to(device)

-------------------------LAYER---------------------------
Number of unique parameters before quantization: 400
Number of unique parameters after quantization: 107
-------------------------LAYER---------------------------
Number of unique parameters before quantization: 16
Number of unique parameters after quantization: 2
-------------------------LAYER---------------------------
Number of unique parameters before quantization: 16
Number of unique parameters after quantization: 11
-------------------------LAYER---------------------------
Number of unique parameters before quantization: 16
Number of unique parameters after quantization: 2
-------------------------LAYER---------------------------
Number of unique parameters before quantization: 2304
Number of unique parameters after quantization: 131
-------------------------LAYER---------------------------
Number of unique parameters before quantization: 16
Number of unique parameters after quantization: 2
-------------------------LAYER--

Now that we have replaced the weight matrix with the approximated weight of the nearest cluster, we can test the network with the modified weights.

In [0]:
testing(net)

Test Accuracy of the network on the 10000 test images: 94.59 %


## Non-uniform quantization

In the previous method we divided the weight space using equally spaced cluster heads. However, instead of forcing the cluster heads to be equally spaced, it makes more sense to learn them. A common practice is to learn the cluster centers from the distribution of the weights using k-means clustering. Here, we define a function that applies k-means to the weight values.

$$
\min_{c_{1},\dots,c_{k}}\sum_{i=1}^{mn}\min_{j\in\{1,\dots,k\}}\|w_{i}-c_{j}\|_{2}^{2}
$$
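
The objective can be illustrated with a small, dependency-free version of Lloyd's algorithm on scalar weights. This is only a sketch: the function `kmeans_1d`, its deterministic initialisation, and the sample weights are assumptions made for this example and do not reproduce the initialisation used by scikit-learn's `KMeans`.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalars; returns (centers, assignments)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread initial centers across the sorted values.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        assignments = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(values, assignments) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return centers, assignments

# Hypothetical weights: most values near zero, a few large outliers.
weights = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.9, 1.0, -1.0, -0.95]
centers, assignments = kmeans_1d(weights, k=3)
quantized = [centers[a] for a in assignments]

print(sorted(round(c, 3) for c in centers))
print(len(set(quantized)))  # 3
```

The learned centers follow the data rather than an even grid. Each weight is then stored as a small cluster index (3 bits suffice for 8 clusters) together with a shared table of centers.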

In [0]:
num_clusters = 8
kmeans = KMeans(n_clusters=num_clusters, random_state=0, max_iter=500, n_init=10, verbose=0)

In [0]:
def non_uniform_quantize(weights):
    # Replace every entry of a flat weight array with its nearest k-means center.
    print("---------------------------Layer--------------------------------")
    print("Number of unique parameters before quantization: " + str(len(np.unique(weights))))
    # KMeans expects a 2-D array of shape (n_samples, n_features).
    weights = np.reshape(weights,[weights.shape[0],1])
    print(weights.shape)
    kmeans_fit = kmeans.fit(weights)
    clusters = kmeans_fit.cluster_centers_
    
    for i in range(0,len(weights)):
        # Squared distance from this weight to every cluster center.
        dist = (clusters-weights[i])**2
        weights[i] = clusters[np.argmin(dist)]
        
    print("Number of unique parameters after quantization: " + str(len(np.unique(weights))))
    
    return weights  

We reset the model and retrain the network, since the weights of the current model have already been uniformly quantized.

In [21]:
reset = True
net = training(net, reset)
testing(net)



Epoch: 1, Loss: 555.3051, Accuracy: 0.9532
Epoch: 2, Loss: 236.4261, Accuracy: 0.9772
Epoch: 3, Loss: 192.8441, Accuracy: 0.9798
Epoch: 4, Loss: 172.5762, Accuracy: 0.9809
Epoch: 5, Loss: 162.5743, Accuracy: 0.9822
Test Accuracy of the network on the 10000 test images: 98.40 %


Non-uniform quantization on the weights and biases

In [22]:
for m in net.modules():
    if isinstance(m,nn.Conv2d) or isinstance(m,nn.BatchNorm2d) or isinstance(m,nn.Linear):
        temp_weight = m.weight.data.cpu().numpy()
        dims = temp_weight.shape
        temp_weight = temp_weight.flatten()
        temp_weight = non_uniform_quantize(temp_weight)
        temp_weight=np.reshape(temp_weight,dims)
        m.weight.data = torch.FloatTensor(temp_weight).to(device)
        
        temp_bias = m.bias.data.cpu().numpy()
        dims = temp_bias.shape
        temp_bias = temp_bias.flatten()
        temp_bias = non_uniform_quantize(temp_bias)
        temp_bias = np.reshape(temp_bias,dims)
        m.bias.data = torch.FloatTensor(temp_bias).to(device)

---------------------------Layer--------------------------------
Number of unique parameters before quantization: 400
(400, 1)
Number of unique parameters after quantization: 8
---------------------------Layer--------------------------------
Number of unique parameters before quantization: 16
(16, 1)
Number of unique parameters after quantization: 8
---------------------------Layer--------------------------------
Number of unique parameters before quantization: 16
(16, 1)
Number of unique parameters after quantization: 8
---------------------------Layer--------------------------------
Number of unique parameters before quantization: 16
(16, 1)
Number of unique parameters after quantization: 8
---------------------------Layer--------------------------------
Number of unique parameters before quantization: 2304
(2304, 1)
Number of unique parameters after quantization: 8
---------------------------Layer--------------------------------
Number of unique parameters before quantization: 16
(1

In [23]:
testing(net)

Test Accuracy of the network on the 10000 test images: 98.24 %


### Retraining the network

With only 8 clusters per layer, quantization can reduce accuracy; in the run shown above, test accuracy fell from 98.40% to 98.24%. One solution is to retrain the network. This lets the remaining weights compensate for those weights whose rounding to the nearest cluster center caused the drop in performance. Accuracy can be recovered significantly by retraining the network and then non-uniformly quantizing the weights again.

In [24]:
reset = False
net = training(net, reset)
#perform non-uniform quantization
testing(net)



Epoch: 1, Loss: 156.5617, Accuracy: 0.9828
Epoch: 2, Loss: 149.7550, Accuracy: 0.9829
Epoch: 3, Loss: 143.9351, Accuracy: 0.9840
Epoch: 4, Loss: 141.0685, Accuracy: 0.9845
Epoch: 5, Loss: 137.2367, Accuracy: 0.9853
Test Accuracy of the network on the 10000 test images: 98.45 %


### References

1. https://arxiv.org/pdf/1412.6115.pdf

#### Please answer the questions below to complete the experiment:

In [0]:
#@title The k-means quantization used above, clusters all the input data points (features) before training ? { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "FALSE" #@param ["TRUE","FALSE"]


In [0]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging me" #@param ["Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging me", "Was Tough, but I did it", "Too Difficult for me"]


In [0]:
#@title If it was very easy, what more you would have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = " test" #@param {type:"string"}

In [0]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["Yes", "No"]

In [34]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 5649
Date of submission:  27 May 2019
Time of submission:  22:46:57
View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions
For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.
