# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

The objective of this experiment is to understand student-teacher networks.

Training thin, deep networks with the student-teacher learning paradigm has received considerable attention because of its strong performance. In this paradigm, a large neural network, known as the teacher network, is already an expert at a certain task. A much smaller student network learns to perform the same task using some form of guidance from the teacher.

The student can be smaller than the teacher in terms of 1) depth and 2) number of parameters.

The teacher guides the student through hints of some form. In this notebook we look at one such setup, where the hints are the outputs of the teacher network.


### Keywords

* Teacher-student network

#### Expected time to complete the experiment: 90 min

### Setup Steps

In [0]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "P19A06E_test" #@param {type:"string"}


In [0]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "981234567" #@param {type:"string"}


In [3]:
#@title Run this cell to complete the setup for this Notebook

from IPython import get_ipython
ipython = get_ipython()
  
notebook="BLR_M3W13_SAT_EXP_3" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
    ipython.magic("sx pip3 install torch")
    ipython.magic("sx pip3 install torchvision")
    print ("Setup completed successfully")
    return


def submit_notebook():
    
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getComplexity() and getAdditional() and getConcepts():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "id" : Id, "file_hash" : file_hash, "notebook" : notebook}

      r = requests.post(url, data = data)
      print("Your submission is successful.")
      print("Ref Id:", submission_id)
      print("Date of submission: ", datetime.datetime.now().date().strftime("%d %b %Y"))
      print("Time of submission: ", datetime.datetime.now().time().strftime("%H:%M:%S"))
      print("View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions")
      print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if Additional: return Additional      
    else: raise NameError('')
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
  
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully



Here are the imports.

In [0]:
import numpy as np
import torch 
import torchvision
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

### Hyperparameters

In [0]:
num_epochs = 5
batch_size = 100
learning_rate = 0.001

In [0]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### Downloading MNIST data

In [7]:
train_dataset = dsets.MNIST(root='../data/',
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='../data/',
                           train=False, 
                           transform=transforms.ToTensor())


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!


### Dataloader

In [0]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

### Defining the Teacher Network

A comparatively bigger and deeper network than the student network defined later.

In [0]:
class Teacher(nn.Module):
    def __init__(self):
        super(Teacher, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc1 = nn.Linear(7*7*32, 300)
        self.fc2 = nn.Linear(300, 10)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
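
The input size of `fc1`, `7*7*32`, follows from the spatial size of the feature maps after the three convolutional blocks. A minimal sketch of that arithmetic in plain Python is given below; `conv_out` is a hypothetical helper (not part of PyTorch) that applies the usual output-size formula for convolution and pooling layers, assuming 28x28 MNIST inputs.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5, padding=2)  # layer1 conv keeps 28
size = conv_out(size, kernel=3, padding=1)  # layer2 conv keeps 28
size = conv_out(size, kernel=2, stride=2)   # layer2 max pool halves to 14
size = conv_out(size, kernel=3, padding=1)  # layer3 conv keeps 14
size = conv_out(size, kernel=2, stride=2)   # layer3 max pool halves to 7

flat_features = size * size * 32            # 32 channels after layer3
print(size, flat_features)                  # 7 1568
```

The same formula gives `14*14*16` for a network with a single padded 3x3 convolution with 16 output channels followed by one 2x2 max pool.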
    

### Defining the student network

A comparatively smaller and shallower network than the teacher.

In [0]:
class Student(nn.Module):
    def __init__(self):
        super(Student, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc1 = nn.Linear(14*14*16, 10)
        
    def forward(self, x):
        out = self.layer1(x)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        return out
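
To make "smaller in number of parameters" concrete, the learnable parameters of both networks can be counted by hand from the layer shapes. The sketch below does this in plain Python; the helper names `conv_params`, `bn_params` and `linear_params` are hypothetical and only encode the standard weight-plus-bias counts for each layer type.

```python
def conv_params(c_in, c_out, k):
    return c_out * c_in * k * k + c_out   # kernel weights + biases

def bn_params(channels):
    return 2 * channels                   # learnable scale and shift

def linear_params(n_in, n_out):
    return n_in * n_out + n_out           # weight matrix + biases

# Teacher: three conv blocks followed by two fully connected layers
teacher_params = (conv_params(1, 16, 5) + bn_params(16)
                  + conv_params(16, 16, 3) + bn_params(16)
                  + conv_params(16, 32, 3) + bn_params(32)
                  + linear_params(7 * 7 * 32, 300)
                  + linear_params(300, 10))

# Student: one conv block followed by one fully connected layer
student_params = (conv_params(1, 16, 3) + bn_params(16)
                  + linear_params(14 * 14 * 16, 10))

print(teacher_params, student_params)                # 481214 31562
print(round(teacher_params / student_params, 1))     # 15.2
```

Under these counts the teacher has roughly 15 times as many learnable parameters as the student. In PyTorch the same totals should be obtainable with `sum(p.numel() for p in net.parameters())`.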
    

<b>The function below creates a freshly initialized network (teacher or student) and defines the loss criterion and the optimizer.</b> 

In [0]:
def reset_model(is_teacher = True):
    if is_teacher == True:
        net = Teacher()
    else:
        net = Student()
    net = net.to(device)


    # Loss and Optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
    return net,criterion,optimizer

### Training the teacher network

The first step is to train the teacher network to become an expert. We use the regular training procedure with the cross-entropy loss and the Adam optimizer.

In [0]:
teacher, criterion, optimizer = reset_model()

In [0]:
# Train the Model

def training(net, reset = True):
    if reset == True:
        net, criterion, optimizer = reset_model()
    else:
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
    
    net.train()
    for epoch in range(num_epochs):
        total_loss = 0
        accuracy = []
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            temp_labels = labels
        

            # Forward + Backward + Optimize
            optimizer.zero_grad()
            outputs = net(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == temp_labels).sum().item()
            accuracy.append(correct/float(batch_size))

        print('Epoch: %d, Loss: %.4f, Accuracy: %.4f' %(epoch+1,total_loss, (sum(accuracy)/float(len(accuracy)))))
    
    return net

### Testing the teacher network

In [0]:
# Test the Model
def testing(net):
    net.eval() 
    correct = 0
    total = 0
    # Gradients are not needed during evaluation
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Test Accuracy of the network on the 10000 test images: %.2f %%' % (100.0 * correct / total))

In [15]:
reset = True
teacher = training(teacher, reset)
testing(teacher)

Epoch: 1, Loss: 75.8315, Accuracy: 0.9616
Epoch: 2, Loss: 31.8225, Accuracy: 0.9829
Epoch: 3, Loss: 23.6808, Accuracy: 0.9872
Epoch: 4, Loss: 20.4323, Accuracy: 0.9892
Epoch: 5, Loss: 17.2109, Accuracy: 0.9907
Test Accuracy of the network on the 10000 test images: 98.11 %
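
Note that the `Loss` printed by the training function is the sum of the batch losses over one epoch, not a mean. A small sketch of the conversion to a mean loss per batch, assuming the 60000 MNIST training images and the batch size of 100 used in this notebook:

```python
train_size = 60000                               # MNIST training images
batch_size = 100                                 # as set in the hyperparameters
batches_per_epoch = train_size // batch_size     # 600 batches per epoch

# Summed losses copied from the training log of the teacher
summed_losses = [75.8315, 31.8225, 23.6808, 20.4323, 17.2109]
mean_losses = [s / batches_per_epoch for s in summed_losses]

for epoch, loss in enumerate(mean_losses, start=1):
    print('Epoch: %d, mean loss per batch: %.4f' % (epoch, loss))
```

The mean per-batch loss is easier to compare across runs that use a different batch size or dataset size.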


### Parameters for Student Network

Here, we define a few more parameters for training the student network. The student is trained with the soft targets as well as the hard targets. The soft targets are computed from the teacher outputs $z_{i}$, divided by a temperature $T$, using the softmax function:

$$
f(z_{i}) = \frac{\exp(z_{i}/T)}{\sum_{j}\exp(z_{j}/T)}
$$

A temperature $T > 1$ softens the outputs of the teacher, and these softened outputs can be used as hints for the student network.
<img src='images/stud_teach.png' style="width: 350px;">

The loss does not need to be backpropagated across the teacher network, so we disable gradients for the teacher's parameters.

Also, for training with the soft labels, this notebook uses the mean squared error loss, since the cross-entropy criterion defined earlier is applied to hard class labels.
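
The effect of the temperature on the soft targets can be seen on a small example. The sketch below is plain Python rather than PyTorch; `softmax_with_temperature` is a hypothetical helper and the logits are made-up values, not outputs of the trained teacher.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 0.5]   # hypothetical teacher outputs for three classes
for t in (1.0, 1.5, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature grows, the probability of the top class shrinks and the other classes receive more mass, while the ranking of the classes stays the same.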

In [16]:
temperature = 1.5
for p in teacher.parameters():
    p.requires_grad= False

student, criterion, optimizer = reset_model(is_teacher = False)
alpha = 0.6

mse_criterion = nn.MSELoss()
softmax = nn.Softmax(dim=1)  # softmax over the class dimension

print(student)

Student(
  (layer1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=3136, out_features=10, bias=True)
)


### Training and testing the student network

In [17]:
#Train the Model

for epoch in range(num_epochs):
    total_loss = 0
    accuracy = []
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        temp_labels = labels
        
        # Forward + Backward + Optimize
        optimizer.zero_grad()
        
        student_outputs = student(images)
        
        hard_outputs = teacher(images)
        soft_outputs = hard_outputs/ temperature
        soft_outputs = softmax(soft_outputs)
        
        hard_loss = criterion(student_outputs, labels)
        soft_loss = mse_criterion(student_outputs, soft_outputs)
        loss = alpha*hard_loss + (1-alpha)*soft_loss
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = torch.max(student_outputs.data, 1)
        correct = (predicted == temp_labels).sum().item()
        accuracy.append(correct/float(batch_size))
    
    print('Epoch: %d, Loss: %.4f, Accuracy: %.4f' %(epoch+1,total_loss, (sum(accuracy)/float(len(accuracy)))))



Epoch: 1, Loss: 360.9362, Accuracy: 0.9304
Epoch: 2, Loss: 317.3437, Accuracy: 0.9679
Epoch: 3, Loss: 308.5094, Accuracy: 0.9721
Epoch: 4, Loss: 303.7235, Accuracy: 0.9749
Epoch: 5, Loss: 300.3334, Accuracy: 0.9760
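
The weighted loss used in the training loop can be checked by hand for a single sample. The sketch below mirrors the loop in plain Python under stated assumptions: the logits and the label are made-up values, and the helpers `softmax`, `cross_entropy` and `mse` are hypothetical stand-ins for `nn.Softmax`, `nn.CrossEntropyLoss` and `nn.MSELoss` on one sample.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of the true class for one sample."""
    return -math.log(softmax(logits)[label])

def mse(a, b):
    """Mean of the squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

alpha, temperature = 0.6, 1.5          # values used in this notebook
student_logits = [2.0, 0.5, -1.0]      # hypothetical student outputs
teacher_logits = [4.0, 1.0, -2.0]      # hypothetical teacher outputs
label = 0                              # hypothetical hard label

# Soft targets: softmax of the teacher outputs divided by the temperature
soft_targets = softmax([z / temperature for z in teacher_logits])

hard_loss = cross_entropy(student_logits, label)
# As in the loop, the raw student outputs are compared with the soft targets
soft_loss = mse(student_logits, soft_targets)
loss = alpha * hard_loss + (1 - alpha) * soft_loss
print(round(hard_loss, 4), round(soft_loss, 4), round(loss, 4))
```

With `alpha = 0.6`, the hard-label term contributes 60% of the weight and the soft-target term 40%; changing `alpha` shifts how strongly the student follows the teacher.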


In [18]:
testing(student)

Test Accuracy of the network on the 10000 test images: 97.46 %


### Exercise

Try out the small student network on the CIFAR dataset. (Easy enough to load with the data loader!)

### References

1. https://arxiv.org/abs/1412.6550
2. https://www.cs.toronto.edu/~hinton/absps/distillation.pdf

### Please answer the questions below to complete the experiment:

In [0]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging me", "Was Tough, but I did it", "Too Difficult for me"]


In [0]:
#@title If it was very easy, what more you would have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}

In [0]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["Yes", "No"]

In [0]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")