## Name: Saif Ur Rahman
## Roll Number: 2022-10-0001

# Structured Pruning via Gates
In this assignment, we will implement filter pruning via gates. [relevant paper](https://arxiv.org/pdf/2010.02623.pdf) 

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torchvision import models
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch.optim as optim
import os
from torch.autograd import Variable
import tqdm
from torchsummary import summary

batch_size = 100
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=batch_size, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

cuda
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


# Create Gates Class

2.1 In this cell you will create a gates class. The class should initialize a 1-D weight tensor of length `size`, with every gate set to 1. In the forward method, reshape this weight tensor to (1, size, 1, 1) so that it broadcasts against an input of shape (batch, depth, width, height), multiply it with the input `x`, and return the product.

In [2]:
class Gates(nn.Module):
    def __init__(self, size):
        super().__init__()
        self.size = size
        # created on the CPU; net.to(device) moves it together with the rest of the model
        self.weight = torch.nn.Parameter(torch.ones(self.size))

    def forward(self, x):
        temp = self.weight[None, :, None, None]
        result =  torch.mul(temp, x)
        return result
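
As a quick check of the broadcasting behaviour, the sketch below applies a gate to a random tensor and then closes one channel. `ChannelGate` is a hypothetical stand-in that mirrors the `Gates` class so the snippet runs on its own; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    """Hypothetical stand-in mirroring the Gates class."""

    def __init__(self, size):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(size))

    def forward(self, x):
        # (size,) -> (1, size, 1, 1) so it broadcasts over batch, height, width
        return self.weight.view(1, -1, 1, 1) * x


gate = ChannelGate(4)
x = torch.randn(2, 4, 8, 8)      # (batch, channels, height, width)

out_open = gate(x)               # all gates are 1, so the input passes unchanged

with torch.no_grad():
    gate.weight[2] = 0.0         # close the gate of channel 2
out_closed = gate(x)             # channel 2 is now all zeros, the rest is untouched
```

Closing a gate only silences the channel at that point in the network; the convolution that produced it is still computed.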

# Place Gates in the network
2.2 In this cell you need to place a gate layer directly after each conv layer (on its output) during network initialization. Pass the number of filters in the conv layer to the gates class to initialize the gates.

In [3]:
cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name])
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                # create a Gates() class instance
                gates_instance = Gates(x)
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           gates_instance, # initialized gates instance 
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

net = VGG('VGG16')
net = net.to(device)

The summary of the network should look something like this:

In [4]:
from torchsummary import summary
summary(net, (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 32, 32]           1,792
             Gates-2           [-1, 64, 32, 32]              64
       BatchNorm2d-3           [-1, 64, 32, 32]             128
              ReLU-4           [-1, 64, 32, 32]               0
            Conv2d-5           [-1, 64, 32, 32]          36,928
             Gates-6           [-1, 64, 32, 32]              64
       BatchNorm2d-7           [-1, 64, 32, 32]             128
              ReLU-8           [-1, 64, 32, 32]               0
         MaxPool2d-9           [-1, 64, 16, 16]               0
           Conv2d-10          [-1, 128, 16, 16]          73,856
            Gates-11          [-1, 128, 16, 16]             128
      BatchNorm2d-12          [-1, 128, 16, 16]             256
             ReLU-13          [-1, 128, 16, 16]               0
           Conv2d-14          [-1, 128,


# Freeze Gates
2.3 In this cell you will traverse the network and make the gates non-trainable.

In [15]:
# first list the layers and later hardcode gates to make non-trainable 
for i, m in enumerate(net.modules()):
  print(i, '->', m)
  print(m.parameters())

0 -> VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Gates()
    (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): ReLU(inplace=True)
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Gates()
    (6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): ReLU(inplace=True)
    (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (9): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): Gates()
    (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU(inplace=True)
    (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): Gates()
    (15): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (16): ReLU(inplace=True)
    (17): MaxPool2d(kernel_size=2

In [16]:
# These indices were found by printing net.modules() above and noting the position of each Gates layer
index_of_gates_layers = [3,7,12,16,21,25,29,34,38,42,47,51,55]

for i, m in enumerate(net.modules()):
    # only update if the index matches index of a Gates() layer
    if i in index_of_gates_layers:
      for param in m.parameters():
        param.requires_grad = False
net = net.to(device)
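
The hardcoded index list depends on the exact layer layout. A possible alternative, sketched here on a toy model, is to select the gates by type with `isinstance`. `ChannelGate` and `toy` are hypothetical stand-ins for `Gates` and `net`, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    """Hypothetical stand-in mirroring the Gates class."""

    def __init__(self, size):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(size))

    def forward(self, x):
        return self.weight.view(1, -1, 1, 1) * x


toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    ChannelGate(8),
    nn.BatchNorm2d(8),
    nn.ReLU(inplace=True),
)

# freeze every gate, wherever it sits in the module tree
for module in toy.modules():
    if isinstance(module, ChannelGate):
        for param in module.parameters():
            param.requires_grad = False

trainable = [name for name, p in toy.named_parameters() if p.requires_grad]
frozen = [name for name, p in toy.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:   ", frozen)
```

Selecting by type keeps working if the architecture changes, for example when switching from `'VGG16'` to `'VGG11'`.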

In [7]:
# new model summary to show gates parameters have been frozen
summary(net, (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 32, 32]           1,792
             Gates-2           [-1, 64, 32, 32]              64
       BatchNorm2d-3           [-1, 64, 32, 32]             128
              ReLU-4           [-1, 64, 32, 32]               0
            Conv2d-5           [-1, 64, 32, 32]          36,928
             Gates-6           [-1, 64, 32, 32]              64
       BatchNorm2d-7           [-1, 64, 32, 32]             128
              ReLU-8           [-1, 64, 32, 32]               0
         MaxPool2d-9           [-1, 64, 16, 16]               0
           Conv2d-10          [-1, 128, 16, 16]          73,856
            Gates-11          [-1, 128, 16, 16]             128
      BatchNorm2d-12          [-1, 128, 16, 16]             256
             ReLU-13          [-1, 128, 16, 16]               0
           Conv2d-14          [-1, 128,

# Train the network
This section is not graded

In [8]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

In [9]:
def train(epoch):
    print('\nEpoch: %d' % (epoch+1))
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        # tensors can be passed to the network directly; the Variable wrapper is no longer needed
        inputs, targets = inputs.to(device), targets.to(device)
        net.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
        if(batch_idx % 200 == 0):
          print("Accuracy : ",100.*correct/total," Loss : ", train_loss/(batch_idx+1))
def test(epoch):
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            if(batch_idx % 20 == 0):
              print("Accuracy : ",100.*correct/total," Loss : ", test_loss/(batch_idx+1))

In [None]:
start_epoch = 0
best_acc = 0
for epoch in range(start_epoch, start_epoch+100):
    train(epoch)
    test(epoch)


Epoch: 1
Accuracy :  16.0  Loss :  2.4564666748046875
Accuracy :  41.84577114427861  Loss :  1.5765447747054977
Accuracy :  48.428927680798004  Loss :  1.4077702582328397
Accuracy :  57.0  Loss :  1.1009331941604614
Accuracy :  56.76190476190476  Loss :  1.1796199565842038
Accuracy :  58.0  Loss :  1.1869256583655752
Accuracy :  57.91803278688525  Loss :  1.1912383749836781
Accuracy :  58.19753086419753  Loss :  1.191373849356616

Epoch: 2
Accuracy :  65.0  Loss :  0.9885735511779785
Accuracy :  64.95522388059702  Loss :  0.988233048227889
Accuracy :  66.68827930174564  Loss :  0.9433682168511084
Accuracy :  66.0  Loss :  1.0127038955688477
Accuracy :  66.23809523809524  Loss :  1.0186041196187336
Accuracy :  65.82926829268293  Loss :  1.01512706570509
Accuracy :  65.72131147540983  Loss :  1.0249059112345587
Accuracy :  65.9753086419753  Loss :  1.0140491172119424

Epoch: 3
Accuracy :  77.0  Loss :  0.7989439368247986
Accuracy :  72.8407960199005  Loss :  0.7832491132453899
Accuracy 

KeyboardInterrupt: ignored

In [10]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
# save model parameters
PATH = "/content/gdrive/MyDrive/Deep Learning/A5/Part2_net_gates1"
torch.save(net.state_dict(), PATH)

In [None]:
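# load the model; re-freeze its gates and rebuild the optimizer, because the
# old optimizer still points at the previous network's parameters
net = VGG('VGG16').to(device)
for gate in (m for m in net.modules() if isinstance(m, Gates)):
    gate.weight.requires_grad_(False)
optimizer = optim.Adam(net.parameters(), lr=0.0001)
net.load_state_dict(torch.load(PATH, map_location=device))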

<All keys matched successfully>

# Theoretical pruning
2.4 In this cell you will analyze network Conv layer weights, rank them based on absolute weight sums of filters and turn off 30% of lower-ranked filters by changing the corresponding gate values from one to zero. This will virtually prune corresponding filters from the network because the outputs of these filters would be zero and they would not contribute to the output decision. 

In [None]:
# list all the layers of the feature attribute in init() of the neural network class definition
net.features

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): Gates()
  (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): ReLU(inplace=True)
  (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): Gates()
  (6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): ReLU(inplace=True)
  (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (9): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (10): Gates()
  (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (12): ReLU(inplace=True)
  (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (14): Gates()
  (15): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (16): ReLU(inplace=True)
  (17): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (18): 

In [None]:
def modify_gate_parameter(index_conv2d, turn_off_ratio):
  # sum the weights of each filter (one value per output channel)
  a = torch.sum(net.features[index_conv2d].weight, dim=[1,2,3])
  # take the absolute value
  b = torch.abs(a)

  # indices of the lowest-ranked filters (turn_off_ratio of the layer, rounded up)
  index_list = b.cpu().detach().numpy().argsort()
  lower_index_list = index_list[ : int(np.ceil(turn_off_ratio*len(index_list))) ]

  # close the corresponding gates; no_grad permits the in-place write
  # even when the gate parameter has requires_grad=True
  gate = net.features[index_conv2d+1]
  with torch.no_grad():
      for idx in lower_index_list:
          gate.weight[idx] = 0

  # print the new gate values
  # print(gate.weight)
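
"Absolute weight sums" can be read in two ways: the absolute value of each filter's summed weights, or the sum of each filter's absolute weights (its L1 norm). The sketch below, on a hand-filled toy layer with made-up values, shows that the two readings can rank filters differently, because positive and negative weights cancel when they are summed first.

```python
import math

import torch
import torch.nn as nn

# toy layer: 4 filters, 2 input channels, 1x1 kernels (values are made up)
conv = nn.Conv2d(2, 4, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight.copy_(torch.tensor([
        [0.1, 0.1],    # filter 0: small weights
        [2.0, -2.0],   # filter 1: large weights that cancel when summed
        [0.5, 0.5],    # filter 2
        [-0.4, 0.1],   # filter 3
    ]).view(4, 2, 1, 1))

weights = conv.weight.detach()
abs_of_sum = weights.sum(dim=(1, 2, 3)).abs()   # |sum of weights| per filter
l1_norm = weights.abs().sum(dim=(1, 2, 3))      # sum of |weights| per filter

abs_sum_order = torch.argsort(abs_of_sum)       # lowest-ranked filter first
l1_order = torch.argsort(l1_norm)

n_prune = math.ceil(0.3 * weights.shape[0])     # 30% of 4 filters, rounded up
pruned_abs_sum = abs_sum_order[:n_prune].tolist()
pruned_l1 = l1_order[:n_prune].tolist()
print("pruned by |sum|:  ", pruned_abs_sum)
print("pruned by L1 norm:", pruned_l1)
```

Here filter 1 has the largest weights of the layer, yet it is ranked lowest by the absolute value of the sum, while the L1 norm ranks it highest.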

In [None]:
index_conv2d_list = [0,4,9,13,18,22,26,31,35,39,44,48,52]
# Modify gates parameters
for i in range(len(index_conv2d_list)):
  modify_gate_parameter(index_conv2d_list[i], 0.3)

Parameter containing:
tensor([0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1.,
        0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 1., 1., 0.,
        0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 0., 0.], device='cuda:0')
Parameter containing:
tensor([1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1., 1.,
        1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 0., 1., 1., 1., 1., 1., 0., 1., 0.], device='cuda:0')
Parameter containing:
tensor([1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1.,
        0., 1., 1., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0.,
        1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0.,
        0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 1., 0., 1., 0., 1.,
  

# Finetune
Fine-tune the network for a few epochs (not graded).

In [17]:
def train(epoch):
    print('\nEpoch: %d' % (epoch+1))
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        # tensors can be passed to the network directly; the Variable wrapper is no longer needed
        inputs, targets = inputs.to(device), targets.to(device)

        net.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
        if(batch_idx % 200 == 0):
          print("Accuracy : ",100.*correct/total," Loss : ", train_loss/(batch_idx+1))
def test(epoch):
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            if(batch_idx % 20 == 0):
              print("Accuracy : ",100.*correct/total," Loss : ", test_loss/(batch_idx+1))

In [None]:
start_epoch = 0
best_acc = 0
for epoch in range(start_epoch, start_epoch+10):
    train(epoch)
    test(epoch)


Epoch: 1
Accuracy :  45.0  Loss :  2.2768611907958984
Accuracy :  48.39303482587065  Loss :  2.022524858588603
Accuracy :  48.1072319201995  Loss :  2.0293892063107575
Accuracy :  41.0  Loss :  2.5864901542663574
Accuracy :  47.04761904761905  Loss :  2.157105264209566
Accuracy :  47.4390243902439  Loss :  2.1351999974832303
Accuracy :  47.114754098360656  Loss :  2.130868278565954
Accuracy :  47.30864197530864  Loss :  2.1229199583147778

Epoch: 2
Accuracy :  44.0  Loss :  2.327136754989624
Accuracy :  48.149253731343286  Loss :  2.011521842349228
Accuracy :  47.96508728179551  Loss :  2.029207182941294
Accuracy :  42.0  Loss :  2.5634584426879883
Accuracy :  47.76190476190476  Loss :  2.142910463469369
Accuracy :  47.97560975609756  Loss :  2.1244728216310826
Accuracy :  47.59016393442623  Loss :  2.12055097447067
Accuracy :  47.876543209876544  Loss :  2.1121511694825728

Epoch: 3
Accuracy :  51.0  Loss :  1.8707232475280762
Accuracy :  47.82587064676617  Loss :  2.0381989390102784

In [None]:
# save model parameters
PATH = "/content/gdrive/MyDrive/Deep Learning/A5/Part2_net_gates2"
torch.save(net.state_dict(), PATH)

In [14]:
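# load the model; re-freeze its gates and rebuild the optimizer, because the
# old optimizer still points at the previous network's parameters
net = VGG('VGG16').to(device)
for gate in (m for m in net.modules() if isinstance(m, Gates)):
    gate.weight.requires_grad_(False)
optimizer = optim.Adam(net.parameters(), lr=0.0001)
PATH = "/content/gdrive/MyDrive/Deep Learning/A5/Part2_net_gates2"
net.load_state_dict(torch.load(PATH, map_location=device))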

<All keys matched successfully>

#### Train model for more epochs

In [18]:
start_epoch = 10
best_acc = 0
for epoch in range(start_epoch, start_epoch+60):
    train(epoch)
    test(epoch)


Epoch: 11
Accuracy :  48.0  Loss :  1.8109130859375
Accuracy :  48.233830845771145  Loss :  2.037980154972171
Accuracy :  48.45635910224439  Loss :  2.0158744128862223
Accuracy :  41.0  Loss :  2.576968193054199
Accuracy :  47.285714285714285  Loss :  2.159459795270647
Accuracy :  47.609756097560975  Loss :  2.1370080738532833
Accuracy :  47.32786885245902  Loss :  2.1316490779157546
Accuracy :  47.51851851851852  Loss :  2.1237504614724054

Epoch: 12
Accuracy :  53.0  Loss :  1.785770058631897
Accuracy :  48.33830845771144  Loss :  1.9992235163551064
Accuracy :  47.95511221945137  Loss :  2.024052731116811
Accuracy :  44.0  Loss :  2.5647263526916504
Accuracy :  47.57142857142857  Loss :  2.152383327484131
Accuracy :  47.73170731707317  Loss :  2.13118271711396
Accuracy :  47.47540983606557  Loss :  2.126424623317406
Accuracy :  47.72839506172839  Loss :  2.117811158851341

Epoch: 13
Accuracy :  45.0  Loss :  2.0962471961975098
Accuracy :  47.343283582089555  Loss :  2.06669419855620

KeyboardInterrupt: ignored

# Iterative Pruning
2.4 In this section, you will reinitialize a new network with gates by rerunning the corresponding cells. Every 10th epoch you will prune 10% of the lowest-ranked remaining filters from each layer, until 60% of the network is pruned.
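
Pruning 10% of the *remaining* filters shrinks a layer geometrically, so the number of rounds matters. The sketch below works through the arithmetic for a 64-filter layer, assuming each round closes `ceil(ratio * remaining)` gates and that "60% pruned" is measured against the original filter count. The helper names are illustrative.

```python
import math


def prune_schedule(n_filters, ratio, rounds):
    """Active filters left after each round of pruning ceil(ratio * remaining)."""
    remaining = n_filters
    history = []
    for _ in range(rounds):
        remaining -= math.ceil(ratio * remaining)
        history.append(remaining)
    return history


def rounds_to_target(n_filters, ratio, target_fraction):
    """Rounds needed until at least target_fraction of the filters is pruned."""
    remaining, rounds = n_filters, 0
    while 1 - remaining / n_filters < target_fraction:
        remaining -= math.ceil(ratio * remaining)
        rounds += 1
    return rounds


history = prune_schedule(64, 0.1, 6)
pruned_fraction = 1 - history[-1] / 64
needed = rounds_to_target(64, 0.1, 0.6)

print(history)           # [57, 51, 45, 40, 36, 32]
print(pruned_fraction)   # 0.5
print(needed)            # 8
```

Under these assumptions, six rounds close half of a 64-filter layer, and reaching 60% takes eight rounds. Without the rounding, six rounds would leave a fraction of 0.9^6 ≈ 0.53 of the filters.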

In [11]:
start_epoch = 0
best_acc = 0

turn_off_ratio = 0.1
gates_turned_off_index = {0:[],4:[],9:[],13:[],18:[],22:[],26:[],31:[],35:[],39:[],44:[],48:[],52:[] }

for epoch in range(start_epoch, start_epoch + 70):

    train(epoch)
    test(epoch)
  
    if(epoch % 10 == 0):

      # save model parameters
      PATH = "/content/gdrive/MyDrive/Deep Learning/A5/Part2_net_iterative_pruning"
      torch.save(net.state_dict(), PATH)

      if(epoch < 65 and epoch > 0):

          # code here
        
          index_conv2d_list = [0,4,9,13,18,22,26,31,35,39,44,48,52]
          # Modify gates parameters
      
          for index_conv2d in index_conv2d_list:

              # absolute value of each filter's summed weights
              a = torch.sum(net.features[index_conv2d].weight, dim=[1,2,3])
              b = torch.abs(a)

              # filter indices sorted from lowest to highest score
              index_list = list(b.cpu().detach().numpy().argsort())

              # drop the filters whose gates were closed in earlier rounds
              for j in gates_turned_off_index[index_conv2d]:
                  index_list.remove(j)

              # lowest 10% of the remaining filters
              lower_index_list = index_list[ : int(np.ceil(turn_off_ratio*len(index_list))) ]

              # remember them for the next pruning round
              gates_turned_off_index[index_conv2d].extend(lower_index_list)

              # close their gates; no_grad permits the in-place write
              # even when the gate parameter has requires_grad=True
              with torch.no_grad():
                  for j in lower_index_list:
                      net.features[index_conv2d+1].weight[j] = 0


Epoch: 1
Accuracy :  14.0  Loss :  2.5531439781188965
Accuracy :  41.60696517412935  Loss :  1.5656945758791112
Accuracy :  48.63840399002494  Loss :  1.3940493632433124
Accuracy :  67.0  Loss :  0.895799994468689
Accuracy :  63.666666666666664  Loss :  1.0323497709773837
Accuracy :  63.24390243902439  Loss :  1.0334583535427
Accuracy :  63.63934426229508  Loss :  1.0256241862891151
Accuracy :  63.53086419753087  Loss :  1.0313786415406216

Epoch: 2
Accuracy :  62.0  Loss :  0.9716184139251709
Accuracy :  65.17910447761194  Loss :  0.9680414908560947
Accuracy :  66.46384039900249  Loss :  0.9398047855667343
Accuracy :  72.0  Loss :  0.7599804401397705
Accuracy :  68.47619047619048  Loss :  0.9437965438479469
Accuracy :  68.41463414634147  Loss :  0.9543773241159392
Accuracy :  68.54098360655738  Loss :  0.9556283130020392
Accuracy :  68.62962962962963  Loss :  0.9451019454885412

Epoch: 3
Accuracy :  60.0  Loss :  1.0269616842269897
Accuracy :  71.98507462686567  Loss :  0.79673240671

#### Print the final weights to show pruning occurred

In [13]:
index_conv2d_list = [0,4,9,13,18,22,26,31,35,39,44,48,52]
# Print the gate values of every layer
for i in range(len(index_conv2d_list)):
  print(net.features[index_conv2d_list[i]+1].weight)

Parameter containing:
tensor([0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1.,
        1., 0., 0., 1., 1., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1.,
        0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1.,
        0., 0., 1., 1., 1., 1., 1., 0., 1., 0.], device='cuda:0')
Parameter containing:
tensor([0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0.,
        1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 1., 1.,
        1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0., 1.,
        1., 1., 1., 1., 0., 1., 0., 1., 1., 0.], device='cuda:0')
Parameter containing:
tensor([0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 0., 1., 1., 0., 0.,
        1., 1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0.,
        1., 1., 0., 0., 1., 0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 0., 1., 0.,
        0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0.,
  

# Activations for ranking filters
2.5 So far you have implemented one metric for ranking filters: the magnitude of their weights. In this step you will again do iterative pruning, but this time rank the filters of each layer by the average of their activations. Every 10th epoch, prune 10% of the lowest-ranked remaining filters from each layer, until 60% of the network is pruned. To collect the activations, use the forward hooks method described in this [link](https://web.stanford.edu/~nanbhas/blog/forward-hooks-pytorch/).

In [None]:
# filled by the hooks; kept at module level so it can be read after a forward pass
activation = {}

def getActivation(name):
    # the hook signature
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook
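
One way to turn hooked activations into a per-filter ranking is sketched below on a toy block. It assumes the hook is attached after the ReLU and that a filter's score is its mean activation over batch, height, and width. `toy`, `get_activation`, and the random input batch are illustrative stand-ins, not the assignment's network or data.

```python
import math

import torch
import torch.nn as nn

activation = {}


def get_activation(name):
    # forward hooks are called as hook(module, inputs, output)
    def hook(module, inputs, output):
        activation[name] = output.detach()
    return hook


torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# attach the hook, run one batch through the model, then detach the hook again
handle = toy[2].register_forward_hook(get_activation("relu0"))
toy.eval()
with torch.no_grad():
    toy(torch.randn(16, 3, 32, 32))
handle.remove()

# average over batch, height and width: one score per filter
scores = activation["relu0"].mean(dim=(0, 2, 3))
order = torch.argsort(scores)                      # lowest-ranked filter first

n_prune = math.ceil(0.1 * scores.numel())          # 10% of the filters, rounded up
lowest = order[:n_prune].tolist()
print("filters to switch off:", lowest)
```

In practice the scores would likely be averaged over several batches of real data rather than a single random batch.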

In [None]:
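# NOTE: the ranking below still uses filter weights; to rank by activations,
# replace `b` with per-filter mean activations collected via getActivation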

start_epoch = 0
best_acc = 0

turn_off_ratio = 0.1
gates_turned_off_index = {0:[],4:[],9:[],13:[],18:[],22:[],26:[],31:[],35:[],39:[],44:[],48:[],52:[] }

for epoch in range(start_epoch, start_epoch + 70):

    train(epoch)
    test(epoch)
  
    if(epoch % 10 == 0):

      # save model parameters
      PATH = "/content/gdrive/MyDrive/Deep Learning/A5/Part2_net_iterative_pruning"
      torch.save(net.state_dict(), PATH)

      if(epoch < 65 and epoch > 0):

          # code here
        
          index_conv2d_list = [0,4,9,13,18,22,26,31,35,39,44,48,52]
          # Modify gates parameters
      
          for index_conv2d in index_conv2d_list:

              # absolute value of each filter's summed weights
              a = torch.sum(net.features[index_conv2d].weight, dim=[1,2,3])
              b = torch.abs(a)

              # filter indices sorted from lowest to highest score
              index_list = list(b.cpu().detach().numpy().argsort())

              # drop the filters whose gates were closed in earlier rounds
              for j in gates_turned_off_index[index_conv2d]:
                  index_list.remove(j)

              # lowest 10% of the remaining filters
              lower_index_list = index_list[ : int(np.ceil(turn_off_ratio*len(index_list))) ]

              # remember them for the next pruning round
              gates_turned_off_index[index_conv2d].extend(lower_index_list)

              # close their gates; no_grad permits the in-place write
              # even when the gate parameter has requires_grad=True
              with torch.no_grad():
                  for j in lower_index_list:
                      net.features[index_conv2d+1].weight[j] = 0

# Online filter pruning (Bonus)

Until now we did offline pruning, an iterative process in which we repeatedly impose constraints on the network and the network then tries to fit within them. Online pruning, on the other hand, makes pruning part of the network's convergence itself.

In this section, we will make the gates learnable and apply an L1 norm penalty to them. After training, threshold the gate values, remove the corresponding part of the network globally in one go, and fine-tune once. Example threshold: 0.0005.
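
A minimal sketch of this idea on a toy model follows. It assumes the L1 penalty is added to the cross-entropy loss with a weight `l1_lambda` (an arbitrary value chosen here) and that thresholding sets small gates to zero. `ChannelGate`, `toy`, `gate_l1_penalty`, and the random batch are illustrative stand-ins.

```python
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    """Hypothetical stand-in for a learnable Gates layer."""

    def __init__(self, size):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(size))

    def forward(self, x):
        return self.weight.view(1, -1, 1, 1) * x


def gate_l1_penalty(model):
    # sum of |gate| over every gate layer in the model
    return sum(m.weight.abs().sum() for m in model.modules()
               if isinstance(m, ChannelGate))


torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    ChannelGate(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(toy.parameters(), lr=1e-3)
l1_lambda = 1e-3        # assumed penalty weight, not given in the text
threshold = 0.0005      # example threshold from the text

# one training step on a random batch: task loss plus L1 penalty on the gates
inputs = torch.randn(4, 3, 32, 32)
targets = torch.randint(0, 10, (4,))
optimizer.zero_grad()
loss = criterion(toy(inputs), targets) + l1_lambda * gate_l1_penalty(toy)
loss.backward()
optimizer.step()

# after training: switch off every gate whose magnitude is below the threshold
with torch.no_grad():
    for m in toy.modules():
        if isinstance(m, ChannelGate):
            m.weight[m.weight.abs() < threshold] = 0.0
```

After a single step no gate is close to the threshold yet; the penalty needs many epochs to push unimportant gates toward zero before the global cut is applied.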
