<a href="https://colab.research.google.com/github/anonymized30/FFL/blob/main/pytorch_mnist.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![AnalyticsDojo](https://github.com/rpi-techfundamentals/fall2018-materials/blob/master/fig/final-logo.png?raw=1)](http://rpi.analyticsdojo.com)
<center><h1>PyTorch with the MNIST Dataset</h1></center>
<center><h3><a href = 'http://rpi.analyticsdojo.com'>rpi.analyticsdojo.com</a></h3></center>


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rpi-techfundamentals/fall2018-materials/blob/master/10-deep-learning/04-pytorch-mnist.ipynb)



From Kaggle: 
"MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike."

[Read more.](https://www.kaggle.com/c/digit-recognizer)


<a title="By Josef Steppan [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:MnistExamples.png"><img width="512" alt="MnistExamples" src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png"/></a>

This code is adapted from the PyTorch examples repository. 
It is licensed under BSD 3-Clause "New" or "Revised" License.
Source: https://github.com/pytorch/examples/
LICENSE: https://github.com/pytorch/examples/blob/master/LICENSE

![](https://github.com/rpi-techfundamentals/fall2018-materials/blob/master/10-deep-learning/mnist-comparison.png?raw=1)
Table from [Wikipedia](https://en.wikipedia.org/wiki/MNIST_database)

In [1]:
!pip install torch torchvision



### PyTorch Advantages vs TensorFlow
- PyTorch builds dynamic computational graphs (the graph can change from one run to the next), while TensorFlow 1.x graphs are static. 
- TensorFlow enables easier deployment. 
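
As a minimal sketch of what "dynamic graph" means in practice, the loop below runs a data-dependent number of multiplications, and autograd still differentiates through whatever path was actually taken. The function name `forward`, the threshold of 10, and the cap of 20 steps are hypothetical choices for illustration only.

```python
import torch

def forward(x, w):
    # The number of multiplications depends on the data, so the graph
    # recorded by autograd can differ from one call to the next.
    steps = 0
    while x.norm() < 10 and steps < 20:
        x = x * w
        steps += 1
    return x.sum(), steps

w = torch.tensor(2.0, requires_grad=True)
out, steps = forward(torch.ones(3), w)
out.backward()  # differentiates through the loop iterations that actually ran
print(steps, out.item(), w.grad.item())
```

Here the loop stops once the norm of `x` reaches 10, so the recorded graph contains exactly as many multiplications as the input required.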

In [119]:
#Import Libraries


from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from sklearn.decomposition import PCA
import numpy as np
import copy


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
args={}
kwargs={}
args['batch_size']=1000
args['test_batch_size']=1000
args['epochs']=10  #The number of Epochs is the number of times you go through the full dataset. 
args['lr']=0.01 #Learning rate is the step size of each gradient descent update. 
args['momentum']=0.5 #SGD momentum (default: 0.5) Momentum is a moving average of our gradients (helps to keep direction).

args['seed']=1 #random seed
args['log_interval']=10
args['cuda']=False
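
The learning rate and momentum settings above can be illustrated with a small pure-Python sketch. It assumes the update rule used by `torch.optim.SGD` without dampening or Nesterov momentum (velocity = momentum * velocity + gradient, then parameter -= lr * velocity). The helper `sgd_momentum` and the toy objective w² are hypothetical and not part of the notebook.

```python
def sgd_momentum(grad_fn, w, lr=0.01, momentum=0.5, steps=100):
    """Minimize a 1-D function with SGD plus momentum (illustrative only)."""
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # Accumulate past gradients; this is what keeps the update direction stable.
        velocity = momentum * velocity + g
        w = w - lr * velocity
    return w

# Toy objective f(w) = w**2, whose gradient is 2*w.
w_plain = sgd_momentum(lambda w: 2 * w, w=1.0, momentum=0.0)
w_mom = sgd_momentum(lambda w: 2 * w, w=1.0, momentum=0.5)
print(w_plain, w_mom)
```

With the same learning rate and number of steps, the run with momentum ends closer to the minimum at 0 on this toy problem.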


In [5]:
#load the data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['batch_size'], shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['test_batch_size'], shuffle=True, **kwargs)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz



Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz



Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz



Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz



Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw





In [6]:


class Net(nn.Module):
    #This defines the structure of the NN.
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()  #Dropout
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        #Convolutional Layer/Pooling Layer/Activation
        x = F.relu(F.max_pool2d(self.conv1(x), 2)) 
        #Convolutional Layer/Dropout/Pooling Layer/Activation
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        #Fully Connected Layer/Activation
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        #Fully Connected Layer (no activation; log_softmax is applied below)
        x = self.fc2(x)
        #log_softmax returns log-probabilities, which is what F.nll_loss expects.
        return F.log_softmax(x, dim=1)
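
The `320` in `nn.Linear(320, 50)` and `x.view(-1, 320)` follows from the layer shapes. The sketch below traces the spatial size of a 28x28 MNIST image through the two convolution and pooling stages, assuming the defaults used above (stride 1 and no padding for the convolutions, 2x2 max pooling). The helpers `conv_out` and `pool_out` are hypothetical names for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output side length of non-overlapping max pooling."""
    return size // kernel

side = 28                               # MNIST images are 28x28
side = pool_out(conv_out(side, 5), 2)   # conv1 + pool: 28 -> 24 -> 12
side = pool_out(conv_out(side, 5), 2)   # conv2 + pool: 12 -> 8 -> 4
flat = 20 * side * side                 # conv2 has 20 output channels
print(side, flat)  # 4 320
```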


In [11]:

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args['cuda']:
            data, target = data.cuda(), target.cuda()
        #Since PyTorch 0.4, tensors track gradients themselves, so no Variable wrapper is needed.
        #Zero out the gradients accumulated from the previous batch.
        optimizer.zero_grad()
        output = model(data)
        #Negative log likelihood loss, suited to classification with C classes; it expects the log-probabilities returned by the model.
        loss = F.nll_loss(output, target)
        #Backpropagate: compute dloss/dparam for every parameter that requires gradients.
        loss.backward()
        #Take one optimization step, updating the parameters.
        optimizer.step()
        #Print out the loss periodically. 
        if batch_idx % args['log_interval'] == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    #Gradients are not needed for evaluation (this replaces the removed volatile=True flag).
    with torch.no_grad():
        for data, target in test_loader:
            if args['cuda']:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
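
Both functions above pair the model's `log_softmax` output with `F.nll_loss`. The pure-Python sketch below shows, for a single example, what that combination computes: the negative log of the probability assigned to the correct class. The helper names and the example logits are hypothetical, and batching and averaging are left out.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log likelihood of the target class for one example."""
    return -log_probs[target]

logits = [2.0, 1.0, 0.1]           # raw scores for three classes
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)
pred = max(range(len(log_probs)), key=lambda i: log_probs[i])
print(loss, pred)
```

The prediction is the index of the largest log-probability, which is what `output.max(1, keepdim=True)[1]` extracts per row in `test()`.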




In [145]:
model = Net()
if args['cuda']:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=args['lr'], momentum=args['momentum'])



In [146]:
for epoch in range(1, 100):  #Runs 99 epochs; note that args['epochs'] is not used here.
    train(epoch)
    test()






Test set: Average loss: 2.1108, Accuracy: 4979/10000 (50%)


Test set: Average loss: 1.3560, Accuracy: 7389/10000 (74%)


Test set: Average loss: 0.6834, Accuracy: 8512/10000 (85%)


Test set: Average loss: 0.4761, Accuracy: 8820/10000 (88%)


Test set: Average loss: 0.3869, Accuracy: 8994/10000 (90%)


Test set: Average loss: 0.3333, Accuracy: 9103/10000 (91%)


Test set: Average loss: 0.2980, Accuracy: 9180/10000 (92%)


Test set: Average loss: 0.2740, Accuracy: 9199/10000 (92%)


Test set: Average loss: 0.2529, Accuracy: 9275/10000 (93%)


Test set: Average loss: 0.2352, Accuracy: 9312/10000 (93%)


Test set: Average loss: 0.2220, Accuracy: 9360/10000 (94%)


Test set: Average loss: 0.2113, Accuracy: 9367/10000 (94%)


Test set: Average loss: 0.1998, Accuracy: 9417/10000 (94%)


Test set: Average loss: 0.1937, Accuracy: 9411/10000 (94%)


Test set: Average loss: 0.1842, Accuracy: 9438/10000 (94%)


Test set: Average loss: 0.1763, Accuracy: 9463/10000 (95%)


Test set: Average loss:

In [118]:
test()




Test set: Average loss: 0.2539, Accuracy: 9302/10000 (93%)



In [147]:
temp = copy.deepcopy(model)

In [148]:
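def compress_and_reconstruct(x):
  #Number of principal components to keep: sqrt(m) + ln(m) + 1, where m is the smaller dimension of x.
  n = int(np.sqrt(min(x.shape)) + np.log(min(x.shape)) + 1)
  n = max(n, 1)
  pca = PCA(n_components=n)
  pca.fit(x)

  #Project onto the n components, then map back to the original space (a lossy reconstruction).
  X_pca = pca.transform(x)
  X_projected = pca.inverse_transform(X_pca)
  return X_projected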
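
To make the compression step concrete, the sketch below reproduces the component-count heuristic and performs the same kind of reconstruction with a plain NumPy SVD. This is an assumed stand-in for scikit-learn's `PCA`, not the code the notebook runs; the helper names and the random test matrix are hypothetical.

```python
import numpy as np

def n_components_for(shape):
    """Component count from the heuristic above: int(sqrt(m) + ln(m) + 1)."""
    m = min(shape)
    return max(int(np.sqrt(m) + np.log(m) + 1), 1)

def pca_reconstruct(x, n):
    """Keep the top-n principal directions of x and map back to the input space."""
    mean = x.mean(axis=0)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    return (u[:, :n] * s[:n]) @ vt[:n] + mean

# Weight shapes of the Net above after reshaping to 2-D: conv1, conv2, fc1, fc2.
counts = [n_components_for(s) for s in [(10, 25), (20, 250), (50, 320), (10, 50)]]
print(counts)  # [6, 8, 11, 6]

rng = np.random.default_rng(0)
w = rng.normal(size=(50, 320))          # random stand-in with the shape of fc1.weight
n = n_components_for(w.shape)
w_rec = pca_reconstruct(w, n)
rel_err = np.linalg.norm(w - w_rec) / np.linalg.norm(w)
print(n, w_rec.shape, rel_err)
```

The reconstruction has the same shape as the input but, after centering, at most `n` linearly independent directions, which is why it can be described with far fewer numbers than the original matrix.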

In [149]:
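model = copy.deepcopy(temp)
update = model.state_dict()
for layer in update.keys():
  #Only compress tensors with at least two dimensions (weights); biases are left unchanged.
  if len(update[layer].shape)>=2:
    x = update[layer].detach().cpu().numpy()
    shape = x.shape
    #Flatten each tensor to a 2-D matrix with one row per output unit or channel.
    x = x.reshape(shape[0], -1)
    x_rec = compress_and_reconstruct(x)
    x = x_rec.reshape(shape)
    x = torch.from_numpy(x)
    update[layer] = x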

In [150]:
model.load_state_dict(update)

<All keys matched successfully>

In [151]:
test()




Test set: Average loss: 0.1561, Accuracy: 9495/10000 (95%)



In [152]:
def flatten_updates(updates, layers = None):
    #Flatten each state dict in `updates` into one list of parameter values.
    m = len(updates)
    flattened_updates = [[] for i in range(m)]
    #Default to every layer of the first update.
    if layers is None:
        layers = updates[0].keys()

    for i in range(m):
        for layer in layers:
            x = updates[i][layer].cpu().numpy().flatten()
            flattened_updates[i]+= list(x)
    return flattened_updates

In [155]:
m1 = flatten_updates([temp.state_dict()])[0]
m2 = flatten_updates([model.state_dict()])[0]

In [158]:
n1 = np.linalg.norm(m1)

In [159]:
n2 = np.linalg.norm(m2)

In [160]:
#Cosine similarity between the original and the compressed parameter vectors.
np.dot(m1, m2)/(n1*n2)

0.88437915

In [161]:
#Convert the cosine similarity to an angle in degrees.
np.arccos(0.88437915) * 180/np.pi

27.824784139809633
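
The last cells compute the cosine similarity between the flattened original and compressed parameters, then convert it to an angle. A self-contained sketch of the same calculation on two small vectors follows; the helper `angle_degrees` and the example vectors are hypothetical.

```python
import math

def angle_degrees(a, b):
    """Return (cosine similarity, angle in degrees) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return cos, math.degrees(math.acos(cos))

cos, angle = angle_degrees([1.0, 0.0], [1.0, 1.0])
print(cos, angle)  # about 0.7071 and 45 degrees
```

A cosine of 1 (angle 0) would mean the compressed parameters point in exactly the same direction as the originals; smaller cosines mean the compression rotated the parameter vector further.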