## Introduction 

In this assignment, you will be working with:
- Tensors
- Autograd
- Creating your First NN using PyTorch

In [None]:
from __future__ import print_function
%matplotlib inline

import torch

## Tensors

Tensors are similar to NumPy’s ndarrays, with the addition that
Tensors can also be used on a GPU to accelerate computing.


You can think of Tensors as generalisations of matrices to any number of dimensions.
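
As a small illustration of the two points above, the sketch below builds a 3-dimensional tensor and moves it to a GPU only if one is available. The variable names are illustrative and not part of the assignment.

```python
import torch

# A 3-dimensional tensor: two "matrices" of shape 3x4 stacked together
t = torch.zeros(2, 3, 4)
print(t.dim(), t.shape)

# Use a GPU if one is present, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = t.to(device)
print(t.device)
```

The same code runs unchanged on a CPU-only machine, because the device is chosen at run time.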

## Create a matrix

For your first task, construct a 5x3 matrix, uninitialized:

You may use torch.empty()

In [None]:
##TODO
a = torch.empty(5,3)
print(a)

Next, let's try constructing a randomly initialized matrix:

Hint: Use torch.rand()

In [None]:
##TODO
a = torch.rand(5,3)
print(a)

If you were able to do that, this should be an easy task as well: construct a matrix filled with zeros and of dtype long:


Hint: Use torch.zeros()

In [None]:
##TODO
a = torch.zeros(5, 3, dtype=torch.long)  # dtype=torch.long gives 64-bit integers
print(a)

Check the size of the Tensor

In [None]:
##TODO
print(a.size())
print(a.data.numpy().shape)

Before we move on to the operations, you might ask yourself why you would want to know all of these functions.

Recall that how Tensors are initialised heavily affects training (think of the initialisation scheme of Kaiming He et al., for example).

Ultimately, your weights will be stored in Tensors.
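
To make the point about initialisation concrete, here is a minimal sketch of He (Kaiming) normal initialisation using `torch.nn.init.kaiming_normal_`. The layer size (256 inputs, 128 outputs) is an arbitrary assumption chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A weight matrix for a hypothetical layer with 256 inputs and 128 outputs
w = torch.empty(128, 256)

# He (Kaiming) normal initialisation for ReLU layers:
# values are drawn from N(0, std^2) with std = sqrt(2 / fan_in)
nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="relu")

expected_std = (2.0 / 256) ** 0.5
print(w.std().item(), expected_std)
```

The empirical standard deviation should land close to the theoretical value, which is what keeps activations from shrinking or exploding as they pass through ReLU layers.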

## Operations

Now, let's try creating two tensors and adding them together.

In [None]:
##TODO
#from torch import Variable
import numpy as np
a = np.array([1,1])
b = np.array([2,2])
#numpy add
print(a+b)
print("---------")
a_tensor = torch.from_numpy(a)
b_tensor = torch.from_numpy(b)
print(a_tensor+b_tensor)

You will often need to reshape your tensors.

That can be done using view():

In [None]:
x = torch.randn(4, 4)
y = x.view(16)  ##TODO: flatten x into a 1-D tensor of 16 elements
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

## NumPy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory
locations (if the Torch Tensor is on CPU), and changing one will change
the other.
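
The shared-memory behaviour described above can be sketched as follows. This is an illustrative example with made-up values, and it assumes the tensor lives on the CPU.

```python
import numpy as np
import torch

t = torch.ones(3)
n = t.numpy()        # n shares its memory with t (CPU tensor)

t.add_(1)            # in-place update of the tensor ...
print(n)             # ... is visible through the array as well

n[0] = 10            # in-place update of the array ...
print(t)             # ... is visible through the tensor as well
```

Note that only in-place operations propagate. An expression such as `t = t + 1` creates a new tensor and leaves `n` untouched.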

Converting a Torch Tensor to a NumPy Array

Let's try creating a tensor in torch:


In [None]:
##TODO
b = np.array([2,3,4,5])
b=b.reshape(2,2)
print(b)
c = torch.from_numpy(b)
print(c)
c = 2*c
print(c)
print(b)

Now we will convert it to numpy by calling ```.numpy()``` on it and assigning it to another variable

In [None]:
##TODO

We can go the other way as well (NumPy -> Torch).

Hint: Use torch.from_numpy()

In [None]:
import numpy as np
a = np.ones(5)
##TODO: Create b as a torch tensor using a
np.add(a, 1, out=a)
print(a)
#print(b)


Autograd: Automatic Differentiation
===================================



The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.
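
A minimal sketch of what define-by-run means in practice: the loop below runs a data-dependent number of times, and autograd simply records whatever operations were executed. The starting value 1.5 is an arbitrary choice, and ``requires_grad`` and ``.backward()`` are explained in the sections that follow.

```python
import torch

x = torch.tensor(1.5, requires_grad=True)

# The number of multiplications depends on the data, so the graph
# recorded by autograd can differ from one run to the next.
y = x
steps = 0
while y.item() < 10:
    y = y * x
    steps += 1

y.backward()          # differentiates through exactly the steps that ran
print(steps, y.item(), x.grad.item())
```

Here the loop multiplies five times, so `y` equals `x**6` and the gradient is `6 * x**5`.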

Let us see this in simpler terms with some examples.

Tensor
--------

``torch.Tensor`` is the central class of the package. If you set its attribute
``.requires_grad`` as ``True``, it starts to track all operations on it. When
you finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into its ``.grad`` attribute.

To stop a tensor from tracking history, you can call ``.detach()`` to detach
it from the computation history, and to prevent future computation from being
tracked.

To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model because the model may have trainable parameters with
``requires_grad=True``, but for which we don't need the gradients.
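
Both ways of switching off tracking can be sketched like this (the tensor values are illustrative only):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

y = x * 2
print(y.requires_grad)      # True: y is tracked

d = y.detach()              # same values, but cut off from the graph
print(d.requires_grad)      # False

with torch.no_grad():
    z = x * 2               # operations inside the block are not tracked
print(z.requires_grad)      # False
```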

There’s one more class which is very important for autograd
implementation - a ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph, that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references a ``Function`` that has created
the ``Tensor`` (except for Tensors created by the user - their
``grad_fn is None``).

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``,
however if it has more elements, you need to specify a ``gradient``
argument that is a tensor of matching shape.
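
For the non-scalar case, a small sketch may help. The weights in `v` are arbitrary values picked for illustration; autograd uses them to form a vector-Jacobian product.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                   # y has three elements, so it is not a scalar

# Supply a tensor with the same shape as y. For y = x * x the result
# stored in x.grad is v * 2x, element by element.
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(gradient=v)

print(x.grad)
```

Calling `y.backward()` without the `gradient` argument would raise an error here, because `y` is not a scalar.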



In [None]:
import torch

Create a tensor and set ``requires_grad=True`` to track computation with it



In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

Can you create a 4x5x1x2 tensor?

Do you think that's possible or useful?

In [None]:
##TODO
# Large tensors are usually impractical to print in full, so we print the size instead.
# A tensor generalises a matrix to any number of dimensions.
x = torch.ones(4,5,1,2, requires_grad=True)
print(x.size())

Next, let's do a tensor operation:

Create y by adding 2 to x

In [None]:
##TODO
y = x+2
print(y.size())

## Task 2
Can you do this with ``x**2`` as well?

In [None]:
y = x**2
# Note: x is all ones, so x**2 is too, and the two sums are identical.
print(x.sum(),y.sum())

``y`` was created as the result of an operation, so it has a ``grad_fn``.



In [None]:

print(y.grad_fn)
y = y+3 * 5
#note how each operation adds a grad_fn or a gradient function object
print(y.size(), y.grad_fn)

Do more operations on ``y``



Multiply ``y`` with itself and assign the result to ``z``.
Set the output to the mean of ``z`` and print it.

In [None]:
##TODO
z = y * y          # multiply y with itself
out = z.mean()     # the output is the mean of z
print(out)


``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A newly created Tensor has ``requires_grad=False`` unless you
set it, while the argument of ``.requires_grad_()`` itself defaults to ``True``.
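
A minimal sketch of flipping the flag on an existing tensor (random values, for illustration only):

```python
import torch

a = torch.randn(2, 2)
print(a.requires_grad)          # False: not tracked by default

a.requires_grad_()              # switch tracking on in-place
print(a.requires_grad)          # True

b = (a * a).sum()
print(b.grad_fn is not None)    # True: b was produced by a tracked operation
```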



Next, initialise ``a`` as a 2x2 tensor:
- Enable gradients on ``a``
- Perform a quadratic operation on ``a`` with itself
- Store the resulting quadratic function of ``a`` in ``b``
- Compute the gradient of ``b``

In [None]:
##TODO
from torch.autograd import Variable

torch.manual_seed(7)
a=torch.rand(2,2, requires_grad=True)
print(a)
print(a.requires_grad)
b = (a*a).sum()
print(b,b.requires_grad)



To run backprop, you need to call ```var.backward()```

Let's backprop now.
Because ``b`` contains a single scalar, ``b.backward()`` is
equivalent to ``b.backward(torch.tensor(1.))``.



In [None]:
##TODO
b.backward()

In [None]:
print(b.grad_fn)
# b is not a leaf tensor, so the gradient is accumulated on the leaf a instead
print(a.grad)

In [None]:
#leaf nodes 


## Neural Networks

Neural networks can be constructed using the ``torch.nn`` package.

Now that you have had a glimpse of ``autograd``, note that ``nn`` depends on
``autograd`` to define models and differentiate them.
An ``nn.Module`` contains layers, and a method ``forward(input)`` that
returns the ``output``.


The network defined below is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or
  weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  ``weight = weight - learning_rate * gradient``
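
The steps above can be sketched on a toy problem. Everything here is an assumption made for illustration (a single linear layer, noiseless data for y = 2x + 1, a learning rate of 0.1) and is not the network used in this assignment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data (assumption): 64 noiseless samples of y = 2x + 1
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)                  # a network with learnable parameters
criterion = nn.MSELoss()
learning_rate = 0.1

losses = []
for step in range(200):                  # iterate over the data
    output = model(inputs)               # process input through the network
    loss = criterion(output, targets)    # compute the loss
    model.zero_grad()                    # clear old gradients
    loss.backward()                      # propagate gradients back
    with torch.no_grad():                # weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss should fall steadily, and the learned weight and bias should approach 2 and 1.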
  


In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # TODO: Define 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=6,out_channels=16,kernel_size=5,stride=1,padding=1)
        self.fc1 = nn.Linear(in_features=400, out_features=120, bias=True)
        self.fc2 = nn.Linear(in_features=120, out_features=84, bias=True)
        self.fc3 = nn.Linear(in_features=84, out_features=10, bias=True)
    

    def forward(self, x):
        # TODO  Max pooling over a (2, 2) window
        print("x size:",x.size())
        x = self.conv1(x)
        print("after first conv:",x.size())
        x = F.relu(x)
        print("first relu:",x.size())
        x = F.max_pool2d(x,2,2)
        print("first maxpool",x.size())
        x = self.conv2(x)
        print("second conv:",x.size())
        x = F.max_pool2d(x,2,2)
        print("second maxpool:",x.size())
        x = F.relu(x)
        print("second relu:",x.size())
        # TODO: Pass it the fc layers through relu and pass out the linear matrix at the end
        x = x.view(x.size(0), -1)
        print("after reshape:",x.size())
        x = self.fc1(x)
        print("after fc1:",x.size())
        x = F.relu(x)
        print("third relu:",x.size())
        x = self.fc2(x)
        print("after fc2:",x.size())
        x = F.relu(x)
        print("fourth relu:",x.size())
        x = self.fc3(x)
        print("fc3:",x.size())
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


## TODO: Create an instance of the net 
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


You just have to define the ``forward`` function, and the ``backward``
function (where gradients are computed) is automatically defined for you
using ``autograd``.
You can use any of the Tensor operations in the ``forward`` function.

The learnable parameters of a model are returned by ``net.parameters()``



In [9]:
##TODO assign the list of learnable parameters to a variable
# net.parameters() returns a generator, not a list
for param in net.parameters():
    print(type(param.data), param.size())


# convert the generator to a list so that len() and indexing work
params= list(net.parameters())
print(type(params))
print(len(params))
print(params[0].size())  # conv1's .weight

<class 'torch.Tensor'> torch.Size([6, 1, 5, 5])
<class 'torch.Tensor'> torch.Size([6])
<class 'torch.Tensor'> torch.Size([16, 6, 5, 5])
<class 'torch.Tensor'> torch.Size([16])
<class 'torch.Tensor'> torch.Size([120, 400])
<class 'torch.Tensor'> torch.Size([120])
<class 'torch.Tensor'> torch.Size([84, 120])
<class 'torch.Tensor'> torch.Size([84])
<class 'torch.Tensor'> torch.Size([10, 84])
<class 'torch.Tensor'> torch.Size([10])
<class 'list'>
10
torch.Size([6, 1, 5, 5])


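Let's try a random 28x28 input.
Note: because both convolutions here use ``padding=1``, this variant of LeNet expects 28x28 inputs
(the native MNIST size), since ``fc1`` takes ``16*5*5 = 400`` features. A 32x32 input would produce
``16*6*6 = 576`` features and fail at ``fc1``.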
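
The layer sizes printed by ``forward`` can be checked by hand with the standard output-size formula `(size + 2*padding - kernel) // stride + 1`. The helper names below are hypothetical; the kernel, stride and padding values are taken from the ``Net`` defined above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(side):
    """Number of features reaching fc1 for a square input of the given side."""
    side = conv_out(side, kernel=5, padding=1)   # conv1
    side = conv_out(side, kernel=2, stride=2)    # max pool
    side = conv_out(side, kernel=5, padding=1)   # conv2
    side = conv_out(side, kernel=2, stride=2)    # max pool
    return 16 * side * side                      # conv2 has 16 output channels


print(flat_features(28))   # 400, matches fc1's in_features
print(flat_features(32))   # 576, does not match fc1's in_features
```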



In [13]:
## TODO: create random input
## TODO: pass it through the net
input_test = torch.randn(1,1,28,28)
out= net(input_test)
print(out)

x size: torch.Size([1, 1, 28, 28])
after first conv: torch.Size([1, 6, 26, 26])
first relu: torch.Size([1, 6, 26, 26])
first maxpool torch.Size([1, 6, 13, 13])
second conv: torch.Size([1, 16, 11, 11])
second maxpool: torch.Size([1, 16, 5, 5])
second relu: torch.Size([1, 16, 5, 5])
after reshape: torch.Size([1, 400])
after fc1: torch.Size([1, 120])
third relu: torch.Size([1, 120])
after fc2: torch.Size([1, 84])
fourth relu: torch.Size([1, 84])
fc3: torch.Size([1, 10])
tensor([[ 0.0830,  0.0102,  0.1306,  0.0311,  0.0789, -0.0258,  0.0200,  0.0008,
         -0.1391,  0.0872]], grad_fn=<AddmmBackward>)


Zero the gradient buffers of all parameters and backprop with random
gradients:



In [14]:
net.zero_grad()
out.backward(torch.randn(1, 10))

In [15]:
output = net(input_test)  # use the tensor created above, not the built-in `input`
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
## TODO: Define the criterion as MSELoss()
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Backprop
--------
To backpropagate the error, all we have to do is call ``loss.backward()``.
You need to clear the existing gradients though, else gradients will be
accumulated to existing gradients.
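
A small sketch of why the buffers need clearing: calling ``backward()`` twice without zeroing adds the second gradient on top of the first. The tensor `w` and the factor 3 are arbitrary illustrative choices.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient of the first backward pass

(w * 3).sum().backward()      # second backward without clearing
second = w.grad.clone()       # the new gradient was added to the old one

w.grad.zero_()                # clear the buffer
(w * 3).sum().backward()
third = w.grad.clone()        # back to the value of a single pass

print(first, second, third)
```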


Now we shall call ``loss.backward()``, and have a look at conv1's bias
gradients before and after the backward.



In [None]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

##TODO: Call loss.backward()


print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)


Update the weights
------------------
The simplest update rule used in practice is the Stochastic Gradient
Descent (SGD):

     ``weight = weight - learning_rate * gradient``

We can implement this using simple Python code:


    learning_rate = 0.01
    for f in net.parameters():
        f.data.sub_(f.grad.data * learning_rate)

However, as you use neural networks, you will want to use various different
update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
To enable this, PyTorch provides a small package, ``torch.optim``, that
implements all these methods. Using it is very simple:
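
As a sanity check, the sketch below applies one manual update and one ``optim.SGD`` step to two identical tensors. The tensors and the quadratic loss are made up for illustration. With plain SGD (no momentum) the two results should coincide.

```python
import torch
import torch.optim as optim

w_manual = torch.tensor([1.0, -2.0], requires_grad=True)
w_optim = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.01

# Manual update: weight = weight - learning_rate * gradient
(w_manual ** 2).sum().backward()
with torch.no_grad():
    w_manual -= learning_rate * w_manual.grad

# The same step taken by torch.optim
optimizer = optim.SGD([w_optim], lr=learning_rate)
optimizer.zero_grad()
(w_optim ** 2).sum().backward()
optimizer.step()

print(w_manual, w_optim)
```

Swapping `optim.SGD` for another optimizer such as `optim.Adam` changes the update rule without touching the rest of the loop.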



In [None]:
import torch.optim as optim

# TODO: create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# TODO: in your training loop:
# TODO: zero the gradient buffers
optimizer.zero_grad()
output = net(input_test)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

In [4]:
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = self.conv1(x)
        print("after conv1:",x.size())
        x = F.relu(x)
        print("after first relu:",x.size())
        x = F.max_pool2d(x, 2, 2)
        print("after first maxpool:",x.size())
        x = self.conv2(x)
        print("after second conv:",x.size())
        x = F.relu(x)
        print("after second relu",x.size())
        x = F.max_pool2d(x, 2, 2)
        print("after second maxpool:",x.size())
        x = x.view(-1, 4*4*50)
        print("after reshape:",x.size())
        x = self.fc1(x)
        print("after fc1:",x.size())
        x = F.relu(x)
        print("after relu:",x.size())
        x = self.fc2(x)
        print("after second connected layer")
        x = F.log_softmax(x, dim=1)
        print("after log_softmax:",x.size())
        return x
    
def train(args, model, device, train_loader, optimizer, epoch):
    model.train()  # put the model in training mode
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

def main():
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    
    parser.add_argument('--save-model', action='store_true', default=False,
                        help='For Saving the current Model')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)


    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(args, model, device, test_loader)

    if (args.save_model):
        torch.save(model.state_dict(),"mnist_cnn.pt")

In [5]:
model = Net()
train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])))

In [6]:
for batch_idx, (data, target) in enumerate(train_loader):
    # Inspect a single batch: types and shapes of the inputs and of the model output
    print(type(data), type(target))
    print("data,target size:", data.size(), target.size())
    output = model(data)
    print(type(output), output.size())
    break  # one batch is enough to check the shapes

<class 'torch.Tensor'> <class 'torch.Tensor'>
data,target size: torch.Size([1, 1, 28, 28]) torch.Size([1])
after conv1: torch.Size([1, 20, 24, 24])
after first relu: torch.Size([1, 20, 24, 24])
after first maxpool: torch.Size([1, 20, 12, 12])
after second conv: torch.Size([1, 50, 8, 8])
after second relu torch.Size([1, 50, 8, 8])
after second maxpool: torch.Size([1, 50, 4, 4])
after reshape: torch.Size([1, 800])
after fc1: torch.Size([1, 500])
after relu: torch.Size([1, 500])
after second connected layer
after log_softmax: torch.Size([1, 10])
<class 'torch.Tensor'> torch.Size([1, 10])


In [None]:
model = Net()
x = torch.rand(1, 1, 28, 28)  # this Net expects 28x28 inputs (fc1 takes 4*4*50 features)
print(x.size())
output = model(x)
#model.train()