Now we have a bit of fun. We have seen that `torch` has trainable tensors like `tensorflow` but is as flexible as `numpy`. Here we combine these traits to build a basic feedforward (convolutional) neural network.

For this, we use the `torch.nn` package.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

![lecun mnist architecture](https://pytorch.org/tutorials/_images/mnist.png)
We will build this thing.

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__() # initializes from the superclass
        ## our net will be as in the image above
        self.conv1 = nn.Conv2d(1, 6, 5) # arguments are 1 input channel, 6 output channels, 5x5 kernel size
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        # forward propagation
        # note that in the architecture we have subsampling layers. this is implemented as 2d maxpools
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2)) #2x2 max pooling layer
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        # now we're at the fc layers
        # here, x is a 2d image, but we gotta flatten it for input into the fc layers
        x = x.view(-1, self.flatten(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def flatten(self, x):
        shape = x.size()[1:]
        dims = 1
        for d in shape:
            dims *= d
        return dims

In [3]:
model = Net()
print(model)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
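
The `16*5*5` input size of `fc1` is not arbitrary: it is the number of values left after the two conv/pool stages. Here is a small sketch of that arithmetic in plain Python, assuming a 32x32 single-channel input and the `nn.Conv2d` defaults of stride 1 and no padding. The helper names `conv_out` and `pool_out` are made up for illustration and are not part of `torch`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # side length after a convolution over a square input
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    # side length after non-overlapping max pooling (stride == window)
    return size // window

side = 32                    # assumed input: 1 x 32 x 32
side = conv_out(side, 5)     # conv1: 32 -> 28
side = pool_out(side, 2)     # pool : 28 -> 14
side = conv_out(side, 5)     # conv2: 14 -> 10
side = pool_out(side, 2)     # pool : 10 -> 5
flat_features = 16 * side * side   # 16 channels come out of conv2
print(side, flat_features)   # 5 400
```

This matches the `in_features=400` that `print(model)` reports for `fc1`.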


Note that we didn't define a backpropagation function. That's because `torch` does it for you with its `autograd` magic.
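
To see what `autograd` is doing on a smaller scale, here is a minimal sketch with a single tensor instead of a network. The tensor `w` and the function `y` are toy stand-ins chosen for illustration. Only the forward computation is written down, and the gradient is filled in by `backward()`.

```python
import torch

# a leaf tensor that autograd should track
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# forward pass only: y = sum of w_i squared
y = (w ** 2).sum()

# autograd walks the recorded operations backwards and fills w.grad
y.backward()

print(w.grad)  # dy/dw = 2 * w
```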

We can pull out the parameters of the network using `.parameters()`.

In [4]:
params = list(model.parameters())
print(len(params))
print(params[0].size()) # params for first convolutional layer

10
torch.Size([6, 1, 5, 5])
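
The count of 10 comes from five layers, each holding one weight tensor and one bias tensor. As a rough check, the shapes can be tallied by hand. The `layers` table below is written out from the `Net` definition for illustration and is not something `torch` provides.

```python
from math import prod

# (weight shape, bias shape) per layer, copied from the Net definition
layers = {
    "conv1": ((6, 1, 5, 5), (6,)),
    "conv2": ((16, 6, 5, 5), (16,)),
    "fc1": ((120, 400), (120,)),
    "fc2": ((84, 120), (84,)),
    "fc3": ((10, 84), (10,)),
}

num_tensors = sum(len(shapes) for shapes in layers.values())
num_scalars = sum(prod(s) for shapes in layers.values() for s in shapes)

print(num_tensors)  # 10, matching len(params)
print(num_scalars)  # 61706 trainable values in total
```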


In [5]:
input = torch.randn(1, 1, 32, 32)
out = model(input) # calls model.forward(input) through nn.Module.__call__, which also runs any registered hooks
print(out)

tensor([[-0.1150,  0.1102, -0.0547,  0.0554, -0.0507, -0.0807,  0.0716, -0.0418,
          0.0664,  0.0718]], grad_fn=<AddmmBackward>)


**Note:** Why did we take a tensor of shape $(1,1,32,32)$ as input, as opposed to just $(1,32,32)$? `torch.nn` layers expect *minibatches* of data, so the first value is the number of samples in the batch and the second is the number of channels.
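
If you only have a single sample, one way to give it a batch dimension is `unsqueeze(0)`. A minimal sketch, where `image` is a random stand-in for one 32x32 grayscale picture:

```python
import torch

image = torch.randn(1, 32, 32)   # one sample: (channels, height, width)
batch = image.unsqueeze(0)       # add a batch dimension in front

print(image.shape)  # torch.Size([1, 32, 32])
print(batch.shape)  # torch.Size([1, 1, 32, 32])
```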

To actually train this, we need some objective/loss function to optimize.

In [6]:
target = torch.randn(10)
target = target.view(1,-1)
criterion = nn.MSELoss()

loss = criterion(out, target)
print(loss)

tensor(0.9534, grad_fn=<MseLossBackward>)
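
With its default settings, `nn.MSELoss` is the mean of the squared differences between prediction and target. A quick sketch that checks this by hand; `pred` and `tgt` are random stand-ins for the network output and target, and the seed is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)           # arbitrary seed, only for repeatability
pred = torch.randn(1, 10)      # stand-in for the network output
tgt = torch.randn(1, 10)       # stand-in for the target

by_hand = ((pred - tgt) ** 2).mean()
built_in = nn.MSELoss()(pred, tgt)

print(by_hand, built_in)       # the two values should agree
```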


Now to backprop. We first clear out all the gradient buffers, since `backward()` adds to whatever gradients are already stored rather than overwriting them.

In [7]:
model.zero_grad()

In [8]:
print('conv1.bias.grad before backward')
print(model.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(model.conv1.bias.grad)

conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([ 0.0066, -0.0103, -0.0038,  0.0122, -0.0099,  0.0366])
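
The reason for clearing gradients before `backward()` is that gradients accumulate across calls. A small sketch on a toy tensor `w` (a stand-in for illustration, not part of the network) shows the effect:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()       # gradient of sum(3 * w) is 3 per entry

(w * 3).sum().backward()     # second backward without clearing
second = w.grad.clone()      # the new gradient was added on top

w.grad.zero_()               # reset by hand, similar in effect to model.zero_grad()
print(first, second, w.grad)
```

Without the reset, every later update would be computed from a sum of old and new gradients.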


This fills the `.grad` field of every parameter with the gradient of the loss. Now we can act upon these gradients to do SGD. We don't need to rely on prebuilt optimizers like Adam or RMSProp. We can just do it ourselves, with the gradients and parameters in full view.

In [None]:
learning_rate = 0.01
with torch.no_grad(): # the update itself should not be tracked by autograd
    for f in model.parameters():
        # perform SGD: f <- f - learning_rate * grad
        f.sub_(f.grad * learning_rate)
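
For comparison, here is a sketch showing that plain `torch.optim.SGD` (no momentum or weight decay) applies the same rule as the hand-written loop. To keep it short it uses a small `nn.Linear` layer as a stand-in for the full `Net`. The names `layer`, `x`, `tgt` and `expected`, and the seed, are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                      # arbitrary seed for repeatability
layer = nn.Linear(4, 2)                   # small stand-in for the full Net
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

x = torch.randn(3, 4)                     # a minibatch of 3 samples
tgt = torch.randn(3, 2)

optimizer.zero_grad()                     # clear old gradients
loss = nn.MSELoss()(layer(x), tgt)
loss.backward()                           # fill .grad on weight and bias

# what the manual rule would produce, computed before stepping
expected = layer.weight.detach() - 0.01 * layer.weight.grad

optimizer.step()                          # p <- p - lr * p.grad
print(torch.allclose(layer.weight, expected))
```

Doing the update by hand keeps everything visible, while the optimizer object saves the bookkeeping once the update rule gets more involved.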