In [1]:
import numpy as np
import torch
import torch.nn as nn
from model import create_fcn

## Data Processing
The data has been split into train and test sets. `train.npy` and `test.npy` contain the input images for the train and test sets, respectively. Likewise, `train_cat.npy` and `test_cat.npy` contain the corresponding labels.
We first load the data using `np.load()` and check the shapes of the resulting numpy arrays through their `.shape` attribute.

In [2]:
train_data = np.load('norb/train.npy')
train_labels = np.load('norb/train_cat.npy')

test_data = np.load('norb/test.npy')
test_labels = np.load('norb/test_cat.npy')

# Let's print the shapes of these numpy arrays
print(train_data.shape, train_labels.shape)
print(test_data.shape, test_labels.shape)


# Let's also check the type of these variables and look at the first 10 labels
print(type(train_data))
for i in range(10):
    print(train_labels[i])

(29160, 1, 108, 108) (29160,)
(29160, 1, 108, 108) (29160,)
<class 'numpy.ndarray'>
0
1
2
3
4
5
0
1
2
3


Notice that the test set is as large as the training set (29160 images). We'll use a small subset of it for validation and leave the rest for testing later. To do this, we slice the test set along its first dimension.

In [3]:
# Choosing a subset (first 1000) of the test set for validation purposes.
test_data = test_data[:1000]
test_labels = test_labels[:1000]

# Let's verify
print(test_data.shape, test_labels.shape)

(1000, 1, 108, 108) (1000,)
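
Note that the cell above overwrites `test_data` and `test_labels`, so the remaining samples are no longer reachable without reloading the files. A minimal sketch of keeping both parts, using a small synthetic array in place of the NORB data (the names `val_data`, `heldout_data` and the sizes are illustrative assumptions, not part of the original notebook):

```python
import numpy as np

# Small synthetic stand-in for the test set: 50 single-channel 8x8 "images".
full_test_data = np.zeros((50, 1, 8, 8), dtype=np.float32)
full_test_labels = np.arange(50) % 6

n_val = 10  # the notebook uses 1000

# Slicing along the first dimension leaves the other dimensions intact.
val_data, heldout_data = full_test_data[:n_val], full_test_data[n_val:]
val_labels, heldout_labels = full_test_labels[:n_val], full_test_labels[n_val:]

print(val_data.shape, val_labels.shape)
print(heldout_data.shape, heldout_labels.shape)
```

Basic slices like these are views into the original array, so the split itself does not copy any image data.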


### Converting our data from numpy arrays to PyTorch tensors.
So far, we've been working with numpy arrays. For further operations, such as forward propagation and GPU computation, we need to convert the numpy arrays into PyTorch tensors. It's simple: use `torch.from_numpy()`. Typecasting can be done by calling the corresponding methods: `x.float()` or `x.long()`.

In [4]:
# Converting to PyTorch tensors
train_data = torch.from_numpy(train_data).float()
test_data = torch.from_numpy(test_data).float()

# If we're planning to use cross entropy loss, the data type of the
# targets needs to be 'long'.
train_labels = torch.from_numpy(train_labels).long()
test_labels = torch.from_numpy(test_labels).long()

print(type(train_data))

<class 'torch.Tensor'>
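
One detail worth knowing: `torch.from_numpy()` does not copy the data, so the tensor and the array share memory until a conversion such as `.float()` changes the dtype. A small sketch on a toy array (all names here are illustrative):

```python
import numpy as np
import torch

arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# torch.from_numpy shares memory with the source array (no copy is made).
t = torch.from_numpy(arr)
arr[0, 0] = 100
print(t[0, 0].item())  # the change made through NumPy is visible in the tensor

# .float() converts to float32; because the dtype changes, this is a new tensor
# that no longer follows later changes to the array.
t_float = t.float()
arr[0, 1] = -1
print(t_float.dtype, t_float[0, 1].item())

# .long() converts to int64, the dtype nn.CrossEntropyLoss expects for class indices.
labels = torch.from_numpy(np.array([0, 1, 2], dtype=np.int32)).long()
print(labels.dtype)
```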


In [5]:
# For PyTorch tensors, x.size() and x.shape are equivalent ways to get
# the shape; x.size(0) returns the size of a single dimension
print(train_data.size())
print(train_data.size(0))

torch.Size([29160, 1, 108, 108])
29160


### Converting into Cuda tensors
Since we're going to use a GPU, the tensors first need to be moved to it. To do this, use `x = x.cuda()`, which copies the tensor into the GPU's memory. Use this operation judiciously: host-to-device transfers are comparatively slow, and careless use can add significant delays. More on this later.

In [6]:
# Move the data and labels to the GPU now: x = x.cuda()
train_data = train_data.cuda()
test_data = test_data.cuda()

train_labels = train_labels.cuda()
test_labels = test_labels.cuda()

# Let's do a sanity check
print(train_data.type())

torch.cuda.FloatTensor


Did you notice the slight delay when running this cell? That is the time it took to transfer the data into the GPU's memory. Also notice that the tensor type is now `torch.cuda.FloatTensor`. To verify that the data really resides in GPU memory, go to your terminal and run `nvidia-smi`. You should see a python process listed using approx. 2GB of GPU memory. We're inching closer towards training our network.
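
As a rough back-of-the-envelope check of that figure, the size of the tensors can be computed from the shapes printed earlier (float32 data, int64 labels):

```python
# Shapes from the notebook; float32 data uses 4 bytes per element,
# int64 labels use 8 bytes per element.
train_elems = 29160 * 1 * 108 * 108
test_elems = 1000 * 1 * 108 * 108

data_bytes = (train_elems + test_elems) * 4
label_bytes = (29160 + 1000) * 8
total_bytes = data_bytes + label_bytes

print(f"{total_bytes / 1e9:.2f} GB")
```

The tensors themselves come to about 1.4 GB. The remainder of what `nvidia-smi` reports is presumably overhead such as the CUDA context, which varies by GPU and driver.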

__Note__: Moving the entire dataset to the GPU is NOT good practice.
We can get away with it here because our data is small enough to fit in
GPU memory. When working with larger datasets (as we will tomorrow) and
bigger networks, it is strongly advised to move only the minibatches to the
GPU, just before they're fed to the network.
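
A minimal sketch of that per-minibatch pattern, with synthetic data standing in for a large dataset (the sizes and variable names are assumptions for illustration). It uses `.to(device)` so that it also runs on a machine without a GPU:

```python
import torch

# Fall back to the CPU when no GPU is available so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for a dataset that stays in host (CPU) memory.
data = torch.randn(100, 16)
labels = torch.randint(0, 6, (100,))
batch_size = 32

n_seen = 0
for start in range(0, data.size(0), batch_size):
    # Only the current minibatch is copied to the device.
    inp = data[start:start + batch_size].to(device)
    tar = labels[start:start + batch_size].to(device)
    n_seen += inp.size(0)

print(n_seen, inp.device.type, data.device.type)
```

The full dataset never leaves host memory; only one minibatch at a time occupies the device.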

### Introducing `torch.autograd.Variable`
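So far, we've been dealing with PyTorch tensors very plainly. However, for deep learning we also need to keep track of a tensor's gradients, where they are needed, so that they can be propagated automatically. In older PyTorch versions (before 0.4) this meant wrapping tensors in objects of the `torch.autograd.Variable` class. Since version 0.4, tensors and Variables are merged, so we simply ask a tensor to track gradients with `requires_grad_()`. Note that training only needs gradients with respect to the model's parameters; tracking them for the inputs, as done below, is only needed if you want gradients with respect to the data itself. As we'll see, this brings our tensors to 'life', ready to handle the excruciatingly painful optimizations and backpropagation!

In [7]:
# Ask the tensors to track gradients (this replaces wrapping them in a Variable)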
train_data.requires_grad_(True)
test_data.requires_grad_(True)

# The targets/labels do not require gradients
train_labels.requires_grad_(False)
test_labels.requires_grad_(False)
print(train_labels)
print(train_labels.requires_grad)

tensor([ 0,  1,  2,  ...,  3,  4,  5], device='cuda:0')
False
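
To see what gradient tracking buys us, here is a tiny standalone sketch (toy values, not the NORB data) in which autograd computes the derivative of a simple function:

```python
import torch

# A leaf tensor that asks autograd to track operations applied to it.
x = torch.tensor([2.0, 3.0], requires_grad=True)

# y = x0^2 + x1^2, so dy/dx = 2 * x.
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([4., 6.])

# Integer tensors such as class labels do not track gradients.
labels = torch.tensor([0, 1, 2])
print(labels.requires_grad)  # False

# Parameters of nn modules track gradients by default.
layer = torch.nn.Linear(2, 1)
print(all(p.requires_grad for p in layer.parameters()))  # True
```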


### Creating the Network and setting up the Loss Function
We have created a sample network architecture for you in the file `model.py`; check out its `create_fcn()` function. For the loss, you may choose from the large variety of [loss functions available](https://pytorch.org/docs/stable/nn.html#loss-functions) in PyTorch.

In [8]:
# Declaring some network hyperparameters
D_in = 108*108
D_out = 6

# create_fcn function is written in model.py.
model = create_fcn(D_in, D_out)
# Initialise a loss function.
# E.g. if we wanted an MSE loss: loss_fn = nn.MSELoss()
# For classification we use the cross-entropy loss, which takes the
# network's raw scores and the integer class labels.
loss_fn = nn.CrossEntropyLoss()
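
The contents of `model.py` are not shown in this notebook. Purely as an illustration, a fully-connected network of this kind might look like the sketch below; `create_fcn_sketch`, the hidden size and the layer choice are hypothetical and are not the actual `create_fcn()`. The sketch also shows the input and target shapes that `nn.CrossEntropyLoss` expects:

```python
import math
import torch
import torch.nn as nn

def create_fcn_sketch(d_in, d_out, d_hidden=64):
    # Hypothetical stand-in for create_fcn() in model.py, whose actual
    # architecture is not shown in this notebook.
    return nn.Sequential(
        nn.Linear(d_in, d_hidden),
        nn.ReLU(),
        nn.Linear(d_hidden, d_out),  # raw scores (logits), no softmax
    )

sketch_model = create_fcn_sketch(108 * 108, 6)
sketch_loss_fn = nn.CrossEntropyLoss()

batch = torch.randn(4, 108 * 108)     # (minibatch_size, input_dim)
targets = torch.tensor([0, 1, 2, 3])  # class indices, dtype int64
logits = sketch_model(batch)
print(logits.shape)                   # torch.Size([4, 6])
print(sketch_loss_fn(logits, targets).item())

# With identical scores for all 6 classes the loss equals ln(6).
uniform_loss = sketch_loss_fn(torch.zeros(4, 6), targets).item()
print(uniform_loss, math.log(6))
```

The value ln(6) ≈ 1.79 is the loss of a classifier that guesses uniformly among six classes, which makes it a handy reference point when reading a training log.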

Now, move the model and the loss function to the GPU too. This is similar to what we did with the tensors.

In [9]:
model.cuda()
loss_fn.cuda()

CrossEntropyLoss()

### Introducing Optimizer
An optimizer is the engine that performs gradient descent, in all its variants and with all their hyperparameters. It takes care of the weight updates after backpropagation so that we don't have to write them by hand. PyTorch provides a wide range of built-in optimizers; check out [this link](https://pytorch.org/docs/stable/optim.html) to explore them. Assuming we're using the Adam optimizer, we can initialize it by calling `torch.optim.Adam()`. Note that at initialization it needs to know all of the network's parameters (weights and biases), which can be obtained from `model.parameters()`.

In [10]:
learning_rate = 0.0001
# Initializing the optimizer with hyperparameters.
# Please play with SGD, RMSprop, Adagrad, etc.
# Note that different optimizers may require different hyperparameter values
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
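
To make the optimizer's role concrete, here is a small sketch of a single plain SGD step on a one-parameter toy problem, followed by a few of the built-in alternatives. The learning rates shown are arbitrary illustrative values, not tuned recommendations:

```python
import torch

# A single scalar parameter and the loss w^2, whose gradient is 2 * w.
w = torch.nn.Parameter(torch.tensor(1.0))
sgd = torch.optim.SGD([w], lr=0.1)

sgd.zero_grad()   # clear gradients left over from a previous step
toy_loss = w ** 2
toy_loss.backward()  # w.grad is now 2.0
sgd.step()        # w <- w - lr * grad = 1.0 - 0.1 * 2.0
print(w.item())

# Other built-in optimizers receive the parameters in the same way.
layer = torch.nn.Linear(4, 2)
candidates = {
    "sgd": torch.optim.SGD(layer.parameters(), lr=0.01, momentum=0.9),
    "rmsprop": torch.optim.RMSprop(layer.parameters(), lr=0.001),
    "adagrad": torch.optim.Adagrad(layer.parameters(), lr=0.01),
    "adam": torch.optim.Adam(layer.parameters(), lr=0.0001),
}
print(sorted(candidates))
```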

In [11]:
# Initializing some training parameters
batch_size = 324

# number of batches in one epoch
n_batch = train_data.shape[0] // batch_size 
accuracy = 0.0
n_epoch = 10
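
The batch size of 324 divides the 29160 training samples exactly, so no samples are left over. A quick check of the arithmetic, together with a batch size that does not divide evenly (256 is just an example value):

```python
n_samples = 29160  # size of the training set in this notebook

for bs in (324, 256):
    full_batches, leftover = divmod(n_samples, bs)
    print(bs, full_batches, leftover)
```

With integer division as used above, any leftover samples would simply be skipped in every epoch.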

### Entering the Training Loop
We'll now enter the training loop and do the following:
* Create a minibatch of size `batch_size` from the train data
* Forward propagate the minibatch through the network
* Compute the loss using the loss function defined previously
* Backpropagate the loss through the network (thanks to `torch.autograd`)
* Update the weights of the model using the `optimizer`
* Finally, compute the performance statistics
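
The steps above can be sketched end to end on a small synthetic problem that runs on the CPU in well under a second. Everything in this sketch (the data, the network `net`, the hyperparameters) is an illustrative assumption and is separate from the NORB setup in this notebook. It also shuffles the samples each epoch and evaluates under `torch.no_grad()`, two common practices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic 3-class problem: each class is a noisy cluster around its own center.
n_classes, d_in = 3, 20
centers = torch.randn(n_classes, d_in) * 3.0

def make_split(n):
    y = torch.randint(0, n_classes, (n,))
    x = centers[y] + torch.randn(n, d_in)
    return x, y

x_train, y_train = make_split(600)
x_val, y_val = make_split(200)

net = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, n_classes))
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
bs = 50

for epoch in range(5):
    # Shuffle the sample order at the start of every epoch.
    perm = torch.randperm(x_train.size(0))
    for start in range(0, x_train.size(0), bs):
        idx = perm[start:start + bs]                  # create a minibatch
        scores = net(x_train[idx])                    # forward pass
        batch_loss = criterion(scores, y_train[idx])  # compute the loss
        opt.zero_grad()
        batch_loss.backward()                         # backpropagate
        opt.step()                                    # update the weights

    # Performance statistics, computed without building an autograd graph.
    with torch.no_grad():
        val_pred = net(x_val).argmax(dim=1)
        val_acc = (val_pred == y_val).float().mean().item()
    print(epoch, batch_loss.item(), val_acc)
```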

In [12]:
# Before we enter the training loop, remember that the shape of our
# input data is (29160, 1, 108, 108). This shape is perfect if using
# a CNN. But, for a fully-connected network, the model expects an 
# input of (minibatch_size, input_dim). Therefore, the (1, 108, 108)
# dimensional image needs to be flattened out before feeding to the 
# network.

train_data = train_data.reshape(-1, 108*108)
test_data = test_data.reshape(-1, 108*108)

for t in range(n_epoch):
     for m in range(n_batch):
         inp = train_data[m * batch_size: (m+1) * batch_size]
         tar = train_labels[m * batch_size: (m+1) * batch_size ]
 
         # Optionally add random perturbations to the input. Define
         # add_noise() yourself if you wish to use it.
         # inp = add_noise(inp)
 
         # Compute the network's output: Forward Prop
         pred = model(inp)
 
         # Compute the network's loss
         loss = loss_fn(pred, tar)
 
         # Zero the gradients of all the network's parameters
         optimizer.zero_grad()
 
         # Compute the network's gradients: Backward Prop
         loss.backward()
 
         # Update the network's parameters based on the computed
         # gradients
         optimizer.step()
 
         print(t, m, loss.item(), accuracy)
 
     # Validation after every 2nd epoch
     if t % 2 == 0:
         # Forward pass; no gradients are needed for evaluation
         with torch.no_grad():
             output = model(test_data)
 
         # The predicted class is the index of the highest output score
         pred = output.argmax(dim=1)
 
         correct = pred.eq(test_labels).sum()
         accuracy = correct.item() / test_labels.size(0)
         print("\n*****************************************\n")
         print(accuracy)
         print("\n*****************************************\n")


0 0 18.262638092041016 0.0
0 1 155.90501403808594 0.0
0 2 196.83798217773438 0.0
0 3 152.5719451904297 0.0
0 4 114.1688232421875 0.0
0 5 78.82913208007812 0.0
0 6 82.3402099609375 0.0
0 7 62.92122268676758 0.0
0 8 48.19945526123047 0.0
0 9 30.81851577758789 0.0
0 10 24.476716995239258 0.0
0 11 27.280046463012695 0.0
0 12 27.58415985107422 0.0
0 13 22.627365112304688 0.0
0 14 22.379323959350586 0.0
0 15 17.835065841674805 0.0
0 16 14.883783340454102 0.0
0 17 10.166007041931152 0.0
0 18 7.2017927169799805 0.0
0 19 7.916727542877197 0.0
0 20 10.896110534667969 0.0
0 21 7.887431621551514 0.0
0 22 10.274028778076172 0.0
0 23 8.752635955810547 0.0
0 24 7.139326095581055 0.0
0 25 7.727548122406006 0.0
0 26 7.699992656707764 0.0
0 27 7.140280723571777 0.0
0 28 5.672860145568848 0.0
0 29 4.5704264640808105 0.0
0 30 4.390462875366211 0.0
0 31 5.461024284362793 0.0
0 32 5.375275611877441 0.0
0 33 4.947575092315674 0.0
0 34 4.124166011810303 0.0
0 35 3.580554723739624 0.0
0 36 3.8129467964172363 0

3 10 1.9939695596694946 0.257
3 11 2.0041122436523438 0.257
3 12 2.020509719848633 0.257
3 13 2.0290138721466064 0.257
3 14 1.9785935878753662 0.257
3 15 1.989641785621643 0.257
3 16 2.133155584335327 0.257
3 17 1.9716650247573853 0.257
3 18 2.0593690872192383 0.257
3 19 2.0269999504089355 0.257
3 20 1.9240705966949463 0.257
3 21 1.7536725997924805 0.257
3 22 2.2867319583892822 0.257
3 23 2.150430917739868 0.257
3 24 2.040879011154175 0.257
3 25 1.9149091243743896 0.257
3 26 1.9041975736618042 0.257
3 27 1.9951214790344238 0.257
3 28 1.9908701181411743 0.257
3 29 2.165639638900757 0.257
3 30 2.074413776397705 0.257
3 31 2.073884963989258 0.257
3 32 1.9952707290649414 0.257
3 33 2.0119850635528564 0.257
3 34 1.9042611122131348 0.257
3 35 2.081861972808838 0.257
3 36 2.3063032627105713 0.257
3 37 1.9877722263336182 0.257
3 38 2.0353612899780273 0.257
3 39 1.9655901193618774 0.257
3 40 2.0798308849334717 0.257
3 41 2.091127634048462 0.257
3 42 1.9187829494476318 0.257
3 43 1.9496454000473

6 20 1.7432078123092651 0.276
6 21 1.5640472173690796 0.276
6 22 1.8642103672027588 0.276
6 23 1.786619782447815 0.276
6 24 1.7553298473358154 0.276
6 25 1.7380467653274536 0.276
6 26 1.6830824613571167 0.276
6 27 1.731202483177185 0.276
6 28 1.7254077196121216 0.276
6 29 1.7743494510650635 0.276
6 30 1.7170919179916382 0.276
6 31 1.7563060522079468 0.276
6 32 1.6645082235336304 0.276
6 33 1.7497055530548096 0.276
6 34 1.7057596445083618 0.276
6 35 1.7760869264602661 0.276
6 36 1.737547755241394 0.276
6 37 1.7040284872055054 0.276
6 38 1.6998168230056763 0.276
6 39 1.7020879983901978 0.276
6 40 1.7167904376983643 0.276
6 41 1.7241568565368652 0.276
6 42 1.7569421529769897 0.276
6 43 1.6904391050338745 0.276
6 44 1.6898781061172485 0.276
6 45 1.7231801748275757 0.276
6 46 1.659539818763733 0.276
6 47 1.7212063074111938 0.276
6 48 1.7010724544525146 0.276
6 49 1.7331684827804565 0.276
6 50 1.6820929050445557 0.276
6 51 1.665595293045044 0.276
6 52 1.682934045791626 0.276
6 53 1.641401529

9 22 1.6727015972137451 0.307
9 23 1.601629614830017 0.307
9 24 1.5963634252548218 0.307
9 25 1.5189048051834106 0.307
9 26 1.5652315616607666 0.307
9 27 1.5828133821487427 0.307
9 28 1.5669399499893188 0.307
9 29 1.6210089921951294 0.307
9 30 1.582269549369812 0.307
9 31 1.6595406532287598 0.307
9 32 1.5691709518432617 0.307
9 33 1.6524970531463623 0.307
9 34 1.6485130786895752 0.307
9 35 1.6219627857208252 0.307
9 36 1.595784306526184 0.307
9 37 1.5948957204818726 0.307
9 38 1.6385635137557983 0.307
9 39 1.5939081907272339 0.307
9 40 1.6111139059066772 0.307
9 41 1.6085941791534424 0.307
9 42 1.6252349615097046 0.307
9 43 1.600885033607483 0.307
9 44 1.6436967849731445 0.307
9 45 1.5398772954940796 0.307
9 46 1.5506510734558105 0.307
9 47 1.6348074674606323 0.307
9 48 1.5811620950698853 0.307
9 49 1.6153044700622559 0.307
9 50 1.6089204549789429 0.307
9 51 1.5259758234024048 0.307
9 52 1.5560765266418457 0.307
9 53 1.565075397491455 0.307
9 54 1.612939476966858 0.307
9 55 1.513331651

That's our introduction to neural networks using PyTorch. Tomorrow, we'll try solving a more challenging problem with a bigger dataset and a more complicated network, in a more principled manner. Hope to see you all tomorrow!