In [1]:
import torch
import torch.nn as nn
import torch.utils.data as data
from names_loader import NameData
from model import RNN
from torch.autograd import Variable

## Lab 3 Introduction
So far, we've worked on an image classification task on the NORB dataset and a semantic segmentation task on the PASCAL VOC 2007 dataset. In today's lab, we'll build on those concepts to construct a Recurrent Neural Network (RNN). The problem we'll solve is a toy one: given a person's last name, predict its country of origin!

For this, we have provided a dataset in the `data/` directory. The idea is to build an RNN that sees one __letter__ at a time; once it has seen all the letters, we ask it to predict the name's country of origin. As in the last lab, we have three files to write: the model, the dataloader, and this training file.

### Setting the dataloader
We have written a dataset class for you (if you want to go back early) in `names_loader.py`. You may write your own dataset class if you wish. Let's create two dataloader objects, one for training and one for validation. Once that is done, have a look at what the dataloader produces. You'll find that the input has size `(batch_size, 18, 57)`. Here 18 is the maximum length of the names in the dataset; a name shorter than 18 letters is left-padded. 57 is the number of characters in the alphabet, so the last dimension holds a one-hot vector for each character.
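
To make the `(18, 57)` layout of a single name concrete, here is a minimal pure-Python sketch of a left-padded one-hot encoding. The alphabet used here (52 ASCII letters plus five punctuation characters) and the use of all-zero rows for padding are assumptions chosen to match the sizes above; the actual encoding lives in `names_loader.py` and may differ. `encode_name` is a hypothetical helper, not part of the provided code.

```python
import string

# Assumption: a 57-character alphabet (52 ASCII letters plus " .,;'").
# The real alphabet is defined in names_loader.py and may differ.
ALPHABET = string.ascii_letters + " .,;'"
MAX_LEN = 18  # maximum name length stated in the text


def encode_name(name, max_len=MAX_LEN, alphabet=ALPHABET):
    """Return a max_len x len(alphabet) one-hot matrix, left-padded with zero rows."""
    rows = []
    for ch in name:
        row = [0] * len(alphabet)
        row[alphabet.index(ch)] = 1  # raises ValueError for unknown characters
        rows.append(row)
    # Padding rows go in front, so the letters occupy the last timesteps.
    padding = [[0] * len(alphabet) for _ in range(max_len - len(rows))]
    return padding + rows


encoded = encode_name("Smith")
print(len(encoded), len(encoded[0]))  # 18 57
```

With left padding, the final timestep always holds the last letter of the name, so the prediction read after the last step comes right after real input rather than after padding.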

In [2]:
# Initializing the dataset objects
dataset_train = NameData('./data', 'train')
dataset_val = NameData('./data', 'val')

# Initializing the dataloader object. 
dataloader_train = data.DataLoader(
                dataset_train, batch_size = 8, 
                shuffle = True, num_workers = 4)

dataloader_val = data.DataLoader(
                dataset_val, batch_size = 1, 
                shuffle = False, num_workers = 4)

print(dataset_train.n_categories)

18


In [3]:
# Let's investigate the output of the dataloader
for i, (input, target) in enumerate(dataloader_train):
    print(input.size())
    if i > 10:
        break

torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])
torch.Size([8, 18, 57])


### Loading the model, criterion and the optimizer
Let's load the network now. This should be routine by now! Also move the model and the criterion to the GPU with `.cuda()`.

In [4]:
# Initialize the network. Hidden size: 1024.
# 57 is the length of the one-hot-encoded input at each timestep
model = RNN(57, 1024, dataset_train.n_categories)
criterion = nn.NLLLoss()

# Convert model and criterion into cuda here
model.cuda()
criterion.cuda()

# Print the RNN
print(model)

RNN(
  (i2h): Linear(in_features=1081, out_features=1024, bias=True)
  (i2o): Linear(in_features=1081, out_features=18, bias=True)
  (softmax): LogSoftmax()
)
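
The criterion above is `nn.NLLLoss`, which expects log-probabilities; that is why the printed model ends in a `LogSoftmax` layer. The following pure-Python sketch shows the arithmetic for a single example. The helpers `log_softmax` and `nll_loss` are illustrative stand-ins written for this note, not the PyTorch implementations (which also average over the batch by default).

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one example."""
    return -log_probs[target]


n_categories = 18  # number of classes printed by dataset_train.n_categories

# A network that has learned nothing spreads probability evenly over classes.
uniform = log_softmax([0.0] * n_categories)
loss_uniform = nll_loss(uniform, target=3)
print(loss_uniform, math.log(n_categories))

# A confident, correct prediction gives a much smaller loss.
confident = log_softmax([5.0 if k == 3 else 0.0 for k in range(n_categories)])
loss_confident = nll_loss(confident, target=3)
print(loss_confident)
```

A uniform guess over 18 classes costs ln(18), roughly 2.89, which is a handy sanity check: the loss of an untrained network at the very first iteration should land near that value.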


Let's also initialize an optimizer for the task. You're free to make your own hyperparameter decisions here.

In [5]:
optimizer = torch.optim.SGD(model.parameters(), 0.005)

We're ready to train our first RNN. But before we do that, we need to write the train function, which iterates over the data, runs the forward pass, computes the loss, backpropagates, and finally updates the weights.

In [6]:
def train(epoch, dataloader, model, criterion, optimizer, categories, split = 'train'):
    # Useful for some book-keeping 
    loss_meter, acc_meter, count = 0, 0, 0

    # Use train mode for training and eval mode for any other split
    # (the loop below passes 'val', so test against 'train' explicitly)
    if split == 'train':
        model.train()
    else:
        model.eval()

    for i, (input, target) in enumerate(dataloader):
        input = Variable(input.float()).cuda()
        target = Variable(target.reshape(-1,)).long().cuda()
    
        # Initializing the hidden state
        batch_size = input.size(0)
        hidden = Variable(model.init_hidden(batch_size)).cuda()

        # seq_len = input.size(1)
        model.zero_grad()
        
        for f in range(input.size(1)):
            output, hidden = model(input[:,f,:], hidden)

        loss = criterion(output, target)
        acc = accuracy(output, target)
        loss_meter += loss.data.cpu().numpy()
        acc_meter += acc

        if split == 'train':
            optimizer.zero_grad()
            loss.backward()
            # A must-do step to avoid the exploding gradient problem.
            # We're restricting the norm of the gradients to at most 5.
            # The effects of this may not be visible in this toy problem, but
            # can be seen when dealing with more complicated problems.
            torch.nn.utils.clip_grad_norm_(model.parameters(), 5)
            optimizer.step()

        count += 1
        # print('loss at epoch ', str(epoch), ' iteration ', str(i), ' is: ', loss.data.cpu().numpy())
        if i % 500 == 0:
            print(split + ' epoch ', epoch, ' iteration ', i, ' loss is : ', 
                  loss_meter / count, ' accuracy is  ', acc_meter / count)

    print(split + ' loss at epoch ', str(epoch), ' is: ', loss_meter / count)
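
The gradient clipping step in `train` rescales all gradients together so that their joint L2 norm does not exceed the chosen maximum. A minimal pure-Python sketch of that idea is shown below; `clip_by_global_norm` is a hypothetical helper written for illustration and is not PyTorch's exact implementation of `clip_grad_norm_`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient vectors so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Two made-up parameter gradients whose joint norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Because every component is multiplied by the same factor, clipping shortens the update without changing its direction.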



Let's also write a function `accuracy` that computes the accuracy of our predictions.

In [7]:
def accuracy(pred, gt):
    # Fraction of examples whose highest-scoring class matches the label
    pred = pred.argmax(1)
    return (pred == gt).float().mean().item()

Now it's time to enter the training loop. We'll train for `n_epoch` epochs and validate after every second epoch:

In [8]:
n_epoch = 6
categories = dataset_train.all_categories

for i in range(n_epoch):
    train(i, dataloader_train, model, criterion, optimizer, categories, 'train')
    if i % 2 == 1:
        print('***************** Validation Loop *********************')
        train(i, dataloader_val, model, criterion, optimizer, categories, 'val')




train epoch  0  iteration  0  loss is :  2.907675266265869  accuracy is   0.0
train epoch  0  iteration  500  loss is :  2.0793430933695354  accuracy is   0.4593313373253493
train epoch  0  iteration  1000  loss is :  1.9381074958390647  accuracy is   0.466033966033966
train epoch  0  iteration  1500  loss is :  1.8790211497665166  accuracy is   0.46085942704863425
train loss at epoch  0  is:  1.8377416158517201
train epoch  1  iteration  0  loss is :  1.4721016883850098  accuracy is   0.625
train epoch  1  iteration  500  loss is :  1.6903273558426284  accuracy is   0.46457085828343314
train epoch  1  iteration  1000  loss is :  1.664608210057288  accuracy is   0.47115384615384615
train epoch  1  iteration  1500  loss is :  1.6444629468296783  accuracy is   0.47759826782145237
train loss at epoch  1  is:  1.6243074260870616
***************** Validation Loop *********************
val epoch  1  iteration  0  loss is :  0.2869863510131836  accuracy is   1.0
val epoch  1  iteration  500  