Build and train deep learning architectures, such as RNNs, for an NLP task. The task at hand is emotion classification, which is a multi-class problem.

---

### Steps:

- Load the Data
- Implementing Model
- Pretesting Model
- Setup Training
- Training Model
- Storing Model


### Load the Data
Instead of reloading the data, we restore it from the previous phase.




In [0]:
import torch
import pickle
import numpy as np  # used by MyData to compute sequence lengths
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import time

# helper functions
def convert_to_pickle(item, directory):
    # the context manager closes the file even if pickling fails
    with open(directory, "wb") as f:
        pickle.dump(item, f)


def load_from_pickle(directory):
    with open(directory, "rb") as f:
        return pickle.load(f)

In [7]:
# READ YOUR DATA FROM GOOGLE DRIVE
from google.colab import drive
drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [0]:
# data instance
class MyData(Dataset):
    def __init__(self, X, y):
        self.data = X
        self.target = y
        self.length = [ np.sum(1 - np.equal(x, 0)) for x in X]
        
    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        x_len = self.length[index]
        return x, y, x_len
    
    def __len__(self):
        return len(self.data)

In [0]:
data_folder = "/gdrive/My Drive/NLP_PyTorch/"

train_dataset = load_from_pickle(data_folder + "train_dataset")
test_dataset = load_from_pickle(data_folder + "test_dataset")
val_dataset = load_from_pickle(data_folder + "val_dataset")

In [5]:
train_dataset.batch_size

64
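
The `batch_size` attribute suggests that the restored objects are `DataLoader` instances rather than raw datasets. The previous phase is not shown here, so the following is only a sketch of how such a loader might be built. The toy arrays, the `PaddedData` class, and the `shuffle` and `drop_last` settings are assumptions, not the original preprocessing code.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class PaddedData(Dataset):
    """Hypothetical stand-in for MyData: padded token ids, one-hot labels, true lengths."""
    def __init__(self, X, y):
        self.data = X
        self.target = y
        # number of non-padding (non-zero) tokens per sequence
        self.length = [int(np.count_nonzero(x)) for x in X]

    def __getitem__(self, index):
        return self.data[index], self.target[index], self.length[index]

    def __len__(self):
        return len(self.data)

# toy data: 10 sequences padded with 0 to length 5, 3 classes (one-hot)
rng = np.random.default_rng(0)
X = rng.integers(1, 20, size=(10, 5))
X[:, 3:] = 0                      # last two positions are padding
y = np.eye(3, dtype=np.int64)[rng.integers(0, 3, size=10)]

# drop_last=True discards the incomplete final batch (assumed setting)
loader = DataLoader(PaddedData(X, y), batch_size=4, shuffle=True, drop_last=True)
xb, yb, lb = next(iter(loader))
print(xb.shape, yb.shape, lb)
```

Each batch is a triple of padded inputs, one-hot targets, and true lengths, which is the shape of data the training loop later unpacks as `(inp, targ, lens)`.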

### Implementing Model

After the data has been preprocessed, transformed, and prepared, it is time to construct the model, the so-called computation graph, that will be used to train our classifier. We are going to use a gated recurrent unit (GRU) network, a gated variant of the basic RNN that is generally better at retaining information across long sequences. The figure below shows a high-level overview of the model.

![alt txt](https://github.com/omarsar/nlp_pytorch_tensorflow_notebooks/blob/master/img/gru-model.png?raw=true)

In [0]:
class EmoGRU(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_units, batch_sz, output_size):
        super(EmoGRU, self).__init__()
        self.batch_sz = batch_sz
        self.hidden_units = hidden_units
        self.embedding_dim = embedding_dim
        self.vocab_size = vocab_size
        self.output_size = output_size
        
        # layers
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.dropout = nn.Dropout(p=0.5)
        self.gru = nn.GRU(self.embedding_dim, self.hidden_units)
        self.fc = nn.Linear(self.hidden_units, self.output_size)
    
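    def initialize_hidden_state(self, batch_size, device):
        return torch.zeros((1, batch_size, self.hidden_units), device=device)
    
    def forward(self, x, lens, device):
        x = self.embedding(x) # max_len X batch_size X embedding_dim
        # size the hidden state from the input so a smaller final batch also works
        self.hidden = self.initialize_hidden_state(x.size(1), device)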
        output, self.hidden = self.gru(x, self.hidden) # max_len X batch_size X hidden_units
        out = output[-1, :, :] 
        out = self.dropout(out)
        out = self.fc(out)
        return out, self.hidden

### Pretesting models

In [0]:
# parameters
TRAIN_BUFFER_SIZE = 40000 # len(input_tensor_train)
VAL_BUFFER_SIZE = 5000 # len(input_tensor_val)
TEST_BUFFER_SIZE = 5000 # len(input_tensor_test)
BATCH_SIZE = 64
TRAIN_N_BATCH = TRAIN_BUFFER_SIZE // BATCH_SIZE
VAL_N_BATCH = VAL_BUFFER_SIZE // BATCH_SIZE
TEST_N_BATCH = TEST_BUFFER_SIZE // BATCH_SIZE

embedding_dim = 256
units = 1024
vocab_inp_size = 27291 # len(inputs.word2idx)
target_size = 6 # num_emotions

In [0]:
# sort a batch so that it can be used with pack_padded_sequence,
# which expects batch elements ordered by decreasing length

def sort_batch(X, y, lengths):  # written for machine translation; not required for classification, but useful
    """Sort the batch by length."""
    
    lengths, indx = lengths.sort(dim=0, descending=True)
    X = X[indx]
    y = y[indx]
    return X.transpose(0,1), y, lengths # transpose (batch x seq) to (seq x batch)

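`pack_padded_sequence` is a utility function that packs a padded batch of variable-length sequences so that the RNN can skip the padding positions. `pad_packed_sequence` is its inverse: it converts the packed output back into a padded tensor.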
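
The `EmoGRU` model above reads `output[-1]`, which for shorter sequences is the GRU output at a padding position. Packing is one way to avoid that. The sketch below is not part of the original model and uses arbitrary toy sizes; it shows that with a packed input the final hidden state corresponds to each sequence's last real token.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
max_len, batch, emb, hid = 5, 3, 4, 6    # toy sizes
x = torch.randn(max_len, batch, emb)      # (seq x batch x features), already embedded
lengths = torch.tensor([5, 3, 2])         # sorted in decreasing order, kept on the CPU

gru = nn.GRU(emb, hid)

packed = pack_padded_sequence(x, lengths)            # drop the padding positions
packed_out, h_n = gru(packed)
output, out_lens = pad_packed_sequence(packed_out)   # back to a zero-padded tensor

# h_n[-1, i] is the output at the last real time step of sequence i
for i, n in enumerate(lengths.tolist()):
    assert torch.allclose(h_n[-1, i], output[n - 1, i])
print(output.shape, out_lens.tolist())
```

With this approach the classifier would read `h_n[-1]` instead of `output[-1]`, which is why the batch is sorted by length first.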

In [13]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = EmoGRU(vocab_inp_size, embedding_dim, units, BATCH_SIZE, target_size)
model.to(device)

# obtain one sample from the data iterator
it = iter(train_dataset)
x, y, x_len = next(it)

# sort the batch first so that it can be used with pack_padded_sequence
xs, ys, lens = sort_batch(x, y, x_len)

print("Input size: ", xs.size())

output, _ = model(xs.to(device), lens, device)
print(output.size())

Input size:  torch.Size([69, 64])
torch.Size([64, 6])


### Setup Training
Now that we have tested the model, it is time to train it. We will define our optimization algorithm, learning rate, and other settings needed to train the model.

In [0]:
# Enabling CUDA
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = EmoGRU(vocab_inp_size, embedding_dim, units, BATCH_SIZE, target_size)
model.to(device)

# loss criterion and optimizer for training
criterion = nn.CrossEntropyLoss() # the same as log_softmax + NLLLoss
optimizer = torch.optim.Adam(model.parameters())

def loss_function(y, prediction):
    """ CrossEntropyLoss expects outputs and class indices as target """
    # convert from one-hot encoding to class indices
    target = torch.max(y, 1)[1]
    loss = criterion(prediction, target) 
    return loss   #TODO: refer the parameter of these functions as the same
    
def accuracy(target, logit):
    ''' Obtain accuracy for training round '''
    target = torch.max(target, 1)[1] # convert from one-hot encoding to class indices
    corrects = (torch.max(logit, 1)[1].data == target).sum()
    accuracy = 100.0 * corrects / len(logit)
    return accuracy
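
To make the two helpers concrete, the following sketch applies the same operations to hand-made tensors. It also checks the claim in the comment above that `nn.CrossEntropyLoss` matches `log_softmax` followed by `NLLLoss`. The logits and labels are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# invented logits for 4 examples and 3 classes
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.0, 0.2],
                       [0.1, 2.2, 0.3]])
# one-hot targets, as produced by the data pipeline
one_hot = torch.tensor([[1, 0, 0],
                        [0, 0, 1],
                        [0, 1, 0],
                        [0, 1, 0]])

target = one_hot.argmax(dim=1)   # same result as torch.max(one_hot, 1)[1]

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

pred = logits.argmax(dim=1)      # predicted class per example
acc = 100.0 * (pred == target).sum().item() / len(logits)
print(target.tolist(), pred.tolist(), acc)
```

The third example is misclassified, so three of four predictions are correct.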

### Training Model

Now we finally train the model.

In [15]:
EPOCHS = 10

for epoch in range(EPOCHS):
    start = time.time()
    
    ### Initialize hidden state
    # TODO: do initialization here.
    total_loss = 0
    train_accuracy, val_accuracy = 0, 0
    
    ### Training
    model.train()  # enable dropout
    for (batch, (inp, targ, lens)) in enumerate(train_dataset):
        loss = 0
        predictions, _ = model(inp.permute(1 ,0).to(device), lens, device) # TODO:don't need _   
              
        loss += loss_function(targ.to(device), predictions)
        batch_loss = (loss / int(targ.shape[1]))  # note: scaled by the number of classes
        total_loss += batch_loss.item()           # accumulate a float so the graph is not kept alive
        
        optimizer.zero_grad()                     # clear gradients from the previous step
        loss.backward()                           # backpropagate
        optimizer.step()                          # update the parameters
        
        batch_accuracy = accuracy(targ.to(device), predictions)
        train_accuracy += batch_accuracy
        
        if batch % 100 == 0:
            # the value printed here is the training batch loss, despite the "Val. Loss" label
            print('Epoch {} Batch {} Val. Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.cpu().detach().numpy()))
            
    ### Validating
    model.eval()  # disable dropout for evaluation
    with torch.no_grad():  # no gradients are needed for validation
        for (batch, (inp, targ, lens)) in enumerate(val_dataset):
            predictions, _ = model(inp.permute(1, 0).to(device), lens, device)
            batch_accuracy = accuracy(targ.to(device), predictions)
            val_accuracy += batch_accuracy
    
    print('Epoch {} Loss {:.4f} -- Train Acc. {:.4f} -- Val Acc. {:.4f}'.format(epoch + 1, 
                                                             total_loss / TRAIN_N_BATCH, 
                                                             train_accuracy / TRAIN_N_BATCH,
                                                             val_accuracy / VAL_N_BATCH))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Val. Loss 0.3010
Epoch 1 Batch 100 Val. Loss 0.2606
Epoch 1 Batch 200 Val. Loss 0.2146
Epoch 1 Batch 300 Val. Loss 0.1203
Epoch 1 Batch 400 Val. Loss 0.0841
Epoch 1 Batch 500 Val. Loss 0.0365
Epoch 1 Batch 600 Val. Loss 0.0399
Epoch 1 Loss 0.1420 -- Train Acc. 67.2050 -- Val Acc. 91.8670
Time taken for 1 epoch 31.093199253082275 sec

Epoch 2 Batch 0 Val. Loss 0.0325
Epoch 2 Batch 100 Val. Loss 0.0229
Epoch 2 Batch 200 Val. Loss 0.0076
Epoch 2 Batch 300 Val. Loss 0.0272
Epoch 2 Batch 400 Val. Loss 0.0168
Epoch 2 Batch 500 Val. Loss 0.0226
Epoch 2 Batch 600 Val. Loss 0.0316
Epoch 2 Loss 0.0264 -- Train Acc. 93.0675 -- Val Acc. 92.7484
Time taken for 1 epoch 31.039151191711426 sec

Epoch 3 Batch 0 Val. Loss 0.0197
Epoch 3 Batch 100 Val. Loss 0.0343
Epoch 3 Batch 200 Val. Loss 0.0300
Epoch 3 Batch 300 Val. Loss 0.0167
Epoch 3 Batch 400 Val. Loss 0.0323
Epoch 3 Batch 500 Val. Loss 0.0292
Epoch 3 Batch 600 Val. Loss 0.0246
Epoch 3 Loss 0.0204 -- Train Acc. 94.1650 -- Val Acc.

### Stopping the Model

How do we know when to stop training? We can use a technique called early stopping, which is not covered here but is widely used in deep learning: training is halted once performance on the validation set stops improving.
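
As a rough illustration of the idea, here is a minimal sketch of a patience-based stopping rule. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the accuracy values are all hypothetical; they are not part of this notebook or of PyTorch.

```python
class EarlyStopping:
    """Signal a stop when the monitored metric has not improved for `patience` epochs."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy; return True if training should stop."""
        if self.best is None or val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# made-up validation accuracies, one per epoch
history = [91.8, 92.7, 92.9, 92.8, 92.6, 92.9, 92.5]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(history, start=1):
    if stopper.step(acc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In the training loop, `stopper.step(...)` would be called once per epoch with the validation accuracy, and the loop would `break` when it returns `True`. In practice one would usually also keep a copy of the best weights seen so far.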

### Store the Model


In [16]:
torch.save(model, "/gdrive/My Drive/NLP_PyTorch/emogru")
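
Saving the whole module with `torch.save(model, ...)` pickles the object, which ties the file to the exact class definition. The PyTorch documentation recommends saving the `state_dict` instead. The sketch below shows that pattern on a tiny hypothetical network (`TinyNet`) and a temporary placeholder path, not on `EmoGRU` or the Drive folder used above.

```python
import os
import tempfile
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for EmoGRU, small enough to run anywhere."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
path = os.path.join(tempfile.mkdtemp(), "tinynet.pt")  # placeholder path

# save only the learned parameters, not the pickled class
torch.save(model.state_dict(), path)

# to restore: rebuild the architecture, then load the parameters into it
restored = TinyNet()
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 4)
same = torch.equal(model(x), restored(x))
print(same)
```

For `EmoGRU`, the same pattern would require constructing the model with the same hyperparameters before calling `load_state_dict`.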


---

### Implementing More Deep Learning Models
Implement a model similar to the one above, but try an LSTM instead of a GRU. Consult the PyTorch documentation for quick ways to improve the model, such as adding a `Dropout` [layer](https://pytorch.org/docs/stable/_modules/torch/nn/modules/dropout.html). Try anything that makes your model faster or more accurate. Also add layers (i.e., make the network deeper) to increase the model's capacity.
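
One possible starting point for this exercise is sketched below. The class name `EmoLSTM`, the two stacked layers, the `padding_idx=0` setting, and the toy sizes are assumptions chosen for illustration, and none of it has been tuned on the emotion dataset.

```python
import torch
import torch.nn as nn

class EmoLSTM(nn.Module):
    """Hypothetical LSTM counterpart of EmoGRU with stacked layers."""
    def __init__(self, vocab_size, embedding_dim, hidden_units, output_size,
                 num_layers=2, dropout=0.5):
        super().__init__()
        # assumes token id 0 is padding, as in the length computation of MyData
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        # the dropout argument here is applied between the stacked LSTM layers
        self.lstm = nn.LSTM(embedding_dim, hidden_units,
                            num_layers=num_layers, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_units, output_size)

    def forward(self, x):
        # x: (seq_len x batch) tensor of token ids
        x = self.embedding(x)
        output, (h_n, c_n) = self.lstm(x)   # initial states default to zeros
        out = self.dropout(h_n[-1])          # final hidden state of the top layer
        return self.fc(out)

model = EmoLSTM(vocab_size=100, embedding_dim=8, hidden_units=16, output_size=6)
tokens = torch.randint(1, 100, (12, 4))      # toy batch: 12 time steps, 4 sequences
logits = model(tokens)
print(logits.shape)
```

Unlike the GRU, the LSTM returns a tuple of hidden and cell states, so the classifier reads the hidden state `h_n` of the last layer.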

---

