# Exercise 5 (NLP): Very Deep Learning
**Natural language processing (NLP)** is the ability of a computer program to understand human language as it is spoken and written. This exercise walks through a pipeline of preprocessing and modelling steps; by the end, you will be able to classify the sentiment of a given review as POSITIVE or NEGATIVE.

In [None]:
import numpy as np

# read data from reviews and labels file.
with open('data/reviews.txt', 'r') as f:
    reviews_ = f.readlines()
with open('data/labels.txt', 'r') as f:
    labels = f.readlines()

In [None]:
# One of the most important steps is to inspect the data before starting any ML task.
for i in range(5):
    print(labels[i] + "\t: " + reviews_[i][:100] + "...")

positive
	: bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life...
negative
	: story of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terr...
positive
	: homelessness  or houselessness as george carlin stated  has been an issue for years but never a plan...
negative
	: airport    starts as a brand new luxury    plane is loaded up with valuable paintings  such belongin...
positive
	: brilliant over  acting by lesley ann warren . best dramatic hobo lady i have ever seen  and love sce...


We can see a lot of punctuation marks such as the full stop (.), comma (,), newline (\n) and so on, and we need to remove them.

Here is a list of all the punctuation marks that need to be removed:
```
(!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~)
```


## Task 1: Remove all the punctuation marks from the reviews.
There are many ways of doing it: regex, spaCy, or `from string import punctuation`.
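As a rough sketch of two of these options (not the only way to solve the task), the snippet below strips punctuation once with `str.translate` and once with a regular expression. The `sample` string is a made-up example, not a line from the dataset.

```python
import re
from string import punctuation

# Hypothetical sample text, used only for illustration.
sample = "bromwell high is a cartoon comedy . it ran at the same time!"

# Option 1: str.translate with a table that deletes every punctuation character.
table = str.maketrans('', '', punctuation)
no_punct_translate = sample.translate(table)

# Option 2: a regex character class built from the same punctuation set.
pattern = re.compile('[' + re.escape(punctuation) + ']')
no_punct_regex = pattern.sub('', sample)

print(no_punct_translate)
print(no_punct_translate == no_punct_regex)
```

Both variants leave whitespace untouched, so a later `split()` is still needed to get the individual words.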

In [None]:
# Lowercase everything so that the same word is always written the same way.
reviews = ''.join(reviews_).lower()

# complete the function below to remove punctuation and return the result as no_punct_text
from string import punctuation
def text_without_punct(reviews):
    no_punct_text = ''.join([c for c in reviews if c not in punctuation])
    return no_punct_text

no_punct_text = text_without_punct(reviews)
reviews_split = no_punct_text.split('\n')

# split the formatted no_punct_text into words
def split_in_words(reviews_split):
    reviews_split = ' '.join(reviews_split)
    return reviews_split.split()

words = split_in_words(reviews_split)

In [None]:
# once you are done, print the first ten words; they should match the output below
print(words[:10])

# print the total length of the words
print(len(words))

# Total number of unique words
print(len(set(words)))

['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', 'it', 'ran', 'at', 'the']
6020196
74072


The next step is to create a vocabulary. This way every word is mapped to an integer number.
```
Example: 1: hello, 2: I, 3: am, 4: Robo and so on...
```
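A minimal sketch of such a mapping, assuming a toy word list (`toy_words` is hypothetical and only stands in for the real `words` list):

```python
from collections import Counter

# Toy corpus (hypothetical), just to show the mapping.
toy_words = "hello i am robo hello i am hello".split()

toy_counts = Counter(toy_words)

# most_common() sorts by frequency, so the most frequent word gets index 1.
toy_vocab_to_int = {
    word: idx for idx, (word, _) in enumerate(toy_counts.most_common(), start=1)
}
# The reverse mapping turns integers back into words.
toy_int_to_vocab = {idx: word for word, idx in toy_vocab_to_int.items()}

print(toy_vocab_to_int)
print([toy_int_to_vocab[i] for i in (1, 4)])
```

Starting the numbering at 1 keeps 0 free, which is convenient if 0 is later used as a padding value.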


In [None]:
# Let's create a vocab out of it
from collections import Counter

# Count how often each word occurs.
def word_count(words):
    return Counter(words)

counts = word_count(words)

# If you did everything correctly, this is what you should get as output.
print(counts['wonderful'])
print(counts['bad'])

1658
9308


## Task 2: Word to Integer and Integer to word
Map every word to an integer value and vice-versa. 


In [None]:
# define a vocabulary for the words, sorted from most to least frequent
def vocabulary(counts):
    # most_common() without an argument returns every (word, count) pair
    return counts.most_common()

vocab = vocabulary(counts)
print(len(vocab))
vocab[1]

74072


('and', 164107)

In [None]:
# map each vocab word to an integer. Also, start the indexing with 1 as we will use
# '0' for padding and we don't want to mix the two.
def vocabulary_to_integer(vocab):
    vocab_to_int = {w:i+1 for i, (w,c) in enumerate(vocab)}
    return vocab_to_int

vocab_to_int = vocabulary_to_integer(vocab)

# verify that the length is the same and that 'and' is mapped to the correct integer value.
print(len(vocab_to_int))
vocab_to_int['and']

74072


2

In [None]:
# Let's see which words are most common in positive reviews and which in negative reviews.

positive_counts = Counter()
negative_counts = Counter()

for i in range(len(reviews_)):
    if(labels[i] == 'positive\n'):
        for word in reviews_[i].split(" "):
            positive_counts[word] += 1
    else:
        for word in reviews_[i].split(" "):
            negative_counts[word] += 1

In [None]:
labels[:10]

['positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n',
 'positive\n',
 'negative\n']

In [None]:
positive_counts.most_common()[:10]

[('', 537968),
 ('the', 173324),
 ('.', 159654),
 ('and', 89722),
 ('a', 83688),
 ('of', 76855),
 ('to', 66746),
 ('is', 57245),
 ('in', 50215),
 ('br', 49235)]

In [None]:
negative_counts.most_common()[:10]

[('', 548962),
 ('.', 167538),
 ('the', 163389),
 ('a', 79321),
 ('and', 74385),
 ('of', 69009),
 ('to', 68974),
 ('br', 52637),
 ('is', 50083),
 ('it', 48327)]

The above just shows the most common words in the positive and negative reviews. However, the lists are dominated by uninformative words like `the`, `a`, `was`, and so on. Can you find a way to show only the relevant words?

```
Hint: Stop Words removal or normalizing each term.
```
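One possible reading of the hint, sketched with a tiny hand-picked stop-word set. The `stop_words` set and the `demo_counts` values are assumptions made for illustration; libraries such as NLTK or spaCy ship much longer lists.

```python
from collections import Counter

# A tiny, hand-picked stop-word set (an assumption for this sketch).
stop_words = {'', 'the', 'a', 'and', 'of', 'to', 'is', 'in', 'it', 'br', '.'}

def most_common_without_stop_words(counter, n=10):
    """Return the n most frequent entries whose word is not a stop word."""
    filtered = Counter({w: c for w, c in counter.items() if w not in stop_words})
    return filtered.most_common(n)

# Hypothetical counts, standing in for positive_counts / negative_counts.
demo_counts = Counter({'the': 50, 'great': 7, 'a': 40, 'film': 9, '.': 60})
print(most_common_without_stop_words(demo_counts))
```

The same helper could be called on `positive_counts` and `negative_counts` to compare the two classes.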

In [None]:
words[:10]

['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', 'it', 'ran', 'at', 'the']

In [None]:
[vocab_to_int[word] for word in words[:10]]

[21025, 308, 6, 3, 1050, 207, 8, 2138, 32, 1]

In [None]:
vocab_to_int['bromwell']

21025

## One hot encoding

We need one-hot encoding for the labels. Can you think of a reason why class labels need to be one-hot encoded?

## Task 3: Create one hot encoding for the labels. 

* Write the one hot encoding logic in the `one_hot` function.
* Use 1 for positive label and 0 for negative label.
* Save all the values in the `encoded_labels` array.
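To make the terminology concrete, here is a small sketch on three made-up labels. With only two classes, the single 0/1 column asked for above carries the same information as a full one-hot matrix, which has one column per class.

```python
import numpy as np

# Made-up labels in the same format as the labels file.
demo_labels = ['positive\n', 'negative\n', 'positive\n']

# Binary encoding as asked in the task: 1 = positive, 0 = negative.
binary = np.array([1 if label.strip() == 'positive' else 0 for label in demo_labels])

# Full one-hot encoding: row i of the identity matrix is the one-hot
# vector for class i, so indexing with the class ids builds the matrix.
one_hot_matrix = np.eye(2, dtype=int)[binary]

print(binary)
print(one_hot_matrix)
```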

In [None]:
print(labels[:10])

['positive\n', 'negative\n', 'positive\n', 'negative\n', 'positive\n', 'negative\n', 'positive\n', 'negative\n', 'positive\n', 'negative\n']


In [None]:
# 1 for positive label and 0 for negative label
def one_hot(labels):
  ohe = [1 if label =='positive\n' else 0 for label in labels]
  ohe = np.array(ohe)
  return ohe

encoded_labels = one_hot(labels)

# Print the length of your labels and uncomment the next line only if the size of encoded_labels is 25001.
# If you don't get the intuition behind this step, print encoded_labels to see it.
# encoded_labels = encoded_labels[:25000]
print(len(encoded_labels))
print(encoded_labels[:10])

25000
[1 0 1 0 1 0 1 0 1 0]


In [None]:
reviews_ints = []
for review in reviews_split:
    reviews_ints.append([vocab_to_int[word] for word in review.split()])

# Check whether any review is empty so that we can remove it. Otherwise its input would be all zeroes.
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 1
Maximum review length: 2514


In [None]:
print('Number of reviews before removing outliers: ', len(reviews_ints))

## remove any reviews/labels with zero length from the reviews_ints list.

# get indices of any reviews with length 0
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]

# remove 0-length reviews and their labels
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
encoded_labels = np.array([encoded_labels[ii] for ii in non_zero_idx])

print('Number of reviews after removing outliers: ', len(reviews_ints))

len(encoded_labels)

Number of reviews before removing outliers:  25001
Number of reviews after removing outliers:  25000


25000

## Task 4: Padding the data

> Define a function that returns an array `features` that contains the padded data, of a standard size, that we'll pass to the network. 
* The data should come from `reviews_ints`, since we want to feed integers to the network. 
* Each row should be `seq_length` elements long. 
* For reviews shorter than `seq_length` words, **left pad** with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. 
* For reviews longer than `seq_length`, use only the first `seq_length` words as the feature vector.

As a small example, if the `seq_length=10` and an input review is: 
```
[117, 18, 128]
```
The resultant, padded sequence should be: 

```
[0, 0, 0, 0, 0, 0, 0, 117, 18, 128]
```

**Your final `features` array should be a 2D array, with as many rows as there are reviews, and as many columns as the specified `seq_length`.**
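The padding rule for a single review can be sketched in plain Python as follows. `pad_left` is a hypothetical helper name used only here; the task itself asks for a function that fills a whole 2D array.

```python
def pad_left(review, seq_length):
    """Left-pad with zeros, or keep only the first seq_length tokens."""
    truncated = review[:seq_length]
    return [0] * (seq_length - len(truncated)) + truncated

# Short review: zeros are added on the left.
print(pad_left([117, 18, 128], 10))

# Long review (12 tokens): only the first 10 tokens are kept.
print(pad_left(list(range(1, 13)), 10))
```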

In [None]:
# Write the logic for padding the data
def pad_features(reviews_ints, seq_length):
    # start from an all-zero matrix so that untouched positions act as padding
    features = np.zeros((len(reviews_ints), seq_length), dtype=int)

    for i, review in enumerate(reviews_ints):
        # keep at most the first seq_length tokens
        tokens = review[:seq_length]
        if tokens:
            # write the tokens at the right end of the row (left padding)
            features[i, seq_length - len(tokens):] = tokens
    return features

# Verify if everything till now is correct. 

seq_length = 200

features = pad_features(reviews_ints, seq_length=seq_length)

## test statements - do not change - ##
assert len(features)==len(reviews_ints), "Your features should have as many rows as reviews."
assert len(features[0])==seq_length, "Each feature row should contain seq_length values."

# print the first 10 values of the first 30 rows
print(features[:30,:10])

[[    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [22382    42 46418    15   706 17139  3389    47    77    35]
 [ 4505   505    15     3  3342   162  8312  1652     6  4819]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [   54    10    14   116    60   798   552    71   364     5]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    1   330   578    34     3   162   748  2731     9   325]
 [    9    11 10171  5305  1946   689   444    22   280   673]
 [    0     0     0     0     0     0     0     0     0

Now we have everything ready. It's time to split our dataset into `Train`, `Test` and `Validate`. 

Read more about the train-test-split here : https://cs230-stanford.github.io/train-dev-test-split.html

## Task 5: Create train, test and val splits in the ratio 8:1:1.

Hint: Either use shuffle and slicing in Python or use train-test-val split in Sklearn. 
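The "shuffle and slicing" variant of the hint might look like the sketch below. The function name `shuffle_split`, the seed, and the toy arrays are assumptions made for illustration; the key point is that features and labels are indexed with the same permutation so they stay aligned.

```python
import numpy as np

def shuffle_split(features, labels, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle rows with one permutation, then slice into train/val/test."""
    n = len(features)
    idx = np.random.default_rng(seed).permutation(n)
    train_end = int(train_frac * n)
    val_end = train_end + int(val_frac * n)
    train_idx, val_idx, test_idx = idx[:train_end], idx[train_end:val_end], idx[val_end:]
    return ((features[train_idx], labels[train_idx]),
            (features[val_idx], labels[val_idx]),
            (features[test_idx], labels[test_idx]))

# Hypothetical toy data: 20 rows whose label equals the row's first value.
demo_x = np.arange(40).reshape(20, 2)
demo_y = demo_x[:, 0].copy()

(tr_x, tr_y), (va_x, va_y), (te_x, te_y) = shuffle_split(demo_x, demo_y)
print(tr_x.shape, va_x.shape, te_x.shape)
```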

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, encoded_labels, test_size=0.2, random_state=1)

In [None]:
train_frac = 0.8
val_frac = 0.1
test_frac = 0.1

from sklearn.model_selection import train_test_split
def train_test_val_split(features, encoded_labels):
  x_train, x_test, y_train, y_test = train_test_split(features, encoded_labels, test_size=1 - train_frac)

  # Split the current (20%) test set again into 10% test and 10% validation sets. 
  x_val, x_test, y_val, y_test = train_test_split(x_test, y_test, test_size=test_frac/(test_frac + val_frac)) 

  return x_train, x_val, x_test, y_train, y_val, y_test

# def train_test_val_labels(encoded_labels):
#     pass

train_x, val_x, test_x, train_y, val_y, test_y = train_test_val_split(features, encoded_labels)

# train_y, val_y, test_y = train_test_val_labels(encoded_labels)

## print out the shapes of your resultant feature data
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		(2500, 200)


## DataLoaders and Batching

After creating training, test, and validation data, we can create DataLoaders for this data by following two steps:
1. Create a known format for accessing our data, using [TensorDataset](https://pytorch.org/docs/stable/data.html#) which takes in an input set of data and a target set of data with the same first dimension, and creates a dataset.
2. Create DataLoaders and batch our training, validation, and test Tensor datasets.

```
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=batch_size)
```

This is an alternative to creating a generator function for batching our data into full batches.

### Task 6: Create a generator function for the dataset. 
See the above link for more info.
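For comparison with the `DataLoader` approach, a hand-written generator that yields full batches could look roughly like this. It is a sketch with NumPy arrays rather than tensors, and `batch_generator` is a hypothetical name.

```python
import numpy as np

def batch_generator(x, y, batch_size, shuffle=True, seed=0):
    """Yield (inputs, targets) in full batches; a trailing partial batch is dropped."""
    idx = np.arange(len(x))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    n_full = len(x) // batch_size
    for b in range(n_full):
        batch_idx = idx[b * batch_size:(b + 1) * batch_size]
        yield x[batch_idx], y[batch_idx]

# Toy data: 11 samples, so a batch size of 4 gives 2 full batches.
gen_x = np.arange(22).reshape(11, 2)
gen_y = np.arange(11)

batches = list(batch_generator(gen_x, gen_y, batch_size=4))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

Dropping the incomplete last batch keeps every batch the same size, which matters when the hidden state is initialized for a fixed `batch_size`.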

In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets for train, test and val
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
val_data = TensorDataset(torch.from_numpy(val_x), torch.from_numpy(val_y))
test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))

# dataloaders
batch_size = 50 

# make sure to SHUFFLE your training data. Keep shuffle=True.
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
val_loader = DataLoader(val_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

# obtain one batch of training data and label. 
dataiter = iter(train_loader)
sample_x, sample_y = next(dataiter)  # iterators have no .next() method in recent PyTorch versions

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)

# Check if GPU is available.
train_on_gpu=torch.cuda.is_available()

if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

Sample input size:  torch.Size([50, 200])
Sample input: 
 tensor([[   0,    0,    0,  ...,  911,  610,  135],
        [   0,    0,    0,  ...,  430,  120,   24],
        [   0,    0,    0,  ...,    8,   35, 1348],
        ...,
        [  40,   47,   77,  ...,  250,    5,   29],
        [1011,  674,  332,  ..., 1047,   42,    1],
        [   0,    0,    0,  ...,    3, 2136,  152]])

Sample label size:  torch.Size([50])
Sample label: 
 tensor([0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
        0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0,
        1, 0])
Training on GPU.


## Creating the Model 

Here we create a simple RNN in PyTorch and pass its output to a Linear layer, followed by a Sigmoid, to get a probability score and a prediction of POSITIVE or NEGATIVE.

The network is very similar to the CNN network created in Exercise 2. 

More info available at: https://pytorch.org/docs/0.3.1/nn.html?highlight=rnn#torch.nn.RNN

Read about the parameters that the RNN takes and see what will happen when `batch_first` is set as `True`.

In [None]:
import torch.nn as nn

class SentimentRNN(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, hidden_dim, n_layers, drop_prob=0):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentRNN, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # RNN layer
        self.rnn = nn.RNN(vocab_size, hidden_dim, n_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        

    def forward(self, x, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        batch_size = x.size(0)

        # RNN out layer
        rnn_out, hidden = self.rnn(x, hidden)
    
        # stack up RNN outputs
        rnn_out = rnn_out.contiguous().view(-1, self.hidden_dim)

        # fully-connected layer (no dropout layer is defined in __init__)
        out = self.fc(rnn_out)
        # sigmoid function
        sig_out = self.sig(out)

        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1] # keep only the output of the last time step
        
        # return last sigmoid output and hidden state
        return sig_out, hidden
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # nn.RNN has no cell state, so it expects a single tensor of size
        # n_layers x batch_size x hidden_dim (not the (h, c) tuple an LSTM uses)
        weight = next(self.parameters()).data
        hidden = weight.new(self.n_layers, batch_size, self.hidden_dim).zero_()

        if (train_on_gpu):
            hidden = hidden.cuda()

        return hidden


## Task 7 : Know the shape

Given a batch size of 64, an input size of 1 and a sequence length of 200, fed to an RNN with 2 stacked layers and 512 hidden units, find the shape of the input data (`x`) and of the hidden state (`hidden`) passed to the forward pass of the network. Note that `batch_first` is set to `True`.
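One way to check your answer is to trace the shapes through a hand-written forward pass. The sketch below uses NumPy instead of PyTorch so that every dimension is explicit. It is a simplified Elman RNN with random weights and a single bias per layer, so it illustrates the shapes only and is not a replica of `nn.RNN`.

```python
import numpy as np

batch_size, seq_length, input_size = 64, 200, 1
hidden_dim, n_layers = 512, 2
rng = np.random.default_rng(0)

# With batch_first=True the input is laid out as (batch, seq, feature).
x = rng.standard_normal((batch_size, seq_length, input_size))
# The hidden state is (n_layers, batch, hidden_dim); batch_first does not change it.
hidden = np.zeros((n_layers, batch_size, hidden_dim))

layer_input = x
for layer in range(n_layers):
    in_dim = layer_input.shape[-1]
    w_ih = rng.standard_normal((in_dim, hidden_dim)) * 0.01
    w_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
    bias = np.zeros(hidden_dim)
    h = hidden[layer]
    outputs = []
    for t in range(seq_length):
        # Elman update: h_t = tanh(x_t W_ih + h_{t-1} W_hh + b)
        h = np.tanh(layer_input[:, t, :] @ w_ih + h @ w_hh + bias)
        outputs.append(h)
    hidden[layer] = h                         # final state of this layer
    layer_input = np.stack(outputs, axis=1)   # (batch, seq, hidden_dim)

rnn_out = layer_input
print(x.shape, hidden.shape, rnn_out.shape)
```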



In [None]:
# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
hidden_dim = 256
n_layers = 1

net = SentimentRNN(vocab_size, output_size, hidden_dim, n_layers)
print(net)

SentimentRNN(
  (rnn): RNN(74073, 256, batch_first=True)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)



## Task 8: LSTM 

Before we start creating the LSTM, it is important to understand LSTM and to know why we prefer LSTM over a Vanilla RNN for this task. 
> Here are some good links to learn about LSTMs:
* [Colah Blog](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [Understanding LSTM](http://blog.echen.me/2017/05/30/exploring-lstms/)
* [RNN effectiveness](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)


Now create a class named `SentimentLSTM` with `n_layers=2` and all other hyperparameters the same as before. Also create an embedding layer and feed its output as input to the LSTM. Don't forget to add a regularizer (dropout) layer with p=0.4 after the LSTM layer to prevent overfitting.
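Conceptually, an embedding layer is a lookup table with one row per token id. The NumPy sketch below shows only that lookup and the resulting shape; the sizes are toy values chosen for illustration, and in the real model the table is a trainable parameter rather than fixed random numbers.

```python
import numpy as np

vocab_size_demo, embedding_dim_demo = 10, 4   # toy sizes, not the notebook's values
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((vocab_size_demo, embedding_dim_demo))

# A batch of 2 padded reviews with 5 token ids each (0 is the padding id).
token_ids = np.array([[0, 0, 3, 7, 1],
                      [0, 5, 5, 2, 9]])

# Integer-array indexing replaces every id by its row of the table.
embedded = embedding_table[token_ids]
print(embedded.shape)   # (batch, seq_length, embedding_dim)
```

This is why the LSTM's input size equals `embedding_dim` and no longer depends on the vocabulary size.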

In [None]:
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """
    The LSTM model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentLSTM, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # define embedding, LSTM, dropout and Linear layers here
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # dropout layer
        self.dropout = nn.Dropout(drop_prob)

        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        

    def forward(self, x, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        batch_size = x.size(0)

        # embeddings and lstm_out
        embeds = self.embedding(x)
        lstm_out, hidden = self.lstm(embeds, hidden)
    
        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        
        # dropout and fully-connected layer
        out = self.dropout(lstm_out)
        out = self.fc(out)
        # sigmoid function
        sig_out = self.sig(out)
        
        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1] # keep only the output of the last time step
        
        # return last sigmoid output and hidden state
        return sig_out, hidden    
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden

## Instantiate the network

Here, we'll instantiate the network. First up, defining the hyperparameters.

* `vocab_size`: Size of our vocabulary or the range of values for our input, word tokens.
* `output_size`: Size of our desired output; the number of class scores we want to output (pos/neg).
* `embedding_dim`: Number of columns in the embedding lookup table; size of our embeddings.
* `hidden_dim`: Number of units in the hidden layers of our LSTM cells. Larger values usually perform better. Common values are 128, 256, 512, etc.
* `n_layers`: Number of LSTM layers in the network, typically between 1 and 3.
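To get a feeling for how these hyperparameters drive model size, the sketch below counts trainable parameters by hand for the values used in this notebook (vocabulary of 74073 including padding, `embedding_dim=300`, `hidden_dim=256`, `n_layers=2`, `output_size=1`). It assumes the PyTorch layout, where each LSTM layer has four gates and two bias vectors per gate.

```python
def lstm_layer_params(input_dim, hidden_dim):
    # 4 gates, each with input weights, recurrent weights and two bias
    # vectors (PyTorch's nn.LSTM keeps separate b_ih and b_hh).
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)

vocab_size, embedding_dim, hidden_dim, n_layers, output_size = 74073, 300, 256, 2, 1

embedding_params = vocab_size * embedding_dim
lstm_params = sum(
    # the first layer reads embeddings, later layers read the previous layer's output
    lstm_layer_params(embedding_dim if layer == 0 else hidden_dim, hidden_dim)
    for layer in range(n_layers)
)
fc_params = hidden_dim * output_size + output_size   # weights + bias
total_params = embedding_params + lstm_params + fc_params

print(embedding_params, lstm_params, fc_params, total_params)
```

The embedding table dominates the total, so reducing the vocabulary size or `embedding_dim` shrinks the model far more than changing `hidden_dim`.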

In [None]:
# Instantiate the model with these hyperparameters
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 300
hidden_dim = 256
n_layers = 2

net = SentimentLSTM(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)
print(net)

# loss and optimization functions
lr=0.001

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

SentimentLSTM(
  (embedding): Embedding(74073, 300)
  (lstm): LSTM(300, 256, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)


### Task 9: Loss Functions
We are using `BCELoss (Binary Cross Entropy Loss)` since we have two output classes. 

Can Cross Entropy Loss be used instead of BCELoss? 

If no, why not? If yes, how?

Is using `NLLLoss()` with `LogSoftmax()` as the last layer the same as using `CrossEntropyLoss()` with a Softmax final layer? Can you work out the mathematical intuition behind it?
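As a numeric hint for the first question, the sketch below compares binary cross entropy on a sigmoid output with two-class cross entropy on the logits `(0, z)`. It uses plain Python rather than the PyTorch loss classes, and the logit value `z` is arbitrary. Keep in mind that PyTorch's `CrossEntropyLoss` expects raw logits and applies log-softmax internally.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross entropy for one probability p and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cross_entropy(logits, target):
    """Multi-class cross entropy: negative log-softmax of the target logit."""
    log_sum_exp = math.log(sum(math.exp(l) for l in logits))
    return -(logits[target] - log_sum_exp)

z = 1.7   # arbitrary example logit for the positive class
for y in (0, 1):
    binary = bce(sigmoid(z), y)
    two_class = cross_entropy([0.0, z], y)   # logits for (negative, positive)
    print(y, round(binary, 6), round(two_class, 6))
```

The two columns agree because the softmax of `(0, z)` assigns the positive class the probability `sigmoid(z)`.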

In [None]:
from tensorboardX import SummaryWriter

writer = SummaryWriter(".")
#Training and Validation

epochs = 4 # the validation loss stopped decreasing after roughly 3-4 epochs

counter = 0
print_every = 100
clip=5 # gradient clipping

# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

net.train()
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(batch_size)
            val_losses = []
            net.eval()
            for inputs, labels in val_loader:

                # Creating new variables for the hidden state, otherwise
                # we'd backprop through the entire training history
                val_h = tuple([each.data for each in val_h])

                if(train_on_gpu):
                    inputs, labels = inputs.cuda(), labels.cuda()

                output, val_h = net(inputs, val_h)
                val_loss = criterion(output.squeeze(), labels.float())

                val_losses.append(val_loss.item())

            net.train()
            writer.add_scalar('Train Loss', loss.item(), counter)
            # log the mean over all validation batches, not just the last batch
            writer.add_scalar('Val Loss', np.mean(val_losses), counter)

            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))

Epoch: 1/4... Step: 100... Loss: 0.006300... Val Loss: 0.953706
Epoch: 1/4... Step: 200... Loss: 0.003650... Val Loss: 1.012819
Epoch: 1/4... Step: 300... Loss: 0.020759... Val Loss: 0.853428
Epoch: 1/4... Step: 400... Loss: 0.003493... Val Loss: 0.862832
Epoch: 2/4... Step: 500... Loss: 0.072776... Val Loss: 0.905352
Epoch: 2/4... Step: 600... Loss: 0.001030... Val Loss: 0.967910
Epoch: 2/4... Step: 700... Loss: 0.002339... Val Loss: 0.949192
Epoch: 2/4... Step: 800... Loss: 0.006935... Val Loss: 0.904492
Epoch: 3/4... Step: 900... Loss: 0.001341... Val Loss: 1.055652
Epoch: 3/4... Step: 1000... Loss: 0.009274... Val Loss: 1.063849
Epoch: 3/4... Step: 1100... Loss: 0.000660... Val Loss: 0.950976
Epoch: 3/4... Step: 1200... Loss: 0.004356... Val Loss: 1.044279
Epoch: 4/4... Step: 1300... Loss: 0.082209... Val Loss: 1.129178
Epoch: 4/4... Step: 1400... Loss: 0.017529... Val Loss: 0.898144
Epoch: 4/4... Step: 1500... Loss: 0.006740... Val Loss: 1.009609
Epoch: 4/4... Step: 1600... Loss: 

In [None]:
# Get test data loss and accuracy

test_losses = [] # track loss
num_correct = 0

if(train_on_gpu):
    net.cuda()

# init hidden state
h = net.init_hidden(batch_size)

net.eval()
# iterate over test data
for inputs, labels in test_loader:

  # Creating new variables for the hidden state, otherwise
  # we'd backprop through the entire training history
  h = tuple([each.data for each in h])

  if(train_on_gpu):
      inputs, labels = inputs.cuda(), labels.cuda()
  
  # get predicted outputs
  output, h = net(inputs, h)
  
  # calculate loss
  test_loss = criterion(output.squeeze(), labels.float())
  test_losses.append(test_loss.item())
  
  # convert output probabilities to predicted class (0 or 1)
  pred = torch.round(output.squeeze())  # rounds to the nearest integer
  
  # compare predictions to true label
  correct_tensor = pred.eq(labels.float().view_as(pred))
  correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
  num_correct += np.sum(correct)
  
# avg test loss
print("Test loss: {:.3f}".format(np.mean(test_losses)))

# accuracy over all test data
test_acc = num_correct/len(test_loader.dataset)
print("Test accuracy: {:.3f}".format(test_acc))

Test loss: 0.361
Test accuracy: 0.854


## Inference
Once we are done with training and validating, we can try to improve the training and validation loss by playing around with the hyperparameters. Can you find a better set of hyperparameters?

### Task 10: Prediction Function
Now write a prediction function to predict the output for the test set created. Save the results in a CSV file with one column as the reviews and the prediction in the next column. Calculate the accuracy of the test set.

In [None]:
def tokenize_review(test_review):
  test_review = test_review.lower() # lowercase

  # get rid of punctuation
  test_text = text_without_punct(test_review)
  test_text = test_text.split('\n')

  # splitting by spaces
  test_words = split_in_words(test_text)

  # tokens; words that are not in the vocabulary are skipped to avoid a KeyError
  test_ints = []
  test_ints.append([vocab_to_int[word] for word in test_words if word in vocab_to_int])

  return test_ints

# test code to see if tokenization is working correctly and generate tokenized review
test_review = "I'm not too sure about this one."
test_ints = tokenize_review(test_review)
print(test_ints)

[[3795, 24, 98, 249, 44, 11, 30]]


In [None]:
# test sequence padding
seq_length = 200
features = pad_features(test_ints, seq_length)
print(features)

# test conversion to tensor and pass into your model
feature_tensor = torch.from_numpy(features)
print(feature_tensor.size())

def predict(net, test_review, sequence_length=200):
    
  net.eval()
  
  # tokenize review
  test_ints = tokenize_review(test_review)
  
  # pad tokenized sequence
  seq_length=sequence_length
  features = pad_features(test_ints, seq_length)
  
  # convert to tensor to pass into your model
  feature_tensor = torch.from_numpy(features)
  
  batch_size = feature_tensor.size(0)
  
  # initialize hidden state
  h = net.init_hidden(batch_size)
  
  if(train_on_gpu):
      feature_tensor = feature_tensor.cuda()
  
  # get the output from the model
  output, h = net(feature_tensor, h)
  
  # convert output probabilities to predicted class (0 or 1)
  pred = torch.round(output.squeeze()) 
  # printing output value, before rounding
  print('Prediction value, pre-rounding: {:.6f}'.format(output.item()))
  
  # print custom response
  if(pred.item()==1):
      print("Positive review detected!")
  else:
      print("Negative review detected.")


[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  76
  717 354]]
torch.Size([1, 200])


In [63]:
test_review_neg = "I'm not too sure about this one."
test_review_pos = "Was a bit skeptical at first. But the ending was great and I liked it!"
seq_length = 200 
predict(net, test_review_neg, seq_length)
predict(net, test_review_pos, seq_length)

Prediction value, pre-rounding: 0.000245
Negative review detected.
Prediction value, pre-rounding: 0.998541
Positive review detected!


In [None]:
# To predict on the CSV
import csv

# reverse lookup so that each row of token ids can be turned back into text
int_to_vocab = {i: w for w, i in vocab_to_int.items()}

# named predict_to_csv so that it does not overwrite predict() defined above
def predict_to_csv(path='out.csv'):
    net.eval()
    test_h = net.init_hidden(batch_size)
    number_of_samples = 0
    correct_samples = 0
    with open(path, 'w', newline='') as out_file:
        csv_writer = csv.writer(out_file)
        csv_writer.writerow(["text", "label", "predict"])
        for inputs, labels in test_loader:
            number_of_samples += len(inputs)
            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            test_h = tuple([each.data for each in test_h])

            if(train_on_gpu):
                inputs, labels = inputs.cuda(), labels.cuda()

            output, test_h = net(inputs, test_h)
            for i in range(len(output)):
                # The test set comes from a shuffled random split, so a fixed offset
                # into reviews_ does not match these rows. Rebuild the (truncated,
                # punctuation-free) text from the token ids of this row instead.
                text = ' '.join(int_to_vocab[t] for t in inputs[i].tolist() if t != 0)
                label = int(labels[i].item())
                pred = 1 if output[i].item() >= 0.5 else 0
                csv_writer.writerow([text, label, pred])
                correct_samples += int(pred == label)

    print("test_accuracy =", 100 * (correct_samples / number_of_samples), "%")
    print("saved to", path)

predict_to_csv()