<a href="https://colab.research.google.com/github/Dianna22/pytorch-sentiment-analysis/blob/master/MLP_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2 - Updated Sentiment Analysis

In the previous notebook, we got the fundamentals down for sentiment analysis. In this notebook, we'll actually get decent results.

We will use:
- packed padded sequences
- pre-trained word embeddings
- different RNN architecture
- bidirectional RNN
- multi-layer RNN
- regularization
- a different optimizer

This will allow us to achieve ~84% test accuracy.

## Preparing Data

As before, we'll set the seed, define the `Fields` and get the train/valid/test splits.

We'll be using *packed padded sequences*, which will make our RNN only process the non-padded elements of our sequence, and for any padded element the `output` will be a zero tensor. To use packed padded sequences, we have to tell the RNN how long the actual sequences are. We do this by setting `include_lengths = True` for our `TEXT` field. This will cause `batch.text` to now be a tuple with the first element being our sentence (a numericalized tensor that has been padded) and the second element being the actual lengths of our sentences.

In [0]:
import torch
from torchtext import data
from torchtext import datasets

SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(tokenize = 'spacy', include_lengths = True)
LABEL = data.LabelField(dtype = torch.float)

We then load the IMDb dataset.

In [0]:
from torchtext import datasets

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

In [7]:
print(len(test_data.examples))

25000


Then create the validation set from our training set.

In [0]:
import random

# random.seed() returns None, so seed first and pass the generator state instead
random.seed(SEED)
train_data, valid_data = train_data.split(random_state = random.getstate())

In [9]:
print(len(train_data.examples))
print(len(test_data.examples))
print(len(valid_data.examples))



17500
25000
7500


Next is the use of pre-trained word embeddings. Now, instead of having our word embeddings initialized randomly, they are initialized with these pre-trained vectors.
We get these vectors simply by specifying which vectors we want and passing it as an argument to `build_vocab`. `TorchText` handles downloading the vectors and associating them with the correct words in our vocabulary.

Here, we'll be using the `"glove.6B.300d"` vectors. `glove` is the algorithm used to calculate the vectors; see [here](https://nlp.stanford.edu/projects/glove/) for more. `6B` indicates these vectors were trained on 6 billion tokens and `300d` indicates these vectors are 300-dimensional.

You can see the other available vectors [here](https://github.com/pytorch/text/blob/master/torchtext/vocab.py#L113).

The theory is that these pre-trained vectors already have words with similar semantic meaning close together in vector space, e.g. "terrible", "awful", "dreadful" are nearby. This gives our embedding layer a good initialization as it does not have to learn these relations from scratch.

**Note**: these vectors are about 862MB, so watch out if you have a limited internet connection.

By default, TorchText will initialize words in your vocabulary but not in your pre-trained embeddings to zero. We don't want this, and instead initialize them randomly by setting `unk_init` to `torch.Tensor.normal_`. This will now initialize those words via a Gaussian distribution.

In [0]:
MAX_VOCAB_SIZE = 25_000

TEXT.build_vocab(train_data, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = "glove.6B.300d", 
                 unk_init = torch.Tensor.normal_)

LABEL.build_vocab(train_data)
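To make the role of `unk_init` more concrete: it is a function that receives a tensor and returns it initialized. The sketch below only shows what `torch.Tensor.normal_` does to a tensor on its own; the 5-element `vector` is a made-up stand-in for a word vector and is not taken from the vocabulary built above.

```python
import torch

torch.manual_seed(0)

# torch.Tensor.normal_ fills the given tensor in place with samples from N(0, 1)
# and returns that same tensor.
vector = torch.zeros(5)            # stand-in for the zero vector of an unknown word
result = torch.Tensor.normal_(vector)

print(result is vector)            # True: the same tensor, modified in place
print(bool((vector != 0).any()))   # True: no longer all zeros
```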

As before, we create the iterators, placing the tensors on the GPU if one is available.

Another requirement of packed padded sequences is that all of the tensors within a batch need to be sorted by their lengths. This is handled in the iterator by setting `sort_within_batch = True`.

In [0]:
BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE,
    sort_within_batch = True,
    device = device)

## Build the Model

The model features the most drastic changes.

### Different RNN Architecture

We'll be using a different RNN architecture called a Long Short-Term Memory (LSTM). Why is an LSTM better than a standard RNN? Standard RNNs suffer from the [vanishing gradient problem](https://en.wikipedia.org/wiki/Vanishing_gradient_problem). LSTMs overcome this by having an extra recurrent state called a _cell_, $c$ - which can be thought of as the "memory" of the LSTM - and by using multiple _gates_ which control the flow of information into and out of the memory. For more information, go [here](https://colah.github.io/posts/2015-08-Understanding-LSTMs/). We can simply think of the LSTM as a function of $x_t$, $h_{t-1}$ and $c_{t-1}$, instead of just $x_t$ and $h_{t-1}$.

$$(h_t, c_t) = \text{LSTM}(x_t, h_{t-1}, c_{t-1})$$

Thus, the model using an LSTM looks something like (with the embedding layers omitted):

![](https://github.com/Dianna22/pytorch-sentiment-analysis/blob/master/assets/sentiment2.png?raw=1)

The initial cell state, $c_0$, like the initial hidden state is initialized to a tensor of all zeros. The sentiment prediction is still, however, only made using the final hidden state, not the final cell state, i.e. $\hat{y}=f(h_T)$.
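For reference, one common way to write out the gates mentioned above is given below. The weight and bias symbols ($W$, $U$, $b$) and the gate names ($i_t$, $f_t$, $g_t$, $o_t$) are notation chosen for this sketch rather than taken from this notebook; $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The cell state is updated additively (the forget gate scales the old memory and the input gate adds new content), which is the property usually credited with easing the vanishing gradient problem.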

### Bidirectional RNN

The concept behind a bidirectional RNN is simple. As well as having an RNN processing the words in the sentence from the first to the last (a forward RNN), we have a second RNN processing the words in the sentence from the **last to the first** (a backward RNN). At time step $t$, the forward RNN is processing word $x_t$, and the backward RNN is processing word $x_{T-t+1}$. 

In PyTorch, the hidden state (and cell state) tensors returned by the forward and backward RNNs are stacked on top of each other in a single tensor. 

We make our sentiment prediction using a concatenation of the last hidden state from the forward RNN (obtained from the final word of the sentence), $h_T^\rightarrow$, and the last hidden state from the backward RNN (obtained from the first word of the sentence), $h_T^\leftarrow$, i.e. $\hat{y}=f(h_T^\rightarrow, h_T^\leftarrow)$.

The image below shows a bi-directional RNN, with the forward RNN in orange, the backward RNN in green and the linear layer in silver.  

![](https://github.com/Dianna22/pytorch-sentiment-analysis/blob/master/assets/sentiment3.png?raw=1)
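As a small check of where the two final hidden states come from, the sketch below runs a single-layer bidirectional `nn.LSTM` on random input. The sizes `T`, `B`, `E` and `H` are toy values picked for illustration, not the ones used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, B, E, H = 5, 2, 4, 3   # sent len, batch size, emb dim, hid dim (toy sizes)
x = torch.randn(T, B, E)

rnn = nn.LSTM(E, H, bidirectional=True)
output, (hidden, cell) = rnn(x)

print(output.shape)   # torch.Size([5, 2, 6]) -> [sent len, batch size, hid dim * 2]
print(hidden.shape)   # torch.Size([2, 2, 3]) -> [num directions, batch size, hid dim]

# The forward direction finishes on the last word, the backward one on the first.
forward_final = output[-1, :, :H]
backward_final = output[0, :, H:]
print(torch.allclose(hidden[0], forward_final))   # True
print(torch.allclose(hidden[1], backward_final))  # True

# The sentence representation handed to a linear layer is their concatenation.
sentence_repr = torch.cat((hidden[0], hidden[1]), dim=1)
print(sentence_repr.shape)  # torch.Size([2, 6])
```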

### Multi-layer RNN

Multi-layer RNNs (also called *deep RNNs*) are another simple concept. The idea is that we add additional RNNs on top of the initial standard RNN, where each RNN added is another *layer*. The hidden state output by the first (bottom) RNN at time-step $t$ will be the input to the RNN above it at time step $t$. The prediction is then made from the final hidden state of the final (highest) layer.

The image below shows a multi-layer unidirectional RNN, where the layer number is given as a superscript. Also note that each layer needs its own initial hidden state, $h_0^L$.

![](https://github.com/Dianna22/pytorch-sentiment-analysis/blob/master/assets/sentiment4.png?raw=1)

### Regularization

Although we've added improvements to our model, each one adds additional parameters. Without going into too much detail about overfitting, the more parameters you have in your model, the higher the probability that your model will overfit (memorize the training data, causing a low training error but high validation/testing error, i.e. poor generalization to new, unseen examples). To combat this, we use regularization. More specifically, we use a method of regularization called *dropout*. Dropout works by randomly *dropping out* (setting to 0) neurons in a layer during a forward pass. The probability that each neuron is dropped out is set by a hyperparameter and each neuron with dropout applied is considered independently. One theory about why dropout works is that a model with parameters dropped out can be seen as a "weaker" (fewer parameters) model. The predictions from all these "weaker" models (one for each forward pass) get averaged together within the parameters of the model. Thus, your one model can be thought of as an ensemble of weaker models, none of which are over-parameterized and thus should not overfit.
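A quick way to see this behaviour is to apply `nn.Dropout` to a tensor of ones. This is only an illustrative sketch: the tensor size is arbitrary, and the rescaling of the surviving values by `1 / (1 - p)` during training is how PyTorch's dropout layer keeps the expected activation unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(10_000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
dropout.train()
y = dropout(x)
print(sorted(y.unique().tolist()))   # [0.0, 2.0]
print(y.mean().item())               # close to 1.0

# Evaluation mode: dropout is a no-op.
dropout.eval()
print(torch.equal(dropout(x), x))    # True
```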

### Implementation Details

Another addition to this model is that we are not going to learn the embedding for the `<pad>` token. This is because we want to explicitly tell our model that padding tokens are irrelevant to determining the sentiment of a sentence. This means the embedding for the pad token will remain at what it is initialized to (we initialize it to all zeros later). We do this by passing the index of our pad token as the `padding_idx` argument to the `nn.Embedding` layer.
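The effect of `padding_idx` can be checked on a tiny embedding layer: after a backward pass, the gradient for the pad row is zero, so an optimizer never moves it. The vocabulary size, embedding size and token ids below are made up for this sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

PAD_IDX = 1  # hypothetical pad index for this toy example
embedding = nn.Embedding(num_embeddings=5, embedding_dim=3, padding_idx=PAD_IDX)

# [sent len, batch size]; the second sentence is padded after its first token
tokens = torch.tensor([[2, 3],
                       [4, 1],
                       [1, 1]])
embedding(tokens).sum().backward()

print(embedding.weight.grad[PAD_IDX])  # tensor([0., 0., 0.]) -> pad row gets no gradient
print(embedding.weight.grad[2])        # tensor([1., 1., 1.]) -> ordinary rows do
```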

To use an LSTM instead of the standard RNN, we use `nn.LSTM` instead of `nn.RNN`. Also, note that the LSTM returns the `output` and a tuple of the final `hidden` state and the final `cell` state, whereas the standard RNN only returned the `output` and final `hidden` state. 

As the final hidden state of our LSTM has both a forward and a backward component, which will be concatenated together, the size of the input to the `nn.Linear` layer is twice that of the hidden dimension size.

Implementing bidirectionality and adding additional layers are done by passing values for the `num_layers` and `bidirectional` arguments for the RNN/LSTM. 

Dropout is implemented by initializing an `nn.Dropout` layer (the argument is the probability of dropping out each neuron) and using it within the `forward` method after each layer we want to apply dropout to. **Note**: never use dropout on the input or output layers (`text` or `fc` in this case); you only ever want to use dropout on intermediate layers. The LSTM has a `dropout` argument which adds dropout on the connections between hidden states in one layer to hidden states in the next layer.

As we are passing the lengths of our sentences to be able to use packed padded sequences, we have to add a second argument, `text_lengths`, to `forward`. 

Before we pass our embeddings to the RNN, we need to pack them, which we do with `nn.utils.rnn.pack_padded_sequence`. This will cause our RNN to only process the non-padded elements of our sequence. The RNN will then return `packed_output` (a packed sequence) as well as the `hidden` and `cell` states (both of which are tensors). Without packed padded sequences, `hidden` and `cell` are tensors from the last element in the sequence, which will most probably be a pad token; however, when using packed padded sequences, they are both from the last non-padded element in the sequence.

We then unpack the output sequence, with `nn.utils.rnn.pad_packed_sequence`, to transform it from a packed sequence to a tensor. The elements of `output` from padding tokens will be zero tensors (tensors where every element is zero). Usually, we only have to unpack output if we are going to use it later on in the model. Although we aren't in this case, we still unpack the sequence just to show how it is done.
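The pack, run, unpack round trip can be tried in isolation. The following is a minimal sketch with random tensors and toy sizes (none of them taken from this notebook), showing that outputs at padded positions are zeros and that `hidden` comes from each sequence's last real step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two sequences in [sent len, batch size, emb dim] layout, sorted by length (3, then 1).
embedded = torch.randn(3, 2, 4)
lengths = torch.tensor([3, 1])
embedded[1:, 1] = 0.0  # positions past the true length of the second sequence are padding

rnn = nn.LSTM(input_size=4, hidden_size=5)

packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths)
packed_output, (hidden, cell) = rnn(packed)
output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)

print(output.shape)                       # torch.Size([3, 2, 5])
print(output_lengths)                     # tensor([3, 1])
print(output[1:, 1].abs().sum().item())   # 0.0 -> outputs at padded steps are zeros

# hidden holds the state after the last non-padded step of each sequence
print(torch.allclose(hidden[0, 0], output[2, 0]))  # True
print(torch.allclose(hidden[0, 1], output[0, 1]))  # True
```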

The final hidden state, `hidden`, has a shape of _**[num layers * num directions, batch size, hid dim]**_. These are ordered: **[forward_layer_0, backward_layer_0, forward_layer_1, backward_layer_1, ..., forward_layer_n, backward_layer_n]**. As we want the final (top) layer forward and backward hidden states, we get the top two hidden layers from the first dimension, `hidden[-2,:,:]` and `hidden[-1,:,:]`, and concatenate them together before passing them to the linear layer (after applying dropout).
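The ordering described above can be verified by splitting the first dimension of `hidden` into separate layer and direction axes. This is a sketch on random input with toy sizes; `N_LAYERS`, `H` and `B` here are illustrative values, not the notebook's hyperparameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N_LAYERS, H, B = 2, 3, 2   # toy sizes
rnn = nn.LSTM(input_size=4, hidden_size=H, num_layers=N_LAYERS, bidirectional=True)
output, (hidden, cell) = rnn(torch.randn(5, B, 4))

print(hidden.shape)  # torch.Size([4, 2, 3]) -> [num layers * num directions, batch, hid dim]

# Split the first dimension into (layer, direction).
by_layer = hidden.reshape(N_LAYERS, 2, B, H)
top_forward, top_backward = by_layer[-1, 0], by_layer[-1, 1]
print(torch.equal(top_forward, hidden[-2]))   # True
print(torch.equal(top_backward, hidden[-1]))  # True

# `output` only exposes the top layer, so it agrees with these two slices.
print(torch.allclose(output[-1, :, :H], hidden[-2]))  # True
print(torch.allclose(output[0, :, H:], hidden[-1]))   # True
```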

#### Vocabulary


In [0]:
corpus = {} # doc_label: {word: word_freq_df}
df = {}
for instance in train_data.examples:
  for word in instance.text:
    word = word.lower()
    df[word] = df.get(word,0)+1
    corpus[instance.label] = corpus.get(instance.label, {}) # {word: df}
    corpus[instance.label][word] = corpus[instance.label].get(word, 0) + 1


In [0]:
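def compute_tf_idf(word, doc, doc_freq):
  # term frequency: share of this class's tokens that are `word`
  tf = doc_freq[doc][word]/sum(doc_freq[doc].values())
  # document frequency: number of classes whose reviews contain `word`
  df = len([d for d in doc_freq.keys() if word in doc_freq[d]])
  # inverse document frequency (no log); len(doc_freq) is the number of classes
  idf = len(doc_freq)/df
  return tf * idf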
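As a toy check of this class-level TF-IDF idea, the sketch below treats each class label as one "document" and scores words by term frequency times an un-logged inverse document frequency. The function name `tf_idf`, the `counts` dictionary and its numbers are made up for illustration.

```python
def tf_idf(word, label, counts):
    """counts maps each class label to a {word: count} dictionary."""
    tf = counts[label][word] / sum(counts[label].values())
    df = sum(1 for words in counts.values() if word in words)
    idf = len(counts) / df
    return tf * idf

counts = {
    "pos": {"great": 3, "movie": 5},
    "neg": {"awful": 2, "movie": 6},
}

# "great" only occurs in one class, so its idf is 2 / 1
print(tf_idf("great", "pos", counts))  # 0.75
# "movie" occurs in both classes, so its idf is 2 / 2
print(tf_idf("movie", "pos", counts))  # 0.625
```

Words that appear in only one class are boosted relative to words shared by both, which is the property the vocabulary selection relies on.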

In [14]:
import math

tf_idf = {} # {doc(label): {word:score}}
N = len(corpus)
print(N, "classes")
for doc in corpus.keys():
  tf_idf[doc] = tf_idf.get(doc,{})
  for word in corpus[doc]:
    word = word.lower()
    tf_idf[doc][word] = compute_tf_idf(word, doc, corpus)

2 classes


In [0]:
sorted_words = {doc: [key for key in sorted(tf_idf[doc].keys(), key=tf_idf[doc].get, reverse=True)] for doc in tf_idf.keys()} 

In [139]:
tf_idf['pos']['yes']

0.00019762163403552114

In [17]:
max(tf_idf['neg'].values())

0.04741636929535363

In [18]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [0]:
from nltk.corpus import stopwords
import string
stop_words = set(stopwords.words('english'))

In [0]:
def get_vocab(sentiment):
  ignore_set = set(string.punctuation).union(set('0123456789'))
  return [word for word in sorted_words[sentiment] if word not in stop_words and not set(word).intersection(ignore_set)]

In [0]:
neg_vocab = get_vocab('neg')
pos_vocab = get_vocab('pos')

In [142]:
analyser.polarity_scores("yes")

{'compound': 0.4019, 'neg': 0.0, 'neu': 0.0, 'pos': 1.0}

In [22]:
from nltk import sentiment



In [23]:
pip install vaderSentiment



In [0]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

In [0]:
n_vocab = [word for word in neg_vocab if analyser.polarity_scores(word)['neu'] != 1]
p_vocab = [word for word in pos_vocab if analyser.polarity_scores(word)['neu'] != 1]

True

In [0]:
VOCAB_LIST = list(set(n_vocab[:100]+p_vocab[:100]))

1

In [27]:
len(VOCAB_LIST)

136

In [0]:
VOCAB = [TEXT.vocab.stoi[word] for word in VOCAB_LIST]

In [0]:
a=['like',
 'good',
 'bad',
 'well',
 'better',
 'great',
 'worst',
 'funny',
 'want',
 'love',
 'pretty',
 'best',
 'horror',
 'original',
 'interesting',
 'kind',
 'awful',
 'comedy',
 'boring',
 'terrible',
 'poor',
 'hard',
 'stupid',
 'sure',
 'waste',
 'low',
 'worse',
 'wrong',
 'special',
 'dead',
 'played',
 'worth',
 'horrible',
 'play',
 'fun',
 'problem',
 'death',
 'help',
 'nice',
 'killer',
 'fan',
 'plays',
 'unfortunately',
 'true',
 'care',
 'crap',
 'friends',
 'ridiculous',
 'annoying',
 'top',
 'evil',
 'laugh',
 'playing',
 'save',
 'kill',
 'god',
 'lost',
 'yes',
 'truly',
 'war',
 'lack',
 'ok',
 'killed',
 'dull',
 'hand',
 'seriously',
 'recommend',
 'enjoy',
 'lame',
 'hope',
 '\x96',
 'hell',
 'certainly',
 'cut',
 'friend',
 'please',
 'stop',
 'jokes',
 'liked',
 'avoid',
 'scary',
 'beautiful',
 'leave',
 'badly',
 'violence',
 'b',
 'humor',
 'silly',
 'fight',
 'hero',
 'entertaining',
 'sorry',
 'huge',
 'interest',
 'alone',
 'disappointed',
 'weak',
 'matter',
 'feeling',
 'definitely',
 'like',
 'good',
 'great',
 'well',
 'love',
 'best',
 'better',
 'funny',
 'bad',
 'comedy',
 'fun',
 'excellent',
 'beautiful',
 'played',
 'pretty',
 'want',
 'horror',
 'interesting',
 'true',
 'plays',
 'wonderful',
 'original',
 'hard',
 'sure',
 'play',
 'kind',
 'perfect',
 'war',
 'worth',
 'enjoy',
 'nice',
 'definitely',
 'amazing',
 'loved',
 'top',
 'help',
 'truly',
 'special',
 'recommend',
 'favorite',
 'friends',
 'fan',
 'death',
 'playing',
 'brilliant',
 'liked',
 'enjoyed',
 'fine',
 'friend',
 'strong',
 'lost',
 'hope',
 'certainly',
 'entertaining',
 'dead',
 'humor',
 'evil',
 'yes',
 'fantastic',
 'supporting',
 '\x96',
 'wrong',
 'murder',
 'low',
 'chance',
 'greatest',
 'feeling',
 'hand',
 'happy',
 'romantic',
 'laugh',
 'hilarious',
 'matter',
 'number',
 'fight',
 'important',
 'enjoyable',
 'miss',
 'superb',
 'novel',
 'leave',
 'killer',
 'care',
 'wish',
 'hero',
 'violence',
 'problem',
 'perfectly',
 'thriller',
 'serious',
 'easy',
 'strange',
 'easily',
 'surprised',
 'alone',
 'sad',
 'powerful',
 'stop',
 'crime',
 'interest']

In [30]:
str([ f"{len(a)-i} {word}" for i, word in enumerate(a)]).replace("'","").replace(",","")

'[200 like 199 good 198 bad 197 well 196 better 195 great 194 worst 193 funny 192 want 191 love 190 pretty 189 best 188 horror 187 original 186 interesting 185 kind 184 awful 183 comedy 182 boring 181 terrible 180 poor 179 hard 178 stupid 177 sure 176 waste 175 low 174 worse 173 wrong 172 special 171 dead 170 played 169 worth 168 horrible 167 play 166 fun 165 problem 164 death 163 help 162 nice 161 killer 160 fan 159 plays 158 unfortunately 157 true 156 care 155 crap 154 friends 153 ridiculous 152 annoying 151 top 150 evil 149 laugh 148 playing 147 save 146 kill 145 god 144 lost 143 yes 142 truly 141 war 140 lack 139 ok 138 killed 137 dull 136 hand 135 seriously 134 recommend 133 enjoy 132 lame 131 hope 130 \\x96 129 hell 128 certainly 127 cut 126 friend 125 please 124 stop 123 jokes 122 liked 121 avoid 120 scary 119 beautiful 118 leave 117 badly 116 violence 115 b 114 humor 113 silly 112 fight 111 hero 110 entertaining 109 sorry 108 huge 107 interest 106 alone 105 disappointed 104 wea

#### Model

In [0]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx, explanation_vocab):
        
        super().__init__()
        self.emb_dim = embedding_dim
        
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        self.rnn = nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=bidirectional, 
                           dropout=dropout)
        self.gen = nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=bidirectional, 
                           dropout=dropout)

        self.fc = nn.Linear(hidden_dim * 2, output_dim)

        # output size must match the input size of gen_lin below (2 * hidden_dim)
        self.lin = nn.Linear(self.emb_dim, hidden_dim * 2)

        self.expl_emb = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        # self.gen = CNN(embedding_dim, 100, [3,4,5],1)
        self.gen_lin = nn.Linear(2*hidden_dim, len(explanation_vocab))
        self.gen_softmax = nn.Softmax(2)      
        self.explanations = torch.tensor(explanation_vocab, device=self.device)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text, text_lengths):
        batch_size = text.size()[1]
        
        #text = [sent len, batch size]
        
        embedded = self.dropout(self.embedding(text))
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)

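        ##GEN
        # earlier variant, kept for reference: run a second LSTM over the embeddings
        # expl_activ, (_, _) = self.gen(embedded)   # [sent len, batch, 2*hidden_dim]
        # expl_activ = nn.Dropout(0.4)(expl_activ)

        # project the embeddings to the input size expected by gen_lin
        expl_activ = self.lin(embedded)
        # scores over the explanation vocabulary: [sent, batch, expl vocab]
        lin_activ = self.gen_lin(expl_activ)
        # note: the result of this dropout is not used below;
        # the softmax is taken over lin_activ directly
        expl_activ = nn.Dropout(0.2)(lin_activ)
        # [sent, batch, expl vocab]
        expl_distribution = self.gen_softmax(lin_activ)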
        
        # [batch, sent, vocab]
        expl_distribution = torch.transpose(expl_distribution, 0, 1)

        # [vocab_size, emb_dim]
        v_emb = self.expl_emb(self.explanations)

        #[batch, vocab_size, emb_dim]
        vocab_emb = v_emb.repeat(batch_size,1,1)

        
        # [batch,sent, emb_dim]
        expl = torch.bmm(expl_distribution, vocab_emb)

        #[batch, 1, emb_dim]
        context_vector = torch.mean(expl, dim=1).unsqueeze(1)

        sep = torch.rand((batch_size,1,self.emb_dim), device=self.device)
        # [batch, 1+1, emb_dim]
        sep_vocab = torch.cat((sep, context_vector),1)

        #[batch, sent, emb]
        x = torch.transpose(embedded,0,1)

        # [batch, sent_len+2, emb_dim]
        concat_input = torch.cat((x,sep_vocab),1) 

        #[sent_len+2, batch, emb_dim]
        final_input = torch.transpose(concat_input,0,1)
        
        packed_all = nn.utils.rnn.pack_padded_sequence(final_input, text_lengths+2)
        
        packed_output, (hidden, cell) = self.rnn(packed_all)
        
        #unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)

        #output = [sent len, batch size, hid dim * num directions]
        #output over padding tokens are zero tensors
        
        #hidden = [num layers * num directions, batch size, hid dim]
        #cell = [num layers * num directions, batch size, hid dim]
        
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
        #and apply dropout
        
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
                
        #hidden = [batch size, hid dim * num directions]
            
        return self.fc(hidden), expl_distribution
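Reading the `forward` method above, the explanation branch appears to turn per-token scores into a probability distribution over the explanation vocabulary and then take a probability-weighted average of the explanation embeddings. A minimal sketch of just that step follows, with made-up toy sizes and random tensors standing in for the learned layers (`scores` for the output of `gen_lin`, `expl_vectors` for `expl_emb(explanations)`).

```python
import torch

torch.manual_seed(0)

batch, sent_len, n_expl, emb_dim = 2, 4, 3, 5      # toy sizes
scores = torch.randn(sent_len, batch, n_expl)      # stand-in for gen_lin output
expl_vectors = torch.randn(n_expl, emb_dim)        # stand-in for expl_emb(explanations)

# distribution over explanation words for every token: [batch, sent len, n_expl]
dist = torch.softmax(scores, dim=2).transpose(0, 1)

# expected explanation embedding per token: [batch, sent len, emb dim]
mixed = torch.bmm(dist, expl_vectors.repeat(batch, 1, 1))

# one context vector per sentence: [batch, 1, emb dim]
context = mixed.mean(dim=1, keepdim=True)

print(torch.allclose(dist.sum(dim=2), torch.ones(batch, sent_len)))  # True
print(torch.allclose(mixed, dist @ expl_vectors))                    # True
print(context.shape)                                                 # torch.Size([2, 1, 5])
```

The second check shows that broadcasting with `@` gives the same result as repeating the embedding matrix for `torch.bmm`, which avoids materializing one copy per batch element.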

Like before, we'll create an instance of our RNN class, with the new parameters and arguments for the number of layers, bidirectionality and dropout probability.

To ensure the pre-trained vectors can be loaded into the model, the `EMBEDDING_DIM` must be equal to that of the pre-trained GloVe vectors loaded earlier.

We get our pad token index from the vocabulary, getting the actual string representing the pad token from the field's `pad_token` attribute, which is `<pad>` by default.

In [32]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 300
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = RNN(INPUT_DIM, 
            EMBEDDING_DIM, 
            HIDDEN_DIM, 
            OUTPUT_DIM, 
            N_LAYERS, 
            BIDIRECTIONAL, 
            DROPOUT, 
            PAD_IDX,
            VOCAB)
print(len(VOCAB))

136


We'll print out the number of parameters in our model. 

Notice how we have almost twice as many parameters as before!

In [33]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 20,665,081 trainable parameters


The final addition is copying the pre-trained word embeddings we loaded earlier into the `embedding` layer of our model.

We retrieve the embeddings from the field's vocab, and check they're the correct size, _**[vocab size, embedding dim]**_ 

In [34]:
pretrained_embeddings = TEXT.vocab.vectors

print(pretrained_embeddings.shape)

torch.Size([25002, 300])


We then replace the initial weights of the `embedding` layer with the pre-trained embeddings.

**Note**: this should always be done on the `weight.data` and not the `weight`!

In [35]:
model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[-0.1117, -0.4966,  0.1631,  ..., -1.4447,  0.8402, -0.8668],
        [ 0.1032, -1.6268,  0.5729,  ...,  0.3180, -0.1626, -0.0417],
        [ 0.0466,  0.2132, -0.0074,  ...,  0.0091, -0.2099,  0.0539],
        ...,
        [-0.2064, -0.3348, -1.0158,  ..., -1.3281, -0.5336,  1.0823],
        [ 0.9746,  0.5796, -1.6229,  ...,  0.2855,  1.7823, -0.0436],
        [ 0.4945, -1.1232, -0.0642,  ..., -0.3861, -1.4883, -0.5126]])

As our `<unk>` and `<pad>` tokens aren't in the pre-trained vocabulary, they have been initialized using `unk_init` (an $\mathcal{N}(0,1)$ distribution) when building our vocab. It is preferable to initialize them both to all zeros to explicitly tell our model that, initially, they are irrelevant for determining sentiment.

We do this by manually setting their row in the embedding weights matrix to zeros. We get their row by finding the index of the tokens, which we have already done for the padding index.

**Note**: like initializing the embeddings, this should be done on the `weight.data` and not the `weight`!

In [36]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

print(model.embedding.weight.data)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0466,  0.2132, -0.0074,  ...,  0.0091, -0.2099,  0.0539],
        ...,
        [-0.2064, -0.3348, -1.0158,  ..., -1.3281, -0.5336,  1.0823],
        [ 0.9746,  0.5796, -1.6229,  ...,  0.2855,  1.7823, -0.0436],
        [ 0.4945, -1.1232, -0.0642,  ..., -0.3861, -1.4883, -0.5126]])


We can now see the first two rows of the embedding weights matrix have been set to zeros. As we passed the index of the pad token to the `padding_idx` of the embedding layer, it will remain zeros throughout training; the `<unk>` token embedding, however, will be learned.

## Train the Model

#### Train prep


Now to training the model.

The only change we'll make here is changing the optimizer from `SGD` to `Adam`. SGD updates all parameters with the same learning rate and choosing this learning rate can be tricky. `Adam` adapts the learning rate for each parameter, giving parameters that are updated more frequently lower learning rates and parameters that are updated infrequently higher learning rates. More information about `Adam` (and other optimizers) can be found [here](http://ruder.io/optimizing-gradient-descent/index.html).

To change `SGD` to `Adam`, we simply change `optim.SGD` to `optim.Adam`. Also note how we do not have to provide an initial learning rate for Adam, as PyTorch specifies a sensible default initial learning rate.

In [0]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())
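As a rough illustration of the per-parameter scaling described above, the sketch below takes a single optimizer step on two parameters whose gradients differ by four orders of magnitude. The helper `first_step`, the gradient values and the learning rate are all made up for this example.

```python
import torch
import torch.optim as optim

def first_step(optimizer_cls, **kwargs):
    """Take one optimizer step from w = 0 with gradient [100.0, 0.01]."""
    w = torch.zeros(2, requires_grad=True)
    optimizer = optimizer_cls([w], **kwargs)
    loss = (torch.tensor([100.0, 0.01]) * w).sum()  # d(loss)/dw = [100.0, 0.01]
    loss.backward()
    optimizer.step()
    return w.detach()

sgd_w = first_step(optim.SGD, lr=1e-3)
adam_w = first_step(optim.Adam, lr=1e-3)

print(sgd_w)   # step sizes proportional to the gradients: about [-1e-1, -1e-5]
print(adam_w)  # both steps have nearly the same size: about [-1e-3, -1e-3]
```

With SGD the update is the learning rate times the raw gradient, so the two parameters move by very different amounts; Adam normalizes each parameter's update by a running estimate of its gradient magnitude.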

The rest of the steps for training the model are unchanged.

We define the criterion and place the model and criterion on the GPU (if available)...

In [0]:
criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

We implement the function to calculate accuracy...

In [0]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc
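To see the steps of the accuracy computation on concrete numbers, here is a small worked example; the logits and labels are invented for illustration.

```python
import torch

logits = torch.tensor([2.0, -1.0, 0.5, -0.3])   # raw model outputs (made up)
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])

probs = torch.sigmoid(logits)                   # squash to (0, 1)
rounded = torch.round(probs)                    # tensor([1., 0., 1., 0.])
accuracy = (rounded == labels).float().mean()   # 3 of 4 predictions match
print(accuracy.item())                          # 0.75

# Rounding the sigmoid is equivalent to thresholding the logit at 0.
print(torch.equal(rounded, (logits > 0).float()))  # True
```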

We define a function for training our model. 

As we have set `include_lengths = True`, our `batch.text` is now a tuple with the first element being the numericalized tensor and the second element being the actual lengths of each sequence. We separate these into their own variables, `text` and `text_lengths`, before passing them to the model.

**Note**: as we are now using dropout, we must remember to use `model.train()` to ensure the dropout is "turned on" while training.

In [0]:
import tqdm

def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in tqdm.tqdm(iterator):
        
        optimizer.zero_grad()
        
        text, text_lengths = batch.text
        
        predictions, expl = model(text, text_lengths)
        predictions = predictions.squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Then we define a function for testing our model, again remembering to separate `batch.text`.

**Note**: as we are now using dropout, we must remember to use `model.eval()` to ensure the dropout is "turned off" while evaluating.

In [0]:
def evaluate(model, iterator, criterion, filename="eval.expl"):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
        with open(filename, "w") as f:
            for batch in iterator:
                text, text_lengths = batch.text
                
                predictions, expl = model(text, text_lengths)
                predictions = predictions.squeeze(1)
                for i in range(predictions.size()[0]):
                  if torch.sigmoid(predictions[i]).round()!=batch.label[i]:
                    print(" ".join([TEXT.vocab.itos[t] for t in text[:,i]]), file=f)
                    print(torch.sigmoid(predictions[i]), file=f)
                    print(batch.label[i], file=f)
                    print(decode_expl(expl[i]), file=f)
                    print("**", file=f)

                
                loss = criterion(predictions, batch.label)
                
                acc = binary_accuracy(predictions, batch.label)

                epoch_loss += loss.item()
                epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
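The `evaluate` function above calls `decode_expl`, which is not defined anywhere in this part of the notebook. The following is a purely hypothetical stand-in, assuming the helper should turn one example's `[sent len, explanation vocab]` distribution into its most probable explanation words; the signature, the `expl_words` list and the `top_k` parameter are assumptions made for this sketch, not the author's implementation.

```python
import torch

def decode_expl(distribution, expl_words, top_k=3):
    """Hypothetical helper: return the top_k explanation words for one example.

    distribution: tensor [sent len, n_expl] of probabilities
    expl_words:   list of n_expl strings, aligned with the last dimension
    """
    mean_probs = distribution.mean(dim=0)                 # average over tokens
    k = min(top_k, len(expl_words))
    top_indices = torch.topk(mean_probs, k=k).indices     # most probable first
    return [expl_words[i] for i in top_indices.tolist()]

# toy distribution over three explanation words for a two-token sentence
toy_distribution = torch.tensor([[0.7, 0.2, 0.1],
                                 [0.5, 0.1, 0.4]])
print(decode_expl(toy_distribution, ['yes', 'dead', 'brilliant'], top_k=2))
# ['yes', 'brilliant']
```

To use something like this inside `evaluate`, the word list would have to be passed in or closed over (for example `VOCAB_LIST`), since the call in the loop supplies only the distribution.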

And also create a nice function to tell us how long our epochs are taking.

In [0]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

Finally, we train our model...

#### Train

In [115]:
import os
print( os.getcwd() )
print( os.listdir(os.getcwd()) )

/content
['.config', 'eval-3.expl', '.data', 'eval-1.expl', 'eval-0.expl', 'tut2-model-dr.pt', '.vector_cache', 'eval-2.expl', 'eval.expl', 'eval-4.expl', 'sample_data']


In [0]:
from google.colab import files
files.download( "eval.expl" ) 

In [0]:
# pip install tqdm

In [113]:
evaluate(model, test_iterator, criterion)

KeyboardInterrupt: ignored

In [107]:
import tqdm
N_EPOCHS = 15
tr_loss = []
val_loss= []
train_accs, val_accs = [], []
best_valid_loss = float('inf')
prev=123
cont=True
patience=3
for epoch in  tqdm.tqdm(range(N_EPOCHS)):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion, f"eval-{epoch}.expl")

    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut2-model-dr.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
    train_accs.append(train_acc)
    val_accs.append(valid_acc)
    tr_loss.append(train_loss)
    val_loss.append(valid_loss)
    if valid_loss > prev:
      if cont:
        patience -=1
      else:
        cont=True
    else:
      cont = False
    if patience == 0:
      break
    prev = valid_loss












yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
dead
yes
brilliant
yes
dead
brilliant
yes
easily
dead
dead
yes
brilliant
yes
dead
easily
yes
dead
brilliant
yes
dead
easily
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
dead
yes
brilliant
yes
dead
brilliant
dead
yes
brilliant
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
easily
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
easily
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead











  7%|▋         | 1/15 [00:59<13:50, 59.30s/it][A[A[A[A[A[A[A[A[A[A










  0%|          | 0/274 [00:00<?, ?it/s][A[A[A[A[A[A[A[A[A[A[A










  0%|          | 1/274 [00:00<00:40,  6.70it/s][A[A[A[A[A[A[A[A[A[A[A

Epoch: 01 | Epoch Time: 0m 59s
	Train Loss: 0.060 | Train Acc: 97.91%
	 Val. Loss: 0.397 |  Val. Acc: 89.01%













  1%|          | 2/274 [00:00<00:47,  5.76it/s][A[A[A[A[A[A[A[A[A[A[A










  1%|          | 3/274 [00:00<00:43,  6.26it/s][A[A[A[A[A[A[A[A[A[A[A










  1%|▏         | 4/274 [00:00<00:42,  6.30it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 5/274 [00:00<00:40,  6.69it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 7/274 [00:01<00:44,  6.05it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 8/274 [00:01<00:40,  6.54it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 9/274 [00:01<00:52,  5.06it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▎         | 10/274 [00:01<00:53,  4.92it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▍         | 12/274 [00:01<00:42,  6.22it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▍         | 13/274 [00:02<00:39,  6.67it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▌         | 14/274 [00:02<00:36,  7.17it/s][A[A[A[A[A[A[A[A[A

easily
brilliant
dead
easily
dead
brilliant
easily
brilliant
dead
easily
dead
brilliant
easily
brilliant
dead
easily
yes
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
dead
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
bril











 13%|█▎        | 2/15 [01:59<12:52, 59.44s/it][A[A[A[A[A[A[A[A[A[A










  0%|          | 0/274 [00:00<?, ?it/s][A[A[A[A[A[A[A[A[A[A[A

easily
brilliant
dead
easily
brilliant
dead
Epoch: 02 | Epoch Time: 0m 59s
	Train Loss: 0.053 | Train Acc: 98.30%
	 Val. Loss: 0.417 |  Val. Acc: 88.76%













  0%|          | 1/274 [00:00<00:39,  6.96it/s][A[A[A[A[A[A[A[A[A[A[A










  1%|          | 3/274 [00:00<00:37,  7.28it/s][A[A[A[A[A[A[A[A[A[A[A










  1%|▏         | 4/274 [00:00<00:35,  7.65it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 5/274 [00:00<00:33,  7.99it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 6/274 [00:00<00:41,  6.38it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 7/274 [00:01<00:51,  5.20it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 8/274 [00:01<00:46,  5.69it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 9/274 [00:01<00:47,  5.64it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▍         | 11/274 [00:01<00:39,  6.67it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▍         | 13/274 [00:01<00:35,  7.33it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▌         | 15/274 [00:01<00:31,  8.29it/s][A[A[A[A[A[A[A[A[A[

easily
dead
brilliant
easily
brilliant
dead
easily
dead
brilliant
easily
dead
yes
easily
yes
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
dead
brilliant
easily
dead
brilliant
easily
brilliant
dead
easily
dead
brilliant
easily
dead
brilliant
easily
brilliant
dead
easily
dead
brilliant
easily
dead
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
yes
easily
brilliant
dead
easily
brilliant
dead
easily
dead
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
dead
brilliant
easily
brilliant
yes
easily
brilliant
yes
easily
dead
brilliant
easily
brilliant
powerful
easily
dead
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
dead
powerful
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
yes
easily
dead
brilliant
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
brilliant
dead
easily
dead
brilliant
easily
brilliant
d











 20%|██        | 3/15 [02:58<11:52, 59.40s/it][A[A[A[A[A[A[A[A[A[A










  0%|          | 0/274 [00:00<?, ?it/s][A[A[A[A[A[A[A[A[A[A[A










  0%|          | 1/274 [00:00<00:37,  7.22it/s][A[A[A[A[A[A[A[A[A[A[A

Epoch: 03 | Epoch Time: 0m 59s
	Train Loss: 0.041 | Train Acc: 98.63%
	 Val. Loss: 0.435 |  Val. Acc: 88.84%













  1%|          | 3/274 [00:00<00:31,  8.61it/s][A[A[A[A[A[A[A[A[A[A[A










  1%|▏         | 4/274 [00:00<00:31,  8.56it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 5/274 [00:00<00:35,  7.55it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 6/274 [00:00<01:00,  4.43it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 7/274 [00:01<01:05,  4.09it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 8/274 [00:01<00:55,  4.83it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 9/274 [00:01<00:46,  5.67it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▎         | 10/274 [00:01<00:56,  4.68it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▍         | 12/274 [00:01<00:45,  5.71it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▌         | 14/274 [00:02<00:37,  6.97it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▌         | 15/274 [00:02<00:41,  6.21it/s][A[A[A[A[A[A[A[A[A

yes
powerful
brilliant
yes
dead
powerful
yes
brilliant
powerful
yes
brilliant
dead
yes
easily
brilliant
yes
powerful
brilliant
yes
brilliant
dead
yes
brilliant
powerful
yes
powerful
brilliant
yes
brilliant
powerful
yes
powerful
dead
yes
brilliant
dead
yes
brilliant
powerful
yes
brilliant
dead
yes
brilliant
powerful
yes
brilliant
powerful
yes
brilliant
powerful
yes
brilliant
powerful
yes
powerful
brilliant
yes
dead
brilliant
yes
brilliant
powerful
yes
brilliant
powerful
yes
dead
brilliant
yes
brilliant
powerful
yes
brilliant
powerful
yes
dead
powerful
yes
powerful
brilliant
yes
brilliant
powerful
yes
powerful
brilliant
yes
brilliant
powerful
yes
brilliant
powerful
yes
brilliant
powerful
yes
powerful
brilliant
yes
powerful
brilliant
yes
brilliant
powerful
yes
powerful
brilliant
yes
brilliant
powerful
yes
brilliant
dead
yes
brilliant
powerful
yes
brilliant
powerful
yes
brilliant
dead
yes
powerful
brilliant
yes
brilliant
powerful
yes
brilliant
powerful
yes
brilliant
powerful
yes
brilliant












 27%|██▋       | 4/15 [03:57<10:53, 59.43s/it][A[A[A[A[A[A[A[A[A[A










  0%|          | 0/274 [00:00<?, ?it/s][A[A[A[A[A[A[A[A[A[A[A










  0%|          | 1/274 [00:00<00:43,  6.34it/s][A[A[A[A[A[A[A[A[A[A[A

yes
powerful
brilliant
Epoch: 04 | Epoch Time: 0m 59s
	Train Loss: 0.039 | Train Acc: 98.66%
	 Val. Loss: 0.451 |  Val. Acc: 88.91%













  1%|          | 3/274 [00:00<00:40,  6.69it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 5/274 [00:00<00:39,  6.87it/s][A[A[A[A[A[A[A[A[A[A[A










  2%|▏         | 6/274 [00:00<00:35,  7.48it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 7/274 [00:00<00:34,  7.83it/s][A[A[A[A[A[A[A[A[A[A[A










  3%|▎         | 9/274 [00:01<00:29,  8.94it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▎         | 10/274 [00:01<00:33,  7.83it/s][A[A[A[A[A[A[A[A[A[A[A










  4%|▍         | 11/274 [00:01<00:45,  5.74it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▍         | 13/274 [00:01<00:39,  6.56it/s][A[A[A[A[A[A[A[A[A[A[A










  5%|▌         | 15/274 [00:01<00:32,  7.86it/s][A[A[A[A[A[A[A[A[A[A[A










  6%|▌         | 17/274 [00:01<00:28,  9.02it/s][A[A[A[A[A[A[A[A[A[A[A










  7%|▋         | 19/274 [00:02<00:26,  9.69it/s][A[A[A[A[A[A[A[A[

yes
dead
powerful
dead
yes
brilliant
yes
dead
brilliant
dead
yes
brilliant
yes
easily
dead
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
dead
yes
brilliant
dead
yes
brilliant
brilliant
dead
yes
yes
dead
brilliant
dead
yes
brilliant
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
brilliant
dead
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
yes
dead
brilliant
dead
yes
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
brilliant
dead
yes
brilliant
dead
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
dead
yes
brilliant
yes
brilliant
dead
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brilliant
yes
dead
brillia

...and get our new and vastly improved test accuracy!

In [0]:
model.load_state_dict(torch.load('tut2-model-dr.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')
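
The training loop above stops early with a hand-rolled patience counter that is never reset. For comparison, here is a minimal sketch of the more common variant, which resets the counter whenever the validation loss reaches a new best. The function name `early_stopping_epoch` is hypothetical and not part of this notebook; the losses are the validation losses printed for the first four epochs.

```python
def early_stopping_epoch(valid_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: stops after `patience` epochs in a row without
    a new best validation loss.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0       # improvement: reset the counter
        else:
            bad_epochs += 1      # no improvement this epoch
            if bad_epochs >= patience:
                return epoch
    return None


# Validation losses reported for the first four epochs above.
losses = [0.397, 0.417, 0.435, 0.451]
print(early_stopping_epoch(losses, patience=3))  # prints 4
```

With these losses the best epoch is the first one, so this rule would stop after epoch 4. The checkpoint saved on the best validation loss is the one loaded for testing either way.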

In [0]:
pip install matplotlib

In [0]:
import matplotlib.pyplot as plt

# Plot the training and validation loss recorded for each epoch.
plt.plot(tr_loss, 'k', label='Train loss')
plt.plot(val_loss, 'c', label='Val loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(loc='upper center', shadow=True, fontsize='x-large')
plt.show()

In [0]:
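# Plot the training and validation accuracy recorded for each epoch.
plt.plot(train_accs, 'k', label='Train acc')
plt.plot(val_accs, 'k:', label='Val acc')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='upper center', shadow=True, fontsize='x-large')
plt.show()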

## User Input

We can now use our model to predict the sentiment of any sentence we give it. As it has been trained on movie reviews, the sentences provided should also be movie reviews.

When using a model for inference it should always be in evaluation mode. If this tutorial is followed step-by-step, the model will already be in evaluation mode (from running `evaluate` on the test set), but we set it explicitly to avoid any risk.

Our `predict_sentiment` function does a few things:
- sets the model to evaluation mode
- tokenizes the sentence, i.e. splits it from a raw string into a list of tokens
- indexes the tokens by converting them into their integer representation from our vocabulary
- gets the length of our sequence
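- converts the indexes, which are a Python list, into a PyTorch tensor
- adds a batch dimension by `unsqueeze`ing
- converts the length into a tensor
- runs the model, which returns a prediction and an explanation tensor, and prints the three top-ranked explanation words via `decode_expl`
- squashes the output prediction from an unbounded real number to a value between 0 and 1 with the `sigmoid` function
- converts the tensor holding a single value into a Python float with the `item()` method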

We are expecting reviews with a negative sentiment to return a value close to 0 and positive reviews to return a value close to 1.
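
The indexing and squashing steps can be tried without the model. The sketch below uses a hypothetical toy vocabulary in place of `TEXT.vocab.stoi` and a plain-Python sigmoid in place of `torch.sigmoid`. It assumes, as in the torchtext version used here, that unknown tokens fall back to index 0 (`<unk>`).

```python
import math
from collections import defaultdict

# Hypothetical toy vocabulary. Any token that is missing maps to 0,
# the index assumed here for <unk>.
stoi = defaultdict(int, {"<unk>": 0, "<pad>": 1, "this": 2, "film": 3, "is": 4, "great": 5})


def index_tokens(tokens):
    """Convert a list of string tokens into their integer indexes."""
    return [stoi[t] for t in tokens]


def sigmoid(x):
    """Squash an unbounded real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


tokens = "this film is superb".split()  # 'superb' is out of vocabulary
indexed = index_tokens(tokens)
length = [len(indexed)]

print(indexed)       # prints [2, 3, 4, 0]
print(length)        # prints [4]
print(sigmoid(0.0))  # prints 0.5
```

A strongly negative raw output is squashed towards 0 and a strongly positive one towards 1, which is why the predictions below sit so close to either end of the range.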

In [0]:
import spacy

# 'en' is the shortcut name used by the spaCy version this notebook ran on;
# newer spaCy releases expect a full model name such as 'en_core_web_sm'.
nlp = spacy.load('en')

def decode_expl(expl):
    # Average the explanation scores over the first dimension, then return
    # the three vocabulary words with the highest mean score.
    expl = torch.mean(expl, 0)
    tops = torch.argsort(expl, 0, True)  # True = descending order
    return [TEXT.vocab.itos[VOCAB[tops[i]]] for i in range(3)]

def predict_sentiment(model, sentence):
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    length = [len(indexed)]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)  # shape: [sentence length, batch size = 1]
    length_tensor = torch.LongTensor(length)
    pred, expl = model(tensor, length_tensor)
    # Print the top explanation words for the single sentence in the batch.
    print(decode_expl(expl[0]))
    prediction = torch.sigmoid(pred)
    return prediction.item()

In [0]:
torch.save(model, "model-sa-2-dr.pt")

In [0]:
model_loaded = torch.load("model-sa-2-dr.pt")

An example negative review...

In [123]:
predict_sentiment(model, "This film is terrible")

['yes', 'brilliant', 'dead']


0.0001698145642876625

An example positive review...

In [122]:
predict_sentiment(model, "glorious robust crucial")

['yes', 'bad', 'sorry']


0.9999889135360718

In [132]:
predict_sentiment(model, "I m going out tonight")

['yes', 'dead', 'easily']


0.9941065311431885

In [119]:
predict_sentiment(model, "A boat builder in a sleepy town in Maine is going out of business , and the lives of all of the ( soon to be <unk> and families are <unk> . The biggest disappointment is that the two stars -- Bates and Bridges -- have only bit <unk> /><br />Interesting , but not something you would see <unk> /><br / > <pad>")

['yes', 'brilliant', 'dead']


0.0001233762304764241

In [120]:
predict_sentiment(model, "This film is great")

['yes', 'dead', 'brilliant']


0.9999399185180664
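
To turn these scores into labels we need a cut-off. The sketch below assumes a threshold of 0.5, which is a common default but is not something this notebook sets. The helper `to_label` is hypothetical, and the scores are copied from the outputs above.

```python
def to_label(score, threshold=0.5):
    """Map a sigmoid score in [0, 1] to a sentiment label (assumed 0.5 cut-off)."""
    return "pos" if score >= threshold else "neg"


# Scores copied from the predict_sentiment outputs above.
scores = {
    "This film is terrible": 0.0001698145642876625,
    "This film is great": 0.9999399185180664,
    "I m going out tonight": 0.9941065311431885,
}

for sentence, score in scores.items():
    print(f"{to_label(score)}  {score:.4f}  {sentence}")
```

Note that the third sentence is not a review and carries no obvious sentiment, yet it is scored as confidently positive. The model only chooses between two classes, so it has no way to express "neutral".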

## Next Steps

We've now built a decent sentiment analysis model for movie reviews! In the next notebook we'll implement a model that gets comparable accuracy with far fewer parameters and trains much, much faster.