# Sentiment Analysis

## Step 1: Load the IMDb movie review dataset, which ships with train and test splits only

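- A key concept in TorchText is the `Field`. A `Field` determines how your data is processed. In this sentiment classification task, the data consists of text strings and two sentiment labels, "pos" or "neg".
- The arguments of a `Field` specify how the data is processed.
- We use a `TEXT` field to define how the movie reviews are processed, and a `LABEL` field to handle the two sentiment classes.
- A `TEXT` field created with `tokenize='spacy'` tokenizes English sentences with the [spaCy](https://spacy.io) tokenizer; the code in this notebook passes `tokenize='moses'` instead. If `tokenize` is not given, the default is to split on whitespace.
- To install spaCy: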
```
pip install -U spacy
python -m spacy download en
```
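- `LABEL` is defined with `LabelField`, a special kind of `Field` for handling labels. The `dtype` argument is explained later.
- For more on `Field`s, see https://github.com/pytorch/text/blob/master/torchtext/data/field.py
- As before, we set random seeds so that the experiment is reproducible.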


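- TorchText supports many common natural language processing datasets.
- The code below automatically downloads the IMDb dataset and splits it into train and test `torchtext.datasets` objects. The data is processed by the `Field`s defined above. The IMDb dataset contains 50,000 movie reviews, each labeled as positive or negative.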
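
To make the effect of the `tokenize` argument concrete, here is a minimal sketch in plain Python. It compares whitespace splitting (the default behavior described above) with a tokenizer that separates punctuation. The function `simple_word_tokenize` is a hypothetical stand-in written for this illustration; it is not the spaCy or Moses tokenizer.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace only, like str.split()."""
    return text.split()

def simple_word_tokenize(text):
    """Hypothetical stand-in for a real tokenizer: words and punctuation
    marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

review = "I loved it, truly!"
print(whitespace_tokenize(review))   # ['I', 'loved', 'it,', 'truly!']
print(simple_word_tokenize(review))  # ['I', 'loved', 'it', ',', 'truly', '!']
```

With whitespace splitting, `'it,'` and `'it'` would be different vocabulary entries, which is one reason a dedicated tokenizer is usually preferred.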

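<font color=red><b>First get familiar with the spaCy library: [spaCy introduction and tutorial](https://juejin.im/post/5971a4b9f265da6c42353332?utm_source=gold_browser_extension%5D)</b></font>  
<font color=red><b>Then get familiar with the torchtext library: [torchtext introduction and tutorial](https://blog.csdn.net/u012436149/article/details/79310176): required reading for beginners, because the code below is hard to follow without it</b></font> 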

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import torch
from torchtext import data

SEED = 1234
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
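torch.backends.cudnn.deterministic = True  # make cuDNN use deterministic algorithms so runs are reproducible (this may cost some speed)

# First create two Field objects; they describe how the text and label data should be preprocessed.
TEXT = data.Field(tokenize='moses')  # torchtext.data.Field defines how a field (text or label) is processed
LABEL = data.LabelField(dtype=torch.float)  # LabelField is a Field subclass specialized for labels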

In [3]:
# This step takes a long time because the dataset is large
from torchtext import datasets
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)  # split 1:1 into train and test

In [4]:
print(vars(train_data.examples[0]))  # the tokenizer has a small bug (note the escaped entities such as '&quot;' in the output), but it runs, so we move on

{'text': ['magellan33', 'said', ':', '&quot;', 'You', 'can', 'only', 'do', 'so', 'much', 'when', 'the', 'two', 'stars', 'of', 'the', 'show', 'can', 'only', 'be', 'seen', 'by', 'one', 'fellow', 'cast', 'member', '.', '&quot;', '&lt;', 'br', '/', '&gt;', '&lt;', 'br', '/', '&gt;', 'I', 'assume', ',', 'then', ',', 'that', 'you', 'never', 'heard', 'of', '&quot;', 'Topper', '&quot;', '.', '&lt;', 'br', '/', '&gt;', '&lt;', 'br', '/', '&gt;', 'Which', ',', 'in', 'addition', 'to', 'the', 'two', 'stars', 'who', 'could', 'only', 'be', 'seen', 'by', 'one', 'member', 'of', 'the', 'cast', ',', 'had', 'a', 'dog', ',', 'ditto', '.', '&lt;', 'br', '/', '&gt;', '&lt;', 'br', '/', '&gt;', 'This', 'was', 'the', 'kind', 'of', 'program', 'that', 'had', '&quot;', 'Not', 'Gonna', 'Make', 'It', '&quot;', 'written', 'allover', 'it', 'from', 'the', 'first', 'episode', '-', 'it', 'was', 'like', 'an', 'arcade', 'video', 'game', 'where', 'you', 'actually', 'have', 'to', 'read', 'the', 'instructions', 'to', 'play'

## Step 2: Split the training set into training and validation sets

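- Since the dataset only comes with train and test splits, we need to create a validation set ourselves. The `.split()` method does this.
- The default split is 70/30. Passing `split_ratio` changes the proportions; for example, `split_ratio=0.8` puts 80% of the data in the training set and 20% in the validation set.
- We also pass `random_state` so that the split is the same on every run.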

In [5]:
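import random
random.seed(SEED)
# random.seed() returns None, so seed first and pass the generator state explicitly; default split_ratio=0.7
train_data, valid_data = train_data.split(random_state=random.getstate())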

In [6]:
print(f'Number of training examples: {len(train_data)}')
print(f'Number of validation examples: {len(valid_data)}')
print(f'Number of testing examples: {len(test_data)}')

Number of training examples: 17500
Number of validation examples: 7500
Number of testing examples: 25000
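
The 17,500 / 7,500 counts above follow directly from a 0.7 split of 25,000 examples. The sketch below reproduces that arithmetic with a seeded shuffle in plain Python. The helper `split_indices` is a hypothetical name for this illustration and is not how torchtext implements `.split()` internally.

```python
import random

def split_indices(n_examples, split_ratio=0.7, seed=1234):
    """Shuffle example indices with a fixed seed, then cut at split_ratio."""
    rng = random.Random(seed)            # private RNG, so global state is untouched
    indices = list(range(n_examples))
    rng.shuffle(indices)
    n_train = int(round(split_ratio * n_examples))
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_indices(25000)
print(len(train_idx), len(valid_idx))    # 17500 7500

# The same seed gives the same split on every call.
assert split_indices(25000) == (train_idx, valid_idx)
```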


## Step 3: Build the vocabulary from the training set, mapping each word to an integer index

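- Next we build the _vocabulary_. The _vocabulary_ is a lookup table that maps each word to an integer index, or equivalently to a one-hot vector.
![](assets/sentiment5.png)
- We build the vocabulary from the 25,000 most frequent words, using the `max_size` argument.
- All other words are represented by `<unk>`.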

In [9]:
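# Extract the vectors for the words in this corpus's vocabulary from a set of pretrained word vectors.
# The pretrained vectors come from GloVe, with 100 dimensions per word, trained on a very large corpus.
# Our movie review corpus is smaller, so the vectors still need updating; GloVe vectors work well as initial values.
# The download is too slow, so this is disabled for now.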
#TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d", unk_init=torch.Tensor.normal_)
TEXT.build_vocab(train_data, max_size=25000)
LABEL.build_vocab(train_data) 

In [10]:
print(f"Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}")
print(f"Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}")

Unique tokens in TEXT vocabulary: 25002
Unique tokens in LABEL vocabulary: 2


In [11]:
print(TEXT.vocab.freqs.most_common(20))

[('the', 201933), (',', 192197), ('.', 187602), ('and', 108789), ('a', 108756), ('of', 100171), ('to', 93102), ('/', 75846), ('is', 74246), ('&gt;', 71179), ('&lt;', 71131), ('br', 71096), ('in', 60895), ('I', 56738), ('it', 53984), ('that', 48950), ('&quot;', 46206), ('&apos;s', 43355), ('this', 42283), ('was', 33294)]


In [12]:
print(TEXT.vocab.stoi)  # the more frequent a word, the smaller its index; the first two entries default to <unk> and <pad>



In [13]:
print(TEXT.vocab.itos[:10])  # first 10 entries of the TEXT vocabulary

['<unk>', '<pad>', 'the', ',', '.', 'and', 'a', 'of', 'to', '/']
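
The vocabulary steps above (count tokens, keep the `max_size` most frequent, reserve `<unk>` and `<pad>`) can be sketched in plain Python. `build_vocab` here is a hypothetical helper with a toy corpus, not the torchtext method of the same name.

```python
from collections import Counter

def build_vocab(tokenized_docs, max_size, specials=("<unk>", "<pad>")):
    """Return (itos, stoi): special tokens first, then the most frequent tokens."""
    freqs = Counter(tok for doc in tokenized_docs for tok in doc)
    itos = list(specials) + [tok for tok, _ in freqs.most_common(max_size)]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi

docs = [
    ["the", "film", "was", "great"],
    ["the", "film", "was", "thin"],
    ["the", "film", "ended"],
    ["the", "end"],
]
itos, stoi = build_vocab(docs, max_size=3)
print(itos)      # ['<unk>', '<pad>', 'the', 'film', 'was']

# Words that did not make the cut fall back to the <unk> index.
unk = stoi["<unk>"]
encoded = [stoi.get(tok, unk) for tok in ["the", "film", "was", "great"]]
print(encoded)   # [2, 3, 4, 0]
```

This also shows why the notebook's vocabulary has 25,002 entries: 25,000 words plus the two special tokens.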


## Step 4: Create iterators; each iteration returns one batch of examples

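- The last data preparation step is to create the iterators. Each iteration returns one batch of examples.
- We use `BucketIterator`, which puts sentences of similar length into the same batch so that each batch needs little padding.
- Strictly speaking, all the models in this notebook share one flaw: `<pad>` tokens are fed to the model as ordinary input. A better approach is to mask out the outputs produced by `<pad>` inside the model. In this lesson we keep things simple and feed `<pad>` in as well. Because there are few `<pad>` tokens, the results are still reasonable.
- If a GPU is available, the tensors returned by each iteration are placed on the GPU (`device`).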

In [14]:
BATCH_SIZE = 64
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Split the examples into batches, with one extra step: examples of similar length are grouped into the same batch, and shorter ones are padded.
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size=BATCH_SIZE,
    device=device)

In [30]:
next(iter(train_iterator)).label.shape

torch.Size([64])

In [18]:
next(iter(train_iterator)).text 

tensor([[   15,   221, 24020,  ...,    15,  4057,   140],
        [  247,    44,    86,  ...,   161,   903,  2722],
        [   20,   136,    22,  ...,   126,     2,     3],
        ...,
        [    1,     1,     1,  ...,     1,     1,     1],
        [    1,     1,     1,  ...,     1,     1,     1],
        [    1,     1,     1,  ...,     1,     1,     1]], device='cuda:0')

In [21]:
# Running this again shows that the sequence length changes from batch to batch
# Each column is one review; 1 is the <pad> index
next(iter(train_iterator)).text.shape

torch.Size([1019, 64])

In [25]:
[TEXT.vocab.itos[i] for i in next(iter(train_iterator)).text[:,0]][:30]

['Oh',
 ',',
 'this',
 'is',
 'such',
 'a',
 'glorious',
 'musical',
 '.',
 'There',
 '&apos;s',
 'a',
 'bit',
 'of',
 'miscasting',
 '--',
 'Frank',
 'Sinatra',
 'is',
 'sorely',
 'miscast',
 'as',
 'the',
 'Jewish',
 'Nathan',
 'Detroit',
 ',',
 'though',
 'it',
 'only']
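
To see why grouping by length reduces padding, the sketch below batches the same toy sequences with and without sorting by length and counts the pad tokens. It is a simplification under stated assumptions: the helper names are hypothetical, rows are batch-major (the notebook's tensors are `[seq len, batch size]`), and the real `BucketIterator` does more than a single global sort.

```python
PAD_IDX = 1  # the index the vocabulary above assigns to <pad>

def pad_batch(seqs, pad_idx=PAD_IDX):
    """Pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_idx] * (max_len - len(s)) for s in seqs]

def make_batches(seqs, batch_size, bucket):
    """Chunk into batches; with bucket=True, sort by length first."""
    if bucket:
        seqs = sorted(seqs, key=len)
    return [pad_batch(seqs[i:i + batch_size]) for i in range(0, len(seqs), batch_size)]

def count_padding(batches, pad_idx=PAD_IDX):
    return sum(row.count(pad_idx) for batch in batches for row in batch)

lengths = [3, 12, 4, 11, 2, 10]
seqs = [[5] * n for n in lengths]   # token id 5 is an arbitrary non-pad id

plain = make_batches(seqs, batch_size=3, bucket=False)
bucketed = make_batches(seqs, batch_size=3, bucket=True)
print(count_padding(plain), count_padding(bucketed))  # 27 6
```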

## Step 5: Build the Word Averaging model

### Word Averaging model

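- We start with a simple Word Averaging model. Each word is projected to a word embedding vector by an `Embedding` layer, and the word vectors of a sentence are averaged to give a single vector for the whole sentence. That sentence vector is then passed to a `Linear` layer for classification.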

![](assets/sentiment8.png)

- We use [`avg_pool2d`](https://pytorch.org/docs/stable/nn.html?highlight=avg_pool2d#torch.nn.functional.avg_pool2d) for the average pooling. The goal is to average the sentence-length dimension down to 1 while keeping the embedding dimension.
- The kernel size passed to `avg_pool2d` is (`embedded.shape[1]`, 1), so the sentence-length dimension is collapsed.

![](assets/sentiment9.png)

![](assets/sentiment10.png)

![](assets/sentiment11.png)
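
The pooling step described above is just a mean over the sentence-length axis. Here is a minimal sketch for a single sentence in plain Python, with made-up 3-dimensional word vectors; `average_word_vectors` is a hypothetical helper, not part of the model code.

```python
def average_word_vectors(word_vectors):
    """Mean over the sentence-length axis: [sent len][emb dim] -> [emb dim]."""
    sent_len = len(word_vectors)
    emb_dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / sent_len for d in range(emb_dim)]

# Four words, each with a made-up 3-dimensional embedding.
sentence = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 2.0],
]
print(average_word_vectors(sentence))  # [2.0, 2.0, 2.0]
```

Note that a plain mean like this counts every position, so padded positions would be included in the average as well.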


In [26]:
import torch.nn as nn
import torch.nn.functional as F

class WordAVGModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, output_dim, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        # output_dim is the output size; 1 is enough for binary classification
        # (batch size, embedding_dim) x (embedding_dim, output_dim) = (batch size, output_dim)
        self.fc = nn.Linear(embedding_dim, output_dim)
        
    def forward(self, text):
        embedded = self.embedding(text) 
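        # text is one batch of token indices, [seq len, batch size]; it is passed in below
        # embedded = [seq len, batch size, emb dim]
        # e.g. [sent len, batch size, emb dim] = (1000, 64, 100)
        # The embedding layer looks up rows of its weight matrix [vocab size, emb dim]; this is equivalent to
        # multiplying one-hot inputs (1000, 64, 25002) by that (25002, 100) matrix, giving (1000, 64, 100)
        
        # swap dimensions 0 and 1: [batch size, sent len, emb dim]
        embedded = embedded.permute(1, 0, 2) 
        # average the sentence-length dimension down to 1, then squeeze it away: [batch size, embedding_dim]
        pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1)
        
        # squeeze to 1-D, otherwise the shape does not match batch.label (output_dim=1)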
        return self.fc(pooled).squeeze(1)

In [27]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
OUTPUT_DIM = 1
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token] 

model = WordAVGModel(INPUT_DIM, EMBEDDING_DIM, OUTPUT_DIM, PAD_IDX)

In [28]:
# Count the trainable parameters (optional)
def count_parameters(model): 
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 2,500,301 trainable parameters
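
The parameter count reported above can be checked by hand. The sketch below assumes the model consists only of the embedding matrix and one linear layer with a bias, as in `WordAVGModel`.

```python
vocab_size, embedding_dim, output_dim = 25002, 100, 1

embedding_params = vocab_size * embedding_dim             # one row per vocabulary entry
linear_params = embedding_dim * output_dim + output_dim   # weights plus bias

total = embedding_params + linear_params
print(f"{total:,}")  # 2,500,301
```

Almost all of the parameters sit in the embedding table; the classifier itself has only 101.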


## Step 6: Initialize parameters

In [None]:
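# Use the word vectors loaded with vectors="glove.6B.100d" above as initial weights (about 25,000 x 100 parameters)
#pretrained_embeddings = TEXT.vocab.vectors
#model.embedding.weight.data.copy_(pretrained_embeddings)  # methods ending in _ modify the tensor in place, so no assignment is needed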

In [29]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]  # UNK_IDX = 0
# The vocabulary has 25,002 entries; the first two, <unk> and <pad>, also need initializing
model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

## Step 7: Train the model

In [31]:
import torch.optim as optim

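optimizer = optim.Adam(model.parameters())  # define the optimizer
criterion = nn.BCEWithLogitsLoss()  # BCE = binary cross-entropy, the loss for binary classification; "with logits" means the loss is computed from the raw logits the model returns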

model = model.to(device)
criterion = criterion.to(device)
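
As a rough illustration of what a "binary cross-entropy with logits" loss computes for a single example, the sketch below compares the textbook formula (sigmoid, then log) with a common numerically stable rearrangement. Both functions are written for this illustration; this is not claimed to be PyTorch's internal implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(logit, target):
    """Textbook form: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def bce_with_logits(logit, target):
    """Algebraically the same loss, rearranged so exp() never overflows."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

for logit, target in [(2.0, 1.0), (-1.5, 1.0), (0.3, 0.0)]:
    assert abs(bce_naive(logit, target) - bce_with_logits(logit, target)) < 1e-9

# The naive form fails here (log(0)); the stable form does not.
print(bce_with_logits(1000.0, 0.0))  # 1000.0
```

Working from logits is why the model's `forward` returns raw scores and `sigmoid` is applied only when computing accuracy or a final probability.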

Compute the prediction accuracy

In [32]:
def binary_accuracy(preds, y):  # compute accuracy
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this yields 0.8
    """

    #round predictions to the closest integer(0,1)
    rounded_preds = torch.round(torch.sigmoid(preds))
    
    correct = (rounded_preds == y).float()
    acc = correct.sum()/len(correct) #len(correct)==len(y)
    return acc

In [38]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    total_len = 0
    
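    # model.train() puts the model in training mode.
    # Always call it: it is what separates training behavior from evaluation behavior.
    # Training sometimes uses dropout, normalization, and similar techniques that must not be applied in the same way at test time.
    model.train() 

    for batch in iterator:  # iterator is train_iterator

        predictions = model(batch.text)  # forward() already squeezes the output to [batch size]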
        
        loss = criterion(predictions, batch.label)
        acc = binary_accuracy(predictions, batch.label)
        
        optimizer.zero_grad()
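        loss.backward()  # backpropagation
        optimizer.step()  # gradient descent update

        # count the examples seen in train_iterator; this should come to 17,500
        total_len += len(batch.label)
        
        # loss.item() is already averaged over the batch (divided by len(batch.label)),
        # so multiply it back to get the batch's total loss, then accumulate over all batches.
        epoch_loss += loss.item() * len(batch.label)

        # acc.item() is the accuracy of one batch; times the batch size it gives the number of correct predictions,
        # which is accumulated over every batch in train_iterator.
        epoch_acc += acc.item() * len(batch.label)
        
    # epoch_loss / total_len: average loss per example over train_iterator
    # epoch_acc / total_len: accuracy over all examples in train_iterator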
    return epoch_loss / total_len, epoch_acc / total_len
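
The reason `train` multiplies each batch metric by the batch size is that batches may differ in size: 17,500 examples with a batch size of 64 leave a final batch of 28, assuming the iterator yields the partial batch. The toy numbers below are made up to exaggerate the effect.

```python
def mean_of_batch_means(batches):
    """Average the per-batch means directly (every batch counts equally)."""
    return sum(sum(b) / len(b) for b in batches) / len(batches)

def sample_weighted_mean(batches):
    """Weight each batch mean by its size, as train() does."""
    total, count = 0.0, 0
    for b in batches:
        batch_mean = sum(b) / len(b)   # what loss.item() reports for the batch
        total += batch_mean * len(b)   # undo the per-batch averaging
        count += len(b)
    return total / count

# Per-example losses: two full batches of 4 and a short final batch of 1.
batches = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], [10.0]]
print(mean_of_batch_means(batches))   # 4.0
print(sample_weighted_mean(batches))  # 2.0
```

Only the weighted version equals the true mean over all nine examples.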

In [39]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    total_len = 0
    
    model.eval()
    
    with torch.no_grad():
        for batch in iterator: 
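            # iterator is valid_iterator (or test_iterator)
            # only the metrics are computed: no backpropagation and no parameter update
            predictions = model(batch.text)  # forward() already returns shape [batch size]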
            loss = criterion(predictions, batch.label)
            acc = binary_accuracy(predictions, batch.label)
            
            epoch_loss += loss.item() * len(batch.label)
            epoch_acc += acc.item() * len(batch.label)
            total_len += len(batch.label)
            
    model.train()   
    
    return epoch_loss / total_len, epoch_acc / total_len

In [40]:
import time 
# Report how long each epoch takes
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

## Step 8: View the training results

In [None]:
N_EPOCHS = 10
best_valid_loss = float('inf')  # positive infinity

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:  # save the model whenever the validation loss improves
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'wordavg-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

## Step 9: Predict

In [None]:
model.load_state_dict(torch.load("wordavg-model.pt"))

In [None]:
# spaCy has a bug in this setup, so the code below does not run and is left disabled
'''import spacy
nlp = spacy.load('en')

def predict_sentiment(sentence):
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]  # tokenize
    indexed = [TEXT.vocab.stoi[t] for t in tokenized] 
    # vocabulary indices of the sentence
    
    tensor = torch.LongTensor(indexed).to(device)  # [seq_len]
    tensor = tensor.unsqueeze(1) 
    # [seq_len, batch_size (= 1)]
    
    prediction = torch.sigmoid(model(tensor))
    # tensor has the same layout as batch.text
    
    return prediction.item()'''

In [None]:
predict_sentiment("I love This film bad ")

In [None]:
predict_sentiment("This film is great")

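## RNN model

- Next we swap the model for a **recurrent neural network** (RNN). RNNs are commonly used to encode a sequence:
$$h_t = \text{RNN}(x_t, h_{t-1})$$
- We use the last hidden state $h_T$ to represent the whole sentence.
- We then pass $h_T$ through a linear transformation $f$ and use the result to predict the sentiment of the sentence.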

![](assets/sentiment1.png)

![](assets/sentiment7.png)
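
The recurrence $h_t = \text{RNN}(x_t, h_{t-1})$ can be made concrete with a scalar toy cell. This is a sketch under loud assumptions: the cell is a plain tanh (Elman-style) unit with made-up weights, whereas the model below uses an LSTM with vector states.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a scalar tanh cell: h_t = tanh(w_x*x_t + w_h*h_prev + b).
    The weights are made up for illustration."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def encode(sequence, h0=0.0):
    """Run the recurrence over the sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h)
        states.append(h)
    return states

states = encode([1.0, -2.0, 0.5])
h_T = states[-1]    # the final hidden state summarizes the whole sequence
score = 1.5 * h_T   # linear map f with a made-up weight; sigmoid(score) would be the probability
print(states, score)
```

Each state depends on the current input and on everything seen before it, which is why the last state can stand in for the whole sentence.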

In [50]:
class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, 
                 n_layers, bidirectional, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, 
                           bidirectional=bidirectional, dropout=dropout)
        self.fc = nn.Linear(hidden_dim*2, output_dim) # *2: bidirectional
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        embedded = self.dropout(self.embedding(text)) #[sent len, batch size, emb dim]
        output, (hidden, cell) = self.rnn(embedded)  # no initial state is passed, so it defaults to all zeros
        #output = [sent len, batch size, hid dim * num directions]
        #hidden(h) = [num layers * num directions, batch size, hid dim]
        #cell(c) = [num layers * num directions, batch size, hid dim]
        
        #bidirectional
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) of the last hidden layers
        #layer first,bidirectional 
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)) # [batch size, hid dim * 2]
        
        return self.fc(hidden).squeeze(1)

In [52]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, 
            N_LAYERS, BIDIRECTIONAL, DROPOUT, PAD_IDX)

In [53]:
print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 4,810,857 trainable parameters
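
The 4,810,857 figure can be reproduced by hand. The sketch below assumes the standard LSTM layout of four gates per direction, each with input weights, recurrent weights, and two bias vectors, together with the hyperparameters set above; `lstm_direction_params` is a hypothetical helper name.

```python
def lstm_direction_params(input_size, hidden_size):
    """Parameters of one LSTM layer in one direction: 4 gates, each with
    input weights, recurrent weights, and two bias vectors."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

vocab_size, embedding_dim, hidden_dim, output_dim = 25002, 100, 256, 1
num_directions = 2

embedding = vocab_size * embedding_dim
layer1 = num_directions * lstm_direction_params(embedding_dim, hidden_dim)
# The second layer reads the concatenated forward and backward outputs of the first.
layer2 = num_directions * lstm_direction_params(hidden_dim * num_directions, hidden_dim)
fc = hidden_dim * num_directions * output_dim + output_dim

total = embedding + layer1 + layer2 + fc
print(f"{total:,}")  # 4,810,857
```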


In [54]:
#model.embedding.weight.data.copy_(pretrained_embeddings)
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

#print(model.embedding.weight.data)

## Train the RNN model

In [55]:
optimizer = optim.Adam(model.parameters())
model = model.to(device)

In [56]:
N_EPOCHS = 5
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'lstm-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

RuntimeError: CUDA out of memory. Tried to allocate 3.13 GiB (GPU 0; 11.78 GiB total capacity; 5.32 GiB already allocated; 2.28 GiB free; 7.05 GiB reserved in total by PyTorch)

In [41]:
batch = next(iter(train_iterator))

In [42]:
outputs, (hidden,cell) = model.rnn(model.embedding(batch.text))

In [43]:
outputs.shape

torch.Size([702, 64, 512])

In [44]:
hidden.shape

torch.Size([4, 64, 256])

In [46]:
hidden[-1,:,:].shape

torch.Size([64, 256])

In [45]:
torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1).shape

torch.Size([64, 512])
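
The shapes above show `hidden` with 4 rows for 2 layers times 2 directions. A small sketch of how those rows are ordered (layer first, then direction) explains why `hidden[-2]` and `hidden[-1]` are the last layer's forward and backward states; `hidden_index` is a hypothetical helper for illustration.

```python
def hidden_index(layer, direction, num_directions=2):
    """Row of `hidden` holding the final state of a given layer and direction.

    direction: 0 = forward, 1 = backward.
    """
    return layer * num_directions + direction

n_layers, num_directions = 2, 2
n_rows = n_layers * num_directions             # 4, the first dimension of hidden above

last_forward = hidden_index(n_layers - 1, 0)   # 2, the same row as hidden[-2]
last_backward = hidden_index(n_layers - 1, 1)  # 3, the same row as hidden[-1]
print(last_forward - n_rows, last_backward - n_rows)  # -2 -1
```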

You may have noticed the loss is not really decreasing and the accuracy is poor. This is due to several issues with the model which we'll improve in the next notebook.

Finally, the metric we actually care about, the test loss and accuracy, which we get from our parameters that gave us the best validation loss.

In [None]:
model.load_state_dict(torch.load('lstm-model.pt'))
test_loss, test_acc = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

## CNN model

In [1]:
# Imports repeated here so that this cell also runs in a fresh kernel
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, 
                 filter_sizes, output_dim, dropout, pad_idx):
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        # one convolution per filter size; each kernel spans fs words and the full embedding width
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        text = text.permute(1, 0) # [batch size, sent len]
        embedded = self.embedding(text) # [batch size, sent len, emb dim]
        embedded = embedded.unsqueeze(1) # [batch size, 1, sent len, emb dim]
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]            
        #conv_n = [batch size, n_filters, sent len - filter_sizes[n] + 1, 1]
        #conved_n = [batch size, n_filters, sent len - filter_sizes[n] + 1]
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        #pooled_n = [batch size, n_filters]
        cat = self.dropout(torch.cat(pooled, dim=1))
        #cat = [batch size, n_filters * len(filter_sizes)]
        return self.fc(cat).squeeze(1) # [batch size], matching batch.label
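
The CNN above slides filters of several widths over the sentence and keeps the maximum response of each. The sketch below shows the same two operations on a toy sequence in plain Python. It simplifies heavily: each position is a single number rather than an embedding vector, and the filter weights are made up.

```python
def conv1d_valid(sequence, kernel):
    """Slide the kernel over the sequence with no padding and stride 1."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    ]

def relu(values):
    return [max(v, 0.0) for v in values]

sequence = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]   # stand-in for 6 token positions
results = {}
for fs in (3, 4, 5):
    kernel = [1.0] * fs                        # made-up filter weights
    feature_map = relu(conv1d_valid(sequence, kernel))
    pooled = max(feature_map)                  # max-over-time pooling
    results[fs] = (len(feature_map), pooled)
    print(fs, len(feature_map), pooled)
```

A filter of width `fs` produces `sent len - fs + 1` positions, and max pooling reduces them to one number per filter, so the pooled size no longer depends on sentence length.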

In [None]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
N_FILTERS = 100
FILTER_SIZES = [3,4,5]
OUTPUT_DIM = 1
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model = CNN(INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX)
#model.embedding.weight.data.copy_(pretrained_embeddings)  # disabled: the GloVe vectors were not loaded above, so pretrained_embeddings is undefined

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)
model = model.to(device)

In [None]:
optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
criterion = criterion.to(device)

N_EPOCHS = 5

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'CNN-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

In [None]:
model.load_state_dict(torch.load('CNN-model.pt'))
test_loss, test_acc = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')