<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
<h3 style="text-align: center;"><b>Физтех-Школа Прикладной математики и информатики (ФПМИ) МФТИ</b></h3>

---

# Задание 3

## Классификация текстов

В этом задании вам предстоит попробовать несколько методов, используемых в задаче классификации, а также понять насколько хорошо модель понимает смысл слов и какие слова в примере влияют на результат.

In [204]:
import pandas as pd
import numpy as np
import torch

from torchtext.legacy import datasets
from torchtext.legacy.data import Field, LabelField, BucketIterator, dataset

from torchtext.vocab import Vectors, GloVe

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
from tqdm.autonotebook import tqdm

В этом задании мы будем использовать библиотеку torchtext. Она довольна проста в использовании и поможет нам сконцентрироваться на задаче, а не на написании Dataloader-а.

In [205]:
TEXT = Field(sequential=True, lower=True, include_lengths=True)  # Поле текста
LABEL = LabelField(dtype=torch.float)  # Поле метки

In [206]:
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

Датасет на котором мы будем проводить эксперементы это комментарии к фильмам из сайта IMDB.

In [207]:
train, test = datasets.IMDB.splits(TEXT, LABEL)  # загрузим датасет
train, valid = train.split(random_state=random.seed(SEED))  # разобьем на части

In [208]:
TEXT.build_vocab(train)
LABEL.build_vocab(train)

In [209]:
device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, valid_iter, test_iter = BucketIterator.splits(
    (train, valid, test), 
    batch_size = 64,
    sort_within_batch = True,
    device = device)

In [210]:
print(device)

cuda


In [211]:
type(train_iter)

torchtext.legacy.data.iterator.BucketIterator

## RNN

Для начала попробуем использовать рекурентные нейронные сети. На семинаре вы познакомились с GRU, вы можете также попробовать LSTM. Можно использовать для классификации как hidden_state, так и output последнего токена.

In [None]:
# LSTM

In [212]:
class RNNBaseline(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        super().__init__()

        self.bidirectional = bidirectional
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.dropout = dropout
        
        
        self.rnn = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=n_layers, 
                           dropout = dropout, bidirectional = bidirectional)  # YOUR CODE GOES HERE
        
        if self.bidirectional:
          self.fc = nn.Linear(2*hidden_dim, output_dim) # YOUR CODE GOES HERE
          
        else:
          self.fc = nn.Linear(hidden_dim, output_dim)

        
        
    def forward(self, text, text_lengths):
        
        #text = [sent len, batch size]
        
        embedded = self.embedding(text)
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        
        # cell arg for LSTM, remove for GRU
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        #unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)  

        #output = [sent len, batch size, hid dim * num directions]
        #output over padding tokens are zero tensors
        if self.bidirectional:
            hidden = torch.cat([hidden[-2,:,:], hidden[-1,:,:]], dim=1)
        else:
            # only one direction, take the last one
            hidden = hidden[-1,:,:]


        #hidden = [num layers * num directions, batch size, hid dim]
        #cell = [num layers * num directions, batch size, hid dim]
        
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
        #and apply dropout
        
        hidden = nn.Dropout(p=self.dropout)(hidden) # YOUR CODE GOES HERE
                
        #hidden = [batch size, hid dim * num directions] or [batch_size, hid dim * num directions]
            
        return self.fc(hidden)

Поиграйтесь с гиперпараметрами

In [213]:
vocab_size = len(TEXT.vocab)
emb_dim = 100
hidden_dim = 256
output_dim = 1
n_layers = 2
bidirectional = True
dropout = 0.2
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
patience=3

In [221]:
vocab_size = len(TEXT.vocab)
emb_dim = 100
hidden_dim = 200
output_dim = 1
n_layers = 3
bidirectional = True
dropout = 0.2
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
patience=3

In [222]:
model = RNNBaseline(
    vocab_size=vocab_size,
    embedding_dim=emb_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    n_layers=n_layers,
    bidirectional=bidirectional,
    dropout=dropout,
    pad_idx=PAD_IDX
)

In [223]:
model = model.to(device)

In [224]:
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
max_epochs = 20

Обучите сетку! Используйте любые вам удобные инструменты, Catalyst, PyTorch Lightning или свои велосипеды.

In [225]:
import numpy as np


def training_val(model, criterion, optimizer, max_epochs, train_iter, valid_iter, max_grad_norm=3,device=device):
  cur_patience = 0
  min_loss = np.inf

  for epoch in range(1, max_epochs + 1):
      train_loss = 0.0
      model.train()
      pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
      pbar.set_description(f"Epoch {epoch}")

      for it, batch in pbar:

        #reset gradients
        optimizer.zero_grad()

        #retrieve text and its length from batch, send it to gpu if pssbl

        text, text_lengths = batch.text
        labels = batch.label
        input_embeds = text.to(device)
        labels = torch.unsqueeze(labels, 1).to(device)

        prediction = model(input_embeds, text_lengths)
        loss = criterion(prediction, labels)
        train_loss += loss
        loss.backward()
        if max_grad_norm is not None:
          torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()


      train_loss /= len(train_iter)
      val_loss = 0.0
      model.eval()
      with torch.no_grad():

        num_iter = 0

        pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        for it, batch in pbar:
          
          text, text_lengths = batch.text
          labels = batch.label

          input_embeds = text.to(device)
          labels = torch.unsqueeze(labels, 1).to(device)

          prediction = model(input_embeds, text_lengths)
          val_loss += criterion(prediction, labels)

          num_iter += 1

        val_loss /= len(valid_iter)


        if val_loss < min_loss:
          min_loss = val_loss
          best_model = model.state_dict()
        else:
          cur_patience += 1
          if cur_patience == patience:
            cur_patience = 0
            break
    
      print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, train_loss, val_loss))
  model.load_state_dict(best_model)

In [226]:
def test_rnn (model, criterion, test_iter, threshold = 0.5,device=device):

  f1 = 0
  tp = 0
  tn= 0 
  fp = 0
  fn = 0
  with torch.no_grad():
    correct=0
    num_objs=0

    pbar = tqdm(enumerate(test_iter), total=len(test_iter), leave=False)
    model.eval()
    for it, batch in pbar:
      text, text_lengths = batch.text
      labels = batch.label
      input_embeds = text.to(device)

      labels = torch.unsqueeze(labels, 1).to(device)
      prediction = model(input_embeds, text_lengths)

      num_objs += len(labels)

      pred = torch.sigmoid(prediction)
      pred = (pred > threshold).to(torch.float)

      pred = pred.reshape(-1,1).to(device)
      labl = labels.reshape(-1,1).to(device)

      correct += (labl == pred).sum()


      tp += (labl * pred).sum().to(torch.float32)
      tn += ((1 - labl) * (1 - pred)).sum().to(torch.float32)
      fp += ((1 - labl) * pred).sum().to(torch.float32)
      fn += (labl * (1 - pred)).sum().to(torch.float32)
      
    
  f1 = tp / (tp + 0.5*(fp + fn))
  acc = correct / num_objs

  print(f"F1 score: {f1:.4}, accuracy: {acc}")


Посчитайте f1-score вашего классификатора на тестовом датасете.

**Ответ**:

In [None]:
print(device)

cpu


In [227]:
training_val(model=model,
                 criterion=criterion, 
                 optimizer=optimizer, 
                 max_epochs=20, 
                 train_iter=train_iter,
                 valid_iter=valid_iter)

  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.6742078065872192, Validation Loss: 0.6441646814346313


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.5321886539459229, Validation Loss: 0.4583098888397217


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.3147236406803131, Validation Loss: 0.3972834050655365


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.17886105179786682, Validation Loss: 0.4311824142932892


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.08379177749156952, Validation Loss: 0.4645198583602905


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

In [228]:
test_rnn(model=model,
                 criterion=criterion, 
                 optimizer=optimizer,
                 max_epochs = 2,
                 test_iter = test_iter)

  0%|          | 0/391 [00:00<?, ?it/s]

F1 score: 0.8443, accuracy: 0.835319995880127


F1 с RNN-LSTM без игры с параметрами: 0.8108

После небольшого изменения параметров: 0.844

## CNN

![](https://www.researchgate.net/publication/333752473/figure/fig1/AS:769346934673412@1560438011375/Standard-CNN-on-text-classification.png)

Для классификации текстов также часто используют сверточные нейронные сети. Идея в том, что как правило сентимент содержат словосочетания из двух-трех слов, например "очень хороший фильм" или "невероятная скука". Проходясь сверткой по этим словам мы получим какой-то большой скор и выхватим его с помощью MaxPool. Далее идет обычная полносвязная сетка. Важный момент: свертки применяются не последовательно, а параллельно. Давайте попробуем!

In [119]:
TEXT = Field(sequential=True, lower=True, batch_first=True)  # batch_first тк мы используем conv  
LABEL = LabelField(batch_first=True, dtype=torch.float)

train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split(random_state=random.seed(SEED))

TEXT.build_vocab(trn)
LABEL.build_vocab(trn)

device = "cuda" if torch.cuda.is_available() else "cpu"

In [121]:
train_iter, val_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 256),
        sort=False,
        sort_key= lambda x: len(x.src),
        sort_within_batch=False,
        device=device,
        repeat=False,
)

Вы можете использовать Conv2d с `in_channels=1, kernel_size=(kernel_sizes[0], emb_dim))` или Conv1d c `in_channels=emb_dim, kernel_size=kernel_size[0]`. Но хорошенько подумайте над shape в обоих случаях.

In [122]:
class CNN(nn.Module):
    def __init__(
        self,
        vocab_size,
        emb_dim,
        out_channels,
        kernel_sizes,
        dropout=0.5,
    ):
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv_0 = nn.Conv1d(emb_dim, out_channels=out_channels ,kernel_size = kernel_sizes[0], padding = 1)  # YOUR CODE GOES HERE
        
        self.conv_1 = nn.Conv1d(emb_dim, out_channels=out_channels ,kernel_size = kernel_sizes[1], padding = 1) # YOUR CODE GOES HERE
        
        self.conv_2 = nn.Conv1d(emb_dim, out_channels=out_channels ,kernel_size = kernel_sizes[2], padding = 1)  # YOUR CODE GOES HERE
        
        self.fc = nn.Linear(len(kernel_sizes) * out_channels, 1)
        
        self.dropout = nn.Dropout(dropout)
        
        
    def forward(self, text):
        
        embedded = self.embedding(text)
        
        embedded = embedded.permute(0, 2, 1)  # may be reshape here
        
        conved_0 = F.relu(self.conv_0(embedded))  # may be reshape here
        conved_1 = F.relu(self.conv_1(embedded))  # may be reshape here
        conved_2 = F.relu(self.conv_2(embedded))  # may be reshape here
        
        pooled_0 = F.max_pool1d(conved_0, conved_0.shape[2]).squeeze(2)
        pooled_1 = F.max_pool1d(conved_1, conved_1.shape[2]).squeeze(2)
        pooled_2 = F.max_pool1d(conved_2, conved_2.shape[2]).squeeze(2)
        
        cat = self.dropout(torch.cat((pooled_0, pooled_1, pooled_2), dim=1))
            
        return self.fc(cat)

In [123]:
kernel_sizes = [3, 4, 5]
vocab_size = len(TEXT.vocab)
out_channels=64
dropout = 0.5
dim = 300

model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=out_channels, kernel_sizes=kernel_sizes, dropout=dropout)

In [124]:
model.to(device)

CNN(
  (embedding): Embedding(202264, 300)
  (conv_0): Conv1d(300, 64, kernel_size=(3,), stride=(1,), padding=(1,))
  (conv_1): Conv1d(300, 64, kernel_size=(4,), stride=(1,), padding=(1,))
  (conv_2): Conv1d(300, 64, kernel_size=(5,), stride=(1,), padding=(1,))
  (fc): Linear(in_features=192, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

In [125]:
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss(reduction='sum')

In [126]:
max_epochs = 30

Обучите!

In [165]:
import numpy as np

def cnn_tr_val(model, criterion, optimizer, max_epochs, train_iter, val_iter, max_grad_norm=None, device=device):
  
  cur_patience = 0
  min_loss = np.inf
  
  for epoch in range(1, max_epochs + 1):
    train_loss = 0.0
    model.train()
    pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")
    for it, batch in pbar: 

        optimizer.zero_grad()
        text = batch.text
        input_embeds = text.to(device)
        labels = torch.unsqueeze(batch.label, 1).to(device)

        prediction = model(input_embeds)
        loss = criterion(prediction, labels)

        loss.backward()
        train_loss += loss.item()

        if max_grad_norm is not None:
          torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()        

    train_loss /= len(train_iter)


    val_loss = 0.0
    val_num = 0
    model.eval()
    with torch.no_grad():
      pbar = tqdm(enumerate(val_iter), total=len(val_iter), leave=False)
      pbar.set_description(f"Epoch {epoch}")
      for it, batch in pbar:
        text = batch.text
        input_embeds = text.to(device)
        labels = torch.unsqueeze(batch.label, 1).to(device)
        prediction = model(input_embeds)
        loss = criterion(prediction, labels)
        val_loss += loss.item()
        val_num += len(labels)

      val_loss /= val_num
      if val_loss < min_loss:
          min_loss = val_loss
          best_model = model.state_dict()
      else:
          cur_patience += 1
          if cur_patience == patience:
              cur_patience = 0
              break
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, train_loss, val_loss))
  model.load_state_dict(best_model)

In [167]:
cnn_tr_val(model=model, criterion=loss_func, optimizer=opt, max_grad_norm=None,train_iter=train_iter, val_iter=val_iter, device=device,max_epochs=20)

  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.18464822583172444, Validation Loss: 0.8355679646809896


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.1902768118103055, Validation Loss: 0.854855263264974


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.12680906512482215, Validation Loss: 0.8767903198242187


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

In [175]:
def test_cnn(model, criterion, test_iter, threshold = 0.5,device=device):
  tp = 0
  tn= 0 
  fp = 0
  fn = 0
  with torch.no_grad():
    correct=0
    num_objs=0

    pbar = tqdm(enumerate(test_iter), total=len(test_iter), leave=False)
    model.eval()
    for it, batch in pbar:
      text = batch.text
      input_embeds = text.to(device)
      labels = torch.unsqueeze(batch.label, 1).to(device)
      prediction = model(input_embeds)

      num_objs += len(labels)

      pred = torch.sigmoid(prediction)
      pred = (pred > threshold).to(torch.float)

      pred = pred.reshape(-1,1).to(device)
      labl = labels.reshape(-1,1).to(device)

      correct += (labl == pred).sum()

      tp += (labl * pred).sum().to(torch.float32)
      tn += ((1 - labl) * (1 - pred)).sum().to(torch.float32)
      fp += ((1 - labl) * pred).sum().to(torch.float32)
      fn += (labl * (1 - pred)).sum().to(torch.float32)
      
    
  f1 = tp / (tp + 0.5*(fp + fn))
  acc = correct / num_objs

  print(f"F1 score: {f1:.4}, accuracy: {acc}")


In [176]:
test_cnn(model=model, criterion=loss_func,
        test_iter=test_iter, device=device)

  0%|          | 0/98 [00:00<?, ?it/s]

F1 score: 0.8188, accuracy: 0.8146799802780151


Посчитайте f1-score вашего классификатора.

**Ответ**: 0.8188

## Интерпретируемость

Посмотрим, куда смотрит наша модель. Достаточно запустить код ниже.

In [177]:
!pip install -q captum

[?25l[K     |▎                               | 10 kB 25.4 MB/s eta 0:00:01[K     |▌                               | 20 kB 23.1 MB/s eta 0:00:01[K     |▊                               | 30 kB 12.5 MB/s eta 0:00:01[K     |█                               | 40 kB 9.5 MB/s eta 0:00:01[K     |█▏                              | 51 kB 5.4 MB/s eta 0:00:01[K     |█▍                              | 61 kB 6.0 MB/s eta 0:00:01[K     |█▋                              | 71 kB 5.7 MB/s eta 0:00:01[K     |██                              | 81 kB 6.4 MB/s eta 0:00:01[K     |██▏                             | 92 kB 4.9 MB/s eta 0:00:01[K     |██▍                             | 102 kB 5.3 MB/s eta 0:00:01[K     |██▋                             | 112 kB 5.3 MB/s eta 0:00:01[K     |██▉                             | 122 kB 5.3 MB/s eta 0:00:01[K     |███                             | 133 kB 5.3 MB/s eta 0:00:01[K     |███▎                            | 143 kB 5.3 MB/s eta 0:00:01[K  

In [178]:
from captum.attr import LayerIntegratedGradients, TokenReferenceBase, visualization

PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)

In [179]:
def forward_with_softmax(inp):
    logits = model(inp)
    return torch.softmax(logits, 0)[0][1]

def forward_with_sigmoid(input):
    return torch.sigmoid(model(input))


# accumalate couple samples in this array for visualization purposes
vis_data_records_ig = []

def interpret_sentence(model, sentence, min_len = 7, label = 0):
    model.eval()
    text = [tok for tok in TEXT.tokenize(sentence)]
    if len(text) < min_len:
        text += ['pad'] * (min_len - len(text))
    indexed = [TEXT.vocab.stoi[t] for t in text]

    model.zero_grad()

    input_indices = torch.tensor(indexed, device=device)
    input_indices = input_indices.unsqueeze(0)
    
    # input_indices dim: [sequence_length]
    seq_length = min_len

    # predict
    pred = forward_with_sigmoid(input_indices).item()
    pred_ind = round(pred)

    # generate reference indices for each sample
    reference_indices = token_reference.generate_reference(seq_length, device=device).unsqueeze(0)

    # compute attributions and approximation delta using layer integrated gradients
    attributions_ig, delta = lig.attribute(input_indices, reference_indices, \
                                           n_steps=5000, return_convergence_delta=True)

    print('pred: ', LABEL.vocab.itos[pred_ind], '(', '%.2f'%pred, ')', ', delta: ', abs(delta))

    add_attributions_to_visualizer(attributions_ig, text, pred, pred_ind, label, delta, vis_data_records_ig)
    
def add_attributions_to_visualizer(attributions, text, pred, pred_ind, label, delta, vis_data_records):
    attributions = attributions.sum(dim=2).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    attributions = attributions.cpu().detach().numpy()

    # storing couple samples in an array for visualization purposes
    vis_data_records.append(visualization.VisualizationDataRecord(
                            attributions,
                            pred,
                            LABEL.vocab.itos[pred_ind],
                            LABEL.vocab.itos[label],
                            LABEL.vocab.itos[1],
                            attributions.sum(),       
                            text,
                            delta))

In [180]:
interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

pred:  pos ( 0.99 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([1.8254e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([2.8724e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([0.0002], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.01 ) , delta:  tensor([3.5079e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([6.5488e-05], device='cuda:0', dtype=torch.float64)


In [186]:
interpret_sentence(model, 'Deep Learning is a suspensful thriller', label=1)
interpret_sentence(model, 'DL\'s something you better not try alone', label=0)

pred:  pos ( 0.97 ) , delta:  tensor([0.0003], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.11 ) , delta:  tensor([5.8515e-05], device='cuda:0', dtype=torch.float64)


Попробуйте добавить свои примеры!

In [187]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.69,It was a fantastic performance ! pad
,,,,
pos,neg (0.00),pos,1.49,Best film ever pad pad pad pad
,,,,
pos,neg (0.00),pos,1.37,Such a great show! pad pad pad
,,,,
neg,neg (0.00),pos,1.0,It was a horrible movie pad pad
,,,,
neg,neg (0.01),pos,1.61,I've never watched something as bad pad
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.69,It was a fantastic performance ! pad
,,,,
pos,neg (0.00),pos,1.49,Best film ever pad pad pad pad
,,,,
pos,neg (0.00),pos,1.37,Such a great show! pad pad pad
,,,,
neg,neg (0.00),pos,1.0,It was a horrible movie pad pad
,,,,
neg,neg (0.01),pos,1.61,I've never watched something as bad pad
,,,,


## Эмбеддинги слов

Вы ведь не забыли, как мы можем применить знания о word2vec и GloVe. Давайте попробуем!

In [190]:
TEXT.build_vocab(trn, vectors=GloVe())# YOUR CODE GOES HERE
# подсказка: один из импортов пока не использовался, быть может он нужен в строке выше :)
LABEL.build_vocab(trn)

word_embeddings = TEXT.vocab.vectors

kernel_sizes = [3, 4, 5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

.vector_cache/glove.840B.300d.zip: 2.18GB [07:14, 5.01MB/s]                            
100%|█████████▉| 2196016/2196017 [05:01<00:00, 7281.51it/s]


In [193]:
train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split(random_state=random.seed(SEED))

device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, val_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 256),
        sort=False,
        sort_key= lambda x: len(x.src),
        sort_within_batch=False,
        device=device,
        repeat=False,
)

In [199]:
model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=64,
            kernel_sizes=kernel_sizes, dropout=dropout)

word_embeddings = TEXT.vocab.vectors

prev_shape = model.embedding.weight.shape

model.embedding.weight.data.copy_(word_embeddings)

#model.embedding.weight = word_embeddings# инициализируйте эмбэдинги

assert prev_shape == model.embedding.weight.shape
model.to(device)

opt = torch.optim.Adam(model.parameters())

Вы знаете, что делать.

In [200]:
cnn_tr_val(model=model, criterion=loss_func, optimizer=opt, max_grad_norm=None,train_iter=train_iter, val_iter=val_iter, device=device,max_epochs=20)

  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 62.691232931874964, Validation Loss: 0.3502676378885905


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 37.79346376962035, Validation Loss: 0.29537369588216145


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 22.93571676825085, Validation Loss: 0.2872915367126465


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 10.69864973708661, Validation Loss: 0.30187635320027667


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 4.068130278239285, Validation Loss: 0.3374638707478841


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

In [201]:
test_cnn(model=model, criterion=loss_func,
        test_iter=test_iter, device=device)

  0%|          | 0/98 [00:00<?, ?it/s]

F1 score: 0.8715, accuracy: 0.8725199699401855


Посчитайте f1-score вашего классификатора.

**Ответ**:0.8715

Вывод. Иногда CNN отлично подходят для работы с текстом. С эмбеддингами CNN показала скоринг даже лучше чем на LSTM-RNN (без особой подстройки и эмбеддингов)

Проверим насколько все хорошо!

In [202]:
PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)
vis_data_records_ig = []

interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

pred:  pos ( 0.99 ) , delta:  tensor([0.0004], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.01 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.20 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([1.5681e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.32 ) , delta:  tensor([7.4271e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)


In [203]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.79,It was a fantastic performance ! pad
,,,,
pos,neg (0.01),pos,1.29,Best film ever pad pad pad pad
,,,,
pos,neg (0.20),pos,1.68,Such a great show! pad pad pad
,,,,
neg,neg (0.00),pos,-0.53,It was a horrible movie pad pad
,,,,
neg,neg (0.32),pos,1.5,I've never watched something as bad pad
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.79,It was a fantastic performance ! pad
,,,,
pos,neg (0.01),pos,1.29,Best film ever pad pad pad pad
,,,,
pos,neg (0.20),pos,1.68,Such a great show! pad pad pad
,,,,
neg,neg (0.00),pos,-0.53,It was a horrible movie pad pad
,,,,
neg,neg (0.32),pos,1.5,I've never watched something as bad pad
,,,,
