<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
<h3 style="text-align: center;"><b>Физтех-Школа Прикладной математики и информатики (ФПМИ) МФТИ</b></h3>

---

# Задание 3

## Классификация текстов

В этом задании вам предстоит попробовать несколько методов, используемых в задаче классификации, а также понять насколько хорошо модель понимает смысл слов и какие слова в примере влияют на результат.

In [None]:
!pip install -q pymorphy2
!pip install torchtext==0.8.1

[K     |████████████████████████████████| 55 kB 2.3 MB/s 
[K     |████████████████████████████████| 8.2 MB 8.6 MB/s 
[?25hCollecting torchtext==0.8.1
  Downloading torchtext-0.8.1-cp37-cp37m-manylinux1_x86_64.whl (7.0 MB)
[K     |████████████████████████████████| 7.0 MB 4.5 MB/s 
Collecting torch==1.7.1
  Downloading torch-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (776.8 MB)
[K     |████████████████████████████████| 776.8 MB 17 kB/s 
Installing collected packages: torch, torchtext
  Attempting uninstall: torch
    Found existing installation: torch 1.9.0+cu111
    Uninstalling torch-1.9.0+cu111:
      Successfully uninstalled torch-1.9.0+cu111
  Attempting uninstall: torchtext
    Found existing installation: torchtext 0.10.0
    Uninstalling torchtext-0.10.0:
      Successfully uninstalled torchtext-0.10.0
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

In [1]:
import pandas as pd
import numpy as np
import torch

from torchtext.legacy import datasets

from torchtext.legacy.data import Field, LabelField
from torchtext.legacy.data import BucketIterator

from torchtext.vocab import Vectors, GloVe

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
from tqdm.autonotebook import tqdm

В этом задании мы будем использовать библиотеку torchtext. Она довольна проста в использовании и поможет нам сконцентрироваться на задаче, а не на написании Dataloader-а.

In [2]:
TEXT = Field(sequential=True, lower=True, include_lengths=True)  # Поле текста
LABEL = LabelField(dtype=torch.float)  # Поле метки

In [3]:
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

Датасет на котором мы будем проводить эксперементы это комментарии к фильмам из сайта IMDB.

In [4]:
train, test = datasets.IMDB.splits(TEXT, LABEL)  # загрузим датасет
train, valid = train.split(random_state=random.seed(SEED))  # разобьем на части

downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:03<00:00, 21.7MB/s]


In [5]:
TEXT.build_vocab(train)
LABEL.build_vocab(train)

In [6]:
device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, valid_iter, test_iter = BucketIterator.splits(
    (train, valid, test), 
    batch_size = 64,
    sort_within_batch = True,
    device = device)

## RNN

Для начала попробуем использовать рекурентные нейронные сети. На семинаре вы познакомились с GRU, вы можете также попробовать LSTM. Можно использовать для классификации как hidden_state, так и output последнего токена.



In [7]:
class RNNBaseline(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        self.n_layers = n_layers
        self.bidirectional = bidirectional

        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers,
                           bidirectional=bidirectional)
        self.dropout = nn.Dropout(p=dropout)
        if bidirectional:
            self.fc = nn.Linear(hidden_dim*2, output_dim)
        else:
            self.fc = nn.Linear(hidden_dim, output_dim)
        
        
    def forward(self, text, text_lengths):
        embedded = self.embedding(text)
        
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)
        
        packed_output, (hidden, cell) = self.rnn(packed_embedded)

        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)  
        if self.bidirectional:
            hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]),dim=1)
        else:
            hidden = hidden[-1,:,:]

        hidden = self.dropout(hidden)        
        return self.fc(hidden)

Поиграйтесь с гиперпараметрами

In [8]:
vocab_size = len(TEXT.vocab)
emb_dim = 300
hidden_dim = 256
output_dim = 1
n_layers = 2
bidirectional = True
dropout = 0.2
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
patience=5

In [9]:
model = RNNBaseline(
    vocab_size=vocab_size,
    embedding_dim=emb_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    n_layers=n_layers,
    bidirectional=bidirectional,
    dropout=dropout,
    pad_idx=PAD_IDX
)

In [10]:
model = model.to(device)

In [11]:
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

max_epochs = 20

Обучите сетку! Используйте любые вам удобные инструменты, Catalyst, PyTorch Lightning или свои велосипеды.

In [12]:
import numpy as np

def train_model(model, optimizer, criterion, max_epochs, max_grad_norm = 2):
    min_loss = np.inf

    cur_patience = 0

    for epoch in range(1, max_epochs + 1):
        model.train()
        pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        train_loss = 0.0
        for it, batch in pbar:
            optimizer.zero_grad()
            input_embeds = batch.text[0].to(device)
            text_lengths = batch.text[1].to('cpu')
            labels = batch.label.to(device)
            prediction = model(input_embeds, text_lengths)
            prediction = prediction.squeeze()
            loss = criterion(prediction, labels)
            train_loss += loss
            loss.backward()
            if max_grad_norm is not None:
                torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()

        train_loss /= len(train_iter)
        val_loss = 0.0
        model.eval()
        pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        with torch.no_grad():
            correct = 0
            num_objs = 0
            num_iter = 0
            for it, batch in pbar:
                input_embeds = batch.text[0].to(device)
                text_lengths = batch.text[1].to('cpu')
                labels = batch.label.to(device)
                prediction = model(input_embeds, text_lengths)
                prediction = prediction.squeeze()
                val_loss += criterion(prediction, labels)
                prediction[prediction <= 0] = 0
                prediction[prediction > 0] = 1
                correct += (labels == prediction).float().sum()
                num_objs += len(labels)
                num_iter += 1
            val_loss /= len(valid_iter)
            if val_loss < min_loss:
                min_loss = val_loss
                best_model = model.state_dict()
            else:
                cur_patience += 1
                if cur_patience == patience:
                    cur_patience = 0
                    break

        print('Epoch: {}, Training Loss: {}, Validation Loss: {}, Accuracy: {}'.format(epoch, train_loss, val_loss, correct/num_objs))
    model.load_state_dict(best_model)

In [13]:
train_model(model, opt, loss_func, max_epochs)

  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.5941611528396606, Validation Loss: 0.5660535097122192, Accuracy: 0.7484000325202942


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.3757791221141815, Validation Loss: 0.42332401871681213, Accuracy: 0.8182666897773743


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.18470433354377747, Validation Loss: 0.46134695410728455, Accuracy: 0.8254666924476624


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.08631286770105362, Validation Loss: 0.5772088170051575, Accuracy: 0.8254666924476624


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.020901110023260117, Validation Loss: 0.8076033592224121, Accuracy: 0.8166666626930237


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.01063429657369852, Validation Loss: 0.724364697933197, Accuracy: 0.8485333323478699


  0%|          | 0/274 [00:00<?, ?it/s]

  0%|          | 0/118 [00:00<?, ?it/s]

Посчитайте f1-score вашего классификатора на тестовом датасете.

**Ответ**:

In [15]:
from sklearn.metrics import f1_score
model.eval()
whole_labels = torch.empty(0).to(device)
whole_predictions = torch.empty(0).to(device)
print('whole labels', whole_labels)
with torch.no_grad():
    for it in test_iter:
        input_embeds, text_length = it.text
        labels = it.label
        whole_labels = torch.cat((whole_labels, labels))
        prediction = model(input_embeds, text_length.cpu())
        prediction = prediction.squeeze()
        prediction[prediction <= 0] = 0
        prediction[prediction > 0] = 1
        whole_predictions = torch.cat((whole_predictions, prediction))

f1_score(whole_labels.cpu(), whole_predictions.cpu())

whole labels tensor([], device='cuda:0')


0.8268252489790069

Наблюдается быстрое переобучение и подстроение под трейн выборку

## CNN

![](https://www.researchgate.net/publication/333752473/figure/fig1/AS:769346934673412@1560438011375/Standard-CNN-on-text-classification.png)

Для классификации текстов также часто используют сверточные нейронные сети. Идея в том, что как правило сентимент содержат словосочетания из двух-трех слов, например "очень хороший фильм" или "невероятная скука". Проходясь сверткой по этим словам мы получим какой-то большой скор и выхватим его с помощью MaxPool. Далее идет обычная полносвязная сетка. Важный момент: свертки применяются не последовательно, а параллельно. Давайте попробуем!

In [None]:
TEXT = Field(sequential=True, lower=True, batch_first=True)  # batch_first тк мы используем conv  
LABEL = LabelField(batch_first=True, dtype=torch.float)

train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split(random_state=random.seed(SEED))

TEXT.build_vocab(trn)
LABEL.build_vocab(trn)

device = "cuda" if torch.cuda.is_available() else "cpu"



downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:05<00:00, 15.6MB/s]


In [None]:
train_iter, valid_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 512),
        sort=False,
        sort_key= lambda x: len(x.src),
        sort_within_batch=False,
        device=device,
        repeat=False,
)



Вы можете использовать Conv2d с `in_channels=1, kernel_size=(kernel_sizes[0], emb_dim))` или Conv1d c `in_channels=emb_dim, kernel_size=kernel_size[0]`. Но хорошенько подумайте над shape в обоих случаях.

In [None]:
class CNN(nn.Module):
    def __init__(
        self,
        vocab_size,
        emb_dim,
        out_channels,
        kernel_sizes,
        dropout=0.5,
    ):
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv_0 = nn.Sequential(
            nn.Conv1d(emb_dim, out_channels, kernel_size=kernel_sizes[0]),
            nn.BatchNorm1d(out_channels))
        
        self.conv_1 = nn.Sequential(
            nn.Conv1d(emb_dim, out_channels, kernel_size=kernel_sizes[1]),
            nn.BatchNorm1d(out_channels))
        
        
        self.conv_2 = nn.Sequential(
            nn.Conv1d(emb_dim, out_channels, kernel_size=kernel_sizes[2]),
            nn.BatchNorm1d(out_channels)
        )
        
        
        self.fc = nn.Linear(len(kernel_sizes) * out_channels, 1)
        
        self.dropout = nn.Dropout(dropout)
        
        
    def forward(self, text):
        embedded = self.embedding(text)
        embedded = embedded.permute(0,2,1)
        
        conved_0 = F.relu(self.conv_0(embedded))
        conved_1 = F.relu(self.conv_1(embedded))
        conved_2 = F.relu(self.conv_2(embedded))
        
        pooled_0 = F.max_pool1d(conved_0, conved_0.shape[2]).squeeze(2)
        pooled_1 = F.max_pool1d(conved_1, conved_1.shape[2]).squeeze(2)
        pooled_2 = F.max_pool1d(conved_2, conved_2.shape[2]).squeeze(2)
        
        cat = self.dropout(torch.cat((pooled_0, pooled_1, pooled_2), dim=1))
            
        return self.fc(cat)

In [None]:
vocab_size = len(TEXT.vocab)
out_channels=64
kernel_sizes = [3,4,5]
dropout = 0.5
dim = 300
patience = 5
model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=out_channels,
            kernel_sizes=kernel_sizes, dropout=dropout)

In [None]:
model.to(device)

CNN(
  (embedding): Embedding(201383, 300)
  (conv_0): Sequential(
    (0): Conv1d(300, 64, kernel_size=(3,), stride=(1,))
    (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv_1): Sequential(
    (0): Conv1d(300, 64, kernel_size=(4,), stride=(1,))
    (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv_2): Sequential(
    (0): Conv1d(300, 64, kernel_size=(5,), stride=(1,))
    (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (fc): Linear(in_features=192, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

In [None]:
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

In [None]:
max_epochs = 30

In [None]:
import numpy as np

def train_cnn_model(model, optimizer, criterion, max_epochs, max_grad_norm = 2):
    min_loss = np.inf

    cur_patience = 0

    for epoch in range(1, max_epochs + 1):
        model.train()
        pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        train_loss = 0.0
        for it, batch in pbar:
            optimizer.zero_grad()
            input_embeds = batch.text.to(device)
            #text_lengths = batch.text[1].to('cpu')
            labels = batch.label.to(device)
            prediction = model(input_embeds)
            prediction = prediction.squeeze()
            loss = criterion(prediction, labels)
            train_loss += loss
            loss.backward()
            if max_grad_norm is not None:
                torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()

        train_loss /= len(train_iter)
        val_loss = 0.0
        model.eval()
        pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        with torch.no_grad():
            correct = 0
            num_objs = 0
            num_iter = 0
            for it, batch in pbar:
                input_embeds = batch.text.to(device)
                labels = batch.label.to(device)
                prediction = model(input_embeds)
                prediction = prediction.squeeze()
                val_loss += criterion(prediction, labels)
                prediction[prediction <= 0] = 0
                prediction[prediction > 0] = 1
                correct += (labels == prediction).float().sum()
                num_objs += len(labels)
                num_iter += 1
            val_loss /= len(valid_iter)
            if val_loss < min_loss:
                min_loss = val_loss
                best_model = model.state_dict()
            else:
                cur_patience += 1
                if cur_patience == patience:
                    cur_patience = 0
                    break

        print('Epoch: {}, Training Loss: {}, Validation Loss: {}, Accuracy: {}'.format(epoch, train_loss, val_loss, correct/num_objs))
    model.load_state_dict(best_model)

Обучите!

In [None]:
train_cnn_model(model, opt, loss_func, max_epochs)

  0%|          | 0/137 [00:00<?, ?it/s]



  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.7810875773429871, Validation Loss: 0.6486999988555908, Accuracy: 0.66839998960495


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.6330894231796265, Validation Loss: 0.5632702708244324, Accuracy: 0.7315999865531921


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.5813029408454895, Validation Loss: 0.5248895883560181, Accuracy: 0.7633333206176758


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.5424948334693909, Validation Loss: 0.48198267817497253, Accuracy: 0.7944000363349915


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.5083868503570557, Validation Loss: 0.4595663249492645, Accuracy: 0.8055999875068665


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.46033793687820435, Validation Loss: 0.4205177426338196, Accuracy: 0.8193333148956299


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.41591590642929077, Validation Loss: 0.3912450969219208, Accuracy: 0.8310666680335999


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.3650835156440735, Validation Loss: 0.37357035279273987, Accuracy: 0.8402667045593262


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 9, Training Loss: 0.3246888518333435, Validation Loss: 0.373189240694046, Accuracy: 0.8398666977882385


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 10, Training Loss: 0.2737072706222534, Validation Loss: 0.3465358316898346, Accuracy: 0.8521333336830139


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 11, Training Loss: 0.22864930331707, Validation Loss: 0.3338434398174286, Accuracy: 0.853866696357727


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 12, Training Loss: 0.1917518824338913, Validation Loss: 0.33341777324676514, Accuracy: 0.8560000061988831


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 13, Training Loss: 0.15904513001441956, Validation Loss: 0.3285972476005554, Accuracy: 0.8613333702087402


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 14, Training Loss: 0.12807519733905792, Validation Loss: 0.3280714750289917, Accuracy: 0.859333336353302


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 15, Training Loss: 0.10656827688217163, Validation Loss: 0.33049777150154114, Accuracy: 0.8619999885559082


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 16, Training Loss: 0.08587555587291718, Validation Loss: 0.33017951250076294, Accuracy: 0.8601333498954773


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 17, Training Loss: 0.07807717472314835, Validation Loss: 0.3352421224117279, Accuracy: 0.8628000020980835


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 18, Training Loss: 0.0580594427883625, Validation Loss: 0.3361768126487732, Accuracy: 0.8615999817848206


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Посчитайте f1-score вашего классификатора.

**Ответ**:

In [None]:
from sklearn.metrics import f1_score
model.eval()
whole_labels = torch.empty(0).to(device)
whole_predictions = torch.empty(0).to(device)
print('whole labels', whole_labels)
with torch.no_grad():
    for it in test_iter:
        input_embeds = it.text
        labels = it.label
        whole_labels = torch.cat((whole_labels, labels))
        prediction = model(input_embeds)
        prediction = prediction.squeeze()
        prediction[prediction <= 0] = 0
        prediction[prediction > 0] = 1
        whole_predictions = torch.cat((whole_predictions, prediction))

f1_score(whole_labels.cpu(), whole_predictions.cpu())

whole labels tensor([], device='cuda:0')




0.8666116935642204

Очень странно, что на половине батчей f1 score высокий, а на половине вообще = 0. Кажется это тест странный

## Интерпретируемость

Посмотрим, куда смотрит наша модель. Достаточно запустить код ниже.

In [None]:
!pip install -q captum

[?25l[K     |▎                               | 10 kB 22.8 MB/s eta 0:00:01[K     |▌                               | 20 kB 7.3 MB/s eta 0:00:01[K     |▊                               | 30 kB 5.5 MB/s eta 0:00:01[K     |█                               | 40 kB 5.2 MB/s eta 0:00:01[K     |█▏                              | 51 kB 2.6 MB/s eta 0:00:01[K     |█▍                              | 61 kB 2.9 MB/s eta 0:00:01[K     |█▋                              | 71 kB 2.9 MB/s eta 0:00:01[K     |██                              | 81 kB 3.3 MB/s eta 0:00:01[K     |██▏                             | 92 kB 3.5 MB/s eta 0:00:01[K     |██▍                             | 102 kB 2.8 MB/s eta 0:00:01[K     |██▋                             | 112 kB 2.8 MB/s eta 0:00:01[K     |██▉                             | 122 kB 2.8 MB/s eta 0:00:01[K     |███                             | 133 kB 2.8 MB/s eta 0:00:01[K     |███▎                            | 143 kB 2.8 MB/s eta 0:00:01[K    

In [None]:
from captum.attr import LayerIntegratedGradients, TokenReferenceBase, visualization

PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)

In [None]:
def forward_with_softmax(inp):
    logits = model(inp)
    return torch.softmax(logits, 0)[0][1]

def forward_with_sigmoid(input):
    return torch.sigmoid(model(input))


# accumalate couple samples in this array for visualization purposes
vis_data_records_ig = []

def interpret_sentence(model, sentence, min_len = 7, label = 0):
    model.eval()
    text = [tok for tok in TEXT.tokenize(sentence)]
    if len(text) < min_len:
        text += ['pad'] * (min_len - len(text))
    indexed = [TEXT.vocab.stoi[t] for t in text]

    model.zero_grad()

    input_indices = torch.tensor(indexed, device=device)
    input_indices = input_indices.unsqueeze(0)
    
    # input_indices dim: [sequence_length]
    seq_length = min_len

    # predict
    pred = forward_with_sigmoid(input_indices).item()
    pred_ind = round(pred)

    # generate reference indices for each sample
    reference_indices = token_reference.generate_reference(seq_length, device=device).unsqueeze(0)

    # compute attributions and approximation delta using layer integrated gradients
    attributions_ig, delta = lig.attribute(input_indices, reference_indices, \
                                           n_steps=5000, return_convergence_delta=True)

    print('pred: ', LABEL.vocab.itos[pred_ind], '(', '%.2f'%pred, ')', ', delta: ', abs(delta))

    add_attributions_to_visualizer(attributions_ig, text, pred, pred_ind, label, delta, vis_data_records_ig)
    
def add_attributions_to_visualizer(attributions, text, pred, pred_ind, label, delta, vis_data_records):
    attributions = attributions.sum(dim=2).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    attributions = attributions.cpu().detach().numpy()

    # storing couple samples in an array for visualization purposes
    vis_data_records.append(visualization.VisualizationDataRecord(
                            attributions,
                            pred,
                            LABEL.vocab.itos[pred_ind],
                            LABEL.vocab.itos[label],
                            LABEL.vocab.itos[1],
                            attributions.sum(),       
                            text,
                            delta))

In [None]:
interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

#в последних получилось таки обмануть сеть)
interpret_sentence(model, 'I love such horrible things!', label=1)
interpret_sentence(model, 'I hate filmes about love', label=0)

pred:  pos ( 0.76 ) , delta:  tensor([2.4362e-06], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.29 ) , delta:  tensor([1.8191e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.82 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.03 ) , delta:  tensor([7.2399e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.41 ) , delta:  tensor([8.2461e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.31 ) , delta:  tensor([7.2803e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.08 ) , delta:  tensor([0.0002], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.59 ) , delta:  tensor([4.6966e-05], device='cuda:0', dtype=torch.float64)


Попробуйте добавить свои примеры!

In [None]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.76),pos,1.97,It was a fantastic performance ! pad
,,,,
pos,neg (0.29),pos,0.88,Best film ever pad pad pad pad
,,,,
pos,pos (0.82),pos,1.28,Such a great show! pad pad pad
,,,,
neg,neg (0.03),pos,-0.69,It was a horrible movie pad pad
,,,,
neg,neg (0.41),pos,0.69,I've never watched something as bad pad
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.76),pos,1.97,It was a fantastic performance ! pad
,,,,
pos,neg (0.29),pos,0.88,Best film ever pad pad pad pad
,,,,
pos,pos (0.82),pos,1.28,Such a great show! pad pad pad
,,,,
neg,neg (0.03),pos,-0.69,It was a horrible movie pad pad
,,,,
neg,neg (0.41),pos,0.69,I've never watched something as bad pad
,,,,


## Эмбеддинги слов

Вы ведь не забыли, как мы можем применить знания о word2vec и GloVe. Давайте попробуем!

In [None]:
TEXT.build_vocab(trn, vectors='glove.6B.300d')# YOUR CODE GOES HERE
# подсказка: один из импортов пока не использовался, быть может он нужен в строке выше :)
LABEL.build_vocab(trn)

word_embeddings = TEXT.vocab.vectors

kernel_sizes = [3, 4, 5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

.vector_cache/glove.6B.zip: 862MB [02:44, 5.25MB/s]                           
100%|█████████▉| 399999/400000 [00:59<00:00, 6714.54it/s]


In [None]:
train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split(random_state=random.seed(SEED))

device = "cuda" if torch.cuda.is_available() else "cpu"
train_iter, val_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 256),
        sort=False,
        sort_key= lambda x: len(x.src),
        sort_within_batch=False,
        device=device,
        repeat=False,
)



In [None]:
model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=64,
            kernel_sizes=kernel_sizes, dropout=dropout)

word_embeddings = TEXT.vocab.vectors

prev_shape = model.embedding.weight.shape

model.embedding.weight = nn.Parameter(torch.rand(prev_shape))

assert prev_shape == model.embedding.weight.shape
model.to(device)

opt = torch.optim.Adam(model.parameters())

Вы знаете, что делать.

In [None]:
import numpy as np

def train_cnnv_model(model, optimizer, criterion, max_epochs, max_grad_norm = 2):
    min_loss = np.inf

    cur_patience = 0

    for epoch in range(1, max_epochs + 1):
        model.train()
        pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        train_loss = 0.0
        for it, batch in pbar:
            optimizer.zero_grad()
            input_embeds = batch.text.to(device)
            #text_lengths = batch.text[1].to('cpu')
            labels = batch.label.to(device)
            prediction = model(input_embeds)
            prediction = prediction.squeeze()
            loss = criterion(prediction, labels)
            train_loss += loss
            loss.backward()
            if max_grad_norm is not None:
                torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()

        train_loss /= len(train_iter)
        val_loss = 0.0
        model.eval()
        pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        with torch.no_grad():
            correct = 0
            num_objs = 0
            num_iter = 0
            for it, batch in pbar:
                input_embeds = batch.text.to(device)
                labels = batch.label.to(device)
                prediction = model(input_embeds)
                prediction = prediction.squeeze()
                val_loss += criterion(prediction, labels)
                prediction[prediction <= 0] = 0
                prediction[prediction > 0] = 1
                correct += (labels == prediction).float().sum()
                num_objs += len(labels)
                num_iter += 1
            val_loss /= len(valid_iter)
            if val_loss < min_loss:
                min_loss = val_loss
                best_model = model.state_dict()
            else:
                cur_patience += 1
                if cur_patience == patience:
                    cur_patience = 0
                    break

        print('Epoch: {}, Training Loss: {}, Validation Loss: {}, Accuracy: {}'.format(epoch, train_loss, val_loss, correct/num_objs))
    model.load_state_dict(best_model)

In [None]:
train_cnnv_model(model, opt, loss_func, max_epochs)

  0%|          | 0/137 [00:00<?, ?it/s]



  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.7468419671058655, Validation Loss: 0.6167594194412231, Accuracy: 0.7338666915893555


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.6212998628616333, Validation Loss: 0.5439022779464722, Accuracy: 0.7508000135421753


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.5392245054244995, Validation Loss: 0.4568289816379547, Accuracy: 0.8037333488464355


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.4417702555656433, Validation Loss: 0.40130001306533813, Accuracy: 0.8185333609580994


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.3296484649181366, Validation Loss: 0.37175798416137695, Accuracy: 0.836400032043457


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.24175618588924408, Validation Loss: 0.35449719429016113, Accuracy: 0.8389333486557007


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.16766005754470825, Validation Loss: 0.3573201894760132, Accuracy: 0.843999981880188


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.11783715337514877, Validation Loss: 0.3603387176990509, Accuracy: 0.8473333716392517


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 9, Training Loss: 0.06948806345462799, Validation Loss: 0.38479694724082947, Accuracy: 0.8453333377838135


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: 10, Training Loss: 0.051184091717004776, Validation Loss: 0.3789317011833191, Accuracy: 0.8594666719436646


  0%|          | 0/137 [00:00<?, ?it/s]

  0%|          | 0/30 [00:00<?, ?it/s]

Наблюдается более быстрое обучение, однако окончательное качество не улучшилось, а даже немного ухудшилось

Посчитайте f1-score вашего классификатора.

**Ответ**:

In [None]:
from sklearn.metrics import f1_score
model.eval()
whole_labels = torch.empty(0).to(device)
whole_predictions = torch.empty(0).to(device)
print('whole labels', whole_labels)
with torch.no_grad():
    for it in test_iter:
        input_embeds = it.text
        labels = it.label
        whole_labels = torch.cat((whole_labels, labels))
        prediction = model(input_embeds)
        prediction = prediction.squeeze()
        prediction[prediction <= 0] = 0
        prediction[prediction > 0] = 1
        whole_predictions = torch.cat((whole_predictions, prediction))

f1_score(whole_labels.cpu(), whole_predictions.cpu())

whole labels tensor([], device='cuda:0')




0.8390187494837696

Проверим насколько все хорошо!

In [None]:
PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)
vis_data_records_ig = []

interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

#в последних получилось таки обмануть сеть)
interpret_sentence(model, 'I love such horrible things!', label=1)
interpret_sentence(model, 'I hate filmes about love', label=0)

pred:  pos ( 0.98 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.12 ) , delta:  tensor([1.0699e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.99 ) , delta:  tensor([0.0002], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([5.6483e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.12 ) , delta:  tensor([3.7584e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.15 ) , delta:  tensor([3.0729e-06], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([6.8424e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.66 ) , delta:  tensor([4.9861e-05], device='cuda:0', dtype=torch.float64)


In [None]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.98),pos,1.66,It was a fantastic performance ! pad
,,,,
pos,neg (0.12),pos,-0.16,Best film ever pad pad pad pad
,,,,
pos,pos (0.99),pos,1.51,Such a great show! pad pad pad
,,,,
neg,neg (0.00),pos,-0.94,It was a horrible movie pad pad
,,,,
neg,neg (0.12),pos,-0.05,I've never watched something as bad pad
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.98),pos,1.66,It was a fantastic performance ! pad
,,,,
pos,neg (0.12),pos,-0.16,Best film ever pad pad pad pad
,,,,
pos,pos (0.99),pos,1.51,Such a great show! pad pad pad
,,,,
neg,neg (0.00),pos,-0.94,It was a horrible movie pad pad
,,,,
neg,neg (0.12),pos,-0.05,I've never watched something as bad pad
,,,,
