<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
<h3 style="text-align: center;"><b>Phystech School of Applied Mathematics and Computer Science (FPMI), MIPT</b></h3>

---

# Assignment 3

## Text classification

In this assignment you will try several methods used for text classification. You will also check how well the model captures the meaning of words and which words in an example drive its prediction.

In [1]:
# !pip install -q torch torchvision torchtext
!pip install -q pytorch-lightning==1.2.3
!pip install -q torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchtext==0.9.0 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

In [2]:
import pytorch_lightning as pl
from pytorch_lightning.callbacks.early_stopping import EarlyStopping

In [3]:
import pandas as pd
import numpy as np
import torch
from sklearn import metrics
from torchtext.legacy import datasets

# from torchtext.data import Field, LabelField
# from torchtext.data import BucketIterator
from torchtext.legacy.data import Field, TabularDataset, BucketIterator, Iterator, LabelField

from torchtext.vocab import Vectors, GloVe

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
from tqdm.autonotebook import tqdm

In [4]:
def seed_everything(seed: int):
    import random, os
    import numpy as np
    import torch
    import pytorch_lightning as pl

    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # benchmark mode can pick nondeterministic algorithms
    pl.seed_everything(seed)
SEED = 42 
seed_everything(SEED)

Global seed set to 69


In this assignment we will use the torchtext library. It is quite easy to use and lets us focus on the task itself rather than on writing a DataLoader.
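
Roughly, a `Field` with `lower=True` lowercases and tokenizes each text (the legacy default tokenizer is whitespace splitting), `build_vocab` maps tokens to integer indices, and batching pads every sequence to the longest one in the batch. The plain-Python sketch below mimics these steps; it is a simplified illustration, not torchtext's implementation, and the helpers `build_vocab` and `numericalize_batch` are hypothetical. As far as we know, legacy torchtext also places `<unk>` at index 0 and `<pad>` at index 1, then orders tokens by descending frequency.

```python
from collections import Counter

UNK, PAD = "<unk>", "<pad>"

def tokenize(text):
    # lower=True plus whitespace splitting
    return text.lower().split()

def build_vocab(texts):
    counter = Counter(tok for t in texts for tok in tokenize(t))
    # specials first, then tokens by descending frequency (ties broken alphabetically)
    words = sorted(counter, key=lambda w: (-counter[w], w))
    itos = [UNK, PAD] + words
    return {tok: i for i, tok in enumerate(itos)}

def numericalize_batch(texts, stoi):
    # tokens -> indices (unknown words -> <unk>), then right-pad with <pad>;
    # the lengths are returned as well, like include_lengths=True.
    # Note: this is a batch-first layout; without batch_first=True torchtext
    # returns [seq_len, batch] instead.
    seqs = [[stoi.get(tok, stoi[UNK]) for tok in tokenize(t)] for t in texts]
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [s + [stoi[PAD]] * (max_len - len(s)) for s in seqs]
    return padded, lengths

stoi = build_vocab(["A great movie", "a boring movie", "Great acting"])
batch, lengths = numericalize_batch(["great movie", "a truly great movie"], stoi)
print(batch, lengths)  # [[3, 4, 1, 1], [2, 0, 3, 4]] [2, 4]
```

The word "truly" never occurs in the toy training texts, so it maps to `<unk>` (index 0), and the shorter review is padded with `<pad>` (index 1).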

In [5]:
TEXT = Field(sequential=True, lower=True, include_lengths=True)  # text field
LABEL = LabelField(dtype=torch.float)  # label field

In [6]:
# SEED = 1234

# torch.manual_seed(SEED)
# torch.backends.cudnn.deterministic = True

The dataset for our experiments is a collection of movie reviews from the IMDB website.

In [7]:
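train, test = datasets.IMDB.splits(TEXT, LABEL)  # load the dataset
random.seed(SEED)
# split off a validation set; random_state expects a random.getstate() value (random.seed returns None)
train, valid = train.split(random_state=random.getstate())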

In [8]:
TEXT.build_vocab(train)
LABEL.build_vocab(train)

In [9]:
device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, valid_iter, test_iter = BucketIterator.splits(
    (train, valid, test), 
    batch_size = 64,
    sort_within_batch = True)
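
`BucketIterator` groups examples of similar length into the same batch, which reduces the number of padding tokens. The toy comparison below illustrates the effect in plain Python; it is a simplified sketch (the real iterator also shuffles batches between epochs), and `padding_cost` is a hypothetical helper.

```python
def padding_cost(lengths, batch_size):
    # number of <pad> tokens needed when each batch is padded to its longest sequence
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total

lengths = [5, 120, 7, 118, 6, 121, 9, 115]
print(padding_cost(lengths, 2))          # 447: batches taken in the given order
print(padding_cost(sorted(lengths), 2))  # 7: batches of similar length
```

With `sort_within_batch=True`, the examples inside each batch are additionally ordered by decreasing length, which is what `pack_padded_sequence` expects by default (`enforce_sorted=True`).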

## RNN

To start, let's try recurrent neural networks. In the seminar you worked with GRU; you can also try LSTM. For classification you can use either the hidden_state or the output at the last token.
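
For `nn.LSTM` and `nn.GRU`, the returned `hidden` has shape `[num_layers * num_directions, batch, hidden_dim]`, and the PyTorch docs describe it as viewable as `[num_layers, num_directions, batch, hidden_dim]`. The row for layer `l` and direction `d` is therefore `l * num_directions + d`. This is why the baseline below takes `hidden[-2]` (top layer, forward) and `hidden[-1]` (top layer, backward). A plain-Python sketch of that index arithmetic, with hypothetical helper names:

```python
def hidden_row(layer, direction, num_directions):
    # row of h_n for (layer, direction); direction 0 = forward, 1 = backward
    return layer * num_directions + direction

def last_layer_rows(num_layers, bidirectional):
    num_directions = 2 if bidirectional else 1
    top = num_layers - 1
    return [hidden_row(top, d, num_directions) for d in range(num_directions)]

print(last_layer_rows(num_layers=2, bidirectional=True))   # [2, 3] of 4 rows: hidden[-2], hidden[-1]
print(last_layer_rows(num_layers=2, bidirectional=False))  # [1] of 2 rows: hidden[-1]
```

With packed sequences, the forward entry of `hidden` holds each sequence's state at its last real (non-pad) token. By contrast, `output[-1]` after `pad_packed_sequence` is a zero vector for every sequence shorter than the longest one in the batch.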

In [10]:
class RNNBaseline(pl.LightningModule):
    def __init__(self, 
                 vocab_size: int, 
                 embedding_dim: int, 
                 hidden_dim: int, 
                 output_dim: int, 
                 n_layers: int, 
                 if_bidirectional: bool, 
                 dropout_rate: float, 
                 pad_idx: int, 
                 threshold: float=0.5):
        
        super().__init__()
        self.batch_size = 64
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        self.rnn = nn.LSTM(embedding_dim,
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=if_bidirectional, 
                           dropout=dropout_rate)  # YOUR CODE GOES HERE 
        
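        # a bidirectional LSTM concatenates the forward and backward states
        num_directions = 2 if if_bidirectional else 1
        self.fc = nn.Linear(hidden_dim * num_directions, output_dim)  # YOUR CODE GOES HERE
        
        self.dropout = nn.Dropout(dropout_rate)  # use the constructor argument, not the global `dropout`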
        self.threshold = threshold
        self.f1_scr = pl.metrics.F1(output_dim+1, self.threshold)
        self.loss_func = F.binary_cross_entropy_with_logits
        
    def forward(self, text, text_lengths):
        
        #text = [sent len, batch size]
        
        embedded = self.dropout(self.embedding(text))
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)
        
        # cell arg for LSTM, remove for GRU
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        #unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)  

        #output = [sent len, batch size, hid dim * num directions]
        #output over padding tokens are zero tensors
        
        #hidden = [num layers * num directions, batch size, hid dim]
        #cell = [num layers * num directions, batch size, hid dim]
        
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden states
        #of the top layer and apply dropout; a unidirectional LSTM only has hidden[-1,:,:]
        if self.rnn.bidirectional:
            hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)
        else:
            hidden = hidden[-1,:,:]
        hidden = self.dropout(hidden)  # YOUR CODE GOES HERE
                
        #hidden = [batch size, hid dim * num directions]
            
        return self.fc(hidden)

    def training_step(self, batch, batch_idx):
        (sentences, sent_lengths), labels = batch
        # pack_padded_sequence expects the lengths on the CPU
        pred_labels = self(sentences, sent_lengths.cpu()).squeeze()
        loss = self.loss_func(pred_labels, labels)
        metric = self.f1_scr(pred_labels, labels)
        self.log_dict({"train_loss": loss, "f1_train": metric}, on_epoch=True, prog_bar=True, logger=True)
        return {"loss": loss, "f1_train": metric}
    
    def validation_step(self, batch, batch_idx):
        (sentences, sent_lengths), labels = batch
        pred_labels = self(sentences, sent_lengths.cpu()).squeeze()
        loss = self.loss_func(pred_labels, labels)
        metric = self.f1_scr(pred_labels, labels)
        self.log_dict({"val_loss": loss, "f1_valid": metric}, on_epoch=True, prog_bar=True, logger=True)
        return {"val_loss": loss, "f1_valid": metric}
    
    def test_step(self, batch, batch_idx):
        (sentences, sent_lengths), labels = batch
        pred_labels = self(sentences, sent_lengths.cpu()).squeeze()
        # loss = F.binary_cross_entropy_with_logits(pred_labels.squeeze(), labels)
        metric = self.f1_scr(pred_labels, labels)
        self.log_dict({"f1_test": metric})
        return {"f1_test": metric}

    def configure_optimizers(self):
        opt = optim.AdamW(self.parameters())
        return opt

Experiment with the hyperparameters.

In [11]:
vocab_size = len(TEXT.vocab)
emb_dim = 100
hidden_dim = 256
output_dim = 1
n_layers = 2
bidirectional = True
dropout = 0.2
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
patience=3

In [12]:
model = RNNBaseline(
    vocab_size=vocab_size,
    embedding_dim=emb_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    n_layers=n_layers,
    if_bidirectional=bidirectional,
    dropout_rate=dropout,
    pad_idx=PAD_IDX,
    threshold=0.5
)

In [13]:
max_epochs = 20

Train the network! Use whatever tools you like: Catalyst, PyTorch Lightning, or your own hand-rolled training loop.
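
The trainer in the next cell stops early via `EarlyStopping(monitor="f1_valid", min_delta=0.001, patience=patience, mode="max")`. Below is a simplified plain-Python sketch of that rule. It assumes Lightning's semantics: with `mode="max"`, an epoch counts as an improvement only if the metric exceeds the best value so far by more than `min_delta`, and training stops after `patience` consecutive epochs without improvement. The function `early_stop_epoch` is hypothetical.

```python
def early_stop_epoch(scores, min_delta=0.001, patience=3):
    """Return the 0-based epoch after which training stops, or None.

    Simplified sketch for mode="max": a score counts as an improvement only
    if it exceeds the best score so far by more than min_delta.
    """
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(scores):
        if score - min_delta > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

f1_history = [0.70, 0.74, 0.745, 0.7455, 0.744, 0.7452]
print(early_stop_epoch(f1_history))  # 5: epochs 3, 4, 5 improve by less than min_delta
```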

In [14]:
trainer = pl.Trainer(deterministic=True,
                     fast_dev_run=False,
                     gpus=1, 
                     max_epochs=max_epochs,
                     gradient_clip_val=2.,
                     auto_lr_find=True,
                     callbacks=[EarlyStopping(monitor="f1_valid", 
                                              min_delta=0.001, 
                                              patience=patience, 
                                              verbose=False, 
                                              mode="max")])
trainer.fit(model, train_iter, valid_iter)

GPU available: True, used: True
TPU available: None, using: 0 TPU cores

  | Name      | Type      | Params
----------------------------------------
0 | embedding | Embedding | 20.2 M
1 | rnn       | LSTM      | 2.3 M 
2 | fc        | Linear    | 513   
3 | dropout   | Dropout   | 0     
4 | f1_scr    | F1        | 0     
----------------------------------------
22.5 M    Trainable params
0         Non-trainable params
22.5 M    Total params
90.142    Total estimated model params size (MB)
1

In [15]:
# %load_ext tensorboard
# %tensorboard --logdir lightning_logs/

In [16]:
res = trainer.test(model, test_iter)

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'f1_test': 0.7547169923782349}
--------------------------------------------------------------------------------


Compute the f1-score of your classifier on the test dataset.

**Answer**: 0.754 (further tuning could give a higher score)
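
The models output logits; `F.binary_cross_entropy_with_logits` applies the sigmoid internally. To turn logits into class predictions, apply a sigmoid and a threshold (0.5 here, which is the same as checking `logit >= 0`). Binary F1 is the harmonic mean of precision and recall for the positive class. Below is a plain-Python sketch of that definition with a hypothetical helper `binary_f1` and made-up numbers. Depending on its `num_classes` and averaging settings, `pl.metrics.F1` may aggregate differently, so its reported value need not coincide with this one.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_f1(logits, labels, threshold=0.5):
    # logits -> probabilities -> hard 0/1 predictions
    preds = [1 if sigmoid(z) >= threshold else 0 for z in logits]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

logits = [2.3, -1.2, 0.4, -0.3, 1.7, -2.0]
labels = [1, 0, 0, 1, 1, 0]
print(binary_f1(logits, labels))  # precision = recall = 2/3, so F1 is about 0.667
```

For binary labels this matches the default (`average="binary"`) definition used by `sklearn.metrics.f1_score`, which is already imported at the top of the notebook.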

## CNN

![](https://www.researchgate.net/publication/333752473/figure/fig1/AS:769346934673412@1560438011375/Standard-CNN-on-text-classification.png)

Convolutional neural networks are also widely used for text classification. The idea is that sentiment is usually carried by phrases of two or three words, such as "a really good movie" or "incredibly boring". Sliding a convolution over these words produces a large score for such a phrase, and MaxPool picks it out. A regular fully connected network follows. Important: the convolutions are applied in parallel, not sequentially. Let's try it!
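
Here is a toy plain-Python sketch of convolution plus max-over-time pooling. For readability it assumes 1-dimensional word "embeddings"; real embeddings have `emb_dim` components, and each filter spans all of them. Filters of different widths are applied independently to the same input, and their pooled outputs are concatenated, which is what "in parallel" means above. The sentence scores, filters and helper names are made up.

```python
def conv1d_valid(xs, kernel, bias=0.0):
    # slide the kernel over the sequence without padding: len(xs) - len(kernel) + 1 outputs
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, xs[i:i + k])) + bias
            for i in range(len(xs) - k + 1)]

def relu(v):
    return [max(0.0, x) for x in v]

def conv_max_pool_features(xs, filters):
    # every filter sees the same input independently ("in parallel"),
    # then its activations are max-pooled over time; the results are concatenated
    return [max(relu(conv1d_valid(xs, kernel))) for kernel in filters]

# hypothetical 1-d scores for "the movie was not good at all"
sentence = [0.0, 0.1, 0.0, -1.0, 1.0, 0.0, 0.0]
filters = [[1.0, 1.0],          # bigram filter: fires on consecutive positive words
           [-1.0, 1.0, 0.0]]    # trigram filter: fires on a negative word followed by a positive one ("not good")
print(conv_max_pool_features(sentence, filters))  # [1.0, 2.0]
```

Max pooling keeps only the strongest response of each filter, wherever it occurs in the sentence, so the feature vector has a fixed size regardless of the review length.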

In [17]:
TEXT = Field(sequential=True, lower=True, batch_first=True)  # batch_first since we use convolutions
LABEL = LabelField(batch_first=True, dtype=torch.float)

train, tst = datasets.IMDB.splits(TEXT, LABEL)
random.seed(SEED)
trn, vld = train.split(random_state=random.getstate())

TEXT.build_vocab(trn)
LABEL.build_vocab(trn)

# device = "cuda" if torch.cuda.is_available() else "cpu"

In [18]:
# train_iter_cnn, valid_iter_cnn, test_iter_cnn = BucketIterator.splits(
#         (trn, vld, tst),
#         batch_sizes=(128, 256, 256),
#         sort=False,
#         sort_key=lambda x: len(x.src),
#         sort_within_batch=False,
#         repeat=False,
# )
train_iter_cnn, valid_iter_cnn, test_iter_cnn = BucketIterator.splits(
    (trn, vld, tst), 
    batch_sizes=(128, 256, 256),
    sort_within_batch = False)

You can use Conv2d with `in_channels=1, kernel_size=(kernel_sizes[0], emb_dim)` or Conv1d with `in_channels=emb_dim, kernel_size=kernel_sizes[0]`. In either case, think carefully about the tensor shapes.
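
A plain-Python sketch of the shape bookkeeping for both options follows. It relies on PyTorch's conventions: `Conv1d` expects `[batch, channels, length]`, `Conv2d` expects `[batch, channels, height, width]`, and with stride 1 and no padding the output length is `L - k + 1`. The function names are hypothetical, and the example sizes (batch 128, 50 tokens, 300-d embeddings, 64 channels) are illustrative. The `Conv1d` variant corresponds to the `permute(0, 2, 1)` in `CNN.forward` below.

```python
def conv1d_shapes(batch, seq_len, emb_dim, out_channels, kernel_size):
    # embedding output is [batch, seq_len, emb_dim] (batch_first=True);
    # Conv1d treats emb_dim as channels, so permute to [batch, emb_dim, seq_len]
    conv_in = (batch, emb_dim, seq_len)
    conv_out = (batch, out_channels, seq_len - kernel_size + 1)
    pooled = (batch, out_channels)  # max over time, then squeeze(2)
    return conv_in, conv_out, pooled

def conv2d_shapes(batch, seq_len, emb_dim, out_channels, kernel_size):
    # Conv2d with in_channels=1 and kernel (kernel_size, emb_dim): unsqueeze(1)
    # gives [batch, 1, seq_len, emb_dim]; the kernel covers the full embedding
    # width, so the last output dimension is 1
    conv_in = (batch, 1, seq_len, emb_dim)
    conv_out = (batch, out_channels, seq_len - kernel_size + 1, 1)
    pooled = (batch, out_channels)  # squeeze(3), max over time, squeeze(2)
    return conv_in, conv_out, pooled

kernel_sizes, out_channels = [2, 3, 4, 5], 64
for k in kernel_sizes:
    print(k, conv1d_shapes(128, 50, 300, out_channels, k)[1], conv2d_shapes(128, 50, 300, out_channels, k)[1])

# concatenating the pooled outputs of all kernel sizes gives the fc input size
print(len(kernel_sizes) * out_channels)  # 256
```

The result 256 is consistent with the 257 parameters (256 weights + 1 bias) reported for `fc` in the training log. Note also that an input shorter than the widest kernel yields no output positions, which is why `interpret_sentence` below pads short sentences to `min_len`.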

In [19]:
class CNN(pl.LightningModule):
    def __init__(
        self,
        vocab_size,
        emb_dim,
        out_channels,
        kernel_sizes,
        pad_idx,
        dropout=0.5,
        threshold=0.5
    ):
        super().__init__()
        self.vocab_size = vocab_size
        self.emb_dim = emb_dim
        self.out_channels = out_channels
        self.kernel_sizes = kernel_sizes
        self.pad_idx = pad_idx
        self.dropout = dropout
        self.threshold = threshold
        self.f1_scr = pl.metrics.F1(2, self.threshold)
        self.loss_func = F.binary_cross_entropy_with_logits

        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=self.pad_idx)
        self.conv_layers = nn.ModuleList([nn.Conv1d(in_channels=self.emb_dim, 
                                              out_channels=self.out_channels,
                                              kernel_size=krnl) for krnl in self.kernel_sizes])
        
        self.fc = nn.Linear(len(kernel_sizes) * out_channels, 1)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        
        embedded = self.embedding(text).permute(0, 2, 1)
        
        conved = [F.relu(conv(embedded)) for conv in self.conv_layers]
        pooled = [F.max_pool1d(cnvd, cnvd.shape[2]).squeeze(2) for cnvd in conved]
        
        cat = self.dropout(torch.cat((pooled), dim=1))
            
        return self.fc(cat)

    def training_step(self, batch, batch_idx):
        sentences, labels = batch
        pred_labels = self(sentences).squeeze()
        loss = self.loss_func(pred_labels, labels)
        metric = self.f1_scr(pred_labels, labels)
        self.log_dict({"train_loss": loss, "f1_train": metric}, on_epoch=True, prog_bar=True, logger=True)
        return {"loss": loss, "f1_train": metric}
    
    def validation_step(self, batch, batch_idx):
        sentences, labels = batch
        pred_labels = self(sentences).squeeze()
        loss = self.loss_func(pred_labels, labels)
        metric = self.f1_scr(pred_labels, labels)
        self.log_dict({"val_loss": loss, "f1_valid": metric}, on_epoch=True, prog_bar=True, logger=True)
        return {"val_loss": loss, "f1_valid": metric}
    
    def test_step(self, batch, batch_idx):
        sentences, labels = batch
        pred_labels = self(sentences).squeeze()
        metric = self.f1_scr(pred_labels, labels)
        self.log_dict({"f1_test": metric})
        return {"f1_test": metric}

    def configure_optimizers(self):
        opt = optim.AdamW(self.parameters())
        return opt

In [20]:
kernel_sizes = [2, 3, 4, 5]
vocab_size = len(TEXT.vocab)
out_channels = 64
dropout = 0.5
dim = 300
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]  # recompute for the rebuilt vocabulary

model_cnn = CNN(vocab_size=vocab_size, 
                emb_dim=dim, 
                out_channels=out_channels,
                kernel_sizes=kernel_sizes, 
                dropout=dropout, 
                pad_idx=PAD_IDX)

In [21]:
# opt = torch.optim.Adam(model.parameters())
# loss_func = nn.BCEWithLogitsLoss()

In [22]:
max_epochs = 30

Train it!

In [23]:
trainer_cnn = pl.Trainer(deterministic=True,
                         fast_dev_run=False,
                         gpus=1,
                         max_epochs=max_epochs,
                         gradient_clip_val=2.,
                         auto_lr_find=True,
                         callbacks=[EarlyStopping(monitor="f1_valid",
                                                  min_delta=0.001,
                                                  patience=patience,
                                                  verbose=False,
                                                  mode="max")])

GPU available: True, used: True
TPU available: None, using: 0 TPU cores


In [24]:
trainer_cnn.fit(model_cnn, train_iter_cnn, valid_iter_cnn)


  | Name        | Type       | Params
-------------------------------------------
0 | f1_scr      | F1         | 0     
1 | embedding   | Embedding  | 60.7 M
2 | conv_layers | ModuleList | 269 K 
3 | fc          | Linear     | 257   
4 | dropout     | Dropout    | 0     
-------------------------------------------
60.9 M    Trainable params
0         Non-trainable params
60.9 M    Total params
243.776   Total estimated model params size (MB)
1

In [25]:
res_cnn = trainer_cnn.test(model_cnn, test_iter_cnn)

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'f1_test': 0.8098158836364746}
--------------------------------------------------------------------------------


Compute the f1-score of your classifier.

**Answer**: 0.81

## Interpretability

Let's see which words our model focuses on. Just run the code below.

In [26]:
!pip install -q captum

In [27]:
from captum.attr import LayerIntegratedGradients, TokenReferenceBase, visualization

PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model_cnn, model_cnn.embedding)

In [28]:
def forward_with_sigmoid(input, model):
    return torch.sigmoid(model(input))


# accumulate a couple of samples in this array for visualization purposes
vis_data_records_ig = []

def interpret_sentence(model, sentence, min_len = 7, label = 0):
    model.eval()
    text = [tok for tok in TEXT.tokenize(sentence)]
    if len(text) < min_len:
        text += ['pad'] * (min_len - len(text))
    indexed = [TEXT.vocab.stoi[t] for t in text]

    model.zero_grad()

    input_indices = torch.tensor(indexed, device=device)
    input_indices = input_indices.unsqueeze(0)
    
    # input_indices dim: [1, sequence_length]
    seq_length = len(text)  # the reference must match the (possibly padded) input length

    # predict
    pred = forward_with_sigmoid(input_indices, model).item()
    pred_ind = round(pred)

    # generate reference indices for each sample
    reference_indices = token_reference.generate_reference(seq_length, device=device).unsqueeze(0)

    # compute attributions and approximation delta using layer integrated gradients
    attributions_ig, delta = lig.attribute(input_indices, reference_indices, \
                                           n_steps=5000, return_convergence_delta=True)

    print('pred: ', LABEL.vocab.itos[pred_ind], '(', '%.2f'%pred, ')', ', delta: ', abs(delta))

    add_attributions_to_visualizer(attributions_ig, text, pred, pred_ind, label, delta, vis_data_records_ig)
    
def add_attributions_to_visualizer(attributions, text, pred, pred_ind, label, delta, vis_data_records):
    attributions = attributions.sum(dim=2).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    attributions = attributions.cpu().detach().numpy()

    # storing couple samples in an array for visualization purposes
    vis_data_records.append(visualization.VisualizationDataRecord(
                            attributions,
                            pred,
                            LABEL.vocab.itos[pred_ind],
                            LABEL.vocab.itos[label],
                            LABEL.vocab.itos[1],
                            attributions.sum(),       
                            text,
                            delta))

In [29]:
interpret_sentence(model_cnn, 'It was a fantastic performance !', label=1)
interpret_sentence(model_cnn, 'Best film ever', label=1)
interpret_sentence(model_cnn, 'Such a great show!', label=1)
interpret_sentence(model_cnn, 'It was a horrible movie', label=0)
interpret_sentence(model_cnn, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model_cnn, 'It is a disgusting movie!', label=0)

pred:  pos ( 1.00 ) , delta:  tensor([5.4734e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.94 ) , delta:  tensor([8.9192e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.99 ) , delta:  tensor([7.5855e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.01 ) , delta:  tensor([3.4876e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.13 ) , delta:  tensor([5.1994e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.73 ) , delta:  tensor([2.0472e-05], device='cuda:0', dtype=torch.float64)
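
`LayerIntegratedGradients` attributes the prediction to the embedding layer by integrating gradients along a straight path from the reference (the pad tokens) to the actual input. By the completeness property, the attributions should sum to `F(input) - F(reference)`, and the `delta` printed above is the deviation from that identity. Here `lig` wraps `model_cnn` directly, so `F` is the logit rather than the sigmoid probability. The plain-Python sketch below shows the idea on a toy function `f(x) = sigmoid(w · x)` rather than on the CNN. It uses a midpoint Riemann sum (Captum defaults to Gauss-Legendre quadrature), and `w`, `x`, and the helper names are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(x, w):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(x, w):
    # d/dx_i sigmoid(w . x) = s * (1 - s) * w_i
    s = f(x, w)
    return [s * (1 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, n_steps=50):
    # midpoint Riemann sum of the gradient along baseline -> x, scaled by (x - baseline)
    total = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad_f(point, w))]
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, total)]

w = [2.0, -1.0, 0.5]          # hypothetical weights
x = [1.0, 0.5, -2.0]          # hypothetical input
baseline = [0.0, 0.0, 0.0]    # reference input, playing the role of the pad tokens
attr = integrated_gradients(x, baseline, w)
delta = sum(attr) - (f(x, w) - f(baseline, w))  # convergence delta
print([round(a, 4) for a in attr], abs(delta))
```

More integration steps shrink `delta`, which is why the notebook calls `lig.attribute` with a large `n_steps`.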


Try adding your own examples!

In [30]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| pos | pos (1.00) | pos | 1.1 | It was a fantastic performance ! pad |
| pos | pos (0.94) | pos | 1.51 | Best film ever pad pad pad pad |
| pos | pos (0.99) | pos | 0.99 | Such a great show! pad pad pad |
| neg | neg (0.01) | pos | -1.11 | It was a horrible movie pad pad |
| neg | neg (0.13) | pos | -1.29 | I've never watched something as bad pad |




## Word embeddings

You haven't forgotten how to apply what we know about word2vec and GloVe, have you? Let's try it!
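
Loading pretrained vectors amounts to building a `[vocab_size, emb_dim]` matrix whose row `i` holds the pretrained vector for the word with index `i`. Words missing from GloVe get a fallback vector; legacy torchtext's `unk_init` initializes these to zeros by default. The plain-Python sketch below uses a tiny made-up "GloVe" dictionary, and `build_embedding_matrix` is a hypothetical helper.

```python
def build_embedding_matrix(itos, pretrained, dim):
    # row i = pretrained vector for itos[i]; words without a vector (and specials) get zeros
    matrix = []
    hits = 0
    for word in itos:
        vec = pretrained.get(word)
        if vec is None:
            matrix.append([0.0] * dim)
        else:
            matrix.append(list(vec))
            hits += 1
    return matrix, hits

itos = ["<unk>", "<pad>", "movie", "great", "meh-ish"]
toy_glove = {"movie": [0.1, 0.2, 0.3], "great": [0.5, -0.1, 0.2]}
matrix, hits = build_embedding_matrix(itos, toy_glove, dim=3)
print(len(matrix), len(matrix[0]), hits)  # 5 3 2
```

The lookup is case-sensitive, and the GloVe 6B vectors are lowercase, which matches `lower=True` in the `TEXT` field. In PyTorch the resulting matrix is copied into the embedding layer, either by assigning `embedding.weight` as this notebook does or via `nn.Embedding.from_pretrained(vectors, freeze=False, padding_idx=...)`.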

In [31]:
kernel_sizes = [2, 3, 4, 5]
dropout = 0.2
dim = 300

TEXT.build_vocab(trn, vectors=GloVe(name="6B", dim=dim))  # YOUR CODE GOES HERE
# hint: one of the imports hasn't been used yet; maybe it's needed in the line above :)
LABEL.build_vocab(trn)

vocab_size = len(TEXT.vocab)
word_embeddings = TEXT.vocab.vectors  # [vocab_size, dim] matrix of GloVe vectors

100%|█████████▉| 399999/400000 [00:49<00:00, 8115.00it/s] 


In [32]:
train_iter, val_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 256),
        sort_within_batch=False,
)

In [33]:
model_cnn2 = CNN(vocab_size=vocab_size,
                 emb_dim=dim,
                 out_channels=64,
                 kernel_sizes=kernel_sizes,
                 dropout=dropout,
                 pad_idx=PAD_IDX)

prev_shape = model_cnn2.embedding.weight.shape

model_cnn2.embedding.weight = torch.nn.Parameter(word_embeddings)  # initialize the embeddings with GloVe

assert prev_shape == model_cnn2.embedding.weight.shape

# opt = torch.optim.Adam(model.parameters())

You know what to do.

In [34]:
trainer_cnn2 = pl.Trainer(deterministic=True,
                          fast_dev_run=False,
                          gpus=1,
                          max_epochs=max_epochs,
                          gradient_clip_val=2.,
                          auto_lr_find=True,
                          callbacks=[EarlyStopping(monitor="f1_valid",
                                                   min_delta=0.001,
                                                   patience=patience,
                                                   verbose=False,
                                                   mode="max")])

GPU available: True, used: True
TPU available: None, using: 0 TPU cores


In [35]:
trainer_cnn2.fit(model_cnn2, train_iter, val_iter)


  | Name        | Type       | Params
-------------------------------------------
0 | f1_scr      | F1         | 0     
1 | embedding   | Embedding  | 60.7 M
2 | conv_layers | ModuleList | 269 K 
3 | fc          | Linear     | 257   
4 | dropout     | Dropout    | 0     
-------------------------------------------
60.9 M    Trainable params
0         Non-trainable params
60.9 M    Total params
243.776   Total estimated model params size (MB)
1

In [36]:
res_cnn2 = trainer_cnn2.test(model_cnn2, test_iter)

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'f1_test': 0.8999999165534973}
--------------------------------------------------------------------------------


Compute the f1-score of your classifier.

**Answer**: 0.90

Let's check how well it does!

In [37]:
PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model_cnn2, model_cnn2.embedding)
vis_data_records_ig = []

interpret_sentence(model_cnn2, 'It was a fantastic performance !', label=1)
interpret_sentence(model_cnn2, 'Best film ever', label=1)
interpret_sentence(model_cnn2, 'Such a great show!', label=1)
interpret_sentence(model_cnn2, 'It was a horrible movie', label=0)
interpret_sentence(model_cnn2, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model_cnn2, 'It is a disgusting movie!', label=0)

pred:  pos ( 0.99 ) , delta:  tensor([8.7031e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.09 ) , delta:  tensor([4.6753e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.90 ) , delta:  tensor([4.4859e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([8.8414e-06], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.10 ) , delta:  tensor([1.5882e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([0.0002], device='cuda:0', dtype=torch.float64)


In [38]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| pos | pos (0.99) | pos | 1.88 | It was a fantastic performance ! pad |
| pos | neg (0.09) | pos | 0.46 | Best film ever pad pad pad pad |
| pos | pos (0.90) | pos | 1.69 | Such a great show! pad pad pad |
| neg | neg (0.00) | pos | -0.86 | It was a horrible movie pad pad |
| neg | neg (0.10) | pos | 0.24 | I've never watched something as bad pad |




The experiments lead to the following conclusions:

1) Pretrained embeddings improve model quality: with GloVe initialization the CNN's test F1 rose from 0.81 to 0.90.

2) The LSTM and the convolutional network reach comparable quality (test F1 of 0.754 and 0.81 in these runs).

3) Integrated Gradients shows which words the trained model relies on most.

4) PyTorch Lightning is a convenient library for training :)


