<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
<h3 style="text-align: center;"><b>Физтех-Школа Прикладной математики и информатики (ФПМИ) МФТИ</b></h3>

---

# Задание 3

## Классификация текстов

В этом задании вам предстоит попробовать несколько методов, используемых в задаче классификации, а также понять насколько хорошо модель понимает смысл слов и какие слова в примере влияют на результат.

In [None]:
!pip uninstall -y torchtext
!pip uninstall -y torch
!pip install torch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0
!pip install torchtext==0.8.0

Found existing installation: torchtext 0.12.0
Uninstalling torchtext-0.12.0:
  Successfully uninstalled torchtext-0.12.0
Found existing installation: torch 1.11.0+cu113
Uninstalling torch-1.11.0+cu113:
  Successfully uninstalled torch-1.11.0+cu113
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==1.7.0
  Downloading torch-1.7.0-cp37-cp37m-manylinux1_x86_64.whl (776.7 MB)
[K     |████████████████████████████████| 776.7 MB 4.7 kB/s 
[?25hCollecting torchvision==0.8.0
  Downloading torchvision-0.8.0-cp37-cp37m-manylinux1_x86_64.whl (11.8 MB)
[K     |████████████████████████████████| 11.8 MB 36.2 MB/s 
[?25hCollecting torchaudio==0.7.0
  Downloading torchaudio-0.7.0-cp37-cp37m-manylinux1_x86_64.whl (7.6 MB)
[K     |████████████████████████████████| 7.6 MB 28.8 MB/s 
Collecting dataclasses
  Downloading dataclasses-0.6-py3-none-any.whl (14 kB)
Installing collected packages: dataclasses, torch, torchvision, torchaudio
  

In [None]:
import pandas as pd
import numpy as np
import torch

from torchtext import datasets

from torchtext.data import Field, LabelField
from torchtext.data import BucketIterator

from torchtext.vocab import Vectors, GloVe

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
from tqdm.autonotebook import tqdm

В этом задании мы будем использовать библиотеку torchtext. Она довольна проста в использовании и поможет нам сконцентрироваться на задаче, а не на написании Dataloader-а.

In [None]:
TEXT = Field(sequential=True, lower=True, include_lengths=True)  # Поле текста
LABEL = LabelField(dtype=torch.float)  # Поле метки



In [None]:
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

Датасет на котором мы будем проводить эксперементы это комментарии к фильмам из сайта IMDB.

In [None]:
train, test = datasets.IMDB.splits(TEXT, LABEL)  # загрузим датасет
valid, test = test.split(random_state=random.seed(SEED), 
                         stratified=True)  # разобьем на части



In [None]:
TEXT.build_vocab(train, min_freq=5)
LABEL.build_vocab(train, min_freq=5)

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, valid_iter, test_iter = BucketIterator.splits(
    (train, valid, test), 
    batch_size = 64,
    sort_within_batch = True,
    device = device)



## RNN

Для начала попробуем использовать рекурентные нейронные сети. На семинаре вы познакомились с GRU, вы можете также попробовать LSTM. Можно использовать для классификации как hidden_state, так и output последнего токена.

In [None]:
class RNNBaseline(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        
        self.rnn = nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=bidirectional)
        
        self.flatten = nn.Flatten()
        
        directions_count = 2 if bidirectional else 1
        
        self.fc = nn.Linear(directions_count*hidden_dim, output_dim)

        self.dropout = nn.Dropout(p=dropout)
        
    def forward(self, text, text_lengths):
        #text = [sent len, batch size]
        
        embedded = self.embedding(text)
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)
        
        # cell arg for LSTM, remove for GRU
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        #unpack sequence
        #output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)  

        #output = [sent len, batch size, hid dim * num directions]
        #output over padding tokens are zero tensors
        
        #hidden = [num layers * num directions, batch size, hid dim]
        #cell = [num layers * num directions, batch size, hid dim]
        
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
        #and apply dropout
        
        if self.rnn.bidirectional:
            hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=-1)
        else:
            hidden = hidden[-1, :, :]

        #hidden = [batch size, hid dim * num directions]
            
        return self.fc(self.dropout(hidden))

Поиграйтесь с гиперпараметрами

In [None]:
vocab_size = len(TEXT.vocab)
emb_dim = 128
hidden_dim = 128
output_dim = 1
n_layers = 1
bidirectional = True
dropout = 0.75
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
patience = 5

In [None]:
model = RNNBaseline(
    vocab_size=vocab_size,
    embedding_dim=emb_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    n_layers=n_layers,
    bidirectional=bidirectional,
    dropout=dropout,
    pad_idx=PAD_IDX
)

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = model.to(device)
print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 6,172,801 trainable parameters


In [None]:
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

max_epochs = 20

Обучите сетку! Используйте любые вам удобные инструменты, Catalyst, PyTorch Lightning или свои велосипеды.

In [None]:
import numpy as np

min_loss = np.inf

cur_patience = 0

for epoch in range(1, max_epochs + 1):
    train_loss = 0.0
    model.train()
    pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")

    for it, batch in pbar:
        opt.zero_grad()
        (text_indices, text_lengths), labels = batch
        text_lengths = text_lengths.cpu()

        preds = model.forward(text_indices, text_lengths)
        loss = loss_func(preds, labels.unsqueeze(1))
        loss.backward()
        opt.step()

        train_loss += loss.item()

    train_loss /= len(train_iter)
    val_loss = 0.0
    model.eval()
    pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")

    for it, batch in pbar:
        (text_indices, text_lengths), labels = batch
        text_lengths = text_lengths.cpu()

        preds = model.forward(text_indices, text_lengths)

        with torch.no_grad():
            loss = loss_func(preds, labels.unsqueeze(1))
            val_loss += loss.item()

    val_loss /= len(valid_iter)
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, 
                                                                     round(train_loss, 4), 
                                                                     round(val_loss, 4)))
    
    if val_loss < min_loss:
        min_loss = val_loss
        best_model = model.state_dict()
    else:
        cur_patience += 1
        if cur_patience == patience:
            cur_patience = 0
            break

model.load_state_dict(best_model)

  0%|          | 0/391 [00:00<?, ?it/s]



  0%|          | 0/274 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.0171, Validation Loss: 0.6406


  0%|          | 0/391 [00:00<?, ?it/s]

  0%|          | 0/274 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.0159, Validation Loss: 0.6406


  0%|          | 0/391 [00:00<?, ?it/s]

  0%|          | 0/274 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.0168, Validation Loss: 0.6406


  0%|          | 0/391 [00:00<?, ?it/s]

KeyboardInterrupt: ignored

Посчитайте f1-score вашего классификатора на тестовом датасете.

**Ответ**: 0.57

In [None]:
TP, TN, FP, FN = 0, 0, 0, 0

model.eval()

for batch in test_iter:
    (text_indices, text_lengths), labels = batch
    text_lengths = text_lengths.cpu()
    labels = labels.cpu().numpy()

    preds = model.forward(text_indices, text_lengths).detach().cpu().numpy()
    predicted_labels = (preds > 0)

    TP += np.sum(np.logical_and(predicted_labels == labels, labels == 1))
    TN += np.sum(np.logical_and(predicted_labels == labels, labels == 0))
    FP += np.sum(np.logical_and(predicted_labels != labels, labels == 0))
    FN += np.sum(np.logical_and(predicted_labels != labels, labels == 1))

PR = TP / (TP + FP)
RC = TP / (TP + FN)

f1_score = 2 * PR * RC / (PR + RC)
print(f1_score)



0.5743824809443638


## CNN

![](https://www.researchgate.net/publication/333752473/figure/fig1/AS:769346934673412@1560438011375/Standard-CNN-on-text-classification.png)

Для классификации текстов также часто используют сверточные нейронные сети. Идея в том, что как правило сентимент содержат словосочетания из двух-трех слов, например "очень хороший фильм" или "невероятная скука". Проходясь сверткой по этим словам мы получим какой-то большой скор и выхватим его с помощью MaxPool. Далее идет обычная полносвязная сетка. Важный момент: свертки применяются не последовательно, а параллельно. Давайте попробуем!

In [None]:
TEXT = Field(sequential=True, lower=True, batch_first=True) # batch_first тк мы используем conv
LABEL = LabelField(batch_first=True, dtype=torch.float)

trn, test = datasets.IMDB.splits(TEXT, LABEL)
vld, tst = test.split(random_state=random.seed(SEED), 
                      stratified=True)

TEXT.build_vocab(trn, min_freq=5)
LABEL.build_vocab(trn, min_freq=5)

device = "cuda" if torch.cuda.is_available() else "cpu"



downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:09<00:00, 9.15MB/s]


In [None]:
train_iter, val_iter, test_iter = BucketIterator.splits(
    (trn, vld, tst),
    batch_sizes=(128, 128, 128),
    sort=False,
    sort_key= lambda x: len(x.src),
    sort_within_batch=False,
    device=device,
    repeat=False,
)



Вы можете использовать Conv2d с `in_channels=1, kernel_size=(kernel_sizes[0], emb_dim))` или Conv1d c `in_channels=emb_dim, kernel_size=kernel_size[0]`. Но хорошенько подумайте над shape в обоих случаях.

In [None]:
class CNN(nn.Module):
    def __init__(
        self,
        vocab_size,
        emb_dim,
        out_channels,
        kernel_sizes,
        paddings,
        dropout=0.5,
    ):
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, emb_dim)

        assert len(kernel_sizes) == len(paddings)

        for i, (kernel_size, padding) in enumerate(zip(kernel_sizes, paddings)):
            setattr(self, f"conv_{i}", nn.Conv1d(in_channels=emb_dim, 
                                                 out_channels=out_channels, 
                                                 kernel_size=kernel_size, 
                                                 padding=padding))
            
        self.kernels_count = len(kernel_sizes)

        self.fc = nn.Linear(len(kernel_sizes)*out_channels, 1)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        embedded = self.embedding(text)
        
        embedded = embedded.permute(0, 2, 1)

        pooled_tensors = []
        
        for i in range(self.kernels_count):
            conv_layer = getattr(self, f"conv_{i}")
            conved = F.relu(conv_layer(embedded))

            pooled = F.max_pool1d(conved, conved.shape[2]).squeeze(2)
            pooled_tensors.append(pooled)

        cat = self.dropout(torch.cat(pooled_tensors, dim=1))
            
        return self.fc(cat)

In [None]:
kernel_sizes = [2, 3, 4, 5]
paddings = [1, 1, 1, 1]
vocab_size = len(TEXT.vocab)
out_channels=64
dropout = 0.5
dim = 300
patience = 5

model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=out_channels,
            kernel_sizes=kernel_sizes, paddings=paddings, dropout=dropout)

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model.to(device)
print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 14,117,013 trainable parameters


In [None]:
opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

In [None]:
max_epochs = 30

Обучите!

In [None]:
import numpy as np


min_loss = np.inf

cur_patience = 0

for epoch in range(1, max_epochs + 1):
    train_loss = 0.0
    model.train()
    pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")

    for it, batch in pbar:
        opt.zero_grad()
        text_indices, labels = batch

        preds = model.forward(text_indices)
        loss = loss_func(preds, labels.unsqueeze(1))
        loss.backward()
        opt.step()

        train_loss += loss.item()

    train_loss /= len(train_iter)
    val_loss = 0.0
    model.eval()
    pbar = tqdm(enumerate(val_iter), total=len(val_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")

    for it, batch in pbar:
        text_indices, labels = batch

        preds = model.forward(text_indices)

        with torch.no_grad():
            loss = loss_func(preds, labels.unsqueeze(1))
            val_loss += loss.item()

    val_loss /= len(val_iter)
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, 
                                                                     round(train_loss, 4), 
                                                                     round(val_loss, 4)))
    
    if val_loss < min_loss:
        min_loss = val_loss
        best_model = model.state_dict()
    else:
        cur_patience += 1
        if cur_patience == patience:
            cur_patience = 0
            break

model.load_state_dict(best_model)

  0%|          | 0/196 [00:00<?, ?it/s]



  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.4596, Validation Loss: 0.3797


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.4048, Validation Loss: 0.3453


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.3378, Validation Loss: 0.3321


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.2778, Validation Loss: 0.3023


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.2231, Validation Loss: 0.298


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.1586, Validation Loss: 0.3103


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.1122, Validation Loss: 0.3334


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.079, Validation Loss: 0.3613


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 9, Training Loss: 0.0601, Validation Loss: 0.39


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 10, Training Loss: 0.0409, Validation Loss: 0.4179


<All keys matched successfully>

Посчитайте f1-score вашего классификатора.

**Ответ**: 0.873

In [None]:
TP, TN, FP, FN = 0, 0, 0, 0

model.eval()

for batch in test_iter:
    text_indices, labels = batch
    labels = labels.cpu().numpy()

    preds = model.forward(text_indices).detach().cpu().numpy()
    predicted_labels = (preds > 0)

    TP += np.sum(np.logical_and(predicted_labels == labels, labels == 1))
    TN += np.sum(np.logical_and(predicted_labels == labels, labels == 0))
    FP += np.sum(np.logical_and(predicted_labels != labels, labels == 0))
    FN += np.sum(np.logical_and(predicted_labels != labels, labels == 1))

PR = TP / (TP + FP)
RC = TP / (TP + FN)

f1_score = 2 * PR * RC / (PR + RC)
print(f1_score)



0.8731342030943798


## Интерпретируемость

Посмотрим, куда смотрит наша модель. Достаточно запустить код ниже.

In [None]:
!pip install -q captum

[?25l[K     |▎                               | 10 kB 26.5 MB/s eta 0:00:01[K     |▌                               | 20 kB 31.0 MB/s eta 0:00:01[K     |▊                               | 30 kB 35.7 MB/s eta 0:00:01[K     |█                               | 40 kB 38.9 MB/s eta 0:00:01[K     |█▏                              | 51 kB 23.4 MB/s eta 0:00:01[K     |█▍                              | 61 kB 25.9 MB/s eta 0:00:01[K     |█▋                              | 71 kB 27.9 MB/s eta 0:00:01[K     |█▉                              | 81 kB 27.7 MB/s eta 0:00:01[K     |██                              | 92 kB 29.1 MB/s eta 0:00:01[K     |██▎                             | 102 kB 30.7 MB/s eta 0:00:01[K     |██▌                             | 112 kB 30.7 MB/s eta 0:00:01[K     |██▊                             | 122 kB 30.7 MB/s eta 0:00:01[K     |███                             | 133 kB 30.7 MB/s eta 0:00:01[K     |███▏                            | 143 kB 30.7 MB/s eta 0:

In [None]:
from captum.attr import LayerIntegratedGradients, TokenReferenceBase, visualization

PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)

In [None]:
def forward_with_softmax(inp):
    logits = model(inp)
    return torch.softmax(logits, 0)[0][1]

def forward_with_sigmoid(input):
    return torch.sigmoid(model(input))


# accumalate couple samples in this array for visualization purposes
vis_data_records_ig = []

def interpret_sentence(model, sentence, min_len = 7, label = 0):
    model.eval()
    text = [tok for tok in TEXT.tokenize(sentence)]
    if len(text) < min_len:
        text += ['pad'] * (min_len - len(text))
    indexed = [TEXT.vocab.stoi[t] for t in text]

    model.zero_grad()

    input_indices = torch.tensor(indexed, device=device)
    input_indices = input_indices.unsqueeze(0)
    
    # input_indices dim: [sequence_length]
    seq_length = min_len

    # predict
    pred = forward_with_sigmoid(input_indices).item()
    pred_ind = round(pred)

    # generate reference indices for each sample
    reference_indices = token_reference.generate_reference(seq_length, device=device).unsqueeze(0)

    # compute attributions and approximation delta using layer integrated gradients
    attributions_ig, delta = lig.attribute(input_indices, reference_indices, \
                                           n_steps=5000, return_convergence_delta=True)

    print('pred: ', LABEL.vocab.itos[pred_ind], '(', '%.2f'%pred, ')', ', delta: ', abs(delta))

    add_attributions_to_visualizer(attributions_ig, text, pred, pred_ind, label, delta, vis_data_records_ig)
    
def add_attributions_to_visualizer(attributions, text, pred, pred_ind, label, delta, vis_data_records):
    attributions = attributions.sum(dim=2).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    attributions = attributions.cpu().detach().numpy()

    # storing couple samples in an array for visualization purposes
    vis_data_records.append(visualization.VisualizationDataRecord(
                            attributions,
                            pred,
                            LABEL.vocab.itos[pred_ind],
                            LABEL.vocab.itos[label],
                            LABEL.vocab.itos[1],
                            attributions.sum(),       
                            text,
                            delta))

In [None]:
interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

pred:  pos ( 0.99 ) , delta:  tensor([0.0003], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.59 ) , delta:  tensor([4.0488e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.97 ) , delta:  tensor([2.2312e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.05 ) , delta:  tensor([4.1763e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.02 ) , delta:  tensor([0.0003], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.43 ) , delta:  tensor([4.5760e-05], device='cuda:0', dtype=torch.float64)


Попробуйте добавить свои примеры!

In [None]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.31,It was a fantastic performance ! pad
,,,,
pos,pos (0.59),pos,0.75,Best film ever pad pad pad pad
,,,,
pos,pos (0.97),pos,1.01,Such a great show! pad pad pad
,,,,
neg,neg (0.05),pos,-1.2,It was a horrible movie pad pad
,,,,
neg,neg (0.02),pos,-0.91,I've never watched something as bad pad
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.31,It was a fantastic performance ! pad
,,,,
pos,pos (0.59),pos,0.75,Best film ever pad pad pad pad
,,,,
pos,pos (0.97),pos,1.01,Such a great show! pad pad pad
,,,,
neg,neg (0.05),pos,-1.2,It was a horrible movie pad pad
,,,,
neg,neg (0.02),pos,-0.91,I've never watched something as bad pad
,,,,


## Эмбеддинги слов

Вы ведь не забыли, как мы можем применить знания о word2vec и GloVe. Давайте попробуем!

In [None]:
glove = GloVe(dim=300)

.vector_cache/glove.840B.300d.zip: 2.18GB [06:56, 5.23MB/s]                            
100%|█████████▉| 2196016/2196017 [03:31<00:00, 10392.32it/s]


In [None]:
TEXT.build_vocab(trn, min_freq=5, vectors=glove)
LABEL.build_vocab(trn)

word_embeddings = TEXT.vocab.vectors

kernel_sizes = [2, 3, 4, 5]
paddings = [1, 1, 1, 1]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

In [None]:
trn, test = datasets.IMDB.splits(TEXT, LABEL)
vld, tst = test.split(random_state=random.seed(SEED))

device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, val_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 128, 128),
        sort=False,
        sort_key= lambda x: len(x.src),
        sort_within_batch=False,
        device=device,
        repeat=False,
)



In [None]:
model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=64,
            kernel_sizes=kernel_sizes, paddings=paddings, dropout=dropout)

word_embeddings = TEXT.vocab.vectors

prev_shape = model.embedding.weight.shape

model.embedding.weight = nn.Parameter(word_embeddings)

assert prev_shape == model.embedding.weight.shape
model.to(device)

opt = torch.optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

Вы знаете, что делать.

In [None]:
def freeze_embeddings(model, req_grad=False):
    embeddings = model.embedding
    for c_p in embeddings.parameters():
        c_p.requires_grad = req_grad

freeze_embeddings(model)

In [None]:
import numpy as np


min_loss = np.inf

cur_patience = 0
max_epochs = 30
patience = 5

best_model = None

for epoch in range(1, max_epochs + 1):
    train_loss = 0.0
    model.train()
    pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")

    for it, batch in pbar:
        opt.zero_grad()
        text_indices, labels = batch

        preds = model.forward(text_indices)
        loss = loss_func(preds, labels.unsqueeze(1))
        loss.backward()
        opt.step()

        train_loss += loss.item()

    train_loss /= len(train_iter)
    val_loss = 0.0
    model.eval()
    pbar = tqdm(enumerate(val_iter), total=len(val_iter), leave=False)
    pbar.set_description(f"Epoch {epoch}")

    for it, batch in pbar:
        text_indices, labels = batch

        preds = model.forward(text_indices)

        with torch.no_grad():
            loss = loss_func(preds, labels.unsqueeze(1))
            val_loss += loss.item()

    val_loss /= len(val_iter)
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, 
                                                                     round(train_loss, 4), 
                                                                     round(val_loss, 4)))
    
    if val_loss < min_loss:
        min_loss = val_loss
        best_model = model.state_dict()
    else:
        cur_patience += 1
        if cur_patience == patience:
            cur_patience = 0
            break

model.load_state_dict(best_model)

  0%|          | 0/196 [00:00<?, ?it/s]



  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 1, Training Loss: 0.4583, Validation Loss: 0.335


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 2, Training Loss: 0.3375, Validation Loss: 0.3034


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 3, Training Loss: 0.2947, Validation Loss: 0.2866


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 4, Training Loss: 0.2657, Validation Loss: 0.2793


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 5, Training Loss: 0.2375, Validation Loss: 0.2883


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 6, Training Loss: 0.2142, Validation Loss: 0.2722


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 7, Training Loss: 0.188, Validation Loss: 0.2685


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 8, Training Loss: 0.1702, Validation Loss: 0.2668


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 9, Training Loss: 0.1501, Validation Loss: 0.2741


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 10, Training Loss: 0.1365, Validation Loss: 0.2774


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 11, Training Loss: 0.1206, Validation Loss: 0.2943


  0%|          | 0/196 [00:00<?, ?it/s]

  0%|          | 0/137 [00:00<?, ?it/s]

Epoch: 12, Training Loss: 0.112, Validation Loss: 0.2837


<All keys matched successfully>

Посчитайте f1-score вашего классификатора.

**Ответ**: 0.51

In [None]:
TP, TN, FP, FN = 0, 0, 0, 0

model.eval()

for batch in test_iter:
    text_indices, labels = batch
    labels = labels.cpu().numpy()

    preds = model.forward(text_indices).detach().cpu().numpy()
    predicted_labels = (preds > 0)

    TP += np.sum(np.logical_and(predicted_labels == labels, labels == 1))
    TN += np.sum(np.logical_and(predicted_labels == labels, labels == 0))
    FP += np.sum(np.logical_and(predicted_labels != labels, labels == 0))
    FN += np.sum(np.logical_and(predicted_labels != labels, labels == 1))

PR = TP / (TP + FP)
RC = TP / (TP + FN)

f1_score = 2 * PR * RC / (PR + RC)
print(f1_score)



0.5149747426279919


Проверим насколько все хорошо!

In [None]:
PAD_IND = TEXT.vocab.stoi['pad']

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)
vis_data_records_ig = []

interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

pred:  pos ( 0.99 ) , delta:  tensor([0.0005], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.55 ) , delta:  tensor([4.8424e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.88 ) , delta:  tensor([0.0003], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.01 ) , delta:  tensor([3.4112e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.28 ) , delta:  tensor([5.5420e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)


In [None]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.35,It was a fantastic performance ! pad
,,,,
pos,pos (0.55),pos,1.45,Best film ever pad pad pad pad
,,,,
pos,pos (0.88),pos,1.46,Such a great show! pad pad pad
,,,,
neg,neg (0.01),pos,-0.95,It was a horrible movie pad pad
,,,,
neg,neg (0.28),pos,0.12,I've never watched something as bad pad
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.99),pos,1.35,It was a fantastic performance ! pad
,,,,
pos,pos (0.55),pos,1.45,Best film ever pad pad pad pad
,,,,
pos,pos (0.88),pos,1.46,Such a great show! pad pad pad
,,,,
neg,neg (0.01),pos,-0.95,It was a horrible movie pad pad
,,,,
neg,neg (0.28),pos,0.12,I've never watched something as bad pad
,,,,
