<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width="500" height="450">
<h3 style="text-align: center;"><b>Phystech School of Applied Mathematics and Informatics (FPMI), MIPT</b></h3>

Based on: https://github.com/DanAnastasyev/DeepNLP-Course Week 12


# GoalOriented


In [None]:
!git clone https://github.com/MiuLab/SlotGated-SLU.git
!wget -qq https://raw.githubusercontent.com/yandexdataschool/nlp_course/master/week08_multitask/conlleval.py

Cloning into 'SlotGated-SLU'...
remote: Enumerating objects: 51, done.
remote: Total 51 (delta 0), reused 0 (delta 0), pack-reused 51
Unpacking objects: 100% (51/51), done.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
# Fall back to the CPU when no GPU is available
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Dialogue systems

Dialogue systems come in two types: *goal-oriented* and *general conversation*.

**General conversation** systems are chatbots that talk on free-form topics:  
<img src="https://i.ibb.co/bFwwGpc/alice.jpg" width="200"/>

Today we will talk not about them but about **goal-oriented** systems:

<img src="https://hsto.org/webt/gj/3y/xl/gj3yxlqbr7ujuqr9r2akacxmkee.jpeg" width="600"/>

*From [How Alice Works (Как устроена Алиса)](https://habr.com/company/yandex/blog/349372/)*

The user says something, and the speech is recognized. From the recognized text, the system determines what, where, and when the user wanted. Next, the dialogue engine decides whether the user really knows what they wanted to ask for. The system then queries its data sources for the information the user (apparently) requested. Based on all of this, a response is generated:

<img src="https://i.ibb.co/8XcdpJ7/goal-orientied.png" width="600"/>

*From [How Alice Works (Как устроена Алиса)](https://habr.com/company/yandex/blog/349372/)*

We will train the part in the middle: the classifier and the tagger. Everything else is usually heuristics and hardcoded responses.

## Data

There is a more or less standard dataset, ATIS, which is in fact embarrassingly small.

It can be complemented with the SNIPS dataset, which is larger and more diverse.

We will take both datasets from the repository of the paper [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://aclweb.org/anthology/N18-2118).

The cells below load both datasets and concatenate their train, validation, and test splits.
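To make the on-disk layout concrete, here is a minimal sketch of the format that `read_dataset` expects: three line-aligned files per split, `seq.in` (tokens), `seq.out` (BIO slot tags), and `label` (intent). The helpers `write_split` and `load_split` are hypothetical names introduced only for this illustration, and the single row is copied from the sample output printed further down.

```python
import os
import tempfile


def write_split(path, rows):
    """Write (tokens, tags, intent) rows as line-aligned seq.in / seq.out / label files."""
    with open(os.path.join(path, 'seq.in'), 'w') as f_words, \
            open(os.path.join(path, 'seq.out'), 'w') as f_tags, \
            open(os.path.join(path, 'label'), 'w') as f_intents:
        for tokens, tags, intent in rows:
            f_words.write(' '.join(tokens) + '\n')
            f_tags.write(' '.join(tags) + '\n')
            f_intents.write(intent + '\n')


def load_split(path):
    """Read the three files back, one example per line (same logic as read_dataset)."""
    with open(os.path.join(path, 'seq.in')) as f_words, \
            open(os.path.join(path, 'seq.out')) as f_tags, \
            open(os.path.join(path, 'label')) as f_intents:
        return [
            (words.strip().split(), tags.strip().split(), intent.strip())
            for words, tags, intent in zip(f_words, f_tags, f_intents)
        ]


rows = [(
    'list airlines serving between denver and san francisco'.split(),
    'O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name'.split(),
    'atis_airline',
)]

with tempfile.TemporaryDirectory() as tmp_dir:
    write_split(tmp_dir, rows)
    loaded = load_split(tmp_dir)

tokens, tags, intent = loaded[0]
# Every token has exactly one slot tag; the intent labels the whole query
assert len(tokens) == len(tags)
print(list(zip(tokens, tags)), intent)
```

Each example is therefore a triple: a token list, a tag list of the same length, and one intent string.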

In [None]:
import os 

def read_dataset(path):
    with open(os.path.join(path, 'seq.in')) as f_words, \
            open(os.path.join(path, 'seq.out')) as f_tags, \
            open(os.path.join(path, 'label')) as f_intents:
        
        return [
            (words.strip().split(), tags.strip().split(), intent.strip()) 
            for words, tags, intent in zip(f_words, f_tags, f_intents)
        ]

In [None]:
train_data = read_dataset('SlotGated-SLU/data/atis/train/') + read_dataset('SlotGated-SLU/data/snips/train/')
val_data = read_dataset('SlotGated-SLU/data/atis/valid/') + read_dataset('SlotGated-SLU/data/snips/valid/')
test_data = read_dataset('SlotGated-SLU/data/atis/test/') + read_dataset('SlotGated-SLU/data/snips/test/')

In [None]:
intent_to_example = {example[2]: example for example in train_data}
for example in intent_to_example.values():
    print('Intent:\t', example[2])
    print('Text:\t', '\t'.join(example[0]))
    print('Tags:\t', '\t'.join(example[1]))
    print()

Intent:	 atis_flight
Text:	 is	there	a	delta	flight	from	denver	to	san	francisco
Tags:	 O	O	O	B-airline_name	O	O	B-fromloc.city_name	O	B-toloc.city_name	I-toloc.city_name

Intent:	 atis_airfare
Text:	 what	is	the	most	expensive	one	way	fare	from	boston	to	atlanta	on	american	airlines
Tags:	 O	O	O	B-cost_relative	I-cost_relative	B-round_trip	I-round_trip	O	O	B-fromloc.city_name	O	B-toloc.city_name	O	B-airline_name	I-airline_name

Intent:	 atis_airline
Text:	 list	airlines	serving	between	denver	and	san	francisco
Tags:	 O	O	O	O	B-fromloc.city_name	O	B-toloc.city_name	I-toloc.city_name

Intent:	 atis_ground_service
Text:	 tell	me	about	ground	transportation	between	orlando	international	and	orlando
Tags:	 O	O	O	O	O	O	B-fromloc.airport_name	I-fromloc.airport_name	O	B-toloc.city_name

Intent:	 atis_quantity
Text:	 how	many	airlines	have	flights	with	service	class	yn
Tags:	 O	O	O	O	O	O	O	O	B-fare_basis_code

Intent:	 atis_city
Text:	 where	is	lester	pearson	airport
Tags:	 O	O	B-airport_name	

In [None]:
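# Field, Example, BucketIterator, etc. are part of the legacy torchtext API:
# it moved to torchtext.legacy.data in torchtext 0.9 and was removed in 0.12
from torchtext.data import Field, LabelField, Example, Dataset, BucketIterator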

tokens_field = Field()
tags_field = Field(unk_token=None)
intent_field = LabelField()

fields = [('tokens', tokens_field), ('tags', tags_field), ('intent', intent_field)]

train_dataset = Dataset([Example.fromlist(example, fields) for example in train_data], fields)
val_dataset = Dataset([Example.fromlist(example, fields) for example in val_data], fields)
test_dataset = Dataset([Example.fromlist(example, fields) for example in test_data], fields)

tokens_field.build_vocab(train_dataset)
tags_field.build_vocab(train_dataset)
intent_field.build_vocab(train_dataset)

print('Vocab size =', len(tokens_field.vocab))
print('Tags count =', len(tags_field.vocab))
print('Intents count =', len(intent_field.vocab))

train_iter, val_iter, test_iter = BucketIterator.splits(
    datasets=(train_dataset, val_dataset, test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, device=DEVICE, sort=False
)

Vocab size = 11804
Tags count = 192
Intents count = 28


In [None]:
print('Num train batch =', len(train_iter))
print('Num val batch =', len(val_iter))
print('Num test batch =', len(test_iter))

Num train batch = 549
Num val batch = 10
Num test batch = 13


## LSTM in PyTorch (reminder)

batch_first=False<br>
hidden - [num_layers * num_directions, batch_size, hid_dim]<br>
output - [len_seq, batch_size, hid_dim * num_directions]<br>
<br>
batch_first=True<br>
hidden - [num_layers * num_directions, batch_size, hid_dim]<br>
output - [batch_size, len_seq, hid_dim * num_directions]<br>
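The shapes above can be checked directly. The following is a small sketch with arbitrary toy sizes (all numbers are assumptions chosen for illustration). It also shows where the top layer's final forward and backward states sit inside `hidden`, which matters when the LSTM has more than one layer.

```python
import torch
import torch.nn as nn

# Toy sizes, chosen only for this illustration
batch_size, seq_len, emb_dim, hid_dim, num_layers = 4, 7, 16, 32, 2

lstm = nn.LSTM(input_size=emb_dim, hidden_size=hid_dim, num_layers=num_layers,
               bidirectional=True, batch_first=True)
inputs = torch.randn(batch_size, seq_len, emb_dim)

with torch.no_grad():
    output, (hidden, cell) = lstm(inputs)

print(output.shape)  # [batch_size, seq_len, hid_dim * 2]
print(hidden.shape)  # [num_layers * 2, batch_size, hid_dim], regardless of batch_first

# hidden is ordered layer by layer (forward, then backward within each layer),
# so the last two slices belong to the top layer
top_forward, top_backward = hidden[-2], hidden[-1]

# The forward direction finishes at the last time step,
# the backward direction finishes at the first one
assert torch.allclose(top_forward, output[:, -1, :hid_dim])
assert torch.allclose(top_backward, output[:, 0, hid_dim:])

# A fixed-size sentence representation for classification
sentence_vector = torch.cat([top_forward, top_backward], dim=1)
print(sentence_vector.shape)
```

With a single layer, `hidden[0]` and `hidden[1]` are the same two slices as `hidden[-2]` and `hidden[-1]`; with several layers only the negative indices point at the top layer.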


## Intent classifier

Let's start with the classifier: which intent does a given query belong to?

Nothing fancy: we take an RNN and learn to predict the intent labels.

In [None]:
class IntentClassifierModel(nn.Module):
    def __init__(self, vocab_size, intents_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(input_size=emb_dim, hidden_size=lstm_hidden_dim, bidirectional=True, num_layers=num_layers, batch_first=True)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, intents_count)

    def forward(self, inputs):

        projections = self.embeddings_layer(inputs)
        _, (final_hidden_state, _) = self.lstm_layer(projections)
        # Concatenate the final forward and backward states of the top LSTM layer
        hidden = self.dropout(torch.cat([final_hidden_state[-2], final_hidden_state[-1]], dim=1))

        output = self.out_layer(hidden)
        return output

In [None]:
model = IntentClassifierModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab)).to(DEVICE)
for x in train_iter:
    break
print(x.tokens.shape)
print(x.intent.shape)
model(x.tokens.transpose(0, 1)).shape

torch.Size([20, 32])
torch.Size([32])


torch.Size([32, 28])

In [None]:
!pip install colorama
from colorama import Fore, Style

Collecting colorama
  Downloading https://files.pythonhosted.org/packages/44/98/5b86278fbbf250d239ae0ecb724f8572af1c91f4a11edf4d36a206189440/colorama-0.4.4-py2.py3-none-any.whl
Installing collected packages: colorama
Successfully installed colorama-0.4.4


In [None]:
class ModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self, is_train):
        if is_train:
            return '{}{:>5s} Loss = {:.5f}, Accuracy = {:.2%}{}'.format(
                Fore.RED, self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count, Style.RESET_ALL
            )
        else:
            return '{}{:>5s} Loss = {:.5f}, Accuracy = {:.2%}{}'.format(
                Fore.GREEN, self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count, Style.RESET_ALL
            )
        
    def on_batch(self, batch):
        logits = self.model(batch.tokens.transpose(0, 1))

        # loss
        loss = self.criterion(logits, batch.intent)
        # predicts
        predicted_intent = logits.argmax(dim=1)
        self.total_count += predicted_intent.size(0)
        self.correct_count += torch.sum(predicted_intent == batch.intent).item()
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [None]:
import math
from tqdm import tqdm
tqdm.get_lock().locks = []


def do_epoch(trainer, data_iter, is_train, name=None):
    trainer.on_epoch_begin(is_train, name, batches_count=len(data_iter))
    
    with torch.autograd.set_grad_enabled(is_train):
        with tqdm(total=trainer.batches_count) as progress_bar:
            for i, batch in enumerate(data_iter):
                batch_progress = trainer.on_batch(batch)

                progress_bar.update()
                progress_bar.set_description(batch_progress)

            epoch_progress = trainer.on_epoch_end(is_train)
            progress_bar.set_description(epoch_progress)
            progress_bar.refresh()

            
def fit(trainer, train_iter, epochs_count=1, val_iter=None):
    best_val_loss = None
    for epoch in range(epochs_count):
        name_prefix = '[{} / {}] '.format(epoch + 1, epochs_count)
        do_epoch(trainer, train_iter, is_train=True, name=name_prefix + 'Train:')
        
        if val_iter is not None:
            do_epoch(trainer, val_iter, is_train=False, name=name_prefix + '  Val:')        

In [None]:
model = IntentClassifierModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
#collect all
trainer = ModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=10, val_iter=val_iter)

[1 / 10] Train: Loss = 0.55999, Accuracy = 85.79%: 100%|██████████| 549/549 [00:04<00:00, 114.84it/s]
[1 / 10]   Val: Loss = 0.32010, Accuracy = 91.33%: 100%|██████████| 10/10 [00:00<00:00, 153.52it/s]
[2 / 10] Train: Loss = 0.17612, Accuracy = 95.32%: 100%|██████████| 549/549 [00:04<00:00, 118.86it/s]
[2 / 10]   Val: Loss = 0.19864, Accuracy = 94.00%: 100%|██████████| 10/10 [00:00<00:00, 163.03it/s]
[3 / 10] Train: Loss = 0.10052, Accuracy = 97.37%: 100%|██████████| 549/549 [00:04<00:00, 119.40it/s]
[3 / 10]   Val: Loss = 0.17302, Accuracy = 95.75%: 100%|██████████| 10/10 [00:00<00:00, 148.90it/s]
[4 / 10] Train: Loss = 0.06026, Accuracy = 98.44%: 100%|██████████| 549/549 [00:04<00:00, 115.37it/s]
[4 / 10]   Val: Loss = 0.16303, Accuracy = 96.25%: 100%|██████████| 10/10 [00:00<00:00, 151.14it/s]
[5 / 10] Train: Loss = 0.03737, Accuracy = 99.11%: 100%|██████████| 549/549 [00:04<00:00, 115.05it/s]
[5 /

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.26252, Accuracy = 94.16%: 100%|██████████| 13/13 [00:00<00:00, 154.18it/s]


## Tagger

![](https://commons.bmstu.wiki/images/0/00/NER1.png)  
*From [NER](https://ru.bmstu.wiki/NER_(Named-Entity_Recognition))*

#### **Task 1.1**
Write a simple tagger.

In [None]:
class TokenTaggerModel(nn.Module):
    def __init__(self, vocab_size, tags_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(input_size=emb_dim, hidden_size=lstm_hidden_dim, bidirectional=True, num_layers=num_layers, batch_first=True)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer(inputs)
        output, _ = self.lstm_layer(projections)

        output = self.out_layer(self.dropout(output))
        return output

In [None]:
model = TokenTaggerModel(vocab_size=len(tokens_field.vocab), tags_count=len(tags_field.vocab)).to(DEVICE)
for x in train_iter:
    break
print(x.tokens.shape)
print(x.tags.shape)
model(x.tokens.transpose(0, 1)).shape

torch.Size([18, 32])
torch.Size([18, 32])


torch.Size([32, 18, 192])

#### **Task 1.2**
Update `ModelTrainer`: we still need the same loss and accuracy, but now they are computed per token rather than per sentence, and padding positions should not count towards the accuracy.
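As an illustration of the per-token bookkeeping, here is a minimal sketch on hand-made logits. It assumes that index 0 is the padding tag (as it is for a `Field` built with `unk_token=None`), and it passes `ignore_index` to the loss, which is an optional choice of this sketch rather than something the notebook's trainer does.

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumption: '<pad>' has index 0 in the tag vocabulary

# logits: [batch_size, seq_len, tags_count], true_tags: [batch_size, seq_len]
logits = torch.tensor([
    [[0.1, 2.0, 0.3], [0.2, 0.1, 3.0], [0.5, 1.5, 0.1]],
    [[0.1, 0.2, 4.0], [3.0, 0.1, 0.2], [0.3, 2.5, 0.1]],
])
true_tags = torch.tensor([
    [1, 2, 0],  # the last position is padding
    [2, 1, 0],  # the second token is mispredicted, the last position is padding
])

# CrossEntropyLoss expects the class dimension second: [batch_size, tags_count, seq_len]
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
loss = criterion(logits.transpose(1, 2), true_tags)

# Accuracy over real tokens only: padded positions are masked out
predicted_tags = logits.argmax(dim=2)
mask = true_tags != PAD_IDX
correct_count = ((predicted_tags == true_tags) & mask).sum().item()
total_count = mask.sum().item()

print(loss.item(), correct_count, total_count)
```

Counting matches under the mask keeps padded positions out of both the numerator and the denominator, whatever the model happens to predict there.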

In [None]:
class TagModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self, is_train):
        if is_train:
            return '{}{:>5s} Loss = {:.5f}, Accuracy = {:.2%}{}'.format(
                Fore.RED, self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count, Style.RESET_ALL
            )
        else:
            return '{}{:>5s} Loss = {:.5f}, Accuracy = {:.2%}{}'.format(
                Fore.GREEN, self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count, Style.RESET_ALL
            )
        
    def on_batch(self, batch):
        logits = self.model(batch.tokens.transpose(0, 1))
        #loss
        true_tags = batch.tags.transpose(0, 1)
        loss = self.criterion(logits.transpose(1, 2), true_tags)
        #predicts
        predicted_tags = logits.argmax(dim=2)
        # Count matches only at real tokens (index 0 is '<pad>' in the tag vocabulary)
        mask = true_tags != 0
        self.correct_count += torch.sum((true_tags == predicted_tags) & mask).item()
        self.total_count += torch.sum(mask).item()
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [None]:
model = TokenTaggerModel(vocab_size=len(tokens_field.vocab), tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
#collect all
trainer = TagModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=10, val_iter=val_iter)

[1 / 10] Train: Loss = 0.77032, Accuracy = 68.60%: 100%|██████████| 549/549 [00:04<00:00, 118.83it/s]
[1 / 10]   Val: Loss = 0.23957, Accuracy = 84.83%: 100%|██████████| 10/10 [00:00<00:00, 147.13it/s]
[2 / 10] Train: Loss = 0.24152, Accuracy = 87.24%: 100%|██████████| 549/549 [00:04<00:00, 117.75it/s]
[2 / 10]   Val: Loss = 0.14454, Accuracy = 90.65%: 100%|██████████| 10/10 [00:00<00:00, 145.17it/s]
[3 / 10] Train: Loss = 0.15128, Accuracy = 91.63%: 100%|██████████| 549/549 [00:04<00:00, 117.45it/s]
[3 / 10]   Val: Loss = 0.10156, Accuracy = 92.15%: 100%|██████████| 10/10 [00:00<00:00, 150.84it/s]
[4 / 10] Train: Loss = 0.10950, Accuracy = 93.90%: 100%|██████████| 549/549 [00:04<00:00, 119.60it/s]
[4 / 10]   Val: Loss = 0.08291, Accuracy = 94.06%: 100%|██████████| 10/10 [00:00<00:00, 150.99it/s]
[5 / 10] Train: Loss = 0.08215, Accuracy = 95.35%: 100%|██████████| 549/549 [00:04<00:00, 119.04it/s]
[5 /

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.08637, Accuracy = 95.22%: 100%|██████████| 13/13 [00:00<00:00, 154.69it/s]


In [None]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1)).transpose(1, 2).argmax(dim=1).cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 88.01%, Recall = 89.83%, F1 = 88.91%
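
The precision, recall, and F1 above are chunk-level metrics: a slot counts as correct only if its whole span and type match. The following is a simplified pure-Python sketch of that idea, not the `conlleval` implementation; `extract_chunks` and `chunk_f1` are hypothetical helpers written for this illustration.

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) spans encoded by a BIO tag sequence."""
    chunks, start, chunk_type = set(), None, None
    for i, tag in enumerate(list(tags) + ['O']):  # the sentinel 'O' closes the last chunk
        prefix, _, tag_type = tag.partition('-')
        continues = prefix == 'I' and start is not None and tag_type == chunk_type
        if not continues:
            if start is not None:
                chunks.add((start, i, chunk_type))
            # 'B-x' (or a stray 'I-x') opens a new chunk, 'O' opens nothing
            start, chunk_type = (i, tag_type) if prefix in ('B', 'I') else (None, None)
    return chunks


def chunk_f1(true_tags, pred_tags):
    """Precision, recall and F1 over exactly matching chunks."""
    true_chunks, pred_chunks = extract_chunks(true_tags), extract_chunks(pred_tags)
    correct = len(true_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(true_chunks) if true_chunks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


true_tags = 'O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name'.split()
pred_tags = 'O B-fromloc.city_name O B-toloc.city_name O'.split()

token_accuracy = sum(t == p for t, p in zip(true_tags, pred_tags)) / len(true_tags)
precision, recall, f1 = chunk_f1(true_tags, pred_tags)
print(token_accuracy, precision, recall, f1)
```

A single wrong token inside a two-token slot leaves token accuracy at 80% here but halves the chunk-level scores, which is why F1 is noticeably lower than the per-token accuracy reported by the trainer.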


## Multi-task learning

Let's implement a model that predicts both tags and intents at once. The idea is that the two tasks share information that should help each of them: knowing the intent tells you which slots are possible at all, and knowing the slots lets you guess the intent.

#### **Task 2.1**
Implement the joint model.

In [None]:
class SharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=300,
                 lstm_hidden_dim=256, num_layers=2, dropout_p=0.3):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        
        self.lstm_layer = nn.LSTM(input_size=emb_dim, hidden_size=lstm_hidden_dim, bidirectional=True, num_layers=num_layers, batch_first=True)
        self.out_layer_intent = nn.Linear(2 * lstm_hidden_dim, intents_count)
        self.out_layer_tags = nn.Linear(2 * lstm_hidden_dim, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer(inputs)
        output, (hidden, _) = self.lstm_layer(projections)
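        # Final forward and backward states of the top layer
        # (with num_layers=2, hidden[0] and hidden[1] belong to the first layer)
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)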

        intent_output = self.out_layer_intent(self.dropout(hidden))
        tags_output = self.out_layer_tags(self.dropout(output))
        
        return tags_output, intent_output

In [None]:
model = SharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                    tags_count=len(tags_field.vocab)).to(DEVICE)

for x in train_iter:
    break
print(x.tokens.shape)
print(x.intent.shape)
print(x.tags.shape)
tags_output, intent_output = model(x.tokens.transpose(0, 1))
tags_output.shape, intent_output.shape

torch.Size([17, 32])
torch.Size([32])
torch.Size([17, 32])


(torch.Size([32, 17, 192]), torch.Size([32, 28]))

#### **Task 2.2**
Complete `SharedModelTrainer`.

In [None]:
class SharedModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tags_correct_count, self.tags_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self, is_train):
        if is_train:
            return '{}{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}{}'.format(
                Fore.RED, self.name, self.epoch_loss / self.batches_count, self.tags_correct_count / self.tags_total_count, self.intent_correct_count / self.intent_total_count,\
                Style.RESET_ALL
            )
        else:
            return '{}{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}{}'.format(
                Fore.GREEN, self.name, self.epoch_loss / self.batches_count, self.tags_correct_count / self.tags_total_count, self.intent_correct_count / self.intent_total_count,\
                Style.RESET_ALL
            )
        
    def on_batch(self, batch):
        tags_logits, intent_logits = self.model(batch.tokens.transpose(0, 1))
        true_tags = batch.tags.transpose(0, 1)
        true_intent = batch.intent

        #loss
        tags_loss = self.criterion(tags_logits.transpose(1, 2), true_tags)
        intent_loss = self.criterion(intent_logits, true_intent)
        loss = tags_loss + intent_loss

        #predicts
        predicted_tags = tags_logits.argmax(dim=2)
        predicted_intent = intent_logits.argmax(dim=1)

        # Count tag matches only at real tokens (index 0 is '<pad>' in the tag vocabulary)
        tags_mask = true_tags != 0
        self.tags_correct_count += torch.sum((true_tags == predicted_tags) & tags_mask).item()
        self.tags_total_count += torch.sum(tags_mask).item()
        self.intent_correct_count += torch.sum(true_intent == predicted_intent).item()
        self.intent_total_count += true_intent.size(0)

        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [None]:
model = SharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                    tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
#collect all
trainer = SharedModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=20, val_iter=val_iter)

[1 / 20] Train: Loss = 0.81605, Tags accuracy = 78.42%, Intents accuracy = 91.70%: 100%|██████████| 549/549 [00:06<00:00, 80.83it/s]
[1 / 20]   Val: Loss = 0.27999, Tags accuracy = 91.95%, Intents accuracy = 95.75%: 100%|██████████| 10/10 [00:00<00:00, 102.38it/s]
[2 / 20] Train: Loss = 0.17738, Tags accuracy = 93.94%, Intents accuracy = 98.02%: 100%|██████████| 549/549 [00:06<00:00, 82.24it/s]
[2 / 20]   Val: Loss = 0.16572, Tags accuracy = 94.85%, Intents accuracy = 97.58%: 100%|██████████| 10/10 [00:00<00:00, 98.76it/s]
[3 / 20] Train: Loss = 0.08127, Tags accuracy = 96.94%, Intents accuracy = 99.28%: 100%|██████████| 549/549 [00:06<00:00, 79.60it/s]
[3 / 20]   Val: Loss = 0.13905, Tags accuracy = 95.95%, Intents accuracy = 97.83%: 100%|██████████| 10/10 [00:00<00:00, 99.00it/s]
[4 / 20] Train: Loss = 0.04059, Tags accuracy = 98.43%, Intents accuracy = 99.73%: 100%|██████████| 549/549 [00:07<00:00, 76.96it/s]
[4 /

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.32141, Tags accuracy = 96.04%, Intents accuracy = 95.98%: 100%|██████████| 13/13 [00:00<00:00, 106.58it/s]


In [None]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            true = batch.tags.transpose(0, 1).cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in true])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 90.77%, Recall = 92.08%, F1 = 91.42%


## Asynchronous training

The idea is described in the paper [A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling](http://aclweb.org/anthology/N18-2050).

<img src="https://i.ibb.co/qrgVSqF/2018-11-27-2-11-17.png" width="600"/>

The main difference from what we have already implemented is the order in which everything is optimized. Instead of training all layers jointly, the tagger network and the classifier network are trained separately.

At each training step, two sequences of hidden states are generated: $h^1$ for the classifier and $h^2$ for the tagger.

Then the intent prediction loss is computed first and an optimizer step is taken, followed by the tag prediction loss and another optimizer step.
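
One possible reading of this scheme, sketched on a toy model: two networks exchange their hidden states, but each is updated only by the gradient of its own loss through its own optimizer. Everything here is an assumption made for illustration (`TinyBiModel`, the linear "encoders", the sizes); it is not the architecture from the paper. The sketch computes both sets of gradients before either optimizer steps, since an in-place parameter update between the two passes can invalidate the shared graph.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class TinyBiModel(nn.Module):
    """Hypothetical toy model: two encoders share their hidden states."""

    def __init__(self, in_dim=8, hid_dim=16, intents_count=3, tags_count=5):
        super().__init__()
        self.encoder_intent = nn.Linear(in_dim, hid_dim)
        self.encoder_tags = nn.Linear(in_dim, hid_dim)
        self.out_layer_intent = nn.Linear(2 * hid_dim, intents_count)
        self.out_layer_tags = nn.Linear(2 * hid_dim, tags_count)

    def forward(self, inputs):
        # inputs: [batch_size, seq_len, in_dim]
        h1 = torch.tanh(self.encoder_intent(inputs))
        h2 = torch.tanh(self.encoder_tags(inputs))
        h = torch.cat([h1, h2], dim=2)  # both heads see both hidden sequences
        tags_logits = self.out_layer_tags(h)                  # [batch, seq_len, tags]
        intent_logits = self.out_layer_intent(h.mean(dim=1))  # [batch, intents]
        return tags_logits, intent_logits


model = TinyBiModel()
intent_params = [p for name, p in model.named_parameters() if 'intent' in name]
tags_params = [p for name, p in model.named_parameters() if 'tags' in name]
intent_optimizer = optim.Adam(intent_params)
tags_optimizer = optim.Adam(tags_params)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(4, 6, 8)
true_tags = torch.randint(0, 5, (4, 6))
true_intent = torch.randint(0, 3, (4,))

tags_logits, intent_logits = model(inputs)
intent_loss = criterion(intent_logits, true_intent)
tags_loss = criterion(tags_logits.transpose(1, 2), true_tags)

initial_tags_weight = model.out_layer_tags.weight.detach().clone()

# Each loss is differentiated only with respect to its own network's parameters;
# the first call keeps the graph alive because both losses share it
intent_grads = torch.autograd.grad(intent_loss, intent_params, retain_graph=True)
tags_grads = torch.autograd.grad(tags_loss, tags_params)

for param, grad in zip(intent_params, intent_grads):
    param.grad = grad
for param, grad in zip(tags_params, tags_grads):
    param.grad = grad

intent_optimizer.step()
tags_optimizer.step()
print(intent_loss.item(), tags_loss.item())
```

Calling `.backward()` on each loss instead would accumulate gradients in every parameter the loss reaches, including the other network's encoder; restricting the inputs of `torch.autograd.grad` is one way to keep the two updates separate.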

#### **Task 3.1**
Implement asynchronous training of the joint model.

In [None]:
class AsyncSharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=300,
                 lstm_hidden_dim=256, num_layers=1, dropout_p=0.3):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)

        self.inner_lstm_layer_tags = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True, bidirectional=True, num_layers=num_layers)
        self.inner_lstm_layer_intent = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True, bidirectional=True, num_layers=num_layers)
        
        self.outer_lstm_layer_tags = nn.LSTM(lstm_hidden_dim * 4, lstm_hidden_dim, batch_first=True)
        self.outer_lstm_layer_intent = nn.LSTM(lstm_hidden_dim * 4, lstm_hidden_dim, batch_first=True)
        
        self.out_layer_intent = nn.Linear(lstm_hidden_dim, intents_count)
        self.out_layer_tags = nn.Linear(lstm_hidden_dim, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer(inputs)
        out_intent, _ = self.inner_lstm_layer_intent(projections)
        out_tags, _ = self.inner_lstm_layer_tags(projections)

        h = torch.cat((out_intent, out_tags), dim=2)
        tags_output, _ = self.outer_lstm_layer_tags(h)
        _, (hidden, _) = self.outer_lstm_layer_intent(h)
        intent_output = hidden[-1]
        
        tags_output = self.dropout(tags_output)
        intent_output = self.dropout(intent_output)
        intent_output = self.out_layer_intent(intent_output)
        tags_output = self.out_layer_tags(tags_output)

        #print(tags_output.shape, intent_output.shape)
        return tags_output, intent_output

In [None]:
model = AsyncSharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                         tags_count=len(tags_field.vocab)).to(DEVICE)
for x in train_iter:
    break
print(x.tokens.shape)
print(x.intent.shape)
print(x.tags.shape)
tags_output, intent_output = model(x.tokens.transpose(0, 1))
tags_output.shape, intent_output.shape

torch.Size([15, 32])
torch.Size([32])
torch.Size([15, 32])


(torch.Size([32, 15, 192]), torch.Size([32, 28]))

In [None]:
class AsyncSharedModelTrainer():
    def __init__(self, model, criterion, tags_optimizer, intent_optimizer):
        self.model = model
        self.criterion = criterion
        self.tags_optimizer = tags_optimizer
        self.intent_optimizer = intent_optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tags_correct_count, self.tags_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self, is_train):
        if is_train:
            return '{}{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}{}'.format(
                Fore.RED, self.name, self.epoch_loss / self.batches_count, self.tags_correct_count / self.tags_total_count, self.intent_correct_count / self.intent_total_count,\
                Style.RESET_ALL
            )
        else:
            return '{}{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}{}'.format(
                Fore.GREEN, self.name, self.epoch_loss / self.batches_count, self.tags_correct_count / self.tags_total_count, self.intent_correct_count / self.intent_total_count,\
                Style.RESET_ALL
            )
        
        
    def on_batch(self, batch):
        tags_logits, intent_logits = self.model(batch.tokens.transpose(0, 1))
        true_tags = batch.tags.transpose(0, 1)
        true_intent = batch.intent
        #loss
        tags_loss = self.criterion(tags_logits.transpose(1, 2), true_tags)
        intent_loss = self.criterion(intent_logits, true_intent)
        #predicts
        predicted_tags = tags_logits.argmax(dim=2)
        predicted_intent = intent_logits.argmax(dim=1)
        # Count tag matches only at real tokens (index 0 is '<pad>' in the tag vocabulary)
        tags_mask = true_tags != 0
        self.tags_correct_count += torch.sum((true_tags == predicted_tags) & tags_mask).item()
        self.tags_total_count += torch.sum(tags_mask).item()
        self.intent_correct_count += torch.sum(true_intent == predicted_intent).item()
        self.intent_total_count += true_intent.size(0)
        if self.is_train:
    
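            self.intent_optimizer.zero_grad()
            self.tags_optimizer.zero_grad()

            # Both losses share one graph, so the first backward call must retain it.
            # Note: backward() accumulates into every parameter a loss reaches, so the
            # inner LSTMs and the embeddings receive gradients from both losses.
            intent_loss.backward(retain_graph=True)
            tags_loss.backward()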

            self.intent_optimizer.step()
            # model.embeddings_layer.zero_grad()
            self.tags_optimizer.step()
            
        self.epoch_loss += tags_loss.item() + intent_loss.item()

In [None]:
tags_parameters_names = [name for name, param in model.named_parameters() if not 'intent' in name]
intent_parameters_names = [name for name, param in model.named_parameters() if not 'tags' in name]
tags_parameters_names, intent_parameters_names

(['embeddings_layer.weight',
  'inner_lstm_layer_tags.weight_ih_l0',
  'inner_lstm_layer_tags.weight_hh_l0',
  'inner_lstm_layer_tags.bias_ih_l0',
  'inner_lstm_layer_tags.bias_hh_l0',
  'inner_lstm_layer_tags.weight_ih_l0_reverse',
  'inner_lstm_layer_tags.weight_hh_l0_reverse',
  'inner_lstm_layer_tags.bias_ih_l0_reverse',
  'inner_lstm_layer_tags.bias_hh_l0_reverse',
  'outer_lstm_layer_tags.weight_ih_l0',
  'outer_lstm_layer_tags.weight_hh_l0',
  'outer_lstm_layer_tags.bias_ih_l0',
  'outer_lstm_layer_tags.bias_hh_l0',
  'out_layer_tags.weight',
  'out_layer_tags.bias'],
 ['embeddings_layer.weight',
  'inner_lstm_layer_intent.weight_ih_l0',
  'inner_lstm_layer_intent.weight_hh_l0',
  'inner_lstm_layer_intent.bias_ih_l0',
  'inner_lstm_layer_intent.bias_hh_l0',
  'inner_lstm_layer_intent.weight_ih_l0_reverse',
  'inner_lstm_layer_intent.weight_hh_l0_reverse',
  'inner_lstm_layer_intent.bias_ih_l0_reverse',
  'inner_lstm_layer_intent.bias_hh_l0_reverse',
  'outer_lstm_layer_intent.we

These parameter groups then need to be passed to separate optimizers and trained separately.

*The `retain_graph` parameter of the `backward()` method may also come in handy.*

In [None]:
import torch
torch.manual_seed(0)

model = AsyncSharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                         tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)

tags_parameters = [param for name, param in model.named_parameters() if not 'intent' in name]
intent_parameters = [param for name, param in model.named_parameters() if not 'tags' in name]


tags_optimizer = optim.Adam(tags_parameters)
intent_optimizer = optim.Adam(intent_parameters)
#collect all
trainer = AsyncSharedModelTrainer(model, criterion, tags_optimizer, intent_optimizer)
fit(trainer, train_iter, epochs_count=10, val_iter=val_iter)

[1 / 10] Train: Loss = 1.29559, Tags accuracy = 78.14%, Intents accuracy = 76.69%: 100%|██████████| 549/549 [00:12<00:00, 45.71it/s]
[1 / 10]   Val: Loss = 0.45093, Tags accuracy = 91.20%, Intents accuracy = 92.58%: 100%|██████████| 10/10 [00:00<00:00, 62.99it/s]
[2 / 10] Train: Loss = 0.26203, Tags accuracy = 93.65%, Intents accuracy = 96.52%: 100%|██████████| 549/549 [00:12<00:00, 45.40it/s]
[2 / 10]   Val: Loss = 0.25837, Tags accuracy = 94.86%, Intents accuracy = 95.25%: 100%|██████████| 10/10 [00:00<00:00, 63.71it/s]
[3 / 10] Train: Loss = 0.12315, Tags accuracy = 97.09%, Intents accuracy = 98.38%: 100%|██████████| 549/549 [00:12<00:00, 45.50it/s]
[3 / 10]   Val: Loss = 0.15893, Tags accuracy = 96.23%, Intents accuracy = 97.67%: 100%|██████████| 10/10 [00:00<00:00, 66.74it/s]
[4 / 10] Train: Loss = 0.05998, Tags accuracy = 98.56%, Intents accuracy = 99.35%: 100%|██████████| 549/549 [00:11<00:00, 46.12it/s]
[4 / 10

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.29915, Tags accuracy = 96.54%, Intents accuracy = 96.36%: 100%|██████████| 13/13 [00:00<00:00, 74.80it/s]


In [None]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 91.19%, Recall = 92.88%, F1 = 92.03%


#### **Task 3.2**
Look at the hyperparameters in the paper and try to reach similar quality.

#### **Task 4**
Look at the results on SNIPS.

## Async Multi-task Learning for POS Tagging

Another paper: [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/pdf/1805.08237.pdf)

The architecture there looks like this:

<img src="https://i.ibb.co/0nSX6CC/2018-11-27-9-26-15.png" width="400"/>

The multi-task setup here is training separate lower-level classifiers (over characters and over words) to predict tags, each with its own optimizer.

## DeepPavlov go_bot

http://docs.deeppavlov.ai/en/master/features/skills/go_bot.html

Detailed tutorials:

Simple: https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_tutorial.ipynb

Extended: https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb

# Additional materials

## Papers
A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling, 2018 [[pdf]](http://aclweb.org/anthology/N18-2050)

Slot-Gated Modeling for Joint Slot Filling and Intent Prediction, 2018 [[pdf]](http://aclweb.org/anthology/N18-2118)

Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, 2018 [[pdf]](https://arxiv.org/pdf/1805.08237.pdf)

BERT for Joint Intent Classification and Slot Filling, 2019 [[pdf]](https://arxiv.org/pdf/1902.10909.pdf)

## Blogs
[How Alice Works (Как устроена Алиса)](https://habr.com/company/yandex/blog/349372/)