**"Attention Mechanism"**

Solve a translation task using the attention mechanism.

1. Take the English-Russian phrase pairs (www.manythings.org....org/anki/)

2. Train seq2seq with attention on them
  
  a. Based on the dot product
  
  b. Based on an MLP

3. Evaluate the quality

In [None]:
%matplotlib inline

In [None]:
from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# **1. Preparing the English-Russian phrase pairs.**

**Data preparation.**

In [None]:
!unzip rus-eng.zip

Archive:  rus-eng.zip
  inflating: rus.txt                 
  inflating: _about.txt              


In [None]:
!head rus.txt

Go.	Марш!	CC-BY 2.0 (France) Attribution: tatoeba.org #2877272 (CM) & #1159202 (shanghainese)
Go.	Иди.	CC-BY 2.0 (France) Attribution: tatoeba.org #2877272 (CM) & #5898247 (marafon)
Go.	Идите.	CC-BY 2.0 (France) Attribution: tatoeba.org #2877272 (CM) & #5898250 (marafon)
Hi.	Здравствуйте.	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #402127 (odexed)
Hi.	Привет!	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #466968 (katjka)
Hi.	Хай.	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #467233 (timsa)
Hi.	Здрасте.	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #3803577 (marafon)
Hi.	Здоро́во!	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #3854188 (marafon)
Hi.	Приветик!	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #7234283 (marafon)
Run!	Беги!	CC-BY 2.0 (France) Attribution: tatoeba.org #906328 (papabear) & #1569978 (Biga)


In [None]:
!tail rus.txt

We need to uphold laws against discrimination — in hiring, and in housing, and in education, and in the criminal justice system. That is what our Constitution and our highest ideals require.	Нам нужно отстаивать законы против дискриминации при найме на работу, в жилищной сфере, в сфере образования и правоохранительной системе. Этого требуют наша Конституция и высшие идеалы.	CC-BY 2.0 (France) Attribution: tatoeba.org #5762728 (BHO) & #6390439 (odexed)
I've heard that you should never date anyone who is less than half your age plus seven. Tom is now 30 years old and Mary is 17. How many years will Tom need to wait until he can start dating Mary?	Я слышал, что никогда не следует встречаться с кем-то вдвое младше вас плюс семь лет. Тому 30 лет, a Мэри 17. Сколько лет Тому нужно ждать до тех пор, пока он сможет начать встречаться с Мэри?	CC-BY 2.0 (France) Attribution: tatoeba.org #10068197 (CK) & #10644473 (notenoughsun)
I do have one final ask of you as your president, the same thing I a

In [None]:
SOS_token = 0
EOS_token = 1


class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

In [None]:
# Turn a Unicode string to plain ASCII, thanks to
# http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters.
# Note: unicodeToAscii also strips the combining marks of the Cyrillic
# letters й and ё, so they become и and е.


def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Zа-яёА-ЯЁ.!?]+", r" ", s)  # Cyrillic letters added
    return s
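One side effect of this normalization is worth checking. NFD decomposes the Cyrillic letters й and ё into a base letter plus a combining mark, so dropping every `Mn` character turns them into и and е (the sample translations further down contain `неправильныи`, for example). The sketch below reproduces the effect and shows one possible alternative that removes only the combining acute accent used as a stress mark. `strip_all_marks` and `strip_stress_only` are hypothetical helper names, not part of the notebook.

```python
import unicodedata


def strip_all_marks(s):
    # Same logic as unicodeToAscii: drop every combining mark (category Mn).
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


def strip_stress_only(s):
    # Hypothetical alternative: drop only U+0301 (combining acute accent),
    # then recompose so that й and ё survive.
    decomposed = unicodedata.normalize('NFD', s)
    kept = ''.join(c for c in decomposed if c != '\u0301')
    return unicodedata.normalize('NFC', kept)


word = unicodedata.normalize('NFC', 'неправильный')
print(strip_all_marks(word))               # й loses its breve and becomes и
print(strip_stress_only(word))             # й is preserved
print(strip_stress_only('здоро\u0301во'))  # the stress mark is removed
```

Whether the distinction matters depends on the task; keeping й and ё would change the vocabulary sizes reported later in the notebook.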

When reading the file, we pass the languages manually as parameters of readLangs(), because the file name specifies only one of them.
Before splitting each line into a pair, we remove the auxiliary information (the substring starting with "CC-BY").

In [None]:
def readLangs(lang1='eng', lang2='rus', reverse=False):
    print("Reading lines...")

    # Read the file and split into lines
    lines = open('rus.txt', encoding='utf-8').\
        read().strip().split('\n')

    # Split every line into pairs and normalize
    # Drop the attribution metadata that starts with the substring 'CC-BY'
    lines = [row.lower().split('cc-by')[0].rstrip() for row in lines]
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs
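To make the metadata-stripping step concrete, here is a minimal sketch applied to the first line of `rus.txt` shown earlier. It also shows a possible alternative, keeping only the first two tab-separated fields, which would not depend on the `CC-BY` marker; that alternative is an illustration, not what the notebook does.

```python
row = ("Go.\tМарш!\tCC-BY 2.0 (France) Attribution: "
       "tatoeba.org #2877272 (CM) & #1159202 (shanghainese)")

# Approach used in readLangs: lowercase, cut at 'cc-by', drop the trailing tab.
text = row.lower().split('cc-by')[0].rstrip()
pair = text.split('\t')
print(pair)

# Possible alternative: keep only the first two tab-separated fields.
pair_by_fields = [field.lower() for field in row.split('\t')[:2]]
print(pair_by_fields)
```

Both variants give the same pair for this line; they would differ only if a sentence itself contained the text "cc-by".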

In [None]:
MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)


def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)


def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]
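The filter can be tried on a couple of normalized pairs. In this sketch `keep_pair` is a hypothetical stand-in for `filterPair`, and the pairs are assumed to be in `[Russian, English]` order, which is the order produced when `reverse=True`; the prefix test is applied to the second element, so it only makes sense in that order.

```python
MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)


def keep_pair(pair, max_length=MAX_LENGTH):
    # Both sentences must be shorter than max_length tokens, and the
    # English side (second element) must start with one of the prefixes.
    src, tgt = pair
    return (len(src.split(' ')) < max_length
            and len(tgt.split(' ')) < max_length
            and tgt.startswith(eng_prefixes))


examples = [
    ['мы как братья .', 'we re like brothers .'],  # kept: starts with "we re "
    ['беги !', 'run !'],                           # dropped: no matching prefix
]
kept = [p for p in examples if keep_pair(p)]
print(kept)
```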

In [None]:
def prepareData(lang1='eng', lang2='rus', reverse=False):
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs


input_lang, output_lang, pairs = prepareData('eng', 'rus', True)
print(random.choice(pairs))

Reading lines...
Read 496059 sentence pairs
Trimmed to 28719 sentence pairs
Counting words...
Counted words:
rus 10177
eng 4303
['мы как братья .', 'we re like brothers .']


# **Functions for data handling, training, and evaluation.**

In [None]:
def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]


def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)


def tensorsFromPair(pair):
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)
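The index conversion can be illustrated without tensors. The sketch below uses a hypothetical `build_vocab` and `encode` pair that mimic `Lang.addSentence` and `indexesFromSentence` on a two-sentence toy corpus; indices 0 and 1 are reserved for SOS and EOS, as in the notebook.

```python
EOS_token = 1


def build_vocab(sentences, first_index=2):
    # Assign consecutive indices in order of first appearance,
    # starting after the reserved SOS (0) and EOS (1) indices.
    word2index = {}
    for sentence in sentences:
        for word in sentence.split(' '):
            if word not in word2index:
                word2index[word] = first_index + len(word2index)
    return word2index


def encode(word2index, sentence, eos_token=EOS_token):
    # Map each word to its index and append EOS.
    return [word2index[w] for w in sentence.split(' ')] + [eos_token]


vocab = build_vocab(['we re trapped .', 'we re not welcome .'])
ids = encode(vocab, 'we re not welcome .')
print(vocab)
print(ids)
```

A word that was never added to the vocabulary raises `KeyError` in this scheme, so a sentence passed to evaluation has to consist of words seen during data preparation.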

In [None]:
teacher_forcing_ratio = 0.5


def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=MAX_LENGTH):
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(
            input_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    decoder_input = torch.tensor([[SOS_token]], device=device)

    decoder_hidden = encoder_hidden

    use_teacher_forcing = random.random() < teacher_forcing_ratio

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        # Without teacher forcing: use its own predictions as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()  # detach from history as input

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length
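The two branches differ only in which token is fed to the decoder at the next step. A minimal sketch of that choice, using a hypothetical `decoder_inputs` helper and made-up token ids, could look like this:

```python
SOS_token = 0


def decoder_inputs(targets, predictions, use_teacher_forcing):
    """Return the token fed to the decoder at each step (illustrative only)."""
    inputs = [SOS_token]
    # Teacher forcing feeds the ground-truth token from the previous step;
    # otherwise the decoder's own previous prediction is fed back.
    source = targets if use_teacher_forcing else predictions
    # The last token is never fed back: there is no step after it.
    inputs.extend(source[:-1])
    return inputs


targets = [5, 8, 3, 1]      # hypothetical target ids, ending with EOS (1)
predictions = [5, 9, 3, 1]  # hypothetical greedy predictions

print(decoder_inputs(targets, predictions, use_teacher_forcing=True))
print(decoder_inputs(targets, predictions, use_teacher_forcing=False))
```

With `teacher_forcing_ratio = 0.5`, each training sentence uses one of the two modes with equal probability.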

In [None]:
import time
import math


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
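The remaining-time estimate is a linear extrapolation: the total time is taken to be the elapsed time divided by the fraction of iterations completed. A small worked example, with the elapsed time passed in explicitly instead of read from the clock (the `remaining` helper is hypothetical):

```python
import math


def as_minutes(seconds):
    minutes = math.floor(seconds / 60)
    return '%dm %ds' % (minutes, seconds - minutes * 60)


def remaining(elapsed, fraction_done):
    # Estimated total time is elapsed / fraction_done.
    return elapsed / fraction_done - elapsed


elapsed = 120.0            # 2 minutes elapsed
fraction = 15000 / 60000   # a quarter of the iterations done
print(as_minutes(elapsed), '(- %s)' % as_minutes(remaining(elapsed, fraction)))
```

The same rule explains the training logs: roughly 105 seconds elapsed at 5000 of 75000 iterations extrapolates to about 24.5 minutes remaining.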

In [None]:
def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
    training_pairs = [tensorsFromPair(random.choice(pairs))
                      for i in range(n_iters)]
    criterion = nn.NLLLoss()

    for iter in range(1, n_iters + 1):
        training_pair = training_pairs[iter - 1]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        loss = train(input_tensor, target_tensor, encoder,
                     decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter, iter / n_iters * 100, print_loss_avg))

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    showPlot(plot_losses)

In [None]:
import matplotlib.pyplot as plt
plt.switch_backend('agg')
import matplotlib.ticker as ticker
import numpy as np


def showPlot(points):
    # plt.subplots() already creates a figure, so no separate plt.figure() call
    fig, ax = plt.subplots()
    # this locator puts ticks at regular intervals
    loc = ticker.MultipleLocator(base=0.2)
    ax.yaxis.set_major_locator(loc)
    ax.plot(points)

In [None]:
def evaluate(encoder, decoder, sentence, max_length=MAX_LENGTH):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]
        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            decoder_attentions[di] = decoder_attention.data
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words, decoder_attentions[:di + 1]

In [None]:
def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(pairs)
        print('>', pair[0])
        print('=', pair[1])
        output_words, attentions = evaluate(encoder, decoder, pair[0])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')

# **2. Training the seq2seq model with attention.**

# **Encoder.**
-----------

The encoder stays the same for all models.

In [None]:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

# **Decoder based on vector concatenation.**
-----------




In [None]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs): # encoder_outputs = (k = v)
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

In [None]:
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

1m 45s (- 24m 37s) (5000 6%) 3.0701
3m 20s (- 21m 42s) (10000 13%) 2.5864
4m 55s (- 19m 41s) (15000 20%) 2.3446
6m 31s (- 17m 56s) (20000 26%) 2.1598
8m 6s (- 16m 13s) (25000 33%) 1.9721
9m 42s (- 14m 33s) (30000 40%) 1.8288
11m 25s (- 13m 3s) (35000 46%) 1.7507
13m 17s (- 11m 37s) (40000 53%) 1.6613
14m 56s (- 9m 57s) (45000 60%) 1.6090
16m 33s (- 8m 16s) (50000 66%) 1.5296
18m 9s (- 6m 36s) (55000 73%) 1.4769
19m 44s (- 4m 56s) (60000 80%) 1.4199
21m 20s (- 3m 17s) (65000 86%) 1.3484
22m 57s (- 1m 38s) (70000 93%) 1.3342
24m 32s (- 0m 0s) (75000 100%) 1.3053


In [None]:
evaluateRandomly(encoder1, attn_decoder1)

> вечером мы будем работать .
= we re going to work tonight .
< we re going to work work . <EOS>

> вы пьяны !
= you are drunk !
< you re drunk ! <EOS>

> ты жалкая старуха .
= you re a mean old woman .
< you re a a . . <EOS>

> ты мошенница .
= you re a fraud .
< you re a . <EOS>

> я не очень хорошо говорю по французски .
= i m not very good at speaking french .
< i m not really good french . <EOS>

> мы лучшие .
= we re the best .
< we are friends . <EOS>

> мы в школу опоздаем .
= we re going to be late for school .
< we re going to for for for . <EOS>

> то дерево посадили вы .
= you re the one who planted that tree .
< you re the one who . . <EOS>

> я жду объяснении .
= i m waiting for an explanation .
< i m waiting for . <EOS>

> я учусь водить .
= i m learning how to drive .
< i m learning the house . <EOS>



# **2.a. Training the seq2seq model with dot-product attention.**

# **Decoder with a dot-product attention mechanism.**

In [None]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)  # query tensor
        # Key and value tensors: key = value = encoder_outputs

        scale = 1 / math.sqrt(embedded.size(-1))  # scaling factor 1 / sqrt(d)
        # Scaled dot product of the query with every key
        attn_weights = embedded @ encoder_outputs.unsqueeze(0).transpose(-2, -1) * scale
        attn_weights = torch.softmax(attn_weights, dim=-1)

        output = attn_weights @ encoder_outputs.unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
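The scaled dot-product step can be followed on plain lists. The sketch below is an illustration with made-up 4-dimensional vectors, not the notebook's implementation: each score is the dot product of the query with a key divided by sqrt(d), softmax turns the scores into weights, and the context is the weighted sum of the values. The last key is all zeros to mimic a padded row of `encoder_outputs`.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return weights, context


query = [1.0, 0.0, 0.0, 0.0]
keys = [
    [2.0, 0.0, 0.0, 0.0],  # aligned with the query
    [0.0, 2.0, 0.0, 0.0],  # orthogonal to the query
    [0.0, 0.0, 0.0, 0.0],  # zero row, as in a padded position
]
weights, context = dot_attention(query, keys, keys)  # key = value
print(weights)
print(context)
```

Note that the zero row gets the same weight as the orthogonal key, because both have a score of 0. In the notebook, `encoder_outputs` is zero-padded up to `max_length`, so padded positions also receive some weight; they add nothing to the context vector but reduce the weight available to real positions.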

In [None]:
hidden_size = 256

encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

7m 30s (- 105m 13s) (5000 6%) 3.1216
15m 21s (- 99m 52s) (10000 13%) 2.6437
23m 1s (- 92m 4s) (15000 20%) 2.4223
30m 46s (- 84m 38s) (20000 26%) 2.2158
38m 33s (- 77m 6s) (25000 33%) 2.0704
46m 9s (- 69m 14s) (30000 40%) 1.9561
53m 53s (- 61m 35s) (35000 46%) 1.8481
61m 40s (- 53m 58s) (40000 53%) 1.7357
69m 31s (- 46m 20s) (45000 60%) 1.6671
77m 38s (- 38m 49s) (50000 66%) 1.5846
85m 19s (- 31m 1s) (55000 73%) 1.5252
93m 14s (- 23m 18s) (60000 80%) 1.4588
100m 58s (- 15m 32s) (65000 86%) 1.4061
108m 44s (- 7m 46s) (70000 93%) 1.3469
116m 26s (- 0m 0s) (75000 100%) 1.3062


In [None]:
evaluateRandomly(encoder1, attn_decoder1)

> я уверен что ты занят том .
= i m sure you re busy tom .
< i m sure you tom tom tom . . <EOS>

> мы в ловушке .
= we re trapped .
< we re trapped . <EOS>

> ты идеален .
= you re perfect .
< you re perfect . <EOS>

> я уверен что том может это уладить .
= i m confident tom can fix it .
< i m sure tom tom that tom . <EOS>

> ты очень наивная .
= you re very naive .
< you re very brave . <EOS>

> простите что это заняло столько времени .
= i m sorry this took so long .
< i m sorry that was s this this . <EOS>

> я канадка но живу в австралии .
= i m canadian but i live in australia .
< i m canadian but i australia in australia . <EOS>

> на этот счет вы ошибаетесь .
= you are mistaken about that .
< you re always right this this . <EOS>

> я уверен что видел тома .
= i m certain i saw tom .
< i m sure tom saw tom . . <EOS>

> нам не рады .
= we re not welcome .
< we aren t welcome . <EOS>



# **2.b. Training the seq2seq model with MLP-based attention.**

# **Decoder with an attention mechanism based on a multilayer perceptron.**

In [None]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.Wk = nn.Linear(hidden_size, hidden_size)  # projection of the key tensors
        self.Wq = nn.Linear(hidden_size, hidden_size)  # projection of the query tensors
        self.Va = nn.Linear(hidden_size, 1)  # final layer producing unnormalized attention scores

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs): # encoder_outputs = (k = v)
        embedded = self.embedding(input)
        embedded = self.dropout(embedded)  # query tensor
        # Key and value tensors: key = value = encoder_outputs

        # Unnormalized attention scores, shape (max_length, 1)
        scores = self.Va(torch.tanh(self.Wk(encoder_outputs) + self.Wq(embedded.squeeze(0))))

        # Normalize across source positions (dim=0); softmax over the
        # trailing dimension of size 1 would return all ones.
        attn_weights = F.softmax(scores, dim=0).view(1, -1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))

        output = F.relu(attn_applied)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
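In additive (MLP) attention, the score for position i is v · tanh(Wk k_i + Wq q), and the scores are then normalized with softmax across the source positions. The sketch below works through this on plain lists with made-up 2-dimensional vectors, identity projection matrices, and no bias terms (all of these are simplifying assumptions). It also shows why the normalization axis matters: softmax over a single-element vector is always 1.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_scores(query, keys, Wq, Wk, v):
    # score_i = v . tanh(Wk k_i + Wq q)
    q_proj = matvec(Wq, query)
    scores = []
    for key in keys:
        k_proj = matvec(Wk, key)
        hidden = [math.tanh(a + b) for a, b in zip(k_proj, q_proj)]
        scores.append(sum(vi * h for vi, h in zip(v, hidden)))
    return scores


identity = [[1.0, 0.0], [0.0, 1.0]]
query = [0.5, -0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
scores = additive_scores(query, keys, identity, identity, v=[1.0, 1.0])

weights = softmax(scores)                      # normalized across positions
per_position = [softmax([s]) for s in scores]  # softmax over a size-1 axis

print(weights)
print(per_position)
```

When the scores are stored as a column with one row per position, the softmax therefore has to run along the position axis; along the size-1 axis every weight equals 1 regardless of the scores, and the scoring layers receive no useful gradient.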

In [None]:
hidden_size = 256

encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

2m 11s (- 30m 34s) (5000 6%) 3.1065
4m 12s (- 27m 18s) (10000 13%) 2.5642
6m 13s (- 24m 55s) (15000 20%) 2.3676
8m 17s (- 22m 47s) (20000 26%) 2.2167
10m 19s (- 20m 38s) (25000 33%) 2.1052
12m 20s (- 18m 31s) (30000 40%) 1.9818
14m 20s (- 16m 22s) (35000 46%) 1.9373
16m 18s (- 14m 16s) (40000 53%) 1.8563
18m 19s (- 12m 12s) (45000 60%) 1.8054
20m 18s (- 10m 9s) (50000 66%) 1.7846
22m 19s (- 8m 7s) (55000 73%) 1.7147
24m 20s (- 6m 5s) (60000 80%) 1.6686
26m 21s (- 4m 3s) (65000 86%) 1.6357
28m 22s (- 2m 1s) (70000 93%) 1.6385
30m 21s (- 0m 0s) (75000 100%) 1.5892


In [None]:
evaluateRandomly(encoder1, attn_decoder1)

> вы еще молоды и неопытны .
= you re still young and inexperienced .
< you re still young and and . <EOS>

> я старею .
= i m getting older .
< i m a a <EOS>

> вы нарушаете мои гражданские права .
= you re violating my civil rights .
< you re thinking with . . <EOS>

> мы входим .
= we re coming in .
< we re the . <EOS>

> я так рад что сегодня тепло .
= i m so glad it s warm today .
< i m so glad back an to . . <EOS>

> я очень голоден .
= i m very hungry .
< i m very hungry . <EOS>

> прости что я в тебя выстрелил .
= i m sorry i shot you .
< i m sorry i doubted you . . <EOS>

> мне трудно сосредоточиться .
= i m having trouble focusing .
< i m having with him . <EOS>

> боюсь вам дали неправильныи номер .
= i am afraid you have the wrong number .
< i m afraid you have wrong wrong . <EOS>

> я беспристрастная .
= i m impartial .
< i m studying . <EOS>



# **3. Evaluating the quality of the models.**

The final loss values are about 1.31 for the first two models (1.3053 and 1.3062) and about 1.59 for the last one. The third model, with MLP-based attention, is the most complex of the three, so its loss could potentially be reduced further with longer training.
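The training loss alone gives a limited picture, so a few simple measures on the generated sentences may help. The sketch below is an assumption-laden illustration, not an evaluation of the models: it converts a loss to perplexity (treating the reported loss as a rough per-token negative log-likelihood on training data), and computes exact-match rate and clipped unigram precision on three reference/output pairs copied from the sample outputs of the dot-product run. The helper names are hypothetical.

```python
import math
from collections import Counter


def perplexity(loss):
    # Assumes the loss is an average negative log-likelihood per token.
    return math.exp(loss)


def clean(line):
    return [t for t in line.split(' ') if t != '<EOS>']


def unigram_precision(reference, hypothesis):
    ref, hyp = clean(reference), clean(hypothesis)
    if not hyp:
        return 0.0
    # Counter intersection clips each word count at its count in the reference.
    overlap = Counter(ref) & Counter(hyp)
    return sum(overlap.values()) / len(hyp)


samples = [
    ('we re trapped .', 'we re trapped . <EOS>'),
    ('you re perfect .', 'you re perfect . <EOS>'),
    ('you re very naive .', 'you re very brave . <EOS>'),
]

exact = sum(clean(r) == clean(h) for r, h in samples) / len(samples)
precisions = [unigram_precision(r, h) for r, h in samples]

print('perplexity at loss 1.3: %.2f' % perplexity(1.3))
print('perplexity at loss 1.6: %.2f' % perplexity(1.6))
print('exact match:', exact)
print('unigram precision:', precisions)
```

For a fair comparison of the three decoders, such measures would need to be computed on the same held-out set of pairs rather than on random training examples.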