# Homework №2

This homework will be dedicated to **ASR & Co**.

In general, you may implement any ASR model that was discussed in the lecture,
but we recommend to implement **QuartzNet**.

## **Important aspects (model)**
1) Pay attention on different length of utterances. P.S. **masking**.
    
2) A good ASR is a robust ASR, so we ask you to implement and use at least **4 types of augmentations** (P.S. 2 seminar).

3) Also, to get better quality, we ask you to implement a **beam search** for better decoding.

4) (Bonus) As a bonus you can use **BPE** instead of Char. You can use SentencePiece, HuggingFace or YouTokenToMe.

5) (Bonus) As a bonus you can take pretrained **LM** (or train yourself) and fusing LM with ASR.
    Way of fusing you may choose yourself.

## **Important aspects (code)**
1) You already know about pytorch-lighting (I hope :)) but you are not allowed to use it in this homework.

2) Try to write code more structurally and cleanly !

3) Good logging of experiments save your nerves and time,
    so we ask you to use **W&B** and log at least loss, WER, CER and pairs (audio -- recognized text).
    **Do not remove** the logs until we have checked your work and given you a grade!

4) We also ask you to organize your code in github repo with Docker and setup.py. You can use my template https://github.com/markovka17/dl-start-pack.

5) Your work **must be** reproducable, so fix seed, save the weights of model, and etc.

6) In the end of your work write inference utils. Anyone should be able to take your weight, load it into the model and run it on some audio track.

## Data

CommonVoice Mozilla: https://commonvoice.mozilla.org/en/datasets
Он весит 50 гигабайт. В нем значительно больше коротких записей, что должно ускорить сходимость методов на нем.
Можно обучиться на нем всем или просто отщипнуть себе кусочек.

Все еще настоятельно рекомендую препроцессить датасет и выкидывать все записи длиннее N-секунд (а если вы учите не CTC, то стоит еще дополнительно выкидывать все записи длиннее  K символов), чтобы максимизировать размер батча.

##### config

https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/quartznet_15x5.yaml

### Начинаем решение

In [1]:
import wandb

In [172]:
import string
import re
import math
import random
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
import librosa
import asrtoolkit
import torch_optimizer



from tqdm import tqdm
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from torch.optim.lr_scheduler import StepLR
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch import distributions

from collections import Counter
from IPython import display as display_
%pylab inline

Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy
  "\n`%matplotlib` prevents importing * from pylab and numpy"


In [221]:
BATCH_SIZE = 80
NUM_EPOCHS = 10
N_MELS     = 64

In [222]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

### Датасет

In [223]:
class TrainDataset(torch.utils.data.Dataset):
    """Custom competition dataset."""

    def __init__(self, csv_file, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.answers = pd.read_csv(csv_file, '\t')
        self.transform = transform


    def __len__(self):
        return len(self.answers)


    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        utt_name = 'cv-corpus-5.1-2020-06-22/en/clips/' + self.answers.loc[idx, 'path']
        utt = torchaudio.load(utt_name)[0].squeeze()
        
        if len(utt.shape) != 1:
            print(utt.shape)
            print(utt)
            utt = utt[1]
            
        answer = self.answers.loc[idx, 'sentence']

        if self.transform:
            utt = self.transform(utt)

        sample = {'utt': utt, 'answer': answer}
        return sample

In [224]:
class TestDataset(torch.utils.data.Dataset):
    """Custom competition dataset."""

    def __init__(self, csv_file, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.names = pd.read_csv(csv_file, '\t')
        self.transform = transform


    def __len__(self):
        return len(self.names)


    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        utt_name = 'cv-corpus-5.1-2020-06-22/en/clips/' + self.names.loc[idx, 'path']
        utt = torchaudio.load(utt_name)[0].squeeze()
  

        if self.transform:
            utt = self.transform(utt)

        sample = {'utt': utt}
        return sample

#### Аугментации

In [225]:
def transform_tr(wav):
    aug_num = torch.randint(low=0, high=3, size=(1,)).item()
    augs = [
        lambda x: x,
        lambda x: (x + distributions.Normal(0, 0.01).sample(x.size())).clamp_(-1, 1),
        lambda x: torchaudio.transforms.Vol(.1)(x)
    ]
    
    return augs[aug_num](wav)

#### Визуализация

In [226]:
def viz(wav):
    figsize(20, 5)
    plot(wav)
    plt.show()

    display_.display(display_.Audio(wav, rate=48000, normalize=False))

#### Класс для кодировки текста (0 - blank_token)

In [227]:
class TextTransform:
    def __init__(self):
        self.char_dict = {}
        self.index_dict = {}
        
        self.char_dict['\''] = 0
        self.index_dict[0] = '\''
        self.char_dict[' '] = 1
        self.index_dict[1] = ' '
        for i, let in enumerate(string.ascii_lowercase):
            self.index_dict[i + 2] = let
            self.char_dict[let] = i + 2
            
    def text_to_int(self, text):
        labels = []
        for let in text:
            labels.append(self.char_dict[let])
        return labels
    
    def int_to_text(self, labels):
        text = []
        for num in labels:
            text.append(self.index_dict[num])
        return text

#### Collate_fn

In [243]:
#win_len=1024, hop_len=256
# counting len of MelSpec before doing it (cause of padding)
def mel_len(x):
    return int(x // 256) + 1

In [229]:
def preprocess_data(data):
    text_transform = TextTransform()
    wavs = []
    input_lens = []
    labels = []
    label_lens = []
    
    for el in data:
        wavs.append(el['utt'])
        input_lens.append(math.ceil(mel_len(el['utt'].shape[0]) / 2))    # cause of stride 2
        label = torch.Tensor(text_transform.text_to_int(el['answer']))
        labels.append(label)
        label_lens.append(len(label))

    wavs = pad_sequence(wavs, batch_first=True)
    labels = pad_sequence(labels, batch_first=True)
    
    return wavs, input_lens, labels, label_lens    

In [230]:
# Loading data and loaders
my_dataset = TrainDataset(csv_file='train_preprocessed.tsv', transform=transform_tr)
print('all train+val samples:', len(my_dataset))
test_dataset = TestDataset(csv_file='cv-corpus-5.1-2020-06-22/en/test.tsv', transform=None)

all train+val samples: 435947


100k files ~ 10GB

95_percentile_len = 412416.0

Отсортированные по длинне звука индексы объектов. 

In [236]:
# sorted indexes
with open('sorted.npy', 'rb') as f:
    s = np.load(f)

In [237]:
to_save = s[:100000][:, 0]

In [239]:
my_dataset = torch.utils.data.Subset(my_dataset, to_save)
train_set, val_set = torch.utils.data.random_split(my_dataset, [85000, 15000])

#### Даталоадеры

In [241]:
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE,
                          shuffle=True, collate_fn=preprocess_data, drop_last=True,
                          num_workers=0, pin_memory=True)

val_loader = DataLoader(val_set, batch_size=BATCH_SIZE,
                        shuffle=True, collate_fn=preprocess_data, drop_last=True,
                        num_workers=0, pin_memory=True)

test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, drop_last=True)

## Модель

#### Фикс сидов и смотрим кол-во параметров

In [245]:
def set_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
set_seed(21)

In [281]:
def count_parameters(model):
    model_parameters = filter(lambda p: p.requires_grad, model.parameters())
    return sum([np.prod(p.size()) for p in model_parameters])

#### Вспомогательные блоки

In [247]:
def conv_bn_act(in_size, out_size, kernel_size, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv1d(in_size, out_size, kernel_size, stride, dilation=dilation),
        nn.BatchNorm1d(out_size),
        nn.ReLU()
    )


def sepconv_bn(in_size, out_size, kernel_size, stride=1, dilation=1, padding=None):
    if padding is None:
        padding = (kernel_size-1)//2
    return nn.Sequential(
        torch.nn.Conv1d(in_size, in_size, kernel_size, 
                        stride=stride, dilation=dilation, groups=in_size,
                        padding=padding),
        torch.nn.Conv1d(in_size, out_size, kernel_size=1),
        nn.BatchNorm1d(out_size)
    )

In [248]:
# Основной блок B_i
class QnetBlock(nn.Module):
    def __init__(self, in_size, out_size, kernel_size, stride=1,
                R=5):
        super().__init__()
        
        self.layers = nn.ModuleList(sepconv_bn(in_size, out_size, kernel_size, stride))
        for i in range(R - 1):
            self.layers.append(nn.ReLU())
            self.layers.append(sepconv_bn(out_size, out_size, kernel_size, stride))
        self.layers = nn.Sequential(*self.layers)
        
        self.residual = nn.ModuleList()
        self.residual.append(torch.nn.Conv1d(in_size, out_size, kernel_size=1))
        self.residual.append(torch.nn.BatchNorm1d(out_size))
        self.residual = nn.Sequential(*self.residual)
    
    def forward(self, x):
        return F.relu(self.residual(x) + self.layers(x))

In [249]:
class QuartzNet(nn.Module):
    def __init__(self, n_mels, num_classes):
        super().__init__()
                    
        self.c1 = sepconv_bn(n_mels, 256, kernel_size=33, stride=2)
                  
        self.blocks = nn.Sequential(
                #         in   out  k   s  R
                QnetBlock(256, 256, 33, 1, R=5),
                QnetBlock(256, 256, 39, 1, R=5),
                QnetBlock(256, 512, 51, 1, R=5),
                QnetBlock(512, 512, 63, 1, R=5),
                QnetBlock(512, 512, 75, 1, R=5)
        )
        
        self.c2 = sepconv_bn(512, 512, kernel_size=87, dilation=2, padding=86)
        
        self.c3 = conv_bn_act(512, 1024, kernel_size=1)
        
        self.c4 = conv_bn_act(1024, num_classes, kernel_size=1)
        
        self.init_weights()


    def init_weights(self):
        pass


    def forward(self, x):
        c1 = F.relu(self.c1(x))
        blocks = self.blocks(c1)
        c2 = F.relu(self.c2(blocks))
        c3 = self.c3(c2)
        return self.c4(c3)

#### обратим внимание, что c1 и c2 тоже separable

### Обучение 

In [242]:
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,           ### 22050, 48000
    n_fft=1024,
    hop_length=256,
    n_mels=N_MELS                ### 64,    80
).to(device)


# with augmentations
melspec_transforms = nn.Sequential(
    torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=1024, hop_length=256,  n_mels=N_MELS),
    torchaudio.transforms.FrequencyMasking(freq_mask_param=15),
    torchaudio.transforms.TimeMasking(time_mask_param=35),
).to(device)

In [251]:
def train_epoch(model, optimizer, dataloader, CTCLoss, device, melspec_transforms):
    model.train()
    
    losses = []
    
    for i, (wavs, wavs_len, answ, answ_len) in tqdm(enumerate(dataloader)):
        wavs, answ = wavs.to(device), answ.to(device)
        
        trans_wavs = torch.log(melspec_transforms(wavs) + 1e-9)
        
        optimizer.zero_grad()
            
        output = model(trans_wavs)
        output = F.log_softmax(output, dim=1)
        output = output.transpose(0, 1).transpose(0, 2)        
        
        loss = CTCLoss(output, answ, wavs_len, answ_len)
        loss.backward()        
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), 15)
        optimizer.step()
        losses.append(loss.item())
        if i % 100 == 0:
            wandb.log({'mean_train_loss':loss})
            preds, targets = decoder_func(output, answ, answ_len, del_repeated=False)
            wandb.log({"CER_train": cer(targets[0], preds[0])})
            wandb.log({"WER_train": wer(targets[0], preds[0])})
        
    return np.mean(losses)

In [252]:
def train(model, opt, train_dl, scheduler, CTCLoss, device, n_epochs, val_dl=None):
    for epoch in range(n_epochs):
        print("Epoch {} of {}".format(epoch, n_epochs), 'LR', scheduler.get_last_lr())
        
        mean_loss = train_epoch(model, opt, train_dl, CTCLoss, device)
        print('MEAN EPOCH LOSS IS', mean_loss)
        
        scheduler.step()
        
        if (val_dl != None):
            test(model, opt, val_dl, CTCLoss, device)
            
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': opt.state_dict(),
            'scheduler_state_dict': scheduler.state_dict()
        }, 'epoch_5_and_'+str(epoch))

In [253]:
def decoder_func(output, answ, answ_lens, blank_label=0, del_repeated=True):    
    decoded_preds, decoded_targs = [], []
    
    text_transform = TextTransform()

    batch_freqs = torch.argmax(output, dim=2).transpose(0, 1)
        
    for i, freqs in enumerate(batch_freqs):
        preds = []
        
        decoded_targs.append(
            text_transform.int_to_text(answ[i][:answ_lens[i]].tolist())
        )
        
        for j, num in enumerate(freqs):
            if num != blank_label:
                if del_repeated and j != 0 and num == freqs[j-1]:
                    continue
                preds.append(num.item())
        decoded_preds.append(text_transform.int_to_text(preds))

    return decoded_preds, decoded_targs    

In [254]:
def cer(target, pred):
    cer_res = asrtoolkit.cer(''.join(target), ''.join(pred))
    return cer_res

def wer(target, pred):
    wer_res = asrtoolkit.wer(''.join(target), ''.join(pred))
    return wer_res

In [282]:
def test(model, optimizer, dataloader, CTCLoss, device, melspec, bs_width=None):
    model.eval()
    
    cers, wers, cers_bs, wers_bs = [], [], [], []
    losses = []
    
    with torch.no_grad():
        for i, (wavs, wavs_len, answ, answ_len) in enumerate(dataloader):
            wavs, answ = wavs.to(device), answ.to(device)

            trans_wavs = torch.log(melspec(wavs) + 1e-9)

            output = model(trans_wavs)
            if bs_width != None:
                output_bs = F.softmax(output, dim=1).transpose(0, 1).transpose(0, 2)
                preds_bs, targets_bs = beam_search_decoding(output_bs, answ, answ_len, width=bs_width)
                
            output = F.log_softmax(output, dim=1)
            output = output.transpose(0, 1).transpose(0, 2)
            loss = CTCLoss(output, answ, wavs_len, answ_len)
            losses.append(loss.item())
            
            # argmax
            preds, targets = decoder_func(output, answ, answ_len, del_repeated=True)
            # beam search
            
            for i in range(len(preds)):
                if i == 0:
                    print('target: ', ''.join(targets[i]))
                    print('prediction: ', ''.join(preds[i]))
                    print('beam_search_preds: ', ''.join(preds_bs[i]))
                    
                cers.append(cer(targets[i], preds[i]))
                wers.append(wer(targets[i], preds[i]))
                cers_bs.append(cer(targets_bs[i], preds_bs[i]))
                wers_bs.append(wer(targets_bs[i], preds_bs[i]))
                
        avg_cer = np.mean(cers)
        avg_wer = np.mean(wers)
        if bs_width != None:
            avg_cer_bs = np.mean(cers_bs)
            avg_wer_bs = np.mean(wers_bs)

        
        print('CER', avg_cer)
        print('CER BS', avg_cer_bs)
        
        print('WER', avg_wer)
        print('WER BS', avg_wer_bs)
        #wandb.log({"CER_val": avg_cer})    
        #wandb.log({"WER_val": avg_wer})
        avg_loss= np.mean(losses)
        print('average test loss is', avg_loss)
        #wandb.log({'mean_VAL_loss':avg_loss})


In [256]:
wandb.login()
wandb.init()
train_table = wandb.Table(columns=["Predicted Text", "True Text"])

NameError: name 'wandb' is not defined

In [257]:
model = QuartzNet(n_mels=64, num_classes=28)
print(count_parameters(model))
model.to(device)
wandb.watch(model)

In [259]:
#model = QuartzNet(n_mels=64, num_classes=28)
#num_of_params = count_parameters(model)
#model.to(device)
#wandb.watch(model)
#opt = torch.optim.RMSprop(model.parameters(), weight_decay=0.0001)
#scheduler = StepLR(opt, step_size=1, gamma=0.9) 
#checkpoint = torch.load('epoch_5', map_location=torch.device('cpu'))
#model.load_state_dict(checkpoint['model_state_dict'])

<All keys matched successfully>

In [260]:
opt = torch_optimizer.NovoGrad(
                        model.parameters(),
                        lr=0.01,
                        betas=(0.8, 0.5),
                        weight_decay=0.001,
)
scheduler  = CosineAnnealingLR(opt, T_max=50, eta_min=0, last_epoch=-1) # ###### TMAX = MAX NUM OF EPOCHS

#opt = torch.optim.RMSprop(model.parameters(), weight_decay=0.0001)
#scheduler = StepLR(opt, step_size=1, gamma=0.9) 

In [261]:
CTCLoss = nn.CTCLoss(blank=0).to(device)

In [None]:
#train(model, opt, train_loader, scheduler, CTCLoss, device,
#     n_epochs=NUM_EPOCHS, val_dl=val_loader)

0it [00:00, ?it/s]

Epoch 0 of 150 LR [0.01]


173it [10:03,  3.52s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


231it [13:26,  3.48s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


377it [21:57,  3.47s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


523it [30:29,  3.47s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


526it [30:41,  3.63s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


824it [48:07,  3.49s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


902it [52:43,  3.91s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


1062it [1:02:03,  3.51s/it]


MEAN EPOCH LOSS IS 3.3313134944596086
target w l hough in command
prediction he si i   
target the conservatives soon found themselves leading in the polls
prediction thi sis  i 
target it is centered on the new haven and its suburbs
prediction h  
target gairs actions helped to precipitate a double dissolution
prediction the sisi a  
target he is buried in saint albans cathedral
prediction h  
target the national debt had skyrocketed
prediction h  
target royal scottish orchestra
prediction he i i  
target he attended ohio university
prediction the sis i  
target the show is also available as a podcast available for purchase
prediction h 
target she is married
prediction he e   
target young team of basketball players winning the game
prediction he i i  
target two children play on the road on a snowy day
prediction he i i 
target you know st albans sir
prediction h  
target the photographs show the cup in an uncleaned state
prediction si i 
target rowses design was in streamline mode

target a young baby boy is about to sneeze next to his mother
prediction h 
target i know glenn wants it to continue
prediction o 
target another example is north sea cod
prediction i  
target the region is called kosovo pomoravlje
prediction the i  a 
target a small sample is taken from the specimen to be investigated
prediction i
target goren comes to think of her as his white whale
prediction o 
target zettel was born in sacramento california
prediction she 
target the new slogan your life
prediction h 
target the dolls are then dressed in traditional mayan style
prediction the i i 
target i was shattered at the news she had passed away
prediction he i i  
target he supports expanding the current cap on charter schools
prediction the sis  i 
target the mets previously played at dobynsbennett high school
prediction thi sis  a 
target in december he undertook a survey of western port
prediction h  
target it was only that you looked odd
prediction h  
target it is very similar to cyan

0it [00:00, ?it/s]

target this completes the proof of the claim
prediction he i i  
average test loss is 2.79661633496616
Epoch 1 of 150 LR [0.009990133642141357]


352it [20:29,  3.49s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


460it [26:49,  3.50s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


462it [26:56,  3.51s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


516it [30:05,  3.50s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


573it [33:25,  3.51s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


581it [33:53,  3.46s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


695it [40:35,  3.50s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


1062it [1:02:04,  3.51s/it]


MEAN EPOCH LOSS IS 2.341327511545853
target bauer was born in graz styria
prediction aon as sto paas senea 
target in an instant she was free
prediction ena tan te ta 
target dont you think thats a rather risky course of action
prediction ation tes o ome as ti col ation
target a ripping speech he said
prediction e re ans pa esant
target ok so i was a little dramatic but you saw what happened
prediction hetao as ooltion i ae sol acin
target it is known as the polka palace
prediction it is knoas to fol of coles 
target the couple live in highgate north london
prediction the coe ras  in dae o monton 
target the family settled in toronto
prediction the son e seon te taton 
target thus he employed historical perspectives to serve philology
prediction tosin pos oo estantos e ser fooai
target she is after all a former goldwyn girl
prediction t as a tollas on a bal an ro 
target huave is now considered an isolate
prediction me is nolciontea in ai 
target in short i had come down on purpose
pre

target he named the tortoise after his godson
prediction he mans te colos o  poson 
target the white dog runs across the snow
prediction the wi ton bon e case st
target it is southsouthwest of the crater moretus
prediction thea sa ss asion te traof toraes
target the song became a modest success as a result
prediction the so in tecan an moe sota sesoein
target the facility in sault ste
prediction hi   an sol scat 
target she tried to participate but nobody listened
prediction she ane proti eaed of ome lain 
target pelias kills him and takes the fleece
prediction thewi o an pa to fis 
target they are named interindustrial standards
prediction the a in t in ar in toscols canders
target it is limited to twochannel output
prediction it is onan toe tro canal apa
target fr diese blutgrtsche mit schraubstollen gab es zurecht die rote karte
prediction s to tas o tain non seati os caer 
target i didnt have friends
prediction ta in  caains 
target whitehorse mountain of the north cascades
predict

0it [00:00, ?it/s]

target it is the third single and final track from the album
prediction itis atu si an aen sac for aro
average test loss is 2.062117064700407
Epoch 2 of 150 LR [0.009960573506572389]


5it [00:19,  3.91s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


143it [08:19,  3.49s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


334it [19:29,  3.51s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


532it [31:09,  3.49s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


931it [54:28,  3.50s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


995it [58:11,  3.49s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


1003it [58:41,  3.78s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


1062it [1:02:06,  3.51s/it]


MEAN EPOCH LOSS IS 1.9229573038338268
target she might be intimate and sad with him
prediction she ritecin coad safer 
target it has recently undergone a renovation
prediction i asersonla e din iweranation 
target his position is prop
prediction his psition mi tro 
target shall we just get a takeaway tonight
prediction te wegia tapoi andi 
target arthur went outside for some coal
prediction ater wanolsi fosion chald 
target it is a hardy perennial plant
prediction the pas p hary proin lpa
target birsen was born in ankara turkey
prediction theriston wa bo an cono terf turtie
target the scheringa museum of realist art is in spanbroek
prediction the finom musiom ofvrelastar is an spogork
target his plan became known as the ripley plan
prediction es tin becam non as te briple ti 
target there s cyclist in street
prediction thes sitis tan ste 
target the woman is wearing a purple shirt and runs on the beach
prediction the women is wi of opo thar a ron opi
target this equity could be used to

target it was now eleven oclock
prediction it was aot ilean apok 
target snow was observed to fall from cirrus clouds
prediction to was olte te fiton sai stoe 
target it is the county seat of stephens county
prediction it is a couyseto  st fo cody
target the series was then discontinued
prediction the ciis was tan dis ontene
target four individuals walk down a street with two holding hands and one holding a guitar
prediction trr aas mn orstrin o to oin ons an wonorin toto
target after their exchange the two departed coolly
prediction atoraer a tins sme tror air et tudin
target made his own lake
prediction the is ol li 
target the school has been involved in two controversial incidents to date
prediction the sco as anfol an turcutorrershialins iits toda
target it is insectivorous and nocturnal
prediction it is an setibors a no to 
target it is currently open as a museum
prediction it is curntly i anas n uan 
target the music in the album was all composed and arranged by kikuta
predictio

target a clean version of the album was also made
prediction a tem bied of the olte was also mad 
target what was that something
prediction ofos da sotin 
target he was additionally trained in luta livre and boxing
prediction he was a dismny taed in yotey far and pusing


0it [00:00, ?it/s]

target it is the oldest known exposed rock in the world
prediction hes ecordes ns sti te ia
average test loss is 1.7388111671661948
Epoch 3 of 150 LR [0.009911436253643442]


77it [04:28,  3.48s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


195it [11:22,  3.54s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


458it [26:47,  3.49s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


779it [45:32,  3.48s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


858it [50:09,  3.48s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


887it [51:51,  3.47s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


998it [58:21,  3.49s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


1062it [1:02:06,  3.51s/it]


MEAN EPOCH LOSS IS 1.6946504578527533
target this makes sure they are red and firm for storage
prediction this mak ster ther ro infor fer str
target thus the stimulus intensities are based on various fibers
prediction tes tesenniss ancensiti arpas onveris pivers
target gamespots frank provo reviewed the virtual console version of the game
prediction espos pren crovo ret te vr in conober it bo
target you cant sleep
prediction the con sr 
target the city is served by the richland school district
prediction the sily is sre bytediction sco tister
target a man in a blue shirt holding a protest sign
prediction e mani  boser con rodesin
target it was produced by lil jon
prediction it was pridys palilc 
target a prerelease enclosure was also constructed on the somerset levels
prediction theperrely infloter ws also contrecte on the sere o loes
target the two railroads cross downtown
prediction oo ralon to mon
target in practice the act only delayed the removal of the feeding tube
prediction itn

target gilbert dayles what would a gentleman do
prediction doer da wt we itentolen de
target repeatedly people ask him where he is from
prediction tepate y people as o wervin sco
target the city houses sombor airport
prediction the siy as e sonbor ar coy 
target young children stand outside on a dirt surface
prediction yo sin en asto osi on e to su
target we are in the milky way galaxy
prediction nearin the me toly groloyr 
target they have worked with los condores
prediction the ae war toth los comndonas
target connecting with union station and olvera street
prediction tomactin wot te min sta ion ntovearsto
target he is married and has two children
prediction the is mar an has to cilbren 
target but the fears proved groundless
prediction the tofer pre craunposr 
torch.Size([2, 198144])
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 4.8324e-05, 7.6901e-05,
         3.2309e-05],
        [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 4.8324e-05, 7.6901e-05,
         3.2309e-05]])
target 

target several different political spectra have been proposed
prediction sever defr pis cospctr a bepirs
target the couples daughter ivana is also an actress
prediction the coples oa bonmi is also acces
target the loud crackers went off during the festival
prediction thee log cricpers war of duin tefetiar 
target its headquarters is located in kabul
prediction the sa coers was locater ea cophe wor
target all but fyshwick have book shops
prediction o t tis wri a bsat
target he sprang excitedly to his feet
prediction the scran ise co to e scor 
target sellers would become a longlasting close friend
prediction so  be coe lon los in clus sri 
target my hat it has three corners
prediction maha n has ore co


0it [00:00, ?it/s]

target youre worth more dead than alive
prediction ber wr te e al
average test loss is 1.6230953842560876
Epoch 4 of 150 LR [0.009842915805643154]


166it [09:42,  3.48s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


325it [19:00,  3.50s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


394it [23:01,  3.49s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


465it [27:11,  3.48s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


478it [27:57,  3.50s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


721it [42:10,  3.46s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


727it [42:31,  3.49s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


1062it [1:02:08,  3.51s/it]


MEAN EPOCH LOSS IS 1.5611987014050313
target anderlecht won on coin toss
prediction  derlec wa an qintes
target the hallway was smeared in blood
prediction the oi was sbar inla
target its county seat is eureka
prediction its county sat is eriter 
target woman and baby waiting at an airport
prediction weme wotae wate itear fo 
target he served under general arthur currie
prediction theshere abuber ana arer puis 
target saint francis called him the knight of our round table
prediction san fani sporvin the ie offarrol ti
target the opposite concept is lusophilia
prediction the opaic coensettii affiri
target while the girls learned to play the guitar and dance
prediction p the gos erin epla it tetar in tatss 
target see sons of odin
prediction sa a snn ofvirtent 
target all of the fun was out of it
prediction wi of the porin was o tolet
target no player has won the award more than once
prediction ne laer has o jowar moring wans
target innis and taught canadian economic history
prediction h

target it was named after lake trasimeno
prediction it was naeed of the la trasin nenneve
target it is usually stored cold under argon
prediction it is usly stor co on er orgon 
target mamboundou was born in mouila
prediction the bode was born in woila 
target it is the premier baron in the irish peerage
prediction it is the pomer barin in the iwis purlage 
target she would never speak to him
prediction she wt teiverstic to hine 
target she sang harold arlens a sleepin bee
prediction she san hero orlins te swean ba
torch.Size([2, 198144])
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 4.8324e-05, 7.6901e-05,
         3.2309e-05],
        [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 4.8324e-05, 7.6901e-05,
         3.2309e-05]])
target after three months the revolt in babylonia had ended
prediction batti whremods tevol te bobelo iy hadende  
target kinski began to work with director werner herzog
prediction shens ty begete wer it terecter for ther harsa 
target postwar reconstruction 

target despite the show being picketed the event was transmitted as intended
prediction this py te shobincceted he avet wi tras miti asentented
target a woman walking with her umbrella
prediction a woman walking wit her an boen 
target were in a cinema and we expect something epic
prediction weriasiamol in weispec somting at bic
target the crowd scenes were shot using cranes and helicopters
prediction the pra sans wer shar is in rins avovems


0it [00:00, ?it/s]

target the bill passed the senate but died in a house committee
prediction the bil past the sennet y daed in a hos commity
average test loss is 1.5106954778579467
Epoch 5 of 150 LR [0.009755282581475767]


56it [03:14,  3.46s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


237it [13:47,  3.48s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


312it [18:11,  3.51s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


418it [24:22,  3.48s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


643it [37:30,  3.44s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


709it [41:23,  3.53s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


927it [54:12,  3.49s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


1062it [1:02:04,  3.51s/it]


MEAN EPOCH LOSS IS 1.4634145897885051
target the town of farmersville is on the south
prediction the cow of fammers vi is onthe sese
target it is found in brazil peru and bolivia
prediction it as sonde vesel arre an bolevera
target its all there
prediction this at thatseste 
target he is resleeved from a backup
prediction he is reasllyv frommo bakoe
target they can also be used to specify temporary speed limits
prediction the  also beused i tus hig comprry etemens
target some languages have not been identified
prediction som mi wges a not tei amafid
target the debt has since been paid and the service restored
prediction the gor osetin hi fommofover estol
target a little girl is running
prediction a lote gois fonninhtel 
target the chronicle consists of two parts
prediction the cri fel consis te kwofor
target but going home was out of the question
prediction the doin hon was at of the qestone
target mccanless son immediately rushed into the building
prediction wecambis as son am bete  l

target jennings is a supporter of chelsea football club
prediction jendings is a soqorter of chse fo bo clobe
target tennessee williams because of his delicacy
prediction tcennity ilams the casethis sdolo tisesese
target people running or walking on a field
prediction eople roning a walkng on a faildt 
target the crashed meteor left a gigantic crater on the ground
prediction the crestme toon of the jegenti creter on the grondtloe 
target to tell the truth  he had forgotten them
prediction the ferman tot e a foeedn
target he has it
prediction h has ite 
target the springdale free public library serves the borough
prediction thi brodel fee poli iven sevs te bar
target they carried out salt manufacture
prediction tpaka  as so lne fatfor
target the numerals glow amber
prediction te neeros colaamberltelt 
target he was forbidden any further writing
prediction he was frorpidin meiny forta ritdinge
target the centre also runs a squash league
prediction the senter also run as wisled
target giv

target each species has a distinct number of rings
prediction eac sapesese a desten the mer rins
target a woman in a blue jacket is drinking tea from a cup
prediction a woman in e ble jackit is dentking tefrom acotste 
target his father was a professor of greek
prediction his father was befesser of greke
target with his death the rebellion ceased
prediction wath his da the rebol inceasedse
target it is still in production
prediction it s tan an bodivtione
target it is the county seat of stephens county
prediction it is the counti sea of  sterren condy
target the do not resuscitate order was withdrawn
prediction the  do not reesa ta oter was with fron
target for legal scholars several issues are important
prediction orevas cal as sirvelie sos ont fortents
target they are also equipped with beach couches
prediction they are also a grop of be couteslteo 
target a man lying on the grass next to a small lake
prediction a man win o e taes sis momae


0it [00:00, ?it/s]

target i am growing wary
prediction wi m groing wanryt 
average test loss is 1.3742885117862313
Epoch 6 of 150 LR [0.009648882429441256]


35it [02:02,  3.49s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


240it [13:57,  3.48s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


250it [14:32,  3.47s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


344it [20:01,  3.49s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


801it [46:49,  4.08s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


984it [57:29,  3.49s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


1000it [58:25,  3.47s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


1062it [1:02:05,  3.51s/it]


MEAN EPOCH LOSS IS 1.3780443376515994
target his hero being randy macho man savage
prediction is her  ban ndy mochurman sha age
target he died when the pourquoipas
prediction he died fen  bash
target achilles saves her
prediction i cite siy ca
target the production received both critical and popular acclaim
prediction the erdtion is ca for fritetanand ocoactan
target if a score line is present the knife is safe
prediction it escr onas preset ten natecaf
target you wont have a lawyer
prediction ye  havmen o
target it has also been established in the wild in colombia
prediction it is also enistavish tom the faun o comobee
target it is located in the vincennes historic district
prediction   setea atataa
target this file system has gone through ten versions
prediction this ficistron has con to genversans
target garforth has two railway stations
prediction arsorhav to rran witaso
target the community was built in four quadrants
prediction the comminitem was bil in fol portrentse
target you 

target these animals should be housed alone because they are highly territorial
prediction he can inoso fe hau aman soce a ha i srefiar
target im so happy
prediction in sav hapfi
target please search for the thrilling cities photograph
prediction a ser for the foring sudyis fartagraf
target arkose is typically grey to reddish in colour
prediction arcoces iffecytra outihan cholar
target shabazz was born and raised in the bronx
prediction shaa was born and vrased in he pran
target the event developed from the ancient pentathlon
prediction   han tdo nat a trornteention cantafon
target a man behind either a fence or cage
prediction a man in ben is aunt a n art to
target it is the second rama game to be produced
prediction it is a seco a romenantoe frodu
target it consists of the village of ennetbaden which is a suburb of baden
prediction it consessto the dillagof anaga an wic es a sunder o fa
target however it also removes voice dialogue
prediction herenacat a srreis to e stiouc
target the

target china began to reach its height
prediction hanam bexounturrat ot higt
target he has worked as a journalist and author
prediction he has bri tos a ganais tonoha
target clubs for cricket and football have junior sections
prediction a aean  taosae
target he commissioned the construction of aston hall soon after
prediction hecremision the contructiona ha on hal shoufter
target i know you are surprised
prediction i nowy yar su praced
target more recently he has written on love and caring
prediction morres ola as wen on loven tern
target then ill stay still
prediction an n sta sci
target crete was once a contender for county seat
prediction ra was on e contender for county sek
target ranew was born in albany georgia
prediction rde asfrorn e albaefshurceehe
target a soldier is holding a gun and crouched down with is mouth covered by a cloth
prediction a shouder s hopina doun an crotc don wit is mo cono bafo
target the two daleks inside are destroyed
prediction ta to dollicc ansid er de

0it [00:00, ?it/s]

target stillman is married to the former mara stefanski
prediction stomen is mared to the forvermarss the founcet
average test loss is 1.5802616645945584
Epoch 7 of 150 LR [0.009524135262330098]


46it [02:41,  3.47s/it]

torch.Size([2, 131328])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005],
        [0.0000, 0.0000, 0.0000,  ..., 0.0004, 0.0005, 0.0005]])


287it [16:41,  3.49s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.8108e-05,
         -1.9141e-05, -7.8119e-06]])


389it [22:40,  3.49s/it]

torch.Size([2, 200448])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.9087e-05,
         -5.9199e-05, -5.9370e-05]])


637it [37:13,  3.47s/it]

torch.Size([2, 275328])
tensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.0015, -0.0014, -0.0014]])


666it [38:54,  3.51s/it]

torch.Size([2, 180864])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -4.0159e-05,
         -3.1214e-05,  3.6880e-06]])


926it [54:06,  3.47s/it]

torch.Size([2, 205056])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.7344e-06,
         -7.1339e-06, -7.7367e-05]])


1025it [59:53,  3.49s/it]

torch.Size([2, 270720])
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -7.7933e-06,
         -1.6287e-05, -5.2683e-05]])


1062it [1:02:03,  3.51s/it]


MEAN EPOCH LOSS IS 1.3129917292271631
target a goal task is a task of satisfying a condition
prediction ico pro scasi ost te sodis fi ifpiodition
target he was drawn to a debtors prison
prediction he was cuned to the deters prison
target suspicion eager and sharp looks out
prediction suspision e ter a sror look outolteee
target he believes that it was one of the most important decisions of his career
prediction thebeli setr was ooffemis inpoedo the fisions of his coolis
target five people are sitting together in the snow
prediction fo peofle a siting togather in t sco
target he assisted his mother in leading the denomination
prediction he isisted is mother inleating the denobonation
target it is a commercially important species
prediction it is a commersually inmporo speasies
target whitefish point is a designated important bird area
prediction wi fisbo is a dasiated importen purdarea
target dunckley was born in warwick
prediction tokle was born in ori
target today its catalogue has ex

target oconnor charity fund
prediction the corer hurdy fomehlt 
target the following years were not nearly as successful
prediction thefo ears were not narly as sucessful
target the schools mascot is the rising sun
prediction the scoos masicot is the rising so
target he was replaced by new lead curtis young
prediction he was supplaed by nuele crutes guns
target collins and major chuck yeager
prediction cos an majer trip eater
target he then went on to direct minor films
prediction he te worin to tarit monfils
target grusin is married to nan newton
prediction rruson is married to nan muvise
target text programs are easier to write smaller and run faster
prediction theprorensar isi to rraloter abepesiten
target a woman in a white shirt playing volleyball
prediction e bomen in e wa shirt paying voyble
target he was left in his cage
prediction he was te lefst inus cae
target these are available in both print and electronically
prediction the arrawasbu trit armetor
target he received his in

target it tastes strongly of iodine
prediction it a stoly ofaedise
target dibens lives and trains in boulder colorado
prediction dibins livves an traineds in bulder colleroa
target the contact was limited but it was at full speed
prediction the pontic was ebecatebeytaser fu spedy
target three people giving a presentation before a crowd
prediction fre pepeliing of pisicatiton e for
target they toured around europe and north america mostly as support
prediction the tordaroour of ben northe merita mos le i suppor
target a celebration in king arthurs court follows
prediction a sobasion into otis co follos
target the group subsequently disbanded
prediction thet te coks ubchipentexunded
target it is eaten with bread
prediction it is acin ri brat
target the underfur is paler in color
prediction the counderfer is paler in coler
target the modern term for this meaning is ultrasonic
prediction the moden tem for the se is orte soni
target the toes contain lamellae
prediction he tos conan themello

0it [00:00, ?it/s]

target timothy played herriot in the television series
prediction twoesepla erriat in tytobision seris
average test loss is 1.3849328603336517
Epoch 8 of 150 LR [0.009381533400219317]


50it [02:53,  3.44s/it]

In [291]:
def beam_search_decoding(output, answ, answ_lens, blank_label=0, width=8):
    decoded_preds, decoded_targs  = [], []
    
    text_transform = TextTransform()    
    
    for i, mat in enumerate(output.transpose(0, 1)):
        last = {}        
        P_b, P_t = 1, 1
        P_nb = 0
        # dict       [0:prob_blank, 1:prob_not_blank, 2:prob_total, 3:list of ]
        last[''] = [P_b, P_nb, P_t]
                        
        for t in range(mat.shape[0]):
            curr = {}
            
            # sorting
            cand = [(key, el) for (key, el) in last.items()] # листы pb pnb pt beam
            sorted_cand = sorted(cand, reverse=True, key=lambda x: x[1][2]) # P_Total
            best_beams = [key for (key, el) in sorted_cand][0:width] # лучших w штук bшк
           
            for beam in best_beams:
                P_nb = 0
                if t == 0:
                    beam = ''
                else:
                    if len(beam) > 0:
                        last_num = text_transform.text_to_int(beam[-1])
                        P_nb = last[beam][1] * mat[t, last_num]

                    P_b = last[beam][2] * mat[t, blank_label]

                    if beam not in curr:
                        curr[beam] = [P_b, P_nb, P_b+P_nb]
                    else:
                        curr[beam][0] += P_b
                        curr[beam][1] += P_nb
                        curr[beam][2] += P_b + P_nb                
                
                # 0 is blank
                for c in range(1, mat.shape[1]): 
                    new_beam = beam + ''.join(text_transform.int_to_text([c]))                    
                    
                    if len(beam) > 0 and last_num == c:
                        P_nb = mat[t, c] * last[beam][0]
                    else:
                        P_nb = mat[t, c] * last[beam][2]
                    
                    if new_beam not in curr:
                        curr[new_beam] = [0, P_nb, P_nb]
                    else:
                        curr[new_beam][1] += P_nb 
                        curr[new_beam][2] += P_nb                
            last = curr
        
        cand = [(key, el) for (key, el) in last.items()]
        sorted_cand = sorted(cand, reverse=True, key=lambda x: x[1][2])
        best_beam = [x[0] for x in sorted_cand][0]
        
        decoded_preds.append(best_beam)
        
        # i - номер бача
        decoded_targs.append(
                text_transform.int_to_text(answ[i][:answ_lens[i]].tolist())
        )
        
    return decoded_preds, decoded_targs                

In [292]:
test(model, opt, val_loader, CTCLoss, device, bs_width=8)

target:  he started violently
prediction:  hs ter fi
beam_search_preds:  this ter fise
target:  i think it is
prediction:  i then etis
beam_search_preds:  bi  theng etes
torch.Size([2, 198144])
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 4.8324e-05, 7.6901e-05,
         3.2309e-05],
        [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 4.8324e-05, 7.6901e-05,
         3.2309e-05]])
target:  a boy is sleeping in his dinner bowl
prediction:  abi seping in as tat 
beam_search_preds:  ta bi seping in as ta nrllltttt                                                         
target:  a woman in gray cuts cake
prediction:  a won in gray cat
beam_search_preds:  ta  won in gray cats
target:  is she as good as you say
prediction:  the se as boreses
beam_search_preds:  the se  as boreses
target:  flaming youth
prediction:  flaiin ou
beam_search_preds:  ffllain ou
target:  list of mirmo
prediction:  he list bof e ammo
beam_search_preds:  the list  of e ammol
target:  you have a couch
prediction

target:  they are also used for local level polling

prediction:  they are also used for loka level polling